The Vocabulary Statistics Fallacy

In discussions of the authorship of the New Testament documents, a common argument, applied most often to the letters of Paul (but also to those of Peter and others), makes much of alleged vocabulary differences that are said to indicate a different author. In this article, I’d like to discuss this sort of argument in general terms. I consider it to be of little to no use in determining the authorship of ancient documents.

My first misgiving about such arguments is that, statistically, they are generally without merit. This is often illustrated by modern comparative examples. In one instance, R. C. Sproul relates the following story in at least two of his books:

One of the least scientific methods used to criticize authorship is the study of what is called the incidence of hapax legomena. The phrase hapax legomena refers to the appearance of words in a particular book that are found nowhere else in the author's writings. For example, if we find 36 words in Ephesians that are found nowhere else in Paul's writings, we might contend that Paul could not have written Ephesians.

The folly of putting too much stock in hapax legomena came home to me when I had to learn the Dutch language in a hurry for my graduate work in the Netherlands. I studied Dutch by the "inductive method." I was assigned several volumes of theology written by G.C. Berkouwer. I started my study by reading his volume on The Person of Christ which was in Dutch. I started on the first page with the first word and looked it up in the dictionary. I wrote the Dutch word on one side of a card and the English word on the other side and set about the task of learning Berkouwer's vocabulary. After doing this on every page of The Person of Christ, I had over 6,000 words on cards. The next volume I studied was Berkouwer's The Work of Christ. I found over 3,000 words in that book that were not found in the first one. That's significant evidence that The Work of Christ was not written by Berkouwer! Note that Berkouwer wrote The Work of Christ only one year after he wrote The Person of Christ. He was dealing with the same general theme (Christology) and writing to the same general audience, yet there were thousands of words found in the second volume that were not found in the first.

Note also that the quantity of Berkouwer's writing in the first volume far exceeds the total quantity of writing that survives from the pen of the Apostle Paul. Paul's letters were much more brief. They were written to a wide variety of audiences, covering a wide diversity of subjects and issues, and were written over a long period of time. Yet people get excited when they find a handful of words in a given Epistle that are found nowhere else. Unless Paul had the vocabulary of a six-year-old and had no literary talent whatsoever, we should pay little attention to such unbridled speculation. 
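
As a rough illustration of the kind of count Sproul describes, here is a minimal Python sketch that lists the words found in one text but in none of the others it is compared against. The function names and the two toy "letters" are my own inventions for illustration; they are not drawn from any actual study, and real work on Greek texts would also need to handle inflected forms.

```python
import re

def vocabulary(text):
    """Return the set of distinct lowercase word forms in a text."""
    # [^\W\d_]+ matches runs of letters, including non-Latin scripts such as Greek.
    return set(re.findall(r"[^\W\d_]+", text.lower()))

def words_unique_to(target, others):
    """Return the words that appear in `target` but in none of `others`."""
    seen_elsewhere = set()
    for other in others:
        seen_elsewhere |= vocabulary(other)
    return vocabulary(target) - seen_elsewhere

# Toy stand-ins (hypothetical), not real epistles.
letter_a = "grace and peace to you from god our father"
letter_b = "grace to you and peace in the heavenly places"

unique_b = words_unique_to(letter_b, [letter_a])
print(sorted(unique_b))  # ['heavenly', 'in', 'places', 'the']
```

Even in these two short lines on a similar theme, nearly half the distinct words of the second appear nowhere in the first, which is the effect Sproul noticed on a far larger scale.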

While Sproul gives us excellent food for thought, his analogy is not as good as we might like, because modern languages like Dutch and English may have upwards of a million words. In contrast, an ancient language like Hebrew or Koine Greek may have only a few thousand or tens of thousands. According to several sources, the New Testament itself has a vocabulary of only about 5,000 words, and one source claims that around 300 of those account for 80% of the words in the New Testament. This accords with our own use of vocabulary: though English may have a million or more words, most of us use only a few thousand on a regular basis.

Thus in a sense, ancient people did “have the vocabulary of a six-year-old” (a modern one), because they had so few words to begin with. This does not mean they were more ignorant, of course: in their high-context society, a single word might have multiple “duties” assigned according to its context. This is one reason why it is an oversimplification to stroll through a concordance and attempt to ascertain the meaning of a NT word across the entirety of the NT based on one or two uses. Skeptics frequently engage in this sort of erroneous practice (what one scholar calls “illegitimate totality transfer”).

Still: does the fact that Koine Greek may have only a few thousand words in any sense make the arguments about vocabulary more plausible? Statistically, yes, but only in the sense that an asteroid is more likely to hit a particular state or nation than a particular person. The general logic still holds: the nature of vocabulary is such that a small percentage of words will account for most of what any of us write. It is foolish to use the remaining, rarer words as a basis for deciding authorship, especially given the rather narrow window we have into the literary lives of the NT authors (a point on which Sproul remains manifestly correct).
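
The claim that a small core of words does most of the work can be checked on any text. The following is a minimal Python sketch, assuming the text has already been split into words; the function name and the sample sentence are hypothetical and chosen only for illustration.

```python
from collections import Counter

def coverage_of_top_words(tokens, n):
    """Return the fraction of all tokens covered by the n most frequent word forms."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / len(tokens)

# Hypothetical sample: 10 words in total, with "the" used 3 times and "of" twice.
sample = "the word of the lord and the grace of god".split()

share = coverage_of_top_words(sample, 2)
print(share)  # 0.5
```

Here just two word forms account for half of the sentence. Run against a full corpus with n set to 300, the same function would show whether a figure like the 80% cited above holds for that corpus.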

This would be the case even if the NT authors did indeed write their own works, which brings up another point that needs developing:

As I have related in other contexts, the role of the scribe in antiquity makes arguments based on vocabulary highly questionable. As authors like Richards have shown in The Secretary in the Letters of Paul, a scribe could be assigned responsibility for a work across a wide range of possibilities: a scribe might serve purely as a receiver of dictation, or might be a full-fledged author who is told what to write only in very general terms, with the credited author simply reviewing the work and signing off.

My research for Trusting the New Testament indicates that many Biblical scholars simply do not give this aspect of ancient composition enough consideration. Many simply dismiss it, with a trace of impatience, as some sort of excuse manufactured by those who wish to preserve the authority of the New Testament. But this is a non-response to a genuine phenomenon of ancient composition. It seems rather that these authors do not wish to accept that scribal activity renders many of their carefully crafted arguments essentially useless. 

Indeed, the resulting “chaos” for deniers of the authorship of NT documents could be considerable: they have almost unanimously accepted Romans as Paul’s work, yet it is also one of his letters where the work of a scribe is most apparent in terms of testimonial evidence: "I Tertius, who write the epistle, salute you in the Lord" (Rom 16:22). If Romans, one of the letters most certainly ascribed to Paul, was influenced by a scribe, what does this do to the use of Romans as a standard by which other letters like Ephesians might be judged?

In the end, the hands of scribes make the burden much heavier on those who would deny authority (which, for ancient people, also amounted to authorship) to any New Testament book. Apart from such elements as anachronisms, factors like vocabulary simply become useless as tools for making judgments, as do numerous other factors associated with writing style. One can readily understand why critics would be hesitant to abandon what they would consider one of their “star players”.

There is one final factor we might discuss related to vocabulary as a determining factor in reckoning authorship, and that is the frequency of quotations and allusions. Since ancient writers didn’t have quote marks, we can recognize these only occasionally. But it could be a larger factor than we realize. The classic case is that of Ephesians and Colossians, which both seem to use a great deal of creedal and hymnal material, which would obviously confound any attempt to argue against their Pauline authority based on vocabulary. Since this was a high-context society, though, it also seems likely that there would be a great many other allusions in the NT texts that we would be unable to recognize, especially if they were allusions to something a person once said, or to something in a document not available to us.

In summary: Tests based on vocabulary are questionable enough as is, and require some stringent rules to be valid. However, even more stringent rules would be needed for an evaluation of ancient documents – and the use of vocabulary tests to determine authorship is therefore far less effective than many critics are willing to concede.

