Tuesday, September 26, 2023

Vocabulary Range of the Greek of the New Testament Books

As a way of determining the vocabulary range of the Greek of the New Testament books, I ran searches on each book calculating the total number of words in the book and the number of unique roots. It is an easy task using Accordance (as I am for this exercise) or Logos Bible software. I typed the results into Excel and could then calculate a percentage of unique word roots per total words in each book.
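
For readers without Accordance or Logos, the same two counts can be sketched in a few lines of Python. This is only an illustration, not how either program works internally: it assumes the text is already available as a list of dictionary forms (lemmas), and it treats lemmas as a stand-in for the "roots" the software reports, which may group words differently. The function name and the hand-typed lemma list for John 1:1 are illustrative.

```python
def root_ratio(lemmas):
    """Return (total words, unique lemmas, unique / total) for a list of lemmas."""
    total = len(lemmas)
    unique = len(set(lemmas))
    return total, unique, unique / total

# Lemmas for the 17 words of John 1:1, typed by hand for this example.
john_1_1 = [
    "ἐν", "ἀρχή", "εἰμί", "ὁ", "λόγος",
    "καί", "ὁ", "λόγος", "εἰμί", "πρός", "ὁ", "θεός",
    "καί", "θεός", "εἰμί", "ὁ", "λόγος",
]

total, unique, ratio = root_ratio(john_1_1)
print(f"{total} words, {unique} unique lemmas, {ratio:.2%}")
# prints: 17 words, 8 unique lemmas, 47.06%
```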

The results can only be approximate, of course, because they are highly dependent on the number of words in the selection. For example, a 5-word sentence is likely to have 5 unique words and thus a 100% ratio. I.e., the longer the document, the lower the ratio one should expect, since commonly used words (articles, conjunctions, names, etc.) are going to be used more frequently and thus dilute the uniqueness of words overall. E.g., in the whole NT of Nestle-Aland 28, there are 138069 words with 5432 unique roots and a uniqueness ratio of 3.93%. For 3 John, the shortest document, there are 221 words with 112 unique roots and a uniqueness ratio of 50.68%. The table confirms this expected result. Still, despite the length factor, there are some clear and interesting results, some predictable and some surprising.
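
The quoted percentages can be rechecked directly from the counts. The short sketch below only restates the arithmetic (unique roots divided by total words); the function name is illustrative.

```python
def uniqueness_ratio(unique_roots, total_words):
    """Unique roots as a fraction of total words."""
    return unique_roots / total_words

# (unique roots, total words) as quoted in the text.
samples = {
    "5-word sentence, all words different": (5, 5),
    "3 John": (112, 221),
    "Whole NT (NA28)": (5432, 138069),
}

for label, (roots, words) in samples.items():
    print(f"{label}: {uniqueness_ratio(roots, words):.2%}")
# prints 100.00%, 50.68%, and 3.93% respectively
```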

SORTED BY BOOK LENGTH


Book    Total Words    Unique Roots    Unique Roots / Total Words
LUKE          19483            2043                        10.49%
ACTS          18451            2028                        10.99%
MAT           18348            1681                         9.16%
JOHN          15633            1016                         6.50%
MAR           11305            1340                        11.85%
REV            9853             912                         9.26%
ROM            7113            1058                        14.87%
1COR           6832             958                        14.02%
HEB            4955            1029                        20.77%
2COR           4480             785                        17.52%
EPH            2424             529                        21.82%
GAL            2232             521                        23.34%
1JOH           2143             236                        11.01%
JMS            1747             555                        31.77%
1PET           1682             546                        32.46%
PHIL           1631             441                        27.04%
1TIM           1594             537                        33.69%
COL            1583             433                        27.35%
1THES          1484             364                        24.53%
2TIM           1241             454                        36.58%
2PET           1102             396                        35.93%
2THES           826             250                        30.27%
TIT             661             298                        45.08%
JUDE            460             228                        49.57%
PHLM            337             140                        41.54%
2JOH            248              98                        39.52%
3JOH            221             112                        50.68%

 EXPECTED RESULTS

  • As one might guess, the Johannine literature, with its spirals of dialogues and community-shaped language, has the lowest ratio of unique roots. The Gospel of John is particularly striking in relation to the other three gospels for having such a low percentage. Unsurprisingly, 1 John has an extremely low ratio compared to its peers in size like Ephesians and Galatians.
  • It is also perhaps unsurprising that Revelation’s ratio is a bit lower than its length alone would suggest.
  • As one might also expect, Luke has the highest ratio among the gospels.

SOMEWHAT UNEXPECTED RESULTS

  • What may come as a surprise, however, is the somewhat higher-than-expected ratio of unique words in Mark.
  • More striking, though, at least to me, is the lower than expected ratio in Matthew. It may be, however, that Matthew’s ratio is average in relation to its number of words, and in that case, it makes Luke’s ratio all the more remarkable for being so high.
  • Given the wider range of topics and incidents, I was expecting the ratio for Acts to be higher than Luke, but that they are so similar might be taken as confirmation that the same author wrote both.
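
One way to test whether a ratio such as Matthew’s is "average in relation to its number of words" is to compare each book with a length-adjusted baseline. The sketch below is not part of the original analysis: it assumes that unique roots grow roughly as a power of total words (a Heaps'-law-style relationship), fits a straight line to the logarithms of the table's counts by ordinary least squares, and reports how far each book sits above or below that line. The function and variable names are illustrative, and with only 27 books the fit is a rough guide rather than a statistical test.

```python
import math

# (total words, unique roots) copied from the table above.
BOOKS = {
    "LUKE": (19483, 2043), "ACTS": (18451, 2028), "MAT": (18348, 1681),
    "JOHN": (15633, 1016), "MAR": (11305, 1340), "REV": (9853, 912),
    "ROM": (7113, 1058), "1COR": (6832, 958), "HEB": (4955, 1029),
    "2COR": (4480, 785), "EPH": (2424, 529), "GAL": (2232, 521),
    "1JOH": (2143, 236), "JMS": (1747, 555), "1PET": (1682, 546),
    "PHIL": (1631, 441), "1TIM": (1594, 537), "COL": (1583, 433),
    "1THES": (1484, 364), "2TIM": (1241, 454), "2PET": (1102, 396),
    "2THES": (826, 250), "TIT": (661, 298), "JUDE": (460, 228),
    "PHLM": (337, 140), "2JOH": (248, 98), "3JOH": (221, 112),
}

def fit_log_log(points):
    """Least-squares line through (ln words, ln roots); returns (slope, intercept)."""
    xs = [math.log(words) for words, _ in points]
    ys = [math.log(roots) for _, roots in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(books):
    """ln(actual roots) minus ln(roots predicted from length) for each book."""
    slope, intercept = fit_log_log(list(books.values()))
    return {
        name: math.log(roots) - (intercept + slope * math.log(words))
        for name, (words, roots) in books.items()
    }

# List books from furthest below the fitted line to furthest above it.
for name, res in sorted(residuals(BOOKS).items(), key=lambda item: item[1]):
    print(f"{name:6s} {res:+.2f}")
```

A negative value means fewer unique roots than the fitted line predicts for a book of that length; a positive value means more.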

How else might this data be used?

THE ENDING OF MARK

There have been many arguments regarding the ending of the gospel of Mark.


 

Row  Passage        Total Words    Unique Roots    Unique Roots / Total Words
1    Mark 1.1-16.20       11305            1340                        11.85%
2    Mark 1.1-16.8        11132            1323                        11.88%
3    Mark 16.9-20           171              99                        57.89%
4    Mark 3.9-20            167              81                        48.50%

What do these results show?

  • Comparing Mark 1.1-16.20 (row 1) with the text omitting 16.9-20 (row 2) shows that the ratio barely moves (11.85% versus 11.88%), even though the added verses bring in 17 roots (1340 versus 1323) that appear nowhere else in the gospel. This fits what many others have already noted: the vocabulary of 16.9-20 is not fully consistent with the rest of the gospel.
  • Just looking at Mark 16.9-20 (row 3) on its own, the ratio is only slightly higher than what might be expected for a selection of that length. Compared with a random passage of similar length elsewhere in Mark (Mark 3.9-20, row 4), however, the difference is noticeable: 57.89% versus 48.50%.
  • I.e., though certainly not proof, these results suggest different authorship of 16.9-20 from the rest of the gospel.
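
A single comparison passage (row 4) is a thin baseline. A fuller check would compute the ratio for every stretch of similar length in Mark 1.1-16.8 and see how often it reaches the 57.89% found for 16.9-20. The sketch below assumes the gospel is available as a list of lemmas in text order, which is not supplied here; `window_ratios` and `share_at_or_above` are illustrative names, and the short letter sequence only demonstrates the mechanics.

```python
def window_ratios(lemmas, size, step=1):
    """Unique/total ratio for each run of `size` consecutive lemmas."""
    return [
        len(set(lemmas[start:start + size])) / size
        for start in range(0, len(lemmas) - size + 1, step)
    ]

def share_at_or_above(ratios, observed):
    """Fraction of windows whose ratio is at least `observed`."""
    return sum(r >= observed for r in ratios) / len(ratios)

# Toy sequence standing in for a lemma list (not real data).
toy = list("ababcd")
ratios = window_ratios(toy, size=3)     # windows: aba, bab, abc, bcd
print(ratios)
print(share_at_or_above(ratios, 1.0))   # 0.5: two of the four windows have no repeats

# With a real lemma list for Mark 1.1-16.8 (hypothetical variable `mark`):
# share_at_or_above(window_ratios(mark, size=171), 99 / 171)
```

A small share would mean that few passages of that length in the rest of Mark are as varied in vocabulary as 16.9-20; a large share would mean the ending is unremarkable on this measure.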

PAULINE AND DEUTERO-PAULINE LETTERS

There have been many arguments regarding which letters are authentically Paul’s and which are disputed (highlighted in blue in the table).


Letter  Total Words    Unique Roots    Unique Roots / Total Words
ROM            7113            1058                        14.87%
1COR           6832             958                        14.02%
2COR           4480             785                        17.52%
EPH            2424             529                        21.82%
GAL            2232             521                        23.34%
PHIL           1631             441                        27.04%
1TIM           1594             537                        33.69%
COL            1583             433                        27.35%
1THES          1484             364                        24.53%
2TIM           1241             454                        36.58%
2THES           826             250                        30.27%
TITUS           661             298                        45.08%
PHLM            337             140                        41.54%

What do these results show? 

  • If the disputed letters are removed from the table, the ratios are expected and consistent in relation to the total number of words.
  • The main outliers are 1 and 2 Timothy which have a significantly higher ratio than expected. I.e., though certainly not proof, these results suggest different authorship of 1 and 2 Timothy from the genuine Pauline letters.
  • Ephesians, Colossians, and 2 Thessalonians are in the expected range of unique roots, so this measure cannot be used as an argument against their authentic Pauline authorship.

Anyone else see anything in these results?
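
One simple follow-up on the Pauline table: since unique roots should rise with length, a letter's root count can be compared with those of the nearest shorter and nearest longer undisputed letters. The sketch below is an illustration, not the method used above. It assumes the conventional list of seven undisputed letters (Romans, 1 and 2 Corinthians, Galatians, Philippians, 1 Thessalonians, Philemon) and treats the remaining six as disputed; the function name is hypothetical.

```python
# (total words, unique roots) from the Pauline table.
# Assumption: the conventional seven undisputed letters form the reference group.
UNDISPUTED = {
    "ROM": (7113, 1058), "1COR": (6832, 958), "2COR": (4480, 785),
    "GAL": (2232, 521), "PHIL": (1631, 441), "1THES": (1484, 364),
    "PHLM": (337, 140),
}
DISPUTED = {
    "EPH": (2424, 529), "1TIM": (1594, 537), "COL": (1583, 433),
    "2TIM": (1241, 454), "2THES": (826, 250), "TITUS": (661, 298),
}

def bracket(words, reference):
    """Root counts of the nearest shorter and nearest longer reference letters.

    Assumes the reference group contains at least one shorter and one longer letter.
    """
    shorter = max((v for v in reference.values() if v[0] < words), key=lambda v: v[0])
    longer = min((v for v in reference.values() if v[0] > words), key=lambda v: v[0])
    return shorter[1], longer[1]

for name, (words, roots) in DISPUTED.items():
    low, high = bracket(words, UNDISPUTED)
    verdict = "within" if low <= roots <= high else "outside"
    print(f"{name}: {roots} roots, neighbours {low}-{high}, {verdict}")
```

On these counts, 1 and 2 Timothy fall outside their brackets, while Ephesians, Colossians, 2 Thessalonians, and Titus fall inside. The bracket between Philemon (337 words) and 1 Thessalonians (1484 words) is wide, so this is a coarse check rather than proof.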