Wednesday, January 16, 2008

How much Greek do you really have to memorize?

In a previous post I indicated my interest in minimizing memorization when learning Greek and relying more on software as an aid to understanding it. I just came across some fascinating data from the remarkable James Tauber (the person behind MorphGNT used by the equally remarkable Zhubert). Tauber produced a table indicating how many Greek words you need to know in order to recognize a given percentage of all the words in the Greek NT. There are approximately 5390 Greek lexemes in total in the NT. The good news here is that you only need to know two words (they happen to be "the"=ο and "and"=και), and you know 20.9% of all words in the NT! (Note that we are not distinguishing between inflected forms but only talking about lexemes here.)

  • You only need to know the top 27 words to know 50% of the NT vocabulary. (And you probably know more, if you can figure out compound forms like εισ·ερχομαι, etc.)
  • Learn the 100 most used words and you know 66% of the NT vocab.
  • If you want to learn all the words used 100 or more times in the NT, you would need to learn 171 words, and you would know almost 73% of the NT vocab.
  • I try to have my students learn words that are used 50 or more times in the NT. This comes out to 310 words which means that they know almost 80% of all words in the NT.
  • If I asked them to learn words used 35+ times, it would add another 100 or so words to their vocabulary... and they would know 83% of all words in the NT. Not much of a gain there...
  • Learning words 25+ times means a vocab of 542 words and about 86% of all NT vocab.
One conclusion that might be drawn from these stats is that learning the first 100 lexemes is the most important. The reward for learning more after that decreases quickly...
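The diminishing return can be checked with a little arithmetic on the figures quoted in the list above. The following is only a minimal Python sketch: the names coverage_points and marginal_gains are made up for illustration, and the percentages are the rounded values from this post ("almost 73%" is entered as 73.0, and so on), not Tauber's exact data.

```python
# (lexemes known, approximate % of NT word occurrences covered).
# Values are the rounded figures quoted in this post, not exact data.
coverage_points = [
    (2, 20.9),
    (27, 50.0),
    (100, 66.0),
    (171, 73.0),
    (310, 80.0),
    (542, 86.0),
]


def marginal_gains(points):
    """Return (first_word, last_word, gain_per_word) for each step.

    gain_per_word is the average number of percentage points of
    coverage added by each additional lexeme learned in that step.
    """
    gains = []
    prev_words, prev_cov = 0, 0.0
    for words, cov in points:
        per_word = (cov - prev_cov) / (words - prev_words)
        gains.append((prev_words + 1, words, per_word))
        prev_words, prev_cov = words, cov
    return gains


for first, last, per_word in marginal_gains(coverage_points):
    print(f"words {first:>3}-{last:<3}: about {per_word:.3f} points per word")
```

On these rounded numbers, the average gain per word drops at every step, from roughly ten percentage points per word for the first two lexemes to a small fraction of a point per word beyond the first 100.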

Tauber provided further reflection on vocabulary learning in another post. He notes that more important than just knowing words is comprehending a sentence. He cites reading theorists who claim that you need to know 95% of the vocab of a sentence to understand it. He provides this chart:

vocab / coverage     any     50%     75%     90%     95%    100%
100                99.9%   91.3%   24.4%    2.1%    0.6%    0.4%
200                99.9%   96.9%   51.8%    9.8%    3.4%    2.5%
500                99.9%   99.1%   82.3%   36.5%   18.0%   13.9%
1,000             100.0%   99.7%   93.6%   62.3%   37.3%   30.1%
1,500             100.0%   99.8%   97.2%   76.3%   53.5%   44.8%
2,000             100.0%   99.9%   98.4%   85.1%   65.5%   56.5%
3,000             100.0%  100.0%   99.4%   93.6%   81.0%   74.1%
4,000             100.0%  100.0%   99.7%   97.4%   90.0%   85.5%
5,000             100.0%  100.0%  100.0%   99.4%   96.5%   94.5%
all               100.0%  100.0%  100.0%  100.0%  100.0%  100.0%
This means, for example, that from a vocab point of view, if you know the top 100 words, you would know at least 95% of the words in only 0.6% of the NT verses. Not so encouraging. Even if you learned 2000 lexemes, only 65.5% of the verses would reach that 95% threshold. Requiring memorization of 2000 lexemes is not going to work for my first year class...
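To make the logic of the chart concrete, here is a minimal Python sketch of how one cell could be computed: take the N most frequent lexemes as "known," then count the share of verses in which at least a given fraction of the words are known. The four "verses" below are a made-up toy corpus, not real NT text, and the function names top_lexemes and share_of_verses are hypothetical. This is not Tauber's actual script.

```python
from collections import Counter

# Toy corpus: each "verse" is a list of lexemes. Hypothetical data for
# illustration only; a real run would use a lemmatized NT text.
verses = [
    ["ο", "και", "λεγω"],
    ["ο", "θεος", "ειμι", "και"],
    ["εισερχομαι", "ο", "οικος", "και", "λεγω"],
    ["ο", "λογος"],
]


def top_lexemes(verses, n):
    """Return the set of the n most frequent lexemes in the corpus."""
    counts = Counter(lexeme for verse in verses for lexeme in verse)
    return {lexeme for lexeme, _ in counts.most_common(n)}


def share_of_verses(verses, known, threshold):
    """Fraction of verses in which at least `threshold` of the words are known."""
    hits = 0
    for verse in verses:
        known_fraction = sum(word in known for word in verse) / len(verse)
        if known_fraction >= threshold:
            hits += 1
    return hits / len(verses)


known = top_lexemes(verses, 2)  # the two most frequent lexemes in the toy corpus
for threshold in (0.5, 0.75, 0.95):
    share = share_of_verses(verses, known, threshold)
    print(f"top 2 lexemes, at least {threshold:.0%} known: {share:.0%} of verses")
```

Even in this tiny example the pattern of the chart shows up: knowing the two most common lexemes gets you at least half of the words in most verses, but no verse reaches the 95% mark.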
So, let me approach this another way. Knowing the top 200 words means that in 96.9% of the verses in the NT you would recognize at least half of the words. That's not terrible. I want my students to recognize enough of what is going on in a verse so that they can at least figure out what they need to look up using software aids without getting too frustrated. So, if I want them to recognize, say, about 75% of the words in about 75% of the verses in the NT, I'm roughly calculating that they need to know about 300 words, i.e., the words used 50+ times in the NT.

Okay, a lot of this is estimation, and it does not take into account all the challenges of inflected forms and such. I do think, however, that for students who will get a year or less of Greek instruction, learning those first 100 words is a priority, and that working toward that 50+ list of 310 words is a reasonable goal that will position them quite well to study Greek with the aid of software.

(BTW, Tauber has reflected further on vocabulary learning, noting that it is not necessarily most efficient to learn words simply in order of frequency. His logic makes good sense, but he starts to lose me once he starts writing Python scripts that implement a "simulated annealing approach"!)
