RedditSuggestion1234 t1_iqwn1vb wrote on October 3, 2022 at 5:14 PM

As I understand it, this is basically a graph of how diverse the vocabulary of any given language is?

orgtre OP t1_iqxdr8t wrote on October 3, 2022 at 8:03 PM

Yes, basically. But in addition to differences in vocabulary diversity, differences between the lines of different languages might be due to differences in the collections of books (corpora) these lines are based on.

On thing that seems to play a role is that the corpora are of different sizes: Lines for both Hebrew and Chinese look quite different from the other languages, and these corpora are also much smaller than the others. Hebrew and Chinese both also use a non-Roman alphabet, but so does Russian, whose corpus is larger. So this is some indication of that Hebrew and Chinese stand apart because of their smaller corpus size.