Viewing a single comment thread. View all comments

orgtre OP t1_iqxdr8t wrote

Yes, basically. But in addition to differences in vocabulary diversity, differences between the lines of different languages might be due to differences in the collections of books (corpora) these lines are based on.

On thing that seems to play a role is that the corpora are of different sizes: Lines for both Hebrew and Chinese look quite different from the other languages, and these corpora are also much smaller than the others. Hebrew and Chinese both also use a non-Roman alphabet, but so does Russian, whose corpus is larger. So this is some indication of that Hebrew and Chinese stand apart because of their smaller corpus size.

2