A comparative analysis of the vocabulary richness (the no. of different words) in 50 of the world’s greatest novels
Monday 12 August 2024, by
Using a special in-house tool, we have analysed the vocabulary of 50 of the world’s most famous Russian, English, French, German, Italian and Spanish novels [1], each in its original language, to determine which of these works have:
– the greatest number of different words
and
– the highest “vocabulary-richness ratio”, the ratio of the number of different words to the overall word-count of the work in question.
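Here is the ratio written as a formula. N stands for the total word-count of a work and V for its number of different words. Both symbols are introduced here for illustration and are not taken from the study.

```latex
R = \frac{V}{N}, \qquad 0 < R \le 1 \quad (\text{for } N \ge 1)
```

For instance, a hypothetical novel of 200,000 words containing 20,000 different words would have R = 20,000 / 200,000 = 0.1. A work can never contain more different words than words in total, so R cannot exceed 1.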
For the purposes of this analysis, all punctuation marks (other than apostrophes and embedded hyphens), numbers, special characters, initials and Roman numerals have been ignored.
Words that differ only in upper-case/lower-case have been counted as the same word.
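The following Python sketch approximates these counting rules. It is not the in-house tool itself, whose rules are not published, and it rests on several assumptions:

- A “word” is a run of letters, optionally joined by internal apostrophes or hyphens.
- An initial is a single upper-case letter followed by a full stop.
- Roman numerals are detected only in upper case.

The function names `tokenize` and `vocabulary_stats` are hypothetical.

```python
import re

# Assumed approximation of the rules described above; the in-house tool's
# exact rules are not published. A word is a run of letters, optionally
# joined by internal apostrophes or hyphens ("didn't", "well-known").
# [^\W\d_] matches any Unicode letter, so accented words are kept whole.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")

# Upper-case Roman numerals such as "IV" or "XII". Heuristic only: an
# all-capitals word like "MIX" would also be discarded.
ROMAN_RE = re.compile(r"M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")


def tokenize(text: str) -> list[str]:
    """Return lower-cased words, skipping numbers, initials and Roman numerals."""
    words = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        followed_by_period = text[match.end():match.end() + 1] == "."
        # Initials: a single upper-case letter followed by a full stop, e.g. "J."
        # (the English pronoun "I" is kept).
        if len(word) == 1 and word.isupper() and followed_by_period and word != "I":
            continue
        # Roman numerals (again sparing the pronoun "I").
        if word != "I" and ROMAN_RE.fullmatch(word):
            continue
        # Numbers and special characters never match WORD_RE, so they are
        # already excluded; lower-casing merges upper/lower-case variants.
        words.append(word.lower())
    return words


def vocabulary_stats(text: str) -> tuple[int, int, float]:
    """Return (number of different words, total word-count, richness ratio)."""
    words = tokenize(text)
    different = len(set(words))
    ratio = different / len(words) if words else 0.0
    return different, len(words), ratio


sample = "CHAPTER XII. J. Smith didn't say a word; the well-known Smith said: 'Word!' in 1865."
print(vocabulary_stats(sample))  # (10, 12, 0.8333333333333334)
```

In the sample, “XII”, the initial “J.” and “1865” are dropped. “Smith”/“Smith” and “word”/“Word” each count once, which leaves 10 different words out of 12. The Roman-numeral test is deliberately crude, so a real tool would need language-specific exceptions.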
TABLE OF CONTENTS
1. THE NOVELS ANALYSED
2. THE NOVELS WITH THE GREATEST NUMBER OF DIFFERENT WORDS
3. THE NOVELS WITH THE GREATEST VOCABULARY-RICHNESS RATIO
1. THE NOVELS ANALYSED
We have selected the following major masterworks of Russian, American and European literature for analysis:
2. THE NOVELS WITH THE GREATEST NUMBER OF DIFFERENT WORDS
To put the above figures in perspective, Marvin Spevack’s authoritative study of Shakespeare’s vocabulary (“A Complete and Systematic Concordance to the Works of Shakespeare”, 1968) established that Shakespeare’s collected theatrical works contain 29,066 different words.
3. THE NOVELS WITH THE GREATEST VOCABULARY-RICHNESS RATIO
Although shorter novels naturally tend to have a higher vocabulary-richness ratio, this table is again largely dominated by the masterworks of Russian literature.
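The length effect mentioned above is a well-known property of word counts, often modelled by Heaps’ law. As a text grows, an ever larger share of its words repeat ones already used, so the number of different words grows more slowly than the word-count and the ratio falls.

The simulation below illustrates this under one assumption: word frequencies are Zipf-distributed over a hypothetical 50,000-word vocabulary. It uses no data from the novels in the study.

```python
import random
from itertools import accumulate

# Illustration only: a synthetic "text" of word IDs drawn with Zipf-like
# frequencies (the word of rank r has weight 1/r), standing in for a novel.
# VOCAB_SIZE and the seed are arbitrary assumptions.
VOCAB_SIZE = 50_000


def zipf_text(n_tokens: int, seed: int = 0) -> list[int]:
    """Draw n_tokens word IDs; low IDs (common words) are drawn most often."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(1 / rank for rank in range(1, VOCAB_SIZE + 1)))
    return rng.choices(range(VOCAB_SIZE), cum_weights=cum_weights, k=n_tokens)


def richness_ratio(tokens: list[int]) -> float:
    """Number of different words divided by total word-count."""
    return len(set(tokens)) / len(tokens)


text = zipf_text(100_000)
for n in (1_000, 10_000, 100_000):
    # Longer prefixes of the same text: the ratio keeps falling, because
    # most later tokens repeat words that have already appeared.
    print(f"{n:>7} words: ratio {richness_ratio(text[:n]):.3f}")
```

Measuring prefixes of a single synthetic text, rather than separate texts, makes length the only variable being changed.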
[1] And, in the case of Dante’s The Divine Comedy, an epic poem.