The problem with trying to quantify the number of words in a language is that there is no precise way of defining the two most important things in that sentence – words and language.
What is a word?
What, exactly, counts as a word? We have a general sense: dog is a word, bnick is not. But the challenge of really figuring out what counts as a word is highlighted by some of the examples in the sentence above beginning with nonsense.
1. Morphology
Does nonsense count as a word? Or is it the same as sense? What about dog and dogs? Or dog and hot dog? How many words are flame, flames, inflame, inflammable, flammable? Or grandfather, great-grandfather, great-great-grandfather and so on?
English, like almost every other language, has morphology, which is a system of building words from meaningful word parts. Loosely, morphology can be broken down into:
Inflectional morphology (run -> runs),
Derivational morphology (run -> runner), and
Compounding (with varying degrees of coherence, e.g., cab driver, toothpick), with lots of gray area in between.
There is no way of deciding which of these word forms count as a word that is not completely arbitrary. Lest you think this is a minor factor, these choices could easily change your answer by close to an order of magnitude, as you can see from the flame or grandfather examples. Almost every word is subject to morphology, and there is no principled way of deciding when the result should be counted as another word.
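To make the arbitrariness concrete, here is a minimal sketch that counts the same small list three ways. The (lemma, root) labels are hand-assigned assumptions for illustration, not the output of any real morphological analyzer, and the three policies are just three of many lines you could draw.

```python
# Each surface form is hand-labeled with (lemma, root).
# These labels are illustrative assumptions, not linguistic ground truth.
FORMS = {
    "flame": ("flame", "flame"),
    "flames": ("flame", "flame"),          # inflection of flame
    "inflame": ("inflame", "flame"),       # derivation from flame
    "inflammable": ("inflammable", "flame"),
    "flammable": ("flammable", "flame"),
    "run": ("run", "run"),
    "runs": ("run", "run"),                # inflection of run
    "runner": ("runner", "run"),           # derivation from run
    "dog": ("dog", "dog"),
    "dogs": ("dog", "dog"),                # inflection of dog
}


def count_words(forms, policy):
    """Count 'words' under one of three counting policies."""
    if policy == "surface":
        # Every distinct spelling is its own word.
        return len(forms)
    if policy == "lemma":
        # Collapse inflections, keep derivations separate.
        return len({lemma for lemma, _ in forms.values()})
    if policy == "root":
        # Collapse inflections and derivations alike.
        return len({root for _, root in forms.values()})
    raise ValueError(f"unknown policy: {policy}")


for policy in ("surface", "lemma", "root"):
    print(policy, count_words(FORMS, policy))
```

On this toy list the three policies give 10, 7 and 3 words. Nothing about the list changed, only where the line was drawn.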
2. Synonyms, homonyms and heteronyms, oh my!
Crap is a verb. Crap is a noun. Crap means a lie and crap means feces. I guess you can count that all as one word, but what about the same spelling with a more radically different meaning, e.g., bank (river) and bank ($)? Or how about same spelling, different meaning and different pronunciation, e.g., desert (sand) and desert (leave)? Or if spelling is your guide, what about different spellings of the same meaning, e.g., advisor v. adviser?
Indeed, almost every permutation of same v. different meaning, spelling and pronunciation can be found among (amongst, wink wink) words.
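The same point can be shown as a quick sketch: treat each entry as a (spelling, pronunciation, meaning) triple and count distinct items by whichever fields you decide define a word. The entries, the rough pronunciation respellings and the helper name `count_distinct` are all made up for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    spelling: str
    pronunciation: str  # rough respelling, illustrative only
    meaning: str


# A hand-built toy lexicon based on the examples above.
ENTRIES = [
    Entry("bank", "bank", "edge of a river"),
    Entry("bank", "bank", "financial institution"),
    Entry("desert", "DEZ-ert", "arid region"),
    Entry("desert", "de-ZERT", "to abandon"),
    Entry("advisor", "ad-VY-zer", "one who advises"),
    Entry("adviser", "ad-VY-zer", "one who advises"),
]


def count_distinct(entries, *fields):
    """Count entries that differ on at least one of the chosen fields."""
    return len({tuple(getattr(e, f) for f in fields) for e in entries})


print(count_distinct(ENTRIES, "spelling"))
print(count_distinct(ENTRIES, "pronunciation"))
print(count_distinct(ENTRIES, "meaning"))
print(count_distinct(ENTRIES, "spelling", "meaning"))
```

Six entries come out as 4 words by spelling, 4 by pronunciation, 5 by meaning, and 6 by spelling plus meaning. Each key is defensible, and each gives a different total.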
3. Acronyms
Moving on to the next word in our little rant, b.s. Are you counting abbreviations and acronyms in your list and, if so, how? B.S. is pretty conventionalized, but certainly not as much as laser, though more so than POTUS, though that depends on whether you work in politics, to say nothing of the status of EKG, an acronym you certainly hear more than the real word itself. As above, whatever dividing line you select will be completely arbitrary. The number here probably isn’t too high – maybe on the order of tens of thousands – but it serves to highlight another parallel problem, that of:
4. Neologisms
Did you like the word redonkulosity? I just made it up. Or at least, I thought I just made it up, but it does show up in Google with 4,000 hits. That was after thinking I had sort of created the novel word ridiculosity – spell check says it isn’t one, but Merriam-Webster says it is.
The fact is that there is no definitive way of deciding whether a new word should count as, well, a word. New entries in the OED or MW are decided by a person, or group of people, according to some general guidelines relating to the frequency of use, place of use and so on. These are not guidelines handed down from on high, as much as we revere the Oxford English Dictionary, but are, again, arbitrary. They even vary from dictionary to dictionary resulting in something like a two-fold difference in the size of different dictionaries.
5. Archaisms
Next up, bushwa, a word I didn’t even know until I read the article Keeping It Real on Dictionary Row, where Geoff Nunberg debunks the charlatans at Global Language Monitor, albeit briefly. I didn’t know it because the word has been going out of style since about 1950. That’s a relatively recent decline compared to other words, like emmet or pismire, both words for ant, which went out of use hundreds of years ago.
So not only do we not have a concrete way of deciding when to add a word, we similarly have no way of deciding when to remove a word from our list. Given that languages are in a constant state of flux, that creates a moving target wherein the exit criteria should be linked to the entrance criteria, which are themselves arbitrary. So, again, more arbitrariness.
6. Borrowings
Finally, bubbe-meises, my favorite in the list, which is a word in the English dictionary. It is clearly a borrowing, in this case from Yiddish, roughly meaning old wives’ tale, but with a bit more of a sense of dismissal. Words are borrowed into English not in a single leap, but gradually, at different rates for each word depending on pronunciation, frequency, semantics and so on. In counting the words of English, you will have to somehow define yet another cut-off point here when figuring out what to count and what not to count.
7. Specialty Words
And last but not least, indeed perhaps most, in terms of how it would affect your final number, we have the millions upon millions of words associated with different scientific specializations. Not to say that Critical Theory hasn’t come up with its own unique vocabulary, but no one quite compares to chemists and entomologists when it comes to word creation.
There are 350,000 species of beetles on this planet, and each can be given its own name. And that’s just beetles. There are up to 1 billion different species of bacteria. If each species in the class Mammalia gets its own word, then so should at least some of the prokaryotes, no?
A similar problem exists with chemicals and all the permutations and combinations that lead to a near-infinite number of possibilities wherein the only real limits are those of chemistry and not language. How would that work in your word count?
Oxygen, certainly yes. What about Dihydrogen Monoxide? Or its synonyms, Dihydrogen Oxide, Hydrogen Hydroxide, Hydronium Hydroxide and Hydric Acid? Get to know these chemicals (Facts About Dihydrogen Monoxide), but good luck in figuring out how to count their names.
Clearly, there are some tough (and by tough, I mean completely arbitrary) choices to be made in terms of counting words. Now what about language?
8. What is a Language?
Oh, and this is a question that bugs linguists! I speak English. You (probably) speak English. We certainly don’t speak precisely the same language in terms of word knowledge. Which one do we use? There are so many different levels at which a language can be defined that it’s impossible to say where the limits of any given language lie.
First, for a language like English, you have national differences. The English of America has different words from the English spoken in Canada, Australia and the UK, not to mention what people speak in India and Nigeria.
And even within a single country, you have regional dialects that have different lexicons:
The different ways of saying coke in America. Source: Reddit.
And on down to the individual person, or idiolect, where each speaker has their own way of speaking English, with a different list of words in their head. If you want to move away from the individual person and try to define the English language that is spoken in the world, it’s not clear what that really means. Is that the sum total of all words across all self-reported English speakers? That’d be a mess, wouldn’t it!
You may try to go for some principled definition, e.g., the words in all books published in English, but that, too, is problematic for whom it excludes and for the pride of place it gives to literacy, the literary and editors. Thus, as with the definition of word, you’re stuck with an arbitrary definition of what a language is.
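As a rough sketch of how much the choice of definition matters, compare two candidate definitions of "the language" over a handful of idiolects: every word any speaker knows (the union) versus only the words every speaker shares (the intersection). The speakers and their tiny vocabularies below are invented for illustration.

```python
# Hypothetical idiolects: each speaker's personal word list.
SPEAKERS = {
    "speaker_us": {"dog", "soda", "truck", "sidewalk"},
    "speaker_uk": {"dog", "lorry", "pavement", "biscuit"},
    "speaker_ca": {"dog", "pop", "truck", "toque", "sidewalk"},
}


def lexicon_union(speakers):
    """Words known by at least one speaker."""
    return set().union(*speakers.values())


def lexicon_intersection(speakers):
    """Words known by every speaker."""
    return set.intersection(*speakers.values())


print(len(lexicon_union(SPEAKERS)))
print(sorted(lexicon_intersection(SPEAKERS)))
```

With these three made-up speakers the union has 9 words and the intersection has 1 (dog). Scale that up to every self-reported English speaker and the gap between the two definitions only widens.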
Summary
Without a clear definition of word and without a clear definition of language, you kind of sort of have no practical way of counting anything of anything. And we’re not talking about requiring a level of exactitude that is within some reasonable margin of error.
We’re talking about a potential difference of orders of magnitude depending on how you decide. So, yes, by all means, count the number of words in English and say it’s 1,019,430, so long as you’re comfortable saying that’s +/- 1,000,000 words.
This is Why You Can Never Know How Many Words are in a Language.