Facts about the language
The full Oxford English Dictionary is the largest record of words used in English, past and present. It contains words that are now obsolete or rare (such as xenagogue 'a person who guides strangers' and vicine 'neighbouring or adjacent') in addition to the latest coinages such as bling and podcasting.
The second edition of the OED, published in 1989 and consisting of twenty volumes, contains
more than 615,000 entries, and the third, available online, is expanding all the time, with batches
of 2,500 new and revised words and phrases being added in regular quarterly updates.
How many words are there in English?
It is a question often asked, but not so easily answered. Even the OED does not seek to include every specialized technical term or slang or dialect expression. New words are constantly being invented, developed from existing words, or adopted from other languages. Most will be used rarely, or only by a small group of people. Hence an unlimited number of words may occur in speech and writing which will never be recorded in even the largest dictionary.
Furthermore, what exactly is a word? Clearly we should include single units such as cat and dog. But are the plurals cats and dogs separate words? Should we include compounds such as walking stick, which are made up of two existing words? What about abbreviations like BBC and Dr, which may be freely formed in limitless combinations: are they words? What about proper names?
How many words do we use?
Although it may be impossible to know the number of words in English, the Oxford English Corpus
can help us assess the number of words in current use.
It is most useful to count base words or lemmas rather than individual inflectional word-forms; for example, climbs, climbing, and climbed are counted as examples of the lemma climb. Just ten different lemmas (the, be, to, of, and, a, in, that, have, and I) account for a remarkable 25% of all the one billion words used in the Oxford English Corpus. If you were to read through the corpus, one word in four would be an example of one of these ten lemmas. Similarly, the 100 most common lemmas account for 50% of the corpus, and the 1,000 most common lemmas account for 75%. But to account for 90% of the corpus you would need a vocabulary of 7,000 lemmas, and to get to 95% the figure would be around 50,000 lemmas.
The remaining 5% of the corpus consists of a very large number of lemmas which occur rarely: words
like evidentialist or microhouse, which may occur only once every several million words. Like all natural languages, English consists of a small number of very common words, a larger number of intermediate ones, and then an indefinitely long 'tail' of rare terms.
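The idea of counting lemmas and then asking how many of them cover a given share of a text can be sketched in a few lines of Python. This is only a toy illustration: the `LEMMAS` mapping, the sample sentence, and the function names are assumptions made for the example, not part of the Oxford English Corpus or its tools.

```python
from collections import Counter

# Hypothetical toy lemma table mapping inflected forms to a base word.
# A real corpus would rely on a full lemmatizer; this mapping is an assumption.
LEMMAS = {
    "climbs": "climb", "climbing": "climb", "climbed": "climb",
    "is": "be", "was": "be", "are": "be",
    "has": "have", "had": "have",
}

def lemma_counts(tokens):
    """Count tokens by lemma, folding inflected forms into their base word."""
    return Counter(LEMMAS.get(t.lower(), t.lower()) for t in tokens)

def lemmas_needed(counts, target):
    """Return how many of the commonest lemmas cover `target` (0 to 1) of all tokens."""
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank
    return len(counts)

# Illustrative sample text (18 tokens), not corpus data.
text = ("the cat climbs the tree and the dog climbed the hill "
        "and the bird was in the tree")
counts = lemma_counts(text.split())

for target in (0.25, 0.5, 1.0):
    print(f"{target:.0%} coverage needs {lemmas_needed(counts, target)} lemma(s)")
```

Even in this tiny sample the pattern described above is visible: a single lemma (the) covers a quarter of the tokens, while full coverage requires every lemma, including those that occur only once.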
| Vocabulary size (no. of lemmas) | % of content in OEC | Example lemmas |
| 10 | 25% | the, of, and, to, that, have |
| 100 | 50% | from, because, go, me, our, well, way |
| 1,000 | 75% | girl, win, decide, huge, difficult, series |
| 7,000 | 90% | tackle, peak, crude, purely, dude, modest |
| 50,000 | 95% | saboteur, autocracy, calyx, conformist |
| >1,000,000 | 99% | laggardly, endobenthic, pomological |
The long tail means that accounting for 99% of the Oxford English Corpus requires over a million
lemmas. This would include some words which may occur only once or twice in the whole corpus:
highly technical terms like chondrogenesis or dicarboxylate, and one-off coinages like bootlickingly or unsurfworthy.
If we decide that around 90-95% of the corpus gives a reasonable idea of an average vocabulary, we are
left with a figure somewhere in the range of 7,000-50,000 lemmas: say, 25,000. What does a vocabulary
of this size represent? It represents the set of most significant words in English: those which occur
reasonably frequently and which account for all but a small part of everything we may encounter in
speech or writing. It includes all the words that we actively use in general everyday life.
It is interesting to note that most reasonably sized dictionaries contain significantly more than 25,000 lemmas. The 11th edition of the Concise Oxford English Dictionary lists more than 75,000 single-word lemmas, which means that the majority of its entries must belong to the long tail of extremely rare words. This makes good sense: such terms occur infrequently, but when they do occur they are likely to be crucial to what is being said, and the reader might well want to look them up. The idea of a quantifiable vocabulary should be seen in this light: the words we ignore for the purposes of the exercise may be very rare, but in context they may be very important.
What is the commonest word?
Based on the evidence of the billion-word Oxford English Corpus, the 100 commonest English words
found in writing around the world are as follows:
1 the
2 be
3 to
4 of
5 and
6 a
7 in
8 that
9 have
10 I
11 it
12 for
13 not
14 on
15 with
16 he
17 as
18 you
19 do
20 at
21 this
22 but
23 his
24 by
25 from
26 they
27 we
28 say
29 her
30 she
31 or
32 an
33 will
34 my
35 one
36 all
37 would
38 there
39 their
40 what
41 so
42 up
43 out
44 if
45 about
46 who
47 get
48 which
49 go
50 me
51 when
52 make
53 can
54 like
55 time
56 no
57 just
58 him
59 know
60 take
61 people
62 into
63 year
64 your
65 good
66 some
67 could
68 them
69 see
70 other
71 than
72 then
73 now
74 look
75 only
76 come
77 its
78 over
79 think
80 also
81 back
82 after
83 use
84 two
85 how
86 our
87 work
88 first
89 well
90 way
91 even
92 new
93 want
94 because
95 any
96 these
97 give
98 day
99 most
100 us
Many of the most frequently used words are short 'function words' whose main purpose is to join
other, longer words rather than determine the meaning of a sentence. We are often more interested in
the frequency of 'content words': we explore this below, showing the ranking according to the main
word classes:
| Rank | Nouns | Verbs | Adjectives |
| 1 | time | be | good |
| 2 | person | have | new |
| 3 | year | do | first |
| 4 | way | say | last |
| 5 | day | get | long |
| 6 | thing | make | great |
| 7 | man | go | little |
| 8 | world | know | own |
| 9 | life | take | other |
| 10 | hand | see | old |
| 11 | part | come | right |
| 12 | child | think | big |
| 13 | eye | look | high |
| 14 | woman | want | different |
| 15 | place | give | small |
| 16 | work | use | large |
| 17 | week | find | next |
| 18 | case | tell | early |
| 19 | point | ask | young |
| 20 | government | work | important |
| 21 | company | seem | few |
| 22 | number | feel | public |
| 23 | group | try | bad |
| 24 | problem | leave | same |
| 25 | fact | call | able |
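A ranking by word class of this kind could be produced along the following lines, assuming each token has already been reduced to its lemma and tagged with a word class. The `tagged` sample and the `rank_by_class` helper are hypothetical, invented for this sketch; they do not reflect how the Oxford English Corpus is actually processed.

```python
from collections import Counter, defaultdict

# Hypothetical pre-tagged sample of (lemma, word class) pairs.
# Both the data and the tag names are illustrative assumptions.
tagged = [
    ("time", "noun"), ("be", "verb"), ("good", "adjective"),
    ("time", "noun"), ("have", "verb"), ("be", "verb"),
    ("year", "noun"), ("new", "adjective"), ("good", "adjective"),
    ("be", "verb"), ("time", "noun"), ("year", "noun"),
]

def rank_by_class(pairs, top=3):
    """Return up to `top` most frequent lemmas for each word class."""
    per_class = defaultdict(Counter)
    for lemma, word_class in pairs:
        per_class[word_class][lemma] += 1
    return {
        word_class: [lemma for lemma, _ in counter.most_common(top)]
        for word_class, counter in per_class.items()
    }

ranking = rank_by_class(tagged)
for word_class, lemmas in ranking.items():
    print(word_class, lemmas)
```

Keeping a separate counter per word class means that a form such as work, which appears in both the noun and verb columns, is ranked independently in each.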
Nouns
The commonest nouns are time, person, and year, followed by way and day (month is 40th). Notice that many of these words are very common because they have more than one meaning: way and part, for example, are listed in the Concise OED as having 18 and 16 different meanings respectively. They often also form part of common phrases: some of the frequency of time, for example, comes from adverbial phrases like on time, in time, last time, next time, this time, etc.
Verbs
As one would expect, the commonest verbs express basic concepts. Strikingly, the 25 most frequent
verbs are all one-syllable words; the first two-syllable verbs are become (26th) and include (27th). Of these 25, 20 are Old English words, and three more, get, seem, and want, entered English from Old Norse in the early medieval period. Only try and use came from Old French. It seems that English prefers terse, ancient words to describe actions or occurrences.
Adjectives
Again, most of the top adjectives are one-syllable words, and 17 out of 25 derive from Old English:
only different, large, and important are from Latin. In terms of the words' meanings, great is higher in the ranking than big, probably because of its informal sense 'very good'. Little is surprisingly high at 7, as compared with small at 15. Bad is unexpectedly low at 23: is this because we have such a large choice of synonyms available for expressing 'bad things'?