

Using the Oxford Corpus

The Oxford English Corpus can be used in many different ways to study the English language and the cultures in which it is used. Because it is large, and because it is made up of many different types of text from many different subject areas, it acts as a representative slice of contemporary English from which all aspects of written language, from vocabulary and lexis to punctuation, discourse, and register, can be studied. This allows us to make generalizations about the language as a whole. In this section we show some examples of different types of corpus analysis, particularly those relevant to dictionary writing.

All in a word: eccentric or quirky?

Words do not exist in isolation. They have strong attractions for other words, and form patterns and associations that are often regular and predictable, though not usually rigid or permanent. These patterns form part of the tacit knowledge of a native speaker of the language.

Understanding a word and its behaviour means looking at the other words, or collocates, in whose company it is typically found. Corpus analysis software, such as the Sketch Engine software used by Oxford Dictionaries (see www.sketchengine.co.uk), has revolutionized this kind of research because it can build a detailed statistical profile of a word and its collocates in a matter of seconds, revealing typical usage and indicating the connotations that the word may carry.

Below we see the collocational profile for the word eccentric in the Oxford English Corpus. The column headings describe the relationship of the words listed to the word in question, so that words listed in the first column as 'modifiers' are adverbs, as in 'slightly eccentric', 'somewhat eccentric', and so on, while words listed in the second column under 'modifies' are nouns modified by eccentric, as in 'eccentric millionaire' and 'eccentric character'. The third column lists adjectives which co-occur with eccentric.

Figure: collocational profile for the word eccentric

What does this tell us about eccentric? We can spot a number of technical uses (orbit, contraction, femoral, axial); but if we leave these aside and focus on the main sense of the word, some characteristics emerge. Eccentric often occurs with adverbs like endearingly, charmingly, and delightfully, and with other adjectives like lovable and colourful: it appears to have positive connotations. Collocates like millionaire, billionaire, old, elderly, rich, wealthy suggest that we are most likely to use eccentric of elderly, wealthy people. Recluse, reclusive, loner, lonely (and perhaps bachelor) suggest solitary people. It is intriguing that the collocational profile includes both uncle and aunt: are aunts and uncles more likely to be eccentric than any other relatives? Finally, it appears that you are most likely to be described as eccentric if you are British or German.

Compare this with the word quirky. Although quirky has a similar meaning to eccentric, collocation reveals different patterns of usage:

Figure: collocational profile for the word quirky

Whereas eccentric is associated with being elderly, rich, or reclusive, quirky is most strongly associated with being humorous or youthful: collocates include playful, cute, whimsical, funny, and adorable. Unlike eccentric, quirky is not typically used of people, but rather of their behaviour and characteristics (humour, smile, etc.). Quirky is also associated with art and creativity: songs, lyrics, films, and novels may be quirky, but very rarely eccentric.

Collocation patterns rarely indicate absolute 'rules': it would not be an error to use eccentric of a young person, or quirky with reference to an old person. But collocation does indicate the implicit connotations and attitudes that go along with the language we use, and which influence our choice of one word rather than another: it feels more natural to describe a rich old uncle as eccentric and to describe his young niece as quirky, rather than the other way round.

Corpus analysis of collocation is a powerful way to expose our implicit shared knowledge of how words behave.
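
The "statistical profile" described above rests on an association score computed from co-occurrence counts, rather than on raw frequency alone. The sketch below assumes the logDice measure (14 plus the base-2 logarithm of the Dice coefficient), which Sketch Engine is generally documented as using to rank collocates in its word sketches; the input format, the function names, and the toy pairs are hypothetical and are not Oxford English Corpus data.

```python
import math
from collections import Counter

def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2*f_xy / (f_x + f_y)); the maximum possible value is 14."""
    if f_xy <= 0:
        return float("-inf")  # the two words never co-occur: no association
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def rank_collocates(pairs, node, top=10):
    """Rank collocates of `node` within one grammatical relation.

    `pairs` is a list of (first, second) word pairs already extracted for a
    single relation, e.g. (adjective, noun) as in 'eccentric millionaire'.
    """
    first = Counter(a for a, _ in pairs)   # f_x: frequency of each first word
    second = Counter(b for _, b in pairs)  # f_y: frequency of each second word
    joint = Counter(pairs)                 # f_xy: frequency of each pair
    scores = {
        b: log_dice(n, first[node], second[b])
        for (a, b), n in joint.items()
        if a == node
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Invented toy pairs, for illustration only.
pairs = ([("eccentric", "millionaire")] * 3 + [("eccentric", "uncle")] * 2
         + [("eccentric", "man")] * 2 + [("rich", "man")] * 8
         + [("old", "man")] * 6 + [("rich", "uncle")])
print(rank_collocates(pairs, "eccentric"))
```

In this toy data man co-occurs with eccentric exactly as often as uncle does, yet it ranks lower, because man is also very frequent with other adjectives. An association score of this kind is what lets a profile surface the words that are characteristic of eccentric rather than merely common.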

Finding your inner dweeb

The idea of one's 'inner child', popularized in psychotherapy in the 1980s, has spawned an array of humorous variations. These illustrate the way that language is routinely exploited and extended, not as part of a literary endeavour but simply as part of normal creativity in language use. In the Oxford English Corpus the most common of these are (in order):

  • inner geek
  • inner nerd
  • inner diva
  • inner dweeb
  • inner slut
  • inner cynic
  • inner hippie
  • inner brat
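
A ranking like this can be approximated by counting the word that follows inner. The sketch below assumes plain-text input and a hand-built stoplist of literal or unrelated uses (inner city, inner ear, and the base phrase inner child itself); the stoplist and the function name are hypothetical, and a real corpus query would normally run over tagged tokens rather than raw strings.

```python
import re
from collections import Counter

# "inner" followed by a single word; the word is captured.
INNER = re.compile(r"\binner\s+([a-z]+)\b", re.IGNORECASE)

# Hypothetical stoplist: the base phrase and common literal uses.
LITERAL = frozenset({"child", "city", "ear", "circle", "workings", "peace", "thigh"})

def inner_variants(text, top=10):
    """Return the most frequent 'inner X' variants as (X, count) pairs."""
    counts = Counter(
        word.lower() for word in INNER.findall(text)
        if word.lower() not in LITERAL
    )
    return counts.most_common(top)

sample = ("Unleash your inner geek. My inner geek approves; my inner child "
          "does not. An inner nerd lives in the inner city.")
print(inner_variants(sample))
```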

Patterns of word formation

The corpus helps to identify the most productive ways in which new words and expressions are coined, and to rank the popularity of coinages. For example, the suffixes -fest, -speak, -tastic, and -ville are all highly productive in English today, and their use can reveal some of the interests and concerns of our society:

What excesses do we indulge in?

The most common uses of -fest are: slugfest, lovefest, gabfest, crapfest, talkfest, gorefest, snoozefest, hatefest, bitchfest, snorefest, geekfest, bloodfest, blogfest, songfest, shitfest, screamfest, filmfest, yawnfest, funfest, sobfest, plugfest, mudfest, fragfest, and suckfest.

Whose jargon annoys us most?

The most common uses of -speak are: management-speak, corporate-speak, marketing-speak, geek-speak, business-speak, therapy-speak, art-speak, lawyer-speak, media-speak, government-speak, consultant-speak, technospeak, adspeak, PR-speak, science-speak, politispeak, military-speak, computer-speak, BBC-speak, tech-speak, legal-speak, and left-speak.

What's worth getting excited about (or not)?

The most common uses of -tastic are: craptastic, poptastic, funktastic, fabtastic, pimptastic, creeptastic, blingtastic, ego-tastic, retrotastic, geektastic, and blogtastic.

Where don't you want to find yourself?

The most common uses of -ville are: dumpsville, dullsville, squaresville, hicksville, smallville, stupidville, and shitsville.
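
Lists like these come from counting word types that end in a given suffix. A naive suffix match is not enough on its own, because established words can end in the same letters (fantastic for -tastic, manifest for -fest, Nashville for -ville), so the sketch below takes a stoplist. The tokenizer, the stoplist contents, and the function name are illustrative assumptions rather than the procedure used for the Oxford English Corpus.

```python
import re
from collections import Counter

# Lower-case words, allowing internal hyphens (e.g. "management-speak").
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")

def suffix_coinages(text, suffix, stoplist=frozenset(), top=25):
    """Rank word types ending in `suffix`, most frequent first.

    `stoplist` holds established words that merely end in the same letters
    (e.g. "fantastic" for -tastic, "manifest" for -fest).
    """
    counts = Counter()
    for token in TOKEN.findall(text.lower()):
        if not token.endswith(suffix) or token in stoplist:
            continue
        stem = token[: -len(suffix)].rstrip("-")
        if stem:  # skip the bare word itself, e.g. "speak" or "fest"
            counts[token] += 1
    return counts.most_common(top)

sample = ("What a snoozefest. Total snoozefest, not a lovefest. "
          "The festival was a manifest success.")
print(suffix_coinages(sample, "fest", stoplist={"manifest", "infest"}))
```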

Some day or someday?

A number of common words in English started out as two-word phrases and eventually became fused as single-word forms: forever, somebody, everyone.

The Oxford English Corpus shows the process continuing today. The chart below gives some examples. For instance, it shows that the phrase some time now appears as the fused single-word form sometime in 32% of all occurrences in American English and 19% of all occurrences in British English.

Figure: two-word phrases now fused as single-word forms

The tendency to fuse fixed expressions is more common in American than British English. In American English someday has now become more or less standard, substantially outnumbering occurrences of some day; anymore and underway look set to follow. Although the same trend is apparent in British English, it tends to lag behind.
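
Each percentage in the chart is a proportion: occurrences of the fused form divided by occurrences of the fused and two-word forms together. The sketch below computes that proportion for a given text under simple whole-word matching. It is a hypothetical helper, not the method behind the chart; in particular, a raw count of the two-word form also picks up uses that could never be fused (it took some time), so it should not be expected to reproduce the figures above.

```python
import re

def fused_share(text, fused, split):
    """Proportion of occurrences written as the fused form.

    fused: e.g. "someday"; split: e.g. "some day".
    Matching is case-insensitive and whole-word; the gap inside the
    two-word form may be any run of whitespace.
    """
    n_fused = len(re.findall(rf"\b{re.escape(fused)}\b", text, re.IGNORECASE))
    split_pattern = r"\s+".join(re.escape(w) for w in split.split())
    n_split = len(re.findall(rf"\b{split_pattern}\b", text, re.IGNORECASE))
    total = n_fused + n_split
    return n_fused / total if total else 0.0

print(fused_share("Someday we'll go. Some day soon. someday, maybe.", "someday", "some day"))
```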

Does the corpus suggest any patterns in the fusing of expressions in single words?
  • Fused forms almost always emerge first in informal English (the weblog and chatroom parts of the corpus) and are much slower to spread to more formal, edited text such as newspapers and magazines; of the examples shown here, only someday is well represented across all text types.
  • Fused forms seem to spread more easily if there is a direct analogy with an existing word: anymore benefits from the analogy with anyone and anybody, whereas ofcourse is almost non-existent because there are no analogous of- words.
  • The tendency to fuse may be stronger when the phrase occurs at the end of a clause: 84% of instances of anymore occur at the end of a clause, compared with 46% of instances of any more.
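
The clause-end comparison can be approximated, very crudely, by treating a following punctuation mark (or the end of the text) as the end of a clause. This proxy is an assumption made for illustration: it misses clause boundaries that carry no punctuation and misclassifies cases such as a clause-initial Anymore followed by a comma, so it would not be expected to reproduce the 84% and 46% figures. The function name is hypothetical.

```python
import re

def clause_final_share(text, form):
    """Share of occurrences of `form` followed by punctuation or end of text.

    Punctuation is used here only as a rough proxy for a clause boundary;
    this is not a syntactic analysis.
    """
    pattern = re.compile(
        r"\b" + re.escape(form) + r"\b(\s*[.,;:!?]|\s*$)?", re.IGNORECASE
    )
    # group(1) is None when no punctuation/end-of-text follows the match.
    flags = [m.group(1) is not None for m in pattern.finditer(text)]
    return sum(flags) / len(flags) if flags else 0.0

print(clause_final_share("Do you want any more tea? Not any more.", "any more"))
```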

Could of and would of

The Oxford English Corpus contains about 1,000 instances of could of and would of, as in I would of stopped her. About 850 of these occur in representations of direct speech (mostly from the Fiction domain, but also from interviews and courtroom transcripts). This leaves about 150 instances of could of and would of as a genuine written form, compared with 4 million instances of the standard forms could have and would have. However willing we may be to turn have into of in spoken English, the corpus shows that the habit has not spread into written English.
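
The key step in the count above is separating represented speech from ordinary written text before counting. The sketch below does this in the crudest possible way, assuming that direct speech is always enclosed in straight double quotation marks; that is a hypothetical simplification (real text also uses curly and single quotes, and transcripts may not be quoted at all), and the function name is illustrative only.

```python
import re

NONSTANDARD = re.compile(r"\b(?:could|would) of\b", re.IGNORECASE)
STANDARD = re.compile(r"\b(?:could|would) have\b", re.IGNORECASE)
QUOTED = re.compile(r'"[^"]*"')  # assumption: speech is in straight double quotes

def of_vs_have(text):
    """Count 'could/would of' overall and outside quotes, and 'could/would have'."""
    unquoted = QUOTED.sub(" ", text)  # drop quoted (direct-speech) passages
    return {
        "of_total": len(NONSTANDARD.findall(text)),
        "of_outside_quotes": len(NONSTANDARD.findall(unquoted)),
        "have_total": len(STANDARD.findall(text)),
    }

sample = ('"I would of stopped her," she said. He could of gone. '
          "She would have known. They could have tried.")
print(of_vs_have(sample))
```

Applied to the corpus figures quoted above, about 150 written instances against 4 million standard ones works out to roughly one for every 27,000.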

Sly foxes and strong bulls

The most common animal word in the corpus is dog (the 997th most frequent word), followed (in order) by fish, horse, bird, cat, fox, chicken, mouse, cow, bull, lion, rat, tiger, pig, wolf, snake, and sheep. Analysis of animal words in the corpus is complicated: English uses animal words in a dazzling array of idioms and metaphors, often having nothing to do with actual animals. We can use the Oxford English Corpus to explore this rich figurative language.

Statistical analysis of similes involving animal words (in the pattern as ... as a cat/dog etc.) generates a detailed picture of the characteristics that English ascribes to animals:

  • cat: nimble, curious, nervous, silent, comfortable, cool
  • dog: sick, loyal, friendly
  • horse: healthy, hungry
  • bull: strong, mad, angry
  • lion: brave, righteous, fierce, bold, protective, strong
  • pig: happy, foul, drunk, sick
  • fox: sly, smart

It is apparent that these characteristics are largely linguistic conventions and often have little to do with our understanding of real animals: horses are healthy, but dogs and pigs (and, according to footballers, parrots) are sick.
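
The first step in such an analysis is extracting the as ADJECTIVE as a/an ANIMAL frames. The sketch below does only that, producing raw counts for a hypothetical subset of animal words; the statistical ranking described above would then apply an association score to these counts. It ignores fillers of more than one word and animal nouns preceded by a modifier, and the function name is illustrative.

```python
import re
from collections import Counter, defaultdict

ANIMALS = ("cat", "dog", "horse", "bull", "lion", "pig", "fox")  # illustrative subset
SIMILE = re.compile(
    r"\bas (\w+) as an? (" + "|".join(ANIMALS) + r")\b", re.IGNORECASE
)

def simile_profile(text):
    """Map each animal to a Counter of adjectives found in 'as ADJ as a ANIMAL'."""
    profile = defaultdict(Counter)
    for adjective, animal in SIMILE.findall(text):
        profile[animal.lower()][adjective.lower()] += 1
    return profile

sample = ("He was as sick as a dog. She felt as sick as a dog, "
          "then as strong as a bull. As sly as a fox.")
print(dict(simile_profile(sample)))
```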







Content and Graphics © Copyright Oxford University Press, 2008. All rights reserved.