The Oxford English Corpus is such stuff as Oxford Dictionaries are made on. It contains texts collected mainly from innumerable Internet sites. Their total length so far is about two billion words. The texts represent different varieties of English, different genres, styles and registers; they all come from present-day English (from the year 2000 onwards) and are supposed to be representative of the current state of the language. Here are the 100 commonest words in that vast material:
A few facts are worth noting.
Almost all these words are “native” in the sense that they continue forms inherited from Old English. Most of them can be traced back in time still further, to Proto-Germanic, and quite a large number have their roots in Proto-Indo-European, the most distant reconstructible ancestor of English. Only four of them (just, people, use, and the second syllable of because) are Old French loanwords (first attested in 14th-century documents). A few are of Old Norse origin: notably, the 3pl. personal pronouns they, their, and them, but also want, and possibly take, while get and give owe at least their initial /ɡ/ to Old Norse influence (the closely related Old English verbs began with the palatal glide /j/). The Old Norse loans were taken from the Scandinavian settlers in the Danelaw area, presumably between 800 and 1200. The remaining items (ca. 90% of the list) have “always” been English.
This illustrates the rule that the more common a word is, the less likely it is to undergo lexical replacement [see: Frequency of word-use predicts rates of lexical evolution throughout Indo-European history]. If we looked instead at the entire lexicon of present-day English, we would find that relatively recent borrowings from foreign languages, most often Latin or French, account for roughly 80% of the vocabulary, if not more. That’s because rarely used words are much more likely to be replaced.
Many of the most common items are not content words that indicate things, ideas, actions, states, etc., but function words that mean little or nothing by themselves. They join content words to modify their meaning, express grammatical relationships, glue the sentence together, and facilitate discourse. They include articles, pronouns, conjunctions, prepositions, simple adverbs, auxiliary and modal verbs, quantifiers, and miscellaneous “particles”. Nearly all the words in the first two columns above are of this kind (the only exceptions being say, get, and go, whose meaning is not particularly specific either). The “Top 100” words are extremely successful replicators: practically every sentence must contain a few of them. Their occurrences make up about 50% of the total material in the Oxford English Corpus!
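For readers who would like to check this kind of coverage figure on a text of their own, here is a minimal Python sketch. It rests on assumptions that are not from the Oxford English Corpus: the function name `top_n_coverage` is made up for illustration, the tokenizer is a crude regular expression, and it counts surface forms rather than dictionary headwords (so is, are, and was are counted separately instead of under be). Results on a small sample will therefore differ from the corpus figures.

```python
import re
from collections import Counter


def top_n_coverage(text, n):
    """Share of all word tokens accounted for by the n most frequent word forms.

    Hypothetical helper: a rough approximation, not the method used
    to compile the Oxford English Corpus statistics.
    """
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Add up the occurrences of the n commonest word forms.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


sample = "the cat sat on the mat and the dog sat on the rug"
print(f"{top_n_coverage(sample, 3):.0%} of the tokens are covered by the top 3 words")
```

In the toy sentence, the three commonest forms (the, sat, on) account for 8 of the 13 tokens. On a sufficiently large and varied text, calling `top_n_coverage(text, 100)` should give a figure in the neighbourhood of one half.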