Scientific Conferences of Ukraine, I All-Ukrainian Scientific and Practical Online Conference on Applied Linguistics «Corpus and Discourse»

OVERVIEW OF FREQUENCY VOCABULARY LISTS AND ZIPF'S LAW
Alla Kondrashova

Last modified: 2026-01-25

Abstract


Computers began to be actively used in linguistics only in the last quarter of the 20th century. Before that, scholars who studied language were mostly engaged in describing grammatical rules and word meanings, without relying on quantitative data. However, once it became possible to process large collections of texts, it became clear that a great deal is lost if frequent phenomena are not distinguished from rare ones.

This paper presents an overview of frequency vocabulary lists and their application to the study of language from the standpoint of lexicostatistics. It also discusses Zipf's law, since word frequency obeys this law.

The use of vocabulary frequency lists in English language teaching and learning has been an area of continued research for roughly the past 170 years. Word lists are one way to help direct vocabulary teaching and learning and, moreover, can be successfully applied in the creation of digital vocabularies, as is already being done.

It is well known that word lists in the field of L2 learning and teaching are usually compiled for the purpose of designing syllabuses. In particular, they are an attempt to find a way of determining learners' necessities as part of needs analysis.

Word frequency seems an obvious candidate for prioritizing the acquisition of lexis. Frequency information provides a rational basis for making sure that learners get the best return for their vocabulary learning effort by ensuring that the words studied will be met often. What has been achieved so far:

  1. 1953, Michael West's General Service List of English Words. It was mainly focused on written English and did not represent spoken English to the same extent.
  2. 2000, Coxhead's Academic Word List. This list acknowledged that students of English for Academic Purposes have very different needs from those studying general English, so it varies noticeably from non-academic lists.
  3. Oxford 3000. With this list, Oxford University Press created a unique list specifically for the needs of English learners.
  4. 2013, A New Academic Vocabulary List by Gardner and Davies, which involved the creation of a new 500-lemma list based on the 120-million-word COCA academic corpus.
  5. 2013, New General Service List, created by Brezina and Gablasova. This list is constructed on transparent, replicable and quantitative criteria. Moreover, 378 modern lemmas were included in this list.

Throughout the development of these lists, different ideas on how best to use them for English language learning and teaching have been expressed. Several key areas have emerged, namely machine learning, natural language processing and machine translation (Bourkett, 2015). A detailed study of frequency vocabulary lists shows that incomplete knowledge is sufficient to understand a human language. More importantly, complete knowledge of a language does not exist: no one can know every word so thoroughly that any given sentence is understood fully and in depth.

Consider the following situation. You are an active learner of English, and your principal goal is to understand the language and speak it fluently. How many words must you learn to understand 20% of a text in this language, or at least to recognize 20% of the words in it? Clearly, the most frequent words should be studied first. Thus, knowing a simple word such as “dog” is more important than knowing “male” or “rider”. Word frequency follows a simple mathematical pattern that was introduced by the well-known American scholar George K. Zipf in the middle of the 20th century (Li, 2002). This law links the frequency $F$ of a word to its rank $n$. The corresponding relation, which has been tested empirically many times on different texts in different languages, has the form

$$F = \frac{A}{n^{\gamma}},$$

where $A$ is a constant and $\gamma$ is the distribution parameter, which in many cases is close to one. In practice, a modified form of Zipf's law is applied, namely the law with Mandelbrot's correction. Zipf's law was originally formulated in terms of quantitative linguistics: given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus, the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.
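The rank-frequency relation described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `rank_frequency` and `estimate_gamma`, the toy sentence, and the synthetic frequencies are all assumptions made for illustration. The sketch builds a ranked frequency list and estimates the exponent by an ordinary least-squares fit of log frequency against log rank, which is only one of several possible estimation methods.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    ordered = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


def estimate_gamma(frequencies):
    """Estimate the Zipf exponent from frequencies sorted by rank.

    Fits a straight line to log(F) versus log(n) by least squares;
    the exponent is the negative of the slope.
    """
    xs = [math.log(n) for n in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return -numerator / denominator


# Synthetic frequencies that follow F = A / n exactly (A = 1200, exponent 1).
ideal = [1200 / n for n in range(1, 51)]
gamma = estimate_gamma(ideal)
print(round(gamma, 3))

# Toy sentence (hypothetical data) to show the ranked list itself.
sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
print(table[0])
```

On real corpora the fitted exponent depends on tokenization and on how many ranks are included, so the value obtained should be read as a rough indication only.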

For example, in the Brown Corpus of American English text, the word “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's law, the second-place word “of” accounts for slightly over 3.5% of words (36,411 occurrences), followed by “and” (28,852). Developers of language-processing applications take frequency lists and Zipf's law into account in order to detect errors, which makes it easier to produce coherent translation or interpretation.
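The Brown Corpus figures quoted above can be checked against the simplest form of the law with a few lines of Python. This is an illustrative sketch only: it assumes an exponent of exactly one and scales the prediction to the count of the top-ranked word, and the variable names are hypothetical.

```python
# Counts quoted in the text for the three most frequent Brown Corpus words.
brown_counts = {"the": 69971, "of": 36411, "and": 28852}

top = brown_counts["the"]
comparison = []
for rank, (word, observed) in enumerate(brown_counts.items(), start=1):
    # Zipf's law with exponent 1, scaled so that rank 1 matches exactly.
    predicted = top / rank
    ratio = observed / predicted
    comparison.append((word, rank, observed, predicted, ratio))
    print(f"{word:>4} rank={rank} observed={observed} "
          f"predicted={predicted:.0f} ratio={ratio:.2f}")
```

The observed count for “of” is within a few percent of the prediction, while “and” occurs roughly a quarter more often than an exponent of one would suggest, which shows that the law holds only approximately.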

Examples include the wide range of available applications, among them Grammarly, ProWritingAid, Ginger, Microsoft Word, Google Docs and WhiteSmoke. Language learners, in turn, can see which words most need to be memorized, since most frequency vocabulary lists carry markers that help learners identify the frequency rate of a given word.
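One way to make the choice of words to memorize concrete is to ask how many top-ranked words are needed to cover a target share of the word occurrences in a text. The following Python sketch shows the idea under stated assumptions: the helper `words_needed` and the toy token list are hypothetical and stand in for a real frequency list and corpus.

```python
from collections import Counter


def words_needed(counts, target_share):
    """Smallest number of top-ranked words whose occurrences
    reach target_share (between 0 and 1) of all word occurrences."""
    total = sum(counts.values())
    covered = 0
    for needed, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target_share:
            return needed
    return len(counts)


# Toy token list (hypothetical data) standing in for a real corpus.
tokens = "the dog saw the cat and the dog ran to the rider".split()
counts = Counter(tokens)
print(words_needed(counts, 0.5))
```

In this toy sample the two most frequent words already account for half of the occurrences, whereas full coverage requires every distinct word; real corpora show the same pattern on a much larger scale.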

It should be stressed that learning a foreign language does not always imply a good knowledge of grammar and lexis. Awareness of frequency vocabulary lists has become a crucial element of the modern transformation of the educational sphere. Frequency vocabulary serves as a fundamental tool for designing computer applications, which in turn significantly facilitate communication.

References

  1. Bogaards, P., & Laufer, B. (2004). Vocabulary in a second language: Selection, acquisition and testing. Philadelphia: John Benjamins B.V.
  2. Bourkett, T. (2015). An investigation into use of frequency vocabulary lists in university intensive English programs. International Journal of Bilingual & Multilingual Teachers of English, 3(2), 71-83.
  3. Schmitt, N. (2000). Vocabulary in Language Teaching (pp. 231-239). Cambridge: Cambridge University Press.
  4. Li, W. (2002). Zipf's law everywhere. Glottometrics, 5, 14-21.
