CORPUS-BASED ESP LEARNING AND TEACHING

Olena Volkova

Scientific Conferences of Ukraine, 1st All-Ukrainian Scientific and Practical Online Conference on Applied Linguistics “Corpus and Discourse”

Last modified: 2026-01-25

Abstract

Corpus linguistics today is a rapidly developing and widely used branch of linguistics. This is evidenced by the national corpora of various languages, created not only for scientific research but also for learning and teaching foreign languages. The available corpora contain large collections of electronic texts that represent different text types and functional styles of written and spoken language and are available for linguistic analysis. Corpus-based linguistic analysis allows educators to study terms in their natural context, to identify typical multi-word units and language patterns, and to measure how frequently they are used, in order to design materials for ESP teaching, learning, and self-assessment.
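To make frequency-based pattern identification concrete, here is a minimal Python sketch that counts recurring multi-word units, approximated as contiguous word n-grams, in a toy corpus. The sample sentences, the regular-expression tokenizer, and the function names `tokenize` and `ngram_frequencies` are illustrative assumptions, not part of any tool cited in this paper; real corpus software adds lemmatization, dispersion statistics, and far larger text collections.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return word tokens (letters, optionally joined by - or ')."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def ngram_frequencies(texts: list[str], n: int) -> Counter:
    """Count contiguous n-word sequences; n-grams never cross text boundaries."""
    counts: Counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical mini-corpus standing in for a specialized ESP corpus.
corpus = [
    "The results of the study show that the load is distributed evenly.",
    "As a result of the test, the results of the study were revised.",
    "On the other hand, the results of the analysis differ.",
]

for gram, freq in ngram_frequencies(corpus, 3).most_common(3):
    print(" ".join(gram), freq)
```

Raising `n` to 4 or 5 surfaces longer lexical bundles, at the cost of lower counts per item.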

Many studies have applied corpus linguistics techniques to compile texts for specialized corpora (Mateo & Cazevieille, 2015; Beloso, 2015); to create large-scale academic multi-word unit lists (Rogers et al., 2021), grammar pattern lists (Ma & Qian, 2020), and academic field-specific word lists (Le & Miller, 2020; Otto, 2021); and to design ESP courses based on corpus data (Charles, 2014; Lee & Swales, 2006).

In their article, Mateo and Cazevieille (2015) describe the steps they followed to compile specialized German texts in the field of biochemistry into a corpus. They concluded that terminology research is required to collect the highly specialized scientific terms used in different languages.

Lee and Swales (2006) designed a corpus-based EAP course for non-native-speaker doctoral students. The students first independently explored available specialized corpora of written academic articles and spoken academic speech, and then compiled two corpora of their own: a collection of their own writing (term papers, dissertation drafts, and draft research articles) and a corpus of papers published by experts in their discipline. The students found the course motivating and confidence-building.

Charles (2014) shows that EAP students’ regular use of personal corpora composed of research articles is a valuable resource for improving their academic writing. The study also found that most students consulted their corpora to check grammar and vocabulary while writing their papers.
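A personal corpus is usually consulted through a concordancer, which lists every occurrence of a search word with its left and right context (key word in context, KWIC). The sketch below is a hypothetical minimal concordancer, assuming the corpus is a list of plain-text sentences; the function `kwic`, its `width` parameter, and the sample sentences are invented for illustration. Aligning the keyword in one column lets a student see at a glance, for example, which preposition usually follows “consistent”.

```python
import re


def kwic(texts: list[str], keyword: str, width: int = 30) -> list[str]:
    """Return key-word-in-context lines: left context, [keyword], right context."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so that all keywords line up in a column.
            lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# Hypothetical personal corpus of research-article sentences collected by a student.
personal_corpus = [
    "These findings are consistent with previous research on fatigue.",
    "The data were consistent with the proposed model.",
    "Results remained consistent across all samples.",
]

for line in kwic(personal_corpus, "consistent", width=20):
    print(line)
```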

Laufer and Ravenhorst-Kalovski (2010) found that to adequately comprehend a specialized text, a student’s lexical coverage should be around 98-99%, that is, the student should know 98-99% of the running words in the text. To reach such coverage, students need to master academic and field-specific vocabulary and grammar patterns. This creates a need for appropriate lists of high-frequency multi-word units and grammar patterns.
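The coverage figure is simple arithmetic: known running words divided by all running words. A minimal sketch, assuming a crude regex tokenizer and a hypothetical set of words the learner knows (research studies count word families and usually treat proper nouns separately, which this sketch does not do):

```python
import re


def lexical_coverage(text: str, known_words: set[str]) -> float:
    """Percentage of running words (tokens) in `text` that appear in `known_words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return 100 * known / len(tokens)


# Hypothetical learner vocabulary and sample sentence.
known = {"the", "beam", "is", "under", "load", "and", "its", "at", "centre", "of", "span"}
sentence = "The beam is under load and its deflection at the centre of the span"
print(f"{lexical_coverage(sentence, known):.1f}%")
```

Here 13 of the 14 running words are known, giving about 92.9% coverage, or one unknown word in 14. At 98% coverage the learner meets roughly one unknown word in every 50 running words.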

Rogers et al. (2021) compiled a large-scale corpus-based list of multi-word units occurring in academic English. Their procedure included the following steps: “search, identification, elimination, manual processing, and ... a comparative analysis of the resultant list”. They used up-to-date corpus data from various journals to account for recent changes in scientific language use brought about by technological development. The list was produced using ‘concgramming’ (a corpus search method for collocations) and then processed by experienced ESL practitioners.
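Concgramming looks for words that co-occur even when their order changes or other words intervene. The sketch below is only a simplified approximation of that idea, not the software or procedure used by Rogers et al.: it counts unordered word pairs that occur within a fixed window in the same sentence. The window size, the stop-word list `STOP`, the function `concgram_pairs`, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Small stop-word list (an assumption) so that function words do not dominate the pairs.
STOP = frozenset({"a", "an", "the", "of", "in", "we", "was", "then", "two"})


def concgram_pairs(sentences: list[str], window: int = 4,
                   stopwords: frozenset = frozenset()) -> Counter:
    """Count unordered word pairs occurring within `window` tokens of each other.

    Ignoring word order (positional variation) and allowing intervening words
    (constituency variation) loosely mirrors the idea behind concgrams.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in stopwords]
        pairs = set()
        for i, j in combinations(range(len(tokens)), 2):
            if j - i <= window and tokens[i] != tokens[j]:
                pairs.add(tuple(sorted((tokens[i], tokens[j]))))
        counts.update(pairs)  # each pair counts at most once per sentence
    return counts


sentences = [
    "We carried out a detailed analysis of the data.",
    "The analysis was carried out in two stages.",
    "An analysis of variance was then carried out.",
]

pairs = concgram_pairs(sentences, window=4, stopwords=STOP)
for pair, freq in sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0]))[:3]:
    print(" ... ".join(pair), freq)
```

The pair “analysis ... carried” is found in all three sentences although the two words appear in both orders, which a contiguous n-gram count would miss.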

Using a corpus-based approach, Le and Miller (2020) created a list of the most frequently occurring medical morphemes to help ESP students acquire English medical vocabulary. Their corpus investigation consisted of two phases: compiling a new list and validating it. In their research they followed principles typical of corpus methods: they made extensive use of computers and of a large, principled collection of natural, publicly available academic medical texts, and they analyzed authentic language patterns with both quantitative and qualitative techniques to compile the final list.
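The quantitative part of such a procedure can be pictured as counting how many running words contain each candidate morpheme. This is a hypothetical sketch and does not reproduce Le and Miller’s method: the candidate list `CANDIDATES`, the notation for prefixes (“x-”) and suffixes (“-x”), the plain string-matching rules, and the sample text are all assumptions.

```python
import re
from collections import Counter

# Hypothetical candidate morphemes: "x-" = prefix, "-x" = suffix, bare "x" = combining form.
CANDIDATES = ["cardi", "-itis", "hyper-", "-ectomy", "gastr"]


def matches(token: str, morpheme: str) -> bool:
    """Crude affix test: no morphological analysis, just string position."""
    form = morpheme.strip("-")
    if morpheme.endswith("-"):      # prefix, e.g. "hyper-"
        return token.startswith(form)
    if morpheme.startswith("-"):    # suffix, e.g. "-itis"
        return token.endswith(form)
    return form in token            # combining form anywhere in the word, e.g. "cardi"


def morpheme_frequencies(text: str, candidates: list[str]) -> Counter:
    """Count the running words (tokens) that contain each candidate morpheme."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter({m: sum(matches(t, m) for t in tokens) for m in candidates})


sample = ("Gastritis and hypertension can affect cardiac function; after gastrectomy, "
          "the cardiologist reviewed the carditis case and the hyperglycaemia.")

for morpheme, freq in morpheme_frequencies(sample, CANDIDATES).most_common():
    print(f"{morpheme:8} {freq}")
```

Because “hyper-” would also match a word such as “hyperbole”, counts produced this way still need the qualitative checking and validation phase before a list is final.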

Ma and Qian (2020) used a self-built extraction system to create and evaluate a list of grammar patterns for the most frequent academic verbs. Their grammar pattern list complements a widely used EAP vocabulary list. They also proposed a method for extracting the grammar patterns of frequent content words (verbs, adverbs, nouns, and adjectives) from a corpus.
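Ma and Qian’s system is not described here in enough detail to reproduce, so the following is only a hypothetical sketch of the general idea of pattern extraction: given part-of-speech-tagged sentences, inspect what follows each occurrence of a target verb and map it to a coarse pattern label such as “V that” or “V to-inf”. The hand-tagged sentences, the tags (borrowed from the Universal Dependencies UPOS set), and the labelling rules in `pattern_after` are assumptions; inflected forms such as “argued” are not matched because no lemmatizer is used.

```python
from collections import Counter

Token = tuple[str, str]  # (word, part-of-speech tag)

# Hand-tagged toy sentences; the sentences and tagging are illustrative assumptions.
TAGGED: list[list[Token]] = [
    [("we", "PRON"), ("argue", "VERB"), ("that", "SCONJ"), ("the", "DET"), ("model", "NOUN"), ("fails", "VERB")],
    [("they", "PRON"), ("argue", "VERB"), ("for", "ADP"), ("a", "DET"), ("new", "ADJ"), ("method", "NOUN")],
    [("authors", "NOUN"), ("argue", "VERB"), ("that", "SCONJ"), ("it", "PRON"), ("works", "VERB")],
    [("we", "PRON"), ("seek", "VERB"), ("to", "PART"), ("explain", "VERB"), ("the", "DET"), ("data", "NOUN")],
]


def pattern_after(tokens: list[Token], i: int) -> str:
    """Label what follows the verb at position i with a coarse pattern code."""
    if i + 1 >= len(tokens):
        return "V"
    word, tag = tokens[i + 1]
    if word == "that" and tag == "SCONJ":
        return "V that"
    if word == "to" and i + 2 < len(tokens) and tokens[i + 2][1] == "VERB":
        return "V to-inf"
    if tag == "ADP":
        return f"V {word} n"   # verb + preposition + noun group, e.g. "V for n"
    if tag in {"DET", "NOUN", "PRON"}:
        return "V n"
    return "V"


def grammar_patterns(sentences: list[list[Token]], verb: str) -> Counter:
    """Count pattern labels for every occurrence of `verb` tagged as VERB."""
    counts: Counter = Counter()
    for tokens in sentences:
        for i, (word, tag) in enumerate(tokens):
            if word == verb and tag == "VERB":
                counts[pattern_after(tokens, i)] += 1
    return counts


for verb in ("argue", "seek"):
    print(verb, dict(grammar_patterns(TAGGED, verb)))
```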

In her study, Otto (2021) also relied on corpus-based methods to identify words that are typical of the specialized context of civil engineering and that students find problematic or confusing. The approach proved useful for identifying and teaching these words and the functions they perform in civil engineering texts.
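Otto’s exact procedure is not reproduced here. One widely used corpus technique for finding words typical of a field is keyness analysis, which compares each word’s frequency in a specialized corpus with its frequency in a general reference corpus. A minimal sketch using the log-likelihood statistic; the function name `log_likelihood`, the corpus sizes, and all word frequencies below are invented for illustration.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness of a word.

    a: frequency in the study (specialized) corpus of c tokens
    b: frequency in the reference (general) corpus of d tokens
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:                   # 0 * ln(0) is treated as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Invented figures: (frequency in a civil-engineering corpus, frequency in a general corpus).
STUDY_SIZE, REF_SIZE = 100_000, 1_000_000
FREQS = {"load": (420, 300), "beam": (250, 40), "the": (6_000, 60_000), "result": (90, 800)}

ranked = sorted(FREQS, key=lambda w: log_likelihood(*FREQS[w], STUDY_SIZE, REF_SIZE), reverse=True)
for w in ranked:
    a, b = FREQS[w]
    overused = a / STUDY_SIZE > b / REF_SIZE  # relatively more frequent in the study corpus?
    print(f"{w:8} LL={log_likelihood(a, b, STUDY_SIZE, REF_SIZE):8.2f} overused={overused}")
```

Words whose LL exceeds 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) and that are relatively more frequent in the specialized corpus are candidate key words. With these invented figures, “result” falls below the threshold, and “the” scores zero because its relative frequency is identical in both corpora.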

Thus, to enhance their academic reading, writing, and speaking skills, students can use currently available resources: 1) general academic and field-specific corpora, gradually advancing towards compiling their own personal corpora (Charles, 2014; Lee & Swales, 2006); and 2) academic and field-specific corpus-based lists of collocations and grammar patterns created by researchers to facilitate learners’ language acquisition. The ultimate aim of educators is to establish and regularly update field-specific corpus-based lists that can serve as a guide for developers of ESP and EAP courses, curricula, and textbooks.

Many studies have shown that a corpus-based approach to ESP learning and teaching is a valuable tool for educators and students. Modern corpus technologies make it possible to create academic field-specific collocation lists that are maximally beneficial for learners and help them study more effectively. ESP and EAP corpora can cover all levels of language description (phonetics, phonology, morphology, lexicology, syntax, language typology, semantics, and pragmatics) and can serve students both as a reference source and as core learning material. Designing and applying corpus-based courses has great potential for developing students’ language competences in reading, listening, writing, and speaking.

References

Beloso, B. S. (2015). Designing, describing and compiling a corpus of English for architecture. Procedia - Social and Behavioral Sciences, 198, 459-464. https://doi.org/10.1016/j.sbspro.2015.07.466
Charles, M. (2014). Getting the corpus habit: EAP students’ long-term use of personal corpora. English for Specific Purposes, 35, 30-40. https://doi.org/10.1016/j.esp.2013.11.004
Laufer, B., & Ravenhorst-Kalovski, G. (2010). Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15-30.
Le, C. N., & Miller, J. (2020). A corpus-based list of commonly used English medical morphemes for students learning English for specific purposes. English for Specific Purposes, 58, 102-121. https://doi.org/10.1016/j.esp.2020.01.004
Lee, D., & Swales, J. (2006). A corpus-based EAP course for NNS doctoral students: Moving from available specialized corpora to self-compiled corpora. English for Specific Purposes, 25(1), 56-75. https://doi.org/10.1016/j.esp.2005.02.010
Ma, H., & Qian, M. (2020). The creation and evaluation of a grammar pattern list for the most frequent academic verbs. English for Specific Purposes, 58, 155-169. https://doi.org/10.1016/j.esp.2020.01.002
Mateo, C. L., & Cazevieille, F. O. (2015). Compiling texts for a specialized corpus in the biochemistry domain: Theoretical and methodological aspects. Procedia - Social and Behavioral Sciences, 198, 300-308. https://doi.org/10.1016/j.sbspro.2015.07.448
Otto, P. (2021). Choosing specialized vocabulary to teach with data-driven learning: An example from civil engineering. English for Specific Purposes, 61, 32-46. https://doi.org/10.1016/j.esp.2020.08.003
Rogers, J., Müller, A., Daulton, F. E., Dickinson, P., Florescu, C., Reid, G., & Stoeckel, T. (2021). The creation and application of a large-scale corpus-based academic multi-word unit list. English for Specific Purposes, 62, 142-157. https://doi.org/10.1016/j.esp.2021.01.001
