A longitudinal study of L2 historical writing: lexical richness and writing proficiency in Content and Language Integrated Learning

Adrián Granados, María Dolores López-Jiménez & Francisco Lorenzo
Universidad Pablo de Olavide (Spain)
agranav@upo.es, mdlopezji@upo.es, fjlorber@upo.es

Ibérica 43 (2022): 129-154
ISSN: 1139-7241 / e-ISSN: 2340-2784

Abstract

The aim of this paper is to offer a much-needed longitudinal description of lexical richness in the L2 historical writing of CLIL bilingual secondary school students over a three-year period. The automated tool Coh-Metrix 3.0 was used to analyse the evolution of the lexical diversity, density and sophistication of a learner corpus made up of 75 history essays composed by the same 15 students as part of their L2 (English-taught) history lessons. The results show an increase in the number of lexical items employed by the students and in the abstractness and associability of these items. This indicates that students improved their lexical richness, while developing their writing proficiency and history literacy skills.

Keywords: lexical richness, L2 writing proficiency, history literacy, Coh-Metrix, CLIL.

1. Introduction

The writing-to-learn approach (e.g., Britton, Martin & Rosen, 1966) and its secondary trends, ‘writing across the curriculum’ (e.g., Young & Fulwiler, 1986) and ‘writing in the disciplines’ (e.g., Deane & O’Neill, 2011), share the assumption that disciplinary content is deeply ingrained in the very act of literacy development. Furthermore, they are concerned with disciplinary discourses and their enactment in genres, which have their own particular characterisation across all language levels (Rose, 2008; Shanahan & Shanahan, 2008). These trends as a whole have paved the way for the current consideration of disciplinary content and writing conventions as focal points in literacy research, so much so that disciplinary literacy is often added to the traditional distinction between basic interpersonal communicative skills (BICS) and cognitive academic language proficiency (CALP) (Dressen-Hammouda, 2008; Harwood & Hadley, 2004; see Cummins, 1979, for BICS and CALP).

In this study, the focus is on historical literacy, and the writing proficiency of a group of students is examined regarding the specialised language of history.

Writing proficiency is a subset of language competence in which the mastery of genres and rhetorical devices should be combined with language-specific abilities, such as the use of a range of vocabulary and syntactic structures (Wolfe-Quintero, Inagaki and Kim, 1998; in Lahuerta, 2015).
Indeed, the linguistic features employed by most writing researchers fall into three areas: lexis, syntax and cohesion (McNamara, Crossley & McCarthy, 2010). Of these constructs, lexical richness is considered one of the most important proxies for text quality (e.g., Crossley & McNamara, 2011; Engber, 1995; Grobe, 1981; Jarvis, 2002; Malvern, Richards, Chipere & Durán, 2004; McNamara et al., 2010; Nold & Freedman, 1977) and is perhaps the most commonly used one (Crossley, 2020). Both usage-based and psycholinguistic approaches (see Ellis, 2002, 2012, respectively) assume that “more proficient writers produce words that are more difficult to process and recognise, either because of exposure to the words or because of properties inherent to the words” (Crossley, 2020, p. 418).

The aim of this paper is therefore to provide a longitudinal description of the lexical richness of secondary school students’ L2 history essays (see Breeze & Gerns, 2019, for the impact of academic writing instruction on this population).

The description of L2 development in a school setting is still incomplete for several reasons. Firstly, language assessment is not often content-bound and therefore the evidence of competence derives from language output disconnected from both the discipline in question (e.g., history, science, mathematics, etc.) and the curriculum (the Industrial Revolution, ecosystems, integers, etc.). Secondly, corpora produced in formal learning settings are mostly cross-sectional, which means that writers’ development is difficult to track (Dóczi & Kormos, 2016; Nikula, 2017; Pellicer-Sánchez, 2018). By tracing how the lexical richness of secondary education students’ L2 historical essays developed over a period of three years, we wanted to gain further insights into the evolution of academic writing proficiency at a critical moment in students’ literacy development, namely from early to mid-adolescence.

2. The development of lexical richness

2.1. Lexical richness

Despite the popularity of lexical features as a measure of text quality and learners’ writing proficiency, there is still a certain disparity in the conceptualisation of lexical richness. For Crossley (2020), it consists of the number of unique words (lexical diversity), the proportion between content and function words (lexical density) and the proportion of advanced words (lexical sophistication) in a text. The problem lies in the operationalisation of the advanced word construct. Traditionally, research has focused on the number of low frequency words (Laufer & Nation, 1995), but, as Crossley puts it, this construct has evolved to encompass a vast number of word properties (Crossley, 2020, p. 418):

Sophisticated words have been defined as words that are more likely found in academic texts (Coxhead, 2000), words that are less concrete, imageable, and familiar (Crossley & Skalicky, in press; Saito et al., 2016; Salsbury, Crossley, & McNamara, 2011), words that have fewer phonological and orthographical neighbors, words that have higher latencies in word naming and lexical decision tasks (Balota et al., 2007), more specific words (Fellbaum, 1998), and words that are less diverse based on context (McDonald & Shillcock, 2001).

For Jarvis (2013, 2017; in Vanhove, Bonvin, Lambelet & Berthele, 2019), the lexical richness of a text may be reflected in six dimensions: the total number of words (volume), the lexical diversity (variability), the equal or unequal repetition of words (evenness), the frequency of words in the language as a whole (rarity), the similarity of words (disparity) and the distribution of repeated words in the text (dispersion). The problem with this theoretical model is, once again, the operationalisation of the dimensions, particularly those pertaining to the textual relationship between words: evenness and dispersion.
For evenness, Jarvis (2013b) uses the standard deviation of the counts of tokens per type. Regarding dispersion, he considers it to be the mean distance between different tokens of the same type, averaged over all types in the text, but admits that he presently computes it as “the number of times that types are repeated within the next n (e.g., 20) tokens” (Vanhove et al., 2019: 502).

As can be seen, the two conceptualisations are fully compatible, differing only in how indices are grouped into dimensions and in the extent to which textual relations between words are considered. Nevertheless, studies focusing on the development of lexical richness tend to use a combination of the aforementioned parameters.

2.2. Lexical richness and writing proficiency

A comprehensive review of the empirical findings regarding the relationship between text quality and lexical richness can be found in Crossley (2020). With respect to L1 writing, research has shown that more proficient writers tend to use more academic words (Douglas, 2013), more specific and less polysemous words and more imageable and concrete words (Crossley, Roscoe, McNamara & Graesser, 2011; McNamara, Crossley & Roscoe, 2013), less meaningful words (McNamara et al., 2013), longer words (Crossley, Weston, McLain & McNamara, 2011; Gardner, Nesi & Biber, 2019; Haswell, 2000), less familiar words (Crossley, Weston, McLain & McNamara, 2011) and more infrequent words (Crossley, Roscoe, McNamara & Graesser, 2011; McNamara et al., 2010).

As to L2 writing, similar patterns have been reported. According to the research, more proficient L2 writers tend to use more specific and less polysemous words (Guo, Crossley & McNamara, 2013; Kyle & Crossley, 2016), less meaningful words (Crossley & McNamara, 2012), longer words (Grant & Ginther, 2000; Reppen, 1994) and less familiar and more infrequent words (Crossley & McNamara, 2012).
The only difference from L1 writing has been observed in the imageability and concreteness of words, as more proficient L2 writers have been found to use less imageable words (Crossley, Kyle, Allen, Guo & McNamara, 2014).

According to Jarvis’s theoretical model (2013, 2017), three of his six dimensions predict expert ratings of overall text quality (Vanhove et al., 2019): variability or lexical diversity (Crossley & McNamara, 2011; Engber, 1995; Grobe, 1981; Jarvis, 2002; Kuiken & Vedder, 2014; Malvern et al., 2004; McNamara et al., 2010), rarity or the number of less frequent words (Crossley & McNamara, 2011; Guo et al., 2013; Malvern et al., 2004; McNamara et al., 2010) and volume or the overall number of words (Grobe, 1981; Jarvis, Grant, Bikowski & Ferris, 2003; Nold & Freedman, 1977).

Given the considerable number of lexical indices, Crossley, Salsbury, McNamara and Jarvis (2010) attempted to identify those that better predict human ratings of lexical proficiency. After analysing word length, lexical diversity, word frequency, hypernymy, polysemy, semantic co-referentiality, word meaningfulness, word concreteness, word imageability and word familiarity, they concluded that the best predictors of written lexical proficiency were lexical diversity, hypernymy and frequency, which explained 44 per cent of the variance in the human evaluations.

3. The development of history literacy

The development of lexical richness and writing proficiency needs to be framed within the development of disciplinary literacy. Biliteracy, and therefore disciplinary literacy in an L1 and L2, is a continuum (Hornberger, 2004). The transition from plain, here-and-now conversational language (BICS) to mature there-and-then academic language (CALP) is a watershed in individual language use, especially in writing.
The longitudinal study of general academic language has brought out various aspects of language growth and development across life stages (see Biber, 1992, on academic genre acquisition; Christie, 2012, on language education from a functional perspective; Grabe, 2002, on the transition from narrative to expository texts; Ortega & Byrnes, 2008, on advanced discourse).

The development of academic language results from the consolidation of the language of a discipline in the form of sentential components (lexicogrammar) and discourse aspects (the discipline’s functions and genres). These language features shape knowledge structures in each academic area and constitute disciplinary literacy, to wit, literacy in biology, mathematics, history, etc. (Shanahan & Shanahan, 2008). Thus, just as descriptions and definitions are key to science (Mohan, Leung & Davison, 2001) and argumentation is important in algebra (Prediger & Hein, 2017), so too is history characterised by particular language features in which lexical richness plays a substantial role: nominalisations, implicit causal and temporal organisation and cause-effect relations within clauses (see Achugar & Schleppegrell 2005; Coffin, 2006, 2009; Lorenzo, 2017; Nokes, 2013; Schleppegrell & Colombi, 2002; see also Achugar & Carpenter, 2014, for a description of language in history, both as an L1 and L2).

Furthermore, history is fundamentally a written discipline, to the point that the historical periods for which there are no written testimonies are referred to as prehistory. The linguistic turn of this discipline has even led historiographers to declare that ‘history as science’ is nothing more than a ‘literary artefact’ (White, 2010).

History literacy is an evolutionary construct.
Coffin (2006) proposes a three-stage model of historical thinking: first, a purely narrative period (recording, corresponding to the 11-13 age bracket); then, an exploration of causes and consequences including multifactorial causality (explaining, corresponding to the 14-16 age bracket) and, finally, personal judgement, plus an ideological stance (arguing, corresponding to the 16-18 age bracket). When learning history, students need to leverage arguments, evaluations, generalisations, and abstractions in order to progress (Christie & Maton, 2011). Mature history narratives employ a higher concentration of nominals, more morphological narrative complexities and more present and past participles over time (Asención-Delaney & Collentine, 2011). These features modify the writing styles of students, who can consequently meet more complex academic and discursive requirements.

4. Methodology

4.1. Research background

This research builds on a series of previous studies in the European field of CLIL (Granados, Lorenzo-Espejo & Lorenzo, 2021; Lorenzo & Granados, 2020; Lorenzo & Moore, 2009; Lorenzo & Rodríguez, 2014; Lorenzo, Granados & Rico, 2021), inquiring into issues such as the advantages of CLIL versus monolingual education and the description of incidental learning and positive transfer between an L1 and an L2. In particular, Lorenzo, Granados and Ávila (2019) explored the development of fluency, syntactic complexity, and text easability of the learner corpus analysed in this paper, and Granados and Lorenzo (2021) described the development in the use of connectives.

In the research context described above, the aim of this study is to analyse the development of L2 written lexical richness in the discipline of history and how it fluctuated in a formal bilingual setting within an established time frame: three academic years.

4.2. Research questions

In order to fulfil this objective, several research questions were formulated:

4.2.1. RQ 1: Did the lexical richness of the students’ L2 historical writing develop over time?

Employing Crossley’s (2020) conceptualisation, written lexical richness was analysed as the number of unique words (lexical diversity), the proportion between content and function words (lexical density) and the proportion of advanced words (lexical sophistication) in the essays.

4.2.2. RQ 2: If so, which lexical dimensions evolved in the students’ L2 historical writing?

Each dimension was studied separately to determine whether or not they developed differently as students matured. If a dimension evolved, it meant that it developed during the critical period of adolescence and that it was sensitive enough to maturation to vary over a three-year period.

4.2.3. RQ 3: If so, was there any lexical dimension that did not evolve in the students’ L2 historical writing?

This would imply that there are dimensions that did not develop during the critical period of adolescence or which were not sensitive enough to maturation to vary over a three-year period.

4.3. Sampling and participants

This study was performed on a sample of students from a state secondary school in Andalusia (southern Spain), which has been running an optional English-Spanish bilingual programme for the past 15 years, in keeping with the Spanish and European trend towards CLIL-type multilingual education. The study sample was made up of 20 students enrolled in the bilingual programme, all of whom were L1 Spanish speakers and belonged to the same grade and class.
This made it possible to neutralise many of the variables present in learning environments (the teaching methodology, the quality and quantity of language exposure, the number and nature of courses taught in English, etc.), thus providing an adequate setting for a longitudinal study.

Since students were in the Andalusian bilingual programme, they received 4-5 weekly contact hours of explicit L2 instruction (depending on the school year they were in) and at least two content subjects were taught in the L2 (one of them always being Social Sciences). A minimum of 30% and a maximum of 50% of the curriculum had to be taught in English (see Andalusian Department of Education, 2017, for more information).

These students were tested five times over a three-year period. When the first test (Test 1) was administered, they were all ninth-graders (aged between 14 and 15) who had already received two years of education in the bilingual programme. They had an English level of A2 according to the Common European Framework of Reference for Languages (CEFR). By the time the fifth test (Test 5) was administered, they had become eleventh-graders (aged between 16 and 17) and were expected to oscillate between a B1 and B2 level of English within the following two academic years.

Nevertheless, the sample suffered attrition. Three of the students abandoned the bilingual programme in the tenth grade (because of the extra cognitive demands involved, the amount of work required or other academic reasons) and two had to retake a year (ninth grade). As a result, a total of 15 students sat the five tests.

4.4. Instrumentation and data collection

Five tests in the form of history essays were administered to the students without prior notice.
The test topics were in keeping with the official history curriculum being taught in class, specifically, 9/11 and the Clash of Civilisations, the Avant-Gardes, the Industrial Revolution, the American Revolution, and the Spanish Civil War. During the three-year study time frame, the history teacher informed the research team when each topic was being studied. On the basis of this information, Tests 1 and 2 were administered in the ninth grade (14-15 years old), Tests 3 and 4 in the tenth grade (15-16 years old) and Test 5 in the eleventh grade (16-17 years old).

The essays composed by the students were based on what they had learnt in the ordinary history course. They were not allowed to consult any additional materials when sitting the tests, which were supervised. Besides the minimum length for the essay, they were only given the following guidelines: (a) define the given concept or historical period; (b) explain its causes and consequences; and (c) give your opinion on its historical implications.

The data collection process resulted in a learner corpus made up of 75 essays, totalling 12,000 words, which were organised in three periods of composition (Year 1, Year 2, and Year 3), corresponding to the three years of the study. Year 1 encompassed the combined results of Tests 1 and 2, and Year 2, those of Tests 3 and 4. In the third and final year (Year 3), only one test (Test 5) was administered in order to avoid further dropouts which would have seriously compromised the study.

4.5. Computational analysis

The students’ essays were coded and processed with the Coh-Metrix computational tool, which produces indices of the linguistic and discursive representations of a text in five major dimensions: narrativity, syntactic simplicity, word concreteness, referential cohesion and deep (causal) cohesion (McNamara, Louwerse, Cai & Graesser, 2014).
Coh-Metrix has been validated by numerous researchers, including Polio and Yoon (2018). These authors compared Coh-Metrix results with hand-coding, confirming that it is a non-redundant and reasonably transparent tool for measuring cohesion, complexity, and coherence metrics, as well as being capable of reliably reflecting differences in genres among English-as-an-L2 (ESL) writers. Similarities in results and metrics have also been found with other software tools used to analyse the lexical complexity of history essays (Lorenzo & Rodríguez, 2014).

In this study, Crossley’s (2020) conceptualisation was used. Written lexical richness was therefore analysed as the number of unique words (lexical diversity), the proportion between content and function words (lexical density) and the proportion of advanced words (lexical sophistication) in the essays. The following Coh-Metrix indices were employed:

4.5.1. Written lexical diversity

A. The type-token ratio for content words and for all words

Lexical diversity assesses the range of vocabulary employed in a text (McNamara et al., 2014). The best-known measure of lexical diversity is the type-token ratio (hereinafter TTR), a coefficient resulting from dividing the number of unique words in a text (i.e., types) by the overall number of words (i.e., tokens). The TTR can be measured by means of two Coh-Metrix indices: (a) Coh-Metrix index 48, which only processes content words (i.e., nouns, verbs, adjectives, and adverbs) sharing a common lemma (e.g., tree/treed; mouse/mousey; price/priced, etc.); and (b) Coh-Metrix index 49, which measures the type-token ratio for all words.

B. The Measure of Textual Lexical Diversity (MTLD) and vocd lexical diversity measures for all words

TTR has proved to be extremely sensitive to text length and, therefore, a poor predictor of lexical proficiency when text length is not constant.
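As a rough illustration of the ratio defined above and of its sensitivity to text length, the following Python sketch computes TTR over surface word forms. It is not the Coh-Metrix implementation: the tokeniser, the function names and the sample sentences are assumptions made for this example, and no lemmatisation or content-word filtering (as in index 48) is attempted.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens (hypothetical, simplified tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Divide the number of unique words (types) by the total number of words (tokens)."""
    if not tokens:
        raise ValueError("cannot compute TTR for an empty text")
    return len(set(tokens)) / len(tokens)


# Invented sample text: 6 tokens, 5 types ('the' occurs twice).
short = tokenize("The war began. The army fought.")

# The same passage repeated three times: 18 tokens, still 5 types.
tripled = short * 3

print(type_token_ratio(short))    # 5 types / 6 tokens
print(type_token_ratio(tripled))  # 5 types / 18 tokens
```

Repeating the passage leaves the number of types unchanged while tripling the number of tokens, so the ratio falls from 5/6 to 5/18 even though the vocabulary is identical.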
In fact, “as the number of word tokens increases, there is a lower likelihood of those words being unique” (McNamara et al., 2014, p. 67) and TTR tends to be lower. In order to overcome these metric limitations, Coh-Metrix includes indices that use estimation algorithms such as the Measure of Textual Lexical Diversity (hereinafter MTLD) and vocd, indices 50 and 51, respectively. The MTLD is calculated as the mean length of sequential word strings in a text which maintain a given TTR value (McNamara et al., 2014, p. 67). Similarly, vocd is calculated through a computational procedure that matches TTR random samples with ideal TTR curves (McNamara et al., 2014, p. 67). Both indices allow researchers to compare the lexical diversity of texts differing in length, although validation studies tend to favour MTLD over vocd (McCarthy & Jarvis, 2010).

4.5.2. Written lexical density

Lexical density is the proportion between content and function words. Coh-Metrix indicates the incidence of nouns, verbs, adjectives and adverbs (indices 84-87) per 1000 words. By combining these indices, the proportion between content and function words can be calculated.

4.5.3. Written lexical sophistication

This dimension was operationalised by means of the following indices:

A. Familiarity, concreteness, imageability and meaningfulness for content words

Familiarity (Coh-Metrix index 98) refers to how familiar a word seems to an adult on a 700-point scale (100 for unheard words and 700 for those heard almost every day), according to the MRC Psycholinguistic Database (McNamara et al., 2014).
Concreteness (Coh-Metrix index 99) indicates how concrete or non-abstract a word is on the same scale, according to the MRC Psycholinguistic Database (McNamara et al., 2014): 100 for words that score low in concreteness, like ‘protocol’ (264), and 700 for words referring to things that can be touched, heard, or tasted, like ‘box’ (597). Meaningfulness (Coh-Metrix index 100) refers to the extent to which one word can be associated with others, on the same scale and according to the same database (McNamara et al., 2014): 100 for words with a weak association, like ‘abbess’ (218), and 700 for those with a strong association, like ‘people’ (612). Imageability (Coh-Metrix index 101) indicates how easy it is to construct a mental image of a word, on a similar scale and again according to the MRC Psycholinguistic Database (McNamara et al., 2014): 100 for low-imagery words, like ‘reason’ (267), and 700 for high-imagery words, like ‘hammer’ (618).

B. Word length

Word length is calculated in relation to the mean number of syllables per word (Coh-Metrix index 8) and the mean number of letters per word (Coh-Metrix index 10).

C. Word frequency

Word frequencies for content words (Coh-Metrix index 94) are given in accordance with the CELEX lexical database (Baayen, Piepenbrock & Gulikers, 1995).

D. Polysemy and hypernymy

Polysemy (Coh-Metrix index 102) is computed as the mean number of senses (core meanings) of the content words used in a text, according to the WordNet lexicon. Hypernymy indicates the rank of a word on the hierarchical scale of the WordNet lexicon. For instance, ‘entity’ is considered as the hypernym of all nouns and, therefore, has a hypernymy value of 1.
For its part, ‘chair’ has many higher hypernymy categories (only as regards the object, e.g., ‘furniture’, ‘furnishing’, ‘instrumentality’, ‘artefact’, ‘whole’, ‘object’, ‘physical entity’ and ‘entity’) and, therefore, has a hypernymy value of 8.5. Coh-Metrix provides the hypernymy values for nouns (index 103) and for verbs (index 104).

The results and the mean values of each Coh-Metrix index were studied in order to perform a descriptive analysis on lexical development, with the aim of revealing quantitative and qualitative tendencies in multiple case studies.

5. Results

5.1. Written lexical diversity

5.1.1. The type-token ratio for content words and for all words

As can be seen in Figure 1, the mean type-token ratio for content words and the mean type-token ratio for all words followed the same decreasing pattern, although the type-token ratio for all words registered lower results. This gap in Figure 1 is perfectly logical: non-content words such as conjunctions (‘and’, ‘but’, etc.), prepositions (‘in’, ‘out’, etc.), or pronouns (‘he’, ‘who’, etc.) are much more frequent and therefore repeated in a prototypical text. When they are considered, lexical diversity is necessarily lower. At first sight, therefore, the students’ lexical diversity decreased as the study advanced.

Figure 1. Mean type-token ratio of the students’ essays.

However, as already explained in section 4.5.1, TTR has proved to be extremely sensitive to text length and, therefore, a poor predictor of lexical proficiency when textual length is not constant. That is precisely the case here, as the students displayed higher levels of conceptual fluency over time, which led to much longer texts. In order to remedy the metric limitations, Coh-Metrix includes indices that use estimation algorithms such as the MTLD and vocd.

5.1.2. MTLD and vocd lexical diversity measures for all words

As shown in Figure 2, even though the degree to which lexical diversity increased differed noticeably from MTLD to vocd (the former being regarded as more reliable), both measures indicated a clear growth (by more than 20%, according to MTLD; and by almost 50%, according to vocd). These gains were constant over time and developed gradually from Year 1 to Year 2, and from this mid-point to Year 3. One full-text example of this development is shown in Table 2, at the end of the results section.

Figure 2. Mean MTLD and vocd of the students’ essays.

From this steady increase in lexical diversity, it can be inferred that the students became more proficient as they grew older and progressed in the bilingual education system (Jarvis, 2002; McNamara et al., 2010; Crossley, Weston, McLain & McNamara, 2011; Crossley & McNamara, 2012).

5.2. Written lexical density

The proportion between content and function words is shown in Table 1. The overall proportion of content words increased slightly from Year 1 to Year 3. This was due to the considerable growth in the proportion of adjectives, which compensated for the slight decreases in the proportion of nouns and verbs. This growth in the proportion of adjectives could be related to the expansion of noun phrases, a feature of academic writing. Despite these minor variations, however, no remarkable development was observed in this dimension.

Table 1. Mean proportion of content words in the students’ essays.

5.3. Written lexical sophistication

5.3.1. Familiarity, concreteness, imageability and meaningfulness of content words

The study results displayed in Figure 3 show that familiarity levels remained very similar, with only minimal variation on the scale over time (573, 572 and 574, respectively), meaning that word difficulty remained constant. The concreteness and imageability indices fell considerably during the first two years, while remaining constant during the final year, thus implying a higher degree of abstraction. The essays became less picturesque and anecdotal, hinting at a transition from more narrative to more expository texts. Finally, lexical meaningfulness increased moderately but steadily, pointing to the construction of more cohesive texts with words better knitted in clusters and lexical bundles, and suggesting that lexical development is not random but develops in semantic networks. A full-text example can be consulted in Table 2, at the end of the results section.

Figure 3. Mean familiarity, concreteness, imageability and meaningfulness of the students’ essays.

5.3.2. Word length, word frequency, polysemy, and hypernymy

Finally, no remarkable development was observed for word length, word frequency, polysemy, and hypernymy. Their evolution was subtle and irregular. Their values can be consulted in the Appendix.

Table 2. Student 10’s first and last essays. Original grammar and spelling.

6. Discussion

This study has analysed the development of lexical richness in L2 historical writing according to three dimensions: the number of unique words (lexical diversity), the proportion between content and function words (lexical density) and the proportion of advanced words (lexical sophistication). Of these dimensions, a clear evolution was observed in only two: lexical diversity and lexical sophistication.
The second dimension, lexical density, remained constant during the three-year study period. Within lexical sophistication, development was detected in the familiarity, concreteness, imageability and meaningfulness of words. Word length, word frequency, polysemy and hypernymy also remained unaffected. The development of each dimension will now be addressed in turn. Furthermore, in order to flesh out the raw data, the results will be discussed on the basis of a corpus sample of one student’s essays from Year 1 and Year 3 (i.e., from the first and last tests administered), which are included in Table 2 (results section).

High levels of lexical diversity entail lower cohesion and higher difficulty: there are more unique words introducing new information that needs to be processed and integrated into the discourse by the reader (McNamara et al., 2014). In contrast, the more often the same words are repeated across the text, the lower the lexical diversity and the higher the text cohesion will be. In this study, the increase in the number of lexical items employed by the students has been confirmed, a trait that indicates that they were becoming more proficient L2 writers (Crossley & McNamara, 2011; Crossley & McNamara, 2012; Engber, 1995; Grobe, 1981; Jarvis, 2002; Kuiken & Vedder, 2014; Malvern et al., 2004; McNamara et al., 2010).

Moreover, if the nature of this increase in the breadth of vocabulary is examined in the students’ essays, the first feature to emerge is the persistence of semantic extension over time (Harmon & Kapatsinski, 2017). Learners initially extend the L1 semantic load of lexical items to L2 equivalents. This was represented in the students’ essays by the presence of calques, like ‘conform’, which is employed with the meaning of the Spanish verb conformar (‘form’, ‘make up’, ‘constitute’).
Semantic extensions decline over time, however, when L1 intake is blocked out and L2 intake is connected only to L2 relations. Research has called this process a move from ‘word association representation’ to ‘conceptual mediation representation’ (Spöttl & McCarthy, 2004). These results show that in this bilingual model, once compulsory education has been completed, there are still indications of overreliance on L1 for word generation in the form of transliteration, calques or extreme translanguaging. These are different forms of the ‘one-to-one principle’, namely, the naïve belief that lexical units in the two languages match perfectly. High idiomaticity levels are an indication of L2 proficiency, but here the first language still contaminates L2 production, especially as regards academic vocabulary (see, for example, the misspelling of cognates like *‘comunists’ and *‘acused’, and the structural calques *‘the separation that suffered the society’ and *‘get a job to those people’).

The second dimension in which evolution was observed is lexical sophistication, particularly as regards the familiarity, concreteness, imageability and meaningfulness of words. These are key indices for writing proficiency: less familiar words are more difficult to learn and take longer to process (McNamara et al., 2014), word concreteness and word imageability are inversely proportional to abstraction (Barber, Otten, Kousta & Vigliocco, 2013) and the average meaningfulness of a text is inversely proportional to text difficulty, since words with a stronger association imply that readers need to process and integrate less new information into the discourse (Crossley & McNamara, 2012). In this regard, research has found that more proficient L2 writers tend to use less concrete and imageable words (Crossley et al., 2014) and less familiar and meaningful words (Crossley & McNamara, 2012).
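To make the direction of these indices concrete, the following is a minimal Python sketch of how a mean word-information score might be computed over the content words of a text. It is not the Coh-Metrix implementation: the tokeniser, the short function-word list, the helper name `mean_ratings` and every value in `RATINGS` are assumptions invented purely for illustration, loosely following the scale suggested by the familiarity means reported in the results (573, 572 and 574).

```python
from statistics import mean

# Hypothetical norms: every rating below is invented for illustration only.
RATINGS = {
    "war":        {"familiarity": 580, "concreteness": 400, "imageability": 500, "meaningfulness": 480},
    "soldier":    {"familiarity": 540, "concreteness": 600, "imageability": 600, "meaningfulness": 460},
    "rebellion":  {"familiarity": 500, "concreteness": 300, "imageability": 400, "meaningfulness": 440},
    "difficulty": {"familiarity": 560, "concreteness": 250, "imageability": 300, "meaningfulness": 420},
}

# Deliberately tiny function-word list (an assumption, not a real stoplist).
FUNCTION_WORDS = {"the", "a", "an", "and", "of", "in", "to"}


def mean_ratings(text, ratings=RATINGS):
    """Average each rating dimension over the rated content words of `text`."""
    # Naive whitespace tokenisation with punctuation stripped.
    tokens = [t.strip(".,;:!?'\"()").lower() for t in text.split()]
    # Keep content words only, mirroring the focus on content words in the text.
    content = [t for t in tokens if t and t not in FUNCTION_WORDS]
    # Words without a rating are skipped rather than guessed.
    rated = [ratings[t] for t in content if t in ratings]
    if not rated:
        return {}
    return {dim: mean(r[dim] for r in rated) for dim in rated[0]}


narrative = mean_ratings("The soldier and the war.")
expository = mean_ratings("The difficulty of the rebellion.")
print(narrative["concreteness"], expository["concreteness"])
```

With these invented ratings, the second sentence yields a lower mean concreteness and imageability than the first, which is the pattern the indices describe when a text becomes more abstract.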
In this study, the students did indeed use less concrete and imageable words; that is, there was a greater degree of abstractness. Contrary to Crossley and McNamara’s (2012) findings, however, familiarity remained constant and there was an increase in the meaningfulness of lexical items; that is, terms had a greater degree of associativity. This divergence might have been due to the age and the developmental stage of the students making up the sample, as they are far from reaching the top proficiency levels.

In terms of associativity, one implication of the net gains reported is that lexical growth is not random but develops in semantic networks. Example 2, from Year 3 (Table 2, results section), includes a wide variety of words related to conflict: ‘war’, ‘battle’ and ‘rebellion’. Indeed, lexical development goes hand in glove with a better control of derivational mechanisms which improve the quality of academic writing. In Example 2, three different forms from the same word family co-occur: noun (‘rebellion’), adjective (‘rebel’) and verb (‘rebelled’). Derivational expertise goes a long way to helping text cohesion and textual cross-references. The new constellation of semantic fields not only includes nominal groups, because grammar words for expressing functional categories also increase over time, as will be seen below regarding the expression of causality.

As to abstractness, research has observed that abstraction in academic writing is achieved by means of signalling nouns, namely, abstract nouns which refer to a general area of meaning whose specific meaning is found elsewhere in the clause or text (Flowerdew, 2014: 96). One such example can be detected in Example 2 (Year 3), in the account of a historical episode in which ‘difficulties’ are mentioned ‘for the armies involved in warfare’. The actual embodiment of such difficulties is only found further on in the sentence.
This dummy word exists mostly for the sake of anticipating semantic processes, here of a historical nature. Lexical gains, therefore, follow a tendency towards more abstract language.

The development of abstraction in written language relates in part to that of nominalisation. Nominalisation characterises mature academic language like no other construction (Lorenzo et al., 2019; Granados et al., 2021). At earlier ages, as in Test 1, language includes more verbs and more prototypical theme/rheme sentences. Over time, language evolves and becomes more nominal. Terms like ‘separation’ and ‘support’ represent the typical grammatical metaphor, whereby noun phrases are used instead of verb-like sentences. As is well known in functional systemics, nominalisations freeze actions and transform eventful episodes into non-temporal abstract processes, as in the use of ‘rising’ (as in a coup d’état) in Example 2, as opposed to a non-nominalised ‘X rose against Y’ pattern, which would have been more typical in the case of a younger student. When describing this composition device, Halliday (2004) posited that when writers express a process by means of nominalisation, a rhetorical tension is created between the semantic level (which describes a process as if it were an agent undertaking an action) and the lexicogrammatical level (the actual nominal word forms which embody the action). He went on to say that this is regarded as a metaphor because the end result is a virtual entity which only exists as semiosis. The use of ‘rising’, instead of military insurrection, in Example 2 further elaborates on the metaphor within. The fact that the action described (‘the military rose in arms’) is represented by a neutral or even positive action (‘rising’) ties in with the fascist propaganda following the military coup.
This bilingual student’s command of history vocabulary demonstrates not only advanced lexical knowledge, but also the consolidation of abstract thought in ideological writing (e.g., ‘the support of fascist countries such as Germany or Italy, the numerous *comunist politicians that *conformed the government, and as a result of past disagreements’).

To conclude the discussion on lexical richness, it should also be noted that it was possible to glimpse the bigger picture by just glancing at the general discourse structure of the essays. In addition to the differences in lexical constituents, the essays composed in Test 5 show variations in discourse texture. In later stages, they are more densely packed with lexical collocations (e.g., adverb + adjective, like the phrase ‘extremely high number of *dissapeared people’, in Example 2 from Test 5). This implies a new approach to text construction involving longer units with more pre- and post-modifications.

Having said that, the results should be interpreted with caution due to two research limitations. Firstly, in longitudinal multiple-case studies like this one, which aimed at neutralising contextual variables (by testing students taught under the same conditions), the results are merely descriptive. The findings discussed here should pave the way for future large-scale, cross-sectional analyses aimed at testing their generalisability. Secondly, even though the study included five tests per student over a three-year period, individual text topics might have affected the range of vocabulary used by the students (Tracy-Ventura, Mitchell & McManus, 2016). Despite these limitations, this paper describes a pioneering longitudinal study of lexical development in L2 historical writing in a formal bilingual setting, at a crucial moment for the academic language development of students.

7.
Conclusion

Writing proficiency is usually analysed by means of three constructs: lexical, syntactic and cohesive (McNamara et al., 2010). Our study has focused on the lexical construct and has examined the lexical richness of secondary school students’ L2 historical writing in relation to their lexical diversity, density, and sophistication. Our results show that, after three years of formal bilingual education, the students in the sample used more lexical items and employed terms with a greater degree of abstractness and associability. This points to their greater lexical richness, their higher writing proficiency, and their more profound grasp of history literacy.

History is a fundamentally written discipline, a literary artefact according to postmodern historiography (White, 2010). Following this conception, international institutions such as the Council of Europe have even developed language descriptors which evaluate writing in relation to the discipline of history. In this context, the fact that the students’ history literacy and writing proficiency matured during the three-year study period may help us to understand content learning and to support further academic success.

Article history:
Received 03 March 2022
Received in revised form 02 May 2022
Accepted 03 May 2022

References

Achugar, M., & Carpenter, B. (2014). Tracking movement toward academic language in multilingual classrooms. Journal of English for Academic Purposes, 14, 60-71.
Achugar, M., & Schleppegrell, M. J. (2005). Beyond connectors: The construction of cause in history textbooks. Linguistics and Education, 16(3), 298-318.
Andalusian Department of Education (2017). Acuerdo de 24 de enero de 2017. BOJA, 24, 10-57.
Asención-Delaney, Y., & Collentine, Y. (2011). A multidimensional analysis of a written L2 Spanish corpus.
Applied Linguistics, 32, 299-322.
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database [CD-ROM]. University of Pennsylvania, Linguistic Data Consortium.
Barber, H. A., Otten, L. J., Kousta, S. T., & Vigliocco, G. (2013). Concreteness in word processing: ERP and behavioral effects in a lexical decision task. Brain & Language, 125, 47-53.
Biber, D. (1992). The multi-dimensional approach to linguistic analyses of genre variation: An overview of methodology and findings. Computers and the Humanities, 26(5/6), 331-345.
Breeze, R., & Gerns, P. (2019). Building literacies in secondary school history: The specific contribution of academic writing support. E-JournALL, EuroAmerican Journal of Applied Linguistics and Languages, 6(1), 21-36.
Britton, J., Martin, N., & Rosen, H. (1966). Multiple Marking of Compositions. Her Majesty’s Stationery Office.
Christie, F., & Maton, K. (2011). Disciplinarity: Functional linguistic and sociological perspectives. Continuum.
Christie, F. (2012). Language education: A functional perspective. Wiley-Blackwell.
Coffin, C. (2006). Reconstructing personal time as collective time: Learning the discourse of history. In R. Whittaker, M. O’Donnell & A. McCabe (Eds.), Language and literacy: Functional approaches (pp. 15-45). Continuum.
Coffin, C. (2009). Historical discourse. Continuum.
Crossley, S. A., & McNamara, D. S. (2011). Understanding expert ratings of essay quality: Coh-Metrix analyses of first and second language writing. International Journal of Continuing Engineering Education and Life Long Learning, 21(2–3), 170-191.
Crossley, S. A., & McNamara, D. S. (2012). Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading, 35(2), 115-135.
Crossley, S. A., Kyle, K., Allen, L. K., Guo, L., & McNamara, D. S. (2014). Linguistic microfeatures to predict L2 writing proficiency: A case study in automated writing evaluation.
Journal of Writing Assessment, 7(1).
Crossley, S. A., Roscoe, R. D., & McNamara, D. S. (2011). Predicting human scores of essay quality using computational indices of linguistic and textual features. In G. Biswas, S. Bull, J. Kay & A. Mitrovic (Eds.), Artificial Intelligence in education (AIED 2011) (pp. 438-440). Springer.
Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S. (2010). Predicting lexical proficiency in language learner texts using computational indices. Language Testing, 28(4), 561-580.
Crossley, S. A., Weston, J., McLain Sullivan, S. T., & McNamara, D. S. (2011). The development of writing proficiency as a function of grade level: A linguistic analysis. Written Communication, 28(3), 282-311.
Crossley, S. A. (2020). Linguistic features in writing quality and development: An overview. Journal of Writing Research, 11(3), 415-443.
Cummins, J. (1979). Cognitive/academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Working Papers on Bilingualism, 19, 121-129.
Deane, M., & O’Neill, P. (2011). Writing in the disciplines. Palgrave Macmillan.
Dóczi, B., & Kormos, J. (2016). Longitudinal developments in vocabulary knowledge and lexical organization. Oxford University Press.
Douglas, R. D. (2013). The lexical breadth of undergraduate novice level writing competency. The Canadian Journal of Applied Linguistics, 16(1), 152-170.
Dressen-Hammouda, D. (2008). From novice to disciplinary expert: disciplinary identity and genre mastery. English for Specific Purposes, 27(2), 233-252.
Ellis, N. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143-188.
Ellis, N. C. (2012). Formulaic language and second language acquisition: Zipf and the phrasal teddy bear.
Annual Review of Applied Linguistics, 32, 17-44.
Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4(2), 139-155.
Flowerdew, J. (2014). Corpus-based approach to language description for specialized academic writing. Language Teaching, 50, 90-106.
Gardner, S., Nesi, H., & Biber, D. (2019). Discipline, level, genre: Integrating situational perspectives in a new MD analysis of university student writing. Applied Linguistics, 40(4), 646-674.
Grabe, W. (2002). Narrative and expository macro-genres. In A. N. Johns (Ed.), Genre in the classroom: Multiple perspectives. Lawrence Erlbaum Associates Publishers.
Granados, A., & Lorenzo, F. (2021). English L2 connectives in academic bilingual discourse: a longitudinal computerised analysis of a learner corpus. Revista Signos, 54(106), 626-644.
Granados, A., Lorenzo-Espejo, A., & Lorenzo, F. (2021). Evidence for the interdependence hypothesis: A longitudinal study of biliteracy development in a CLIL/bilingual setting. International Journal of Bilingual Education and Bilingualism.
Grant, L., & Ginther, A. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9, 123-145.
Grobe, C. (1981). Syntactic maturity, mechanics, and vocabulary as predictors of quality ratings. Research in the Teaching of English, 15(1), 75-85.
Guo, L., Crossley, S. A., & McNamara, D. S. (2013). Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study. Assessing Writing, 18(3), 218-238.
Halliday, M. A. K. (2004). An introduction to functional grammar. Oxford University Press.
Harmon, Z., & Kapatsinski, V. (2017). Putting old tools to novel uses: The role of form accessibility in semantic extension. Cognitive Psychology, 98, 22-44.
Harwood, N., & Hadley, G. (2004).
Demystifying institutional practices: critical pragmatism and the teaching of academic writing. English for Specific Purposes, 23(4), 355-377.
Haswell, R. (2000). Documenting improvement in college writing: A longitudinal approach. Written Communication, 17(3), 307-352.
Hornberger, N. (2004). The continua of biliteracy and the bilingual educator: Educational linguistics in practice. International Journal of Bilingual Education and Bilingualism, 7(2), 155-171.
Jarvis, S. (2002). Short texts, best-fitting curves and new measures of lexical diversity. Language Testing, 19(1), 57-84.
Jarvis, S. (2013). Capturing the diversity in lexical diversity. Language Learning, 63(1), 87-106.
Jarvis, S. (2017). Grounding lexical diversity in human judgments. Language Testing, 34(4), 537-553.
Jarvis, S., Grant, L., Bikowski, D., & Ferris, D. (2003). Exploring multiple profiles of highly rated learner compositions. Journal of Second Language Writing, 12(4), 377-403.
Kuiken, F., & Vedder, I. (2014). Rating written performance: What do raters do and why? Language Testing, 31(3), 329-348.
Kyle, K., & Crossley, S. A. (2016). The relationship between lexical sophistication and independent and source-based writing. Journal of Second Language Writing, 34(4), 12-24.
Lahuerta Martínez, A. C. (2015). The written competence of Spanish secondary education students in bilingual and non-bilingual programs. Porta Linguarum, 24, 74-61.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16, 307-322.
Lorenzo, F., & Granados, A. (2020). One generation after the bilingual turn: Results from a large-scale CLIL teachers’ survey. Estudios de Lingüística Inglesa Aplicada, 20, 77-101.
Lorenzo, F., & Rodríguez, L. (2014). Onset and expansion of L2 cognitive academic language proficiency in bilingual settings: CALP in CLIL. System, 47, 64-72.
Lorenzo, F., & Moore, P. (2009). European language policies in monolingual southern Europe: Implementation and outcomes. European Journal of Language Policy, 1(2), 121-135.
Lorenzo, F. (2017). Historical Literacy in bilingual settings: Cognitive academic language in L2 History narratives. Linguistics and Education, 37(1), 32-41.
Lorenzo, F., Granados, A., & Ávila, I. (2019). The development of cognitive academic language proficiency in multilingual education: Evidence of a longitudinal study on the language of history. Journal of English for Academic Purposes, 41, 100767.
Lorenzo, F., Granados, A., & Rico, N. (2021). Equity in bilingual education: socioeconomic status and content and language integrated learning in monolingual Southern Europe. Applied Linguistics, 42(3), 393-413.
Malvern, D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment. Palgrave Macmillan.
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381-392.
McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of writing quality. Written Communication, 27(1), 57-86.
McNamara, D. S., Crossley, S. A., & Roscoe, R. (2013). Natural Language Processing in an intelligent writing strategy tutoring system. Behavior Research Methods, 45(2), 499-515.
McNamara, D. S., Louwerse, M. M., Cai, Z., & Graesser, A. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press.
Mohan, B., Leung, C., & Davison, C. (2001). English as a Second Language in the mainstream. Pearson Education.
Nikula, T. (2017). Emerging themes, future research directions. In A. Llinares & T. Morton (Eds.), Applied Linguistics Perspectives on CLIL (pp. 307-313). John Benjamins.
Nokes, J. D. (2013).
Building students’ historical literacies: Learning to read and reason with historical texts and evidence. Routledge.
Nold, E. W., & Freedman, S. W. (1977). An analysis of readers’ responses to essays. Research in the Teaching of English, 11(2), 164-174.
Ortega, L., & Byrnes, H. (Eds.). (2008). The Longitudinal Study of Advanced L2 Capacities. Routledge.
Pellicer-Sánchez, A. (2018). Examining second language vocabulary growth. Replications of Schmitt (1998) and Webb & Chang (2012). Language Teaching, 52(4), 512-523.
Prediger, S., & Hein, K. (2017). Learning to meet language demands in multi-step mathematical argumentations: Design research on a subject-specific genre. European Journal of Applied Linguistics, 5(2), 309-337.
Reppen, R. (1994). Variation in elementary student language: A multi-dimensional perspective. [Unpublished doctoral dissertation]. Northern Arizona University.
Rose, D. (2008). Writing as linguistic mastery: The development of genre-based literacy pedagogy. In R. Beard, D. Myhill, J. Riley & M. Nystrand (Eds.), Handbook of Writing Development (pp. 151-166). Sage.
Schleppegrell, M. J., & Colombi, M. C. (2002). Developing advanced literacy in first and second languages. Lawrence Erlbaum.
Shanahan, T., & Shanahan, C. (2008). Teaching disciplinary literacy to adolescents. Rethinking content-area literacy. Harvard Educational Review, 78, 40-59.
Spöttl, C., & McCarthy, M. (2004). Comparing the knowledge of formulaic sequences across L1, L2, L3 and L4. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 191-225). John Benjamins.

Adrián Granados is a postdoctoral researcher at Universidad Pablo de Olavide (Seville, Spain). His research focuses on the study of second language acquisition and bilingualism, and he specialises in processing academic corpora with linguistic analysis software.
He has published in journals such as Applied Linguistics, Journal of English for Academic Purposes, and International Journal of Bilingual Education and Bilingualism.

María Dolores López-Jiménez is a senior lecturer at Universidad Pablo de Olavide (Seville, Spain). She has also held visiting scholar positions at Indiana University. Her research focuses on the teaching and learning process of vocabulary and (inter)cultural aspects in an L2, publishing in journals such as Porta Linguarum, Revista de Educación, and RESLA.

Francisco Lorenzo is a full professor at Universidad Pablo de Olavide (Seville, Spain). He has held visiting scholar positions at Harvard University, University of London and University of Jyväskylä. His research focuses on the study of second language acquisition and bilingualism, sociolinguistics and sociology of language, and European language policies. He has authored more than sixty publications in journals such as Applied Linguistics, Language Policy, System, and Language and Education.

NOTES

1 Throughout the manuscript, when less is followed by an adjective, it is always functioning as an adverb, not a determiner (e.g., by less polysemous words we mean words which are less polysemous, and not fewer polysemous words).

Tracy-Ventura, N., Mitchell, R., & McManus, K. (2016). The LANGSNAP longitudinal learner corpus. Design and use. In M. Alonso-Ramos (Ed.), Spanish Learner Corpus Research. Current trends and future perspectives (pp. 117-142). John Benjamins.
Vanhove, J., Bonvin, A., Lambelet, A., & Berthele, R. (2019). Predicting perceptions of the lexical richness of short French, German, and Portuguese texts using text-based indices. Journal of Writing Research, 10(3), 499-525.
White, H. (2010). The fiction of narrative: Essays on history, literature, and theory, 1957-2007. The Johns Hopkins University Press.
Wolfe-Quintero, K., Inagaki, S., & Kim, H.-Y. (1998). Second language development in writing: Measures of fluency, accuracy, and complexity. University of Hawaii Press.
Young, A., & Fulwiler, T. (1986). Writing across the disciplines: Research into practice. Boynton/Cook.

Acknowledgements

This research was supported by the European Regional Development Fund (ERDF, 80%) and by the Department of Economy, Knowledge, Business and University of the Andalusian Regional Government (20%), within the framework of research project UPO-1380541.

Appendix. Mean values of word length, word frequency, polysemy, and hypernymy

Table 3. Mean values of word length, word frequency, polysemy, and hypernymy.