Yorgancı, M., & Subaşı, G. (2022). The Effects of Task Induced Involvement Load Hypothesis on Turkish EFL learners’ incidental vocabulary learning. International Online Journal of Education and Teaching (IOJET), 9(3), 1181-1202.

Received: 14.03.2022
Revised version received: 13.05.2022
Accepted: 15.05.2022

THE EFFECTS OF TASK INDUCED INVOLVEMENT LOAD HYPOTHESIS ON TURKISH EFL LEARNERS’ INCIDENTAL VOCABULARY LEARNING ¹
(Research article)

Mehtap Yorgancı (Corresponding author)
https://orcid.org/0000-0002-2018-3153
Coordinatorship of Foreign Languages, Konya Technical University, Konya, Turkey
myorganci@ktun.edu.tr

Gonca Subaşı
https://orcid.org/0000-0001-7049-5940
ELT Department, Anadolu University, Eskisehir, Turkey
goncas@anadolu.edu.tr

Biodata:
Mehtap Yorgancı is an English language instructor at Konya Technical University, Konya, Turkey. She carries out studies in language learning strategies, implicit and explicit vocabulary teaching, teaching pronunciation, and using technology in language classrooms.
Gonca Subaşı is an assistant professor doctor in the ELT Department at Anadolu University, Turkey. Her research interests are teaching writing skills, vocabulary acquisition, affective factors in language teaching, language testing and evaluation, and language teacher education.

¹ This paper is based on the master’s thesis titled ‘The Effects of Task Induced Involvement Load Hypothesis on Turkish EFL Learners’ Incidental Vocabulary Learning’ of the first author.

Copyright © 2014 by International Online Journal of Education and Teaching (IOJET). ISSN: 2148-225X. Material published and so copyrighted may not be published elsewhere without written permission of IOJET.
Abstract
Recent vocabulary studies have mainly focused on two forms of vocabulary acquisition: incidental and intentional. For incidental vocabulary acquisition, the Task-induced Involvement Load Hypothesis (TILH) was put forward by Hulstijn and Laufer (2001) to evaluate vocabulary tasks by comparing their levels of involvement load. To test this hypothesis, the current study utilized six vocabulary tasks with varying levels of involvement load. In addition, to investigate the task type effect, each task was compared with a task from the other task type group. This last part of the study was designed specifically to test the task type effect, which the hypothesis neglects because it holds that only involvement load levels affect the results. The findings showed that different involvement load levels yielded varying results, most of which supported the hypothesis. However, the task type comparisons did not provide evidence in favour of the hypothesis, as tasks that shared the same involvement load index did not lead to similar results. The study concludes with some pedagogical implications and suggestions for further studies.

Keywords: incidental vocabulary, Task-induced Involvement Load Hypothesis (TILH), Turkish EFL prep students, vocabulary task type effect

1. Introduction
Vocabulary is a need for all language learners and one of the biggest challenges they face in academic settings. Wilkins (1972) drew attention to the significance of vocabulary by stating that a student may convey little without grammar but nothing without vocabulary.
As Folse (2006) suggests, a large vocabulary is necessary for successful performance in all four skills, including reading. Each student needs to achieve a quality vocabulary in the reading curriculum. One of the reasons that students have difficulty in reading is that they do not have a functional vocabulary for reading. Thus, in EFL teaching and academic studies, the major goal should be enriching and developing learners’ vocabulary and finding the most efficient ways to do so. Vocabulary teaching techniques play a central role in vocabulary instruction, as they open the door to vocabulary acquisition. Incidental learning, intentional learning, implicit learning, and explicit learning are some of the best-known and most widely used forms of vocabulary learning.

Following the use of incidental learning in vocabulary teaching, several theories and hypotheses were put forward. One of the most important was the Depth of Processing Theory proposed by Craik and Lockhart in 1972, which was criticized for lacking a clear definition of the level of processing. In response to these deficiencies, the Task-induced Involvement Load Hypothesis (TILH) was proposed by Laufer and Hulstijn (2001) to provide a more observable and measurable definition. Because the TILH studies in the literature (Hulstijn, Hollander, and Greidanus, 1996; Wesche and Paribakht, 2000; Hulstijn and Laufer, 2001; Rott, Williams, and Cameron, 2002; Folse, 2006; Sonbul and Schmitt, 2009; Rassaei, 2015; Karalık, 2016; Hazrat, 2020; Teng and Zhang, 2021; Ehsani and Karami, 2022; and Çekiç, 2022) and the hypothesis itself point to some valuable benefits, the current study was designed within the framework of TILH in an attempt to shed light on the effects of TILH on EFL prep students’ incidental vocabulary learning.

2. Review of Literature
2.1. Task Induced Involvement Load Hypothesis
TILH is used to determine the involvement load of vocabulary tasks. As TILH suggests that a higher involvement load leads to higher vocabulary gain (VG) and vocabulary retention (VR), involvement loads might be taken into consideration while designing vocabulary tasks. Teaching vocabulary incidentally within the TILH framework might therefore offer some benefits to language teachers, including saving time by combining activities, such as teaching vocabulary while reading.

TILH has three components: need, search, and evaluation. Each component has three degrees: absent (marked as 0 or -), moderate (marked as 1 or +), and strong (marked as 2 or ++). The moderate and strong degrees of the need component depend on the type of the students’ motivation: extrinsic or intrinsic. As cognitive dimensions, search and evaluation are contingent upon the form-meaning relationship (Hulstijn and Laufer, 2001). The search component is marked according to whether the meaning is provided to students or students find it themselves. The evaluation component concerns assessing which of a word’s meanings is appropriate, given its other meanings and the context in which it appears.

In incidental vocabulary learning, TILH holds a crucially important place. Because classroom time is limited, the advantages of incidental vocabulary learning have been pointed out, and TILH provides the chance of teaching vocabulary through receptive skills, so that vocabulary learning occurs naturally and incidentally for learners. This study aimed to turn these benefits to advantage in language classrooms, so that reading and vocabulary activities may be organised or designed accordingly.
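The scoring scheme for the three components can be illustrated with a short sketch. It assumes that a task’s involvement load index is the sum of its three component degrees, which matches the totals reported in Table 1 of this study. The component values below are taken from Table 1; the `Task` class and the `involvement_index` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Degrees of each component: 0 = absent, 1 = moderate, 2 = strong.
VALID_DEGREES = (0, 1, 2)


@dataclass(frozen=True)
class Task:
    """A vocabulary task described by its three TILH components (illustrative)."""
    name: str
    need: int
    search: int
    evaluation: int


def involvement_index(task: Task) -> int:
    """Return the involvement load index as the sum of the component degrees."""
    for degree in (task.need, task.search, task.evaluation):
        if degree not in VALID_DEGREES:
            raise ValueError(f"invalid degree {degree} for task {task.name!r}")
    return task.need + task.search + task.evaluation


# Component degrees as listed in Table 1 of this study.
TASKS = [
    Task("True/False", need=1, search=0, evaluation=0),
    Task("Matching", need=1, search=0, evaluation=1),
    Task("Multiple Choice", need=1, search=1, evaluation=1),
    Task("Short Response", need=1, search=0, evaluation=0),
    Task("Fill in the Blanks", need=1, search=0, evaluation=1),
    Task("Sentence Writing", need=1, search=0, evaluation=2),
]

for task in TASKS:
    print(f"{task.name}: {involvement_index(task)}")
```

Note that two tasks can reach the same index through different component profiles: Multiple Choice and Sentence Writing both total 3, but only the former involves search and only the latter involves strong evaluation.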
TILH is built on two assumptions: (1) the presence or absence of the components of need, search, and evaluation determines the level of retention of incidentally acquired words; (2) words with a higher TILL (Task-induced Involvement Load Level) are retained better than words with a lower TILL. Thus, studies have concluded that TILH should be taken into account while preparing incidental vocabulary tasks in a specific context. For all these reasons, the current study was conducted to test the effects of TILH on Turkish EFL learners’ incidental vocabulary learning.

2.2. Empirical Studies on TILH
In the first research study on TILH, Hulstijn and Laufer (2001) tested their hypothesis by comparing three tasks. The study was conducted with 186 students in two countries. Its aim was to test the hypothesis’s assumption that tasks with higher involvement loads lead to higher VG and VR. The tasks had different involvement load indexes and different levels for each component (need, search, and evaluation). The effect of involvement load on the retention of ten English words was investigated. The results showed that the tasks with higher involvement load led to better incidental vocabulary learning and were therefore compatible with what the hypothesis put forward.

Beal (2007) conducted a study using tasks with varying TILLs to test TILH. A short story was used as the reading text, with some unfamiliar words selected for the students and presented under four conditions: low (glossary provided), moderate (multiple-choice glossary), high (dictionary-based sentence writing), and control (reading only). The findings supported what TILH suggests. Keating (2008) also tested whether VG and VR were contingent upon the tasks’ involvement load index, as claimed by Laufer and Hulstijn (2001).
Seventy-nine beginner level students participated in the study and completed three tasks with varying levels of involvement load (mental effort): reading comprehension only (no mental effort), providing the target words (TWs) along with reading comprehension (moderate effort), and formulating original sentences (strong effort). In line with what TILH suggests, the task with the highest involvement load led to higher retention than the others, and the moderate level task led to higher retention than the lowest level task. Kim (2008) conducted a similar study comparing two tasks with the same TILL; that study examined only whether the tasks led to a similar amount of VG. The results showed that two tasks with the same TILL may lead to a similar amount of incidental vocabulary learning.

On the other hand, Zou (2017) compared tasks with the same TILL and argued that the evaluation component should be given an additional degree. Although the two tasks (sentence writing and composition writing) appeared to have the same level, Zou proposed that the evaluation component be reconsidered and given a further degree: very strong evaluation. Hazrat (2020), in her study with ten groups of intermediate level learners, also concluded that the evaluation factor needs four degrees rather than three, given its effectiveness for vocabulary learning. Hazrat (2020) further stated that the search component should not have a predetermined degree of prominence and needs to be evaluated based on its relationship with the type of evaluation component with which it is combined in the vocabulary task. Teng and Zhang (2021) investigated the effects of four tasks with different TILLs (reading; reading + gap-fill; reading + writing; and reading + writing with the use of a digital dictionary) and supported TILH, concluding that the tasks with higher TILLs led to higher VG and VR than the others.
Çekiç (2022) compared three conditions (traditional gloss, multiple-choice gloss, and no gloss) to find out the effects of glosses on incidental vocabulary learning. Although both gloss groups outscored the no gloss group, no significant difference was found between the traditional gloss and multiple-choice gloss groups. However, the multiple-choice group was expected to get higher scores than the traditional gloss group, as its members were required to make an appropriate choice: the multiple-choice condition involved the evaluation component, which was not present in the traditional gloss condition. Çekiç (2022) concluded that the findings of the study seem to contradict TILH.

The studies in the literature on incidental vocabulary learning and TILH have mainly utilised a variety of vocabulary tasks with different involvement load levels. The main aim of these studies was to test TILH alone, without adding any new dimension to this area. However, TILH relies on only one factor to determine the effectiveness of vocabulary tasks on vocabulary acquisition, and more studies need to be conducted to uncover any other possible factors. In the literature, there are some TILH studies (Yaqubi, Rayati, and Allemzade Gorgi, 2012; Sarani, Mousapour Negari and Ghaviniat, 2013; Pourakbari and Biria, 2015; Jones-Mensah, Tabiri, Fenyi, Kongo and Amexo) which took the task type effect into account. However, these studies were all conducted in other countries.

In the Turkish context, empirical studies on TILH are limited. Sarbazi (2014) conducted a study in Iran with 30 Turkish EFL learners. He designed three tasks, each of which had a different TILL. As the other purpose of the study was to compare the results across gender, each task was assigned the same number of students of each gender.
Two-way ANOVA was used for statistical analysis, and the results were consistent with what the hypothesis suggested. No interaction between TILH and gender was found in the study. In another study, Karalık (2016) compared 139 Turkish ELT students from eight intact groups at a Turkish state university across four tasks (fill-in by searching, TILL: three; fill-in with glossary, TILL: two; retelling by searching, TILL: four; and retelling with glossary, TILL: three). The researcher examined whether the tasks with higher TILLs yielded higher VG and VR. Another aim of the study was to test whether tasks with the same TILL but different contributions from the components led to the same results. The results suggested that the tasks with higher involvement loads yielded better results in the post-tests. On the delayed post-test, the only significant differences were found between the retelling by searching and fill-in groups. The results provided partial support for TILH.

Zou (2017) conducted a study to check the effects of TILH on students’ incidental vocabulary learning. For this purpose, 147 participants were randomly assigned to three groups (cloze exercises, sentence writing, and composition writing). The involvement load of the first task was lower than that of the other two tasks, which had the same involvement load. The results showed that the lower involvement load yielded less vocabulary learning. However, contrary to TILH, there was a statistically significant difference between the sentence writing and composition writing groups, although TILH holds that two tasks with the same involvement load are expected to result in similar VG and VR. Based on these results, Zou (2017) claimed that tasks like composition writing, which need deeper processing and more involvement, should be given another degree for the evaluation component. For example, composition writing might have “very strong” evaluation instead of the “strong” evaluation proposed by TILH.
As seen in the studies above, TILH studies in the Turkish context investigated the effects of TILH on VG and VR only by comparing vocabulary tasks with varying levels of involvement load. However, some studies in the literature (Yaqubi et al., 2012; Sarani et al., 2013; and Pourakbari and Biria, 2015) added a new dimension to TILH studies by using input-output or receptive-productive vocabulary tasks and testing the effect of task type, which has been neglected in TILH studies in the Turkish context.

Laufer and Hulstijn (2001) held that a task’s efficacy is determined only by its TILL. In other words, receptive and productive tasks yield equal vocabulary learning as long as they share the same TILL. However, as a suggestion for further research, they also noted that studies may be designed to see whether there is any difference between receptive and productive tasks which are equal in involvement load. Yaqubi et al. (2012) concluded that task type, that is, whether a task was an input or an output task, had a crucial effect on incidental vocabulary learning. When the input tasks were compared with each other, the task with the higher involvement load was found to lead to higher VG and VR. The hypothesis claims that tasks with the same involvement load index yield similar results in VG and VR. Sarani et al. (2013) conducted a study to examine the task type effect on TILH through reading. To this end, three receptive and three productive tasks were designed for six groups. Two pairs with the same involvement load (e.g. the tasks with an involvement load of 1: true-false, a receptive task, and short response, a productive task) gave results contrary to TILH. Pourakbari and Biria (2015) designed a study with three receptive and three productive tasks with different TILLs. Their last research question asked whether task type would make any difference in incidental VG and VR.
Task type influenced TILH, as productive tasks were found to be more beneficial for incidental VG and VR.

In conclusion, previous research findings have shown that TILH has a crucial place in incidental vocabulary teaching. However, TILH is limited to involvement load levels and does not take any other factor into consideration. According to TILH, no particular task type (input or output, receptive or productive) makes a difference or is more effective than the other, as the only factor that affects the efficacy of tasks is how much involvement they require. It is therefore necessary to conduct more studies focusing on tasks with similar levels of involvement but from different task types. To follow Laufer and Hulstijn’s (2001) suggestion to conduct a study examining tasks of different types but with identical involvement loads, this study was designed with three receptive and three productive tasks with different involvement loads. Within each task type, the tasks had involvement loads of 1, 2, and 3, respectively. In addition, each task had a conjugate task with the same load in the other task type. To address this gap in the literature, this study posed the following research questions:

1) On the basis of English receptive vocabulary tasks, will EFL prep learners obtain better gain of lexical items in higher task load conditions compared to lower ones? If so, will the benefits of tasks hold up over time?
2) On the basis of English productive vocabulary tasks, will EFL prep learners obtain better gain of lexical items in higher task load conditions compared to lower ones? If so, will the benefits of tasks hold up over time?
3) On the basis of English receptive and productive vocabulary tasks with the same levels of involvement index, will EFL prep learners obtain the same gain and retention of the lexical items on both types of tasks?

3. Methodology
The aim of the study was to investigate the effects of task-induced involvement load on the incidental vocabulary acquisition of EFL learners through different vocabulary tasks designed within the TILH framework. To this end, different groups of students were assigned to different incidental vocabulary tasks with different involvement loads. A quasi-experimental research design without a control group was chosen, as the study lacked a pre-test but comprised two post-tests (immediate and delayed). Instead of a control group, the study used six different experimental groups to test the effect of various tasks on students’ incidental vocabulary learning. As Creswell (2005) states, researchers generally use intact groups either because of the availability of the participants or because the setting does not allow artificial groups to be created. Similarly, in this study forming groups of students was not feasible. Instead, the classes were taken as intact groups; therefore, random assignment of the subjects was not possible.

3.1. The Setting of the Study and the Participants
The study was conducted at the School of Foreign Languages of a foundation university in one of the cities of Turkey. In Turkey, students who do not pass the English proficiency exam administered by their university at the beginning of the academic year are obliged to attend a one-year Intensive English Programme at the university’s School of Foreign Languages. The participants of the study were 122 Turkish EFL students who were taking an A2 level intensive English course during the study. All these intact classes were experimental groups. Convenience sampling was preferred for the selection of the participants. Hence, all A2 level students were asked to participate in the current study.
Before the implementation, an application was made to the ethics committee of the university regarding informed consent. After that, a consent form was collected from all instructors and the students who were willing to participate in the study.

3.2. Instruments
In this study, four instruments were utilised for research purposes. The students were required to read one text and complete its reading comprehension activities, which were taken directly from a reading skills book. After the text and reading activities, each group was asked to complete one vocabulary task. The vocabulary tasks were designed differently for each group to measure their effect on students’ VG and VR scores within the TILH framework.

Nine target words (TWs) in the text were chosen for the study. The TWs needed to be unknown to the participants; therefore, they were checked by four colleagues. The TWs chosen for the study were forehead, holy, mud, to please, prosperity, to receive, stray, trail, and to worship. The vocabulary activities were designed by the researcher, and all activities were checked by four other instructors at the same school and two professors in the English Language Teaching department of a state university for their appropriateness for the participants and for validity. In order to have a robust design, expert opinion was first gathered on the validity of the instruments utilized in the current study. Then, the readability of the reading text was analysed through the Flesch Kincaid Grade Level to achieve reliability.

The students were asked to complete a vocabulary task which was assigned to each group randomly for the purpose of the current study. The tasks were true/false, matching with definitions, multiple choice, short response, fill in the blanks, and sentence writing. The tasks were categorized into productive and receptive tasks. Table 1 shows the tasks and their total involvement load indexes.

Table 1. Total task induced involvement load levels of vocabulary tasks

Tasks                     Need   Search   Evaluation   Total TILL
Receptive Tasks
  True/False (a)          1      0        0            1
  Matching (b)            1      0        1            2
  Multiple Choice (c)     1      1        1            3
Productive Tasks
  Short Response (d)      1      0        0            1
  Fill in the Blanks (e)  1      0        1            2
  Sentence Writing (f)    1      0        2            3

As seen in Table 1, the participants were divided into six groups, and the groups were categorized according to their task type as receptive and productive. The receptive and productive task type groups each had three different vocabulary tasks. The TILLs of the tasks were arranged to show whether tasks with the same TILL but of a different task type had different effects on incidental vocabulary learning. In other words, the tasks with the same involvement load but from different task types were designed to be compared with each other to reveal any possible task type effect on incidental learning, which is not addressed in TILH. The questions in the vocabulary tasks included the TWs, so the students needed to know the meanings of the TWs in order to give the right answers. The design allowed the researcher to compare each task with the other tasks of its own task type and with its conjugate task from the other task type group, which had the same TILL. The tasks sharing the same TILL were also allocated the same amount of time. While completing the task, the students were encouraged to use the glossary provided at the end of the text.

A modified version of the VKS (Vocabulary Knowledge Scale) used in Hulstijn and Laufer (2001) was preferred in both the immediate and delayed post-tests to measure VG and VR. The self-reported VKS consists of four items and is shown in Figure 1 below.

Target word: ____________

Items                                                               Score
I can’t recall having seen this word before.                        0
I have seen this word before, but I can’t remember what it means.   1
I have seen this word before, and I think it means: _________       2
I can use this word in a sentence: ________________________         3

Figure 1. The self-reported modified VKS

To score this modified VKS, the participants did not receive any points when they marked that they did not remember the word; one point was awarded when only the form of the TW was recalled; the students received two points when they provided the Turkish equivalents or English definitions of the TWs; and the students who generated a sentence using the TWs received three points.

The immediate post-test was administered immediately after the students finished reading the passage and completed the reading comprehension questions and their vocabulary tasks. The delayed post-test was administered three weeks later. The delayed post-test was the same as the immediate post-test. The only difference was the order of appearance of the TWs, which was changed to prevent the students from remembering them in that order and giving their answers accordingly.

3.3. Analysis of the Data
Two vocabulary tests were used to compare the effects of the two task types. The students’ scores on the immediate post-test were compared with each other to measure their immediate VG. On the immediate post-test, the students were asked to provide Turkish equivalents or English synonyms/definitions of the TWs, or to generate a meaningful sentence using the TWs. There were four options for each TW, and the students were to put a tick next to only one of them. The scores of the options were 0, 1, 2, and 3, respectively. The same procedure was applied for the delayed post-test, which was administered without prior notice three weeks later. The students’ scores on the delayed post-test were compared to test VR. The data were analysed using SPSS 22.

For the research purposes of the current study, three research questions were posed with different purposes and a different design from many of the TILH studies in the literature.
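The VKS scoring procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the instrument used in the study (the data were analysed in SPSS 22). The function name `score_vks` and the sample responses are hypothetical, and the sketch assumes that each student’s response to each of the nine TWs has already been coded as one option from 0 to 3 (for options 2 and 3, after a rater has accepted the meaning or sentence supplied).

```python
# Target words used in the study (Section 3.2).
TARGET_WORDS = ["forehead", "holy", "mud", "to please", "prosperity",
                "to receive", "stray", "trail", "to worship"]

# Points awarded for the four VKS options, in the order shown in Figure 1.
OPTION_POINTS = (0, 1, 2, 3)


def score_vks(responses: dict[str, int]) -> int:
    """Sum the VKS points for one student.

    `responses` maps each target word to the index (0-3) of the single
    option coded for that word.
    """
    missing = set(TARGET_WORDS) - set(responses)
    if missing:
        raise ValueError(f"no response for: {sorted(missing)}")
    total = 0
    for word in TARGET_WORDS:
        option = responses[word]
        if option not in range(len(OPTION_POINTS)):
            raise ValueError(f"invalid option {option} for {word!r}")
        total += OPTION_POINTS[option]
    return total


# Hypothetical student: a sentence for three words, a meaning for four,
# and form recognition only for the remaining two.
sample = dict(zip(TARGET_WORDS, [3, 3, 3, 2, 2, 2, 2, 1, 1]))
print(score_vks(sample))  # 3*3 + 4*2 + 2*1 = 19
```

With nine TWs scored from 0 to 3, a student’s total on either post-test ranges from 0 to 27.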
A summary of the design of the current study is presented in Table 2.

Table 2. A summary of the research questions and the design

Research question 1: On the basis of English receptive vocabulary tasks, will EFL prep learners obtain better gain of lexical items in higher task load conditions compared to lower ones? If so, will the benefits of tasks hold up over time?
  Method of analysis: one-way ANOVA and paired t-test
  Purpose of the research question: to test the TILH among the receptive vocabulary tasks and to find out the effect of time interval

Research question 2: On the basis of English productive vocabulary tasks, will EFL prep learners obtain better gain of lexical items in higher task load conditions compared to lower ones? If so, will the benefits of tasks hold up over time?
  Method of analysis: one-way ANOVA and paired t-test
  Purpose of the research question: to test the TILH among the productive vocabulary tasks and to find out the effect of time interval

Research question 3: On the basis of English receptive and productive vocabulary tasks with the same levels of involvement index, will EFL prep learners obtain the same gain and retention of the lexical items on both types of tasks?
  Method of analysis: independent samples t-test
  Purpose of the research question: to test any possible task type effect between the vocabulary tasks with the same TILL, which was neglected in the TILH

4. Results and Discussion
To serve the purpose of the current study, three research questions were posed. They addressed the vocabulary gain and vocabulary retention of Turkish EFL learners through incidental vocabulary learning within the construct of TILH and vocabulary task types. Six different vocabulary tasks from two different task types were utilised to see the effect of TILL on EFL learners’ incidental vocabulary acquisition. A reading text was chosen to operationalise different indexes of involvement load.
Before analysing the data, the distributions of the scores of the six groups on both the immediate and delayed post-tests were examined. The results of the normality tests and the skewness and kurtosis values showed that the scores were normally distributed. Therefore, parametric analyses (one-way ANOVA, independent samples t-tests, and paired t-tests) were utilized for the data analysis of the current study. For the first two research questions, a comparison was made in order to find out the most effective vocabulary tasks in each task type. To answer the last research question, three different comparisons were made, and each task was compared with its conjugate task, which shared the same TILL, from the other task type group.

4.1. Tasks with Different Involvement Load Levels
In this section, comparisons are made between tasks with different involvement loads. Each task is compared with the other tasks of its own task type on the immediate and delayed post-tests.

The scores of all receptive and productive task groups are presented in Table 3. Table 3 is used to compare the mean scores of all groups, and also to compare the highest and the lowest scores of the groups.

Table 3. A summary of mean scores of all groups

                                 Immediate   Delayed
R1 (True/False)                  12.60       8.85
R2 (Matching with Definitions)   18.05       12.55
R3 (Multiple Choice)             9.42        7.63
P1 (Short response)              15.35       10.04
P2 (Fill-in)                     14.89       11.00
P3 (Sentence Writing)            15.33       11.43

4.1.1. Receptive Vocabulary Tasks
The results on both the immediate and delayed post-tests provided partial support for TILH. The higher TILLs yielded better VG and VR in most of the statistical analyses.

Table 4. Immediate vocabulary gain scores of receptive task groups

                                 N    M       SD      Min.   Max.
R1 (True/False)                  20   12.60   4.096   5      19
R2 (Matching with Definitions)   20   18.05   4.850   11     26
R3 (Multiple Choice)             19   9.42    3.820   3      16

Table 5. Vocabulary retention scores of receptive task groups

                                 N    M       SD      Min.   Max.
R1 (True/False)                  20   8.85    2.834   3      14
R2 (Matching with Definitions)   20   12.55   3.395   7      20
R3 (Multiple Choice)             19   7.63    2.985   3      14

As seen in Table 4 and Table 5, on both the immediate and delayed post-tests of the receptive groups, contrary to TILH, R2 with a TILL of 2 received the highest scores, followed by the R1 and R3 groups. To find out the difference between the receptive task groups, one-way ANOVA was conducted for both tests.

Table 6. One-way ANOVA for immediate post-test scores of receptive task groups

                 Sum of Squares   df   Mean Square   F        Sig.   Sig. Difference
Between Groups   746.025          2    373.013       20.312   .000   R1-R2; R2-R3
Within Groups    1028.382         56   18.364
Total            1774.407         58

Table 7. One-way ANOVA for delayed post-test scores of receptive task groups

                 Sum of Squares   df   Mean Square   F        Sig.   Sig. Difference
Between Groups   258.181          2    129.090       13.590   .000   R1-R2; R2-R3
Within Groups    531.921          56   9.499
Total            790.102          58

Table 6 and Table 7 show that the one-way ANOVA results indicated a significant difference among the receptive tasks in terms of both vocabulary gain (F=20.312, p<.05 for the immediate post-test) and vocabulary retention (F=13.590, p<.05 for the delayed post-test). In order to detect which groups differed from each other significantly, post-hoc Tukey tests were conducted for both tests. On the immediate post-test, a significant difference was found between the R1 (M=12.60, SD=4.096) and R2 (M=18.05, SD=4.850) groups and between the R2 (M=18.05, SD=4.850) and R3 (M=9.42, SD=3.820) groups. Likewise, on the delayed post-test, the post-hoc Tukey test showed a significant difference between the R1 (M=8.85, SD=2.834) and R2 (M=12.55, SD=3.395) groups and between the R2 (M=12.55, SD=3.395) and R3 (M=7.63, SD=2.985) groups. No significant difference was found between the R1 and R3 groups on either test.
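As a cross-check on Table 6, the one-way ANOVA for the immediate post-test can be approximately re-derived from the group sizes, means, and standard deviations in Table 4. The sketch below is illustrative only: the analysis in the study was run in SPSS 22, the function name `anova_from_summary` is hypothetical, and small discrepancies from the reported values are expected because the inputs are rounded.

```python
def anova_from_summary(groups):
    """One-way ANOVA computed from (n, mean, sd) triples.

    Returns (ss_between, ss_within, df_between, df_within, f_ratio).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups variation: weighted squared distance of group means
    # from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups variation: each sample variance times its degrees of freedom.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# (N, M, SD) for R1, R2, R3 on the immediate post-test, from Table 4.
receptive_immediate = [(20, 12.60, 4.096), (20, 18.05, 4.850), (19, 9.42, 3.820)]

ss_b, ss_w, df_b, df_w, f = anova_from_summary(receptive_immediate)
print(f"SS between = {ss_b:.1f}, SS within = {ss_w:.1f}")
print(f"df = ({df_b}, {df_w}), F = {f:.2f}")
```

The recomputed sums of squares and F ratio fall within rounding error of the values in Table 6, and the degrees of freedom (2 and 56) match exactly.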
Thus far, the first part of the first research question has been addressed. The rest of the first research question was “if so, will the benefits of tasks hold up over time?” To this end, paired samples t-tests were conducted for each receptive task, comparing its immediate and delayed post-test scores. A significant difference was found between the immediate and delayed post-test scores of every receptive task.

4.1.2. Productive Vocabulary Tasks

The results of the immediate and delayed post-tests were compared for the VG and VR of the productive task groups. All groups gained the meanings of the target words to some extent. However, the results of the productive task groups differed from those of the receptive task groups.

Table 8. Immediate vocabulary gain scores of productive task groups

| Group | N | M | SD | Min. | Max. |
|---|---|---|---|---|---|
| P1 (Short response) | 23 | 15.35 | 4.018 | 9 | 22 |
| P2 (Fill-in) | 19 | 14.89 | 5.363 | 0 | 23 |
| P3 (Sentence Writing) | 21 | 15.33 | 5.228 | 3 | 26 |

As seen in Table 8, on the immediate post-test the P1 group obtained the highest scores, followed by the P3 and P2 groups, respectively. As Table 10 shows, the results on the delayed post-test differed from the results of the productive groups' immediate post-test.

Table 9. One-way ANOVA for immediate post-test scores of productive task groups

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 2.644 | 2 | 1.322 | .056 | .946 |
| Within Groups | 1419.674 | 60 | 23.661 | | |
| Total | 1422.317 | 62 | | | |

Table 9 shows that the one-way ANOVA did not reveal a significant difference between the groups (F=.056, p>.05). As the result of the one-way ANOVA was not significant, a post-hoc test was not conducted.

Table 10. Vocabulary retention scores of productive task groups

| Group | N | M | SD | Min. | Max. |
|---|---|---|---|---|---|
| P1 (Short response) | 23 | 10.04 | 2.585 | 2 | 13 |
| P2 (Fill-in) | 19 | 11.00 | 4.256 | 3 | 22 |
| P3 (Sentence Writing) | 21 | 11.43 | 3.340 | 3 | 17 |

To find out the long-term effect of TILL, a similar statistical analysis was carried out on the delayed post-test data to examine the differences between the groups. The delayed post-test results in Table 10 show that the ordering of the group means was in line with TILH: the highest scoring group was P3, followed by P2 and P1. These results showed that, among the productive vocabulary tasks, although P1 provided the highest scores on the immediate post-test, the time interval affected it negatively, and on the delayed post-test the P1 group was the lowest scoring group.

Table 11. One-way ANOVA for delayed post-test scores of productive task groups

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 22.218 | 2 | 11.109 | .958 | .390 |
| Within Groups | 696.099 | 60 | 11.602 | | |
| Total | 718.317 | 62 | | | |

As Table 11 shows, the one-way ANOVA did not yield a significant difference between the groups (F=.958, p>.05). As the result of the one-way ANOVA was not significant, a post-hoc test was not conducted.

The rest of the first research question was “if so, will the benefits of tasks hold up over time?” For this aim, three paired samples t-tests were conducted to compare the immediate and delayed post-test scores of all productive tasks. A significant difference was found between the immediate and delayed post-test scores of every productive task.

4.2. Tasks with the Same Involvement Load Levels

The third purpose of the current study was to investigate whether different tasks with the same TILL from different task types would lead to similar results in VG and VR, which distinguishes the current study from other TILH studies in the literature. In order to attain this purpose, each task was compared to its conjugate task from the other task type on the immediate and delayed vocabulary post-tests.
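The pairing of conjugate tasks described above can be illustrated with simple arithmetic on the group means in Table 3. The Python sketch below is a descriptive illustration only: the `pairs` dictionary and the `mean_differences` function are hypothetical names introduced here, the TILL values 1, 2, and 3 follow the pairing R1-P1, R2-P2, and R3-P3 used in the study, and the output says nothing about statistical significance, which the study assessed with independent samples t-tests.

```python
# Group means from Table 3, stored as (label, immediate mean, delayed mean).
# Keys are the TILL shared by each receptive/productive pair.
pairs = {
    1: {"receptive": ("R1", 12.60, 8.85), "productive": ("P1", 15.35, 10.04)},
    2: {"receptive": ("R2", 18.05, 12.55), "productive": ("P2", 14.89, 11.00)},
    3: {"receptive": ("R3", 9.42, 7.63), "productive": ("P3", 15.33, 11.43)},
}


def mean_differences(task_pairs):
    """Return productive-minus-receptive mean differences for each TILL."""
    result = {}
    for till, pair in task_pairs.items():
        _, r_immediate, r_delayed = pair["receptive"]
        _, p_immediate, p_delayed = pair["productive"]
        result[till] = {
            "immediate": p_immediate - r_immediate,
            "delayed": p_delayed - r_delayed,
        }
    return result


differences = mean_differences(pairs)
for till, diff in differences.items():
    # A positive difference means the productive task had the higher mean.
    higher = "productive" if diff["immediate"] > 0 else "receptive"
    print(
        f"TILL {till}: immediate {diff['immediate']:+.2f}, "
        f"delayed {diff['delayed']:+.2f} (higher mean: {higher})"
    )
```

If TILL alone determined the outcome, these differences would be expected to be close to zero for every pair, so the sign and size of each difference give a quick descriptive view of a possible task type effect.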
For these comparisons, independent samples t-tests were conducted: the two tasks sharing the same TILL from the two task types were compared in terms of VG and VR.

Table 12. Comparison of immediate vocabulary gain scores of groups with their conjugate tasks

| Receptive task | M | SD | Productive task | M | SD |
|---|---|---|---|---|---|
| R1 (True/False) | 12.60 | 4.096 | P1 (Short response) | 15.35 | 4.018 |
| R2 (Matching with Definitions) | 18.05 | 4.850 | P2 (Fill-in) | 14.89 | 5.363 |
| R3 (Multiple Choice) | 9.42 | 3.820 | P3 (Sentence Writing) | 15.33 | 5.228 |

Table 13. Comparison of vocabulary retention scores of groups with their conjugate tasks

| Receptive task | M | SD | Productive task | M | SD |
|---|---|---|---|---|---|
| R1 (True/False) | 8.85 | 2.834 | P1 (Short response) | 10.04 | 2.585 |
| R2 (Matching with Definitions) | 12.55 | 3.395 | P2 (Fill-in) | 11.00 | 4.256 |
| R3 (Multiple Choice) | 7.63 | 2.985 | P3 (Sentence Writing) | 11.43 | 3.340 |

As seen in Table 12 and Table 13, the P1 group (M=15.35, SD=4.018) had higher scores than the R1 group (M=12.60, SD=4.096), and the P3 group (M=15.33, SD=5.228) differed substantially from the R3 group (M=9.42, SD=3.820); the same direction of difference was observed on the delayed post-test. The results of the independent samples t-tests indicated significant differences on the immediate post-test scores. Although the P1 group outperformed the R1 group and the P3 group had higher scores than the R3 group on the delayed post-test in terms of VR, the differences between the delayed post-test scores were not significant.

For the second pair, the direction was reversed: the R2 group (M=18.05, SD=4.850) scored higher than the P2 group (M=14.89, SD=5.363) on the immediate post-test, and the independent samples t-test results indicated a significant difference between these two groups. A similar result was observed for the delayed post-test scores of the P2 (M=11.00, SD=4.256) and R2 (M=12.55, SD=3.395) groups: even though the students who completed R2 outperformed the students who completed P2, the difference between the two groups in terms of retention was not significant.

4.3. Discussion of the Findings

Dividing the tasks into productive and receptive tasks made it possible to compare each task within its own task type, as in research questions one and two. It also made it possible to compare two tasks sharing the same TILL from the two different task types in order to find out the task type effect, as in research question three. The receptive tasks required the participants to recognize the form and meaning of the TWs and choose the correct answer by matching words with definitions, deciding whether statements were true or false, and choosing meanings in multiple-choice questions. The productive tasks, however, required the participants to produce language by writing a few words to answer questions, filling in the blanks of a text, and writing a meaningful sentence.

The findings of both post-tests implied that involvement load level had an effect on the participants’ incidental vocabulary learning to some extent. Not all comparisons yielded the results predicted by TILH; however, most of the comparisons were in line with it. One possible reason for not obtaining the same results as other TILH studies in the literature is that the students might not have taken the tasks seriously, as they were informed that they would not receive any score for these tasks. Another reason might be related to differences between the classes. Although all the participants were at A2 level during the current study, there were some differences between the classes. The vocabulary tasks were assigned randomly; therefore, the results might have been affected by these language level differences. Time limitation was another factor: as each quarter at the School of Foreign Languages lasted 8 weeks, the implementation, which included the tasks, the immediate post-test, and a delayed post-test three weeks later, was also affected by this restriction.

4.3.1. The effects of tasks with different TILLs

The statistical analysis on both of the post-tests of the receptive task groups showed that the groups differed from each other. According to TILH, R3 was expected to get the highest scores, and R2 was supposed to yield better results than R1. However, the highest scores belonged to the R2, R1, and R3 task groups, respectively.

The findings of the statistical analyses conducted for the productive task groups indicated that the mean scores of the productive groups differed from each other on both post-tests, although the one-way ANOVA results were not significant. As TILH suggested, the highest scores should have belonged to the P3 group and the lowest scores to the P1 group. However, the results of the immediate post-test showed that the P1 group outscored the other groups and that the lowest scores were obtained by the P2 group. The delayed post-test results, in contrast, provided full support for TILH, with the highest scores obtained by the P3, P2, and P1 groups, respectively.

The answers to the first research question provided partial support for TILH, contrary to similar research studies in the literature (Sarani et al., 2013; Pourakbari and Biria, 2015). The current study did not provide full support for TILH on both post-tests. Kim (2008) also concluded with partial support for TILH, as the task with the highest TILL did better on the post-test; however, the task with the moderate level of involvement load was not found to be superior to the task with the lowest TILL. In the current study, the receptive task with the highest TILL (R3) provided the lowest scores on both post-tests, and this might be because adding a search component to a task might not provide the expected results.

The second part of the first research question was related to the effect of the time interval. The results showed that the scores decreased to some extent when the participants’ scores on the two post-tests were compared.
Therefore, it could be stated that the three-week time interval affected the scores of the participants of the current study negatively, as in Behbahani, Pourdana, Maleki, and Javanbakht (2011), Arpaci (2016), and Ehsani and Karami (2022).

On the other hand, the answers to the second research question provided two different results. While the results of the immediate post-test provided partial support for TILH, the results of the delayed post-test provided full support, which was obtained only from the comparison of the productive task groups’ post-test scores. Similarly, Folse (2006) and Walsh (2009) did not obtain results in line with TILH. As in Walsh (2009), the one-way ANOVA in the current study did not reveal any significant difference, and the mean scores of the groups were very similar. The results of the delayed post-test score comparisons, which examined the effects of TILH on word retention, provided full support for TILH, like many studies in the literature (Hulstijn and Laufer, 2001; Beal, 2007; Keating, 2008; Kim, 2008; Eckerth and Tavakoli, 2012; Mármol and Sánchez-Lafuente, 2013), all of which concluded that tasks with higher involvement loads led to higher VG and VR. In the current study, the sentence writing group, which had the highest TILL, outscored the other two groups, namely short response and fill-in, on the delayed post-test. As Behbahani et al. (2011) put forward in their study, it is not surprising that students do better on the immediate post-test and that their scores then decrease on the delayed post-test. This situation could be associated with the negative effect of the time interval between the two post-tests. Hence, the scores of the participants of the current study might have been affected negatively by the three-week time interval.

4.3.2. The effects of tasks with the same TILLs

An attempt was made to find out whether the tasks with the same TILLs would yield similar results or not.
To this end, the third research question of the current study was posed to find out any possible task type effect on students’ VG and VR. The tasks were matched with their conjugate tasks, and the two tasks in each pair were compared on both the immediate and delayed post-tests.

The comparison of the first pair (P1 and R1) on the immediate and delayed post-tests showed that the productive task led to higher VG and VR. The findings are in line with those of Ellis and He (1999), who suggested that students remember productive tasks better than non-productive tasks.

As the second pair, the P2 and R2 groups were compared. Contrary to the suggestions of TILH, these two groups did not have similar results on the post-tests. Some studies in the literature provide support for this finding. Laufer (2003) revealed that the sentence completion group (TILL: 3) had higher scores on the tests than the sentence writing group (TILL: 3). In Esfahani’s (2012) study, the productive group first outperformed the other group in the writing test, and then the receptive group did better in the reading comprehension test. As Webb (2005) suggested, most of the vocabulary tasks in a classroom setting are receptive tasks. Hence, the students in the P2 and R2 groups might be more familiar with receptive tasks, which might explain why the R2 group had higher scores on both post-tests. Folse (2006) compared three receptive tasks (cloze exercises) with one productive task (sentence writing). The results showed that the receptive task groups outperformed the productive task group.

In contrast, some other studies provided counterevidence for the superiority of receptive tasks. Laufer and Rozovski-Roitblat (2011) advocated that most of the linguistic resources should be used for productive tasks. Webb (2009) concluded that the students assigned to productive tasks did better on the tests than the students assigned to the receptive group. As in these studies, in the present study the P3 group obtained higher scores than the R3 group in the comparison of the last pair.

To sum up, the tasks sharing the same involvement load did not lead to similar results in any of the pairs. The findings of the study supported the findings of Yaqubi et al. (2012), who suggested that, in addition to the involvement index, task type (receptive or productive) has a crucial role in the incidental vocabulary learning of EFL learners. Therefore, taking the task type effect into consideration along with TILH while designing vocabulary tasks might provide useful insights for scholars and language teachers.

5. Conclusion

For the current study, six different vocabulary tasks with varying total involvement load indexes were designed in order to find out the effects of the Task-induced Involvement Load Hypothesis on the incidental vocabulary acquisition of 122 EFL prep students at a private university. A reading text with its nine target words was utilised to test the participants’ incidental VG and VR. The text was accompanied first by two different reading comprehension activities, and then each group was given a vocabulary task which was specifically designed for that group. To measure VG and VR, unannounced immediate and delayed post-tests were conducted. The scores that the participants obtained on these two post-tests were analysed to find out the effects of TILH on the participants’ incidental vocabulary acquisition.

In order to answer the first research question, which asked whether three receptive tasks with varying levels of involvement load had any effects on students’ VG and VR, the scores obtained on the immediate and delayed post-tests were compared, and it was found that the target words were remembered by most of the participants on both post-tests.
Although the results of the two post-tests for the receptive tasks were similar to each other, these results did not support TILH completely. Increasing the total involvement load index did not bring about the results anticipated by the hypothesis, as can be seen for the multiple-choice group, which was supposed to outscore the other two groups. Although the ordering of the two lowest scoring groups was not as expected (R2>R1>R3), the difference between the R1 and R3 groups was found to be insignificant on both post-tests. This showed that increasing the involvement load levels of all vocabulary tasks might not provide the desired results, and some tasks might be affected by other factors. In order to explore this in detail, more receptive vocabulary tasks with varying TILLs might be compared to each other.

Similar to research question one, the second research question aimed at finding out whether three productive vocabulary tasks with different total involvement load indexes had any effects on the participants’ VG and VR on the post-tests. The hypothesis predicted that, among these three productive tasks, the highest scores should have belonged to P3, the next highest to P2, and the lowest to P1. Contrary to the findings of the immediate post-test, the results of the delayed post-test comparisons supported the hypothesis fully (P3>P2>P1). Although the P1 group received the highest scores on the immediate post-test, it was the group that obtained the lowest scores on the delayed post-test. This might indicate that providing short responses to questions, as in the P1 group, might help students keep the words in their short-term memory but does not help them retain the words in the long term.

The effect of the time interval for both receptive and productive vocabulary tasks was also investigated as a part of research questions one and two. It was found that for both groups the three-week time interval affected the scores negatively. However, this was an expected result, as the students did not receive any treatment or language education related to these TWs.

The third research question aimed to find out any possible significant difference between the groups that shared the same level of involvement load. To this end, three pairs were compared on VG and VR. All comparisons yielded a significant difference between the groups in each pair on the immediate post-test. However, the differences between the scores of the groups in each pair (P1-R1; P2-R2; P3-R3) on the delayed post-test were found to be insignificant. This might be because the two members of each pair were affected by the time interval to different degrees. In the pairs sharing a TILL of 1 and a TILL of 3, the productive vocabulary tasks outperformed the receptive vocabulary tasks. However, the comparison between the tasks sharing a TILL of 2 showed that the receptive task group (R2) did better than the productive task group (P2). Although some studies in the literature, like Ellis and He (1999), provided results in support of productive tasks’ superiority, the second pair (P2-R2) provided counterevidence in the present study. In fact, the findings might change according to not only the task type but also other factors, because Esfahani (2012) also first obtained results in favour of productive tasks and then counterevidence to productive tasks. It can be concluded that productive tasks’ superiority over receptive tasks might be found in most of the comparisons. However, it would be a good idea to take other factors, such as task features and requirements, into consideration so as not to overgeneralize the results.
Additionally, Ehsani and Karami (2022) came to the conclusion that Technique Feature Analysis (TFA) is a more powerful predictor of incidental vocabulary learning than TILH, as TILH has many shortcomings that are compensated for by the TFA model.

5.1. Implications

In an attempt to test TILH, six vocabulary tasks with different TILLs were designed. These tasks were categorized into two groups, receptive and productive, both to compare them within their own task type and to compare each task to its conjugate task, which shares the same involvement load level, in the other task type group. Unlike other TILH studies in the literature, the current study aimed at adding a new dimension to the hypothesis by taking the effects of task type into consideration. Hence, the findings of the study offer some implications for both the TILH literature and classroom practices regarding incidental vocabulary acquisition.

Regardless of task type, any vocabulary task should be designed by taking its involvement load index into consideration, as in most of the comparisons of the current study it was found that the higher TILL led to both higher VG and higher VR. As many studies in the literature, like Yaqubi et al. (2012), Sarani et al. (2013), Pourakbari and Biria (2015), and Karalık (2016), suggested, tasks with higher involvement loads should be selected in order to increase VG and VR.

The present study tested TILH. In addition, it was found that making use of vocabulary tasks for incidental learning also helped draw students’ attention to the target words. Karalık (2016) and Eysenck (1982) put forward that what matters for storing words in memory successfully is not the willingness of the students but how deeply the word is processed at the first encounter. Hence, vocabulary tasks like those of the current study might be helpful for incidental vocabulary learning. As classroom time is too limited to teach everything intentionally, incidental teaching techniques should be preferred.

The reason for not finding results fully in line with TILH in the current study might be that the students are used to doing some specific vocabulary tasks, such as matching with definitions and true/false, as many course books provide mostly these two tasks at the first levels (A1 and A2). Alavinia and Rahimi (2019) advocated that some other factors related to the students, such as attention span, writing skills, and dictionary use, might hinder the effect of TILH. In the context of the current study, the students are always encouraged to use a dictionary. However, no training on choosing the best definition for the context is provided to the students. The participants of the current study have mostly practised short response activities, and they are mostly asked to answer such questions in the exams of their school. Hence, the attention of the students is generally drawn to short response vocabulary tasks. Writing sentences and paragraphs using the target words studied in the reading passages is postponed until B1 level. Therefore, the students do not get used to writing sentences immediately, and it takes more time until they feel comfortable with writing sentences and using the target words in them. As Zou (2017) stated, writing exercises help students more in vocabulary learning than other vocabulary exercises such as cloze exercises, as writing exercises require pre-planning and systematic organization, which are absent in other vocabulary exercises. It would be a good idea to start writing sentences along with vocabulary teaching so that students become more comfortable in producing the target language verbally. Zou (2017) added that, in the reading-based exercises of teaching materials, writing sentences using the target vocabulary should be given the necessary importance, as the students are supposed to use chunking, pre-task planning, and hierarchical organization for writing. As Ehsani and Karami (2022) suggested, the internal structures of the vocabulary tasks lead to different test results. These structures also determine the TILLs of the vocabulary tasks.

5.2. Limitations of the Present Study

For the purpose of the study, more vocabulary tasks might have been designed. The results of the current study can be generalized only to the tasks included here. Each task has its own particular result on different tests. Hence, for long-term retention, the results of the immediate post-test might be taken into consideration.

The study was implemented just once so that the students would not become aware of the upcoming tests and the aim of the study. More implementations of the same design with different reading passages over time might yield different results. However, having students who knew about the upcoming procedure would not be compatible with the nature of incidental vocabulary learning.

The vocabulary test scores were not graded by a second professional. Expert opinion was gathered only for the ambiguous answers. That might have affected some of the results.

The findings of the study are limited to this specific context. Different studies with participants from state universities, different departments and backgrounds, and with different levels of English might yield different results.

5.3. Suggestions for Further Studies

In the current study, the findings showed that both receptive and productive tasks might yield results that differ from what TILH suggested. Hence, the comparisons of the post-test scores might be taken into consideration in order to find out the most useful vocabulary tasks.
Zou (2017), who compared two productive tasks with a TILL of 3, concluded that although sentence writing and composition writing shared the same TILL, the composition writing group outperformed the other. Hence, a new degree of evaluation should be added in new studies. In this study, the productive tasks did not differ from each other much. Therefore, a design like that of Zou (2017) might be preferred in further studies.

Unlike TILH, the current study came to the conclusion that in all of the comparisons one task type had superiority over the other, whereas TILH suggested that tasks sharing equal involvement load levels yield similar results. Therefore, further studies might utilize productive tasks more than receptive tasks.

In the current study, only one delayed post-test was conducted, three weeks after the implementation. Another delayed post-test might be conducted some weeks later in order to examine vocabulary retention over longer time periods.

Many studies in the literature and the current study concluded with some counterevidence to TILH. Although the hypothesis predicts more VG and VR based on the TILLs of the vocabulary tasks, the fact that it might not be so effective for all vocabulary tasks should be taken into consideration while designing further studies.

References

Alavinia, P. and Rahimi, H. (2019). Task types effects and task involvement load on vocabulary learning of EFL learners. International Journal of Instruction, 12(1), 1501-1516.
Arpaci, D. (2016). The effects of accessing L1 versus L2 definitional glosses on L2 learners’ reading comprehension and vocabulary learning. Eurasian Journal of Applied Linguistics, 2(1), 15-29.
Beal, V. (2007). The weight of involvement load in college level reading and vocabulary tasks. Doctoral dissertation. Canada: Concordia University.
Behbahani, S. M. K., Pourdana, N., Maleki, M. and Javanbakht, Z. (2011). EFL task induced involvement and incidental vocabulary learning: Succeeded or surrounded. International Conference on Languages, Literature and Linguistics. IPEDR Proceedings, 26, 323-325.
Craik, F. I. and Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671-684. Retrieved December 22, 2018, from http://dx.doi.org/10.1016/S0022-5371(72)80001-X
Creswell, J. W. (2005). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (2nd ed.). Upper Saddle River, NJ: Pearson.
Çekiç, A. (2022). Incidental L2 vocabulary learning from audiovisual input: the effects of different types of glosses. Computer Assisted Language Learning, 1-28.
Eckerth, J. and Tavakoli, P. (2012). The effects of word exposure frequency and elaboration of word processing on incidental L2 vocabulary acquisition through reading. Language Teaching Research, 16(2), 227-252.
Ehsani, M. and Karami, H. (2022). Comparing the predictive power of involvement load hypothesis and technique feature analysis. International Journal of Language Studies, 16(2).
Ellis, R. and He, X. (1999). The roles of modified input and output in the incidental acquisition of word meanings. Studies in Second Language Acquisition, 21(2), 285-301.
Esfahani, F. R. (2012). Impact of vocabulary learning tasks on communicative gains of advanced EFL learners of Persian. American Journal of Economics, 14-17.
Eysenck, M. W. (1982). Incidental learning and orienting tasks. In C. R. Puff (Ed.), Handbook of research methods in human memory and cognition. New York: Academic Press.
Folse, K. S. (2006). The effect of type of written exercise on L2 vocabulary retention. TESOL Quarterly, 40(2), 273-293.
Hazrat, M. (2020). The Involvement Load Hypothesis and Its Impact on Vocabulary Learning (Doctoral dissertation, University of Auckland). Retrieved February 28, 2022, from https://researchspace.auckland.ac.nz/handle/2292/51729
Hulstijn, J. H. and Laufer, B. (2001). Some empirical evidence for the involvement load hypothesis in vocabulary acquisition. Language Learning, 51(3), 539-558.
Jones-Mensah, I., Tabiri, M. O., Fenyi, D. A., Kongo, A. E. and Amexo, D. (2022). Vocabulary knowledge of collocation in business texts: a case of ESL tertiary students. International Journal of Education, Technology and Science (IJETS), 2(1), 001-023.
Karalık, T. (2016). The Effects of Task Induced Involvement Load Hypothesis on Turkish EFL Learners’ Incidental Vocabulary Learning. Unpublished master’s thesis. Eskişehir: Anadolu Üniversitesi, Eğitim Bilimleri Enstitüsü.
Keating, G. D. (2008). Task effectiveness and word learning in a second language: The involvement load hypothesis on trial. Language Teaching Research, 12(3), 365-386.
Kim, Y. (2008). The role of task-induced involvement and learner proficiency in L2 vocabulary acquisition. Language Learning, 58(2), 285-325.
Laufer, B. (2003). Vocabulary acquisition in a second language: Do learners really acquire most vocabulary by reading? Some empirical evidence. Canadian Modern Language Review, 59(4), 567-587.
Laufer, B. and Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: The construct of task-induced involvement. Applied Linguistics, 22(1), 1-26.
Laufer, B. and Rozovski-Roitblat, B. (2011). Incidental vocabulary acquisition: The effects of task type, word occurrence and their combination. Language Teaching Research, 15(4), 391-411.
Mármol, G. A. and Sánchez-Lafuente, Á. A. (2013). The involvement load hypothesis: The effect on vocabulary learning in primary education. Revista Española de Lingüística Aplicada, (26), 11-24.
Pourakbari, A. A. and Biria, R. (2015). Efficacy of task-induced involvement in incidental lexical development of Iranian senior EFL students. English Language Teaching, 8(5), 122-131.
Sarani, A., Mousapour Negari, G. and Ghaviniat, M. (2013). The role of task type in L2 vocabulary acquisition: a case of involvement load hypothesis. Acta Scientiarum. Language and Culture, 35(4).
Sarbazi, M. R. (2014). Involvement load hypothesis: Recalling unfamiliar words meaning by adults across genders. Procedia-Social and Behavioral Sciences, 98, 1686-1692.
Teng, M. F. and Zhang, D. (2021). Task-induced involvement load, vocabulary learning in a foreign language, and their association with metacognition. Language Teaching Research, 13621688211008798. Retrieved February 28, 2022, from https://bit.ly/36SFJJ3
Walsh, M. I. (2009). The involvement load hypothesis applied to high school learners in Japan: Measuring the effects of evaluation. Unpublished master’s thesis. United Kingdom: Birmingham University.
Webb, S. (2005). Receptive and productive vocabulary learning: The effects of reading and writing on word knowledge. Studies in Second Language Acquisition, 27(1), 33-52.
Webb, S. A. (2009). The effects of pre-learning vocabulary on reading comprehension and writing. Canadian Modern Language Review, 65(3), 441-470.
Wilkins, D. A. (1972). Linguistics in language teaching. London: Arnold.
Yaqubi, B., Rayati, R. A. and Allemzade Gorgi, N. (2012). The involvement load hypothesis and vocabulary learning: The effect of task types and involvement index on L2 vocabulary acquisition. Journal of Teaching Language Skills, 29(1), 145-163.
Zou, D. (2017). Vocabulary acquisition through cloze exercises, sentence-writing and composition-writing: Extending the evaluation component of the involvement load hypothesis. Language Teaching Research, 21(1), 54-75.