JOALL (Journal of Applied Linguistics and Literature)
Vol. 7 No. 1, February 2022
ISSN (print): 2502-7816; ISSN (online): 2503-524X
Available online at https://ejournal.unib.ac.id/index.php/joall/article/view/19920
http://doi.org/10.33369/joall.v7i1.19920

Video or audio listening tests for English language teaching context: Which is more effective for classroom use?

1Clara Herlina Karjo, 2Menik Winiharti, 3Safnil Arsyad
1,2English Department, Bina Nusantara University, INDONESIA
1,2Jalan Kebon Jeruk Raya No. 27, Kebon Jeruk, Jakarta Barat 11530, Indonesia
3English Education Postgraduate Program, Bengkulu University, INDONESIA
3Jalan WR Supartaman Kandang Limun Kota Bengkulu 38371, Indonesia

ARTICLE INFO
Article history: Received: Jan 05, 2022; Revised: Jan 16, 2022; Accepted: Feb 04, 2022

ABSTRACT
Multimodal inputs (both auditory and visual) in the form of films and videos have long been used in teaching EFL listening comprehension. Previous studies have shown that listening while watching videos can significantly aid students’ comprehension. However, videos have rarely been used as testing materials because they contain more than aural input and are therefore thought not to ‘really’ test listening. This study explored the extent to which multimodal testing materials can be used in testing listening comprehension for EFL students and how the results differ from those of mono-modality testing materials. The participants were 100 students of the English Department, Bina Nusantara University (henceforth Binus), Jakarta. The researchers gave them two kinds of tests: the video listening test (VLT) and the audio listening test (ALT). The materials were two short videos from YouTube. The first test, the ALT, was given after the participants listened to the videos twice. In contrast, the VLT was administered after they watched the videos twice.
To examine the differences in the effects of VLT and ALT on EFL students’ performance in listening comprehension, the data were analyzed quantitatively. The results indicate that students got better scores for VLT than for ALT. The findings imply that students’ performance in listening comprehension is significantly improved with multimodal testing materials.

Keywords: Audio Listening Test; Listening Comprehension; Video Listening Test; Multimodality; English Language Teaching
Conflict of interest: None
Funding information: Bina Nusantara University
Correspondence: Safnil Arsyad, English Education Postgraduate Program, the University of Bengkulu, INDONESIA. safnil@unib.ac.id

©Clara Herlina Karjo, Menik Winiharti, Safnil Arsyad
This is an open access article under the CC BY-SA 4.0 international license (https://creativecommons.org/licenses/by-sa/4.0/).
ORCID: https://orcid.org/0000-0002-7371-240X; https://orcid.org/0000-0001-7245-806X; https://orcid.org/0000-0003-4174-2556

How to cite (APA Style):
Karjo, C.H., Winiharti, M., Arsyad, S. (2022). Video or audio listening tests for English language teaching context: which is more effective for classroom use? JOALL (Journal of Applied Linguistics and Literature), 7(1), 104-118. https://doi.org/10.33369/joall.v7i1.19920

Indonesian students have studied English as a foreign language since elementary school. Nevertheless, the length of time they have spent learning English does not guarantee that they are proficient in the four language skills. Ur (1984) mentioned that even though learners’ grammar skills are good enough, they still have problems in doing listening exercises.
Listening comprehension is considered the most important of the four language skills because it provides the aural input that is the foundation of language acquisition (Gao, 2012). Listening is also considered difficult by most EFL learners. The literature identifies many complex factors that can affect listening comprehension, such as rate of speech (Blau, 1990; Griffiths, 1992), prosody, accent, phonological features (Matter, 1989), hesitations, complex grammar (Gao, 2012), rhetorical signaling cues (Cross, 2011), lack of background knowledge (Chiang and Dunkel, 1992) and low language proficiency (Brown, 1995).

Despite being the most difficult skill, listening is the most frequently used language skill in the classroom (Ferris, 1998). Listening is applied in learning grammar, reading, vocabulary and speaking. For example, students must have sufficient listening ability to understand the materials that the teacher is teaching. Unfortunately, many teachers still perceive listening as a passive skill in which students only have to listen without doing anything. This perception is inaccurate, since listening is an active skill. When a person listens to something, many cognitive processes are going on. Purdy (1997) stated that listening involves active and dynamic processes of attending, perceiving, interpreting, remembering, and responding to verbal and nonverbal needs, concerns, and information offered by other people. Listening is also a complicated process that involves linguistic, cognitive, cultural and social knowledge (Wang & Miao, 2003). Moreover, listening comprehension is an inferential process that draws on various kinds of background knowledge (Gilakjani & Ahmadi, 2011). Similarly, Rost (2002) and Hamouda (2013) described listening comprehension as an interactive process of meaning construction that involves listeners.
Listeners, according to Gilakjani and Sabouri (2016), should comprehend the oral input through sound discrimination, previous knowledge, grammatical structures, stress and intonation, and other linguistic or non-linguistic clues. Therefore, successful listening, according to Anderson and Lynch (1988), depends not only on what a speaker says but also on the part the listener plays in the process, by activating various background knowledge and by applying previous knowledge to what he or she hears in trying to understand what the speaker means. To listen well, listeners must be able to decode the message, apply a variety of strategies and processes to make meaning, and respond to what is being said in various ways, depending on the purpose of communication.

From the above discussion, it is clear that listening comprehension requires active participation on the part of the listeners (in this case the EFL students) as well as the speaker, because listening is a highly complex problem-solving activity in which listeners interact with a speaker to construct meaning in the context of their experiences and knowledge.

Regarding the teaching of listening comprehension in the EFL classroom, Gilakjani and Ahmadi (2011) propose three kinds of activities: pre-listening, while-listening and post-listening. Pre-listening activities include outlining the listening text and teaching the key concepts. Here, a teacher can give a general description of the topic or theme of the text that the students will listen to, discuss some difficult words or concepts, and perhaps discuss the accent of the speakers. These activities are used to activate students’ prior knowledge and expectations and to provide the necessary context for specific listening tasks.
While-listening activities aim to help students construct clear and accurate meaning as they interpret the speaker’s verbal message and nonverbal cues. The purpose is to focus on comprehension of the speaker’s ideas and on organizational patterns, and to encourage students’ reactions to the speaker’s ideas and use of language. The activities include open-ended activities such as question and answer, filling in missing words, or pair work. Post-listening activities connect what the students heard with their experience and encourage critical listening and reflective thinking. They include checking students’ comprehension, clarifying understanding, and answering questions that have not been understood.

Traditionally, listening comprehension materials are given as audio-only input. However, if learners can see how the language is used in an actual situation, a listening activity can be more meaningful. In other words, when learners see how the speakers speak the language, they can learn the language from both audio and visual inputs. Harmer (2007) cited several materials that give access to both audio and visual inputs, such as DVDs, film clips on video, or online watch-while-listening materials. By watching videos, learners can see paralinguistic behaviors such as how intonation matches facial expressions and gestures. The availability of various materials for listening comprehension will enable teachers to give students continuous and theoretical directions that will make students change their previous notion that a listening course consists only of the teacher playing audio and the students repeating it to improve their linguistic skills (Ruan, 2015).

Providing multimodal inputs (visual and auditory) in the form of materials for listening comprehension has been done for quite a long time.
Wagner (2008) claimed that language teachers have already utilized movies, TV shows, and other sources of audiovisual media in their teaching, especially in teaching L2 listening. Wagner (2008) further asserted that the use of multimodal media such as videos allows the listener to process both the aural and visual information communicated by the speakers, as in real-life situations. Compared to audio-only listening materials, multimodal presentation (i.e., movies or videos with subtitles or captions) has been found to improve EFL learners’ listening comprehension (Zareian, Adel & Noghani, 2015). This is because multimodal presentation (particularly captioned video) can aid comprehension for language learners who are “hard of listening” and find the speech of foreign language TV, films, and videos difficult to follow and understand (Vanderplank, 2016). Captions or subtitles that accompany the video have been found highly effective in promoting listening comprehension (Yang & Chang, 2014). Thus, given the encompassing influence of video technology and the internet in daily life, it is highly likely that videos will be used in the teaching of L2 listening in the future. In the field of instructional design, future studies of multimedia learning are expected to focus on the effective use of video to enhance learning and on the impact of video on learning (Guo, Kim & Rubin, 2014; Kay, 2012).

However, despite the popular use of videos as teaching materials for L2 listening, they are not equally popular as testing materials. This notion was affirmed by Wagner (2010), who said that video is commonly employed in L2 classrooms, but test developers have been reluctant to use videotexts in tests of L2 listening ability.
The test developers’ reluctance to use videos in a listening test might be related to technology and practicality. Previously, Wagner (2008) stated that more resources were needed to create a video listening test than the more traditional audio-only test. Some researchers are concerned that visual channels may affect test-taker performance during the video listening test. In other words, the concern is whether students who take the video listening test will be influenced more by the visual factors than by the aural content of the video. According to Taylor and Geranpayeh (2011), test takers’ performance may be affected by external contextual factors and individual characteristics. Visual images may distract test takers. Moreover, even when students’ language proficiency is the same, not all students may understand the content. Test-takers’ performance may also be affected by internal cognitive factors through a loading effect while they are processing information.

Despite the concerns regarding the effect of visual images on test-takers’ performance, previous studies of video listening tests (VLT) and audio-only listening tests (ALT) showed various results. In the study of Basal, Gulozer, and Demir (2015), even with the visual elements of the videos, the ALT group performed significantly higher than the VLT group. On the contrary, Shin’s (1998) study discovered that participants in video tests performed significantly (around 25%) better than an audio test group. However, Gruba (1993) found that the scores of video-mediated and audio-mediated groups did not show a statistically significant difference. Nevertheless, further research still needs to be carried out, either to corroborate or to refute the previous studies.
The present study aims to compare EFL students’ performance in listening comprehension tests using mono-modality presentation (audio-only test) and multimodality presentation (audio, visual and textual) using video clips from YouTube. The research also aims to explore the possibility of using multimodal materials for the teaching of listening comprehension for EFL students. As a guideline, the following hypotheses are addressed in this study:
1) The use of multi-modality presentation (audio, visual and textual) can significantly improve students’ listening ability, and
2) The use of multi-modality presentation cannot significantly improve students’ listening ability.

METHOD

Research design
The present study utilized the post-test only control group design (Creswell, 2009). Two types of test modality (audio or video) in the listening comprehension test were used to measure participants’ performance. These two modalities, the audio listening test (ALT) and the video listening test (VLT), were administered to the same group of participants. The data were analyzed quantitatively using SPSS software. A paired samples t-test was used to test the hypotheses, that is, whether the use of multi-modality presentation can significantly improve the students’ listening ability as reflected by their listening test scores.

Participants
The participants of the research were 100 students of the English Department of Bina Nusantara University, Jakarta. They were purposely chosen from semesters 3, 5, and 7, on the assumption that they had sufficient English knowledge to participate in this research. However, the research did not differentiate the students’ results based on their years of study (the number of semesters they had taken) or their English proficiency levels, but only measured the results based on the modes of presentation (audio and visual) used.
The participants did not demonstrate hearing or vision problems, nor had they stayed in English-speaking countries before. Table 1 summarizes the participants’ data concerning their age, semester, and gender.

Table 1: Summary of Participant Data

Gender    Semester 3    Semester 5    Semester 7    Average Age    Total
Male      5 (18.5%)     15 (55.5%)    7 (25.9%)     20             27
Female    21 (28.8%)    39 (53.4%)    13 (17.8%)    19.5           73

Instruments
The topics for the ALT and VLT listening comprehension tests were taken from video clips of TED Talks on YouTube. TED (Technology, Entertainment, Design) is a global set of conferences run by the private nonprofit Sapling Foundation, with the slogan “Ideas Worth Spreading”. TED emphasizes educational aspects. Two topics were taken as the materials for the tests, i.e. The language of lying and The effect of sleep deprivation. These topics were chosen randomly and were not related in any way to the participants’ study course. The duration of each video is around 5 minutes. The first video was used to devise the listening test in the audio (ALT) modality, while the second video was used to devise the listening test in the audio-visual (VLT) modality. The test for each modality consists of 20 question items, which were grouped into three types of test items. Table 2 below shows the topics and the types of test items.

Table 2: Topics and test items

Testing Modality    Topic                              Type of test       Number of items
ALT                 The language of lying              Word listing       7
                                                       Cloze summary      8
                                                       Multiple choice    5
VLT                 The effect of sleep deprivation    Cloze summary      7
                                                       Number listing     4
                                                       Multiple choice    9
Total                                                                     40

Data collecting procedure and analysis
This study used a post-test-only design as proposed by Creswell (2009) to measure the learners’ performances in two types of modality in listening tests.
Participants were given two types of tests:

(a) Audio listening test (ALT)
This test was administered after the participants listened to the first video, The language of lying, for approximately 5 minutes. In this test, the screen was turned off so the participants received only the audio input. After a three-minute break, they listened to the audio content of the video once again. Finally, they were given a written test on the content of the video.

(b) Video listening test (VLT)
The second test was administered after the participants finished the ALT. The procedure was similar. This time, they watched the second video, The effect of sleep deprivation, with the screen turned on. Thus, they received audio as well as visual input to help them understand the materials in the video. After watching the video twice, they were given a written test on the content of the video.

FINDINGS

The descriptive statistics give the following results for both samples. The mean score of the VLT is 11.80. This means that the participants answered, on average, 11.80 questions correctly out of 20 items on the VLT. The minimum score is 4 and the maximum score is 20. On the other hand, on the ALT the participants answered only 10 out of 20 questions correctly on average, with the minimum score and the maximum score of 19. In general, the results indicate that the VLT generated a better score than the ALT.

These scores were further analyzed to find the correlation between the VLT and the ALT. The analysis yielded a Pearson correlation coefficient of 0.755 with a probability value of 0.000, far below the significance level α = 0.05. This shows a significant correlation between the VLT and the ALT scores. Table 3 below provides the results of the paired samples test.
A paired samples t-test was chosen because the same participants were tested using two modes of presentation (VLT and ALT). Thus, the purpose of this test is to find out the significance of the difference between VLT and ALT. The mean difference between VLT and ALT is 1.8. The t-test for the hypothesis H0: VLT = ALT gave a score of t = 11.808. The two-sided p-value was 0.000, which is less than α = 0.05. This result is convincing evidence to reject the hypothesis that VLT gives the same result as ALT. The conclusion that can be drawn is that VLT gives a better score than ALT.

Table 3: Paired Sample T-Test

Testing Modality    Paired differences                      t         Sig. (2-tailed)
                    Mean Difference    Std. Error Mean
VLT vs ALT          1.80               .294                 11.808    .000

Besides the modality of testing, the types of questions might also affect the scores attained by the students. Since there are three types of questions in both VLT and ALT, Table 4 below shows the interaction between the question types and the modality. The questions are divided into three types: summary, listing and multiple choice. The dependent variable is the score gained by the participants. The statistical calculation was done using two-way ANOVA. Two kinds of output are given: the test of between-subject effects and the post hoc comparison.

Table 4 below shows the comparison and the correlation between the modality and the type of test items. The results indicate that there is no significant interaction between the types of test items and the modality of presentation. This is not a surprising result because Brindley and Slatyer (2002) reported that learners’ performance in competency-based listening assessment is affected by the test item format and item difficulty, not by the modes of presentation.
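The paired-samples result in Table 3 can be retraced from its summary columns. The sketch below is only an illustration, assuming the standard definition of the paired t statistic (mean difference divided by the standard error of the mean difference); the function name `paired_t_from_summary` is hypothetical and is not part of the study's SPSS procedure. With the values printed in Table 3 (1.80 and .294) the quotient is about 6.12, which does not match the reported t = 11.808; the source does not explain the gap, so one of the printed values may have been transcribed differently.

```python
def paired_t_from_summary(mean_diff: float, se_mean: float) -> float:
    """Return the paired-samples t statistic from summary values.

    Assumes the textbook definition: t = mean difference / standard
    error of the mean difference.
    """
    if se_mean <= 0:
        raise ValueError("standard error must be positive")
    return mean_diff / se_mean


# Values as printed in Table 3 (Mean Difference and Std. Error Mean).
t_value = paired_t_from_summary(1.80, 0.294)
print(round(t_value, 3))
```

Either value of t lies far beyond the usual two-tailed critical value for 99 degrees of freedom (roughly 1.98 at α = 0.05), so the direction of the conclusion does not depend on which figure is correct.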
Thus, the findings indicate that modality and the types of test items do not interact even though the mean scores for each modality (VLT and ALT) differ significantly. In other words, modality of presentation does not affect the attainment of scores for different types of test questions.

Table 4: Test between Subject Effects (Dependent Variable: Score)

Source               Mean Square    F-stat     Sig.
Corrected Model      1.764          3.488      0.035
Intercept            202.608        400.698    0.000
Modality             6.044          11.952     0.005
Type                 1.387          2.743      0.104
Modality x Type      0.000          0.000      1.000
a. R-squared = .592 (Adjusted R-squared = .423)

The ANOVA table above shows the statistical values for the main effects as follows:
a. For the modality factor: the F value is 11.952 with degrees of freedom (df) = 1 and p = 0.005, which is less than α = 0.05, so H0: ALT = VLT is rejected. The conclusion for this factor is that ALT scores differ from VLT scores.
b. For the question type factor: the F value is 2.743 with degrees of freedom (df) = 2 and p = 0.104, which is greater than α = 0.05, so H0: summary = listing = multiple choice cannot be rejected. It can be concluded that the question types do not give significantly different results.
c. For the interaction factor: the F value is 0.000 with degrees of freedom (df) = 2 and p = 1.000, which is greater than α = 0.05, so H0: (μ summary – modality) = (μ listing – modality) = (μ choice – modality) cannot be rejected. This means that the modality (VLT or ALT) does not affect the attainment of scores for each type of question.

To see which test type shows different means, a post hoc multiple comparison was done. The summary can be seen in Table 5. The results clearly show that test types do not influence the score gained.
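The decision rule applied to each ANOVA effect can be written out as a short sketch. It uses the Sig. column of Table 4 and the significance level α = 0.05 stated in the text; the names `table4_sig` and `decide` are hypothetical and chosen only for this illustration.

```python
ALPHA = 0.05  # significance level used throughout the study

# Sig. values copied from Table 4 (test of between-subject effects).
table4_sig = {
    "Modality": 0.005,
    "Type": 0.104,
    "Modality x Type": 1.000,
}


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Reject H0 only when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


decisions = {effect: decide(p) for effect, p in table4_sig.items()}
for effect, decision in decisions.items():
    print(f"{effect}: p = {table4_sig[effect]:.3f} -> {decision}")
```

Read this way, only the modality effect reaches significance, while the question type and the interaction do not.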
Again, these results confirm that the score difference results from the modality used in testing listening, not from the types of test items given. Thus, it can be concluded that test results depend on the testing modality, not on the test items. The results are in line with Jafari and Hashim (2012), who found no interaction effect between the test format and the students’ listening proficiency level.

Table 5: The Multiple comparisons

(I) type    (J) type    Mean Difference (I-J)    Std. Error    Sig.
cloze       listing     -.0250                   .41054        1.000
            choice      -.8450                   .41054        .186
listing     cloze       .0250                    .41054        1.000
            choice      -.8200                   .41054        .207
choice      cloze       .8450                    .41054        .186
            listing     .8200                    .41054        .207

The multiple comparisons in Table 5 show that the mean of the cloze type differs from the means of the listing and multiple-choice types, although none of the differences is statistically significant (all Sig. values exceed 0.05). The mean differences between the types are very small; for instance, between cloze and listing the difference is only 0.025. The highest mean difference occurs between multiple choice and cloze summary, which is 0.8450, followed by multiple choice and listing, which is 0.8200. The results indicate that multiple-choice items receive more correct answers than listing and cloze items. Further analysis of the results demonstrates that these differences can also be attributed to the modality used in the tests. Table 6 below displays a descriptive distribution of each type of question, shown in percentages.

Table 6: Percentage of Each Type of Question

Type of question    Video Listening Test       Audio Listening Test
Cloze summary       3.64 / 7      52.00%       2.50 / 8     31.25%
Listing             -             -            2.49 / 7     35.57%
Multiple choice     4.49 / 9      49.89%       3.33 / 5     66.60%
Cloze number        3.67 / 4      91.75%       -            -
Mean score          11.80 / 20    59.00%       10 / 20      50.00%

When the results of the test are broken down into different test types, several things can be noted. In the VLT, the total mean score for all items is 11.80, that is, around 59% of the test items were answered correctly. However, students show diverse results for each part.
The highest score is achieved for the cloze number test, which reaches 91.75%. This means that the students accurately answered 3.67 out of 4 questions regarding numbers. This is understandable since the numbers are shown visually in the video. Numbers are relatively harder to memorize if they are only spoken and not seen. The second type that benefited from the video is the cloze summary test. In this type of test, students have to fill in the summary with one or two words they heard in the video. The students got 3.64 out of 7 items. Again, by watching the video, they can see the keywords visualized on the screen. However, in the multiple-choice question type, students were only able to answer 4.49 out of 9 items correctly, making it the worst result.

Meanwhile, the ALT results showed a different picture. In total, the students managed to answer 10 out of 20 items correctly. Thus, the total mean percentage for all items was only 50%. In two types of test items, word listing and cloze summary, students got only 2.49 out of 7 and 2.5 out of 8 items. In these types of questions, students have to rely on their short-term memory to recall the words that they have to fill in. Some of the words in this test are unfamiliar scientific terms that are quite difficult to catch, such as electroencephalograph and convoluted. Showing new vocabulary visually on the screen helps students to comprehend its meaning. This finding is similar to that of Winke et al. (2010), who stated that captions contributed more to learning than no captions with respect to novel vocabulary recognition. Surprisingly, in the ALT, students were able to achieve 66.6% correct answers on multiple-choice items. This may be because, in multiple-choice questions, students can make informed guesses.
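The percentages discussed above follow directly from the mean number of correct answers and the number of items in each part of Table 6. The following sketch recomputes them; the dictionary layout and the helper name `percent_correct` are assumptions made for illustration, while the figures themselves are taken from Table 6.

```python
# (modality, question type) -> (mean correct answers, number of items),
# copied from Table 6.
table6 = {
    ("VLT", "Cloze summary"): (3.64, 7),
    ("VLT", "Multiple choice"): (4.49, 9),
    ("VLT", "Cloze number"): (3.67, 4),
    ("ALT", "Cloze summary"): (2.50, 8),
    ("ALT", "Listing"): (2.49, 7),
    ("ALT", "Multiple choice"): (3.33, 5),
}


def percent_correct(mean_correct: float, n_items: int) -> float:
    """Share of items answered correctly, as a percentage with 2 decimals."""
    return round(100 * mean_correct / n_items, 2)


percentages = {key: percent_correct(*value) for key, value in table6.items()}
for (modality, qtype), pct in percentages.items():
    print(f"{modality} {qtype}: {pct:.2f}%")
```

The recomputed values agree with the percentages printed in Table 6, including the 91.75% for the cloze number items in the VLT and the 66.60% for multiple-choice items in the ALT.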
DISCUSSION

The statistical results above corroborate the indication that by listening to the audio and watching the video (which also includes subtitles or captions) at the same time, students can perform and comprehend better. Visual modality enhances the students’ understanding of the materials. Yang and Chang (2014) confirmed that the application of visual materials was very effective in promoting learning, particularly the use of captions, which may enhance listening comprehension. Huang and Eskey (1999-2000) argued that captions and subtitles make audiovisual input more accessible and comprehensible to L2 learners, which is in line with Krashen’s (1985) input hypothesis.

The findings of these studies also revealed that students who were exposed to more than one modality, i.e., audio and visual, performed better than when they were exposed to only one modality, such as audio only. Behroozizad & Majidi (2015) claimed that the affordance of three channels or modalities (aural, visual, and textual) might reduce students’ listening anxiety. In turn, these three channels will lead to better listening performance and more confidence in students’ listening ability.

The audio mode only provides students with aural input, so they have to concentrate fully on the sounds provided. Consequently, they have to remember a lot of sound-based information. In contrast, through the audiovisual mode, students are equipped with more varied information in addition to sound, such as pictures and the captions that occasionally appear in the video. These additional elements may have assisted the students’ comprehension of the contents. Students’ comprehension might be more thorough with the assistance of these three modalities than with the support of the audio mode only.
The effects of these three modalities on students’ comprehension were corroborated by the studies of Basal, Gulozer and Demir (2015), and Shin (1998).

Another reason for the higher scores gained by the VLT group is that, according to Kruger and Doherty (2016), the capacity of working memory can be significantly enhanced by multimodal presentation, as learners can process information in both channels (auditory narration and visual text). This statement is affirmed by Goh (2000), who argued that EFL students in general had a limited short-term memory capacity and tended to forget immediately what they had heard previously because they were in haste to understand the new input. This means that students will easily fail to recall the materials if they are presented only in audio mode. On the other hand, if the information they receive is given in various modes, students can remember better. Consequently, they can perform better in video listening tests than in audio listening tests. The same result was confirmed by Chang, Lei and Tseng (2011), who found that multiple modes (text plus sound) were more effective than single-mode listening instruction.

However, even though the modality of presentation did affect the attainment of the overall scores, the results did not show any significant interaction between the modes of presentation and the types of test items. Thus, in this case the students’ listening proficiency level is not determined by the test format. The results are in line with Jafari and Hashim (2012), who found that there was no interaction between the test format and the students’ listening proficiency level. Meanwhile, Brindley and Slatyer (2002) found different results. They reported that learners’ performance in competency-based listening assessment is affected by the test item format and item difficulty, not by the modes of presentation.
Nevertheless, the findings in Table 6 indicate that students performed differently for each type of question in the VLT and the ALT. In general, the students performed better in listening comprehension using the VLT mode. Students’ understanding may be supported by the visual elements of the video, such as the images and the subtitles or captions, even though they might not understand the contents of the video. In other words, the video gives them multimodal inputs, which are beneficial for the understanding of the learning materials.

The visualization of difficult words on the screen also helps students to connect the pronunciation and the spelling with the meaning, which in turn can help them retain the words better. For example, the word electroencephalograph was spoken, shown in the picture and written on the screen. Jing (2010) also confirmed that subtitles or captions in the video could help students with the spelling of difficult words and with writing a summary after listening. That is why 80% of the students were able to recall the difficult words in the VLT. On the other hand, the difficult word convoluted in the ALT generated only 5% correct answers. Markham, Peter and McCarthy (2001) claimed that L2 learners generally have higher reading comprehension skills than listening comprehension skills; thus, subtitles can be beneficial when they listen to L2 reading materials. The lower rate of word recognition in the ALT suggests that students need visual input besides aural input to recall L2 listening materials better. These results confirm several researchers’ findings that bimodal input (audio and visual) can speed up the recognition of words and the comprehension of content (Chung, 1999; Guillory, 1998; Koolstra & Beentjes, 1999).

Another interesting finding relates to the results for multiple-choice questions.
Unexpectedly, the VLT group achieved only 50% accuracy for multiple-choice questions, compared to 66.6% for the ALT group. This might be caused by the possibility for the students to make informed guesses for this type of question. Hence, although the multiple-choice question type is the most widely used format to measure listening ability, it does not necessarily offer the best result in the listening comprehension test (Hemmati & Ghaderi, 2014). The findings of this study have several implications for both the teaching and the testing of listening comprehension. The first implication is the use of videos in teaching listening. Vandergrift (2011) has predicted that, because of the emergence of technology, the language learning environment will move into a new era of teaching listening based on the use of authentic audiovisual materials. Videos offer multimodal inputs that will enhance both situational and interactional authenticity and aid learners’ comprehension (Wagner, 2007). Visual elements in videos can also activate the background knowledge of the listeners (Ockey, 2007). Thus, it is highly recommended to use videos in teaching listening to EFL students. Hosogoshi (2016), Danan (2004), and Vanderplank (2013) have explored some positive effects of using audiovisual materials for L2 learners, among others, improving listening comprehension, fostering vocabulary learning, developing oral production skills, and lowering learners’ anxiety. However, care must be taken when choosing the materials to be used in the classroom. Teachers should select and watch the videos themselves before using them as materials for listening subjects. Videos should match the students’ proficiency in English. The topics should be within the students’ understanding and interest.
Students may also be involved in choosing the materials for their learning. Moreover, teachers should not forget that the purpose of video viewing is to teach listening comprehension. Thus, teachers should create activities that can provoke engagement and expectation (Harmer, 2007) on the part of the listeners or students. Such activities include pictureless listening (similar to the audio listening procedure, in which the teacher covers or turns off the screen and the students only listen to the dialogue or the talks) and using subtitles (the sound is turned off and the students try to construct the dialogue based on the subtitles). There are many more activities using audiovisual materials that can be done to improve students’ listening comprehension. The second implication is the use of videos in testing listening. Although opinion on the matter remains inconclusive, videos have begun to be used in testing listening comprehension, especially for EFL students. Suvorov (2015) believes that videos can provide multimodal inputs, which will result in a greater level of authenticity of test tasks. They will also create testing conditions that closely resemble the situation of the target language domain. Videos are readily available materials that can be taken (mostly freely) from the Internet or language learning websites. Again, care must be taken in choosing the materials for testing listening comprehension. The same principles apply: the materials should be appropriate to the students’ proficiency level, and the test should be carefully prepared so that it fulfills the purpose of testing students’ listening ability.
CONCLUSIONS
This study confirms previous studies showing that multimodal presentation can improve EFL students’ listening comprehension. The findings show that the video listening test produced a mean score of 11.80, compared to a mean score of only 10.00 for the audio listening test.
This result indicates that the video listening test increased the students’ comprehension of the lesson materials. Videos enhance learners’ comprehension because they give multimodal inputs in the form of visuals (context, subtitles, pictures, etc.) as well as auditory input. Multimodal inputs are considered to enhance working memory and comprehension. Therefore, multimodal presentation is highly recommended for the teaching of EFL listening comprehension. It is also suitable for teaching and learning other language skills (reading, writing and speaking). With the advance of technology, language teaching and learning materials can be easily obtained. However, care should be taken in selecting and designing the instruction and testing materials to achieve the desired goal. In this respect, the fact that the materials for this study were chosen randomly from TED Talks recordings is a limitation of the present research. For future research, therefore, researchers can carefully choose materials that fulfill the testing objectives. Moreover, a larger sample size can be used to examine the impact of multimodalities on the comprehension of various text types, such as lectures, dialogues, and authentic listening materials. This study used a posttest-only control group design, in which the students’ listening improvement may have been assisted by other individual learning activities done by the students outside the research treatments. Therefore, future studies should use a pretest-posttest design with control and experimental groups to obtain more valid and reliable data on the improvement of students’ listening ability. This will give more convincing evidence on the effective use of multimodal presentation in listening tests for English as a foreign language classes.
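As a rough illustration of the score comparison summarized in the Conclusions (a mean of 11.80 for the video listening test and 10.00 for the audio listening test), the following Python sketch computes the absolute and relative difference between the two reported means. It also shows how a Welch t statistic could be computed for two independent groups. The function names and the sample score lists are assumptions made for illustration only; they are not the study's raw data, and the sketch is not presented as the authors' analysis procedure.

```python
import math
import statistics

# Mean scores reported in the Conclusions section of the paper.
VLT_MEAN = 11.80
ALT_MEAN = 10.00


def mean_difference(video_mean, audio_mean):
    """Return the absolute and relative (percent) difference of two means."""
    absolute = video_mean - audio_mean
    relative = absolute / audio_mean * 100
    return round(absolute, 2), round(relative, 1)


def welch_t(scores_a, scores_b):
    """Welch's t statistic for two independent samples (hypothetical helper)."""
    mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    var_a, var_b = statistics.variance(scores_a), statistics.variance(scores_b)
    standard_error = math.sqrt(var_a / len(scores_a) + var_b / len(scores_b))
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    print(mean_difference(VLT_MEAN, ALT_MEAN))  # (1.8, 18.0)

    # Hypothetical scores, NOT the study's data.
    video_scores = [12, 13, 11, 12, 11]
    audio_scores = [10, 9, 11, 10, 10]
    print(round(welch_t(video_scores, audio_scores), 2))
```

On the reported means, the video condition is 1.80 points higher, or 18% relative to the audio mean. Whether such a difference is statistically significant depends on the score variances and group sizes, which is what a t statistic of this kind takes into account.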
ACKNOWLEDGMENTS
The authors would like to thank Bina Nusantara University for the funding of the present study. This study was supported by the 2016 Annual Competitive Grant from Bina Nusantara University.
REFERENCES
Anderson, A., & Lynch, T. (1988). Listening. Oxford: Oxford University Press.
Başal, A., Gülözer, K., & Demir, İ. (2015). Use of Video and Audio Texts in EFL Listening Test. Journal of Education and Training Studies, 3(6), 83–89.
Behroozizad, S., & Majidi, S. (2015). The effect of different modes of English captioning on EFL learners’ general listening comprehension: Full text vs. keyword captions. Advances in Language and Literary Studies, 6(4), 1670–1677.
Blau, E. K. (1990). The effect of syntax, speed and pauses on listening comprehension. TESOL Quarterly, 24, 746–753.
Brindley, G., & Slatyer, H. (2002). Exploring task difficulty in ESL listening assessment. Language Testing, 19(4), 369–394.
Brown, G. (1995). Dimensions of difficulty in listening comprehension. In D. Mendelsohn & J. Rubin (Eds.), A Guide for the Teaching of Second Language Listening (pp. 59–73). San Diego: Dominie Press.
Chang, C. C., Lei, H., & Tseng, J. S. (2011). Media presentation mode, English listening comprehension and cognitive load in ubiquitous learning environments: Modality effect or redundancy effect? Australasian Journal of Educational Technology, 27(4), 633–654.
Chiang, C. S., & Dunkel, P. (1992). The effect of speech modification, prior knowledge and listening proficiency on EFL lecture learning. TESOL Quarterly, 26, 345–374.
Chung, J. M. (1999). The effects of using video texts supported with advance organizers and captions on Chinese college students’ listening comprehension: An empirical study. Foreign Language Annals, 32(3), 296–308.
Creswell, J. (2009). Research design: Qualitative, Quantitative and Mixed Methods Approaches (3rd ed.). London, United Kingdom: SAGE.
Cross, J. (2011). Comprehending news videotexts: The influence of visual contents. Language Learning and Technology, 15(2), 42–68.
Danan, M. (2004). Captioning and subtitling: Undervalued language learning strategies. Meta: Translators’ Journal, 49(1), 67–77.
Ferris, D. (1998). Students’ view on academic aural/oral skills: A comparative needs analysis. TESOL Quarterly, 289–318.
Gao, Y. (2012). Effects of speaker variability on learning spoken English for EFL learners. Faculty of Arts and Social Sciences, 59–67.
Gilakjani, A., & Ahmadi, M. (2011). A study of factors affecting EFL learners’ English listening comprehension and the strategies for improvement. Journal of Language Teaching and Research, 2(5), 977–988.
Gilakjani, A. P., & Sabouri, N. B. (2016). The Significance of listening comprehension in English language teaching. Theory and Practice in Language Studies, 6(8), 1670–1677.
Griffith, R. (1992). Speech rate and listening comprehension: Further evidence of the relationship. TESOL Quarterly, 26, 385–391.
Gruba, P. (1993). A comparison study of video and audio in language testing. JALT Journal, 15, 85–88.
Gruba, P. (1997). The role of video media in listening assessment. System, 25(3), 335–345.
Guillory, H. G. (1998). The effects of keyword captions to authentic French video on learner comprehension. CALICO Journal, 15(1-3), 89–108.
Guo, P. J., Kim, J., & Rubin, R. (2014). How video production affects student engagement: An empirical study of MOOC videos. Proceedings of the First ACM Conference on Learning @ Scale Conference. Atlanta, Georgia.
Harmer, J. (2007). The Practice of English Language Teaching. Harlow: Pearson Longman.
Hemmati, F., & Ghaderi, E. (2014). The Effect of Four Formats of multiple-choice questions on the listening comprehension of EFL learners. Procedia - Social and Behavioral Sciences, 98, 637–644.
Hosogoshi, K. (2016). Effects of captions and subtitles on the listening process: Insights from EFL learners’ listening strategies. JALT CALL Journal, 12(3), 153–178.
Huang, H. C., & Eskey, D. E. (1999-2000). The Effects of closed-captioned television on the listening comprehension of intermediate English as a second language (ESL) students. Journal of Educational Technology Systems, 28(1), 75–96.
Jafari, K., & Hashim, F. (2012). The effects of using advance organizers on improving EFL learners’ listening comprehension: A mixed-method study. System, 40(2), 270–281.
Jewitt, C. (2013). Multimodal Teaching and Learning. In C. Chapelle (Ed.), The Encyclopaedia of Applied Linguistics (pp. 1–5). Chichester: Blackwell Publishing.
Jing, Z. (2010). Testing via news videos: An exploratory study. International Journal of Applied Linguistics, 20(2), 178–205.
Kay, R. H. (2012). Exploring the use of video podcast in education: A comprehensive review of the literature. Computers in Human Behavior, 28(3), 820–831.
Kelly, R. (1991). Lexical ignorance: The main obstacle to listening comprehension with advanced FL learners. IRAL, 29, 135–150.
Koolstra, C. M., & Beentjes, J. W. J. (1999). Children’s vocabulary acquisition in a foreign language through watching subtitled television programs at home. Educational Technology Research & Development, 47(1), 51–60.
Krashen, S. (1985). The input hypothesis. London, England: Longman.
Kruger, J., & Doherty, S. (2016). Measuring cognitive load in the presence of educational video: Towards a multimodal methodology. Australasian Journal of Educational Technology, 32(6), 19–31.
Markham, P. L., Peter, L. A., & McCarthy, T. J. (2001). The effects of native language vs. target language captions on foreign language students’ DVD video comprehension. Foreign Language Annals, 34(5), 439–445.
Matter, J. (1989). Some fundamental problems in understanding French as a foreign language. In H. W. Dechert & M. Raupach (Eds.), Interlingual processes (pp. 105–119). Tübingen: Gunter Narr.
Ockey, G. (2007). Construct implication of including still image or video in computer-based listening tests. Language Testing, 24, 517–537.
Plastina, A. F. (2013). Multimodality in English for specific purposes: Reconceptualizing meaning-making practices. LFE: Revista de Lenguas para Fines Específicos, 19, 385–410.
Purdy, M. (1997). What is listening? In M. Purdy & D. Borisoff (Eds.), Listening in everyday life: A personal and professional approach (pp. 1–20). Lanham: University Press of America.
Ruan, X. (2015). The role of multimodal in Chinese EFL students’ autonomous listening comprehension & multiliteracies. Theory and Practice in Language Studies, 5(3), 549–565.
Shin, D. (1998). Using videotaped lectures for testing academic language. International Journal of Listening, 12, 56–79.
Suvorov, R. (2009). Context visuals in L2 listening test: The effects of photograph and video vs audio-only format. In C. Chapelle, H. Jun, & I. Katz (Eds.), Developing and Evaluating Language Learning Materials (pp. 53–68). Ames: Iowa State University.
Suvorov, R. (2014). The use of eye-tracking in research on video-based second language (L2) listening assessment: A comparison of context videos and content videos. Language Testing, 32(4), 463–483.
Suvorov, R. (2015). Interacting with visuals in L2 listening test: An eye-tracking study. ARAGs Research Report Online: British Council.
Taylor, R., & Geranpayeh, A. (2011). Assessing listening for academic purposes: Defining and operationalizing the academic construct. Journal of English for Academic Purposes, 10, 89–110.
Ur, P. (1984). Teaching Listening Comprehension. Cambridge: Cambridge University Press.
Vandergrift, L. (2004). Listening to learn or learning to listen? Annual Review of Applied Linguistics, 24, 3–25.
Vanderplank, R. (2013). “Effects of” and “effects with” captions: How exactly does watching a TV program with same-language subtitles make a difference to language learners? Language Teaching, 1–16.
Vanderplank, R. (2016). The State of the Art I: Selected Research on Listening Comprehension and Vocabulary Acquisition. In Captioned Media in Foreign Language Learning and Teaching (pp. 75–104). Palgrave Macmillan.
Wagner, E. (2008). Video listening tests: What are they measuring? Language Assessment Quarterly, 5(3), 218–243.
Wagner, E. (2010). The effect of the use of video texts on ESL listening test-taker performance. Language Testing, 27, 493–513.
Wagner, E. (2013). An Investigation of how the channel of input and access to test questions affect L2 listening test performance. Language Assessment Quarterly, 10(2), 178–195.
Wang, J., & Miao, Y. (2003). Theory and method for EFL listening teaching. Computer-assisted Foreign Language Teaching, 8(2), 1–5.
Winke, P., Gass, S., & Sydorenko, T. (2010). The effects of captioning videos used for foreign language listening activities. Language Learning & Technology, 14(1), 65–86.
Yang, J. C., & Chang, P. (2014). Captions and reduced forms instruction: The impact on EFL students’ listening comprehension. ReCALL: The Journal of EUROCALL, 26(1), 44–61.
Zareaian, G., Adel, S. M., & Noghani, F. A. (2015). The effect of multimodal presentation on EFL Learners’ listening comprehension and self-efficacy. Academic Research International, 6(1), 263–271.
THE AUTHORS
Clara Herlina Karjo is an Associate Professor at the English Department, Bina Nusantara University. She teaches Sociolinguistics, History of English and Research Methods.
She obtained her Doctoral Degree in English Applied Linguistics from Indonesian Catholic University of Atma Jaya. Her research interests range from English phonology, language acquisition and language teaching to translation and discourse analysis. Her research papers have been disseminated in various journals and international conferences.
Menik Winiharti is a faculty member at the English Department, Bina Nusantara University. She teaches English Grammar, Writing, Syntax, and Research Methods. Her research interests include pragmatics, translation, and language skills. She is now pursuing her Doctoral Degree in Linguistics at Universitas Pendidikan Indonesia, Bandung. Her dissertation research examines the performance of online machine translation, focusing on undergraduate lecturers’ academic writing.
Safnil Arsyad is a professor of English Language Education in the English Department, Faculty of Education, University of Bengkulu, Bengkulu, Indonesia. He has published in many international journals, such as Asia-Pacific Education Researcher, Australian Review of Applied Linguistics, Journal of Multicultural Discourses, Asian ESP Journal, Asian Englishes, Discourse and Interaction, International Journal of Instruction, Language and Linguistics Studies, Studies in English Language Education, International Journal of Language Education and Malaysian Online Journal of Education Management. His research interests are in discourse analysis of academic texts and English teaching and learning materials.