
A Comparison of the Effect of 
Text-Picture and Audio-Picture 
Annotations in Second Language 
Vocabulary Recall among Iranian 
EFL Learners1 

Alireza Karbalaei, Ali Sattari and Ziba Nezami 2*
Farhangian University, Kish International Branch, Islamic Azad University, Iran

Abstract
The present study compared the effect of text-picture and audio-picture multimedia annotations on second language (L2) vocabulary recall among Iranian EFL learners. The participants were selected from two intact classes totaling 80 students who were studying English at an advanced level at a language institute in Iran. Their level of English proficiency was determined on the basis of their scores on the PET proficiency test. Sixty-two students were selected for the main procedure and were then randomly divided into two experimental groups (a text-picture annotation group and an audio-picture annotation group) and a control group. After taking a vocabulary pretest, participants clicked on highlighted unknown words to access the available annotations while reading. The text-picture group could see a textual explanation and a pictorial description, and the audio-picture group could see a pictorial description and hear a spoken explanation. After reading, students completed the posttests. The results demonstrate that audio-picture annotation is more effective than text-picture annotation in facilitating immediate L2 vocabulary recall. The results suggest that providing audio or text annotation of new words can help learners recall new vocabulary while reading.

Key words: Text annotation, audio annotation, multimedia annotation, vocabulary

1 Received: February 28, 2015 / Accepted: January 16, 2016
2 karbalaei2008@gmail.com / ali.sattari.2014@gmail.com / nezami_ziba@yahoo.com

Gist Education and Learning Research Journal. ISSN 1692-5777.
No.12. (January - June) 2016. pp. 51-71.



A COMPARISON OF THE EFFECT OF TEXT-PICTURE KARBALAEI


Introduction

Second language (L2) learners at all levels of ability face the problem of learning vocabulary. According to Nation (2001), a native speaker of English knows about 20,000 word families, which poses a challenging task for English as a Second Language (ESL) learners. Nevertheless, vocabulary learning has long been overlooked within the field of Second Language Acquisition (SLA) (Nation, 2001; Zimmerman, 1997).

Recent years have seen growing interest in L2 vocabulary research. According to Gass (1999), one relevant debate concerns the distinction between incidental and intentional vocabulary learning. The two learning conditions differ in the learning task, learner attention, and the instructional context of learning (Read, 2004). Both approaches have been found to aid the gradual learning of L2 vocabulary (Hulstijn, 2001). Second language research has also examined incidental vocabulary learning through reading (Nation, 2001). As Jacobs et al. (1994) state, this is consistent with L2 learners’ reports that vocabulary learning mostly happens incidentally, during reading or listening. However, incidental L2 vocabulary learning tends to be incremental and slow.

Jacobs, Dufon and Hong (1994) and Joyce (1997) describe how annotation has been a standard feature of L2 reading materials: it facilitates comprehension, and L2 vocabulary learning happens as a by-product. As an instructional intervention, an annotation draws learner attention away from the reading itself and focuses it on the form and meaning of the word, thus enhancing vocabulary learning and reading comprehension. This reflects the interactionist view of SLA (Long, 1996) and the depth of processing hypothesis. Rott, Williams, and Cameron (2002) and Watanabe (1997) report that studies on the influence of text annotation on L2 vocabulary learning and reading comprehension have produced mixed findings. Al-Seghayer (2001) stated that, unlike traditional marginal annotation, multimedia annotation can present vocabulary information in multiple modalities, including audio (sound) and visual (text, picture, and video).

Studies have also considered the effects of various kinds of multimedia annotation on incidental L2 vocabulary learning, specifically picture annotation and video annotation accompanied by text annotation (Al-Seghayer, 2001). These studies draw on Paivio’s (1990) dual-coding theory and on the cognitive theory of multimedia learning (Mayer, 2001), which holds that meaningful learning engages learners in both verbal and visual cognitive processing systems. Yoshii (2000, as cited in Al-Seghayer, 2001) reported that dual annotations of text and picture, or of text and video, are consistently found to be more effective than single annotations in facilitating incidental L2 vocabulary learning.

In addition, Svenconis and Kerst (1995) and Yeh and Wang (2003) suggest that adding an audio element to dual annotations does not seem to have a decisive effect on L2 vocabulary learning. One possible explanation is that information delivered simultaneously through different modalities (auditory, verbal, and visual) may overload cognitive processing.

An overview of studies on L2 vocabulary annotation, particularly multimedia annotation, suggests that little is known about how different dual annotations, specifically text-picture and audio-picture annotations, influence L2 vocabulary learning and reading comprehension in the Iranian EFL context. This information is needed to determine the extent to which multimedia learning can be used in L2 reading instruction, and the role of multimedia in L2 vocabulary learning, in the Iranian EFL context. The scarcity of research on audio annotation in multimedia L2 learning, compared with other multimedia annotations, warrants further examination. Furthermore, incidental and intentional vocabulary learning in a multimedia environment has never been studied. The influence of multimedia dual annotation using different modalities on L2 vocabulary learning and reading comprehension under incidental and intentional learning conditions therefore remains unclear.

Motivated by prior studies on multimedia annotation and by gaps in this literature, the overarching question considered in this study was how different dual annotations influence L2 vocabulary recall and reading comprehension in both incidental and intentional learning environments. This study was designed to expand our understanding of multimedia learning in second language acquisition by applying the framework of the cognitive theory of multimedia learning to second language vocabulary learning and reading comprehension. It investigated how two types of dual annotation, namely text-picture and audio-picture, influenced L2 vocabulary learning and reading comprehension. Furthermore, it examined the influence of multimedia annotation on L2 students’ vocabulary learning under both incidental and intentional learning conditions.

This study was guided by the following research questions:

1) Does text-picture annotation play any significant role in facilitating L2 vocabulary immediate recall among Iranian EFL learners?
2) Does audio-picture annotation play any significant role in facilitating L2 vocabulary immediate recall among Iranian EFL learners?
3) Is there any significant difference between the effect of text-picture and audio-picture annotations in facilitating L2 vocabulary immediate recall among Iranian EFL learners?

Literature Review

Text Annotation and L2 Vocabulary Learning

In printed reading materials, text annotations are often placed in the margin, at the bottom of the page, or at the end of the reading text. In multimedia texts, when students click on an annotated word, they can see its meaning in a designated area of the computer screen. This section first discusses text annotation in printed reading texts and then reviews text annotation in multimedia texts.

Hulstijn, Hollander, and Greidanus (1996) examined incidental vocabulary learning by second language learners. Their study showed marginal text annotation to be an effective method. Other studies have also found that text annotation in printed reading texts can reinforce second language learners’ retention of vocabulary (Hulstijn, 1992).

Jacobs, Dufon and Hong’s (1994) study on L2 Spanish reading used three formats: (1) no gloss, (2) L1 gloss, and (3) L2 gloss. Their results demonstrated that students who had access to glosses did better than students without glosses on the immediate vocabulary translation post-test. However, this advantage was not found in the delayed post-test four weeks later. Jacobs et al. therefore concluded that, although glosses are preferable to no glosses, they have a potentially positive effect on vocabulary acquisition only when learners have sufficient L2 competence. Furthermore, a certain proficiency level was required to make effective use of L2 glosses. In short, the positive relation between glosses and vocabulary learning held, at least for immediate if not for long-term retention.

In order to examine the possible distinction between L1 and L2 glosses, Ko (1995) used a design similar to that of Jacobs, Dufon and Hong (as cited in Al-Seghayer, 2001) with 189 Korean college students learning English as a foreign language (EFL). Students took a vocabulary pre-test and were asked to read an 854-word English text. Contrary to Jacobs et al. (1994), the multiple-choice vocabulary post-test given immediately after reading showed a significant difference between L1 and L2 glosses: students with access to L1 glosses significantly outperformed those with access to L2 glosses. The effect was also significant in the delayed post-test one week later.


The advantage of L2 over L1 glosses in vocabulary retention was also challenged by Laufer and Shmueli’s (1997) study. Hebrew-speaking high school EFL students (N=128) were asked to read an English text in which 10 target words were glossed in Hebrew and another 10 in English. Multiple-choice tests were used for both the immediate post-test and the delayed post-test five weeks later. Both tests showed that L1 glosses resulted in more vocabulary retention than L2 glosses. This conflicts with the finding of Jacobs et al. (1994), but the participants’ L2 proficiency level in Laufer and Shmueli (1997) may explain the discrepancy: a certain level of L2 proficiency is necessary to make full use of L2 glosses.

Picture Annotation

Visual aids have long been hypothesized to benefit second language learning. Tuttle (as cited in Omaggio, 1979) argued that “foreign language students can profit from many kinds of visual material to be a rich resource in the foreign language classroom” (p. 9). Kellogg and Howe (1971) also showed that presenting foreign words through real objects or images facilitated children’s vocabulary acquisition in a foreign language.

A number of researchers have also investigated the effect of visual stimuli on L2 vocabulary learning and reading comprehension. Kellogg and Howe’s (1971) study compared written words and pictures as cues for children’s oral acquisition of Spanish vocabulary. The pictures produced faster learning of new words than the written stimuli. The effect was also maintained over the long term, as shown by greater recall of words presented with pictures. Terrell (as cited in Kost et al., 1999) suggested that combining the form and a visual representation of unknown L2 vocabulary helped learners acquire concrete concepts and referents. In reviewing the techniques used in learning L2 vocabulary, Oxford and Crookall (1990) emphasized the effectiveness of visual imagery and maintained that “most learners link new information to notions in memory by means of meaningful visual images, and that visual images make learning more influential” (p. 17) and “the pictorial-verbal combination contains many sections of the brain, thus providing greater cognitive power” (p. 17).

Omaggio’s (1979) study provided students of French as a second language with pictorial contexts as advance organizers. It was hypothesized that providing a visual context would facilitate reading comprehension. The results showed that students with a pictorial context did significantly better on the recognition and recall tests than those with access only to text. This provided evidence of the positive effect of pictures on reading comprehension.

In annotation studies, picture annotation has been used to clarify the meaning of unknown words that second language learners encounter in reading. According to dual coding theory, the way learners comprehend pictures differs greatly from the way they comprehend textual information (Paivio, 1990): text is processed by the verbal cognitive subsystem, while a picture is processed by the non-verbal cognitive subsystem. Research has compared L2 vocabulary learning from text annotation, picture annotation, and a combination of text and picture annotation.

Audio Annotation

It is worth noting that little research has been done on audio annotation. Audio annotation provides the pronunciation, an example sentence, and the definition or meaning of a target word in spoken form. It has never been studied separately from other annotation modes, but mostly as an add-on component, and the only format of audio annotation that has been studied is the pronunciation of target words. Findings on audio annotation are mixed and inconclusive.

Svenconis and Kerst (1995) investigated the effectiveness of semantic mapping techniques in L2 vocabulary learning in a hypertext environment. The participants (N=48) were English-speaking high school students in grades 9 through 12 learning Spanish as a second language. The 72 target words were presented either as word lists or as semantic maps. In the multiple-choice vocabulary post-test, no significant effect was found for the word presentation method, which suggested that semantic mapping does not necessarily lead to better vocabulary retention than traditional word listing. However, the semantic-mapping-with-sound group produced the highest overall mean score of the four groups.

Chun and Plass (1996) challenged the positive effect of audio annotation. In their studies, an audio component, in which a native German speaker pronounced each target word, was added to three different annotation types (text, text-picture, and text-video). In the first and second of their three successive studies, participants were asked to report their use of retrieval cues for vocabulary learning. The authors suggest that the audio component was not useful for learning vocabulary, since it played a very limited role as a retrieval cue.


Methodology

Research Design

Training program. The interactive multimedia program used in this study was designed by the researcher to help intermediate EFL students with vocabulary learning. The program provided students with annotations for unknown words via hypermedia links in two different modes: text-picture and audio-picture. The annotations were used to assist the learning of unknown words. The program was written in HTML, which was chosen because it integrates hypermedia easily and is compatible across PC platforms. The picture annotations were processed with Adobe Photoshop 6.0 (Adobe, 2000), and the audio clips were processed with Vegas 4.0 (Sonic Foundry, 2003). The screen was divided into two frames: the left frame displayed the reading text with the title at the top, and the right frame was reserved for the annotation. In the text-picture version, when participants clicked on a highlighted word, the right frame displayed a textual definition of the word together with a picture depicting it. In the audio-picture version, when participants clicked on a highlighted word, they could see in the right frame a picture depicting the meaning of the word and hear an audio clip explaining its meaning.
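To make the two-frame design concrete, the Python below sketches how a reading page with highlighted, clickable target words might be generated. It is an illustrative sketch, not the program used in the study (which was hand-written HTML): the frame names, the `annotations/<mode>/<word>.html` file layout, and the example words are hypothetical. Each link uses the standard HTML `target` attribute to load its annotation page into the right-hand frame.

```python
import html
import re

# Hypothetical two-frame page: reading text on the left, annotations on the right.
# (Frames were common in early-2000s courseware; HTML5 no longer supports them.)
FRAMESET = """<html>
<frameset cols="50%,50%">
  <frame src="reading.html" name="reading">
  <frame src="blank.html" name="annotation">
</frameset>
</html>"""

MODES = ("text-picture", "audio-picture")


def annotate_text(text, targets, mode):
    """Return the reading text as HTML with each target word linked to its annotation.

    Clicking a link loads a (hypothetical) annotation page into the frame named
    "annotation": a definition plus a picture for text-picture, or a picture
    plus an audio clip for audio-picture.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    escaped = html.escape(text)  # escape the prose before inserting markup
    if not targets:
        return escaped
    # Whole-word, case-insensitive match for any target word.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in targets) + r")\b", re.IGNORECASE
    )

    def link(match):
        word = match.group(0)
        page = f"annotations/{mode}/{word.lower()}.html"
        return f'<a class="target" href="{page}" target="annotation">{word}</a>'

    return pattern.sub(link, escaped)


print(annotate_text("The convicts built a colony.", ["convicts", "colony"], "audio-picture"))
```

Generating both versions from one source text with only the `mode` switched keeps the reading passage and highlighted words identical across groups, which is what the comparison requires.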

Procedure. The study was conducted during the participants’ regular class time and required two consecutive 50-minute sessions. The participants were randomly assigned to a control group and two experimental groups (text-annotation and audio-annotation). In the first 50-minute session, the researcher gave a brief introduction to the study and answered the participants’ questions. Neighboring students were given access to different annotations, one to text-picture and the other to audio-picture annotations. In the computer lab, the researcher briefly introduced the online reading activity. Participants in the audio-picture annotation group used headsets. During reading, the participants clicked on the highlighted unknown words to access the available annotations. The text-picture group could see a textual explanation and a pictorial description, and the audio-picture group could see a pictorial description and hear a spoken explanation. When they finished reading, they raised their hands to receive the posttests.

Participants

The participants in the study were selected from two intact classes consisting of 80 students studying English at an advanced level in Bandar Abbas, Iran. They had a mean age of 24 and were majoring in English Translation. Their level of English proficiency was determined on the basis of their scores on the PET proficiency test; participants whose scores fell within one standard deviation above or below the mean were retained as the main participants. In total, 62 students were selected for the main procedure and data analysis. They were then randomly assigned to two experimental groups (a text-annotation group and an audio-annotation group) and a control group. Students who were absent for one of the tests were excluded, leaving 38 participants in the two experimental groups combined and 20 in the control group.

Data Collection Instruments 

General English Proficiency Test. The PET proficiency test was used to assess the participants’ level of proficiency in English. The test included 30 multiple-choice vocabulary, grammar, and reading comprehension items. The researcher piloted the test with 27 students of the same level and with characteristics similar to those of the participants in this study. The reliability of the PET proficiency test, estimated with the Kuder-Richardson formula 21 (KR-21), was .69.
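KR-21 needs only the number of items k, the mean M, and the variance s² of the total scores: r = k/(k − 1) × [1 − M(k − M)/(k·s²)]. The sketch below computes it for hypothetical score data. Whether the population or the sample variance was used in the pilot is not stated, so the population variance here is an assumption.

```python
from statistics import mean, pvariance


def kr21(total_scores, k):
    """Kuder-Richardson formula 21 reliability for a test of k dichotomous items.

    total_scores: each test taker's number-correct score (0..k).
    KR-21 assumes all items are equally difficult, so it is never higher than
    KR-20 and tends to underestimate reliability when difficulties vary.
    The population variance is used (an assumption; some texts use n - 1).
    """
    if k < 2:
        raise ValueError("KR-21 needs at least two items")
    m = mean(total_scores)
    var = pvariance(total_scores)
    if var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


# Hypothetical 30-item test: mean 20, variance 25 -> r is about .76
print(round(kr21([15, 25, 15, 25], 30), 2))
```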

Reading material. The reading text, “European Settlers of Australia,” was written by the researcher based on three criteria: text length, syntactic complexity, and content. In terms of length, the text has 449 words (including the title). It consists of short, uncomplicated sentences, and the simple past tense is used throughout. Paragraphs contain an average of 6.8 sentences, and the average sentence contains 10.8 words. Over 80% of the sentences in the text are simple sentences. With regard to content, it seems reasonable to assume that the participants had roughly the same amount of general knowledge about the European colonization of Australia and comparable background knowledge for the reading text, since none had been to Australia and its history was foreign to all of them. The content of the text does not require any specific culturally related knowledge. The text tells the story of the European colonists in Australia in the 1800s, and its readability is estimated to be between grade levels 5 and 6 on the Flesch-Kincaid measure. The text was given to experienced EFL instructors who teach reading/writing classes, and they confirmed that it was appropriate for advanced students. The students’ cloze score of 67% indicated that the reading text was appropriate for advanced students in terms of difficulty level.
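The Flesch-Kincaid Grade Level combines average sentence length and average syllables per word: FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The article does not report a syllable count, so the sketch below only implements the formula and, as a rough consistency check, inverts it to find the syllables-per-word range implied by the reported 10.8 words per sentence and the grade 5-6 rating. Counting syllables reliably needs a pronunciation dictionary or a tested heuristic, which is left out here.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def implied_syllables_per_word(grade, words_per_sentence):
    """Invert the formula: syllables per word needed to reach a given grade."""
    return (grade - 0.39 * words_per_sentence + 15.59) / 11.8


# With 10.8 words per sentence, grades 5 and 6 correspond to
# roughly 1.39 and 1.47 syllables per word.
print(round(implied_syllables_per_word(5, 10.8), 2), round(implied_syllables_per_word(6, 10.8), 2))
```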


Target words. The 20 target words were all nouns and were selected on the basis of frequency. According to the word frequency counts of Francis and Kucera (1982), the 20 target words have a mean frequency of 12.7 occurrences per million words. The reading text was prepared in two versions: one with text-picture annotations and one with audio-picture annotations. The 20 target words were highlighted in both versions.
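Word frequencies from corpora of different sizes are usually compared per million words. A minimal sketch of that normalization with hypothetical counts (Francis and Kucera’s counts come from the Brown Corpus of roughly one million words, so their raw and per-million figures are nearly the same):

```python
from statistics import mean


def per_million(count, corpus_size):
    """Normalize a raw corpus count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def mean_frequency(counts, corpus_size):
    """Mean per-million frequency for a list of target-word counts."""
    return mean(per_million(c, corpus_size) for c in counts)


print(per_million(25, 2_000_000))  # 12.5
```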

Word Recognition Test (WRT). The participants were asked to complete a Word Recognition Test (WRT) as a pretest at the beginning of the study. In this test, the 20 target words were presented in their original context, taken from the reading text. For each word, the participants were asked to choose the correct meaning from four choices: one correct meaning and three distractors.
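Scoring the WRT amounts to counting matches with the answer key. With four options per item, blind guessing alone yields an expected score of 20/4 = 5 items, a baseline for interpreting WRT scores. The sketch below assumes responses are recorded as option labels; the labels and answer key are hypothetical.

```python
def score_wrt(responses, key):
    """Number of items answered correctly on a multiple-choice recognition test.

    responses and key are equal-length sequences of option labels (e.g. "a"-"d");
    an unanswered item can be recorded as None and simply scores zero.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(r == k for r, k in zip(responses, key))


def chance_score(n_items, n_options):
    """Expected score from blind guessing: n_items / n_options."""
    return n_items / n_options


print(chance_score(20, 4))  # 5.0 of 20 items
```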

Data Analysis and Interpretation 

Research question 1: Does text-picture annotation play any 
significant role in facilitating L2 vocabulary immediate recall among 
Iranian EFL learners?

To determine whether a parametric test such as the t-test could be used, we first checked whether the data were normally distributed. If the significance level (p value) of the normality test is greater than 0.05, the assumption of normality is not rejected, and parametric tests can be used for further analysis.
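The normality check can be sketched with the Python standard library alone. The function below follows the classic one-sample Kolmogorov-Smirnov procedure against a normal distribution whose mean and SD are estimated from the sample, roughly as SPSS’s one-sample K-S test does; the asymptotic p value is an approximation (this is not the Lilliefors-corrected test) and the data are hypothetical. A p value above .05 means only that normality is not rejected, not that it is proven. With SciPy available, `scipy.stats.kstest(data, "norm", args=(m, s))` gives a comparable test.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normality(sample):
    """One-sample Kolmogorov-Smirnov test against a normal distribution.

    The normal's mean and SD are estimated from the sample itself. Returns the
    KS statistic D and an asymptotic p value (Stephens' small-sample
    adjustment, as in Numerical Recipes). Because the parameters come from the
    same data, this p value is conservative; Lilliefors' test corrects that.
    """
    xs = sorted(sample)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    ref = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = ref.cdf(x)
        # Largest gap between the empirical CDF (just above / below x) and the normal CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d
    if lam < 0.2:
        return d, 1.0  # the alternating series converges slowly here, and Q(lam) is ~1
    p = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


d, p = ks_normality([8, 9, 10, 10, 11, 11, 11, 12, 12, 13, 14])
print(round(d, 3), p > 0.05)
```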

Table 1. One-Sample Kolmogorov-Smirnov Test for text-annotation, 
audio-annotation and control group


As Table 1 shows, the p values of the normality test for the three groups (.229, .112, and .123) are greater than the significance level (0.05). Therefore, the assumption of normality can be accepted, and paired-samples t-tests can be used to compare pretest and posttest results in the text-annotation, audio-annotation, and control groups.

Table 2. Paired-samples t-test for pre- and posttest vocabulary knowledge scores in the text-annotation and control groups

As Table 2 shows, there is a significant difference between pretest and posttest scores in the text-picture annotation group (t = -20.407, p < .001). In other words, participants scored higher on the posttest (M = 10.39, SD = 1.680), after being exposed to text annotation during reading, than on the pretest (M = 4.33, SD = 1.680). Therefore, the first null hypothesis (text-picture annotation does not play any significant role in facilitating L2 vocabulary immediate recall among Iranian EFL learners) is rejected. In other words, text-picture annotation can play a significant role in learning new vocabulary while reading a text. In the control group, there is no significant difference between the students’ pretest and posttest vocabulary knowledge (t = 1.234, p = .232).
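The paired-samples t statistic is the mean of the pre-post differences divided by its standard error: t = d̄ / (s_d / √n), with n − 1 degrees of freedom. Below is a minimal sketch with hypothetical scores. Differences are taken as pretest minus posttest, which is why a gain produces a negative t like those reported here. The p value needs the t distribution (for example via `scipy.stats.ttest_rel`), which the standard library lacks.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom (differences = pre - post)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    d = [a - b for a, b in zip(pre, post)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))  # fails if all differences are identical
    return t, n - 1


t, df = paired_t([4, 5, 3, 4], [10, 11, 9, 12])
print(t, df)  # -13.0 3
```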

Research question 2: Does audio-picture annotation play any
significant role in facilitating L2 vocabulary immediate recall among 
Iranian EFL learners?


Table 3. Mean pre- and posttest vocabulary knowledge scores for the audio-picture annotation and control groups

As Table 4 shows, there is a significant difference between pretest and posttest scores in the audio-picture group (t = -16.496, p < .001) after exposure to audio-picture annotation during reading. Table 3 further shows that students scored higher on new words after being exposed to audio-picture annotation (posttest) than before (pretest) (M = 12.55 and 4.25, respectively). Therefore, the second null hypothesis (audio-picture annotation does not play any significant role in facilitating L2 vocabulary immediate recall among Iranian EFL learners) is also rejected. In other words, audio-picture annotation can play a significant role in increasing adult EFL learners’ vocabulary knowledge. As for the control group, Table 4 shows no significant difference in students’ vocabulary knowledge after reading the text without any annotation (t = 1.234, p = .232).

Table 4. Paired sample test for pre- and posttest vocabulary knowledge 
in audio-picture and control group


Research question 3: Is there any significant difference between 
the effect of text-picture and audio-picture annotations in facilitating 
L2 vocabulary immediate recall among Iranian EFL learners?

To answer the third question, the vocabulary posttest scores of the text-picture, audio-picture, and control groups were compared with a one-way ANOVA to see whether there was any significant difference among the three groups at the posttest stage. The following tables show the results:

The results of the ANOVA in Table 5 indicate a statistically significant difference among the text-picture, audio-picture, and control groups on the posttest (F = 181.376, p < .001). Therefore, the third null hypothesis (there is no significant difference between the effect of text-picture and audio-picture annotations in facilitating L2 vocabulary immediate recall among Iranian EFL learners) is rejected; the post hoc comparisons locate where the differences lie.
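The one-way ANOVA F ratio compares the variability between group means with the variability within groups: F = MS_between / MS_within, with k − 1 and N − k degrees of freedom. A minimal standard-library sketch with hypothetical data follows; the p value would again require the F distribution (for example `scipy.stats.f_oneway`, which returns both F and p).

```python
from statistics import mean


def one_way_anova(*groups):
    """One-way ANOVA F ratio with its between- and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # F = 27 with (2, 6) df
```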

Table 5. Results of ANOVA for mean posttest scores of samples in text-
annotation, audio-annotation, and control group

 
To locate the differences, a post hoc Scheffe test was conducted (see Table 6). It showed that the audio-picture group performed significantly better than the text-picture group (M = 12.55 vs. M = 10.39), and that the text-picture group performed significantly better than the control group (M = 10.39 vs. M = 3.78). Thus, the audio-picture group obtained significantly higher posttest scores than both the text-picture and control groups. Audio-picture annotation was therefore the more effective of the two annotation types for learning new words while reading a text.
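For a pairwise Scheffe comparison after a one-way ANOVA, the squared mean difference is scaled by MS_within and the group sizes, then divided by k − 1: F_s = (M_i − M_j)² / [MS_within (1/n_i + 1/n_j)(k − 1)]. The pair differs significantly when F_s exceeds the critical F for (k − 1, N − k) degrees of freedom. A minimal sketch with hypothetical data:

```python
from statistics import mean


def scheffe_pairwise(groups, i, j):
    """Scheffe F statistic for comparing group i with group j.

    Compare the result with the critical F at (k - 1, N - k) degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Within-group mean square from the omnibus one-way ANOVA.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / (n_total - k)
    gi, gj = groups[i], groups[j]
    diff = mean(gi) - mean(gj)
    return diff ** 2 / (ms_within * (1 / len(gi) + 1 / len(gj)) * (k - 1))


toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(scheffe_pairwise(toy, 0, 1))
```

For the toy data, F_s = 6.75 for groups 0 and 1, which exceeds the .05 critical value F(2, 6) ≈ 5.14, so that pair would be judged significantly different.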


Table 6. Post hoc Scheffe Test

Results

On the basis of the quantitative analyses, annotation provides an efficient way for learners to expand their vocabulary knowledge. Annotation can promote noticing of the target form and support its semantic processing.

The first and second research questions addressed the effects of 
text-type and audio-type annotations. On average, participants retained 
70% of the 20 target words on the Vocabulary Knowledge Scale. The average 
retention rates were comparable to those of previous multimedia annotation 
studies (e.g., Al-Seghayer, 2001; Yoshii, 2000). The results confirm that 
annotation helps second language vocabulary learning. The main reason 
may be the dual-coding effect described by Paivio (1990): words annotated 
with both verbal (text or audio) and non-verbal (picture) information are 
retained more effectively.

This study was designed to compare the effectiveness of text-picture 
annotation with that of audio-picture annotation on immediate L2 
vocabulary recall. As shown in Table 6, the audio-picture annotation 
group outperformed the text-picture annotation group. The dual-channel 
assumption of the cognitive theory of multimedia learning, especially 
the modality principle, can be used to explain this finding (Mayer, 
2001). Mayer distinguishes two separate channels for processing 
visual/pictorial and auditory/verbal information. The modality effect 
holds that working memory has partially independent processors for 
handling visual and auditory information, so the effective capacity of 
working memory can be increased by using both the visual and the 
auditory channels (Mayer & Moreno, 1998).

Text annotation and audio annotation both present verbal information; 
thus, when paired with a picture, both annotation types combine verbal 
and non-verbal information. Based on the modality principle (Baddeley, 
1999; Mayer, 2001), text annotation and picture annotation will be
processed by the visual channel, while audio annotation will be processed 
by the auditory channel. Therefore, in text-picture annotations, text and 
picture were registered simultaneously by the visual channel, which may 
have overloaded it. Information processing was thus, at least initially, 
carried out solely in visual working memory: the cognitive resources 
available there had to be divided between textual and pictorial 
information, whereas auditory (phonological) working memory was left 
unused.

In comparison, in audio-picture annotations, the audio was 
registered by the auditory channel and processed in the phonological 
working memory, while the picture was registered by the visual 
channel and processed in the visual working memory. This combination 
allowed cognitive resources in both working memories to be used. In 
other words, more cognitive resources were utilized in audio-picture 
annotations than in text-picture annotations.

The advantage of audio-picture annotation for immediate L2 vocabulary 
recall can also be explained by the split-attention principle 
(Mousavi, Low, & Sweller, 1995). Participants with access to text-picture 
annotations had to split their attention in visual working memory 
between two visual sources (written text and picture). Participants 
with access to audio-picture annotations processed the audio through 
auditory working memory and the picture through visual working memory, 
so neither working memory required an attention split. In this way, 
effective working memory capacity may be increased by presenting 
information in a mixed (visual and auditory) rather than a unitary 
(visual-only) mode. Hence, audio-picture annotation resulted in higher 
immediate vocabulary recall than text-picture annotation.

Conclusions

Previous studies have examined the effects of multimedia 
annotations on L2 vocabulary learning and have supported their 
effectiveness in facilitating it. However, to the authors' knowledge, 
no study in second language acquisition has examined audio annotation 
in combination with pictures as a dual multimedia annotation type. This 
study addressed this issue by comparing audio-picture annotation with 
text-picture annotation in their effects on immediate L2 vocabulary 
recall.

The results of the study demonstrate that audio-picture annotation is 
more effective than text-picture annotation in facilitating L2 vocabulary
immediate recall. The results also suggest that annotating new words, 
whether with audio or text, during reading comprehension can help 
learners recall them. Previous research on different ways of presenting 
word meanings, however, has produced inconsistent results. For example, 
McKeown (as cited in Read, 2004) suggests that current dictionary 
definitions are not effective even in initiating the process of 
understanding word meaning, at least for younger learners. By contrast, 
Nagy and Scott (as cited in Read, 2004) identified a chief strength of 
definitions: they provide explicit information about word meanings that 
is normally only implicit in context. Therefore, if a student is to learn 
a word, giving its specific meaning may provide the best chance of 
mastering it. It is possible that older students have a better 
understanding of how explicit definitions work and of how to transfer 
the meaning to other contexts.

It is important to note that students will need to be prepared to 
read texts in which contextual clues are weak or insufficient and still 
use those clues to unlock the meaning of new words. This study suggests 
the need to allow more instructional time to support different types of 
annotation and to identify stories with well-developed clues so that 
students can develop a repertoire of strategies for unlocking the 
meaning of words in the different contexts in which they are 
encountered.

It is hoped that the findings of this study will shed some light on 
the unresolved issues surrounding text annotation and audio annotation 
and their effect on vocabulary learning during reading. The findings 
also have a number of theoretical and practical implications. Firstly, 
this study adds to the growing body of research on multimedia 
annotation in second language acquisition. Previous multimedia 
annotation studies have focused on comparing text-picture annotation 
with text-only or picture-only annotation (Yoshii, 2000) or on the 
differences between text-picture annotation and text-video annotation 
(Al-Seghayer, 2001). However, audio annotation, a sensory modality 
distinct from the visual modalities of text and picture, has not been 
examined in these comparisons. The present study fills this gap in the 
literature.

This study provides much-needed information on the effect of audio 
annotation on L2 vocabulary learning. By comparing audio-picture 
annotation with text-picture annotation, it sheds light on the use of 
different dual annotations for multimedia L2 learning. The study has 
shown that audio-picture annotation is superior to text-picture 
annotation in facilitating immediate L2 vocabulary recall. This 
contributes to the extension of the cognitive theory of multimedia
learning to second language learning by supporting both the modality 
effect and the split-attention effect. 

In addition to its contributions and implications for the field 
of second language acquisition, especially for multimedia annotation 
research, this study offers some insights for CALL material designers 
in choosing the right combination of modalities to facilitate L2 
vocabulary learning. It found that audio-picture combinations 
facilitate immediate L2 vocabulary recall more effectively than 
text-picture annotation. When designing multimedia courseware or 
materials, developers could take this finding into consideration when 
deciding how to present information in different modes. It could also 
help language teachers and administrators decide on the most effective 
multimedia programs for enhancing L2 vocabulary learning.

References

Al-Seghayer, K. (2001). The effect of multimedia annotation modes on 
L2 vocabulary acquisition: A comparative study. Language Learning 
& Technology, 5(1), 202-232.

Baddeley, A. D. (1999). Working memory (Oxford Psychology Series, 
No. 11). New York: Oxford University Press.

Chun, D. M., & Plass, J. L. (1996). Effects of multimedia annotations 
on vocabulary acquisition. The Modern Language Journal, 80(2), 
183-198.

Gass, S. (1999). Incidental vocabulary learning. Studies in Second 
Language Acquisition, 21(2), 319-333.

Hulstijn, J. H. (2001). Intentional and incidental second language 
vocabulary learning: A reappraisal of elaboration, rehearsal and 
automaticity. In P. Robinson (Ed.), Cognition and second language 
instruction (pp. 258-286). Cambridge: Cambridge University Press.

Hulstijn, J. H., Hollander, M., & Greidanus, T. (1996). Incidental 
vocabulary learning by advanced foreign language students: The 
influence of marginal glosses, dictionary use, and reoccurrence of 
unknown words. The Modern Language Journal, 80(3), 327-339.

Jacobs, G. M., Dufon, P., & Hong, F. C. (1994). L1 and L2 vocabulary 
glosses in L2 reading passages: Their effectiveness for increasing 
comprehension and vocabulary knowledge. Journal of Research in 
Reading, 17(1), 19-28.

Kellogg, K. S., & Howe, A. J. A. (1971). Using words and pictures in 
foreign language learning. Alberta Journal of Educational Research, 
17, 87-94.

Ko, M. H. (1995). Glossing in incidental and intentional learning foreign 
language vocabulary and reading comprehension. Unpublished MA 
thesis, University of Hawaii at Manoa.

Kost, C. R., Foss, P., & Lenzini, J. J., Jr. (1999). Textual and pictorial 
glosses: Effectiveness on incidental vocabulary growth when reading 
in a foreign language. Foreign Language Annals, 32(1), 89-113.

Laufer, B., & Shmueli, K. (1997). Memorizing new words: Does 
teaching have anything to do with it? RELC Journal, 28, 89-108.

Mayer, R. E. (1997). Multimedia learning: Are we asking the right 
questions? Educational Psychologist, 32, 10-19.

Mayer, R. E., & Moreno, R. (1998). A split attention effect in multimedia 
learning: Evidence for dual processing systems in working memory. 
Journal of Educational Psychology, 90, 312-320.

Mousavi, S., Low, R., & Sweller, J. (1995). Reducing cognitive load 
by mixing auditory and visual presentation modes. Journal of 
Educational Psychology, 87, 319-334.

Nation, I. S. P. (2001). Learning vocabulary in another language. 
Cambridge: Cambridge University Press.

Omaggio, A. C. (1979). Pictures and second language comprehension: 
Do they help? Foreign Language Annals, 12(2), 107-116.

Oxford, R., & Crookall, D. (1990). Vocabulary learning: A critical 
analysis of techniques. TESL Canada Journal, 7(2), 9-30.

Paivio, A. (1990). Mental representations: A dual coding approach. 
Oxford, UK: Oxford University Press.

Read, J. (2004). Research in teaching vocabulary. Annual Review of 
Applied Linguistics, 24(1), 146-161.

Rott, S., Williams, J., & Cameron, R. (2002). The effect of multiple-
choice L1 glosses and input-output cycles on lexical acquisition and 
retention. Language Teaching Research, 6(3), 183-222.

Svenconis, D. J., & Kerst, S. (1995). Investigating the teaching 
of second-language vocabulary through semantic mapping in a 
hypertext environment. CALICO Journal, 12(2/3), 33-57.

Zimmerman, C. B. (1997). Do reading and interactive vocabulary 
instruction make a difference? An empirical study. TESOL Quarterly, 
31(1), 121-140.

Authors 

*Alireza Karbalaei earned his PhD in TEFL from Mysore 
University in India. He is a faculty member of Farhangian 
University in Iran and Head of the English Department at 
the Kish International Branch. He serves on the editorial boards 
of several international journals. His main research 
areas include reading strategies, affective variables, language 
acquisition and learning, TEFL, and TESL. Dr. Karbalaei has 
published 60 papers on these subjects in various journals.

 
*Ali Sattari obtained his MA in TEFL from the Qeshm 
International Branch. He has been teaching English at high 
schools in Hormozgan Province.

*Ziba Nezami obtained her MA in TEFL from Maraghe Azad 
University. She is the translator for the Nursing and Midwifery 
Department of Tabriz University. She currently teaches at 
different English language institutes in Tabriz. 
