ENGLISH REVIEW: Journal of English Education  p-ISSN 2301-7554, e-ISSN 2541-3643  

Volume 10, Issue 2, June 2022  https://journal.uniku.ac.id/index.php/ERJEE 


HOW TO TEACH ENGLISH CONVERSATION? AN IMPLEMENTATION OF A MULTIMODAL DISCOURSE ANALYSIS THROUGH IMAGES
 

Partohap Saut Raja Sihombing
Faculty of Teacher Training and Education, Universitas HKBP Nommensen Pematangsiantar, Indonesia
Email: partohap.sihombing@uhn.ac.id

Herman (Corresponding author)
Faculty of Teacher Training and Education, Universitas HKBP Nommensen, Medan, Indonesia
Email: herman@uhn.ac.id

Nanda Saputra
STIT Al-Hilal Sigli, Indonesia
Email: nandasaputra680@gmail.com

 
APA Citation: Sihombing, P. S. R., Herman, & Saputra, N. (2022). How to teach English conversation? An implementation of a multimodal discourse analysis through images. English Review: Journal of English Education, 10(2), 431-438. https://doi.org/10.25134/erjee.v10i2.6244

 
Received: 27-02-2022   Accepted: 24-04-2022   Published: 30-06-2022

 
Abstract: The purpose of this research is to identify the role of images in teaching English conversation. Rapid technological developments increasingly highlight the use of multimodality theory. Multimodality in this case has a metafunction. It is now used as a new learning resource in the learning process, and it serves as an evolving approach to knowledge in visual or image sources. This study used a qualitative descriptive method. A multimodal analysis was carried out on pictures of English conversations, and these images were then used in learning about conversations in English. Three steps, namely a representational metafunction, an interpersonal metafunction, and a compositional metafunction, were used to analyze the multimodality of images used as learning media in learning English. The results show an increase in students' understanding when images are used as learning media for conversational material in English. Based on the analysis of the three multimodal components, namely the representational, interpersonal, and compositional, it is possible to conclude that the image of the English conversation employed is adapted to the characteristics of the child as a learner. Children enjoy animated characters, and the use of color and animation in drawings is designed to pique pupils' interest in participating in learning by displaying images of conversations and encouraging them to practice them.

Keywords: multimodal analysis; English conversation; learning; images.

INTRODUCTION

Language is a distinctively human instrument for conveying thoughts, feelings, and explanations. In other words, people cannot be separated from language, because every human activity requires language as a vital part of carrying on with life (Herman, Van Thao, & Purba, 2021; Van Thao & Herman, 2021). Language is not only verbal, namely spoken and written language, but also nonverbal, such as movement, sound, objects, and colors. In communication, these two kinds of language play nearly balanced roles, because relying on verbal language alone and disregarding nonverbal language limits understanding (Purnaningwulan, 2015). As confirmed by Hutabarat, Herman, Silalahi, & Sihombing (2020), verbal language alone, without movement, sound, color, and material objects, restricts one's understanding of the complexity of an interaction and its interactional meaning, and can limit communication. To convey a message, people use various means, such as speeches, announcements, signs or symbols, advertisements, and so on. All of these activities require a tool, namely language. In interpersonal interaction, Sinar states that three important elements take part in communication, namely the verbal element, sound (spoken language) or graphic form (written language), and visuals. Verbal language is spoken and written language; the output of spoken language is sound, and the output of writing is graphic form (Anstey & Bull, 2019). The visual element is nonverbal language, which includes gestures, body language, and so on. The three elements of interpersonal interaction sometimes play roles of different weight and sometimes play balanced roles in conveying messages. Reading skills involve the graphic element, that is, written language (Martinec, 2015). Mastering the skills needed to understand the message in a reading text is important. Snow states that children who have reading difficulties not only tend to struggle throughout their school careers but also experience difficulties in employment, in social functioning in society, and in other aspects of daily life. Consequently, understanding the multimodal approach in language skills, particularly in understanding texts, is essential to master, especially in learning (Purba & Herman, 2020). Multimodal learning can be applied to students who already understand technology as part of everyday life (Ngongo & Ngongo, 2022).

Multimodal is characterized as all verbal and visual semiotic resources that can be used to understand the types and levels of dialogic engagement in a textbook (Herman, Murni, Sibarani, & Saragih, 2019). In the context of text analysis, multimodal analysis is understood as an analysis that combines the tools and steps of linguistic analysis, such as systemic functional linguistics (SFL) or functional grammar, with analytical tools for understanding images, when the text being analyzed uses two modes, verbal and image (Juliswara, 2017). Multimodality is not a new phenomenon. Baldry and Thibault observe that we live in a multimodal society. People today experience the world multimodally and, in turn, make meaning from their experiences multimodally, using language, images, gestures, actions, sounds, and other resources. Bilfaqih & Qomarudin (2017) explain that, in practice, texts of all kinds are always multimodal, using and combining the resources of different semiotic systems to facilitate both generic (i.e., standardized) and specific (i.e., individualized, and even innovative) ways of making meaning. Technology, by making text production relatively easy and text consumption ubiquitously accessible, also accentuates the multimodal nature of text.

This study of teaching English conversation through images based on Multimodal Discourse Analysis has never been conducted by other researchers before. Nevertheless, the researchers looked for previous studies related to this research. A previous study on increasing students' conversation skills was conducted by Syafiq, Rahmawati, Anwari, & Oktaviana (2021). That research discussed the use of YouTube videos as one alternative solution for teaching speaking during the pandemic. It sought to discover how YouTube videos can be used to improve students' speaking skills, as well as how the teaching and learning process using YouTube videos is implemented in the classroom. In 2020, first-semester students at Muhammadiyah University of Kudus participated in a classroom action research study. The population consisted of all non-English programs, and the sample consisted of 85 students from redundant classes who were chosen via purposive sampling. The data were collected using a speech evaluation and an interview, and were then analyzed using the constant comparative method and descriptive statistics. The study found that using YouTube videos as English learning material enhanced students' speaking skills in terms of fluency, vocabulary, pronunciation, grammar, and content. As a result, it is possible to conclude that the use of YouTube videos might improve students' speaking skills during online learning in the Covid-19 pandemic. Further research could focus on the use of YouTube videos to teach other English skills such as reading and writing.

 
METHOD 
The method used in this research is qualitative and descriptive in nature. According to Purba, Sibarani, Murni, Saragih, & Herman (2022), a research method with a qualitative approach uses data in the form of written, spoken, or pictorial descriptions of an individual, a phenomenon, or a symptom of a group, with various dimensions that can be observed by researchers (Sihombing, Silalahi, Saragih, & Herman, 2021; Simajuntak, Napitupulu, Herman, Purba, & Thao, 2021). Kress and van Leeuwen (2016) provide a method for examining advertising images using three meta-semiotic steps (Ngongo, 2021). The three steps consist of a representational metafunction, an interpersonal metafunction, and a compositional metafunction.

In this study, the researchers analyzed the multimodality of images used as learning media in learning English in the following stages:

The first step is to analyze the representational metafunction of the image, which is presented in animated form and contains a conversation in English, as a medium for learning English.

 
Picture 1. Animated form of English conversation 

The second step is to analyze the interpersonal metafunction in the conversation picture by paying attention to the form of the picture used.

The third step is to analyze the compositional metafunction in the English conversation picture, which can be used as a medium for learning about English conversation.

 
Picture 2. Conversation picture 

The fourth step is to conclude the meaning of the discourse conveyed through the images used in learning.

The fifth step is to determine from the data whether there are differences in students' understanding of the material when pictures are used.


RESULTS AND DISCUSSION 

Results 

From the analysis of the three social semiotic aspects, namely the representational, interpersonal, and compositional, it can be stated that the picture of the English conversation used is adapted to the characteristics of the child as a learner. Children like animated characters, and the use of color and animation in the pictures is intended to attract students' interest in participating in learning, looking at pictures of conversations, and practicing them.

The following is a simple example of how the analytical steps described above can be applied. The analysis was carried out on two pictures (1 and 2) that serve as illustrations and were taken from an English book for elementary schools published by a publisher in Bandung.

Figure 1. (Description/identification) This image shows a kitchen. There are two participants in the image. The first is a child who says “This is the kitchen. My mom is cooking”, with outstretched arms, wearing a blue shirt with an orange collar and brown trousers. The child is positioned on the right of the image.

 
 Picture 3. Illustration example 

His gaze, although not very clear, is turned to the other participant, a mother who is cooking. The mother, on the left of the image, is standing slightly sideways, and her gaze is directed not at the child but at what is in front of her. In front of the mother are a frying pan and a kettle. The frying pan, kettle, table, and other objects form the setting in which the process takes place. (Analysis/significance) The vector in this image is realized through the eyes, moving from the child to the mother. Thus the child becomes a 'reacter' while the mother becomes a phenomenon: someone whose activities are described.

According to Unsworth, drawing on Kress and van Leeuwen (2016), when a vector is formed by the eye line or gaze of one or more 'participants' looking at something, the process that occurs is seen as a reaction rather than an action, and the 'participant' is called a 'reacter', not an 'actor'.

The objects in this image are depicted at a small size, and the 'medium close shot' places us as 'viewers' slightly above the image; when we look at the picture, it is as if we are looking down a little. This means that as 'viewers' we have more power than the 'represented participants' in the picture. The 'medium close shot' also means that we have only a social relation with the 'represented participants', not a very close one (Chen, 2021). We know them as we know people in general; they are not among those closest to us. This further means that the 'represented participants' stand for a general picture of mothers and of the activities commonly attached to the domestic role of mothers in the social practices of our society.

The gazes of the two participants are directed not at us, the viewers, but elsewhere, so there is no contact between them and us. When there is no eye contact between the 'represented participants' and the 'viewers', the 'viewers' are placed as observers; the picture does not 'demand' anything of us but 'offers' something to us. As observers, we are offered the chance to observe what is happening in the kitchen. The 'participants' are small in size, as is the frame. This gives us, as 'interactive participants', greater power over the 'represented participants'. From a compositional layout point of view, the mother is on the left of the image, while the child is on the right.

According to Kress and van Leeuwen (2016), what is placed on the left of the image is 'given', while what is on the right is 'new'. In the picture above, the mother and what she does are 'given', while the child is 'new'. In this context, it should be explained that the 'given-new' composition applies to cultures in which the reading flow moves from left to right, while for cultures that use a right-to-left or top-down reading flow, such as Arabic and Chinese cultures, the 'given-new' composition cannot be used. Such cultures use compositional layout differently. This form of composition has been criticized, among other things, for overgeneralizing across all cultures. However, it can be applied to a reading and writing culture that moves from left to right, such as Indonesia's. In the analyzed image, 'given' means that the mother, and especially what the mother does, is something normal, natural, and as it should be, while what is 'new', and therefore more important, is what the child does. Blue, the color of the child's clothes, is a soft color and is usually considered to represent calm. This is because blue is often associated with the color of the sea or mountains (Russell & Norvig, 2020). The mother's clothes are orange 'wrapped' in brown. Orange is a 'warm' color, usually considered to be a symbol of passion, while brown, the color that 'wraps' the orange, is a soft color. It can be interpreted that the mother is enthusiastic and happy to do what is considered something that 'should' be done by a mother/wife.
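
The left/right reading of the layout described above can be written as a small lookup. The following Python sketch is illustrative only: the function name information_value, the position labels, and the "ltr" flag are assumptions introduced here, not part of Kress and van Leeuwen's framework or of the authors' procedure. It returns no value for reading directions other than left to right, following the statement above that the 'given-new' composition cannot be used there.

```python
from typing import Optional


def information_value(position: str, reading_direction: str = "ltr") -> Optional[str]:
    """Map a horizontal position to 'given' or 'new' (hypothetical helper).

    Returns None when the left/right rule does not apply, for example for
    a reading flow that is not left to right, or for an unknown position.
    """
    if reading_direction != "ltr":
        return None
    if position == "left":
        return "given"
    if position == "right":
        return "new"
    return None


# Layout of the kitchen picture as described in the text:
# the mother is on the left, the child is on the right.
layout = {"mother": "left", "child": "right"}
values = {name: information_value(pos) for name, pos in layout.items()}
print(values)  # {'mother': 'given', 'child': 'new'}
```

The sketch covers only the horizontal axis discussed in this section; the interpretation of color and gaze remains a matter of qualitative judgment.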

Verbal analysis shows that the sentences spoken by the child contain a relational process (This is the kitchen) and a material process (is cooking). Relational clauses serve to characterize and to identify, while material clauses indicate 'doing' or 'happening': doing something, or an ongoing event (Sari, 2020). In the context of picture 1, the sentence spoken by the child identifies the room as a kitchen, one of whose characteristics is the presence of cooking utensils and of someone, in this case the mother, who is the actor doing the work that 'should' be done, namely cooking in the kitchen.

 
Discussion 

Images, or pictures, have various possible relationships with verbal text, or words. For Kress & van Leeuwen (2016), the verbal text can extend the meaning of the image and vice versa, or the verbal text can elaborate the image and vice versa. Kress and van Leeuwen (2016) further say that, for Barthes, the meaning of images in particular, and of other semiotic modes, is always associated with, and dependent on, the meaning of the verbal text. Meanwhile, for Kress and van Leeuwen (2016), the visual component of a text, the image, is a message that is composed and arranged independently; it is related to the verbal text but does not depend on it (emphasis by the researchers).

Culache & Obadă (2014) use the four categories of the heteroglossic dimension proposed by Martin and White, finding that the visual 'voice', or message conveyed, can contradict (disclaim) the verbal 'voice', or message. In fact, these verbal and visual texts appear simultaneously. This relationship was also found by Bednarek and Caple, who researched photojournalism in print and online newspapers. Royce saw that the visual and the verbal, used as modes to convey messages in a text, have an 'intersemiotic relationship', that is, a relationship between various semiotic modes (Firdausy, 2015). The relationship between the two can be intersemiotic repetition, intersemiotic synonymy (similarity relations), intersemiotic antonymy (opposition relations), intersemiotic hyponymy (class-subclass relations), intersemiotic meronymy (part-whole relations), or intersemiotic collocation (expectancy relations).

Furthermore, to explain the relationship between verbal and visual text, Martinec and Salway offer a system for describing the relationship between the visual and the verbal in a multimodal text. Their approach is based on status and logico-semantic relations between visuals and verbal text (Chen, 2021). This differs from the system developed by Liu and O'Halloran (2009), which is based more on the 'discourse relation' between verbal and visual texts in their analysis of 'cohesive devices' between verbal and visual texts. The use of images has a positive impact on the development of learning outcomes and on students' understanding of conversational material in English; this can also be seen from the increase in student learning outcomes after using pictures.

 
Figure 1. Students' average score in English conversation

The diagram above shows that students' understanding increased after pictures of English conversation were used in learning.
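
The size of this increase can be made explicit with simple arithmetic. The sketch below is a hedged illustration: it takes the two averages reported in the conclusion of this article (67 before and 89 after learning with the images) and assumes that both are class means on the same scale. The function name score_gain is introduced here for illustration and does not come from the study, and the calculation says nothing about statistical significance.

```python
from typing import Tuple


def score_gain(before: float, after: float) -> Tuple[float, float]:
    """Return the absolute gain and the gain relative to the starting score (in %)."""
    absolute = after - before
    relative_percent = absolute / before * 100
    return absolute, relative_percent


# Average scores reported in the conclusion: 67 before, 89 after.
absolute, relative = score_gain(67, 89)
print(f"absolute gain: {absolute} points")   # absolute gain: 22 points
print(f"relative gain: {relative:.1f}%")     # relative gain: 32.8%
```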

The function of the picture, which should contain enough content for the material about conversation, has now shifted. The picture is made as attractive as possible so that students are interested in learning about conversation using the pictures presented by the teacher. As a result, the multimodality of English conversation images plays a positive role and produces progress in learning.

Research on teaching conversation using images from the perspective of multimodal discourse analysis is very rarely conducted by other researchers. The researchers searched for such studies, but most existing research is oriented toward multimodal analysis of printed advertisements and other literature. The researchers did find one study related to this research, by Vungthong, Djonov, and Torr (2015), entitled Images as a resource for supporting vocabulary learning: A multimodal analysis of Thai EFL tablet apps for primary school children. That study concerned the use of One Tablet per Child (OTPC) by the Thai government to support students' learning in the digital world. The software included on each child's OTPC tablet provides multimedia teaching applications (apps) on a variety of disciplines, including English as a foreign language (EFL). Their article analyzes how one element of the apps (song videos) uses visuals and words to build meaning, and considers the potential of visual-verbal interactions to help vocabulary teaching and learning, using the Grade 1 and 2 English apps as a case study. The article concludes with a discussion of related pedagogical implications for the use and design of EFL materials integrated into multimedia technologies: the critical role of teachers in guiding EFL learners' use of such materials, the need for increased awareness of the potential and limitations of images and visual-verbal relations to support EFL teaching and learning, and the need to understand the relationship between the multimodal design of EFL materials and related learning outcomes.

There are some similarities and differences between the research done by Vungthong et al. (2015) and this research. The difference is that the images used in their research were aimed at enhancing Thai students' vocabulary learning, hence the use of blended learning (OTPC) by their government in the students' vocabulary learning process. This research, however, used images to help students improve their speaking skills in the form of conversation. The images used here were taken from the English book used by the students. Although there are differences, the researchers also identified some similarities between the two studies. The first was the use of images to help students in their learning process. Although the goals were different, on the basis of these two studies the researchers believe that images, from a multimodal perspective, are becoming more important nowadays in teaching English to students, whether in any of the English skills such as listening, speaking, reading, and writing, or in other aspects such as vocabulary.

 
CONCLUSION 
From the data analysis, the following conclusions can be drawn: (1) The multimodality of the English conversation images is a fantasy representation, realized through the animated images presented in learning. (2) The analysis of the three metafunctions shows that there is a shift in students' focus on the images related to English conversation material, brought about by displaying animation in the image. The visual appearance shown in the animated image is a fantasy genre modality because of the selection of attractive colors that suit the characteristics of students. (3) There is an increase in students' average learning outcomes, from 67 before to 89 after learning with the animated images of English conversation material given by the teacher. (4) The multimodal analysis shows that the selection of animated visuals in the images used for the material about English conversation has a significant influence on the development of student learning.

 
REFERENCES 
Anstey, M., & Bull, G. (2019). Helping teacher to explore multimodal texts. Curriculum and Leadership Journal, (Online), 8(16), 103–110.
Bilfaqih, Y., & Qomarudin, M. N. (2017). Multimodal analysis. In Dee Publish (Vol. 1). Dee Publish.
Chen. (2021). Analisis audio visual pada iklan. Jurnal Pengabdian Masyarakat, 1(3), 1–16.
Culache, O., & Obadă, D. R. (2014). Multimodality as a premise for inducing online flow on a brand website: A social semiotic approach. Procedia - Social and Behavioral Sciences, 149(2), 261–268.
Firdausy, C. (2015). Audio visual in multymodal analysis. LIPI Press.
Herman, Murni, S. M., Sibarani, B., & Saragih, A. (2019). Structures of representational metafunctions of the “Cheng Beng” ceremony in Pematangsiantar: A multimodal analysis. International Journal of Innovation, Creativity and Change, 8(4), 34–46. https://www.ijicc.net/images/vol8iss4/8403_Herman_2019_E_R.pdf
Herman, Purba, R., Thao, N. V., & Purba, A. (2020). Using genre-based approach to overcome students’ difficulties in writing. Journal of Education and E-Learning Research, 7(4), 464–470. https://doi.org/10.20448/journal.509.2020.74.464.470
Herman, Van Thao, N., & Purba, N. A. (2021). Investigating sentence fragments in comic books: A syntactic perspective. World Journal of English Language, 11(2), 139–151. https://doi.org/10.5430/WJEL.V11N2P139
Hutabarat, E., Herman, H., Silalahi, D. E., & Sihombing, P. S. R. (2020). An analysis of ideational metafunction on news Jakarta Post about some good covid-19 related news. Voices of English Language Education Society, 4(2), 142–151. https://doi.org/10.29408/veles.v4i2.2526
Juliswara, V. (2017). Mengembangkan model literasi media yang berkebhinekaan dalam menganalisis informasi berita palsu (hoax) di media sosial. Jurnal Pemikiran Sosiologi, 4(2), 1–23. https://doi.org/10.22146/jps.v4i2.28586
Kress, G., & van Leeuwen, T. (2016). Reading images: The grammar of visual design. New York: Routledge.
Liu, Y., & O’Halloran, K. L. (2009). Intersemiotic texture: Analyzing cohesive devices between language and images. Social Semiotics, 19(4), 367–388. https://doi.org/10.1080/10350330903361059
Martinec, R. (2015). A system for image-text relation in new (and old). LIPI Press.
Ngongo, M. (2021). The investigation of modality and adjunct in spoken text of proposing a girl using Waijewa language based on Halliday’s systemic functional linguistic approach. English Review: Journal of English Education, 10(1), 223–234. https://doi.org/10.25134/erjee.v10i1.5382
Ngongo, M., & Ngongo, Y. (2022). Mood clauses in spoken text of proposing a girl using Waijewa language: A systemic functional linguistics approach. Journal of Language and Linguistic Studies, 18(1), 669–691.
Purba, R., & Herman. (2020). Multimodal analysis on Ertiga car advertisement. Wiralodra English Journal, 4(1), 21–32. https://doi.org/10.31943/wej.v4i1.77
Purba, R., Sibarani, B., Murni, S. M., Saragih, A., & Herman. (2022). Conserving the Simalungun language maintenance through demographic community: The analysis of taboo words across times. World Journal of English Language, 12(1), 40–49. https://doi.org/10.5430/WJEL.V12N1P40
Purnaningwulan, R. D. (2015). Hubungan terpaan iklan televisi produk revlon dengan motivasi konsumen wanita dalam melakukan pembelian produk di mall Surabaya. Commonline Departemen Komunikasi, 4(2), 56–68.
Russell, S., & Norvig, P. (2020). Visual image as media education. In Pearson series in artificial intelligence.
Sari, S. (2020). Analisis multimodal. Journal of Reflection: Ekonomic, Accounting, Management Business, 3(2), 291–300.
Sihombing, P. S. R., Silalahi, D. E., Saragih, D. I., & Herman, H. (2021). An analysis of illocutionary act in incredible 2 movie. Budapest International Research and Critics Institute: Humanities and Social Sciences, 4(2), 1772–1783. https://doi.org/10.33258/birci.v4i2.1850
Simajuntak, V. D. S., Napitupulu, E. R., Herman, Purba, C. N., & Thao, N. Van. (2021). Deixis in the song lyrics of Hailee Steinfeld’s “Half Written Story” album. Central Asian Journal of Social Sciences and History, 2(3), 98–105.
Syafiq, A. N., Rahmawati, A., Anwari, A., & Oktaviana, T. (2021). Increasing speaking skill through YouTube video as English learning material during online learning in pandemic covid-19. Elsya: Journal of English Language Studies, 3, 50–55.
Van Thao, N., & Herman. (2021). An analysis of idiomatic expressions found in Ed Sheeran’s selected lyrics songs. Central Asian Journal of Literature, Philosophy And Culture, 2(1), 12–18. https://doi.org/2660-6828
Vungthong, S., Djonov, E., & Torr, J. (2015). Images as a resource for supporting vocabulary learning: A multimodal analysis of Thai EFL tablet apps for primary school children. Tesol Quarterly, 50(1), 32–40. https://doi.org/10.1002/tesq.274

 