Rudy, M., Kristina, D., & Tarjana, S. S. (2019). Measuring spoken vocabulary load on medical English students: A learner corpus evaluation. International Online Journal of Education and Teaching (IOJET), 6(4), 774-787. http://iojet.org/index.php/IOJET/article/view/595

Received: 31.12.2018
Received in revised form: 20.06.2019
Accepted: 23.09.2019

MEASURING SPOKEN VOCABULARY LOAD ON MEDICAL ENGLISH STUDENTS: A LEARNER CORPUS EVALUATION

Research Article

Muhammad Rudy, Malahayati University, muhammadrudy6@gmail.com
Diah Kristina, Sebelas Maret University, Kristina_diah@yahoo.com
Sri Samiati Tarjana, Sebelas Maret University, msrisamiati@yahoo.com
ORCID: https://orcid.org/0000-0002-0981-5201, https://orcid.org/0000-0002-0448-3893, https://orcid.org/0000-0002-4245-1665

Muhammad Rudy is a lecturer of English for Specific Purposes (ESP) at Malahayati University, Indonesia. His research interests include English in medicine and its teaching methodology, as well as teaching technology.
Diah Kristina is an Associate Professor in the English Department, Faculty of Cultural Sciences, Sebelas Maret University, Indonesia.
Sri Samiati Tarjana is a Professor in the English Department, Faculty of Cultural Sciences, Sebelas Maret University, Indonesia.

Copyright by Informascope. Material published and so copyrighted may not be published elsewhere without the written permission of IOJET.

Abstract
English for Specific Purposes (ESP) teaching requires students to have a deep understanding of specialized vocabulary.
Specifically, in medical English classes, explaining a diagnosis orally involves specialized vocabulary. This corpus study aimed to examine the students' achievement in spoken vocabulary during speaking practice on explaining a diagnosis. Computer software was used to calculate the frequency and range of words. The students' vocabulary was compared with a corpus of listening tapescripts from a medical English textbook to evaluate vocabulary patterns. In addition, the students' spoken corpus was contrasted with the 2,000 high-frequency words of the General Service List and three other word lists to assess word distribution. The study revealed that the medical students used few specialized words, so that their message would be comprehensible to the patient. The analysis of students' vocabulary can serve as a reference for reflecting on the success of language instruction and its future improvement, particularly for spoken diagnosis explanation in medical English programs.

Keywords: corpus; ESP; evaluation; medical English; spoken diagnosis

1. Introduction
Medical English is a branch of English for Specific Purposes (ESP) with its own needs in terms of teaching instruction and materials. Richards (2017) argues that several practical concerns should be considered in designing an ESP program, one of which is material preparation. He further recommends vocabulary selection as the basis for expressing meaning in a language. One of the best ways to select vocabulary for language teaching is corpus analysis; Richards explains that it can identify the highest-frequency words, which are considered the most useful for instruction. In line with Richards, Nation (2001) states that the breadth, or size, of vocabulary knowledge is the number of words a learner knows. He also agrees that vocabulary acquisition is crucial, since vocabulary is a basic yet important component of language learning.
Nation also divides vocabulary into four levels: high-frequency words, academic vocabulary, technical vocabulary and low-frequency words. This division shows that certain words need more attention, as they serve different purposes.

English Language Teaching (ELT) has long regarded vocabulary as important. O'Keeffe, McCarthy and Carter (2007) state that one efficient way to observe vocabulary is through a corpus, which can reveal not only linguistic phenomena but also pedagogical implications. Before instruction, materials can be designed by analyzing the language features learners need; at the end of the process, the corpus can serve as a reflective instrument for evaluating achievement. In addition, Knight (1994) notes that vocabulary knowledge is the most important facet of second language (L2) learning. Vocabulary targets are set at the initial stage of material design. In the teaching and learning process, teachers can begin their activities with vocabulary, and at the end of a session vocabulary is commonly used as a standard for evaluating students' achievement. Accordingly, this study sheds light on assessing the vocabulary load of medical students as part of teaching evaluation.

Research evaluating lexical coverage in spoken medical English classes is of great importance, especially given its relation to students' future medical professionalism. Using a corpus, this study investigates vocabulary coverage in a classroom context. The textbook vocabulary and four word lists were set as thresholds for measuring the students' vocabulary load. The researchers therefore formulated the following research questions:
• Which words in the textbook are the students likely to use?
• How many of the 2,000 high-frequency words are found in the students' speech when explaining a diagnosis?
• How many of the words produced by the students do not belong to the 2,000 high-frequency words?

2. Theoretical Framework
2.1 Vocabulary Thresholds
West (1953) created the General Service List (GSL), containing the 2,000 most common English words, which can help learners communicate comprehensively. The list has been widely used in many countries as a framework for developing good English teaching materials (Richards, 2017). Later, the GSL was revised as the New General Service List (NGSL; Browne, 2013), which is claimed to be a better word list with greater coverage, mainly because it was created with advanced corpus tools and a large body of data. Unfortunately, the NGSL is not built into the computer software used in the present research. As a consequence, the GSL, which is the software's built-in base word list, serves as the standard in this study.

In ELT curriculum and material development, corpus studies have been used to measure vocabulary load as an indicator of successful textbook creation. Mukundan and Aziz (2009) compared word occurrence in five Malaysian English textbooks used in schools with West's (1953) GSL. They found that none of the five books fully covers the 2,000 most frequent words, and that 71.9% of the words in the textbooks are overused, as indicated by seven or more repetitions. These findings can prompt material designers to revise the textbooks so that they cover the entire GSL. In a similar vein, Zarifi and Mukundan (2015) evaluated one linguistic aspect of a Malaysian English as a Second Language (ESL) secondary school textbook. Using WordSmith software and the Oxford Dictionary of Phrasal Verbs, they contrasted the phrasal verbs found in the textbook with those presented in the dictionary.
Their research offers recommendations for future material designers in developing an ideal textbook, since some phrasal verbs, such as take away, are overlooked. They regret that take away is not well elaborated in the textbook, even though it is very common in everyday language use, for example in fast food restaurants.

More recently, corpora have been used to assess standardized test items. Beng and Keong (2017) examined lexical bundles in the MUET reading test across five disciplines: applied science, pure science, business, humanities and social sciences. They found that different lexical bundles are employed in different disciplines and genres; for instance, research-oriented bundles tend to appear more in applied science, while social sciences texts contain more dependent-clause bundles. Familiarizing students with such subject-related elements can therefore contribute to the development of better tests.

In addition, Zorluel Özer and Okan (2018) compared the discourse markers produced by two Turkish teachers and two native teachers of English and found that the Turkish teachers used fewer discourse markers than the native teachers. They argued that the Turkish teachers did not use some important discourse markers that the native teachers used frequently. They therefore suggested that nonnative teachers be exposed to discourse markers to gain authentic language use. Training in, and awareness of, the importance of discourse markers can begin at the pre-service level. Finally, including discourse markers in teaching materials can help nonnative teachers become familiar with them.

Corpora of students' work have also been built to evaluate learning achievement. Khojasteh, Shokrpour and Torabiardakani (2017) examined the use of English modals among 136 advanced Iranian learners.
Narrative writing was collected from students at six English institutions, and 429 occurrences of modals were counted. Surprisingly, one modal, shall, did not appear at all (0%) in the corpus. On the other hand, some dominant modals, such as can, will and could, were overused. These findings led the researchers to conclude that the students employed an avoidance strategy in writing to conceal their incompetence in using modals, and that the students' use of modals did not reflect natural English. Accordingly, they highly recommend teaching various types of modals with numerous examples and repetition.

With the emergence of English as an academic language, Coxhead (2000) proposed the Academic Word List (AWL), which classifies 570 word families that occur frequently across a wide range of academic texts. In practice, the AWL helps university students understand academic texts such as journal articles and textbooks. English for Academic Purposes (EAP) programs also benefit from the AWL when designing their materials. To ease the writing process of students in a Hotel and Management Faculty in Malaysia, M. Nordin, Stapa and Darus (2013) built a specialized word list related to a culinary course. They compiled 116 PowerPoint lecture materials, yielding 3,698 running words, and used the RANGE and FREQUENCY software (Heatley, Nation and Coxhead, 2002) to select 113 words for food writing. Such specific vocabularies benefit ESP teachers in designing teaching materials, because the writing instruction focuses on subject-related words.

Word lists have also received serious attention in other ESP fields, such as engineering, agriculture and business. Martinez, Beck and Panza (2009) worked on the academic vocabulary of agriculture, while Konstantakis (2007) and Hsu (2011) worked on English for Business Purposes (EBP).
Coxhead and Hirsh (2007) presented their science-specific word list, and in engineering, Hsu (2014) developed the English Engineering Word List (EEWL) with the assistance of corpus software.

Word lists have been developed extensively in English for Medical Purposes (EMP). The Medical Academic Word List (MAWL) was developed by Wang, Liang and Ge (2008), who identified 623 medical words used in several medical journals. These words are very valuable for medical learners who want to read and write academic papers. Lei and Liu (2016) later revised it as the New Medical Academic Word List (NMAWL), which was created from larger corpora and is listed in lemma form, with part-of-speech labels such as a for adjective and v for verb. Identifying specific word lists for specific learners is therefore very important, since ESP learners, unlike general English learners, need to focus on their own subject. A single field of study sometimes covers a very wide range of specific subjects. As Lei and Liu (2016) suggest for medical English, EMP encompasses very broad knowledge, so designing word lists for narrower areas would help medical learners deepen their knowledge of their own areas of interest.

In the Medical English class, students are trained to perform well in speaking. Vocabulary, one of the components of speaking, is used as an indicator of good communication, so students' lexical coverage should be evaluated in order to understand the success of a Medical English program. This study was conducted in Indonesia, where English is considered a foreign language (FL). The Medical English (ME) program is taught at university level to prepare medical students to communicate in spoken and written English with foreign patients. At Malahayati University, Bandar Lampung, the ME program runs for two semesters, with two credits each.
The spoken skills that medical students must master include taking a history, examining the patient, giving references, making a diagnosis and giving treatment (Glendinning and Holmstrom, 2001).

3. Method
3.1. Participants
This study was conducted in an intensive class of nine female students. They took the intensive Medical English class because they could not attend the regular schedule offered by the Language Center (LC) of Malahayati University, Bandar Lampung. They were sixth-semester medical students taking Medical English (ME) Level Two, one focus of which was explaining a diagnosis in both spoken and written form. The students had received general English training at the same institution from the first to the fourth semester. In the fifth semester, they took ME Level One, which concentrated on taking a history and examining the patient. Both levels integrated three skills (listening, speaking and reading), with limited writing. The students' language background, whether their mother tongue (L1) or a local language, was not considered in this study, since they used Indonesian for everyday conversation in class. Before the study, the students had received training in explaining a diagnosis through listening activities, role play and reading materials. Explaining a diagnosis belongs to the last section of the textbook, together with treatment, because the textbook follows the structure of a general practitioner's daily routine, beginning with interviewing the patient and ending with medication. It can therefore be inferred that the students had adequate EMP exposure and practice, especially in explaining a diagnosis orally based on the guidelines in the textbook.

3.2. Data Elicitation
The students were asked to prepare a diagnosis based on the medical cases on pages 65 to 75 of their textbook (Glendinning and Holmstrom, 2001).
The students were allowed to choose any medical case they were familiar with. This open choice was intended to encourage the students' speaking performance, since they could pick a topic whose vocabulary they could cope with. As a consequence, the students explained a variety of topics. They were given forty-five minutes to prepare their individual speaking performance and were then asked to do a role play with the teacher, in which the teacher acted as the patient and the students acted as doctors. During the speaking practice, the students were asked to speak close to a voice recorder. Each student was allowed to speak for between one and five minutes. The recordings were then carefully transcribed for further data processing; indistinguishable sounds were not transcribed. The textbook tapescripts were retyped.

3.3. Learner Corpus
In terms of size, O'Keeffe et al. (2007) distinguish small and large corpora: a written corpus of fewer than five million words is considered quite small, whereas a spoken corpus of more than one million words is considered large. The students' spoken corpus in this study contains only 1,505 running words, because it was collected in a single classroom setting. It is therefore a mini corpus, yet one that can still provide a factual picture of vocabulary achievement.

The researchers created a simple corpus to contrast the students' and the textbook corpora. The students' corpus contains 1,505 running words, distributed across students as shown in Table 1. The textbook corpus consists of 954 running words taken from the tapescripts on explaining a diagnosis on pages 106–108 of the textbook (Glendinning and Holmstrom, 2001). The two corpora were kept separate in order to distinguish the students' production from the textbook.

Table 1.
Number of Words Produced by Medical English Students

Student  Excerpt                                              Running words
1        …it is possible Mr X suffering vascular dementia…    115
2        …It can be investigated such as cholesterol LDL…     67
3        ..You might not know you have it until you…          108
4        …And now I want to tell you about my diagnosis…      120
5        …you had complained that you got headache on…        162
6        Good evening Mr. Hudson my name is Dewi….            257
7        … Okay you never go to doctor before…                188
8        …You must take rest okay?...                         143
9        … Hundred ninety per one hundred ten mmHg…           345
Total                                                         1,505

All corpora were processed with the RANGE and FREQUENCY programs created by Heatley et al. (2002), which process plain-text (.txt) files. RANGE was used to compare the vocabulary of up to 32 different texts at the same time. For each word, it provides a range (distribution) figure, a headword frequency figure, a family frequency figure and the frequency in each of the texts in which the word occurs. The program includes the 2,000 high-frequency words of the GSL (West, 1953) and the 570 AWL word families (Coxhead, 2000) as base word lists, which served as the standard of comparison. FREQUENCY counts how often each word occurs in a text and can process only one text at a time. Its .txt output, listed in alphabetical or frequency order, gives each word's rank, raw frequency and cumulative percentage frequency, so frequency tables from different texts can be contrasted side by side.

Before the RANGE analysis, some words were eliminated from the students' transcriptions, namely the names of both doctors and patients. Numbers, such as ages, times and single numbers, were written out as English words. The words to be eliminated were saved in .txt format as a stop list.

3.4.
Data Processing and Analysis
To evaluate the vocabulary the students had achieved after the learning process, the students' and textbook corpora were contrasted with the FREQUENCY program. A frequency list was created for each corpus, and the two lists were compared side by side to find the students' words that also occur in the textbook. Words were categorized into content words and function words (Gerot and Wignell, 1995). This division rests on the assumption that function words, which are closely tied to grammar, had already been covered in the students' general English classes (semesters one to four); content words were therefore the focus of this study.

The frequency contrast revealed some words that should not be analyzed further because they are specific to the local context, such as the names of students or patients, hospitals and cities, as well as borrowings and coinages. The names excluded from the analysis were Putri, Nicol, Hudson, Jameson and Wulandari; the hospital name was Bintang Amin; the area names were Bandar Lampung and Kemiling; and the remaining words were UGD, khas, formly, anamnetion and obstained. These fifteen words were set as stop-list words in the RANGE program, which meant that they were not counted in the analysis. On the other hand, medical terms, disease names and abbreviations were retained, for example thrombocyte, aedes aegypti, osteoporosis, hypothyroidism, MRI (Magnetic Resonance Imaging), X-Ray and CT Scan. These words were included in the RANGE analysis because the students were familiar with them from either the EMP class or other lectures.

Next, having established how the students' words were distributed in the textbook, the researchers compared the students' words with the GSL (West, 1953) to see how far their vocabulary covered the 2,000 high-frequency words. The words that did not belong to the GSL were then compared with the 570 AWL word families (Coxhead, 2000).
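The stop-list filtering and side-by-side frequency contrast described in this subsection can be sketched in Python. This is a minimal illustration under stated assumptions, not the RANGE or FREQUENCY programs themselves: it matches lowercased word forms only (no lemmas or word families), uses a deliberately tiny hypothetical function-word set in place of the Gerot and Wignell (1995) categorization, and runs on two toy sentences. The function names `frequency_list`, `ranked` and `contrast` are hypothetical; the stop list holds the fifteen words named above.

```python
import re
from collections import Counter

# The fifteen stop-list words named in the text, lowercased.
STOP_LIST = {
    "putri", "nicol", "hudson", "jameson", "wulandari",  # student/patient names
    "bintang", "amin",                                   # hospital name
    "bandar", "lampung", "kemiling",                     # area names
    "ugd", "khas", "formly", "anamnetion", "obstained",  # borrowings and coinages
}

# Hypothetical, deliberately tiny function-word set for illustration only.
FUNCTION_WORDS = {"a", "an", "the", "and", "to", "of", "in", "with",
                  "is", "it", "i", "you", "your", "that", "this", "not"}


def frequency_list(text, stop_list=STOP_LIST):
    """Count lowercased word forms, skipping stop-list words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_list)


def ranked(freqs):
    """FREQUENCY-style rows: (rank, word, raw frequency, cumulative %)."""
    total = sum(freqs.values())
    rows, running = [], 0
    for rank, (word, freq) in enumerate(freqs.most_common(), start=1):
        running += freq
        rows.append((rank, word, freq, round(100 * running / total, 2)))
    return rows


def contrast(students, textbook, function_words=FUNCTION_WORDS):
    """Find word types shared by two frequency lists and summarize their tokens."""
    shared = students.keys() & textbook.keys()

    def summary(freqs):
        tokens = sum(freqs[w] for w in shared)
        function = sum(freqs[w] for w in shared if w in function_words)
        return {
            "shared_tokens": tokens,
            "pct_of_corpus": 100 * tokens / sum(freqs.values()),
            "function_tokens": function,
            "content_tokens": tokens - function,
        }

    return sorted(shared), summary(students), summary(textbook)


# Toy inputs; the real corpora are the 1,505-word student transcripts
# and the 954-word textbook tapescripts.
students = frequency_list("You have a headache. Hudson, you have migraine.")
textbook = frequency_list("You have a condition and the headache is mild.")
shared, s_sum, t_sum = contrast(students, textbook)
print(shared)  # ['a', 'have', 'headache', 'you']
print(s_sum["shared_tokens"], t_sum["shared_tokens"])  # 6 4
```

Computing the function/content split only over the shared types mirrors the way the results separate words common to both lists from words found in only one of them.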
At the final stage, words that belonged to neither the GSL nor the AWL were matched against the MAWL (Wang et al., 2008) and the NMAWL (Lei and Liu, 2016). The GSL and AWL comparison was done in a single RANGE run, while the remaining words were matched against the MAWL and NMAWL manually, using the Find function on the .txt files.

4. Results
4.1. Classroom Spoken Diagnosis Word List
The mini corpus of the ME students' spoken diagnoses yields 396 word types. The word you ranks highest, with a frequency of 83, and the word yes is last on the list, with a frequency of one, as shown in Appendices 1 and 2. The tapescript corpus, on the other hand, yields 378 word types, with the and yourself at the top and bottom of the list, with frequencies of 46 and 1 respectively. The students' word types are thus more varied than those of the tapescripts. The twenty most frequent words are shown in Table 2.

Table 2. First Twenty Words of the Textbook and Student Lists
        Textbook                    Student
Rank    Word form    Frequency      Word form    Frequency
1       THE          46             YOU          83
2       AND          33             AND          63
3       A            28             YOUR         46
4       YOUR         28             IS           39
5       TO           24             THE          38
6       OF           23             TO           31
7       IS           21             OF           30
8       IT           20             A            23
9       YOU          20             HAVE         23
10      IN           16             I            21
11      I            15             NOT          21
12      NOT          11             THAT         20
13      THIS         11             OKAY         19
14      WITH         11             HEADACHE     17
15      THAT         9              THIS         17
16      CONDITION    8              DO           16
17      WHEN         8              IT           14
18      WHICH        8              MIGRAINE     14
19      AN           7              ALSO         13
20      HIS          7              ARE          13

Thirteen words appear in both lists, although with different frequencies: a, and, I, is, it, not, of, that, the, this, to, you and your. These thirteen words were commonly used in both the tapescripts and the students' speech. However, apart from and, they do not occupy the same rank in the two lists.
In the textbook list, which is based on fewer running words, and occurs thirty-three times, compared with sixty-three times in the students' spoken diagnoses; its identical rank in both lists shows that and is significant in both. The words found in only one of the two lists are an, condition, his, in, when, which and with in the tapescript corpus, and also, are, do, have, headache, migraine and okay in the students' spoken diagnoses. These differences show that the symptoms or diseases most commonly mentioned when explaining a diagnosis are headache and migraine. These two words are among the twenty words most frequently used by the students because some students chose similar medical cases, as confirmed by the patient names they mentioned.

Next, the students' and textbook frequency lists were contrasted side by side, with both tables placed on one page. The contrast shows 126 word types that appear in both lists, as listed in Appendix 1. These shared words account for 768 tokens (51% of the students' words) and 573 tokens (58% of the textbook words). They include 35 function word types, such as a, an, can, I, is and me. Within the shared words, function words account for 538 tokens (70%) in the students' corpus and 373 tokens (65%) in the textbook, while content words account for 230 (30%) and 200 (35%) respectively. However, it would be premature to conclude that function words dominate the students' and textbook corpora, since this ignores the words that appear in only one of the lists.

Turning to those words, there are 271 word types found only in the students' list and 251 found only in the textbook list. Only six of the students' types (2%) are function words, and only four (1.5%) of the textbook types.
Content words thus seem to dominate the words unique to each corpus.

4.2. Students' Words vs. Established Word Lists
For a better comparison, the students' spoken vocabulary was compared with established word lists: the GSL, AWL, MAWL and NMAWL. The first two lists were compared using the RANGE program. The 2,000 high-frequency words formed the first stage of the analysis; words that did not appear in the GSL were moved to the next stage and matched with the AWL. In the final comparison, the words that matched neither the GSL nor the AWL were looked up in the MAWL and NMAWL. In this way, the coverage of the words produced by the students could be assessed.

The RANGE analysis showed that, of the 1,505 running words, 1,294 (86%) occur in the GSL, 23 (1.5%) in the AWL and 188 (12.5%) in neither list. Table 3 presents the RANGE output.

Table 3. Students' Words Found in GSL and AWL
Word list             Tokens (%)      Types (%)
GSL                   1,294 (86%)     301 (76.2%)
AWL                   23 (1.5%)       17 (4.3%)
Not in GSL or AWL     188 (12.5%)     77 (19.5%)
Total                 1,505           395

The students produced 395 word types, of which 301 (76.2%) occur in the GSL and 17 (4.3%) in the AWL. The 188 tokens found in neither list form 77 word types (19.5%), which were later matched with the MAWL and NMAWL. The students' word types cover only 301 (15%) of the 2,000 high-frequency words. This is far from the figure that Mukundan and Aziz (2009) cite from Nation (1990), namely that the GSL covers 87% of the vocabulary in a text, and on this comparison the students appear unsuccessful. The two figures, however, measure different things: 15% is the share of GSL words that the students used, whereas 87% refers to the proportion of running words in a text, for which the comparable figure here is the students' 86% token coverage.

The second comparison was between the students' corpus and the AWL. Only 23 of the 1,505 running words (1.5%) are academic words, representing only 17 of the 395 word types (4.3%), as shown in Table 4.
The small number of academic words indicates that the students prefer high-frequency words to scholarly vocabulary.

Table 4. Words Found in AWL
No.  Type            Range  Freq  F1
1    ACCOMPANIED     1      1     1
2    CONDUCT         1      1     1
3    DATA            1      1     1
4    DEPRESSION      1      1     1
5    FUNCTION        1      2     2
6    INVESTIGATED    1      1     1
7    INVESTIGATION   1      2     2
8    MINIMAL         1      1     1
9    NORMAL          1      5     5
10   OCCUR           1      1     1
11   PHYSICAL        1      1     1
12   PLUS            1      1     1
13   SECTION         1      1     1
14   STABLE          1      1     1
15   STYLE           1      1     1
16   TRIGGER         1      1     1
17   TRIGGERED       1      1     1
(Range = number of texts in which the word occurs; Freq = total frequency; F1 = frequency in text 1.)

The last comparison concerned the words found in neither the GSL nor the AWL. These 188 tokens fall into 77 word types. After the fifteen stop-list words mentioned earlier were eliminated, 62 word types remained, as presented in Appendix 3. Appendix 3 shows that these words are dominated by medical terms. Words such as Aedes aegypti, densitometer, extremities, hypertension and osteocalcium are familiar in the medical world and relate to health examinations and diseases. Medical students are used to talking and reading about such terminology in their lectures and everyday conversation, so even though these words are in neither the GSL nor the AWL, the students are able to use them. This led the researchers to compare the 62 word types with the MAWL and NMAWL, which are well-known established medical word lists. Of the 623 words in the MAWL, the researchers found only fourteen: twelve identical words and two derivational forms. The twelve identical words from Appendix 3 are alcohol, calcium, density, diagnose, diet, hypertension, protein, routine, scan, symptom, vascular and vital.
The other two appear in the MAWL as a singular noun and a verb, but in the students' corpus as a plural and a noun: drug became drugs, and prescribe became prescription.

The comparison with the NMAWL differs only slightly. Sixteen words (fifteen identical and one derivational) were found in the NMAWL. The fifteen identical words are density, diabetes, diagnose, diagnosis, diet, hypertension, nerve, prescription, protein, scan, symptom, urine, vascular and vitamin. The only derivational match is drug, which is presented in the NMAWL as the plural drugs.

5. Discussion
For a general picture, we can look at the total frequencies of content and function words, combining the shared and unshared words of the students' and textbook corpora. The textbook word list contains 385 function words (39.5%) and 591 content words (60.5%), so content words outnumber function words. The students' word list, out of 1,505 running words, contains 564 function words and 941 content words; at 62%, content words are also prominent in the students' corpus. The students' corpus thus agrees with the textbook in having more content words than function words. Considering these similarities, the final evaluation is that the students were successful in achieving the vocabulary target.

Regarding the students' low coverage of the 2,000 high-frequency words, Richards (2008) argues that one characteristic of spoken language is repetition. It is acceptable that the students, as future medical doctors, use similar words in spoken communication, since they followed a fixed procedure or pattern when explaining the diagnosis. Even though they produced 1,294 GSL tokens, the students tended to repeat the same words with high frequency (see Appendix 1). Such repetition of common words is typical of spoken diagnosis.
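The criterion of Mukundan and Aziz (2009), seven or more occurrences, which this discussion applies to the students' 62 non-GSL/AWL word types, is simply a frequency threshold. The sketch below assumes a frequency dictionary such as one built from a FREQUENCY run; the counts for headache, migraine and okay are those reported in this study, while `word_a` and `word_b` are hypothetical placeholders standing in for the remaining types. The function name `meets_threshold` is likewise hypothetical.

```python
def meets_threshold(freqs, minimum=7):
    """Return (word, frequency) pairs occurring at least `minimum` times,
    most frequent first, ties broken alphabetically."""
    hits = [(w, f) for w, f in freqs.items() if f >= minimum]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Reported counts from this study plus two hypothetical placeholders.
non_gsl_awl = {
    "headache": 17, "migraine": 14, "okay": 19,
    "word_a": 2, "word_b": 1,  # illustrative only, not from Appendix 3
}

hits = meets_threshold(non_gsl_awl)
print(hits)  # [('okay', 19), ('headache', 17), ('migraine', 14)]

# Share of the 62 word types in this study that meet the criterion.
print(round(100 * len(hits) / 62, 1))  # 4.8
```

Sorting by descending frequency and then alphabetically keeps the output deterministic when two words share a count.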
Richards (2008) also states that spoken language tends to use generic words, and Fauziati (2016) indicates that the participants in a discourse influence language use. The medical English students' preference for everyday vocabulary when explaining a diagnosis is an attempt to ensure that the patient understands; they avoid misinterpretation by leaving out difficult words. It is therefore understandable that the students' words outside the GSL and AWL were rarely found in the MAWL and NMAWL, since these lists were designed for reading and writing academic articles (Wang et al., 2008; Lei and Liu, 2016). Moreover, in constructing their lists, both Wang et al. and Lei and Liu eliminated several words that appear in the GSL and AWL.

Mukundan and Aziz (2009) set seven repetitions as the standard for adequately frequent words in a textbook. Applying this criterion to the students' speech, although it was designed for textbooks, only three of the 62 words (4.8%) meet it: headache (17 times), migraine (14 times) and okay (19 times). Headache and migraine, with a combined frequency of 31, denote common symptoms encountered in daily life. Okay, the most frequent of the 62 words, is commonly used in spoken language to show agreement or as a backchannel (Gerot and Wignell, 1995). The medical students, acting as real doctors, used it to ask the patient for confirmation or to start a new topic, which explains its frequent use in conversation. Heng and Abdullah (2013) emphasize that "who speaks what language to whom and when" is key to language use. By choosing common rather than sophisticated words, the medical students tried to make their explanation of the diagnosis successful. Because patients have less knowledge of medical terms and scholarly medical vocabulary, doctors need to prefer plain vocabulary to make the diagnosis understandable.

6.
Conclusions Using corpus software analysis on evaluating spoken language in EMP classroom brings better insight on vocabularies load as part of assessing performance skill. The composition spoken diagnosis words on medical English students are not outlying from what the textbook has as shown by similarities on students and textbook corpora. Both students and tape script textbook are dominated by content words in numbers. However, the words produced by students mostly fall into 2000 high frequency words characterized by big repetition on some word types. Interestingly, students do not use sophisticated language in explaining their diagnosis to the patient. Students of medical English program tend to avoid academic words to get a straightforward understanding for patients. Looking at medical specific words, there are few medical terms that they use during speaking to patient. Students, as candidate of medical doctors overuse common words and exclamation to have smooth conversation with patients. Overall, Rudy, Kristina & Tarjana 785 general English words benefit patient and doctor communication for the sake of comprehension of the messages. The small number of data size becomes flaw on this corpus study. The small number of word types found on 2000 high frequency words can be caused by lack of data number. The future researchers are suggested to use bigger running words to get reflection of medical spoken phenomena. Nevertheless, generalization of this study can be used as foundation to measure success of EMP program related to vocabulary coverage. Teachers of English program, especially medical English, are suggested to calculate vocabulary load among their students in order to identify classroom strength and weakness, vocabulary achievement. After knowing condition of students’ corpora, teachers can design viable strategies and materials for successful instruction. 
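The vocabulary-load calculation recommended above, together with the seven-occurrence criterion of Mukundan and Aziz (2009) applied in the Discussion, can be sketched as follows. This is a hypothetical Python illustration, not the RANGE or FREQUENCY software; it assumes the transcript is already tokenized, and the base list in the example is a toy stand-in for the 2000 high-frequency words.

```python
from collections import Counter


def frequency_profile(tokens, base_list, threshold=7):
    """Return (coverage, frequent) for a tokenized learner transcript.

    coverage: percentage of running words (tokens) found in base_list
    frequent: (word, count) pairs occurring at least `threshold` times,
              most frequent first
    """
    words = [t.lower() for t in tokens]
    counts = Counter(words)
    base = {w.lower() for w in base_list}
    covered = sum(n for w, n in counts.items() if w in base)
    coverage = round(100 * covered / len(words), 1) if words else 0.0
    frequent = [(w, n) for w, n in counts.most_common() if n >= threshold]
    return coverage, frequent


# Toy transcript reusing the three frequencies reported in the Discussion.
tokens = ["okay"] * 19 + ["headache"] * 17 + ["migraine"] * 14 + ["scan"] * 2 + ["diet"]
coverage, frequent = frequency_profile(tokens, base_list=["okay", "headache"])
print(coverage)   # 67.9
print(frequent)   # [('okay', 19), ('headache', 17), ('migraine', 14)]
```

Coverage is measured over running words (tokens), whereas the seven-occurrence filter works on word types, so a transcript can show high list coverage while only a few types recur often.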
It is recommended that medical students be exposed to GSL words that represent the characteristics of spoken language. Explaining a diagnosis, which requires more everyday vocabulary, should be supported by vocabulary training. One teaching technique that can be used is word repetition, as suggested by experts, with each word repeated at least seven times.

Acknowledgements

The researchers would like to express their profound gratitude to Paul Nation, Averil Coxhead and Heatley for the free RANGE and FREQUENCY programs, which were essential instruments in this research.

References

Beng, C. O. S., & Keong, Y. C. (2017). Comparing structural and functional lexical bundles in MUET reading test. Pertanika Journal of Social Sciences and Humanities, 25(1), 133–148.
Browne, C. (2013). The new general service list: Celebrating 60 years of vocabulary learning. Language Teacher, 37(4), 13.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Fauziati, E. (2016). Applied linguistics: Principles of foreign language teaching, learning and researching. Surakarta: Era Pustaka Utama.
Gerot, L., & Wignell, P. (1995). Making sense of functional grammar: An introductory workbook. New South Wales: Gerd Stable.
Glendinning, E. H., & Holmstrom, B. A. S. (2001). English in medicine (3rd ed.). Cambridge: Cambridge University Press.
Heatley, A., Nation, I. S. P., & Coxhead, A. (2002). RANGE and FREQUENCY [Computer software]. Retrieved from https://www.victoria.ac.nz/lals/about/staff/paul-nation
Heng, C. S., & Abdullah, A. N. (2013). Norms of language choice and use in relation to listening and speaking: The realities of the practice in the Malaysian banking sector. Pertanika Journal of Social Sciences and Humanities, 21, 117–130.
Hsu, W. (2011). The vocabulary thresholds of business textbooks and business research articles for EFL learners. English for Specific Purposes, 30(4), 247–257.
Hsu, W. (2014). Measuring the vocabulary load of engineering textbooks. English for Specific Purposes, 33, 54–65.
Khojasteh, L., Shokrpour, N., & Torabiardakani, N. (2017). EFL advanced adult learners' use of English modals in narrative composition. Pertanika Journal of Social Sciences and Humanities, 25(4), 1803–1820.
Knight, S. (1994). Dictionary: The tool of last resort in foreign language reading? A new perspective. The Modern Language Journal, 78, 285–299.
Konstantakis, N. (2007). Creating a business word list for teaching business English. ELIA, 7(1), 79–102.
Lei, L., & Liu, D. (2016). A new medical academic word list: A corpus-based study with enhanced methodology. Journal of English for Academic Purposes, 22, 42–53. https://doi.org/10.1016/j.jeap.2016.01.008
M. Nordin, N. R., Stapa, S. H., & Darus, S. (2013). Developing a specialized vocabulary word list in a composition culinary course through lecture notes. Advances in Language and Literary Studies, 4(1), 78–88. https://doi.org/10.7575/aiac.alls.v.4n.1p.78
Martinez, I. A., Beck, S. C., & Panza, C. B. (2009). Academic vocabulary in agriculture research articles: A corpus-based study. English for Specific Purposes, 28(3), 183–198.
Mukundan, J., & Aziz, A. (2009). Loading and distribution of the 2000 high frequency words in Malaysian English language textbooks for form 1 to form 5. Pertanika Journal of Social Sciences and Humanities, 17(2), 141–152.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge, England: Cambridge University Press.
O'Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom. Cambridge: Cambridge University Press.
Richards, J. C. (2008). Teaching listening and speaking: From theory to practice. New York: Cambridge University Press.
Richards, J. C. (2017). Curriculum development in language teaching. Cambridge: Cambridge University Press.
Salager-Meyer, F. (2014). Origin and development of English for medical purposes. Part II: Research on spoken medical English. Medical Writing, 23(2), 129–131. https://doi.org/10.1179/2047480614Z.000000000204
Wang, J., Liang, S.-L., & Ge, G.-C. (2008). Establishment of a medical academic word list. English for Specific Purposes, 27(4), 442–458. https://doi.org/10.1016/j.esp.2008.05.003
West, M. (1953). A general service list of English words. London: Longman, Green, & Co.
Zarifi, A., & Mukundan, J. (2015). A corpus-based study of semantic treatment of phrasal verbs in Malaysian ESL secondary school textbooks. Pertanika Journal of Social Sciences and Humanities, 23(4), 793–808.
Zorluel Özer, H., & Okan, Z. (2018). Discourse markers in EFL classrooms: A corpus-driven research. Journal of Language and Linguistic Studies, 14(1), 50–66.