LLT JOURNAL VOL. 16 NO. 1 ISSN 1410-7201 23

Content Validity and Authenticity of the 2012 English Test in the Senior High School National Examination

Frisca Ayu Desi Widyaningrum
Carla Sih Prabandari
English Language Education Study Program
Sanata Dharma University

ABSTRACT

This paper discusses the content validity and authenticity of the English test items in the 2012 National Examination (UN). The topic is worth discussing because the UN, which was administered nationally, was the most important standardized test for assessing Indonesian students’ competence. The study aimed to find out how valid the content of the English test items of the 2012 National Examination for senior high schools is and how authentic those test items are. The writers employed qualitative research with document analysis to examine both the content validity and the authenticity of the English test items. The data were obtained from the documents and analyzed using checklists. In addition, to maintain the validity of the analysis, triangulation was carried out by distributing a questionnaire to four experts in language assessment. The analysis produced two findings. First, the content of the 2012 National Examination was 98.8% valid since almost all of the contents were relevant to the test specifications. However, three reading test versions failed to represent one kind of text, namely explanation text. Second, the 2012 National Examination met the criteria of authenticity at a rate of 79.5%, since some listening and reading test items failed to conform to the authenticity criteria. Natural language use, the relevance of the test topics, and real-world representativeness were the problematic aspects that kept the test from meeting a higher standard of authenticity.

Keywords: content validity, authenticity, English test items, National Examination, document analysis

1. INTRODUCTION

The National Examination in Indonesia is the highest standardized test employed to assess and measure Indonesian students’ competence (Education Ministry Regulation No. 59/2011). By passing the National Examination, Indonesian students are able to graduate from a certain education level and to continue their study at the next level. Therefore, the administration of the National Examination is strictly regulated by the Education Ministry, and the test items should be well prepared and refer to particular test specifications and lesson objectives. As stated in Education Ministry Regulation No. 22/2006, National Examination materials are generally based on the Competence Standard and Basic Competence of each level of educational unit, which are included in the Content Standards (Standar Isi). Furthermore, the Competence Standard and Basic Competence of each level of educational unit become the reference for creating the Graduate Competence Standard (Standar Kompetensi Lulusan), which consists of test specifications.

For these reasons, test makers need to pay attention at least to the test’s content validity and authenticity in order to write good test items, particularly for the National Examination. Content validity helps the test reflect the measured skills that students should perform. The American Psychological Association (1985) advances that the validity of a test lies in revealing relevant test scores (as cited in Rudner and Shafer, 2002, p. 12). Seif (2004) explains that if a test does not have content validity, the examiners may not be able to determine whether the students have achieved the set of learning objectives at a particular level of education (as cited in Jandaghi and Shaterian, 2008, p. 2). Test makers also need to pay attention to the authenticity of the test. Authenticity is important since it gives students a picture of how the target language is used in real situations (Brown, 2004).
Students will have difficulty using language in context unless the National Examination reflects authenticity. Moreover, it is important that the materials used in the test are relevant to students’ majors so that students can comprehend the content more easily.

The researcher analyzes the content validity and authenticity of the English test items in the National Examination in order to obtain more information about the quality of the English items of the 2012 National Examination for senior high schools. The analysis was conducted using document analysis. This choice is supported by Fraenkel and Wallen (2008), who state that document analysis is useful for obtaining information on educational matters (p. 497). In this research, the primary documents analyzed are the listening recording and the five reading test versions of the English test of the 2012 National Examination for senior high schools. The study is based on the following two questions:

1. How valid is the content of the English test items of the 2012 National Examination for senior high schools in relation to the lesson objectives and the test specifications?
2. How authentic are the English test items of the 2012 National Examination for senior high schools in relation to the criteria of authenticity set by Brown?

2. LITERATURE REVIEW

A language test is a systematic method of measuring someone’s capability, knowledge, or performance in a certain domain in relation to language use. To be useful, a language test should meet the criteria of a good test, for instance reliability, validity, practicality, and authenticity (Brown, 2004). Therefore, a language test should be of high quality since it is a measurement of students’ capability. In terms of method, the National Examination is a paper-and-pencil language test, or written test, and it belongs to receptive tests because it tests receptive skills such as listening and reading.
In terms of test purpose, the National Examination is categorized as an achievement test (McNamara, 2000). As an achievement test, the National Examination corresponds to classroom lessons, units, or the curriculum (Brown, 2004). The bases for composing the National Examination are the Competence Standard, the Basic Competence, and the Graduate Competence Standard. In order to function as an assessment tool, a language test such as the National Examination should meet at least two of the principles of language assessment, namely content validity and authenticity.

A valid listening test is a test whose content is composed on the basis of the blueprints. If the topics are relevant to the test specifications, the listening test is valid (Brown, 2004). Likewise, a valid reading test is a test whose content is composed on the basis of the blueprints. If a language test does not meet content validity, it probably affects the students’ capability to perform the intended skill, and the students are probably not able to answer the test questions (Seif, 2004). Test designers or teachers can check the validity of a language test by matching the test items with the relevant test specifications and lesson objectives.

Authenticity is one of the important facets of language assessment since it reflects how closely the language test resembles real-world tasks and true language use (Richards, 2001). Authentic materials present true language in context and help students by providing appropriate information about the target language (Richards, 2001). In addition, authenticity is a matter of the appropriateness of the content and construction of both test tasks and test texts; it is not used to teach grammar or language discourse. Instead, it shows genuine and reliable language (Richards, 2001).
In order to determine whether an assessment is authentic, test designers should consider two important parts of authenticity, namely test task characteristics and test text characteristics (Bachman and Palmer, 1996). Task characteristics include five aspects: the naturalness of the test language, the contextualization of the test items, the relevance of the test topics to the learners, the existence of some thematic organization of items, and the representativeness of real-world tasks (Brown, 2004). The naturalness of the test language in reading test items covers linguistic aspects, namely typography, lexis, morphology, syntax, and semantics. It shows how appropriate the test language is to the target language. The target language of the English test in the National Examination is American English and British English, because these varieties have become international languages, used as a means of communication by most people throughout the world. The naturalness of a listening test refers to the existence of hesitations, white noise, and interruptions in the recordings (Brown, 2004). The contextualization of the test items refers to the organization of the test items, which is related to the existence of some thematic organization of items. Another indicator is the relevance of the test topics to the learners, which means that the materials should be appropriate to the learners. The last indicator is that the tasks should represent real-world tasks, which means that authentic materials are taken from real-world sources.

Besides the test tasks, the test text characteristics are important for achieving authenticity, and they adapt the five indicators of test authenticity. Three indicators are used to check the authenticity of reading texts: the naturalness of the test language, the relevance of the test topics to the learners, and the representativeness of real-world tasks.

3. DISCUSSION

The results of the analysis of both the content validity and the authenticity of the test items are presented in the following table:

Table 1. The Percentages of Validity and Authenticity of the Test Items

No.   Research Findings                                                                Percentage
1.    The validity of the English test items according to the Competence
      Standard-Basic Competence and Graduate Competence Standard                      98.8%
2.    The authenticity of the English test items                                       79.5%

The table shows that the validity and authenticity of the English test items in the 2012 National Examination did not reach the highest possible percentage, namely 100%, due to several factors found in the data analysis. The reasons are described as follows.

3.1 Content Validity of the Listening Test Items According to Competence Standard and Basic Competence

According to the analysis carried out by the researcher, none of the listening test items sampled responding to short spoken functional texts, a learning topic stated in the Competence Standard and Basic Competence for senior high schools. The reason is that this listening topic was not listed in the Graduate Competence Standard as one of the test materials. Nevertheless, all listening test materials in the 2012 National Examination referred to learning topics stated in the Competence Standard and Basic Competence for senior high schools. This is supported by Brown (2004), who argues that test specifications include the general outline of the test and the test tasks (p. 50). The test specifications in the Graduate Competence Standard referred to a certain curriculum and consisted of only the general outline of the materials and skills, for the sake of test practicality. Therefore, the omission was not a problem as long as all materials in the listening test referred to the Competence Standard and Basic Competence.
3.2 Content Validity of the Listening Test Items According to Graduate Competence Standard

The results show that each listening test item, from number 1 to 15, reflected the content of the test specifications in the Graduate Competence Standard. This accords with APA (1954), which states that content validity refers to the degree of correlation between the content of the assessment items and the domain of interest. The listening test items cover the listening skills that are to be measured in the listening section. The section does not include test items that measure other skills such as speaking, reading, or writing.

3.3 Content Validity of the Reading Items According to Competence Standard and Basic Competence

The test items in versions A57, B69, C71, D32, and E45 were considered 100% valid in terms of content since the items in each test version of the National Examination represented the reading topics written in the Competence Standard and Basic Competence, not other skills. This is in line with Seif (2004), who claims that if a test does not meet content validity, there will possibly be two negative outcomes (as cited in Jandaghi and Shaterian, 2008, p. 2). In this case, the two negative outcomes would not occur since all test items in the five test versions of the 2012 National Examination were based on the Competence Standard and Basic Competence.

3.4 Content Validity of the Reading Test Items According to Graduate Competence Standard

The data analysis shows that three test versions did not represent explanation text. In this respect, those three test versions of the 2012 National Examination for senior high schools lacked content validity.
If the reading test of the National Examination referred completely to the Graduate Competence Standard, there would be 13 kinds of reading materials (message, letter/e-mail, advertisement, narrative text, news item, recount text, announcement, report, descriptive text, explanation, exposition text, discussion, and review) in the reading test. Instead, test versions A57, C71, and D32 demonstrated only 12 kinds of reading materials, while B69 and E45 demonstrated all 13 kinds written in the Graduate Competence Standard. This shows a difference in the quality of content validity among the five test versions. This is in line with the statement of Elind Driana, an observer in the educational field, in Tempo (Tuesday, 11 September 2012). Elind Driana (2012) states that when several versions of the National Examination are composed, all of the versions should have the same level of difficulty. Therefore, it is important that each test version has the same kinds of test instructions and test topics in order for the results to be valid.

3.5 Authenticity of the Listening Test Items

There was one significant problem, related to the naturalness of language use in the listening test items. There was no significant problem related to the other factors, namely contextualization of the test items, thematic item organization, relevance of the test topics to the learners, and real-world representativeness. The language used in the conversations was similar to real-world conversations, and there were also some word reductions that made the conversations natural. In listening test question number 2, for instance, the woman contracted did and not into didn’t. However, no hesitations or white noise were found in the conversations.
Therefore, the conversations sounded like designed recordings. According to Brown (2004), hesitations and white noise are two of the three features that can express natural language use in a listening comprehension section (p. 28).

All listening test items in the 2012 National Examination are considered contextualized because the test items were developed from two learning topics integrated in the blueprints, namely transactional/interpersonal expressions and monologue texts. Besides, all learning topics of the fifteen items in the listening test are relevant to the learners. The learners in this context are senior high school students, and the learning topics used in the conversations are asking for and giving directions, expressing pleasure, thanking, complaining, asking for and giving information, and offering help. All topics in the listening test take place in daily-life situations. In the listening test of the 2012 National Examination, the researcher found that four test items are organized in the form of story lines. Lastly, real-world representativeness was exhibited in all listening test items. The conversations and the spoken monologue texts often take place in daily-life situations.

3.6 Authenticity of the Test Tasks

The preliminary data show that versions A57, B69, C71, D32, and E45 contained a total of 123 different test items and employed 50 different passages. The researcher also recognized that most of the test tasks had problems with the naturalness of the language used in the test instructions and the answer options, as well as with the relevance of a particular test topic to the learners. Although the language test was not intended to test particular grammatical or lexical items, test designers should avoid linguistic mistakes in order to present a highly authentic reading test.
According to Richards (2001), the visible characteristic of authentic materials is that they provide true language (pp. 252-253). This means that there should be no linguistic mistakes in the test tasks, such as mistakes in typography, lexis, morphemes, word order and grammar (syntactic matters), diction, and meaning (semantic matters), in order to avoid confusing test takers as they read the test instructions. Of the 123 test tasks or test instructions, only 105 test items meet the natural language use criterion. Consequently, the test takers were possibly confused in understanding the meaning. This relates to Widdowson (1976), who emphasizes that authenticity is not only a matter of the quality of a text; authenticity is reached when the readers understand the writer’s intention (p. 264). Another kind of mistake is morphosyntactic, related to singular and plural forms. It is related to the use of determiners as well.

The researcher also considered all the reading test items in the 2012 National Examination to be contextualized. All test tasks were developed from certain learning topics, namely functional texts and essays. In relation to thematic item organization, the researcher identified 118 test tasks that were constructed thematically, while five test items were constructed independently. Besides, the test tasks in the reading test do not ask about English grammatical forms; instead, they ask for information or for the meaning of some vocabulary. Lastly, the relevance of a particular test topic to senior high school students is a problem in the reading test tasks of the 2012 National Examination, since two test tasks found in test version A57 were considered not relevant to senior high school students.
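The counts reported in this section can be expressed as per-criterion percentages. The following is a minimal sketch in Python, not the writers’ own procedure: the function name share and the rounding to one decimal place are assumptions made for illustration, and the count of 121 relevant test tasks is inferred from the two tasks in version A57 that were judged not relevant. These per-criterion figures are not the overall 79.5% authenticity figure, whose aggregation method is not described in this paper.

```python
def share(met: int, total: int) -> float:
    """Return the percentage of items meeting a criterion, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= met <= total:
        raise ValueError("met must be between 0 and total")
    return round(100 * met / total, 1)


TOTAL_TASKS = 123  # different test items across the five reading versions

# Counts taken from this section; 121 is inferred (123 minus 2 irrelevant tasks).
criteria = {
    "natural language use": 105,
    "thematic organization": 118,
    "relevance to learners": 121,
}

for name, met in criteria.items():
    print(f"{name}: {share(met, TOTAL_TASKS)}%")
```

Keeping the count and the total separate, rather than storing only the percentage, makes it easy to recheck a figure when a checklist decision for an individual item changes.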
3.7 The Authenticity of the Test Texts

The result of the analysis shows that most test texts had problems with the naturalness of the language used in the test passages and with real-world representativeness, as well as with the relevance of a particular test topic to the learners. According to the data, only 36% of the test texts met the indicator of naturalness of the language used in the test texts. The failure of the test passages to meet this indicator was caused by linguistic problems such as mistakes in typography, lexis, morphemes, word order and grammar (syntactic matters), diction, and meaning. In addition, only 98% of the test text topics were relevant to senior high school students. The topic that was not relevant to senior high school students appeared in a passage that used specific terms related to electrical installation.

Almost all of the passages used in the reading test failed to represent the real-world context even though the topics of the passages were rational and based on real-world contexts. Although Brown (2004) states that authentic reading passages are taken from real-world sources (p. 28), the designers of the English test items did not mention the sources from which the passages were taken. Another reason was that the samples of formal letters, announcements, and advertisements looked unnatural in terms of format and design.

3.8 Other Findings

The main goal of the Education Ministry in applying different test versions in the 2012 National Examination was to clamp down on cheating by students during the examination. From the pre-analysis, the researcher obtained some interesting results. Several similar passages and test questions were used in all five test versions. Another interesting finding was that most of the passages and test tasks in test version C71 were similar to those in test version D32.
The only difference between the two test versions was found in test items number 39, 40, and 41, since the passages related to those three questions were different in each version. This implies a mismatch between the Education Ministry’s objective in applying several test versions in the National Examination and the facts found in the reading test items of the 2012 National Examination.

4. CONCLUSIONS

This research produced several findings that answer the two research questions. First, the content of the test items (including listening and reading test items) is 98.8% valid. Second, the analysis of the authenticity of the 2012 National Examination test items shows that the listening and reading test items are authentic at a rate of 79.5%. These findings imply that the English test items of the 2012 National Examination need evaluation and improvement, since neither the content validity nor the authenticity of the items reaches the highest percentage.

REFERENCES

American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques: Preliminary proposal. American Psychologist, 7, 461-476.

American Psychological Association. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.

Brown, H. D. (2004). Language assessment: Principles and classroom practices. USA: Pearson Education, Inc.

Fraenkel, J. R., & Wallen, N. E. (2008). How to design and evaluate research in education (7th ed.). Boston: McGraw-Hill Higher Education.
Gronlund, N. (1998). Assessment of student achievement (6th ed.). Needham Heights, MA: Allyn & Bacon.

Jandaghi, G., & Shaterian, F. (2008). Rate of validity, reliability and difficulty indices for teacher-designed exam questions in first year high school. International Journal of Human Sciences [Online]. 5:2. Retrieved September 19, 2013 from http://www.insanbilimleri.com

McNamara, T. (2000). Language testing. Oxford: Oxford University Press.

Oxford Advanced Learner’s Dictionary (7th ed.). (2005). Oxford: Oxford University Press.

Peraturan Menteri Pendidikan Nasional No. 22 tahun 2006 tentang Ujian Nasional.

Peraturan Menteri No. 59 tahun 2011 tentang Ujian Nasional.

Richards, J. C. (2001). Curriculum development in language teaching. New York: Cambridge University Press.

Seif, A. A. (2004). Assessment and Evaluation in Education. Tehran: Doran Publication. Retrieved September 19, 2013 from http://www.insanbilimleri.com

Tempo Interactive. (June, 2012). National exams warrant re-evaluation. Tempo. Retrieved on August 31, 2012, from http://www.tempo.co/read/news/2012/06/.../055409165

Anggrita Desyani. (September, 2012). Rencana variasi 20 soal ujian nasional dikritik. Tempo. Retrieved on September 19, 2013, from http://www.tempo.co/read/news/2012/09/11/079428853/Rencana-Variasi-20-Soal-Ujian-Nasional-Dikritik

Widdowson, H. G. (1976). The authenticity of language data. In John F. Fanselow & Ruth H. Crymes (Eds.), TESOL ’76 (pp. 261-270). Washington, D.C.: TESOL.