IJEE (Indonesian Journal of English Education), 4 (2), 2017, 165-184
P-ISSN: 2356-1777, E-ISSN: 2443-0390 | DOI: http://dx.doi.org/10.15408/ijee.v4i2.8344
This is an open access article under CC-BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)
Available online at IJEE (Indonesian Journal of English Education) Website: http://journal.uinjkt.ac.id/index.php/ijee

A MODEL OF AN ONLINE READING COMPREHENSION SUMMATIVE TEST FOR COLLEGE STUDENTS

Sofa, Gunadi H. Sulistyo

Received: 14th October 2017; Revised: 18th November 2017; Accepted: 24th December 2017

ABSTRACT
Some universities, including STKIP PGRI Jombang, face a compelling need for a test that can replace the existing paper-and-pencil reading comprehension test, which is conventional, impractical, and time-consuming. To fulfill this need, a model of an online reading comprehension summative test was developed, involving a number of essential micro skills of reading. The design of the study was Educational Research and Development (R&D), involving 100 subjects in the try-out stage. The instruments used were interview guides and a questionnaire. Based on the try-out analysis, thirty-one items were categorized as valid, and their reliability was .779. For ease of scoring and a balanced number of the indicators of interest, only 25 items were included in the model test. Based on the students’ questionnaire, more than 80% of the subjects responded positively. The final product of this research was an online reading comprehension test kit that includes the blueprint, the test (on paper and as screenshots of the online version), the answer key, and the instructions for accessing the online test.
Key Words: online summative test; reading comprehension

ABSTRAK (translated from Indonesian)
In several universities, including STKIP PGRI Jombang, an important need has emerged for a test that can replace the earlier paper-and-pencil-based reading test, which is conventional, impractical, and time-consuming. To meet the need for a test that can overcome these problems, a model of an online summative reading test was developed. The design of this study was research and development, involving 100 subjects in the try-out stage. The instruments used were an interview guide and a questionnaire. Based on the item analysis, the alpha value, or reliability, was 0.779. Thirty-one items were categorized as valid. For ease of scoring and a balanced number of the intended indicators, only 25 items were used in the model test. Based on the student questionnaire, more than 80% of the subjects responded positively. The final product of this research is an online reading comprehension test set comprising the blueprint, the test (on paper and as screenshots of the online version), the answer key, and instructions for accessing the online test.

Kata Kunci (Key Words): online summative test; reading comprehension

How to Cite: Sofa., Sulistyo, G.H. (2017). A Model of an Online Reading Comprehension Summative Test for College Students. IJEE (Indonesian Journal of English Education), 4(2), 168-187. doi:10.15408/ijee.v4i2.8344

INTRODUCTION
Reading is an inevitable daily need. Sulistyo (2011, p.20) states that on one occasion we read for information and on another for enjoyment. This implies that reading comprehension plays a critical role in our daily lives.
Reading teachers who are concerned with students’ competence to read for information or knowledge have a compelling need to find appropriate ways both to teach their students and to assess their reading comprehension carefully, since the ability to read is an important asset on any occasion, let alone in the digital era. Reading (critically) is believing; it is the window through which an abundance of information is accessed.

A test is a subset of assessment (Brown, 2004, p.4). Brown (2004, p.4) further states that tests are prepared administrative procedures that occur at identifiable times in a curriculum when learners muster all their faculties to offer peak performance, knowing that their responses are being measured and evaluated. In this way, learners are required to demonstrate their optimum competences, elicited through tests in the form of manifest language behaviors.

To develop a good test, several criteria need to be not only known but also fulfilled satisfactorily. A test is a data collection instrument that must function properly if accurate information about the learners is to be obtained and the so-called GIGO (garbage in, garbage out) effect avoided. The first criterion is validity. Gronlund and Linn (1990, p.47) state that validity refers to the appropriateness of the interpretations made from test scores and other evaluation results, with regard to a particular use. It means that the result of the test should be meaningful, appropriate, informative, and useful. The second is reliability. Brown (2004, p.20) states that a reliable test is consistent and dependable in terms of the scores yielded by the testing procedures. If we give the same test to the same students on two different occasions, the test should yield similar results. The third is practicality.
Djiwandono (1996) states that practicality concerns test administration, scoring, and the interpretation of test results, and even the financial aspects of administering the test. Practicality may be concerned with economy in terms of resources, time, and energy. In line with Djiwandono (1996), Gronlund and Linn (1990) emphasize several considerations that can be used to judge the practicality of a test. The first is ease of administration: the directions should be simple and clear, the subtests should be relatively few, and the timing of the test should not be too long. The second is the time required for administration, which deals with the time allocated to do the test. The third is ease of scoring, which includes clarity in the directions for scoring and simplicity of the scoring key. The next is the cost of testing, which matters when selecting a test. The last is economy. Gronlund and Linn (1990, p.103) explain that testing should be relatively inexpensive and that cost should not be a major consideration.

One type of test that a teacher almost certainly needs to construct is an achievement test. There are two types of achievement test: formative and summative (Brown, 2004, p. 48). A formative test aims at measuring the extent to which students have mastered the learning outcomes of a rather limited segment of instruction, such as a unit or a textbook chapter (Gronlund & Waugh, 2009, p.7).
A summative test, also known as summative assessment, aims to measure or summarize what students have grasped, and typically occurs at the end of a course or unit of instruction (Brown, 2004, p. 6). The test most commonly and continually administered by classroom teachers is a summative test, used to determine the students’ mastery of the course. Because it is crucial to know what the students have grasped, the summative test in reading deserves greater attention.

Nowadays, considerable attention is paid to the nature of a test as part of the tripartite functions of assessment: assessment of learning, assessment for learning, and assessment as learning. Earl, Katz, and WNCP team (2006, p. 55) state that assessment of learning refers to strategies designed to confirm what students know, demonstrate whether or not they have met curriculum outcomes or the goals of their individualized programs, or to certify proficiency and make decisions about students’ future programs or placements. It is designed to provide evidence of achievement to parents, other educators, the students themselves, and sometimes to outside groups (e.g., employers, other educational institutions). It means that assessment is a crucial tool for showing the students’ mastery of the lesson based on the curriculum applied and, further, for deciding what fits them in the future. Assessment of learning is, in other words, on the students’ side.

On the other hand, Earl, Katz, and WNCP team (2006, p. 29) also state that assessment for learning occurs throughout the learning process. It is designed to make each student’s understanding visible so that teachers can decide what they can do to help students progress.
Here, teachers should investigate how the students are studying, what problems they face, and so on, in order to find ways to solve those problems and help the students understand the lesson. Assessment for learning is, in other words, on the teachers’ side.

The last is assessment as learning. Earl, Katz, and WNCP team (2006, p. 41) state that assessment as learning focuses on students and emphasizes assessment as a process of metacognition (knowledge of one’s own thought processes) for students. It means that in the process of learning with their own understanding, students can do self-assessment to make sense of the information and use it for new learning under the guidance and direction of the teacher. Assessment as learning, in other words, involves both the teachers’ and the students’ sides.

Supporting the ideas above, Sulistyo (2015, p.5) states that assessment implies an ongoing process of monitoring students’ learning, applied as soon as the teaching-learning process begins and continuing to the end of each class session. It informs teachers about their teaching effectiveness, students’ learning progress, and even the level of implementation of a curriculum. As such, assessment is inseparably aligned with instruction. He further states that, if carefully planned and accurately implemented, assessment can provide teachers with a source of useful information for reflecting on their teaching practices. It means that teaching cannot be separated from testing; they are linked to each other. Test results provide an important basis for teachers to design their teaching better so that its delivery can boost the students’ performance in learning.

Reading from computer screens is becoming more and more common in daily life as the amount of reading material available online rapidly increases.
This phenomenon has also been seen in the field of language assessment, for example in computer-based tests (CBTs), computer-adaptive tests (CATs), and the TOEFL. As stated by Sulistyo (2009), advances in computing technology boosted the arrival in 2005 of a new version of the TOEFL, the iBT, which marked a significant shift from the older computer-based TOEFL (CBT for short) and the paper-and-pencil-based TOEFL (PBT, henceforth). The iBT version, as its name indicates, makes functional use of information and communication technology (ICT). It means that the Internet is already in broad use in testing and that it can support and optimize assessment. One indication of its importance is that the demand for online testing services and software grows from year to year. Mason (1998) and Weisburgh (2003) (as cited in Hricko & Howell, 2006, p. 4) said, “The availability of assessment software to address these tasks is leading to assessment services becoming one of the fastest growing software niches, both in the corporate and in the educational markets.” Regardless of the rapid growth of demand in this area, the development and implementation of this new mode of testing is still in its initial stages. Therefore, sufficient empirical data that would allow researchers to look into the soundness of computerized language tests with regard to construct validity and fairness are not yet available.

STKIP PGRI Jombang is a private university operating in Jombang, East Java. In this university, the use of the Internet network is increasing rapidly, but it is not yet utilized in the best way.
Online assessment is in fact very helpful, not only to students but also to lecturers, as a medium in the assessment process. As Pallof and Pratt (2009, p. 3) put it, “The convenience of working online has proven to be very attractive to students and instructors alike.” Further, Lynch (1997) (as cited in Millsap, 2000, p. 4) found that subjects responded more honestly on computer-administered tests than on paper and that the test-retest reliability was comparable for both groups. This means that, in the present era, online assessment offers more convenience than the traditional kind.

In this university, a substantial problem has emerged in the Reading Comprehension 2 class. The test for the course is held as a face-to-face interview to make the students explore more, to minimize cheating, and to simplify the test. This face-to-face test is time-consuming: with a total of forty students, it took 600 minutes (10 hours) to assess the students’ reading comprehension. A more efficient yet accurate and reliable test is therefore needed. The choice is an ICT-based test. By using an online test, the teacher can manage the timing on the computer and score student reading performance more quickly. In addition, online assessment is cost-effective because lecturers do not need to copy the paper test for all the students. Dowsing, Long, & Craven (2000) and Weisburgh (2003) (as cited in Hricko & Howell, 2006, p.11) note that “it has been proposed that one of the main advantages of using assessment software over manually assessing performance is primarily the savings in cost and time”.
In addition, the benefits of computer-administered testing include rapid updates, random item selection, test item banks, and automatic data collection and scoring (Millsap, 2000, p. 6). Practicality will also improve since the lecturer will not have to score manually, as with paper-and-pencil tests. As Weisburgh (2003) (cited in Hricko & Howell, 2006, p.11) said, “Scoring and evaluating tests used to take a lot of manual effort, whereas software can dramatically reduce, or even eliminate, the manual effort, and results can be instantaneous”. Given the facts elaborated above, the online test is very likely to be lower in cost.

Another weakness of the existing reading comprehension test is that the questions are asked orally, which makes administration impractical. Furthermore, these questions do not completely represent the indicators in the syllabus, as they cover only the content, the generic structure and features of the text, and text building. The test covers only one type of text, while the students must know all genres. This may lead to invalidity, that is, inaccurate and erroneous test results caused by the teacher’s subjectivity or tiredness. An online test can address these problems: Krug (1989) reported that in an estimated ten percent of hand-scored objective tests, errors of one point or more in the final score were made, whereas computerized test administration ensures accurate test scores (as cited in Millsap, 2000, p.16).

Studies on the use of technology in testing have been conducted. A study by Sawaki (2001) aimed to examine the comparability of conventional and computerized tests of reading in a second language. The study used a survey design with a large sample as the subjects of the research. The general trends found in this study indicated that comprehension of computer-presented texts is, at best, as good as that of printed texts (Sawaki, 2001, p. 49).
The second study, by Noyes and Garland (2008), investigated whether computer- and paper-based tasks are equivalent. It used a survey design that reviewed the literature and prior research. The study indicated that in some cases paper and computerized tests were equivalent, but in others they were not, for example with respect to the form of the test. In addition, achieving equivalence between computer-based and paper-based tasks poses a difficult problem. Equivalence is probably influenced by the test takers’ confidence in using the computer and by other psychological factors.

Both studies basically state that computerized and paper-based tests cannot be said to be equivalent. In 2017, however, it is quite possible that they are equivalent, or even that the computerized test is more effective, as some schools have already conducted computerized (and online) tests. In senior high schools, even the national examination is now held online (UNBK or Ujian Nasional Berbasis Komputer). Teacher certification and lecturer certification are also conducted online. It means that online testing is in broader use, is becoming more popular, and offers more benefits despite its technical challenges.

Based on the context described in the previous section, the problem addressed in the present study is how a model of an online reading comprehension test can be conceptually and empirically developed to replace the existing test. The present study is therefore an attempt to conceptually develop an online reading comprehension test and to empirically validate it.
Furthermore, the developed product is significant in three ways: it replaces the previous time-consuming and ineffective test, it yields the students’ achievement scores at the end of the course, and it serves as a model for test developers (and/or lecturers) who wish to develop a similar test for other reading courses (Reading Comprehension 1, Reading Comprehension 3, and Extensive Reading) or for other courses in general.

METHODS
The design of the test development model was adapted from Sulistyo (2015, p. 106). To meet the needs of the present R&D research, some adaptations were made, so the model of online test development used the following stages: conducting needs assessment, creating content specification/blueprint, blueprint expert review, prototype writing, prototype review, test installing, test and ICT expert review, try-out, item analysis, and final form/publishing the final form.

The test installing, or putting the test online, was carried out after the website, www.sotaki.com, was ready. The stages were: logging in as the admin to start creating the online test; creating the course, to name the course (Reading Comprehension 2); creating the test, to name the tests (Summative Test 1 and 2); creating questions, to provide the questions, question types, options, texts in the form of images, answer keys, and scores; test setting, which includes timing and score viewing; adding users, to enter the test users into the database; publishing, to bring the test online so that it can be accessed by the students enrolled in the course; and, last, exporting results, to retrieve the data easily for later use.
Data in this case refers to the students’ names, scores, duration, timing, and other details in Excel format for later use in the item analysis stage. The computer program used was Chamilo version 1.9.10.2.

The design of the needs assessment was qualitative. The instrument was an interview with one Reading Comprehension 2 lecturer. It covered how the lecturer had previously conducted the test, the form of the test, the reason for choosing that form, the material included in the test, and the availability of a later online reading test for the students. After the information was gathered, appropriate passages in various genres were collected and prepared as material for the body of the test.

Three test experts and three ICT experts were invited to review and conceptually validate the products. The instrument used was a questionnaire. The test expert review focused on the items, the instructions (wording), and the construction of the language test. The analysis was carried out qualitatively since the data obtained were descriptive. The ICT expert review focused on the ease of the instructions, the loading of the questions, the ease of the navigation menu, the readability of the font, and the user interface in general.

The subjects of the try-out were 100 students of STKIP PGRI Jombang who had finished their Reading Comprehension 2 course. The subjects were chosen by simple random sampling. Latief (2012, p. 183) states that simple random sampling is the best technique for assuring the representativeness of the sample from the accessible population. It suits the sampling needs since all students have an equal chance of being selected for the sample. The try-out was carried out in two sessions to keep the subjects from getting tired. A questionnaire was also given to the subjects.
It asked about the ease of the instructions, the ease of the questions, the time allotment, the suitability of the test to the material given in class, the ease of the texts, the length of the texts, the number of items, and the level of difficulty of the items.

After the informal try-out, the test results were analyzed using software called ITEMAN 3.00. The reliability is shown by the alpha score, which ranges from 1.00 for perfect reliability to 0.00 for a completely unreliable test (Ary et al., 2002, p. 261). The item validity can be known from the point-biserial correlation coefficient, symbolized by r-pbis. It is a statistic used to estimate the degree of relationship between a naturally occurring dichotomous nominal scale and an interval or ratio scale (Brown, 2001, p.13); if the coefficient is > .2, the item is categorized as good. Item difficulty is shown by the proportion-correct score (easy is > .7, moderate is between .3 and .7, and difficult is < .3) (Brown, 2001). Item discrimination is presented as p-bis coefficients. The categorization of item discrimination is shown below.

Table 1. Item Discrimination Categorization
Index range | Interpretation
≥ .40 | Very good
.30-.39 | Good
.20-.29 | Fair
≤ .19 | Poor
(Adapted from Djiwandono, 2011, p.230)

The effectiveness of the distractors is also important. Brown (2004, p. 60) notes that the efficiency of distractors is the extent to which (a) the distractors “lure” a sufficient number of test takers, especially lower-ability ones, and (b) those responses are somewhat evenly distributed across all distractors.
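The item statistics described above (proportion correct, point-biserial coefficient, and the cut-offs quoted from Brown, 2001, and Table 1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the small response matrix are hypothetical, the total score used here includes the item itself, and ITEMAN 3.00 may compute or round its coefficients differently.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical scored responses: one row per examinee, one column per item
# (1 = correct, 0 = incorrect). These are NOT the study's data.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score."""
    p = item_difficulty(item_scores)
    sd = pstdev(total_scores)  # population SD of the total scores
    if p in (0.0, 1.0) or sd == 0:
        return 0.0  # coefficient is undefined; reported as 0 in this sketch
    right = [t for x, t in zip(item_scores, total_scores) if x == 1]
    wrong = [t for x, t in zip(item_scores, total_scores) if x == 0]
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))

def classify_difficulty(p):
    """Cut-offs as quoted in the text: > .7 easy, .3-.7 moderate, < .3 difficult."""
    if p > 0.7:
        return "easy"
    if p >= 0.3:
        return "moderate"
    return "difficult"

def classify_discrimination(r):
    """Categories of Table 1; values between printed bins go to the lower bin."""
    if r >= 0.40:
        return "very good"
    if r >= 0.30:
        return "good"
    if r >= 0.20:
        return "fair"
    return "poor"

totals = [sum(row) for row in responses]
for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    p = item_difficulty(item)
    r = point_biserial(item, totals)
    print(j + 1, round(p, 2), classify_difficulty(p),
          round(r, 2), classify_discrimination(r))
```

In practice the rows would come from the exported score sheet rather than a hand-typed list, and an item would be retained only when its coefficient exceeds the .2 threshold stated above.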
Distractor efficiency can be judged from whether the p-bis value of each option in an item is positive or negative. If a distractor has a positive value, it should be reviewed or changed.

FINDINGS AND DISCUSSIONS

Findings
The results of the development became known after the research was carried out at STKIP PGRI Jombang.

The Result of Needs Assessment
It was found that the previous test was impractical and time-consuming, and that it covered less material than should have been tested. The interview also revealed that online tests have become a trend in recent days, so a model of a Reading Comprehension 2 summative test needed to be developed.

The Test Characteristics
Based on its syllabus, the Reading Comprehension 2 course intends to measure the following micro reading skills: identifying topics, identifying main ideas, identifying specific and detailed information (explicit and implicit), understanding the organization of ideas in texts, identifying reference, identifying vocabulary to derive meaning, identifying the writer’s tone or purpose, and evaluating expressions in context. The indicators stated in the syllabus can thus be used as a basis for developing test items.

Sulistyo (2008) distinguished three domains of reading skills: word attack, sentence attack, and text attack skills. Based on the syllabus of Reading Comprehension 2, all three are included. The comprehension levels chosen follow Crawley and Mountain (1995, p. 104-105): literal and inferential.
The critical level is not included since the students are at the intermediate level and the critical level would be beyond the scope of their competences. In the test, the literal level accounts for 40% of the 100 items since it is easier, and the inferential level accounts for 60%. The larger share goes to the inferential level, which deals with inferring implicit information from the text; it is more difficult but fits the students’ level. So, based on these percentages, there are 40 items at the literal level and 60 items at the inferential level.

In the present study, the passage themes mostly deal with education, literature, science, life, and entertainment. The passages range from 212 to 495 words since the average students are still at the low-intermediate level. Although the longest passage has 495 words, it is at the 8th-grade level, which means it is still standard in terms of level. The readability of the texts used was calculated with the Flesch-Kincaid Formula. The result can be seen in Table 2.

Table 2. The Result of Flesch-Kincaid Reading Ease Scores and Its Interpretation
No | The Genre of the Text | Flesch-Kincaid Reading Ease Score | Estimated Reading Grade | Interpretation/Description Style
1 | Narrative (The Necessity of Salt) | 73.6 | 8th | Standard
2 | Recount (Edgar Allan Poe) | 54.5 | High School Students | Fairly difficult
3 | Spoof (Goat jumping into deep hole) | 97.7 | 5th | Very easy
4 | News Item (Tectonic earthquake sparked, Mt. Merapi’s recent activity) | 51.1 | High School Students | Fairly difficult
5 | Descriptive (Macquarie University) | 44.5 | College Students | Difficult
6 | Report (A Museum) | 41.4 | College Students | Difficult
7 | Explanation (How Was the Earth Formed?) | 46.1 | College Students | Difficult
8 | Procedure (How to make Candles) | 81 | 6th | Easy
9 | Analytical (Opportunity in the Global Financial Crisis) | 39 | College Students | Difficult
10 | Hortatory (Should not Bring Mobile Phone to School) | 67.1 | 8th | Standard
11 | Discussion (The advantages and Disadvantages of Distance Learning) | 48.3 | College Students | Difficult
12 | Review (2012 film) | 56.2 | High School Students | Fairly difficult
13 | News Item (Strait of Malacca still not safe from pirates) | 57.2 | High School Students | Fairly difficult
14 | Hortatory (Why Should Wearing a Helmet when Motorcycling) | 56.9 | High School Students | Fairly difficult
15 | Analytical (Death Penalty) | 68.1 | 8th | Standard
16 | Explanation (How does body react to the heat?) | 69 | 8th | Standard
17 | Discussion (Pro and con of Computers for Students) | 65.1 | 8th | Standard
18 | Review (Twilight) | 53.6 | High School Students | Fairly difficult
19 | Narrative (The Colossal UFO) | 79.3 | 7th | Fairly easy
20 | Recount | 88 | 6th | Easy
21 | Report (Dolphin) | 56.8 | High School Students | Fairly difficult

The Result of Expert Review
Two groups of experts took part in the validation stage: test experts and ICT experts. The test experts carried out validation twice: first on the blueprint, and then on the online test, that is, the product itself.
Blueprint Review
Based on the feedback from the three experts, the inputs concerned the level of skills, the numbering of the items, the grammar, the order of the item indicators, the titles for the texts, and the number of sub-competences, which should be rationally balanced.

Test Review
The inputs concerned the running of the try-out, which should be divided into two sessions to reduce the subjects’ tiredness, since tiredness can influence the result; the readability; the order of questions by paragraph; and the language mistakes. The remaining inputs concerned the sources, the quality of the options and grammar, the level of difficulty, and face validity checking. Suggestions from the three ICT experts concerned the type of passage format, the attractiveness of the test, the use of auto-save, and the interface.

The Result of Item Analysis
Based on the ITEMAN analysis, the alpha reliability of the test was .653, which was categorized as acceptable and fair. The next analysis was item difficulty, the result of which is shown in Table 3.

Table 3. The Results of Item Difficulty Analysis
Index Range | Category | Item Number | %
> 0.7 | Easy | 9, 11, 18, 29, 34, 38, 45, 46, 51, 52, 56, 57, 60, 69, 72, 87, 88, 90, 94, 97 | 20
0.3-0.7 | Moderate | 2, 4, 5, 8, 10, 13, 15, 16, 17, 20, 21, 23, 25, 26, 27, 30, 31, 32, 33, 35, 36, 37, 40, 41, 44, 48, 50, 54, 55, 61, 62, 63, 65, 66, 68, 71, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 93, 95, 100 | 50
< 0.3 | Difficult | 1, 3, 6, 7, 12, 14, 19, 22, 24, 28, 39, 42, 43, 47, 49, 53, 58, 59, 64, 67, 70, 84, 85, 86, 89, 91, 92, 96, 98, 99 | 30

Based on the result shown in Table 3, there are 20 easy items, 50 moderate items, and 30 difficult items. To find out how well the items discriminate between low- and high-ability students, an analysis of item discrimination was carried out. The result from ITEMAN version 3.00 is presented in Table 4.

Table 4. The Results of Item Discrimination Analysis
Index Range | Interpretation | Item Number | %
≥ .40 | Very good | 9, 16, 18, 27, 30, 32, 36, 38, 44, 45, 46, 64, 70, 74, 78, 83, 85, 90, 92, 94, 95, 97 | 22
.30-.39 | Good | 8, 11, 15, 24, 35, 41, 49, 50, 53, 62, 63, 65, 66, 72, 75, 87, 100 | 17
.20-.29 | Fair | 1, 4, 6, 12, 13, 23, 26, 29, 33, 40, 48, 55, 60, 68, 73, 80, 84, 89, 99 | 20
≤ .19 | Poor | 2, 3, 5, 7, 10, 14, 17, 19, 20, 21, 22, 25, 28, 31, 34, 37, 39, 42, 43, 47, 51, 52, 54, 56, 57, 58, 61, 67, 69, 71, 76, 77, 79, 81, 82, 86, 88, 91, 93, 96, 98 | 41

There are 22 items categorized as very good, 17 as good, 20 as fair, and 41 as poor. The item validity, based on the ITEMAN result, is shown in Table 5. From Table 5, it can be seen that 31 items are categorized as valid and 69 items as not valid. These 69 items were dropped from the product, and only the 31 valid items were used.

The last analysis was the effectiveness of the distractors. Based on the ITEMAN analysis, there are 32 items for which an alternative answer key was suggested. These 32 items were dropped from the product: items number 2, 7, 19, 20, 21, 22, 24, 25, 28, 31, 39, 42, 43, 47, 51, 54, 56, 57, 58, 60, 61, 67, 71, 73, 77, 79, 81, 86, 91, 93, 98, 99.

The 31 good items were then run through ITEMAN 3.00 again to be re-analyzed.
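The re-analysis step can be illustrated by recomputing the reliability coefficient on only the retained items. The sketch below uses the standard Cronbach's alpha formula, which for 0/1 items coincides with KR-20. The response matrix, the function names, and the choice of which columns to keep are hypothetical, and ITEMAN's own output may differ in rounding or in the variance estimator it uses.

```python
from statistics import pvariance

# Hypothetical scored responses (rows = examinees, columns = items).
# These are NOT the study's data.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

def cronbach_alpha(matrix):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(matrix[0])
    item_variances = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def keep_items(matrix, kept_columns):
    """Return a new matrix that contains only the retained item columns."""
    return [[row[j] for j in kept_columns] for row in matrix]

alpha_all = cronbach_alpha(responses)
# Assumption for illustration only: the first three items are the retained ones.
alpha_kept = cronbach_alpha(keep_items(responses, [0, 1, 2]))
print(round(alpha_all, 3), round(alpha_kept, 3))  # 0.471 0.577
```

With the real export, the kept columns would be the item numbers that passed the validity screen, and the two alpha values would be compared in the same way the full-test and reduced-test coefficients are compared in this section.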
The reliability, shown by the alpha coefficient, is 0.779, which can be categorized as good, so the items can be used in the test. The next analysis is item difficulty, as shown in Table 6. There are 8 items categorized as easy, 17 as moderate, and 6 as difficult.

Table 5. The Result of Item Validity Analysis

| Index Range | Item Number | % | Interpretation |
|---|---|---|---|
| r > 0.2 | 8, 9, 15, 16, 18, 27, 30, 35, 36, 38, 44, 45, 46, 49, 53, 62, 64, 65, 66, 70, 74, 78, 83, 85, 90, 92, 94, 95, 97, 100 | 31 | Valid items |
| r < 0.2 | 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 37, 39, 40, 41, 42, 43, 47, 48, 50, 51, 52, 54, 55, 56, 57, 58, 59, 60, 61, 63, 67, 68, 69, 71, 72, 73, 75, 76, 77, 79, 80, 81, 82, 84, 86, 86, 88, 89, 91, 93, 96, 98, 99 | 69 | Not valid items |

Table 6. The Result of Item Difficulty Analysis

| Index Range | Category | Item Number | f | % |
|---|---|---|---|---|
| > 0.7 | Easy | 2, 5, 11, 13, 14, 26, 28, 30 | 8 | 26 |
| 0.3-0.7 | Moderate | 1, 3, 4, 6, 7, 8, 9, 10, 12, 17, 19, 20, 22, 23, 24, 29, 30 | 17 | 55 |
| < 0.3 | Difficult | 15, 16, 18, 21, 25, 27 | 6 | 19 |

After item difficulty, item discrimination was also analyzed; the result is shown in Table 7. Based on the result, 25 items are categorized as very good and 6 items as good, which means that they can discriminate the students well. Regarding item validity, all 31 items are categorized as valid, as shown in Table 8. For ease of scoring and a balanced number of the indicators of interest, only 25 items were used in the final test.

The Result of Students' Questionnaire Analysis

To gain information about how the online test worked from the subjects' point of view, a questionnaire with 10 multiple-choice items and 2 essay questions was distributed to the 100 subjects. The subjects' answers are presented in Table 9.
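For the reliability and validity figures reported in the item re-analysis above (the alpha coefficient and the r > 0.2 criterion), the following Python sketch shows one way such statistics can be computed from a 0/1 response matrix. Treating the validity index r as a point-biserial item-total correlation is an assumption; the study reports only ITEMAN's output, and the function names here are illustrative.

```python
from statistics import mean, pstdev, pvariance
from typing import List

Matrix = List[List[int]]  # rows = examinees, columns = items, 1 = correct, 0 = wrong


def cronbach_alpha(responses: Matrix) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance).

    For 0/1 items with population variances this coincides with KR-20.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def point_biserial(responses: Matrix, j: int) -> float:
    """Pearson correlation between item j (0/1) and the total score.

    The total here includes item j itself (uncorrected item-total correlation).
    """
    item = [row[j] for row in responses]
    total = [sum(row) for row in responses]
    m_item, m_total = mean(item), mean(total)
    cov = sum((x - m_item) * (t - m_total) for x, t in zip(item, total)) / len(item)
    spread = pstdev(item) * pstdev(total)
    if spread == 0:
        raise ValueError("item or total score has zero variance")
    return cov / spread


def is_valid(responses: Matrix, j: int, threshold: float = 0.2) -> bool:
    """Apply the r > 0.2 criterion used in Tables 5 and 8."""
    return point_biserial(responses, j) > threshold


if __name__ == "__main__":
    # Hypothetical toy data: 4 examinees, 3 items (not data from the study).
    toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(cronbach_alpha(toy))
    print([is_valid(toy, j) for j in range(3)])
```

Because alpha depends on the ratio of summed item variances to total-score variance, dropping items that correlate weakly with the total score tends to raise it, which is consistent with the rise from .653 to .779 reported after the invalid items were removed.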
The typical format of the product of the present study is presented in Figure 1.

Table 7. The Result of Item Discrimination Analysis

| Index Range | Interpretation | Item Number | f | % |
|---|---|---|---|---|
| ≥ .40 | Very good | 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 17, 18, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 | 25 | 81 |
| .30-.39 | Good | 1, 9, 16, 19, 20, 31 | 6 | 19 |
| .20-.29 | Fair | - | - | - |
| ≤ .19 | Poor | - | - | - |

Table 8. The Result of Item Validity Analysis

| Index Range | Item Number | f | % | Interpretation |
|---|---|---|---|---|
| r > 0.2 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 | 31 | 100 | Valid |
| r < 0.2 | - | - | - | Not valid |

Table 9. The Result of Students' Questionnaire

| No | Question | Responses, f (%) | Sum |
|---|---|---|---|
| 1 | Before this test, how often have you taken this type of online test? | Often 5 (5%); Seldom 46 (46%); Never 49 (49%) | 100 |
| 2 | Are the instructions easy to understand? | Very Easy 42 (42%); Fairly Easy 56 (56%); Very Difficult 2 (2%) | 100 |
| 3 | Is the way to answer the questions clearly written? | Very Clear 65 (65%); Fairly Clear 33 (33%); Less Clear 1 (1%) | 99 |
| 4 | Generally, are the questions easy to understand? | Very Easy 10 (10%); Fairly Easy 63 (63%); Very Difficult 25 (25%) | 98 |
| 5 | Is the time allocation enough? | Very Enough 10 (10%); Fair 49 (49%); Less 41 (41%) | 100 |
| 6 | Generally, what do you think about the instructions to do the test? | 97% of the subjects said that the instructions are clear, simple, and easy to understand; 3% said that there are too many instructions, but that they are still clear. | |
| 7 | Is the test suitable for the material given in the classroom? | Very Suitable 30 (30%); Fairly Suitable 64 (64%); Less Suitable 6 (6%) | 100 |
| 8 | Based on the text difficulty level, are the texts easy to understand? | Very Easy 5 (5%); Fairly Easy 50 (50%); Very Difficult 44 (44%) | 99 |
| 9 | Based on the number of the items, how are they? | Too Many 34 (34%); Fair 65 (65%); Less 1 (1%) | 100 |
| 10 | Based on the lengths of the texts, how are they? | Too Long 40 (40%); Fair 56 (56%); Too Short 3 (3%) | 99 |
| 11 | Based on the difficulty level generally, how are they? | Very Difficult 16 (16%); Fairly Difficult 81 (81%); Easy 3 (3%) | 100 |
| 12 | Generally, what is your opinion about this online Reading Comprehension 2 test? | 86% of the subjects said that the online test is good, interesting, effective, practical, fun, and in keeping with the era, that it leaves less chance of cheating, and that they do not need to open the page too often; 14% said that it also makes the eyes tired, that the time is too short, and that it is difficult. | |

Figure 1. Final Summative Test Online

Discussion

The result of the needs assessment has revealed all the problems in the previous test, which is considered to be impractical. This online test is practical since it is easy to administer, score, and interpret. The previous test is time-consuming, while this online test is time-effective. The previous test covers only one genre, while this online test covers all of the genres. Additional benefits are that this online test is cost-effective and up to date. All the results of the needs assessment indicated that the online test has fulfilled the criteria of a good test elaborated above by Djiwandono (1996) and Gronlund & Linn (1990).
This online test also has the advantages described in the previous study by Noyes and Garland (2008), for example a rich interface, accessibility at home, fewer errors in administration, online scoring with greater accuracy and less human error, and cost savings. Singh, Rylander & Mims (2012) also support the increased use of the Internet. They state that as preferences for online learning increase, mostly due to the convenience and flexibility it offers students, universities find themselves increasing the number of online format courses to meet the growing demand (p.96). Coiro (2014, p.12) added that students have many opportunities when they do learning activities online, such as questioning, wondering, and thinking more deeply about things with puzzle games and creating digital products; online activities also offer time for students to practice questioning, locating, evaluating, and synthesizing information collaboratively with a partner or in a small group (Coiro, 2014, p.16). Based on those facts, it is argued that the test in the present study can overcome the technical problems of the previously developed tests.

When the item analysis was run, most of the items (69 items) were found to be invalid and had to be dropped from the test. This means that only 31 items could be retained and used for the test. The reliability shown by the alpha coefficient is .779, which can be categorized as good. The coefficient demonstrates that the scores generated from the test are consistent and reliable across measurements in showing the students' real performance. The result indicates that this online test has one more quality of a good test in terms of reliability, as explained above by Brown (2004).

The questionnaires show that most students responded positively toward the online test.
Most of them responded that the test instructions (the instructions to operate the test and to answer the questions) are generally easy to understand, which means the instructions are clear and cause no bias. They also responded positively that the questions are easy to understand. The time is sufficient, which means that the texts, the questions, and the time allocation are proportional to their level. The result is in line with what was stated above by Gronlund and Linn (1990) about practicality and by Zandvliet and Farragher (1997), as cited in Noyes and Garland (2008, p.1369), about the advantages of computer testing. The materials used in the test are suitable, which means the test does not cover material that was never taught in the classroom. A number of the subjects (44%) stated that the passages are difficult, possibly because some students are actually at a lower proficiency level while this test is designed for intermediate students, as stated in the syllabus. This fact could also be a reason behind the non-optimal alpha score.

The last point concerns the subjects' opinions. Although a few subjects said that the online test makes the eyes tired, most said that the online test is good, interesting, effective, fun, and practical, and that it minimizes the chance of cheating. They also think that they do not need to open the page too often, and that the test goes along with the ICT era. This means that the availability of this online test overcomes the problems arising from the previous test.

CONCLUSION AND SUGGESTION

The conclusions comprise the strengths and the weaknesses of the product of this research.
Regarding the strengths, first, the product of this research can be a model of an online reading summative test at STKIP PGRI Jombang. Second, the try-out stage showed that some items of the proposed test are valid and reliable. The product of this research is packaged as one kit. It covers the blueprint, the test in printed form and screenshots of the online version, the answer key, and the instructions for accessing the online test.

Alongside its strengths, the product also has weaknesses. The final product consists of only 25 items due to the elimination of the non-valid items. The reading levels are not in the precise percentages that this study suggested. The product did not undergo a construct validity process to reveal the psychological quality of the students. In addition, this study is still at the stage of automating the paper-based format into a computer-based one.

Some suggestions are presented after completing the whole research process. This online test can be a model for other reading courses, and for other courses in general, in conducting tests since it has been validated. This product can offer insight into the effectiveness of an online reading test in enhancing students' reading motivation, with better features such as random settings. As this research had a limited number of subjects (only 100), it is suggested that future researchers involve more subjects to gain more reliable and valid results. Further, because this test still leaves some chance, although low, for students to cheat, the researcher will be working on an online test with randomized options.
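The randomized options mentioned above could be implemented in many ways; the study does not describe one. As a minimal Python sketch under that caveat, the function below shuffles the options of a multiple-choice item and returns the new position of the correct answer so that scoring still works. The function name, the sample item, and the per-student seeding scheme are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def shuffle_options(
    options: List[str],
    key_index: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], int]:
    """Return the options in random order and the new index of the correct answer.

    The input list is left unchanged.
    """
    rng = rng or random.Random()
    order = list(range(len(options)))
    rng.shuffle(order)  # shuffle positions, not the options themselves
    shuffled = [options[i] for i in order]
    return shuffled, order.index(key_index)


if __name__ == "__main__":
    # Hypothetical item; the third option (index 2) is the key.
    options = ["a main idea", "a minor detail", "the writer's purpose", "a reference"]
    # Hypothetical seeding scheme: one reproducible order per student and item.
    rng = random.Random("student-001-item-07")
    shuffled, new_key = shuffle_options(options, 2, rng)
    print(shuffled, new_key)
```

Seeding the generator per student and item keeps each student's option order reproducible for later review, while different students still see different orders.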
Randomizing the options is hoped not only to reduce cheating but also to increase the students' independence and self-esteem.

REFERENCES

Ary, D., Lucy, CJ., & Asghar, R. (2002). Introduction to Research in Education (Sixth Edition). Belmont: Wadsworth.

Brown, H.D. (2004). Language Assessment: Principles and Classroom Practices. White Plains: Pearson Education.

Brown, H.D. (2007). Teaching by Principles: An Interactive Approach to Language Pedagogy (3rd ed.). White Plains, NY: Pearson Education.

Brown, J.D. (2001). Statistics corner: Questions and Answers about Language Testing Statistics: Point Biserial Correlation Coefficients. Shiken: JALT Testing & Evaluation SIG Newsletter, 5 (3), 13-17. Retrieved from http://jalt.org./test/bro_12/htm on 12/12/2013.

Coiro, J. (2014). Online Reading Comprehension: Challenges and Opportunities. Retrieved from https://www.researchgate.net/publication/277897021_ONLINE_READING_COMPREHENSION_CHALLENGES_AND_OPPORTUNITIES on 5/9/2018.

Djiwandono, M.S. (1996). Tes Bahasa Dalam Pengajaran. Bandung: Penerbit ITB.

Djiwandono, M.S. (2011). Tes Bahasa: Pegangan bagi Pengajar Bahasa. Jakarta: PT Indeks.

Earl, L, Steven, K., & WNCP team. (2006). Rethinking Classroom Assessment with Purpose in Mind: Assessment for Learning, Assessment as Learning, Assessment of Learning. Western and Northern Canada: Ministers of Education.

Gronlund, N.E., & Linn, R.L. (1990). Measurement and Evaluation in Teaching (Sixth Edition). New York: Macmillan.

Gronlund, N.E., & C. Keith, W. (2009). Assessment of Students Achievement (Ninth Edition). Upper Saddle River: Pearson Education.

Hricko, M., & Scott L.H. (2006). Online Assessment and Measurement: Foundations and Challenges. Hershey: Information Science Publishing.

Latief, M.A. (2012). Research Methods on Language Learning: An Introduction. Malang: UM Press.

Millsap, C.M. (2000). Comparison of Computer Testing Versus Traditional Paper and Pencil Testing. Published Dissertation. Denton: Department of Philosophy, University of North Texas.

Noyes, J.M., & Garland, K.J. (2008). Computer- vs. paper-based tasks: Are they equivalent? Ergonomics, Vol. 51, No. 9, pp. 1352-1375.

Sawaki, Y. (2001). Comparability of Conventional and Computerized Tests of Reading in a Second Language. Language Learning & Technology, Vol. 5, No. 2, pp. 38-59.

Singh, S., Rylander, D.H., & Mims, T.C. (2012). Efficiency of Online vs. Offline Learning: A Comparison of Inputs and Outcome. International Journal of Business, Humanities and Technology, Vol. 2, No. 1, January 2012. Retrieved from http://ijbhtnet.com/journals/Vol_2_No_1_January_2012/12.pdf on 5/9/2018.

Sulistyo, G.H. (2007). Tests, Assessment, and Measurement in English as a Second Language at Schools. Malang: State University of Malang Press.

Sulistyo, G.H. (2009). TOEFL in a Brief Historical Overview from PBT to IBT. Retrieved from http://sastra.um.ac.id/ on 3/9/2012.

Sulistyo, G.H. (2011). Reading for Meaning: Theories, Teaching Strategies, and Assessment. Malang: Pustaka Kaiswaran.

Sulistyo, G.H. (2015). EFL Learning Assessment at Schools: An Introduction to Its Basic and Principles. Malang: Bintang Sejahtera.

STKIP PGRI Jombang. (2010). Syllabus of Reading 2. Jombang: English Department.