Englisia NOVEMBER 2017 Vol. 5, No. 1, 17-28

CORRELATION BETWEEN ABILITY TO RECOGNIZE SENTENCE ERRORS AND ABILITY TO PRODUCE GRAMMATICALLY CORRECT UTTERANCES

Masrizal
University of Southampton, England
masrizal@unsyiah.ac.id

ABSTRACT

This article reports an empirical study investigating students’ ability to recognize grammatical errors and to produce grammatically correct sentences. Thirty-eight university students completed a set of grammar tasks specifically created to measure their ability both to identify errors and to avoid them in language production. The main purpose of the study is to determine whether their ability to pinpoint errors within sentences resembles their ability to produce grammatically correct sentences using the same features. The study also measures the appropriateness of the test items in order to see how it affects students’ performance. Final test data collected from the students in two different groups reveal that their ability to recognize sentence errors correlates positively with their ability to produce correct sentences. The correlation is larger among the more proficient students (group 2), indicating that the amount of knowledge of the relevant features positively influences, to a certain extent, the quality of language production and responses.

Keywords: error recognition; sentence production

INTRODUCTION

The ability to spot grammatical errors in sentences is a very important skill for L2 students. In an academic setting, this ability reflects a learner’s proficiency in a particular language, in both passive (receptive) and active (productive) skills (Read, 2015). In a passive context, an L2 learner is required to recognize errors and decide whether a sentence, or parts of it, fulfils the necessary grammatical requirements.
In a productive context, on the other hand, this skill is necessary when the learner has to supply part of a phrase or sentence in written or oral production. It is needed to ensure that the utterances produced comply with basic language requirements.

This study looked at students’ ability to recognize English language errors and to supply correct parts in sentences. Thirty-eight undergraduate university students took a set of grammar tests administered in two different classes. The main purpose of the study was to examine whether participants’ ability to recognize sentence errors correlates positively with their ability to produce correct sentence structures. In addition, the study evaluates the appropriateness of the test items using two different measures, the difficulty index and the discrimination index.

Test Specifications

Table 1: Test Specification

Purpose of the instrument: This test was designed to assess test takers’ ability to recognize English sentence errors and to supply correct parts in a similar context. It would also indicate whether their ability to recognize sentence errors resembles their ability to produce correct forms in the same context.

Construct or domain that will be measured: English grammar knowledge was assessed in this test.

Length of the test: Thirty minutes.

Context in which the instrument is to be used: This instrument was used in English-medium education. In this case, it was used to assess university students in the Department of English Education.

Characteristic of intended participants: Participants are university students from the Faculty of Education and Teacher Training, majoring in English Education. Two groups of participants took part in the test, one in the third semester and the other in the fifth semester.
Conditions and procedure of administering the instrument: The test was administered in two sample classes by an assigned lecturer. Test sheets were distributed manually and participants were required to complete the test within the allocated time.

Procedures of scoring: For the multiple choice questions in part A, each correct answer was given one point. Incorrect and unanswered questions were marked ‘0’. For part B, the marking procedure was the same. However, spelling was checked before deciding whether an answer was correct or wrong. If a word was misspelled but still led to a correct answer, it was regarded as correct.

Intended level of difficulty: This test is designed for intermediate to lower advanced level of English grammar ability.

Reporting of the results: Correlation between skills in each part, item difficulty, and discrimination index.

DISCUSSION

How construct validity is ensured and checked

In order to establish construct validity for this test, every effort was made to prevent the two main threats to construct validity identified by Messick (1989): construct under-representation and construct-irrelevant variance. As further discussed by Zheng and De Jong (2011), there are a number of ways to prevent these threats. To avoid construct under-representation, the tasks were designed to have sufficient coverage of target language situations, especially with regard to situational and interactional authenticity (Bachman & Palmer, 1996). However, since this test assesses only grammar knowledge, efforts were made to ensure that all the questions are relevant to the test takers’ background knowledge.

With regard to construct-irrelevant variance, care was taken that no test takers were advantaged or disadvantaged by the test as a result of their personal background. Everyone speaks the same first language and is learning English as a foreign language. Everyone shares the same topical knowledge and, therefore, the probability of providing correct answers depends purely on their own knowledge, regardless of any non-academic background they share (Kuncel & Sackett, 2014).

Impact of consequences of the test on stakeholders

This test is expected to give an overview of students’ strengths and weaknesses in analysing English sentence errors. Non-native English speakers often tend to be better at recognising errors in pre-produced sentences, while at the same time they struggle to produce such utterances on their own. From this mini test, which covers only a limited set of features of English grammar, it is expected that their weaknesses can be revealed so that further adjustments can be made to teaching materials, classroom test design, and lesson coverage.

Design of assessment tasks and scoring system

As previously mentioned, the test instrument consists of two separate but related parts. Parts A and B assess analytical and productive skills respectively. Further details, and how the questions in both parts are connected to each other, are given in the following table.

Part A: The following sentence parts have been supplied incorrectly. Participants have to identify which one is incorrect in each particular sentence.
Part B: The basic forms of the following sentence parts have been provided. Participants need to insert/supply them into the gap using the correct forms.

Q1. Part A: S-V agreement (incorrect verb form). Part B: S-V agreement (supply correct verb form).
Q2. Part A: S-V agreement (incorrect copula verb). Part B: S-V agreement (supply correct copula).
Q3. Part A: S-V agreement (incorrect copula verb). Part B: S-V agreement (supply correct copula).
Q4. Part A: S-V agreement (incorrect passive verb form). Part B: S-V agreement (supply auxiliary in passive form).
Q5. Part A: S-V agreement (incorrect verb form). Part B: S-V agreement (supply correct verb form).
Q6. Part A: Inverted subject and verb (incorrect copula). Part B: Inverted subject and verb (supply correct copula).
Q7. Part A: S-V agreement with either … or (incorrect copula verb). Part B: S-V agreement with either … or (supply correct copula verb).
Q8. Part A: Parallelism in object/complement (to + inf form is not parallel). Part B: Parallelism in object/complement (supply parallel -ing form).
Q9. Part A: Aux + V (modal + inf, incorrect infinitive). Part B: Aux + V (modal + inf, supply correct infinitive).
Q10. Part A: Aux + V (aux had + past participle, incorrect pp). Part B: Aux + V (aux has + past participle, supply correct ‘be’ form).
Q11. Part A: Correlative conjunction not only … but (incorrect pair). Part B: Correlative conjunction not only … but (supply correct pair).
Q12. Part A: Word form. Part B: Word form.
Q13. Part A: Plural & singular noun using amount vs. number (incorrect reference). Part B: Plural & singular noun using amount vs. number (supply correct reference).
Q14. Part A: Pronoun (incorrect pronoun). Part B: Pronoun (supply correct pronoun).

Administration of assessment tasks

The test took place in two grammar classes at Syiah Kuala University, Indonesia. These classes were chosen because access to the targeted participants was available. Special authorisation had been granted beforehand, and the class lecturer, who happens to be a colleague of mine, had expressed her willingness to distribute the test materials as well as to administer the test herself. Prior to the test date, a research assistant was hired to prepare the test materials and hand them to the lecturer. Thirty-eight students from the grammar 1 and grammar 3 classes (hereafter labelled group 1 and group 2 respectively) participated in the test.
Technically, the students from the grammar 3 class are considered to be more proficient in English grammar, while those from the other group are mainly beginners or, at most, at pre-intermediate level. Therefore, I expected to see better results from the students in the grammar 3 group owing to their higher proficiency.

As mentioned elsewhere in this paper, this test consists of two different parts. The first part contains multiple choice items on recognising sentence errors, while the other consists of gap-filling questions in which test takers are required to supply correct grammatical forms productively. Each item has three answer choices, except items 11 to 14 of part B. There is still debate over whether a multiple choice test item should contain fewer or more options (Lee & Winke, 2013). In this test, the number of options was kept to a minimum so that the role of testwiseness in test takers’ success could be minimized (Rogers & Harley, 1999). The instructions for the test were made as clear as possible in order to ensure that the test could be completed within the time provided. It is expected that test takers’ level of proficiency in these two tasks can be distinguished after completing the test.

Scoring of performances and analysis of results

The scoring for this test was kept as simple as possible. Each correct answer is worth one point, while incorrect or unanswered items are not given any score. The answers, along with the scores obtained by all participants, were then calculated and summarized in tables for further analysis.

Evaluation of participants’ performance

Group 1

After running Pearson’s correlation test in SPSS, the Pearson’s r coefficient for this particular group is 0.251.
This means that there is a positive, but small, correlation between the scores in part A and part B. However, with a Sig. (2-tailed) value of 0.299, which is greater than 0.05, the correlation between the two variables is not statistically significant. This leads us to conclude that a better score in one part of the test, i.e. part A, is at most weakly associated with a better score in the other, i.e. part B, or vice versa.

Correlations

                              PartA    PartB
PartA  Pearson Correlation    1        .251
       Sig. (2-tailed)                 .299
       N                      19       19
PartB  Pearson Correlation    .251     1
       Sig. (2-tailed)        .299
       N                      19       19

In addition, the distribution of test results by participant can be seen in the following scatterplot chart.

Considering the above correlation value, it is still possible that the results of the test are not fully representative of the students’ actual proficiency. With this type of result, the chances that some answers come from guessing are considerable, especially if we look at the results of individual questions. The average number of correct answers achieved by the whole group is 8.2 out of 14 in part A, while in part B it is only 6.3. This suggests that part A is easier for them, considering that they do not have to produce their own answers. In question number 10, for example, a Pearson’s correlation coefficient of -0.368 at a significance level of 0.121 suggests, although not at a statistically significant level, that this particular question in one part of the test correlates negatively with its pair in the other. Therefore, we cannot confidently confirm whether a student who is good at one part of the test would perform equally well in the other.

Group 2

Within this particular group of students, the test reveals somewhat different and higher figures. Based on the Pearson correlation coefficient of 0.548, we can see that there is a larger positive correlation between the part A and part B scores of the test.
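The significance of a Pearson coefficient can also be checked by hand from r and the sample size alone. The following is a minimal sketch, not the SPSS procedure used in the study: it assumes 19 participants in each group, converts each reported r into a t statistic with N - 2 degrees of freedom, and compares it with the two-tailed 5% critical value for 17 degrees of freedom (2.110, taken from a standard t table rather than computed). The function name and the constant are illustrative.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given Pearson's r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-tailed critical value of Student's t for df = 17 at alpha = 0.05.
# Assumption: looked up in a standard t table, not derived in this sketch.
T_CRIT_DF17 = 2.110

N = 19  # assumed number of participants per group (38 in total)

for label, r in [("Group 1", 0.251), ("Group 2", 0.548)]:
    t = t_statistic(r, N)
    verdict = "significant" if abs(t) > T_CRIT_DF17 else "not significant"
    print(f"{label}: r = {r:.3f}, t({N - 2}) = {t:.2f}, {verdict} at the 0.05 level")
```

Under these assumptions the statistic for group 1 falls below the critical value, while the statistic for group 2 falls above it.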
With a Sig. (2-tailed) value of 0.015, which is lower than 0.05, the correlation between the two parts of the test is statistically significant. Therefore, we can conclude that, among these participants, a good achievement in one part of the test, i.e. part A, can to some extent predict a better accomplishment in the other, i.e. part B, or vice versa.

Correlations

                              PartA    PartB
PartA  Pearson Correlation    1        .548*
       Sig. (2-tailed)                 .015
       N                      19       19
PartB  Pearson Correlation    .548*    1
       Sig. (2-tailed)        .015
       N                      19       19

*. Correlation is significant at the 0.05 level (2-tailed).

A clear overview of how the scores are distributed can be seen in the following scatterplot graph.

The graph indicates that students’ achievement in part A of the test reflects their score in the other part. The group’s mean score is consistent with this: 9.5 correct answers in both part A and part B. Put simply, the number of correct answers they score in one part of the test is not very different from the number in the other part, except for a few students. This suggests that most of the participants in this group have a better proficiency level in recognizing sentence errors and, at the same time, in producing grammatically correct English sentences on their own.

Evaluation of assessment instrument

In order to determine whether the test items are appropriate or not, a set of item analyses is required. This section of the paper discusses two different measures of item analysis, the difficulty index and the discrimination index. The following table provides an overview and numerical figures for these, while further details are discussed in the subsequent sections.
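The two measures can be sketched in a few lines of code. The example below is a minimal sketch under stated assumptions, not the procedure used to produce the figures in this paper: the response matrix is hypothetical (six test takers, four items, 1 = correct, 0 = incorrect), the function names are illustrative, and the upper and lower groups are assumed to be the top and bottom halves by total score, with the middle test taker left out when the group size is odd. The difficulty index is the proportion of correct answers; the discrimination index is the number of correct answers in the upper half minus the number in the lower half, divided by half the sample.

```python
from typing import List

# Hypothetical 0/1 response matrix: one row per test taker, one column per item.
responses: List[List[int]] = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]


def difficulty_index(data: List[List[int]], item: int) -> float:
    """Proportion of test takers who answered the item correctly (p-value)."""
    return sum(row[item] for row in data) / len(data)


def discrimination_index(data: List[List[int]], item: int) -> float:
    """(correct in upper half - correct in lower half) / (half the sample)."""
    # Rank test takers by total score, highest first.
    ranked = sorted(data, key=sum, reverse=True)
    half = len(ranked) // 2  # assumption: middle test taker dropped if odd
    upper, lower = ranked[:half], ranked[-half:]
    upper_correct = sum(row[item] for row in upper)
    lower_correct = sum(row[item] for row in lower)
    return (upper_correct - lower_correct) / half


for item in range(len(responses[0])):
    p = difficulty_index(responses, item)
    d = discrimination_index(responses, item)
    print(f"Item {item + 1}: difficulty = {p:.2f}, discrimination = {d:+.2f}")
```

In this hypothetical data set the first item is answered correctly by everyone, so its difficulty index is 1.00 and its discrimination index is 0.00; the last item is answered correctly more often in the lower half and therefore receives a negative index.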
Item Analysis using Difficulty Index

This measure is used to determine the level of difficulty of the test items. To do this, the proportion of students who answered each test item correctly needs to be calculated. This shows whether a test item is relatively easy or difficult, and whether it needs replacing or not. It is done by simply dividing the number of students who chose the correct answer by the total number of students. The result is the level of difficulty of each item, also known as the p-value. A general ‘rule of thumb’ is that an item is relatively easy if the difficulty index is more than 0.75, whereas it is relatively difficult if the index is below 0.25 (FCIT, 2016). For example, if an item is answered correctly by 85% of test takers, it has an item difficulty, or p-value, of 0.85 (Matlock-Hetzel, 1997).

Based on the figures in the item analysis table, the difficulty index of the questions in this test varies from 0.05 to 1.00. Students in group 1 seem to struggle more with a number of questions in part B and with very few in part A. By contrast, students in group 2 look more proficient, with only one problematic item in each part of the test. It is also clear that the average difficulty index value differs between the two groups, with group 1 having the lower score. According to the p-value, question 11 in part B seems to be the easiest question for both groups. Question 13 of part A appears to be the most problematic one for both groups. Since the difficulty index for both groups is below 0.25, this is considered to be a difficult item for students of both proficiency levels. This suggests that the item needs to be reviewed and replaced if necessary. Another important point is that students in group 2, which is supposedly the higher performing group, seem to struggle the most with question 1 in part B. Surprisingly, students in group 1 recorded the opposite result and performed better on this particular item. Overall, only a small number of questions are either too easy or too difficult.

Item Analysis using Discrimination Index

This measure is used to find out how well an assessment differentiates between high and low scorers. In this regard, we would like to know how often the high-performing test takers select the right answer for each question in comparison with the low-performing ones. If an item has a positive discrimination index (between 0 and 1), participants with high total scores choose the correct answer for that item more often than those with lower total scores. On the other hand, an item has a negative discrimination index (between -1 and 0) if the opposite happens (FCIT, 2016). The discrimination index can be determined by subtracting the number of students in the lower group who answered correctly from the number in the upper group who did so, and then dividing the result by half of the total sample size.

According to the measurement results, question 2 in part A has a negative discrimination index in both groups. This means that low-performing students are more likely to get this item correct. Considering this, the item needs to be carefully analyzed and probably deleted or changed. Apart from this, a number of other questions with a negative discrimination index need to be further reviewed, with a view to replacing the items or simply rewriting them. Furthermore, it is interesting to see that four question items given to participants in group 1 recorded a discrimination index of 0.00. For group 2, there are three such questions.
A discrimination index of 0.00 means that the same number of high-performing and low-performing students in each group answered these items correctly, so the items contribute little to separating stronger from weaker test takers. Overall, the fact that most of the items have a positive discrimination index leads us to assume that most of these questions are appropriate for the students, with a small number of them needing to be either revised or discarded.

CONCLUSION

Based on the results of the study, a number of conclusions can be reported as follows.

1. Correlation between assessment parts
The results of Pearson’s correlation test indicate a positive correlation between students’ ability to recognise sentence errors and their ability to supply correct parts of sentences, although the correlation is statistically significant only in group 2. Participants in group 2 produce a higher correlation coefficient, which suggests that their ability to recognize errors corresponds more closely to their ability to produce correct forms of sentence parts.

2. Item Analysis results
In terms of the test instrument, item analysis through the item difficulty and item discrimination indices suggests that most of the test items are appropriate. However, some questions, with either too high or too low a difficulty index, will need to be reviewed, revised, or even discarded. Likewise, items with a negative discrimination index will need to be treated in the same way.

REFERENCES

Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. OUP Oxford.

FCIT. (2016). Classroom Assessment. Retrieved 10 January 2016, from http://fcit.usf.edu/assessment/selected/responsec.html

Kuncel, N. R., & Sackett, P. R. (2014). Resolving the assessment center construct validity problem (as we know it). Journal of Applied Psychology, 99(1), 38.

Lee, H., & Winke, P. (2013). The differences among three-, four-, and five-option-item formats in the context of a high-stakes English-language listening test. Language Testing, 30(1), 99-123.

Matlock-Hetzel, S. (1997). Basic Concept in Item and Test Analysis. Retrieved 10 January 2016, from http://ericae.net/ft/tamu/Espy.htm

Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5-11.

Read, J. (2015). Assessing English Proficiency for University Study. Palgrave Macmillan.

Rogers, W. T., & Harley, D. (1999). An empirical comparison of three- and four-choice items and tests: Susceptibility to testwiseness and internal consistency reliability. Educational and Psychological Measurement, 59(2), 234-247.

Zheng, Y., & De Jong, J. (2011). Research note: Establishing construct and concurrent validity of Pearson Test of English Academic. Pearson Academic Ltd.