The Arbutus Review • Fall 2015 • Vol. 6, No. 1

Bridging the Gap Between Instruction and Assessment: Examining the Role of Dynamic Assessment in the Oral Proficiency Skills of English-as-an-Additional-Language Learners

Jeness Weisgerber ∗
The University of Victoria
jenessodel@gmail.com

Abstract

This exploratory study investigated the role of dynamic assessment (DA) in improving the oral proficiency skills of English-as-an-additional-language learners. It focused specifically on speaking test scores and the use of language learner strategies, with the goal of providing empirical evidence as well as pedagogical recommendations. Seven participants were administered a section of the IELTS™ Speaking test in both dynamic and standardized formats. Each test was followed by a think-aloud protocol to ascertain participants’ thoughts and strategic behaviours during the testing process. In terms of test scores, results showed no holistic differences but did show differences in fluency, grammatical range, and lexical resource scores. Scores for grammatical range and lexical resource were higher in DA, while scores for fluency were higher in standardized assessment. An analysis of the participants’ strategic behaviours also showed greater use of cognitive and metacognitive strategies in DA. Within the sample, these results point to DA’s potential to facilitate the development of grammatical and lexical abilities and to foster the use of language learner strategies.

Keywords: Speaking test; dynamic assessment; English-as-an-additional-language; language learner strategies

I. Introduction

Second-language assessment tools are not only used to assess language skill; they are also perhaps among the most fundamental learning and teaching tools (Huang, 2014). Increasingly, there is a dichotomy in the field of education between two types of assessment: standardized and dynamic.
During standardized assessment (SA), sometimes referred to as traditional assessment, learners receive a set of items or problems and attempt to solve them with minimal or no feedback (Sternberg & Grigorenko, 2002). Conversely, during dynamic assessment (DA), learners receive intervention during assessment, often in the form of feedback (Sternberg & Grigorenko, 2002). Unlike SA, DA is premised on the role of mediation, which “enables learners to perform beyond their current level of functioning, thereby providing insights into emerging capabilities” (Poehner & Lantolf, 2013, p. 323). DA has become of particular interest in the fields of Applied Linguistics and Second Language Acquisition (SLA) because of the dynamic nature of language learning. Many researchers (e.g. Poehner, 2008; Sternberg & Grigorenko, 2002) advocate for DA practices in the classroom. Despite this, according to Poehner (2008), the SLA literature contains little research examining second language (L2) performance from a DA perspective. In an attempt to fill this gap, the current research examines the role of DA in facilitating the oral proficiency skills of English-as-an-additional-language learners.

∗ This research was funded by the Jamie Cassels Undergraduate Research Award (JCURA) through the Vice President Academic and the Learning and Teaching Centre at The University of Victoria. I would like to especially thank my supervisor, Dr. Li-Shih Huang, for her continued support, guidance, and mentorship throughout this project. I would also like to thank Catherine Chao for her invaluable assistance with the video-stimulated recall, test rating, and transcribing.
This exploratory study aims to provide pedagogical insights as well as empirical evidence to help instructors make informed decisions about the best method of classroom-based assessment for improving English language speaking proficiency. It is important to note that this study aims to determine the best method of assessment solely with regard to the goal of improving language proficiency, rather than other potential goals of language assessment (e.g. predicting learner success in a particular English class). In the sections that follow, I first provide an overview of the key concepts and research in DA and in language learner strategies. Then, I outline the methods used in the current study. Lastly, I present the research results and discuss their implications, particularly with regard to pedagogical recommendations.

II. Literature Review

I. Dynamic Assessment

DA has its basis in L.S. Vygotsky’s (1978) Sociocultural Theory (SCT) and his notion of the Zone of Proximal Development (ZPD). SCT can be understood as the idea that human development arises through interaction with others and with the environment, including mediation by various tools (physical, cultural, symbolic, etc.), and that this interaction leads to higher forms of cognition that would not otherwise be possible (Poehner, 2008). From this, Vygotsky (1978) defined the ZPD as “the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers” (p. 86). The ideas of mediated learning and of skills that are still developing are central to SCT and the ZPD, and are also core tenets of DA. Moreover, DA is founded on Vygotsky’s influential idea that in the ZPD “instruction leads development” (Lantolf & Thorne, 2006, p. 327).
These notions are crucial to understanding and operationalizing DA. DA is commonly divided into two general sub-types: interventionist and interactionist (Lantolf & Poehner, 2014). Interventionist DA resembles some methods of SA because it retains certain standardized procedures and places greater emphasis on measurable results that can be compared across tests, both for the same learner at different times and between learners (Poehner, 2008). In interactionist DA, on the other hand, assistance emerges from the “interaction between the mediator and the learner, and is therefore highly sensitive to the learner’s ZPD,” with little regard for the amount of time needed or for any preset outcomes (Poehner, 2008, p. 18).

I.1 Standardized Assessment versus Dynamic Assessment

Sternberg and Grigorenko (2002) discuss three main ways in which DA differs from SA. These criteria centre on development, feedback, and examiner involvement. Firstly, SA “taps more into a developed state, whereas dynamic testing taps more into a developing process” (Sternberg & Grigorenko, 2002, p. 28). Secondly, in SA, feedback during assessment is viewed as a hindrance to the testing process and is therefore avoided. In DA, however, feedback during assessment is encouraged and is often provided in forms ranging from implicit to explicit. Lastly, in SA, the examiner tries not to be directly involved during the testing process itself. In DA, however, the examiner creates a setting of teaching and mediation in order to help the examinee improve during testing. Furthermore, Lantolf and Thorne (2006) argue that the main feature distinguishing DA from SA is that DA unifies instruction and assessment.
This point is echoed by Poehner (2008), who states, “DA overcomes the assessment-instruction dualism by unifying them according to the principle that mediated interaction is necessary to understand the range of an individual’s functioning but that this interaction simultaneously guides the further development of these abilities” (p. 24). DA integrates instruction and assessment into a single process of learning and development, with the desired outcome of learner improvement during the assessment itself (Poehner, 2008). In the SLA field, researchers generally agree on the important role of corrective feedback in second language acquisition (e.g. Sheen, 2011) and on the role of corrective feedback within a learner’s ZPD (e.g. Nassaji & Swain, 2000). Aljaafreh and Lantolf (1994) describe two criteria to which corrective feedback provided within the ZPD should adhere. Firstly, corrective feedback must be “graduated,” meaning that feedback starts as implicit (e.g. Step 1, Figure 1) and gradually increases in specificity to become more explicit (e.g. Step 4, Figure 1) (p. 468). Secondly, feedback must be “contingent,” meaning that its provision depends on when, or whether, the examinee requires it (p. 468). Moreover, corrective feedback affords learners “dialogically negotiated assistance as they move from other-regulation towards self-regulation” (Lyster, Saito, & Sato, 2013, p. 9). In this way, corrective feedback and the ZPD are complementary processes, which has implications for the use of corrective feedback in DA. Aljaafreh and Lantolf (1994) conducted a descriptive study using this type of graduated and contingent corrective feedback within the learner’s ZPD (Nassaji & Swain, 2000, p. 36). In this study, learners received one-on-one corrective feedback on written essays, provided according to a 12-step regulatory scale ranging from implicit to explicit.
Following from this research, Nassaji and Swain (2000) conducted a similar study comparing two methods of corrective feedback: one followed the regulatory scale, thereby taking the learner’s ZPD into consideration, while the other applied the scale randomly, thereby ignoring the learner’s ZPD. Feedback that followed the regulatory scale put forth by Aljaafreh and Lantolf (1994) was thus collaborative and graduated, whereas the other feedback was random. They found the former to be more effective (Nassaji & Swain, 2000).

I.2 Critiques of DA

There are several critiques of DA, which often stem from the way traditional SA is viewed. First, proponents of SA claim that DA’s focus on learner development during assessment threatens the test’s “internal-consistency reliability” (i.e. whether learner performance is stable throughout the test) (Poehner, 2008, p. 71). This is a potential problem because the ability being assessed changes during the test and therefore can no longer be accurately determined (Poehner, 2008). For proponents of DA, however, this change in learner ability is evidence of a successful procedure precisely because the individual is learning (Poehner, 2008). This is one of the main areas in which SA and DA seem irreconcilable: SA finds problematic one of the core aims and outcomes of DA. Further critiques concern reliability and validity. Reliability refers to the extent to which a test-taker would receive the same results on a test over repeated administrations (Huang, 2013). Critics of DA are concerned that its interactionist nature makes reliability vulnerable, because providing different mediation at different points in time may affect outcomes (Poehner, 2008).
This critique is less significant for interventionist DA, because its standardized procedures ensure that each learner receives the same assistance and feedback. Validity refers to the extent to which a test measures what it is seeking to measure (Huang, 2013). The argument made against DA’s reliability applies equally to its validity: Poehner (2008) explains that, in DA, standard methods for ensuring validity are compromised by the assessment’s aim of learner improvement. However, because development is a desirable outcome in DA, the validity of DA might instead be understood as the degree to which it encourages development (Poehner, 2008). Because this study assumes that the goal of assessment is improving learners’ English language proficiency, Poehner’s argument is particularly relevant here. In addition to concerns over reliability and validity, other criticisms involve generalizability, which refers to the extent to which one can predict learner performance in non-assessment contexts from performance during assessment (Poehner, 2008). In SA, it is assumed that a learner’s test performance should accurately predict how he or she performs in non-assessment contexts. In DA, however, generalizability is less crucial, because merging instruction and assessment minimizes the distinction between assessment and non-assessment contexts (Poehner, 2008). Commenting on DA’s potential incompatibilities with traditional test criteria, Poehner (2008) argues that “it should be clear that DA’s incompatibility with more traditional frameworks does not invalidate it as an approach to assessment. Rather, their incommensurability simply points to the need for DA researchers to outline their own methods” (p. 73).
In this way, DA’s departure from standard notions of reliability, validity, and generalizability does not necessarily invalidate the method; rather, it indicates the need for a different perspective on these criteria.

I.3 Recent Major Studies in DA

In their most recent synthesis of DA in Applied Linguistics, Lantolf and Poehner (2014) point out that applications of DA in the field have been sparse and did not really begin until their own research in the early 2000s. However, a few earlier studies appear to have implemented a DA-like format without using the term dynamic assessment (e.g. Nassaji & Swain, 2000). Research on DA in Applied Linguistics has been growing since these early studies. This section reviews some recent DA-related studies in the field.

Poehner (2008) implemented a DA program focusing on the oral communication skills of six advanced learners of French. He found that learners’ conceptual understanding of verbal aspect in French developed through DA. Results also indicated that learners became more agentive in their own learning (i.e. independently active in improving their speaking skills), using various strategies to overcome obstacles. Overall, Poehner (2008) found that in DA learners stretched beyond their current abilities to more complex tasks. Like Poehner, Antón (2009) also found positive results for DA. She conducted a study with five participants from an advanced Spanish language program, analyzing DA’s role in the diagnostic assessment of speaking and writing levels. Her qualitative analysis indicated that DA allows for a deeper and fuller description of learners’ actual and developing abilities, which she argues allows programs to tailor instruction to learners’ specific needs. Travers (2010) also used DA to investigate speaking levels.
He compared IELTS™ speaking tests in SA and DA formats, in a novel application of DA to a standardized speaking test format. He sought to ascertain whether DA offered an advantage in participant performance over independent speaking tests, and to examine the potential use of DA for learners from individualist versus collectivist cultural backgrounds. His findings from seven participants showed no difference in mean test scores or in gains over successive tests, but did show differences in grammar and lexical scores. These three studies had distinct aims and outcomes, each of which has contributed substantially to DA research in Applied Linguistics. However, research investigating quantitative test scores in DA is still needed. Moreover, further research is needed into the interaction between language learner strategy use and performance in DA. This is particularly pertinent given Canale and Swain’s (1980) prominent theory of communicative competence, which cites strategic competence as one of its main components (Huang, 2013). To my knowledge, the current research is the first study with this focus.

II. Language Learner Strategies

Language learner strategies, or strategic behaviours, are defined as the “conscious, goal-oriented thoughts and actions that learners use to regulate cognitive processes with the goal of improving language learning or language use” (Huang, 2013, p. 5). Language learner speaking strategies are often grouped into six major categories: approach, metacognitive, cognitive, communication, affective, and social (Huang, 2013). These six categories and their definitions are outlined in Table 1.
Table 1: Six Major Strategy Categories and Definitions (Huang, 2013, p. 7)

Strategy Category | Definition
Approach | Orienting oneself to the speaking task
Metacognitive | Examining the learning process in order to organize, plan, and evaluate efficient ways of learning
Cognitive | Manipulating the target language for understanding and producing language
Communication | Involving conscious plans for solving a linguistic problem in order to reach a communication goal
Affective | Involving self-talk or mental control over affect
Social | Interacting with others to improve language learning/use

Over the past four decades, the use of language learner strategies has been shown to positively affect language learning (Huang, 2012). Studies have shown a significant correlation between strategy use and success in language learning (e.g. Green & Oxford, 1995; Wong & Nunan, 2011). Huang (2010) contends that studies showing a positive correlation between proficiency level and the use of metacognitive (e.g. Purpura, 1999) and cognitive (e.g. Oxford & Ehrman, 1995) strategies indicate the potential benefits of employing certain types of strategies. However, recent research suggests that these strategies may not correlate positively with learner performance for all learners and situations, as certain strategies may have negative effects in testing or other high-pressure contexts (e.g. Huang, 2013; Swain et al., 2009). Furthermore, learner strategies allow students to take more responsibility for their own learning and development (Wong & Nunan, 2011). Cohen (1998, as cited in Grenfell & Macaro, 2007) similarly claims that language learning strategies can help learners shoulder more of the responsibility for their own learning, rather than relying solely on teachers.
In a survey of experts in the language learner strategy field, Cohen (2007) identified five purposes for language learning strategies: to enhance learning; to perform specified tasks; to solve specific problems; to make learning easier, faster, and more enjoyable; and to compensate for a deficit in learning. Many experts thus agree that language learner strategies serve a range of purposes that can enhance language learning. The examination of strategy use is an important component of SLA research. As discussed, Canale and Swain’s (1980) influential framework of communicative competence includes strategic competence as one of its main components (Huang, 2013). Strategic competence can be defined as the “learners’/speakers’ ability to use communication strategies to deal with communication breakdowns” (Huang, 2013, p. 6). Furthermore, Bachman (1990) proposed that strategic competence is important not only for purely communicative interactions, but also serves an executive function in all domains of language learning and use (Huang, 2013). The examination of strategy use in relation to assessment is therefore crucial for SLA research. Moreover, Huang (2013) argues that although strategies are recognized as potentially affecting learner performance, there remains a gap in research on the strategic component in speaking contexts and on the specific interaction between strategic competence and second-language performance.

II.1 Critiques of Language Learner Strategies

Despite the research on the effectiveness of language learner strategies, there have been some criticisms in this area. Early criticisms questioned whether learners’ verbal reports could accurately capture their strategy use.
For example, Seliger (1983, as cited in Grenfell & Macaro, 2007) postulated that researchers cannot assume that learners’ verbalizations correspond to internal mechanisms, because strategies are so deeply rooted in the learner’s mind. However, White, Schramm, and Chamot (2007) outlined several methods through which the internal processes of strategy use can be made more apparent, including retrospective interviews, self-report questionnaires, reflection journals, and think-aloud protocols. For example, in her study on the use of oral reflection in facilitating oral production and strategy use, Huang (2012) found that gains from pre-test to post-test scores could indicate that oral reflection is an effective mediational tool for helping learners progress in their use of metacognitive strategies. Moreover, White et al. (2007) discuss the conditions for successfully eliciting think-aloud protocols, which include careful orientation, practice, and prompting, multiple language use, and “integration into an authentic action context” (p. 115). For the purposes of this study, a think-aloud, also called video-stimulated recall, refers to a method that involves videotaping a participant doing a task and then replaying the video for the participant as a prompt to elicit his or her thoughts on content within the scope of the research (Huang, 2014). Video-stimulated recall has the potential to “capture and investigate the dynamic nature of task performance,” and one of its main advantages is “the potential to provide a wealth of information on the cognitive processes and strategic behaviours that participants engage in and deploy as they carry out a particular task or tasks across types or contexts” (Huang, 2014, p. 3). Several studies have pointed to the efficacy of using stimulated recall to examine language learner strategies (e.g. Sime, 2006; Macaro, 2006; Huang, 2014).
For a detailed discussion of this method and its pros and cons, refer to Huang (2014).

III. Research Questions

The current research aims to address the following two research questions:
1) Is there a difference amongst students in the research sample between standardized and dynamic testing in terms of holistic (overall) and analytical (individual section) test scores?
2) Do students in the research sample use different strategic behaviours in SA and DA contexts?

III. Methods

I. Participants

Seven participants took part in this study. Participants, who speak Mandarin as a first language (L1), were recruited from the undergraduate student body of a mid-sized university in Western Canada. I chose Mandarin-speaking students because there is a large population of Chinese students at this university. I chose only undergraduate students in order to control for age and proficiency level and thereby minimize individual learner variables. All participants were advanced learners of English. Table 2 presents participant demographics.

Table 2: Participant Demographics

Criteria | Category and Responses
Age | Mean: 19.3; Range: 18–23
Length of Residence | Mean: 1 year; Range: 3 months–2 years
Gender | Male: 4 (57.1%); Female: 3 (42.9%)
First Language | Chinese
Test-Taking Background | IELTS (6): score range 5.5–6.5; TOEFL (1): score 86

Before beginning participant recruitment, I received ethics approval from The University of Victoria Human Research Ethics Board (Protocol Number 14-360). I recruited participants through several methods, including email, posters, and Facebook postings. Each recruitment method outlined the purpose of the study and the steps involved in participating, as per the university’s ethical guidelines.

II. Instruments

II.1 Background Questionnaire

All participants completed a background questionnaire.
This questionnaire, adapted from Huang (2012), included questions on gender, age, degree program, possible linguistic qualifications, first contact with English, length of English study, length of residence in Canada, daily English usage, other languages spoken, and standardized test-taking background.

II.2 IELTS™ Speaking Test

I used the International English Language Testing System (IELTS™) Speaking test to conduct the study. IELTS™ is a widely accepted test of English language proficiency worldwide and is used by 9,000 organizations in over 140 countries (IELTS, 2013). The IELTS™ Speaking test consists of three sections, of which only the first was used in this study. This first task involves general questions about familiar topics, such as home, family, work, studies, or interests (IELTS, 2013). The second section was not used because it consists of the examinee speaking uninterrupted about a topic for several minutes, so it would not have been feasible to adapt it to a DA format. Likewise, I could not use the third section because many of its questions depend on the answers that learners provide in the second section. However, the first section usually involves only four to five questions and lasts approximately four to five minutes; on its own, its duration and scope would not have been sufficient to adequately measure a participant’s speaking ability. Therefore, in consultation with a certified examiner and a language-testing specialist, I extended the first task by combining two task 1 question sets and adding two randomly selected, longer opinion questions from task 3 to each test. Each test was thus divided into two topics, each with four shorter questions about basic information (from task 1) and one opinion question requiring a longer answer (from task 3).
These tests were used for both the standardized and the dynamic versions of the test.

II.3 Regulatory Scale for DA

For the DA procedure, I used a regulatory scale adapted from Travers (2010) and Aljaafreh and Lantolf (1994). As mentioned, this corrective feedback scale ranges from implicit (e.g. Step 1, Figure 1) to explicit (e.g. Step 4, Figure 1). Travers’s scale included four steps, but I modified it to five steps because the extra step (i.e. Step 4) would offer participants an additional opportunity to self-correct. Additionally, I decided to provide a recast (rather than the correct utterance) as the final step because I did not want the correction to affect participant performance on the remainder of the test. The scale used in the present study is outlined in Figure 1.

1. Examiner indicates that something may be wrong in a speaking turn (“Sorry”).
2. Examiner narrows down the location of the error (e.g. examiner repeats the specific speaking turn that contained the error).
3. Examiner indicates the nature of the error, but does not identify the error (e.g. “There was something wrong with the tense marking there”).
4. Examiner provides clues to help the learner arrive at the correct form (e.g. “It is not really past but something that is still going on”).
5. Examiner provides a recast (whether the learner’s utterance is correct or incorrect).

Figure 1: Regulatory Scale

III. Data Collection Procedures

III.1 Outline of Procedure

Data collection consisted of two sessions over two days. Each session included one test (standardized or dynamic) followed by video-stimulated recall. The first session began with obtaining informed participant consent, followed by administration of the background questionnaire. I then administered the first test. Each participant took one standardized test and one dynamic test, and I acted as the examiner for all tests (i.e. standardized and dynamic).
I randomized the test order, so that some participants took the standardized test on the first day and the dynamic test on the second, while others did the opposite. Each participant took two different tests with different questions to minimize potential practice effects related to topic. All tests were video recorded to facilitate the video-stimulated recall, which immediately followed testing. All video-stimulated recall sessions were conducted by a research assistant who is a graduate student in an Applied Linguistics program. At the end of each recall session, the research assistant asked the participant a few follow-up questions concerning whether he or she had treated it as a real test and what he or she liked and disliked about the test itself. The second session began with a verification of ongoing participant consent. Then, I administered the second test, which used the testing method not used in the first session. This was again followed by video-stimulated recall and the same follow-up questions as in the first session, with one additional question asking the participant to compare the two tests.

III.2 Standardized Testing Procedure

I conducted the standardized IELTS™ Speaking tests following IELTS™ test administration guidelines. I asked the questions and gave no feedback other than neutral responses (“okay”); answers to clarification requests, such as repeating the question or explaining the meaning of a word (S. Abrar-ul-Hassan, personal communication, Nov. 14, 2015); and prompts encouraging the participant to elaborate, such as “Why?” or “Can you tell me more about that?”, when he or she did not provide enough information.

III.3 Dynamic Testing Procedure

I conducted the dynamic tests in a more interactive manner.
Following previous studies (e.g. Travers, 2010), this study used a more interventionist type of DA because of the nature of the research and the need for quantifiable results. The dynamic method therefore involved providing graduated levels of corrective feedback, following the regulatory scale (Figure 1), whenever the participant made a grammatical mistake. I followed these steps until the participant self-corrected or, if no self-correction occurred, until the last step (i.e. recast). I focused solely on grammatical errors and, among those, on errors common among learners with a Chinese L1. These include inflection of gender, number, and case; subject-verb agreement; verb tense; progressive aspect; absence of “be” before predicative adjectives; prepositions; articles; and count/non-count nouns (Swan & Smith, 2001). I did this in order to narrow down the number of errors addressed, for research purposes.

III.4 Video-Stimulated Recall

During video-stimulated recall, the research assistant and the participant watched the video of the test just conducted. The research assistant paused the video periodically when the participant did something of interest (e.g. pausing, self-correcting, attending to examiner feedback) and asked the participant to comment on what he or she was thinking at that moment, with questions such as “What were you thinking before the test?”, “What were you thinking here?”, “I noticed you did X here. What were you thinking then?”, and “When the examiner said that, what were you thinking?” The participant also had the option to pause the video and comment at any point. Because the research assistant is a Mandarin-as-a-first-language speaker, participants were given the option to speak, or be spoken to, in their L1 at any time during the stimulated recall.
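The graduated and contingent feedback procedure described in Sections II.3 and III.3 can be expressed as a short Python sketch. It is illustrative only. The function `mediate_error` and the callback `self_corrects_after` are hypothetical names, and the callback stands in for the learner’s response after each step. In the study itself, every step was delivered live by the examiner.

```python
from typing import Callable, List, Tuple

# Figure 1 regulatory scale, ordered from most implicit (Step 1)
# to most explicit (Step 5, the recast).
REGULATORY_SCALE: List[str] = [
    "Indicate that something may be wrong in the speaking turn",
    "Narrow down the location of the error",
    "Indicate the nature of the error without identifying it",
    "Provide clues toward the correct form",
    "Provide a recast",
]


def mediate_error(self_corrects_after: Callable[[int], bool]) -> Tuple[List[int], bool]:
    """Apply graduated, contingent feedback to a single grammatical error.

    `self_corrects_after` is a hypothetical stand-in for the learner: it
    receives the 1-based step just given and returns True if the learner
    then self-corrects.

    Returns the steps actually used and whether the learner self-corrected.
    """
    steps_used: List[int] = []
    # Steps 1-4 invite self-correction. Each step is given only if the
    # previous, more implicit step did not suffice (contingency), and
    # explicitness increases one step at a time (graduation).
    for step in range(1, len(REGULATORY_SCALE)):
        steps_used.append(step)
        if self_corrects_after(step):
            return steps_used, True
    # No self-correction after Step 4: the examiner recasts (Step 5).
    steps_used.append(len(REGULATORY_SCALE))
    return steps_used, False


# A learner who self-corrects once the error type is named (Step 3):
print(mediate_error(lambda step: step >= 3))  # ([1, 2, 3], True)
# A learner who never self-corrects receives the recast:
print(mediate_error(lambda step: False))      # ([1, 2, 3, 4, 5], False)
```

The loop stops at the first successful self-correction. It therefore records only as much mediation as the learner needed, which is the sense in which feedback within the ZPD is "contingent."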
III.5 Scoring with IELTS™ Descriptors
The first stage in my data analysis involved scoring the tests with the IELTS™ Speaking Descriptors, which are measured on a 9-point scale (IELTS, 2014). These descriptors encompass four analytical components: fluency and coherence, lexical resource, grammatical range and accuracy, and pronunciation. I scored both the standardized and dynamic tests, analytically and holistically, using these descriptors. Additionally, the research assistant independently scored the tests using the same descriptors, in order to establish inter-rater reliability. We then compared our scores, and in cases where they differed by more than 0.5 points, we discussed the score until our scores were within 0.5 points of each other. To obtain the final scores, I averaged our two scores for each analytical component, so as to have one score per component, and then averaged the four component scores to calculate the holistic score.
III.6 Transcribing and Coding Strategies
In order to analyze the data from the video-stimulated recall sessions, I first transcribed all the video clips. The research assistant translated and transcribed the portions that were in Mandarin. I then coded all the transcript data for both reported and observed strategies. After I had coded all the strategies, I crosschecked one participant’s transcript with my supervisor in order to check inter-rater reliability. I also recoded the data from 43% of participants (three of seven) without referring back to the original codes, and then crosschecked these codes with my initial ones; my intra-rater reliability was 82%. After coding, I grouped the individual strategies into six major categories: approach, metacognitive, cognitive, communication, affect, and social. I then calculated the percentage of use of each major strategy category (including both reported and observed strategies) for each method of assessment.
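Returning to the scoring procedure in III.5, the reconciliation and averaging steps can be sketched as follows. This is a minimal sketch, assuming each rater records a half-band score for each of the four analytic components; the function names and the example ratings are hypothetical, not data from the study. Because the discussion step cannot be automated, the sketch only flags components on which the raters differ by more than 0.5 and refuses to average them.

```python
from statistics import mean
from typing import Dict, List

COMPONENTS = ("fluency", "lexical", "grammar", "pronunciation")


def needs_discussion(rater_a: Dict[str, float], rater_b: Dict[str, float],
                     tolerance: float = 0.5) -> List[str]:
    """Components on which the two raters differ by more than `tolerance`."""
    return [c for c in COMPONENTS if abs(rater_a[c] - rater_b[c]) > tolerance]


def final_scores(rater_a: Dict[str, float], rater_b: Dict[str, float],
                 tolerance: float = 0.5) -> Dict[str, float]:
    """Average the two raters per component, then average the four components."""
    flagged = needs_discussion(rater_a, rater_b, tolerance)
    if flagged:
        # In the study, such disagreements were resolved by discussion
        # before averaging; this sketch simply refuses to average them.
        raise ValueError(f"discuss and re-score first: {flagged}")
    scores = {c: (rater_a[c] + rater_b[c]) / 2 for c in COMPONENTS}
    scores["holistic"] = mean(scores[c] for c in COMPONENTS)
    return scores


# Hypothetical ratings for one test, for illustration only.
EXAMPLE_A = {"fluency": 6.0, "lexical": 5.5, "grammar": 6.0, "pronunciation": 6.0}
EXAMPLE_B = {"fluency": 6.0, "lexical": 6.0, "grammar": 5.5, "pronunciation": 6.5}
print(final_scores(EXAMPLE_A, EXAMPLE_B))
```

Averaging two half-band ratings can yield quarter-band values such as 5.75; Table 3 reports scores to one decimal place.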
I also counted the number of occurrences of each individual strategy and calculated its percentage of use relative both to its major strategy category and to the total number of codes within that method of assessment. Lastly, I determined the top-scoring and low-scoring participants for each assessment type and analyzed each of these participants’ strategy use.
IV. Results and Discussion
Research Question 1: Is there a difference amongst students in the research sample between standardized and dynamic testing in terms of holistic and analytical test scores?
Out of seven participants, six reported treating the simulated exam as a real test. Holistically, there was very little difference in test scores between standardized and dynamic tests (see Table 1, columns 2 and 8). Analytically, however, there were several differences. Scores for fluency and coherence varied, with 71% of participants (five of seven) scoring higher in SA. Scores for lexical resource also varied, with 57% of participants (four of seven) scoring higher in DA. Similarly, in the category of grammatical range and accuracy, 57% of participants (four of seven) scored higher in DA. Scores for pronunciation did not differ much between the two methods. Table 2 summarizes test scores.
The difference in fluency scores could be due to the method of corrective feedback used in the DA procedure. When participants made a grammatical error, I would interrupt their flow of speech in order to help them self-correct, so it is possible that this mediation disrupted their fluency. The format of the dynamic test could also be a factor in the variation of lexical and grammatical scores: my exclusive focus on grammatical errors may have supported participants’ learning potential in this area. This is in congruence with results from Travers (2010).
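The participant-level percentages above can be recomputed from the scores in Table 3. The sketch below assumes the comparison is made per participant and per analytic component, with ties counted separately; the variable and function names are illustrative. Because Table 3 lists each format in order of holistic score, the scores are re-keyed here by participant number.

```python
# Analytic scores from Table 3, keyed by participant number.
# Tuple order: (fluency and coherence, lexical resource,
#               grammatical range and accuracy, pronunciation)
SA = {
    1: (5.8, 6.0, 6.3, 6.0),
    2: (5.5, 5.5, 5.8, 6.0),
    3: (5.8, 5.5, 5.8, 6.3),
    4: (5.3, 5.3, 5.5, 6.3),
    5: (5.5, 5.3, 5.8, 6.0),
    6: (6.0, 5.8, 5.5, 6.0),
    7: (6.0, 5.8, 5.8, 6.0),
}
DA = {
    1: (5.3, 5.3, 5.5, 6.0),
    2: (5.8, 6.0, 5.0, 5.8),
    3: (5.3, 5.8, 6.0, 6.0),
    4: (5.0, 5.5, 6.0, 6.3),
    5: (5.5, 5.5, 5.3, 6.0),
    6: (5.8, 5.5, 5.8, 5.8),
    7: (5.8, 5.8, 6.0, 6.0),
}
COMPONENTS = ("fluency", "lexical", "grammar", "pronunciation")


def compare(sa, da):
    """Per component, count participants scoring higher in SA, higher in DA, or tied."""
    result = {}
    for i, name in enumerate(COMPONENTS):
        sa_higher = sum(sa[p][i] > da[p][i] for p in sa)
        da_higher = sum(da[p][i] > sa[p][i] for p in sa)
        result[name] = {"SA": sa_higher, "DA": da_higher,
                        "tie": len(sa) - sa_higher - da_higher}
    return result


for name, c in compare(SA, DA).items():
    n = sum(c.values())
    print(f"{name:<13} higher in SA: {c['SA']}/{n} ({c['SA'] / n:.0%}), "
          f"higher in DA: {c['DA']}/{n} ({c['DA'] / n:.0%}), tied: {c['tie']}")
```

On these data, five of seven participants (71%) score higher in SA for fluency, and four of seven (57%) score higher in DA for both lexical resource and grammatical range and accuracy, matching the figures reported above; for pronunciation, four of the seven scores are tied.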
Furthermore, it is possible that this focus on grammatical errors positively contributed to lexical resource scores, as the criteria for this category are associated with paraphrasing and the use of idiomatic phrases, both of which can be influenced by grammatical ability. Amongst the sample participants, the results suggest that DA is better able than SA to facilitate certain language abilities (i.e. grammatical and lexical); therefore, assuming the goal of assessment is improving language proficiency, DA is potentially better able to achieve this goal amongst the sample. This may not be the case, however, if the goal of assessment is defined differently. It is also important to note that this type of descriptive statistical analysis is meant only to describe the sample participants and cannot be used to infer real differences between the two methods of assessment in the wider population.
Table 3: Participant Test Scores
Standardized                     | Dynamic
P  Hol  Flu  Lex  Gram  Pron     | P  Hol  Flu  Lex  Gram  Pron
1  6.0  5.8  6.0  6.3   6.0      | 7  5.9  5.8  5.8  6.0   6.0
3  5.9  5.8  5.5  5.8   6.3      | 3  5.8  5.3  5.8  6.0   6.0
7  5.9  6.0  5.8  5.8   6.0      | 2  5.7  5.8  6.0  5.0   5.8
6  5.8  6.0  5.8  5.5   6.0      | 4  5.7  5.0  5.5  6.0   6.3
2  5.7  5.5  5.5  5.8   6.0      | 6  5.7  5.8  5.5  5.8   5.8
5  5.7  5.5  5.3  5.8   6.0      | 5  5.6  5.5  5.5  5.3   6.0
4  5.6  5.3  5.3  5.5   6.3      | 1  5.5  5.3  5.3  5.5   6.0
M  5.8  5.7  5.6  5.8   6.1      | M  5.7  5.5  5.6  5.7   6.0
Note. P = participant; Hol = holistic; Flu = fluency and coherence; Lex = lexical resource; Gram = grammatical range and accuracy; Pron = pronunciation; M = mean. Within each format, participants are listed in descending order of holistic score, so the two halves of a row do not necessarily refer to the same participant.
When asked about their test preference, three participants reported preferring the standardized test, three reported preferring the dynamic test, and one had no preference. Of the participants who preferred the standardized test, two said it was because stopping for corrective exchanges in the dynamic test made them more nervous.
Because of the small scale of this research, with participants taking only one test of each type, it is possible that there was not sufficient time for participants to become comfortable with the DA testing process. This is supported by Travers (2010), who examined gains over three successive tests and found that all of the participants who reported being nervous or uncomfortable with the corrective interruptions in early DA tests reported being comfortable with these corrections by the third test. Additional reasons for test preference included the order of test taking, question preference, and interest in the topic (e.g. if a certain test topic was more interesting to a learner, he or she sometimes cited preferring that test).
Research Question 2: Do students in the research sample use different strategic behaviours in standardized and dynamic assessment contexts?
In total, I recorded 33 different individual strategies, reported or observed, with 270 occurrences across the two methods of assessment. Of these strategies, four were approach strategies, twelve were metacognitive strategies, nine were cognitive strategies, five were communication strategies, one was an affective strategy, and two were social strategies. Figure 2 illustrates strategy use in SA and DA. Overall, approach and communication strategy use was similar across both methods of assessment. Instances of metacognitive strategy use were greater in DA than in SA, and for both methods, metacognitive strategies were the most frequently used category. Although the proportion of metacognitive strategy use was the same (48%) for both DA and SA, there were more individual instances of use in DA (78 versus 52). Cognitive strategy use was also greater in DA. Previous studies have shown a positive correlation between the use of metacognitive (e.g. Nakatani, 2005) and cognitive (e.g. Oxford & Ehrman, 1995) strategies and language proficiency (e.g. Huang, 2010).
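The contrast between a category's share and its raw count (the same proportion of metacognitive strategies, but 78 versus 52 instances) follows from the different totals in each format. The sketch below illustrates the percentage-of-use calculation described in III.6, assuming each coded strategy occurrence is stored as a category label. Only the metacognitive counts and the totals (n = 107 for SA, n = 163 for DA) come from the text, so the remaining categories are lumped into a placeholder "other" label.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def category_shares(codes: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each strategy category to (raw count, share of all codes)."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


# Only the metacognitive counts and the totals are given in the text, so all
# other categories are lumped together as "other" for this illustration.
sa_codes = ["metacognitive"] * 52 + ["other"] * (107 - 52)
da_codes = ["metacognitive"] * 78 + ["other"] * (163 - 78)

sa_meta = category_shares(sa_codes)["metacognitive"]
da_meta = category_shares(da_codes)["metacognitive"]
print(f"SA: {sa_meta[0]} codes, {sa_meta[1]:.1%} of {len(sa_codes)}")
print(f"DA: {da_meta[0]} codes, {da_meta[1]:.1%} of {len(da_codes)}")
```

Both shares come out just under one half, while the raw count is 26 higher in DA because DA produced 56 more codes overall (163 versus 107).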
Anderson (2005) proposes that metacognitive strategy use is more effective because “once a learner understands how to regulate his or her own learning through the use of strategies, language acquisition should proceed at a faster rate” (p. 766). Affective strategy use was also greater in DA in the study sample; however, the only affective strategy recorded was expressing affect in response to correction, which could occur only in DA, the format in which correction was provided. Lastly, social strategy use was higher in DA, as would be expected from the interactive format of the DA method.
Figure 3: Reported and Observed Strategy Categories for SA (n = 107) (left) and DA (n = 163) (right). SA: metacognitive 48%, cognitive 15%, communication 19%, approach 13%, social 5%, affect 0%. DA: metacognitive 48%, cognitive 20%, communication 13%, approach 10%, social 7%, affect 2%.
At the level of individual strategies, SA showed more instances of evaluating test performance, generating words, seeking clarification, elaborating, and self-assessment, while DA showed more instances of evaluating oral production, attending to feedback, formulating ideas, and L1 translation. In addition to examining individual strategy use, I defined top scorers in each group as any participants who scored at or above the mean score; this corresponded to four participants for SA and five participants for DA. The only strategy that top scorers had in common across SA and DA was self-correction. The differences in top scorers’ strategy use between the two methods of assessment may indicate context- or method-specific strategy use rather than differences based on proficiency level, as three of the participants were top scorers in both SA and DA yet showed little commonality in strategy use. These results are supported by research on learner strategy use and context, which indicates that strategy use varies across contexts (e.g. Huang, 2013; Swain et al., 2009).
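The top-scorer and low-scorer groupings can be checked against Table 3. This sketch assumes that "mean score" refers to the mean holistic score within each format; the names are illustrative. Exact fractions are used because several participants score exactly at the mean, where floating-point rounding could otherwise decide the classification.

```python
from fractions import Fraction

# Holistic scores from Table 3, keyed by participant number.
SA_HOLISTIC = {1: "6.0", 2: "5.7", 3: "5.9", 4: "5.6", 5: "5.7", 6: "5.8", 7: "5.9"}
DA_HOLISTIC = {1: "5.5", 2: "5.7", 3: "5.8", 4: "5.7", 5: "5.6", 6: "5.7", 7: "5.9"}


def split_by_mean(scores):
    """Return (mean, top scorers at or above the mean, low scorers below it)."""
    # Exact fractions avoid floating-point error when a score equals the mean.
    exact = {p: Fraction(s) for p, s in scores.items()}
    m = sum(exact.values()) / len(exact)
    top = sorted(p for p, s in exact.items() if s >= m)
    low = sorted(p for p, s in exact.items() if s < m)
    return m, top, low


sa_mean, sa_top, sa_low = split_by_mean(SA_HOLISTIC)
da_mean, da_top, da_low = split_by_mean(DA_HOLISTIC)
print("SA mean", float(sa_mean), "top", sa_top, "low", sa_low)
print("DA mean", float(da_mean), "top", da_top, "low", da_low)
print("top in both:", sorted(set(sa_top) & set(da_top)))
```

With these data, the SA mean is 5.8 and the DA mean is 5.7. The split yields four top scorers in SA (participants 1, 3, 6, and 7) and five in DA (2, 3, 4, 6, and 7), with participants 3, 6, and 7 in both, and two low scorers in DA (1 and 5), consistent with the counts in the text.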
Specifically, in her study of the use of language learner strategies in testing and non-testing contexts, Huang (2013) found differences in strategy use between these two contexts as well as between different IELTS™ tasks, but did not find significant differences in strategy use with regard to proficiency level. This is further supported by White et al. (2007), who argue that “strategy use is not a fixed attribute of individuals, but changes according to the task, the learning conditions, and the available time” (p. 93).
Low scorers, those who scored below the mean score, also demonstrated some differences. In DA, the two lowest scorers showed the highest individual use of self-correction and evaluating oral production, which suggests a limited repertoire of strategies. Studies have shown that successful language learners have a more extensive range of strategies and utilize a variety of individual strategies in the process of language learning (Anderson, 2005). However, this does not necessarily mean that higher strategy use always correlates with better performance. Anderson (2005) claims that it is important for learners to apply strategies effectively, and that this matters more than the number of strategies used or the use of any particular strategy. Similarly, Huang (2010) argues that “what matters for individual learners is not accumulating or using a wide variety of strategies, but managing a repertoire of strategies” (p. 19).
IV.1 Pedagogical Recommendations
The results from this study suggest several pedagogical recommendations:
a. Familiarize learners and teachers with the DA method. As individual participants’ comfort with the corrective feedback exchanges varied within the sample, it is important to ensure that the learner and teacher understand and are familiar with the DA method in order for the assessment to be productive for both.
If the teacher is familiar with the DA procedure, he or she will be better able to assist and scaffold the learner within his or her ZPD.
b. Tailor DA to learner proficiency level. The participants in this study were all advanced learners of English, which could have affected their response to corrective feedback in terms of affect and receptiveness. It is possible that, for advanced learners of English, DA should be less interruptive in order to avoid disrupting fluency or idea coherence and to increase learner receptiveness.
c. Focus DA on linguistic features where learners need to develop. DA can potentially facilitate learners’ grammatical and lexical competence, as indicated by the differences in participants’ grammatical range and accuracy and lexical resource scores. The foci of mediation appear to have been central to the outcomes in participants’ DA scores.
d. Attend to the strategies used by learners in DA. It is possible that the DA method promoted the use of metacognitive and cognitive strategies amongst participants. As discussed, research shows that use of these strategies tends to correlate positively with learner oral production in the learning context. It is important to attend to strategies because awareness, for the learner as well as for the examiner or instructor, is essential to the successful use of strategies in language learning.
e. Promote learner agency through DA. It is possible that DA promoted learner agency amongst the participants in the sample, as suggested by the much greater use of strategies such as evaluating oral production and attending to feedback in DA. This is in congruence with results from Poehner’s (2008) study, which found learners exhibiting agency through the use of various strategies during DA. These particular strategies may have promoted learner agency because they allowed learners to evaluate their own language use and independently find a solution to a language problem.
Learner agency benefits both learners and instructors: it allows learners to develop independent language problem-solving skills that they can use on their own without relying on the instructor, and consequently takes some of the responsibility off the instructor (Cohen, 1998, as cited in Grenfell & Macaro, 2007).
IV.2 Limitations and Future Research Directions
The results from this study should be considered in light of the following three research limitations. First, at times during the dynamic test, participants did not realize that they had made a mistake or did not understand my attempt to help them self-correct. Moreover, because I chose to follow a standardized scale of corrective feedback in order to enhance reliability for research purposes, I could not respond when a participant asked a question that fell outside the responses prescribed by the corrective scale. Second, because the study involved only seven advanced learners, further research should be conducted with a larger sample of learners at different proficiency levels in order to identify, through statistical analysis, other potential differences in areas such as test scores (holistic and analytical) and strategy use. Third, because the test format included only the first section of the IELTS™ Speaking test, further research should examine the role of DA in a full-length standardized test with a focus on other linguistic features.
Additionally, there are several pedagogical limitations. It is possible that including a recast in the regulatory scale (rather than providing the correct answer) undermined participant learning. In addition, the DA method is quite time-consuming and, as such, may not be feasible for large classes. Rather, it is better suited to workshops or English-for-specific-purposes courses, with instruction tailored to the needs of individual learners.
V. Conclusion
This exploratory research has examined the differences between SA and DA in terms of speaking test scores and language learner strategies. It has highlighted differences in fluency, grammar, and lexical resource scores: scores were higher for fluency in SA, while scores were higher for grammar and lexical resource in DA. This points to the potential of DA as a mediating tool to facilitate the development of grammatical and lexical abilities amongst the participants. Moreover, strategy use was greater in DA, particularly for metacognitive and cognitive strategies, which tend to correlate positively with learner proficiency and performance in the learning context. Empirically, the current study contributes to the research on DA in the fields of SLA and Applied Linguistics. Practically, this study can inform instructors’ choice of classroom-based assessment methods that go beyond standardized testing, in order to help learners improve their English-speaking skills.
References
Anderson, N.J. (2005). L2 strategy research. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 757-772). Mahwah, NJ: Lawrence Erlbaum Associates. http://dx.doi.org/10.4324/9781410612700
Antón, M. (2009). Dynamic assessment of advanced second language learners. Foreign Language Annals, 42(3), 576-598. http://dx.doi.org/10.1111/j.1944-9720.2009.01030.x
Bachman, L.F. (1990). Fundamental considerations in language testing. Oxford, UK: Oxford University Press.
Budoff, M. (1987). The validity of learning potential assessment. In C.S. Lidz (Ed.), Dynamic assessment: An interactive approach to evaluating learning potential (pp. 52-81). New York, NY: Guilford.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1-47. http://dx.doi.org/10.1093/applin/1.1.1
Cohen, A.D. (2007). Coming to terms with language learner strategies: Surveying the experts. In A.D. Cohen & E. Macaro (Eds.), Language learner strategies (pp. 29-45). Oxford, UK: Oxford University Press.
Feuerstein, R., Rand, Y., & Rynders, J.E. (1988). Don’t accept me as I am: Helping retarded performers excel. New York, NY: Plenum.
Green, J.M., & Oxford, R.L. (1995). A closer look at learning strategies, L2 proficiency, and gender. TESOL Quarterly, 29(2), 261-297. http://dx.doi.org/10.2307/3587625
Grenfell, M., & Macaro, E. (2007). Claims and critiques. In A.D. Cohen & E. Macaro (Eds.), Language learner strategies (pp. 9-28). Oxford, UK: Oxford University Press.
Huang, L.-S. (2010, Spring). Key concepts and theories in TEAL: Language learner strategies. TEAL News: The Association of B.C. Teachers of English as an Additional Language, (pp. 18-20). Retrieved from http://www.bcteal.org/wp-content/uploads/2011/08/BCTEALNews_Spring_2010.pdf
Huang, L.-S. (2012). Use of oral reflection in facilitating graduate EAL students’ oral language production and strategy use: An empirical action research. International Journal for the Scholarship of Teaching and Learning (IJ-SoTL), 6(2), 1-22.
Huang, L.-S. (2013). Cognitive processes involved in performing the IELTS Speaking test: Respondents’ strategic behaviours in simulated testing and non-testing contexts (pp. 51). IELTS Research Report Series. Retrieved from http://www.ielts.org/pdf/Huang_RR_Online_2013.pdf
Huang, L.-S. (2014). Key concepts and theories in TEAL: Cognitive validity. SHARE: TESL Canada’s eMagazine for ESL Teachers.
Huang, L.-S. (2014). Video-stimulated verbal recall: A method for researching cognitive processes and strategic behaviours. In SAGE Research Methods Cases (pp. 1-22). London: SAGE Publications.
IELTS Web site. (2013). Retrieved March 20, 2015, from www.ielts.org
Lantolf, J.P., & Poehner, M.E. (2014). Sociocultural Theory and the pedagogical imperative in L2 education: Vygotskian praxis and the research/practice divide. New York, NY: Routledge. http://dx.doi.org/10.4324/9780203813850
Lantolf, J.P., & Thorne, S.L. (2006). Sociocultural Theory and the genesis of second language development. Oxford, UK: Oxford University Press.
Lyster, R., Saito, K., & Sato, M. (2013). Oral corrective feedback in second language classrooms. Language Teaching, 46(1), 1-40. http://dx.doi.org/10.1017/s0261444812000365
Macaro, E. (2006). Strategies for language learning and for language use: Revising the theoretical framework. Modern Language Journal, 90, 320-337. http://dx.doi.org/10.1111/j.1540-4781.2006.00425.x
Nakatani, Y. (2005). The effects of awareness-raising training on oral communication strategy use. The Modern Language Journal, 89(1), 76-91. http://dx.doi.org/10.1111/j.0026-7902.2005.00266.x
Nassaji, H., & Swain, M. (2000). A Vygotskian perspective on corrective feedback in L2: The effect of random versus negotiated help on the learning of English articles. Language Awareness, 9(1), 34-51. http://dx.doi.org/10.1080/09658410008667135
Oxford, R.L., & Ehrman, M.E. (1995). Adults’ language learning strategies in an intensive foreign language program in the United States. System, 23(3), 359-386. http://dx.doi.org/10.1016/0346-251x(95)00023-d
Poehner, M.E. (2008). Dynamic assessment: A Vygotskian approach to understanding and promoting L2 development. Berlin, Germany: Springer.
Poehner, M.E., & Lantolf, J.P. (2013). Bringing the ZPD into the equation: Capturing L2 development during computerized dynamic assessment (C-DA). Language Teaching Research, 17(3), 323-342. http://dx.doi.org/10.1177/1362168813482935
Purpura, J.E. (1999). Learner strategy use and performance on language tests: A structural equation modeling approach. Cambridge, UK: Cambridge University Press.
Sheen, Y. (2011). Corrective feedback, individual differences and second language learning. New York, NY: Springer.
Sime, D. (2006). What do learners make of teacher’s gestures in the language classroom? International Review of Applied Linguistics in Language Teaching, 44, 211-230. http://dx.doi.org/10.1515/iral.2006.009
Sternberg, R.J., & Grigorenko, E.L. (2002). Dynamic testing: The nature and measurement of learning potential. Cambridge, UK: Cambridge University Press.
Swain, M., Huang, L.-S., Barkaoui, K., Brooks, L., & Lapkin, S. (2009). The speaking section of the TOEFL iBT™ (SSTiBT): Test-takers’ reported strategic behaviors. TOEFL iBT™ Research Series No. TOEFLiBT-10. Princeton, NJ: Educational Testing Service.
Swan, M., & Smith, B. (2001). Learner English: A teacher’s guide to interference and other problems. Cambridge, UK: Cambridge University Press. http://dx.doi.org/10.1017/cbo9780511667121
Travers, N. (2010). Relating learner culture to performance on English speaking tests with interactive and non-interactive formats. Retrieved from UVicSpace: Electronic Theses and Dissertations.
Vygotsky, L.S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
White, C., Schramm, K., & Chamot, A.U. (2007). Research methods in strategy research: Re-examining the toolbox. In A.D. Cohen & E. Macaro (Eds.), Language learner strategies (pp. 93-116). Oxford, UK: Oxford University Press.
Wong, L.L.C., & Nunan, D. (2011). The learning styles and strategies of effective language learners. System, 39(2), 144-163. http://dx.doi.org/10.1016/j.system.2011.05.004