

Exploring the Effect of a Scaffolding Design on 
Students’ Argument Critique Skills 

YI SONG 
Educational Testing Service 
660 Rosedale Rd, Princeton, NJ 
USA 
ysong@ets.org 
 
SZU-FU CHAO 
Educational Testing Service 
660 Rosedale Rd, Princeton, NJ 
USA 
schao@ets.org 

YIGAL ATTALI 
Educational Testing Service 
660 Rosedale Rd, Princeton, NJ 
USA 
yattali@ets.org

  
Abstract: In this project, we examined the impact of scaffolded tasks on middle school students’ argument critique skills. The study results showed a small positive impact of the scaffolding on student performance on one controversial issue but not the other, indicating that students’ critique-writing skills could be affected by the topic and argument content. Additionally, students from low-SES families did not perform as well as their peers. Student performance on the critique tasks had moderate or strong correlations with students’ state reading and writing test scores. Implications of the scaffolding and critique task design are discussed.

Résumé: In this project, we examined the impact of scaffolded tasks on seventh- and eighth-grade students’ ability to critique arguments. The study results showed a slight positive impact of the scaffolding on student performance on one controversial issue, but not on the other, indicating that students’ critique-writing skills could be affected by the topic and the content of the argument. In addition, students from families with low socioeconomic status did not perform as well as their peers. Student performance on the critique tasks had moderate or strong correlations with students’ reading and writing test scores. The implications of the scaffolding and of the critique task design are discussed.

 
Keywords: Argument critique, assessment, fallacy, reasoning, scaffolding 
 



606 Song, Chao, and Attali 
 

© Yi Song, Szu-Fu Chao, and Yigal Attali. Informal Logic, Vol. 40, No. 4 (2020), pp. 
605–628 

1. Introduction 
To become open-minded and critical readers and listeners, students should learn not only to understand what an author or speaker is saying but also to question the assumptions, premises, and reasoning in the text or speech to determine whether the claim is true and well supported. In the U.S. educational system, the skills of making logical arguments and using relevant evidence are greatly emphasized. For example, students are expected to “Delineate and evaluate the argument and specific claims in a text, assessing whether the reasoning is sound and the evidence is relevant and sufficient; recognize when irrelevant evidence is introduced.”1 Evaluating arguments is a challenging skill because students must recognize reasoning flaws and identify specific points in a text that are vulnerable to objections and counterarguments. This goes beyond keeping track of which reason supports which point.
 Argumentative writing studies show that students often fail to include relevant evidence, consider alternative perspectives, or critically evaluate others’ arguments (e.g., Ferretti, MacArthur and Dowdy 2000; National Center for Educational Statistics 2012; Nussbaum and Kardash 2005; Nussbaum and Edwards 2011; Song and Ferretti 2013). Students often presume that information in a given argument is true or valid rather than asking questions that would reveal reasoning flaws (e.g., Song and Ferretti 2013). When students have prior misconceptions, they are likely to hold onto them and ignore counter-evidence (Kuhn, Cheney and Weinstock 2000). When writing, many students draw a quick conclusion based on their personal experience and remain insensitive to the limitations of these examples. Not surprisingly, students who are unable to distinguish reasonable from fallacious arguments are unlikely to make appropriate decisions about whether they should accept or reject an argument (e.g., Ferretti, Lewis and Andrews-Weckerly 2009). Ferretti and his colleagues pointed out the importance of argumentation schemes (Walton 1996) in analyzing and evaluating arguments.

1 http://www.corestandards.org/ELA-Literacy/RI/8/

Exploring the Effect of Scaffolding Design 607
 Recognition of typical reasoning flaws is normally taught during middle and high school in the U.S.; therefore, informed by curriculum standards and learning sciences research, our goal is to examine how well students can apply these skills and explain, in their own words, the reasoning errors in people’s arguments. We also aim to facilitate the development of middle school students’ informal-logic skills for critiquing arguments (Walton 2016) by designing scaffolded tasks. In the following sections, we present the research background, describe the study method, report the study results, and discuss implications for the assessment and instruction of the skills required for evaluating arguments.

2. Theoretical background and related research 

To succeed in college and career settings, students must learn to comprehend, critique, and construct reasoned arguments. Understanding logical fallacies is an essential skill in critiquing arguments and constructing plausible arguments. Informal fallacies refer to arguments that are “psychologically persuasive but logically incorrect; that do as a matter of fact persuade but, given certain argumentative standards, shouldn’t” (Copi and Burgess-Jackson 1996, p. 97). For instance, people sometimes provide evidence for their claims, but the evidence could be irrelevant or insufficient. Arguers may jump to a conclusion too quickly without adequately supporting it with premises, which is called hasty generalization (Walton 1999). Such arguments often involve generalizing from one case to a large population or from a stereotyped group to a specific case. For example, it is probably a hasty generalization if someone claims that TV brings families together based on an observation that his neighbor’s family watches TV together every night. Jumping to a conclusion can also be found in post hoc arguments, in which people conclude that one event caused another based on an observed correlation between the two events (Walton 2009). Although causal conclusions are often based on correlations, post hoc occurs when the arguer overlooks some evidence that ought to be taken into account before reaching the conclusion. There are a host of other informal fallacies (e.g., ad hominem, straw man, slippery slope, and red herring) that weaken arguments, and students’ ability to identify informal fallacies is influenced by their familiarity with norms of argumentation (Neuman, Weinstock, and Glasner 2006; Weinstock, Neuman and Tabak 2004).

 To detect whether an argument is misused or fallacious, it is important to understand the underlying forms of argumentation, also called argumentation schemes (Walton 1996). These schemes represent the relationship between what is stated in the claim and its supporting justificatory structure (Walton 1996; Walton, Reed, and Macagno 2008). Consider a common argumentation scheme, argument from example, which is often used in essays when students try to support their arguments with examples from their everyday experiences or background knowledge. Arguments from example can be evaluated by asking the following questions (Walton 1996): (a) Is the example true? (b) Is it a relevant example of the general statement we are being asked to believe? (c) Is this example typical of the kinds of cases that the general statement covers? (d) Are there any special circumstances that could undermine the generalization from this case to other cases? These questions raise critical concerns about the argument strategy, which, if not addressed, could elicit strong counterarguments. For instance, if someone provides an atypical case, then one could undermine the argument by pointing out the limitation of its generalizability. Thus, asking scheme-relevant critical questions helps us differentiate fallacious arguments from reasonable arguments by encouraging recognition of unwarranted assumptions in people’s reasoning (Walton 1996; Walton et al. 2008).
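The critical-question approach can be made concrete as a small checklist structure. The sketch below is a hypothetical illustration, not part of the study materials. It paraphrases Walton’s (1996) four questions for argument from example. The class and field names (ArgumentFromExample, resolved) are illustrative assumptions. The sketch applies the checklist to the TV example discussed above.

```python
from dataclasses import dataclass, field

# Walton's (1996) critical questions for argument from example (paraphrased).
CRITICAL_QUESTIONS = (
    "Is the example true?",
    "Is it a relevant example of the general statement?",
    "Is the example typical of the cases the general statement covers?",
    "Are there special circumstances that undermine the generalization?",
)


@dataclass
class ArgumentFromExample:
    """Hypothetical record of one argument from example and its evaluation."""
    claim: str
    example: str
    # resolved[i] is True when critical question i has been answered in the
    # argument's favor; missing or False entries remain open to challenge.
    resolved: dict[int, bool] = field(default_factory=dict)

    def open_questions(self) -> list[str]:
        """Critical questions that could still ground a counterargument."""
        return [q for i, q in enumerate(CRITICAL_QUESTIONS)
                if not self.resolved.get(i, False)]


tv_argument = ArgumentFromExample(
    claim="TV brings families together.",
    example="My neighbor's family watches TV together every night.",
    resolved={0: True, 1: True},  # assume the example is true and relevant
)
for question in tv_argument.open_questions():
    print(question)
```

In this example, the two questions left open concern typicality and special circumstances. A critic would press exactly these questions to expose the hasty generalization.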
 The current study builds upon our previous work examining student performance in an argument critique task (Song, Deane and Fowles 2017). In this task, students first read a letter that presents arguments against banning advertisements aimed at children (see Figure 1). The letter is written to include common fallacious arguments in informal logic, such as hasty generalization, circular argument, and post hoc. Students are asked to identify and explain these reasoning problems in their written critiques. We found that the majority of eighth graders (83%) did not write high-quality argument critiques (i.e., ones that identified and clearly explained major problems in the reasoning in the letter). Students’ responses showed several characteristic difficulties: (1) being off-task; (2) failing to identify fallacious arguments; (3) having difficulty explaining problems; and (4) not connecting their criticisms with specific parts of the text.
 


 
Figure 1. Ban Ads Argument Critique Task (Song et al. 2017, p. 6) 

Therefore, we incorporated scaffolding components into the assessment within a Vygotskian framework (Vygotsky 1978), breaking this complex task into easier, more “doable” steps. Specifically, we designed a “lead-in” task as light scaffolding for the written critique task and conducted a new study to explore the effect of the scaffolding. The lead-in task targets the skill of identifying common reasoning flaws, which is presumed to be critical for writing an argument critique. Such scaffolding may provide the support and structure students need to learn how to write a critique, or to complete the task successfully, because they see a model of a successful critique (of different arguments on the same topic) before they write their own.
 In addition, this project serves the goal of identifying strategies that can help underserved learners do better in school. There is a persistent achievement gap between socio-economic groups in the U.S., with lower-income students performing worse on average than higher-income students from the same social group (e.g., Leu et al. 2015). Thus, there is reason to examine the performance of this subgroup and to explore the effectiveness of the scaffolding for the low-income group. Our research questions were:



1. Does the scaffolding design help improve student performance on the written critique tasks?
2. Is there an achievement gap between students from low-income and high-income families?
3. Does the scaffolding design measure students’ argument critique skills reliably?

3. Method 

Participants 

We contacted teachers who had participated in prior research projects and recruited those who expressed an interest in using the argument critique tasks with their students. Participants were sampled from three middle schools in two U.S. states. A total of 472 students from grade 7 (n = 231) and grade 8 (n = 241) were included. The sample was diverse: Caucasian 39%; Hispanic 28%; Asian 18%; African American 13%; Other 1%; Unreported 1%. Most students (79%) were from low-income families, as indicated by their eligibility for free or reduced-price lunch.

Instruments 

 Argument Critique Tasks 

We used two existing argument critique tasks designed from the same blueprint: Ban Ads and Cash for Grades. Both are scenario-based assessments, which contain a structured series of tasks within an overarching scenario context that provides a purpose for reading and writing from source texts (Sabatini, O’Reilly and Deane 2013). The Ban Ads scenario raises the issue of whether the United States should ban advertising to children under the age of 12, and the Cash for Grades scenario asks whether students should be rewarded with money for getting good grades. The original critique tasks ask students to evaluate the arguments presented in a letter to the editor. To do so, students must write a critique of the argument (i.e., identify and explain problems in the reasoning or use of evidence and point out any inaccurate information).
 To create the scaffolded versions of the critique task, we added a “lead-in” task to the Ban Ads and Cash for Grades scenarios. The lead-in task consists of seven multiple-choice items embedded in the scenario. These items assess whether students can identify common reasoning errors and choose appropriate words or phrases in written critiques about the issue under discussion (Ban Ads or Cash for Grades). Specifically, items 1–5 present arguments with various fallacies (e.g., post hoc, hasty generalization), and students need to select the option that correctly explains the reasoning error in a given argument. Items 6 and 7 focus on critique-writing skill: students select appropriate words or phrases to help formulate a brief and meaningful critique. See Figure 2 for a couple of sample multiple-choice items from Ban Ads.
 


 
Figure 2. Sample multiple-choice items in the Ban Ads lead-in task

 
Two types of test forms were then created: one includes both the lead-in task and the original writing task; the other has the original writing task only. Moreover, the lead-in task has two versions: one provides feedback that tells students the correct answers (e.g., Option A is the correct answer), and the other does not provide any feedback. Therefore, there were three conditions reflecting three different lead-in settings: lead-in only, lead-in plus feedback, and no lead-in (control).

 State reading and writing test  

Students’ reading and writing scores on the relevant state standardized tests were provided by their teachers. Because the schools were located in two different states, one school took the Partnership for Assessment of Readiness for College and Careers (PARCC)2 reading and writing assessments, and the other two schools took ACT Aspire assessments.3 These tests are intended to measure students’ ELA reading and writing abilities, evaluating how well students are meeting academic expectations.

2 https://parcc-assessment.org/
3 https://www.actaspire.org/

Procedure 

A two-phase study (one class period) was administered with three conditions: control (no lead-in task), scaffolded (lead-in task only), and scaffolded with feedback (lead-in task plus feedback). Students in the control condition worked on the original written critique tasks in both phases. Students in the experimental conditions worked on the scaffolded lead-in task and the original written critique task in Phase I, and only on the original written critique task in Phase II. Each student received a participant ID prior to the study, and we randomly assigned the IDs to the three conditions.
Although Ban Ads and Cash for Grades were designed from the same blueprint, the two topics could pose different levels of difficulty to students due to factors such as the content, the types of arguments in the letter, the given arguments in the tasks, and the alignment between the lead-in task and the written critique task. Therefore, we randomly assigned the order of Ban Ads and Cash for Grades across the phases, such that some students completed Ban Ads in Phase I and Cash for Grades in Phase II, while other students completed them in the opposite order.
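This assignment scheme yields a 2 (topic order) × 3 (condition) design. The sketch below shows simple (unrestricted) random assignment, assuming each ID is assigned independently. The paper does not say how the randomization was implemented, and the function name and seed are hypothetical. Simple randomization does not guarantee equal cell sizes.

```python
import random
from collections import Counter

CONDITIONS = ("control", "scaffolded", "scaffolded_with_feedback")
TOPIC_ORDERS = ("Ban Ads -> Cash for Grades", "Cash for Grades -> Ban Ads")
CELLS = [(order, cond) for order in TOPIC_ORDERS for cond in CONDITIONS]


def assign_participants(participant_ids, seed=None):
    """Map each participant ID to a (topic order, condition) cell.

    Each ID is drawn independently and uniformly from the six cells
    (simple randomization), so cell sizes can differ by chance.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(CELLS) for pid in participant_ids}


if __name__ == "__main__":
    # 472 hypothetical participant IDs; the seed is arbitrary.
    plan = assign_participants(range(1, 473), seed=2020)
    for cell, n in sorted(Counter(plan.values()).items()):
        print(cell, n)
```

Fixing the seed makes an assignment reproducible and auditable. A blocked design, such as shuffling and then dealing IDs round-robin, would be the alternative if equal cell sizes were required.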

Scoring 

Responses to the lead-in tasks were scored automatically by the computer. Each lead-in task consists of seven multiple-choice items worth one point each, so students could earn up to 7 points on the lead-in task. Human raters scored the overall quality of each critique on a scale from 0 to 4. In scoring students’ written critiques, raters considered (a) whether students identified and clearly explained most of the major problems in the letter’s reasoning and (b) whether students expressed ideas in an appropriate tone for the class. See Figure 3 for the scoring rubric.
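The scoring rules can be sketched as follows. This is a hypothetical illustration: the answer key is invented. The critique score is the mean of a response’s ratings, as stated later in this section. The combined multiple-choice plus constructed-response (MC + CR) score is the sum of the two parts, as described in the data analysis.

```python
from statistics import mean

HYPOTHETICAL_KEY = "ACBDABC"  # invented key for the seven lead-in items


def score_lead_in(responses, key=HYPOTHETICAL_KEY):
    """One point per correct multiple-choice answer, so 0 to 7 points."""
    if len(responses) != len(key):
        raise ValueError("expected one response per lead-in item")
    return sum(r == k for r, k in zip(responses, key))


def critique_score(ratings):
    """Final critique score: mean of the raters' 0-4 holistic ratings."""
    if any(not 0 <= r <= 4 for r in ratings):
        raise ValueError("ratings must lie on the 0-4 rubric scale")
    return mean(ratings)


lead_in = score_lead_in("ABBDCBC")
critique = critique_score([2, 2, 3, 1, 2, 2, 3, 2])
print(lead_in, critique, lead_in + critique)  # combined MC + CR score
```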


Exploring the Effect of Scaffolding Design 615 
 

© Yi Song, Szu-Fu Chao, and Yigal Attali. Informal Logic, Vol. 40, No. 4 (2020), pp. 
605–628 

 
“BAN ADS” Argument Critique Task Scoring Rubric

4 – Excellent
An “Excellent” Critique
• Identifies and clearly explains most of the major problems in the letter’s reasoning,
• Points out inaccurate information, AND
• Expresses ideas in a clearly appropriate tone for the class

3 – Adequate
An “Adequate” Critique
• Identifies and explains some of the major problems in the letter’s reasoning and inaccurate information, OR
• Identifies only one major problem in the letter’s reasoning but explains it extremely well, AND
• Expresses ideas in a generally appropriate tone for the class

2 – Limited
A “Limited” critique identifies at least one major problem in the letter’s reasoning and/or use of inaccurate information but is limited in one or more of the following ways:
a) Explains the problem(s) poorly, if at all
b) Misinterprets parts of the letter or includes irrelevant information
c) Misinterprets an important part of the task
d) Expresses ideas in a somewhat inappropriate tone for the class

1 – Minimal
A “Minimal” response identifies or implies a problem in the letter’s reasoning and/or use of inaccurate information but displays one or more of the following problems:
a) Is very confusing
b) Seriously distorts the letter or includes mostly irrelevant information
c) Seriously distorts the writing task
d) Expresses ideas in a highly inappropriate tone for the class

0 – No Credit
A response receives “No Credit” for any one of the following reasons:
a) Identifies no problems in the letter’s reasoning and/or use of inaccurate information
b) Not long enough for critical-thinking skills to be judged
c) Not written in English
d) Off topic
e) Blank
f) Only random key strokes

Figure 3. Scoring Rubric (Song et al. 2017, p. 7)

 
Candidates voluntarily participated in this project as scorers on Amazon Mechanical Turk. These scoring candidates first studied the materials online, including the critique tasks, scoring rubrics, topic notes, and anchor responses. After reviewing these materials, they scored a practice set of student responses and then took a scoring qualification test. A total of 90 Amazon Mechanical Turk participants passed the qualification test for a topic (by reaching 80% agreement with the pre-assigned scores on a test set of student responses) and were assigned to score student responses. Half of the raters scored Ban Ads, and the other half scored Cash for Grades. We randomly assigned student responses to these raters, and each response received at least eight scores. Given this design, we conducted a generalizability analysis (Webb, Shavelson, and Haertel 2006) for each topic to assess rater reliability, focusing on the 20 responses that were scored by all 45 raters. Overall, the observed generalizability coefficients were greater than .90, and the more raters were included, the greater the generalizability coefficient. Therefore, the average of all ratings was used as the final score for each response.
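The generalizability analysis can be illustrated with a one-facet, fully crossed persons × raters design, which is the structure of the 20 responses scored by all 45 raters. The sketch below does two things. It estimates variance components from a two-way ANOVA without replication. It then computes the relative G coefficient for the mean of n raters, σ²_p / (σ²_p + σ²_pr,e / n). The sketch assumes the relative (norm-referenced) coefficient because the paper does not state which coefficient was reported. The function names and toy scores are hypothetical.

```python
def variance_components(scores):
    """Estimate variance components for a crossed persons x raters design.

    scores[p][r] is rater r's score for response p (one score per cell).
    Returns (var_person, var_rater, var_residual); negative estimates,
    which can arise from sampling error, are truncated to zero.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]
    ss_person = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_rater = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_person - ss_rater
    ms_person = ss_person / (n_p - 1)
    ms_rater = ss_rater / (n_r - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))
    var_resid = max(ms_resid, 0.0)
    var_person = max((ms_person - ms_resid) / n_r, 0.0)
    var_rater = max((ms_rater - ms_resid) / n_p, 0.0)
    return var_person, var_rater, var_resid


def g_coefficient(var_person, var_resid, n_raters):
    """Relative G coefficient for the mean of n_raters ratings."""
    return var_person / (var_person + var_resid / n_raters)


if __name__ == "__main__":
    toy = [[1, 2], [2, 1], [4, 5]]  # invented scores: 3 responses x 2 raters
    vp, vr, ve = variance_components(toy)
    for n in (1, 2, 4, 8):  # D-study: vary the number of averaged raters
        print(n, round(g_coefficient(vp, ve, n), 3))
```

In this D-study loop, the residual variance is divided by a larger n as more raters are averaged, so the coefficient increases. This matches the pattern reported above.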

Data analysis 

Nonparametric methods were applied to all analyses of the writing scores because the score distributions followed inconsistent patterns (right-skewed in most cases). The multiple-choice scores generally followed a normal distribution. Table 1 presents the average performance on the lead-in task and the written critique task in Phase I and on the written critique task in Phase II. Regardless of condition and topic, students did not achieve high scores on the lead-in task (all means were below 3.5, the midpoint of the 7-point scale), and their performance on the written critique tasks was generally low.
 
Table 1. Mean scores of the Phase I and Phase II tasks across conditions

Task order (condition)    N    Phase I lead-in   Phase I critique   Phase II critique
Ban – Cash (Control)      81   n/a               1.38 (1.18)        .94 (1.01)
Ban – Cash (S no FB)      81   3.22 (1.88)       2.00 (1.34)        1.10 (1.09)
Ban – Cash (S & FB)       76   3.43 (1.91)       1.88 (1.37)        1.07 (1.17)
Cash – Ban (Control)      80   n/a               1.27 (1.08)        1.44 (1.24)
Cash – Ban (S no FB)      78   2.83 (1.66)       1.35 (1.15)        1.31 (1.23)
Cash – Ban (S & FB)       76   3.14 (1.61)       1.45 (1.22)        1.46 (1.33)
Note. Values are M (SD). Lead-in scores range from 0 to 7; critique scores range from 0 to 4. Ban = Ban Ads; Cash = Cash for Grades; S = scaffolded condition that has a lead-in task; FB = feedback; n/a = not available.

 
 To answer RQ1 regarding the impact of the scaffolding design, we first ran a 2 (topic order) × 3 (condition) factorial ANOVA on ranks (Wobbrock, Findlater, Gergle and Higgins 2011) to examine potential main effects and an interaction effect on the written critique scores in both phases. Then, we applied the Kruskal-Wallis test, first to compare the Phase I writing scores for each topic between conditions and then to compare the writing scores for each topic between conditions regardless of phase. For RQ2, concerning the performance of students from low-SES families, we used the Mann-Whitney test to compare writing scores between the low-SES and high-SES groups in each phase for each condition. For RQ3, concerning the reliability of the scaffolded task design, we examined Spearman’s rank correlations between Phase I scores (i.e., the critique writing scores and the combined sum of the lead-in and critique writing scores) and the state test scores for each topic.
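The rank-based tests in this plan share a first step: pool the scores, rank them, and give tied scores the average of their ranks. This step matters for coarse rubric scores, which tie often. The sketch below computes the Kruskal-Wallis H statistic with the standard tie correction. It is an illustrative implementation, not the software used in the study, which the paper does not name. The function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1
```

For large samples, H is referred to a chi-square distribution with k − 1 degrees of freedom. This is the form of the χ² statistics reported in the Results.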
 Across all statistical analyses, the significance level α was set at 0.05. A power analysis assuming a statistical power of 0.9 indicated that a minimum sample size of 324 was needed for a 2 × 3 factorial ANOVA. Our sample was therefore sufficient for the analyses we planned to conduct.

4. Results 

RQ1. Does the scaffolding design help improve student performance on the written critique tasks?

A factorial ANOVA on ranks was conducted to test the main effects of topic order and condition, and their interaction, on the Phase I and Phase II written critique scores. Topic order had two levels (Ban Ads - Cash for Grades, and Cash for Grades - Ban Ads), and condition had three levels (control, scaffolded, scaffolded with feedback). Results showed significant main effects of topic order on written critique scores: F(1, 466) = 10.28, p = .001 for Phase I, and F(1, 466) = 7.56, p = .006 for Phase II. A significant main effect of condition appeared only for the Phase I written critique score: F(2, 466) = 3.24, p = .040. No significant interaction effect was found in either phase. In addition, Wilcoxon signed-rank tests showed that students who started with Ban Ads had significantly higher scores on Ban Ads than on Cash for Grades in all conditions (all p’s < .001). In contrast, students who started with Cash for Grades performed similarly on both topics in all three conditions. These results indicated that the task topic had an effect. Therefore, we proceeded to analyze the data by examining the scaffolding effect on each topic.
 To compare the Phase I written critique scores for each topic among the conditions, we used Kruskal-Wallis tests. There was a statistically significant difference in Phase I Ban Ads scores among the three conditions (χ²(2) = 8.54, p = .014). Post-hoc pairwise comparisons, with Type I error controlled across the tests using the Bonferroni approach, showed that students in the scaffolded-only condition performed significantly better than students in the control condition (Cohen’s d = 0.45). There was no significant difference between the two scaffolded conditions. For the Phase I Cash for Grades scores, we did not find any significant difference among the three conditions (χ²(2) = 0.46, p = .797).
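The reported p-values can be checked against the chi-square reference distribution that the Kruskal-Wallis test uses for large samples. For integer degrees of freedom the upper tail has a closed form, so no statistics library is needed. The helper below (a hypothetical name) was written for this check only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    chi = math.sqrt(x)
    series, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # chi^(2r-1) / (1 * 3 * ... * (2r-1))
        series += term
    tail = math.erfc(chi / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


print(round(chi2_sf(8.54, 2), 3))   # Phase I Ban Ads
print(round(chi2_sf(0.46, 2), 3))   # Phase I Cash for Grades
print(round(chi2_sf(18.61, 5), 3))  # Ban Ads across both phases
```

The helper gives about .014 for χ²(2) = 8.54 and about .002 for the χ²(5) = 18.61 result reported in the next paragraph, both matching the text. For χ²(2) = 0.46 it gives about .795, which agrees with the reported .797 up to rounding of the test statistic.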
 Next, we compared the written critique scores on each topic across conditions regardless of phase (e.g., three groups took Ban Ads in Phase I and three groups took Ban Ads in Phase II). A Kruskal-Wallis test showed a significant difference among these groups for Ban Ads only (χ²(5) = 18.61, p = .002). As in the Phase I analysis, post-hoc pairwise comparisons indicated that this difference was driven by the significant difference between the scaffolded-only and control conditions. These results suggest that students critiquing the Ban Ads arguments may benefit from the scaffolding design, but that the feedback provided no additional benefit.

RQ2. Is there an achievement gap between students from low-income and high-income families?

Table 2 shows the average writing scores in each phase for every condition for the low- and high-SES groups. The groups were defined by whether or not students received free or reduced-price lunch. Within each condition, the low-income groups performed worse on the written critique tasks than the high-income groups in both phases. Mann-Whitney tests revealed significant differences in the scaffolded-only condition (Phase II Cash for Grades: U = 394.5, p = .050; Phase I Cash for Grades: U = 276.5, p = .015; Phase II Ban Ads: U = 245, p = .005) and in the scaffolded-with-feedback condition (Phase I Ban Ads: U = 359, p = .028). The corresponding effect sizes (Cohen’s d) were 0.45, 0.58, 0.68, and 0.52, further suggesting that the differences between the low- and high-SES groups were not trivial.
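The sketch below computes the Mann-Whitney U statistic with a tie-corrected normal approximation for the two-sided p-value. The paper does not report whether exact or approximate p-values were used. It also does not say which of the two U statistics is reported; some software reports the smaller one. Both choices are therefore assumptions here, and the function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Variance of U under H0, reduced for tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u_x - n1 * n2 / 2) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


low_ses = [0.5, 1.0, 1.25, 0.75, 2.0]   # invented critique scores
high_ses = [1.5, 2.25, 1.75, 3.0]       # invented critique scores
u, p = mann_whitney_u(low_ses, high_ses)
u_smaller = min(u, len(low_ses) * len(high_ses) - u)
```

U for one sample and U for the other always sum to n1 × n2. Reporting the smaller value is a convention, not a different test.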
 
Table 2. Written critique scores of low-SES and high-SES groups 
 

Note. Ban = Ban Ads; Cash = Cash for Grades; S = scaffolded condi-
tion that has a lead-in task; FB = feedback. 

RQ3. Does the scaffolding design measure students’ argument critique skills reliably?

To answer this question, we focused on Phase I performance because no scaffolding was involved in Phase II. Given that feedback had no significant effect on student performance, we combined the two scaffolded conditions. Because the majority of our sample took the PARCC assessments, we used participants with PARCC scores to examine the correlations between Phase I task performance and state test scores (Table 3).
Table 3. Correlations between the critique task performance (Phase I) and state test scores

Task (condition)     Measure    N    State reading   State writing
Ban (Control)        CR         44   .61             .55
Ban (Scaffolded)     CR         96   .64             .63
Ban (Scaffolded)     MC + CR    96   .71             .67
Cash (Control)       CR         49   .68             .70
Cash (Scaffolded)    CR         93   .73             .65
Cash (Scaffolded)    MC + CR    93   .68             .54
Note. Ban = Ban Ads; Cash = Cash for Grades; CR = constructed response; MC = multiple choice. Correlations are Spearman’s rank correlations with PARCC reading and writing scores.
 
The following analysis included the 282 students for whom PARCC test scores were available. We correlated the Phase I scores with the state test scores for the control and non-control conditions on each topic. Spearman’s rank correlations between Phase I scores and PARCC reading scores ranged from .61 to .73, while correlations with PARCC writing scores ranged from .54 to .70 (all p’s < .01). Phase I scores correlated more highly with the state reading scores than with the state writing scores, except in the Cash for Grades control condition. In addition, correlations between the scores from the scaffolding design (non-control) and the state scores tended to be higher than those without it (control condition). The exception was the correlations between Cash for Grades and state writing.
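Spearman’s rank correlation is the Pearson correlation computed on ranks, with tied values given average ranks. The sketch below computes it and forms the combined MC + CR score from Table 3 as the sum of the lead-in and critique scores. All data in the usage example are invented, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    lead_in = [3, 5, 2, 6, 4]                    # invented MC scores (0-7)
    critique = [1.25, 2.5, 0.75, 3.0, 1.5]       # invented mean ratings (0-4)
    state_reading = [720, 760, 705, 790, 735]    # invented scale scores
    combined = [mc + cr for mc, cr in zip(lead_in, critique)]  # MC + CR
    print(round(spearman_rho(critique, state_reading), 2))
    print(round(spearman_rho(combined, state_reading), 2))
```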

5. Conclusion 

In this study, we examined the effect of scaffolding on student performance on argument critique tasks. The study results showed a small positive impact of scaffolding on one topic but not the other, suggesting that students’ critique-writing skills could be affected by topic and argument content. Additionally, students from low-SES families did not perform as well as their peers. We also found that student performance on the critique tasks had moderate or strong correlations with the state reading and writing test scores. It is not surprising that many students did not produce high-quality critiques, because these kinds of tasks might have been new to them. Middle school students may be just beginning to learn some common reasoning errors, and some students may not be introduced to logical fallacies or evidence-based justification until high school. In our study, some students appeared to have difficulty explaining the reasoning problems, even when they had a rough sense that something was wrong in the given arguments. For example, a student wrote: “I am here to explain some mistakes that have been made while writing this letter to the editor. Such as, in your first reason you don’t really know if advertising is always a good thing. Also, it may or may not bring families together.” This student pointed out that advertisements may not bring families together, contrary to the letter’s claim, but did not explain that the claim overgeneralizes from a single example.
 One implication of our study is that the scaffolded lead-in tasks
and feedback alone may not be strong enough to support the
development of students’ skills in critically evaluating arguments,
especially for students from low-income families. Each lead-in task
consisted of only seven multiple-choice items on reasoning flaws, and
the feedback simply informed students of the correct answers without
explaining the rationale. Prior research has shown that the effects
of feedback on student learning may vary with individual differences,
task characteristics, and feedback type (Shute 2008). Students may
benefit more from feedback that explicitly explains the reasoning
flaws in the given arguments and demonstrates a strategy for
identifying such flaws.
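To make the contrast concrete, the Python sketch below compares feedback that only reports the correct answer with feedback that also explains the flaw and suggests a detection strategy. It is a hypothetical illustration, not the feedback used in this study: the `FlawItem` structure, the function names, and the item text (loosely adapted from the Ban Ads student example above) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FlawItem:
    """Hypothetical multiple-choice item asking which reasoning flaw an argument contains."""
    argument: str
    options: list[str]
    correct: str
    rationale: str  # why the flaw applies; used only by explanatory feedback
    strategy: str   # a question students can ask to spot this kind of flaw


def correctness_feedback(item: FlawItem, answer: str) -> str:
    """Feedback that only reports whether the answer is right and, if not, the key."""
    if answer == item.correct:
        return "Correct."
    return f"Incorrect. The correct answer is: {item.correct}."


def explanatory_feedback(item: FlawItem, answer: str) -> str:
    """Same verdict, plus an explanation of the flaw and a detection strategy."""
    verdict = correctness_feedback(item, answer)
    return f"{verdict} Why: {item.rationale} Strategy: {item.strategy}"


# Hypothetical item; not taken from the study's materials.
item = FlawItem(
    argument="My family watched an ad together, so advertising brings families together.",
    options=["hasty generalization", "begging the question", "post hoc"],
    correct="hasty generalization",
    rationale="The claim about all families rests on a single example.",
    strategy="Ask whether one case is enough to support a claim about everyone.",
)

print(correctness_feedback(item, "post hoc"))
print(explanatory_feedback(item, "post hoc"))
```

Both functions share the same correctness check, so the only difference a student sees is the added rationale and strategy, which mirrors the distinction between answer-only and explanatory feedback drawn above.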
 In addition, a meta-analysis of academic interventions for
low-SES students conducted by Dietrichson, Bog, Filges, and
Jorgensen (2017) shows that tutoring as well as feedback and
progress monitoring have relatively robust average effect sizes on
elementary and middle school students’ academic achievement when
teachers are involved in the learning process. There is extensive
evidence of a persistent achievement gap between low-SES
students and high-SES students (e.g., Kim and Quinn 2013; Sirin
2005; White 1982). Low-SES students may require intensive
instructional support to catch up. Students from low-income families
and low-achieving students might therefore benefit more from direct
instruction in how to detect various reasoning flaws and unwarranted
assumptions, because they are still developing an epistemological
understanding of argumentation and may not yet be able to assume the
perspective of an objective evaluator (Kuhn 2009; Weinstock et al.
2004). This higher-order argumentative thinking skill typically does
not develop naturally (Kinsler 1990). As existing studies indicate
(Nussbaum and Edwards 2011; Song and Ferretti 2013), teaching
students effective strategies could help improve their ability to
evaluate the strengths and weaknesses of a given argument. In future
work, we could design scaffolded tasks that require students to use
specific strategies (e.g., asking critical questions) and provide
feedback that helps students learn these strategies.
 Another implication of our study is that students’ reactions to
different topics and content may vary. Students in general performed
better on Ban Ads than on Cash for Grades, even though the two task
sets were designed from the same blueprint. Although prior research
involving these forms did not reveal significant differences in
student performance (Deane et al. 2019; van Rijn, Graf and Deane
2014), it is important to point out that the types of fallacies in
the two written critique tasks are quite different. The argument in
the Ban Ads critique task contains several types of reasoning errors,
such as hasty generalization, begging the question, post hoc, false
assumption, and contradictory information, whereas the Cash for
Grades critique task focuses on the post hoc fallacy, which involves
jumping to a causal conclusion based on a correlation between two
factors. Because the scaffolding design in the current study aims to
cover a variety of common reasoning errors, it is better aligned with
the Ban Ads written critique task than with the Cash for Grades
written critique task. In Cash for Grades, students are expected to
demonstrate a thorough analysis of various factors that
could contribute to a situation (e.g., social and personal reasons,
as well as school factors that might lead to better academic
achievement). This appears to have been challenging for the
participating students. While many inductive arguments for causal
conclusions are based on correlations, the post hoc fallacy occurs
when people overlook evidence that ought to be taken into account and
jump to a causal conclusion. Walton (2009) presented three critical
questions that match the argumentation scheme for arguing from
correlation to causation, which can help people detect the post hoc
fallacy: (1) Is there really a correlation between A and B? (2) Is
there any reason to think that the correlation is any more than a
coincidence? (3) Could there be some third factor C that is causing
both A and B? (p. 216). Future research could explore how to
effectively support students’ learning of argumentation schemes and
critical questions and examine the impact on their argument critique
skills.
 We also found that many students stated that awarding students
cash for good academic performance is a great idea, which may have
led them to endorse the author’s arguments rather than critique them
objectively. Critical thinking research has shown that prior-belief
biases typically lead people to accept fallacious arguments and
invalid conclusions in a variety of critical thinking and evaluation
tasks (West, Toplak and Stanovich 2008). People tend to accept
belief-consistent arguments more readily than belief-inconsistent
arguments (Klaczynski et al. 1997; Stanovich and West 2007).
 The study has some limitations. First, the data were collected from
a convenience sample and so cannot be considered representative of
the U.S. middle school population as a whole. The participating
schools had been involved in other research projects and were
selected on a first-come, first-served basis. Second, we did not
collect information about participants’ motivation, which probably
affected student responses. We can infer from the student responses
and timing data that some students did not take the test seriously.
Roughly 5% of the students⁴ wrote something irrelevant to the task
or simply copied text from the task (e.g., “The school should have a
program where kids meet other kids and make friends”). These students
spent less than one minute completing their written responses.
Motivation is always an issue for low-stakes assessments. Third, we
did not measure students’ background knowledge or beliefs on the two
topics used in the test, so we could not draw any conclusions about
how those factors affected their argument critique performance.
Finally, only two topics were used in this study, and student
performance may vary with different topics.
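The robustness check described in note 4 (recomputing condition means after dropping responses written in under one minute) can be sketched as follows. The records are invented toy values for illustration only, not data from this study, and the record layout, the `cell_means` helper, and the `MIN_SECONDS` constant are our assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (condition, task, score, seconds spent on the written response).
# Illustrative values only; these are NOT data from the study.
responses = [
    ("scaffold", "Ban Ads", 3, 420),
    ("scaffold", "Ban Ads", 2, 35),   # under one minute: treated as low effort
    ("scaffold", "Ban Ads", 4, 610),
    ("control", "Ban Ads", 2, 380),
    ("control", "Ban Ads", 3, 505),
    ("control", "Ban Ads", 0, 20),    # under one minute: treated as low effort
]

MIN_SECONDS = 60  # cutoff mirroring the "less than one minute" description


def cell_means(records, min_seconds=None):
    """Mean score per (condition, task) cell, optionally dropping fast responses."""
    cells = defaultdict(list)
    for condition, task, score, seconds in records:
        if min_seconds is not None and seconds < min_seconds:
            continue  # exclude responses completed faster than the cutoff
        cells[(condition, task)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


all_means = cell_means(responses)
filtered_means = cell_means(responses, MIN_SECONDS)
for cell in sorted(all_means):
    print(cell, round(all_means[cell], 2), round(filtered_means[cell], 2))
```

Comparing the two dictionaries cell by cell shows whether excluding low-effort responses changes the pattern of means; in our data, the means after exclusion were similar to those in Table 1 (see note 4).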
 In sum, the results of this study suggest the need for greater
effort in designing effective scaffolding for students who are
developing the skills associated with identifying fallacious
arguments. Students need to learn to detect the fallacies that abound
in everyday argumentation, which often contribute to unfounded beliefs
and misconceptions. In future work, we will continue to investigate
how to optimize the scaffolding and assessment design to support the
development of students’ argument critique skills, for example by
designing explanatory feedback and including activities that promote
skills in asking critical questions.

References 
CCSS. 2018. Students who are college and career ready in reading,
writing, speaking, listening, & language. Retrieved March 21, 2020,
from http://www.corestandards.org/ELA-Literacy/introduction

Copi, Irving M., and Keith Burgess-Jackson. 1996. Informal logic. 
Englewood Cliffs, NJ: Prentice Hall.  

Deane, Paul, Yi Song, Peter van Rijn, Tenaha O’Reilly, Randy E. Ben-
nett, Mary Fowles, John Sabatini and Mo Zhang. 2019. The case for 
scenario-based assessment of written argumentation. Reading and 
Writing 32: 1575–1606.   

 
4 We calculated the mean scores on each task for each condition after removing
these students and found that they were similar to those presented in
Table 1. We therefore included all participants in our analyses.

Dietrichson, Jens, Martin Bog, Trine Filges and Anne-Marie K. Jorgen-
sen. 2017. Academic interventions for elementary and middle school 
students with low socioeconomic status: A systematic review and 
meta-analysis. Review of Educational Research 87(2): 243–282. 

Ferretti, Ralph P., William E. Lewis and Scott Andrews-Weckerly. 2009.
Do goals affect the structure of students’ argumentative writing
strategies? Journal of Educational Psychology 101: 577–589.

Ferretti, Ralph P., Charles A. MacArthur and Nancy S. Dowdy. 2000.
The effects of an elaborated goal on the persuasive writing of
students with learning disabilities and their normally achieving
peers. Journal of Educational Psychology 92(4): 694–702.

Kim, James S. and David M. Quinn. 2013. The effects of summer
reading on low-income children’s literacy achievement from kinder-
garten to grade 8: A meta-analysis of classroom and home interven-
tions. Review of Educational Research 83(3): 386–431. 
doi:10.3102/0034654313483906 

Kinsler, Kimberly. 1990. Structured peer collaboration: Teaching essay 
revision to college students needing writing remediation. Cognition 
and Instruction 7(4): 303–321. 

Klaczynski, Paul A., David H. Gordon and James Fauth. 1997. Goal-
oriented critical reasoning and individual differences in critical rea-
soning biases. Journal of Educational Psychology 89(3): 470–485. 

Kuhn, Deanna. 2009. The importance of learning about knowing:
Creating a foundation for development of intellectual values. Child
Development Perspectives 3(2): 112–117.

Kuhn, Deanna, Richard Cheney and Michael Weinstock. 2000. The 
development of epistemological understanding. Cognitive Develop-
ment 15(3): 309–328. 

Leu, Donald J., Elena Forzani, Chris Rhoads, Cheryl Maykel, Clint 
Kennedy and Nicole Timbrell. 2015. The new literacies of online re-
search and comprehension: Rethinking the reading achievement gap. 
Reading Research Quarterly 50(1): 37–59.   

National Center for Education Statistics. 2012. The Nation’s Report 
Card: Writing 2011 (NCES 2012-470). Institute for Education Sci-
ences, U.S. Department of Education, Washington, D.C. 

Neuman, Yair, Michael P. Weinstock and Amnon Glasner. 2006. The 
effect of contextual factors on the judgement of informal reasoning 
fallacies. The Quarterly Journal of Experimental Psychology 59(2): 
411–425.  

Nussbaum, Michael E. and Ordene V. Edwards. 2011. Critical questions
and argument stratagems: A framework for enhancing and analyzing
students’ reasoning practices. The Journal of the Learning Sciences
20(3): 443–488.

Nussbaum, Michael E. and CarolAnne M. Kardash. 2005. The effects of 
goal instructions and text on the generation of counterarguments 
during writing. Journal of Educational Psychology 97(2): 157–169. 

Rijn, Peter van, Edith A. Graf and Paul Deane. 2014. Empirical recovery 
of argumentation learning progressions in scenario-based assessments 
of English language arts. Spanish Journal of Educational Psychology 
(Psicologia Educativa) 20: 109–115. 

Sabatini, John, Tenaha O’Reilly and Paul Deane. 2013. Preliminary 
reading literacy assessment framework: Foundation and rationale for 
assessment and system design (Research Report 13-30). Princeton, 
NJ: Educational Testing Service.  

Shute, Valerie J. 2008. Focus on formative feedback. Review of Educa-
tional Research 78(1): 153–189. 

Sirin, Selcuk R. 2005. Socioeconomic status and academic achievement: 
A meta-analytic review of research. Review of Educational Research, 
75(3): 417–453. doi:10.3102/00346543075003417 

Song, Yi and Ralph P. Ferretti. 2013. Teaching critical questions about
argumentation through the revising process: Effects of strategy
instruction on college students’ argumentative essays. Reading and
Writing: An Interdisciplinary Journal 26(1): 67–90 (special issue).

Song, Yi, Paul Deane and Mary E. Fowles. 2017. Examining students’
ability to critique arguments and exploring assessment and
instructional implications (Research Report No. RR-17-16). Princeton,
NJ: Educational Testing Service.

Stanovich, Keith E. and Richard F. West. 2007. Natural myside bias is
independent of cognitive ability. Thinking & Reasoning 13(3):
225–247. doi:10.1080/13546780600780796

Vygotsky, Lev S. 1978. Mind in society: The development of higher 
psychological processes. Cambridge, MA: Harvard University Press. 

Walton, Douglas. 1996. Argumentation schemes for presumptive reason-
ing. Mahwah, NJ: Lawrence Erlbaum. 

Walton, Douglas. 1999. Rethinking the fallacy of hasty generalization.
Argumentation 13(2): 161–182.

Walton, Douglas. 2009. Jumping to a conclusion: Fallacies and stand-
ards of proof. Informal Logic 29(2): 215–243. 

Walton, Douglas, Chris Reed and Fabrizio Macagno. 2008. Argumenta-
tion schemes. New York, NY: Cambridge University Press. 

Webb, N. M., R. J. Shavelson and E. H. Haertel. 2006. Reliability coef-
ficients and generalizability theory. Handbook of Statistics 26(4): 81–
124.   

Weinstock, Michael, Yair Neuman and Iris Tabak. 2004. Missing the 
point or missing the norms? Epistemological norms as predictors of 
students’ ability to identify fallacious arguments. Contemporary Edu-
cational Psychology 29(1): 77–94.  

West, Richard F., Maggie E. Toplak and Keith E. Stanovich. 2008. 
Heuristics and biases as measures of critical thinking: Associations 
with cognitive ability and thinking dispositions. Journal of Educa-
tional Psychology 100(4): 930–941. doi:10.1037/a0012842. 

White, Karl R. 1982. The relation between socioeconomic status and 
academic achievement. Psychological Bulletin 91(3): 461–481. 
doi:10.1037/0033-2909.91.3.461 

Wobbrock, Jacob O., Leah Findlater, Darren Gergle and James J.
Higgins. 2011. The aligned rank transform for nonparametric factorial
analyses using only ANOVA procedures. Proceedings of the ACM
Conference on Human Factors in Computing Systems (CHI '11).
Vancouver, British Columbia (May 7–12, 2011). New York: ACM Press,
pp. 143–146. Retrieved March 7, 2020, from
http://faculty.washington.edu/wobbrock/pubs/chi-11.06.pdf