© Marion Goldstein, Amanda Crowell & Deanna Kuhn. 
Informal Logic, Vol. 29, No. 4 (2009), pp. 379-395. 

What Constitutes Skilled Argumentation  
and How Does it Develop? 
 
MARION GOLDSTEIN 
AMANDA CROWELL 
DEANNA KUHN 
 
Teachers College 
Columbia University 
525 West 120th St.  
New York, NY 10027 
U.S.A. 
 
mariongoldstein@gmail.com
amanda.jane.crowell@gmail.com
dk100@columbia.edu
 
 
Abstract: We report our efforts to 
assess the skill of contemplating and 
evaluating argumentation. An adaptive 
forced-choice instrument was 
developed and administered to 6th 
grade students, 7th grade students who 
had participated in a year-long 
intervention that successfully 
strengthened their argumentation 
production skills, and expert arguers. 
The instrument was sensitive enough 
to detect differences in skill level 
across these groups. Despite their 
gains in production skill, however, 7th 
graders showed only modest 
superiority over the untrained 6th 
graders and performance well below 
that of experts. Meta-level demands 
involved in evaluation but not present 
in production, we propose, may make 
evaluation more difficult.  
 
 
Résumé: Nous faisons un rapport sur 
nos efforts d’évaluer les habiletés 
d’interpréter et d’évaluer l’argument-
ation. Nous avons créé une épreuve 
qui mesure ces habiletés et l’avons 
administrée à trois groupes différents: 
des étudiants de 6ième année, des 
étudiants de 7ième année qui ont réus-
sis à améliorer leurs habiletés argu-
mentatives durant une année d’inter-
ventions, et des experts. L’épreuve 
était assez judicieuse pour détecter les 
différents niveaux de performances 
dans ces trois groupes. Malgré l’amé-
lioration des étudiants de 7ième année, 
leurs résultats de l’épreuve étaient 
seulement modestement supérieurs à 
ceux des étudiants non entraînés de 
6ième année, et très inférieurs à ceux 
des experts. Nous proposons que les 
exigences à méta-niveaux impliquées 
dans l’évaluation peuvent rendre 
l’évaluation plus difficile. 

 
Keywords: argument; argumentation; development; metacognition; causal 
reasoning 
 
 


Goldstein, Crowell & Kuhn 380 

1. Reasoning as argumentation 
 
Cognitive psychologists have a long-standing interest in reasoning, 
as well as more elemental cognitive processes, and believe the 
reasoning of ordinary individuals to be amenable to empirical 
investigation. Yet, until recently, virtually all of their research 
attention in the realm of reasoning has been devoted to solitary 
problem solving by individuals, most often of well-structured 
problems of an artificial nature, such as the extensively studied 
Tower of Hanoi problem. Argumentation, in contrast, defined as 
asserting and defending claims consistent with one’s purposes, has 
a social dimension and is ubiquitous in everyday activity. Indeed, 
the case has been made by Oaksford, Chater, and Hahn (2008) that 
argumentation is the umbrella under which all reasoning lies; in 
their words, it is “the more general human process of which more 
specific forms of reasoning are a part” (p. 383). Asserting, 
supporting, and refuting claims is the purpose to which we apply 
our reasoning skills. If so, the repeatedly noted poor performance 
of students in assessments of argumentation skill (see Kuhn & 
Franklin, 2006; Kuhn, in press, for review) becomes a cause for 
serious concern. 
 
2. Assessing skills of argument   
 
We describe here our efforts to identify and measure argumentation 
skills in adolescents and adults. In particular, we focus in this 
article on evaluative skills. There have been some studies of 
people’s generally weak skills in the evaluation of individual 
arguments, but little research that we know of on meta-level, or 
evaluative, skills with respect to dialogic argumentation. In the earlier 
work described here, we have seen students improve over time in 
the practice of dialogic argumentation. Do they, we wondered, 
show corresponding developmental change in the appreciation of 
stronger vs. weaker argumentive moves during discourse? 
Increasing use of stronger moves implies such awareness, but we 
thought the question was worthy of empirical test. 
 We begin by noting similarities and differences between our 
general approach and those of several others engaged in research 
on argument. The first distinction to be made is that between 
argument as a product and argumentation as a process. Studies of 
the former focus on rhetorical arguments in support of a claim 
advanced by a single individual in the absence of an interpersonal 
or dialogic context. Analyses of such arguments focus on formal 
structure and the presence or absence of individual components 
regarded as necessary to sound arguments. As noted by Clark, 
Sampson, Weinberger, and Erkens (2007) in their review of 
approaches, most researchers adopt a framework that draws on 


Skilled Argumentation & Development 381 

Toulmin’s (1958) analysis of the components of argument. 
Toulmin proposed that an argument consists of six possible 
components. These include claims (the conclusions drawn by the 
arguer), data (statements used as evidence to support a claim), 
warrants (statements that relate the claim to data), qualifiers 
(special conditions that present the arguer’s degree of certainty 
about the claim), backings (underlying assumptions that provide 
justifications for the warrant), and rebuttals (statements that 
acknowledge the limits of a claim). Subsequent researchers have 
simplified Toulmin’s framework (e.g., Erduran, Simon, & Osborne, 
2004), collapsing categories in order to improve clarity and 
reliability of analysis.  
 Regardless of the adaptations, stronger arguments are thought 
to contain more of the different components than weaker 
arguments. The accuracy or relevance of the statements within an 
argument is accorded lesser importance in determining an 
argument’s strength. Researchers who use a framework focusing on 
formal argumentation structure are able to identify the argument 
components that students tend to omit during argument 
construction. Pedagogical supports may then aim to address these 
skill deficits (Larson, Britt, & Kurby, in press). 
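 As an illustration of the structural view summarized above, the six components can be treated as optional slots and an argument's structural completeness as the number of slots filled. The sketch below is ours and purely illustrative: the class name, field names, and the counting rule are assumptions, not a published coding instrument, and the example content is borrowed from the smoking item that appears later in this article.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ToulminArgument:
    """One slot per Toulmin component; None means the component is absent."""
    claim: Optional[str] = None
    data: Optional[str] = None
    warrant: Optional[str] = None
    qualifier: Optional[str] = None
    backing: Optional[str] = None
    rebuttal: Optional[str] = None

    def components_present(self) -> List[str]:
        # Names of the components the arguer actually supplied.
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def components_missing(self) -> List[str]:
        # Components a pedagogical support might target.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def structural_score(self) -> int:
        # Hypothetical index: a simple count of components present (0-6).
        return len(self.components_present())


example = ToulminArgument(
    claim="Teenagers start smoking because ads make smoking look attractive.",
    data="When cigarette ads were banned from TV, smoking went down.",
)
print(example.structural_score(), example.components_missing())
```

A count of this kind says nothing about the accuracy or relevance of the statements, which is exactly the limitation of purely structural frameworks noted above.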
 Other researchers have focused on types of reasoning 
individuals use in an argument, more than the formal structure of 
the argument. These approaches are concerned with the quality of 
the grounds and the accuracy of the content within an argument. 
Arguments that distort evidence or address irrelevant points, for 
example, are considered weaker than arguments that are accurate 
and logically coherent. Jimenez-Aleixandre, for example, examines 
students’ treatment of claims by classifying their reasons into 
epistemic categories including inductional, deductional, appeals to 
analogy, and others (Jimenez-Aleixandre & Erduran, 2008). 
Similarly, Duschl (2008) has suggested a system originating with 
Walton (1996), in which students’ arguments are classified in terms 
of requests for information, expert opinion, inference, and analogy. 
In each of these cases, the proportions of arguments in each 
category typically are tracked to identify changes in the nature of 
students’ reasoning over time. 
 We turn now to our focus here, analyses of argumentation as a 
dialogic process in which two or more individuals engage. Several 
researchers have studied this dialogic process and undertaken to 
identify its characteristics, some at the more macro level of an 
entire dialog or dialogic sequence and others, like our own, at the 
micro level of individual utterances. Leitão (2000, 2003), for 
example, analyzes dialogic sequences and posits that the most 
successful argumentive interactions adhere to a specific pattern 
involving a claim, a responsive counter-claim, and an integrative 
reply that incorporates the previous ideas. Strong arguments, 
according to this framework, build on the ideas of participants and, 
over time, differences in perspectives are negotiated and resolved. 
Erduran et al. (2004) classify conversational turns into a 
hierarchical category system ranging from simple exchange of 
claims to single and multiple rebuttals. Clark and Sampson (2008) 
similarly propose a category system that includes analysis of 
discourse moves as well as conceptual quality of arguments.  
 
 
3. A functional coding scheme for dyadic argumentation 
 
Our own efforts to develop a system of analysis applicable to 
dialogic argumentation (Felton & Kuhn, 2001; Kuhn & Udell, 
2003, 2007) were based on consideration of the goals of 
argumentive discourse. Specifically, we drew on the framework 
originating with Walton (1989), who posits two goals of 
argumentation: a) to obtain commitments from the opponent to 
support one’s own position, and b) to challenge and weaken the 
opponent’s position by critiquing their premises. Both of these 
goals mandate attention to the ideas of the opposing side. In this 
way, the framework is transactional in nature because the strength 
of the argument, or whether an argument is deemed productive, is 
determined by whether and how the arguer addresses the ideas put 
forth by the interlocutor. 
 In our functional coding scheme (Felton & Kuhn, 2001; Kuhn, 
Goh, Iordanou, & Shaenfield, 2008; Kuhn & Crowell, in press), a 
code is assigned to each utterance in a dialog. These codes are 
based on an utterance’s functional relation to the opponent’s 
preceding utterance. Specifically, what function does the utterance 
serve in relation to the immediately preceding utterance and with 
respect to the mutual goals that define the objective of the 
interchange? The weakest arguments simply express agreement or 
disagreement without mentioning a rationale, or express one’s own 
ideas in the absence of any attention to the other side. At the next 
level, the arguer goes beyond simple disagreement to advance an 
argument to justify this disagreement. Here, distinction between 
two different forms of counterargument becomes critical. The 
weaker form of counterargument, a Counter-Alternative, expresses 
disagreement by advancing a different argument in support of one’s 
own position but does not directly address the argument put forth 
by the opponent, thereby leaving its force intact. For example, in a 
dialog in which 6th grade students were asked to argue about the 
topic “Should a misbehaving student be expelled from school?,” 
one boy was confronted with his opponent’s statement, “They 
shouldn’t be expelled because they deserve another chance.” His 
reply ignored his peer’s “another chance” concept and instead 
introduced a new, perfectly sound argument against the opponent’s 
position, but one that leaves the opponent’s argument intact: “Yes 
but they have been acting up for a while and their behavior has not 
gotten better and it’s not fair to the other kids who are trying to 
learn.” 
  A Counter-Critique, the stronger of the two counterargument 
types, disagrees by responding directly to the opponent’s argument 
with an argument designed to weaken its force. For example, in a 
dialog later in the year, the 6th graders were arguing about another 
topic (Should the sale of human organs be allowed?), and the same 
boy exhibited greater skill in genuine counterargument. His 
opponent argued, “[They] shouldn’t be allowed to [sell their 
kidney] because it is part of their body,” and this time he directly 
addressed and undertook to weaken his opponent’s argument: “But 
if people are willing to give up their own body parts and be so 
generous to the people who need kidneys why should we stop 
them…?” Over time and with extended opportunity to practice, 
students’ transition from the frequent use of weaker argumentive 
strategies to more frequent use of powerful strategies, notably 
Counter-Critique, reflects development in the production skills 
central to argumentive discourse.  
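 The utterance-level coding described in this section lends itself to a simple tally. The sketch below is a minimal illustration, not the published coding manual: the code labels are simplified stand-ins for the categories discussed above, the function name is hypothetical, and the coded dialog is invented for the example. It computes the share of a speaker's utterances coded as Counter-Critique.

```python
from collections import Counter
from enum import Enum
from typing import Sequence


class Move(Enum):
    """Simplified, illustrative labels for the functional codes."""
    UNELABORATED = "agreement or disagreement with no rationale"
    OWN_IDEA = "own idea, no attention to the other side"
    COUNTER_ALTERNATIVE = "new argument that leaves the opponent's intact"
    COUNTER_CRITIQUE = "argument that directly weakens the opponent's"


def counter_critique_share(codes: Sequence[Move]) -> float:
    """Proportion of coded utterances that are Counter-Critiques."""
    if not codes:
        return 0.0
    return Counter(codes)[Move.COUNTER_CRITIQUE] / len(codes)


# Invented codes for one student's four turns in a dialog.
coded_turns = [
    Move.UNELABORATED,
    Move.COUNTER_ALTERNATIVE,
    Move.COUNTER_CRITIQUE,
    Move.COUNTER_CRITIQUE,
]
print(f"{counter_critique_share(coded_turns):.0%} Counter-Critiques")
```

Tracking this proportion for the same student across dialogs is one way to express the shift from weaker to stronger strategies as a single number.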
 Skill in dialogic argumentation is clearly significant in its own 
right—a skill that pays dividends not only in academic contexts but 
in the world of work and in everyday life. In our work on 
developing argument skills in adolescents (see Kuhn & Crowell, in 
press, for a current review), we have focused on dialogic 
argumentation not only for this reason but also because we, along 
with a number of others (Billig, 1987; Graff, 2003) see it as a 
promising pedagogical path to the development of individual 
expository argument skills in both verbal and written forms. Our 
recent work (Kuhn et al., 2008) provides some initial support for 
this view. In an extended, year-long intervention, middle-school 
students engaged in electronic dialogs about a series of social 
issues such as whether a misbehaving student should be expelled 
from school, whether parents have the right to homeschool their 
child, and whether the USA should intervene to help a small third-
world country under attack. Students were assigned to positions 
based on their responses to an opinion poll and, working with a 
same-side partner, were asked to convince a succession of peers 
holding an opposing view that their position was the better one. 
Electronic dialogs lasted about 30 minutes, and each topic was 
debated 5-6 times over the course of several months, each time 
with a new pair from the opposing side. This activity culminated in 
a final, full-class debate. Over the year, students demonstrated 
gains in what we have referred to as argument production—
showing more frequent use of the more skilled strategy of 
counterargument (the Counter-Critique) in their dialogs. They also 
showed gains in individual (non-dialogic) argumentive essays. 
 A third aspect of skilled argument, in addition to skill in 
argumentive discourse and in production of individual expository 
arguments, is skill in argument evaluation. We turn now to describe 
our recent efforts to investigate this third component. Both 
academic and research assessments have shown students to be 
weak in evaluation, as well as production, of individual expository 
arguments, but little is known about the skills involved in the 
evaluation of dialogic argument. In our dialogic-based 
interventions to develop argumentation production skills, such as 
the one described above, we have had success in enhancing 
performance of both dialogic and individual production of sound 
arguments. How might one assess skill in evaluation of dialogic 
argument, we wondered, and would we see corresponding advance 
in evaluation skill following interventions that have been 
successful in enhancing the two performance components? 
 
 
4. Assessing skill in evaluation of dialogic argument 
 
In the design of such an instrument, we focused our attention on 
what our work on production of dialogic argumentation has 
suggested is a key evolution—the transition illustrated above from 
less advanced levels to responding to an opponent’s argument by 
identifying its weaknesses and thereby reducing its force (the 
Counter-Critique). Prior to this transition, our study of dialogic 
production has shown (see Kuhn & Crowell, in press, for review), arguers 
initially ignore the opponent’s argument entirely and focus 
exclusively on their own arguments supporting their own opposing 
claim; or, if they have begun to pay some attention to the 
opponent’s argument, they express their disagreement with it, 
supported by a new argument against the opponent’s position that 
leaves the opponent’s immediately preceding argument 
unaddressed (the Counter-Alternative).  
 We developed an instrument to measure students’ ability to 
distinguish between these two argumentive strategies—the 
Counter-Critique and the Counter-Alternative.  It situates students 
in a hypothetical argument with a friend. This friend, whom we call 
Lee, begins with a claim about why some students fail in school. 
Upon reading Lee’s argument, the student is presented with two 
options for how to respond to Lee. One choice is a strong 
counterargument—a Counter-Critique—and one is a weak 
counterargument—a Counter-Alternative. Students are asked to 
imagine that they disagree with Lee’s position and choose the 
response to Lee that is in their opinion the stronger of the two. The 
response the student chooses dictates Lee’s rebuttal, which is a 
Counter-Critique response to the reply selected by the student. 
Again, the student has two options for how to respond to Lee’s 
rebuttal. In this way, the instrument is designed to mimic genuine 
argumentation and permits an assessment of the extent to which 
respondents are able to recognize and select stronger responses to 
their hypothetical opponent. When the respondent completes three 
counterargument selections relating to why students fail in school, 
the topic changes to why many criminals return to a life of crime 
after being released from jail. Again, for each of three items, Lee 
presents claims and the student must choose the better of two 
counterarguments. The third topic is whether movie stars should 
make as much money as they do. Pilot testing showed the three topics 
to be of comparable complexity and difficulty level. Response choices 
that respondents in the pilot sample found confusing or of low 
plausibility were revised or eliminated and replaced. All 
alternatives thus had at least surface plausibility as accurate 
statements with respect to the topic. To minimize the likelihood of 
respondents’ choosing a response option based on factors other 
than its relevance to the preceding claim, we ensured that both 
options at each decision point were of comparable length and 
comprehensibility, as well as accuracy as true statements.  
 Although respondents dictate their own path through the 
assessment based on the response options they choose, each 
respondent follows a path in which he/she chooses three 
counterarguments for each topic. Thus, respondents are confronted 
with nine decisions over the course of the assessment. A 
respondent’s score is the number of Counter-Critique replies he/she 
selected. Nine is the highest possible score. 
 The instrument is housed in SurveyMonkey, a Web-based tool 
used to create surveys and other self-report instruments. The tool’s 
skip logic enables the designer to dictate which test item follows a 
respondent’s selection. This allows the instrument to absorb 
respondents in an actual sustained dialogic argument by ensuring 
that they are presented relevant, factually plausible rebuttals to their 
choices of counterarguments. This format, we believe, is superior 
to independent, isolated test items in that it is more reflective of 
authentic discourse. An example of the problem structure for one of 
the three topics, why some students fail in school, is presented in  
Figure 1. Respondents, however, never see the entire structure; 
instead, they see only the two response options at each point, 
determined by their preceding choice. 
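 The branching and scoring logic described above can be sketched as a small binary tree. This is an illustration under stated assumptions, not the SurveyMonkey configuration itself: the class and function names are hypothetical, the item texts are placeholders rather than the actual test items, and placing the Counter-Critique first in each pair is a convenience of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Option:
    text: str
    is_counter_critique: bool
    next_item: Optional["Item"] = None  # Lee's rebuttal to this option, if any


@dataclass
class Item:
    lee_says: str
    options: Tuple[Option, Option]


def build_topic(depth: int = 3, path: str = "") -> Item:
    """Build a full binary tree of placeholder items, `depth` choices deep."""
    def option(kind: str) -> Option:
        child = build_topic(depth - 1, path + kind) if depth > 1 else None
        return Option(
            text=f"[{path + kind}] placeholder reply",
            is_counter_critique=(kind == "C"),
            next_item=child,
        )
    # "C" = Counter-Critique, "A" = Counter-Alternative.
    return Item(f"[{path or 'start'}] placeholder claim by Lee",
                (option("C"), option("A")))


def run_topic(item: Optional[Item], choose: Callable[[Item], int]) -> int:
    """Walk one topic; return the number of Counter-Critiques chosen."""
    score = 0
    while item is not None:
        picked = item.options[choose(item)]
        score += int(picked.is_counter_critique)
        item = picked.next_item  # skip logic: the choice selects the next item
    return score


topics = [build_topic(3) for _ in range(3)]  # three topics, three choices each
always_critique = sum(run_topic(t, lambda item: 0) for t in topics)
always_alternative = sum(run_topic(t, lambda item: 1) for t in topics)
print(always_critique, always_alternative)
```

Whatever path a respondent takes, each topic yields exactly three decisions, so the total always ranges from 0 to 9.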
 

 
Figure 1. Problem structure for one topic; left branches are the 
stronger response option (the Counter-Critique). 
 
  
5. Initial findings 
 
Initial testing of this instrument was undertaken with 6th and 7th 
grade students (ages 11-13) at an urban public middle-school. The 
student body is 75% minority (primarily African American and 
Hispanic, with a small number of Asian students). Students are 
residents of the surrounding low- to middle-income neighborhood.  
 Beginning in 6th grade and continuing through 8th grade, 
students at this school participate in a twice-weekly philosophy 
class designed to foster the development of argumentation skills. 
Work with this curriculum (Kuhn & Udell, 2003; Kuhn et al., 
2008) supports the view that argumentive discourse skills develop 
gradually through authentic practice in argumentation. As 
described earlier, the curriculum provides dense experience in 
argumentive discourse as students debate real-world social issues, 
first in interchanges with a succession of peers holding an opposing 
view and finally in a whole-class debate. 
 We administered the argumentation evaluation instrument 
described above to 92 of these 6th grade students at the beginning 
of the school year, prior to their exposure to the argument 
curriculum, in order to assess the evaluation skills of novice 
arguers of this age group. As explained earlier, a student’s score is 
the number of Counter-Critique responses he/she selects. Students 
receive a 9 if they choose the stronger counterargument—
representing a Counter-Critique—at each decision point. In this 
assessment, 6th grade students achieved a mean score of 5.66.  
 Because each item has only two response options (and 
therefore students have a 50% chance of choosing the Counter-
Critique response), chance responding on all 9 items can be 
expected to lead to a score of 4 or 5. For this reason, we classified 
scores of 0-5 in a low performance category. Scores of 6 and 7 
were assigned to a middle category, as possibly showing some 
competence in choice of the Counter-Critique option, and scores of 
8 or 9 were classified as reflecting high performance and clear 
recognition of Counter-Critique as the superior alternative. Of 92 
6th grade students who completed the assessment, 41 (44.57%) 
received low scores, 32 (34.78%) received middle scores, and 19 
(20.65%) received high scores.  
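 The chance baseline behind these cut-offs can be made explicit. The sketch below assumes, as a simplification of ours, that a guessing respondent chooses the Counter-Critique independently with probability .5 on each of the 9 items, so that the score follows a binomial distribution; the function names are hypothetical.

```python
from math import comb

N_ITEMS = 9
P_CHANCE = 0.5  # assumed probability of picking the Counter-Critique by guessing


def category(score: int) -> str:
    """Map a 0-9 score to the low / middle / high bands used in the text."""
    if not 0 <= score <= N_ITEMS:
        raise ValueError("score must be between 0 and 9")
    if score <= 5:
        return "low"
    if score <= 7:
        return "middle"
    return "high"


def chance_probability(score: int) -> float:
    """Binomial probability of a given score under pure guessing."""
    return (comb(N_ITEMS, score)
            * P_CHANCE ** score
            * (1 - P_CHANCE) ** (N_ITEMS - score))


expected_score = sum(k * chance_probability(k) for k in range(N_ITEMS + 1))

chance_by_category = {"low": 0.0, "middle": 0.0, "high": 0.0}
for k in range(N_ITEMS + 1):
    chance_by_category[category(k)] += chance_probability(k)

print(f"expected score under guessing: {expected_score}")
for name, p in chance_by_category.items():
    print(f"{name}: {p:.1%}")
```

Under this assumption the expected score is 4.5, and only about 2% of pure guessers would reach the high band, compared with the 20.65% of 6th graders observed there.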
 We also administered the instrument to 63 students at the 
beginning of their 7th grade year. These students had completed the 
first year of the three-year argument curriculum. The mean score 
for this group was 6.17. The direction of the difference between the 
6th and 7th grade scores suggests some increasing evaluation skill 
with increasing age and experience, but this mean score is not 
significantly higher than that of the 6th grade students, t(150.7) = 
-1.731, p = .10. Hence, the increase is slight at best. Of the 63 7th 
grade students who completed the instrument, 20 (31.75%) 
received a low score, 28 (44.44%) received a middle score, and 15 
(23.81%) received a high score. Future research will track students’ 
skill levels throughout the course of the three-year curriculum. In 
this ongoing work, we have expanded the instrument to a five-
topic, 15-item one to avoid ceiling effects. 
 In addition, as an expert comparison group, we administered 
the assessment to a group of 37 doctoral students in developmental 
or educational psychology. The mean score for experts was 7.46. 
Four experts (10.81%) received a low score, 13 (35.14%) received 
a middle score, and 20 (54.05%) received a high score.  
 Mean scores across the three groups differed significantly, 
F(2,189)=12.56, p<.001. This effect establishes that the instrument 
is sensitive enough to detect differences in argumentation 
evaluation skills. It also establishes that young adolescents lack 
proficiency in this area, with only at best modest gains and much 
room for improvement after a full year of dense practice in 
argumentation.  
 Results with respect to production tell a different story. Here 
practice leads to significant gains (Kuhn & Crowell, in press). In 
September and again in May, the students reported on above 
engaged in one-on-one dialogic argumentation on the topic of 
capital punishment, a topic the students had not discussed during 
the intervention. Students were assigned to Pro/Con sides based on 
their responses to an opinion poll, and each student argued against 
another student who held an opposing view.  
 Figure 2 shows students’ performance across time on the 
capital punishment dialogic assessment, compared to a control 
group of the same grade and from the same school who participated 
in a parallel whole-class discussion class revolving around similar 
topics. Figure 2 shows mean percentage of utterances that were 
successful Counter-Critiques of the opponent’s preceding 
utterance, in September and again in May. At the May posttest, 
35% of the intervention group students responded with Counter-
Critiques more than one third of the time, compared to 13% of 
control group students, a significant difference (Kuhn & Crowell, 
in press).  
 
 
Figure 2. Production skill in dialogic argumentation at pre-  
and post-tests, indexed by percentage Counter-Critiques. 
 
 
6. Interpreting the relation between evaluation and production 
of skilled counterarguments 
 
The production results reported above, as well as our prior 
intervention studies (Kuhn & Udell, 2003; Kuhn et al., 2008), show 
that performance assessments of middle school students’ dialogic 
argumentation improve following dense engagement in dyadic 
argumentation with peers. Our purpose in the work described here 
has been to develop a means to assess what we might expect to be a 
parallel skill—the recognition or appreciation of superior moves in 
dyadic argumentation. If so, we might then assess whether 
corresponding improvement in recognition occurs with time and 
engagement.  
 If anything, we might predict that recognition skills would not 
simply parallel production skills but would in fact precede them, a 
pattern that has been noted in a broad range of domains from early 
language development to moral reasoning development: An 
individual first recognizes and can appreciate (and hence at least 
implicitly evaluate) a higher form prior to being able to 
independently produce it. 
 To the contrary, however, while the middle school students 
reported here on average show themselves capable of learning to 
produce the higher-level, Counter-Critique counterarguments, they 
do not show such improvement in learning to recognize the 
superiority of direct Counter-Critique when given a choice between 
it and a less advanced argument strategy. How should this pattern 
of performance be interpreted? This is the question we turn our 
attention to in the remainder of this article.  
 
 
7. Why is argument evaluation challenging? 
 
As a first step toward addressing why evaluation skills appear to be 
difficult to develop, we return to the simpler case in which 
individuals are asked to evaluate the strength of a simple (non-
dialogic) argument in support of a claim. What challenges does this 
task present? In our own earlier work addressed to this question 
(Kuhn & Felton, 2000; Kuhn, 2001), we asked 8th graders and 
adults to choose the stronger of two arguments in support of a 
claim. One provided a theoretical explanation that made the claim 
plausible, whereas the other provided empirical evidence that the 
claim was true. Earlier work of our own and by others has indicated 
a preference for explanation over evidence as justification for a 
claim (Brem & Rips, 2000; Kuhn, 2001). The following is an 
example of the choice that our items presented: 
 
Why do teenagers start smoking? Which is the stronger argument?  
 
A. Smith says it’s because they see ads that make smoking look 
attractive. A good-looking guy in neat clothes with a cigarette in 
his mouth is someone you would like to be like. 
 
B. Jones says it’s because they see ads that make smoking look 
attractive. When cigarette ads were banned from TV, smoking went 
down. 
 
After choosing A or B as the stronger argument and providing a 
justification, respondents were also asked explicitly to indicate not 
only the strengths of the argument they chose and the weaknesses 
of the other argument, but also whether the chosen argument had 
any weaknesses and the non-chosen argument any strengths.  
 A very common response, especially among the 8th grade 
respondents, was to cite nonepistemic, rather than epistemic, 
criteria as justifications for their choices and evaluations. Epistemic 
criteria apply to any argument of a given form. Nonepistemic 
criteria apply only to an argument of specific content. 
Nonepistemic justifications thus most often address the correctness 
of the claim (“This is a good argument because what it’s saying is 
true”), rather than the quality of the argument supporting the claim. 
Less often did 8th graders (as well as a number of adults) invoke 
epistemic criteria, citing the epistemic strength of explanation (e.g., 
“It gives a reason”), or the epistemic strength of evidence (e.g., 
“It’s something that really happened”). Even less often did teens or 
adults mention the epistemic weakness of explanation (e.g., “It’s 
only a theory”) or the epistemic weakness of evidence (e.g., “It 
doesn’t say why”). 
 These findings indicate that adolescents and even some adults 
find it difficult to think about the formal characteristics of an 
argument. In contrast, if the content and meaning of the claims 
involved are familiar, they find it easy to evaluate this content, at 
least with respect to their agreement or disagreement with it. 
 In this case, note, the respondent’s own perspective becomes 
one with one or the other of the two argument proponents (Smith 
and Jones) whose arguments are being compared. The respondent 
“agrees” with what Smith (or Jones) is saying and adopts the 
argument as his or her own. To invoke an epistemic criterion, in 
contrast, the respondents must adopt a distanced, “third-party” 
stance in order to evaluate Smith’s (or Jones’) argument and its 
standing in relation to the claim at hand. Hence a meta-level 
(metacognitive) stance is required, to enable reasoning about 
reasoning. 
 
 
8. Non-dialogic vs. dialogic argument evaluation 
 
If we extend this analysis to the case of dialogic argumentation, the 
context of primary interest to us here, we can see that similar 
considerations apply. In the argumentation evaluation instrument 
described earlier, the respondent is asked not simply to evaluate the 
two choices on their own merit as arguments, but rather to evaluate 
them in relation to (and specifically as counterarguments to) the 
preceding argument. Although a seemingly straightforward task on 
the surface, the cognitive demands it poses are in fact considerable. 
The respondent likely holds views of agreement or disagreement 
with the content of the initial argument as well as each of the 
proposed counterarguments. Each of these stances must be in effect 
bracketed—held in abeyance—to allow execution of the task: the 
evaluation of each of the proposed counterarguments with regard to 
its relation to the initial argument. As in the simpler non-dialogic 
case, a meta-level stance is required, but now in an even more 
complex form given the number of propositions involved.  
 Following administration of the dialogic argument evaluation 
instrument to the middle school students described earlier, we 
interviewed a number of them to gain insight into the thinking 
underlying their choices. The following response from a 7th grader 
is typical of what we heard. At the choice point being discussed, 
her task is to choose between A and B as the stronger 
counterargument to the argument preceding it. 
 
Prisoners who get released from jail return to a life of crime 
because they can’t find a good job. 
 
A. Jails offer training to prisoners so that they can find work when 
they get out. 
 
B. When prisoners get out of jail, their old friends often pressure 
them to return to crime. 
 
This student chose B as the stronger counterargument and when 
asked to explain why, she said the following: 
 

Because I have family members; my dad’s best friend has 
been in jail twice, so I know from personal experience. 
They’re accustomed to crime so there’s pressure and then 
they’re used to it so they continue to commit crimes. 

 
Clearly, she has ignored the initial argument and responded based 
only on her agreement with the content of the second 
counterargument. She is endorsing a particular causal explanation 
of a particular phenomenon, but she is not engaged in the 
evaluation of arguments or argumentation. 
 In this example, we in fact have no evidence of the student’s 
processing of the first alternative. We don’t know if she agrees or 
disagrees with it or if she has compared it to the second alternative. 
 In another example, a student does explicitly address both 
alternatives: 
 
Students fail in school because they don’t try hard enough to do 
well on tests. 
 
A. No matter how hard students work, some just aren’t good test-takers. 
 
B. Some students act out in class instead of paying attention to the 
teacher. 
 
This student chose B as the stronger counterargument and gave the 
following explanation: 
 

Because I don’t think it’s true. There is no such thing as 
good and bad test takers. But some students do play 
around and don’t listen and just give up. 

 
As in the first example, however, this student gives no indication of 
considering either alternative in its relation to the initial argument. 
 There is much evidence in the cognitive development literature 
of children and adolescents having difficulty reasoning about the 
formal relations among propositions independent of their truth 
value, in the case of both deductive and inductive reasoning (Kuhn 
& Franklin, 2006; Klaczynski, 2004; Moshman, 2005). In this 
light, the present indications of difficulty in argumentation 
evaluation should not be a great surprise.  
 A further contributor to this challenge may possibly lie in 
children’s and adolescents’ conceptions of causality, in particular 
the concept of multiple causes contributing to a common outcome. 
Students of middle-school age have considerable difficulty in 
coordinating the effects of multiple variables on an outcome (Kuhn, 
2007; Kuhn, Pease, & Wirkala, 2009). They are susceptible to what 
has come to be known as discounting (Sedlak & Kurtz, 1981): 
identifying one cause of an outcome makes other causes seem less likely. 
To see how this tendency may figure in argumentation evaluation, 
consider the above example. Clearly, there exist multiple reasons 
that students fail in school. The “act out in class” cause should not 
foreclose the possibility, or even reduce the likelihood, of other 
causes. To the extent one does not think in terms of multiple 
causes, however, endorsement of one cause (as the student in the 
example does by choosing counterargument B) may do exactly 
this—reduce the perceived likelihood of other causes being 
relevant. Accordingly, those other causes come to seem less worthy 
of contemplation and of argument. 
 
 
9. Conclusion: Reconciling production and evaluation 
 
In conclusion, we return to our earlier question. After an extended 
and dense intervention affording engagement and practice in 
dialogic argumentation, young adolescents show significant 
improvement in their ability to address an opposing peer’s 
arguments with cogent counterarguments that serve to weaken their 
force. They do not, however, show much improvement in 
recognizing the stronger of two counterarguments with respect to 
their power to weaken a claim. We have suggested some of the 
cognitive challenges the latter poses. Confirming their role and 
more precise nature clearly requires further exploration. We believe 
such investigation is warranted, given the fundamental significance 
of argument and argumentation in everyday thinking, as well as its 
importance to the intellectual development of students in academic 
contexts, especially those who go on to advanced levels of 
education.  
 The qualification is in order that in the work described here we 
did not undertake to measure and compare production and 
comprehension skills on comparable instruments, equated for 
content and difficulty in all other respects. The conclusion we wish 
to draw is simply that an intervention focused on argumentation 
that produced significant advances in production skill did not yield 
corresponding advances in evaluation skill. This finding suggests a 
possibility that initially may seem counterintuitive: recognizing the 
relative strength of counterarguments may not precede the ability to 
produce them when engaged in one’s own authentic argumentation. 
In the latter case, the meta-level demands are lesser. The other’s 
argument opposes one’s own, and the counterargument to be 
produced is one that coincides with one’s own views. Furthermore, 
the motive to weaken the force of the other’s argument to the 
greatest extent possible is clear, consistent, and strong. No “third-
party” stance-taking is required. All of these factors help to make 
the task more tractable.  
 Moreover, and finally, the dialogic production task may serve 
as an effective bridge to the development of the recognition and 
evaluation skill that the students we have described seem to have 
found the most challenging. With continued engagement and 
practice in dialogic argument across multiple topics and opponents, 
the meta-level demands of evaluation may become less 
challenging. Alternatively, they may need to be met by activities 
that engage them directly. These are the alternatives we hope to 
assess in continuing work. 
 
 
References: 
 
Billig, M. (1987). Arguing and thinking: A rhetorical approach to 
social psychology. Cambridge: Cambridge University Press. 

Brem, S., & Rips, L. (2000). Explanation and evidence in informal 
argument. Cognitive Science, 24, 573-604. 

Clark, D., & Sampson, V. (2008). Assessing dialogic argumentation 
in online environments to relate structure, grounds, and 
conceptual quality. Journal of Research in Science Teaching, 
45(3), 293-321.  

Clark, D., Sampson, V., Weinberger, A., & Erkens, G. (2007). 
Analytic frameworks for assessing dialogic argumentation in 
online learning environments. Educational Psychology Review, 
19(3), 343-374.  

Duschl, R. (2008). Science education in three-part harmony: 
Balancing conceptual, epistemic, and social learning goals. 
Review of Research in Education, 32, 268-291.  

Erduran, S., Simon, S., & Osborne, J. (2004). TAPing into 
argumentation: Developments in the application of Toulmin’s 
argument pattern for studying science discourse. Science 
Education, 88, 915-933. 

Graff, G. (2003). Clueless in academe: How schooling obscures 
the life of the mind. New Haven: Yale University Press. 

Jiménez-Aleixandre, M.P., & Erduran, S. (2008). Argumentation in 
science education: An overview. In S. Erduran & M.P. 
Jiménez-Aleixandre (Eds.), Argumentation in science education: 
Perspectives from classroom-based research (pp. 137-157). 
Dordrecht: Springer.  

Klaczynski, P. (2004). A dual-process model of adolescent 
development: Implications for decision making, reasoning, and 
identity. In R. Kail (Ed.), Advances in child development and 
behavior (Vol. 31, pp. 73-123). San Diego: Academic Press. 

Kuhn, D. (2001). How do people know? Psychological Science, 12, 
1-8. 

Kuhn, D. (2007). Reasoning about multiple variables: Control of 
variables is not the only challenge. Science Education, 91, 
710-726. 

Kuhn, D. (in press). Reasoning. In P. Zelazo (Ed.), Oxford 
Handbook of Developmental Psychology. New York: Oxford 
University Press. 

Kuhn, D., & Crowell, A. (in press). What are the cognitive skills 
adolescents need for life in the 21st century? In J. Smetana & 
E. Amsel (Eds.), Adolescence: Prospects and possibilities. 
Taylor & Francis. 

Kuhn, D., & Felton, M. (2000, January). Developing appreciation 
of the relevance of evidence to argument. Paper presented at 
the Winter Conference on Discourse, Text, and Cognition, 
Jackson Hole, WY. 

Kuhn, D., & Franklin, S. (2006). The second decade: What 
develops (and how)? In D. Kuhn & R. Siegler (Eds.), (W. 
Damon & R. Lerner, Series eds.), Handbook of Child 
Psychology: Vol. 2. Cognition, Perception, and Language. (6th 
edition). Hoboken NJ: Wiley. 

Kuhn, D., Goh, W., Iordanou, K., & Shaenfield, D. (2008). 
Arguing on the computer: A microgenetic study of developing 
argument skills in a computer-supported environment. Child 
Development, 79(5), 1310-1328. 

Kuhn, D., Pease, M., & Wirkala, C. (2009). Coordinating effects of 
multiple variables: A skill fundamental to causal and scientific 
reasoning. Journal of Experimental Child Psychology, 103, 
268-284. 

Kuhn, D., & Udell, W. (2003). The development of argument skills. 
Child Development, 74(5), 1245-1260. 

Kuhn, D., & Udell, W. (2007). Coordinating own and other 
perspectives in argument. Thinking and Reasoning, 13, 90-104. 

Larson, A.A., Britt, M.A., & Kurby, C. (in press). Improving 
students’ evaluation of informal arguments. Journal of 
Experimental Education. 

Leitão, S. (2000). The potential of argument in knowledge 
building. Human Development, 43, 332-360. 

Leitão, S. (2003). Evaluating and selecting counterarguments. 
Written Communication, 20, 269-306. 

Moshman, D. (2005). Adolescent psychological development: 
Rationality, morality, and identity (2nd ed.). Mahwah NJ: 
Erlbaum.  

Oaksford, M., Chater, N., & Hahn, U. (2008). Human reasoning 
and argumentation: The probabilistic approach. In J. Adler & 
L. Rips (Eds.), Reasoning: Studies of human inference and its 
foundations. New York: Cambridge University Press. 

Sedlak, A., & Kurtz, S. (1981). A review of children's use of causal 
inference principles. Child Development, 52, 759-784. 

Toulmin, S.E. (1958). The uses of argument. Cambridge: 
Cambridge University Press. 

Walton, D.N. (1989). Dialogue theory for critical thinking. 
Argumentation, 3, 169-184. 

Walton, D.N. (1996). Argumentation schemes for presumptive 
reasoning. Hillsdale, NJ: Lawrence Erlbaum. 

 