Australasian Journal of
Educational Technology

2012, 28(8), 1283-1297

Differences in the educational software evaluation
process for experts and novice students

Hatice Sancar Tokmak, Lutfi Incikabi and Tugba Yanpar Yelken
Mersin University

This comparative case study investigated the educational software evaluation
processes of both experts and novices in conjunction with a software evaluation
checklist. Twenty novice elementary education students, divided into four groups of five,
and three experts participated. Each novice group and the three experts evaluated
educational software selected by the novice groups in accordance with the checklist.
Data were collected through focus group interviews, classroom observations and
document analysis. Evaluation processes were analysed through thematic
comparisons. The results showed that the expert-novice agreement rate was as low as
48%, with novice students tending to grade the software higher. While the experts
used a systematic approach, including understanding and assessing each criterion,
supporting the process with literature and evaluating the software as a whole, novice
students lacked such methods, indicating a need for additional training and
development.

Introduction

Many expert-novice comparison studies have been conducted in the teaching field. For
example, Chi, Feltovich and Glaser (1981) investigated novice and expert
representations of physics problems related to knowledge organisation and
experimental research design. Stylianou and Silver (2004) compared expert
mathematicians’ and undergraduate students’ potential and actual use of visual
representations in problem solving, by investigating their perceptions and analysing
their methods. Similarly, Stepich and Ertmer (2009) conducted three related studies
investigating novice-expert instructional designers’ strategies for problem-finding,
stating that the results of these studies can be used by educators to hasten the
development of problem-finding expertise in their students. According to Fadde
(2009), an important step in novice-expert studies is to identify “who is an
expert?” He adds that “An obvious measure of expert status is amount
of experience” (Fadde, 2009, p. 177). In the current study, which investigated the
educational software evaluation processes of both experts and novices in conjunction
with a software evaluation checklist developed by Heinich, Molenda, Russell and
Smaldino (2002), the experts and novices were identified according to amount of
experience.

Technology has become integral to instruction since the introduction of computers in
the classroom. Kurz, Middleton and Yanik (2005) mention the promise of effective
technology integration in school mathematics. Moreover, Virvou, Katsionis and Manos
(2005) explain that students need to be motivated to learn during instruction, and that
educational software can be both interesting and stimulating. Computer programs can
fulfill a variety of functions in the teaching and learning process, such as drill and
practice, tutorials, instructional games, simulations, spreadsheets, word processing,
database management and computer programming (De Corte, Verschaffel & Lowyck,
1994). However, as Niederhauser and Stoddart (2001) state, “Computer technology, in
and of itself, does not embody a single pedagogical orientation” (p. 15). Well-designed
review software offers technology while assuring predictable outcomes (Hooper &
Hokanson, 2000, as cited in Kurz et al., 2005).

Yanpar Yelken (2011) further points out that selection of educational software is an
important part of instruction. Similarly, Heinich et al. (2002) and Smaldino, Lowther
and Russell (2008) explain the importance of media selection as part of a specific
model, ASSURE: (a) analyse the learner, (b) state objectives, (c) select media, (d) utilise
media, (e) require learner participation and (f) evaluate and revise. Many scholars,
such as Heinich et al. (2002), De Villiers (2004), Niederhauser and Stoddart (2001),
Cennamo, Ross and Ertmer (2010) and Squires and Preece (1996), provide checklists,
evaluation forms and surveys to help educators select appropriate software for
instructional purposes. To successfully integrate technology into their teaching, pre-
service and novice teachers must gain expertise in the educational software selection
process (Yanpar Yelken, 2011).

Practice is an important part of gaining expertise and of transferring what
is learned to a real context, as an expert would (Sancar Tokmak & Karakus, 2011). Stepich
and Ertmer (2009) criticise educational programs by stating that students often gain
conceptual knowledge from educational programs, but do not apply that knowledge
to problem-solving during practice. Further, Baumgartner and Payr (1996) argue
that the learner's goal should always be to become an expert in a field. Expertise based
training (XBT) can be an alternative for instructors who aim to enable novice learners
to gain expertise in a field and transfer knowledge to the job, since the focus of XBT, an
instructional design theory, is to create strategies based on expert-novice studies to
help learners develop advanced skills (Fadde, 2009). Chi, Glaser and Rees (1982)
mention a relevant study: Chase and Simon investigated novice and expert chess
strategies to examine the effects of knowledge on complex skill performance.
Livingston and Borko (1990) emphasise the importance of expert-novice comparison
research: “Research comparing how expert and novice mathematics teachers construct
different types of lessons, manage content, and interact with their students may
suggest ways to assist prospective and beginning mathematics teachers” (p. 372). Since
expert-novice studies not only present the differences between the two groups but also
include clues to the key parts of expertise, novices can draw lessons from such
studies to behave more like experts.

Based on this literature review, the current novice-expert comparison study was
conducted as a first step in enhancing students’ expertise in educational software
selection and evaluation. It was specifically designed to (a) compare the ways experts
and novices evaluate educational software using the Educational Software Evaluation
Checklist developed by Heinich et al. (2002) and (b) examine the implications of these
variations in instructional practice.

Research questions
Two primary research questions drove this comparative case study, which
investigated novice and expert selection processes for educational mathematics
software:


1. How do novices and experts use the evaluation checklist to select educational
software?

2. How compatible are the evaluations of novices and experts in terms of
a. the degree of consistency in scores; and
b. the percentage distribution of criteria in the evaluation checklist?

Method

This comparative case study used thematic comparisons. The researchers used themes
and criteria from Heinich et al.'s (2002) Educational Software Evaluation Checklist to
compare novice and expert software evaluation processes. Comparing two cases, in
accordance with comparative case-study design, could help define the gap between
novices’ and experts’ evaluation processes. The current study employed both
qualitative and quantitative approaches. More specifically, data collection in the
current study consisted of focus group interviews, classroom observations and
document analysis.

Sampling

The researchers followed a sampling procedure suited to the research
design, selecting the sample purposively in line with qualitative research, as
prescribed by Patton (1990) and Miles and Huberman (1994). The participants of this
study were both novices and experts; novices were defined as students who had not
previously evaluated educational software. A total of 25 novice students enrolled in
the course Instructional Technology and Material Design volunteered to participate and
were placed in groups to evaluate educational software. One group (n=5 students)
selected instructional chess software, while the other four groups (n=20 students)
selected educational mathematics software. The group (n=5 students) that selected
chess-education software was not taken into account, since the researchers applied a
homogeneous sampling strategy as a type of purposive sampling. Thus, the sample of
the study consisted of 20 students who selected educational mathematics software.
Once the novices had selected their software, three experts were purposively selected,
based solely on their fields of expertise: computer education and instructional
technology, mathematics education and educational sciences, respectively.

Participants' background information

Data were collected from all participants. Before the sampling procedure, the novices
had received a demographics questionnaire with questions about age, gender,
department, enrolled courses, class level, GPA, home computer usage and coursework
and other activities requiring computer usage. Twenty second-year college students (9
males, 11 females) from the night program in elementary education participated in the
study as a part of the subject Instructional Technology and Material Design. Their ages
ranged from 19 to 25, with an average of 21 (SD = 1.9). Their average GPA was 2.82
(out of 4.0) with an SD = .1. All had previously taken the subjects Introduction to
Education Sciences, Psychology of Education, and Teaching Technique and Methods. Except
for one student who did not have a computer, all students reported using computers
actively, with experience ranging from 2 to 12 years (average 6.3). Students had previously
been required to take three subjects that necessitated using computers, during which
they learned to use MS Office programs such as PowerPoint.


After the novices conducted their evaluations, an expert panel was chosen based on
the topic of this study and the content of the software selected. All experts held
doctoral degrees and had teaching experience. The expert from the field of computer
education and instructional technology had been studying the development and
usability of educational software for six years; the expert in educational sciences, for 14
years. The mathematics education expert focused on curriculum development and
educational software use and evaluation.

Instrumentation

Six instruments were used to collect data in the current study, beginning with the
demographics questionnaire described above, which was checked by a professor
before being given out.

The second instrument was Heinich et al.’s (2002) Educational Software Evaluation
Checklist. The checklist has three main parts: software description, rating and open-
ended questions. Software description includes the title, serial title, keywords, format,
source, date, cost, length, subject area, intended audience, brief description, objectives
and entry capabilities required. The rating section has 12 criteria: match
with curriculum, accurate and current, clear and concise language, arouse
motivation/maintain interest, learner participation, technical quality, evidence of
effectiveness, free from objectionable bias, user guidance/documentation, clear
directions, stimulates creativity and design principles. The first 11 belong to the
checklist itself; design principles was added for
course relevance. The third part asks open-ended questions about the software's
strengths and weaknesses, and asks respondents for recommendations for the
software.

The third instrument was an observation form based on Heinich et al.’s (2002)
checklist. It consisted of two parts: the criteria in the checklist and researcher notes on
group comments made about the criteria during evaluation presentations.

The fourth instrument was the expert’s Educational Software Evaluation Checklist, which
included detailed explanations for each criterion in Heinich et al.’s (2002) checklist (see
Appendix). The definitions were compiled using De Villiers (2004), Niederhauser and
Stoddart (2001), Cennamo, Ross and Ertmer (2010) and Squires and Preece (1996). The
three experts evaluated the selected educational software separately. During the
evaluation process, each expert defined detailed criteria relevant to each main criterion
in Heinich et al.’s (2002) checklist. The detailed criteria were checked for inter-rater
reliability, then discussed, and a single criterion table was defined.

The fifth instrument was a focus group interview form prepared by the researchers
and checked by each expert professor. The form consisted of three questions:

1. How did group members evaluate the software (individually or together)?
2. To what extent did you benefit from the evaluation checklist?
3. How did you decide to grade the educational software as high, medium, or low?

The last instrument was an expert diary, in which the experts were requested to define
their own evaluation process based on the evaluation checklist.


Procedure

This study followed several steps. First, the participants completed a demographic
data questionnaire. Second, under the scope of the Instructional Technology and Material
Design course, novice students discussed key issues in evaluating educational
software. Third, the novices, in self-determined groups, selected and evaluated
educational software using Heinich et al.’s (2002) checklist, presenting their findings in
class. In the next step, the three experts evaluated the selected software using the same
checklist. The final step was the analysis of the codes for both experts and novices.

Data analysis

The data analysis was descriptive, focusing on comparing how participants used the
evaluation checklist, on the effects of experts’ practical experiences on their evaluation
processes, and on the compatibility between the experts’ and novices’ scores.

Open coding analysis was applied to observation notes from group presentations and
the subsequent focus group interviews. Experts wrote diaries on their evaluation
processes; these were similarly coded. Two researchers coded the data; according to
Miles and Huberman’s (1994) formula, interview intercoder reliability on the themes
was 75%. After recognising and clarifying a disagreement on the name of one theme,
intercoder reliability reached 100%.
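
Miles and Huberman's (1994) formula expresses intercoder reliability as the number of agreements divided by the total number of coding decisions (agreements plus disagreements). A minimal Python sketch of that calculation follows. The function name and the example counts are illustrative assumptions; the study reports only the resulting percentages, not the underlying counts.

```python
def intercoder_reliability(agreements: int, disagreements: int) -> float:
    """Return reliability as a percentage: agreements / (agreements + disagreements)."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one coding decision is required")
    return 100.0 * agreements / total


# Hypothetical counts (not reported in the study): two coders agree on
# 3 of 4 theme labels, then resolve the one disputed label.
initial = intercoder_reliability(agreements=3, disagreements=1)
resolved = intercoder_reliability(agreements=4, disagreements=0)
print(f"initial: {initial:.0f}%, after discussion: {resolved:.0f}%")
```

With these assumed counts the sketch reproduces the reported pattern of 75% before discussion and 100% after.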

Researchers organised the data collected from the experts into common themes, as
outlined by Patton (1990). Moreover, as Ayres, Kavanaugh and Knafl (2003)
recommend, each theme was described in detail, noting variations across themes.
Then, the relationships between these themes were defined. This process was then
applied to the novice groups’ responses and observation notes. Lastly, expert and
novice responses were compared to find similarities and differences in the educational
software evaluation process with regard to the defined themes. Separate comparisons
were first conducted within expert and novice groups, then expanded across both
groups.

The software evaluation checklist results were compared descriptively. The experts
evaluated the educational software that novices selected using Heinich et al.’s (2002)
checklist separately, defining each criterion in detail. The coders' agreement rate on the
criteria was 84%. After discussions on disputed items and corrections of
miscalculations, coder agreement rate increased to 100%.

Validity issues

According to Maxwell (1996), validity is the goal of research studies, rather than a
product, and a qualitative researcher should rely on common sense for credible
conclusions. Research can be interpreted in many different ways, and the researcher
should explain how he or she ruled out alternatives (Maxwell, 1996). This study's
validity threats and coping strategies are given below:

• Sampling selection plays an important part in the validity of a study, as it can affect
results. Expert and novice participants were identified based on their levels of
knowledge of educational software evaluation. Before the study, a demographics
questionnaire was presented to the novice students and background information
was collected about the experts’ fields. There were no validity threats to the
sampling selection in the current study.

• Instrumentation is another key piece of study validity, according to Wallen and
Fraenkel (2001): a researcher cannot automatically assume instruments are valid. In
the current study, all the developed instruments were verified by an expert before
application.

• Data collection was triangulated by collecting data through focus group interviews,
observations and document analysis.

• Peer debriefing was applied during data analysis. Two researchers analysed the
data, and intercoder reliability was checked using Miles and Huberman’s (1994)
formula.

• Data interpretation was discussed by the researchers. Moreover, for external
confirmability, an expert reviewed the study results.

Ethical issues

The researchers considered how the study could harm participants. The experts, who
were from different fields, faced no potential harm. However, since the data were
collected under the scope of a course where the instructor's attitude might have been
affected by negative comments, all students who voluntarily participated in focus
group interviews and demographic data collection were assigned pseudonyms. Since
the software evaluation and presentation was a graded part of the course, the
researchers acquired permission from participants before using the data.

Results

This comparative case study focused on novice and expert processes of evaluating
educational software in the mathematics field using Heinich et al.’s (2002) Educational
Software Evaluation Checklist. The subheadings below separate results according to the
research questions of the study.

Differences in the use of the checklist by novices and experts

During the process of evaluating the software based on Heinich et al.’s Educational
Software Evaluation Checklist (2002), experts and novices presented several differences.
As a result of data analysis of observation notes taken during the novices’
presentations, focus group interviews with novices and experts’ diaries, four categories
were created: focusing on the meaning of each criterion; searching for related
literature; handling criteria; and grading policy.

At the beginning of the evaluation process, experts displayed detailed assessments of
the meaning of each criterion. They aimed to construct a table for each criterion based
on studies of software evaluation processes. One of the experts stated:

When I received the software evaluation checklist, I firstly thought about the meaning
of each main criterion in the checklist. Then, I defined the detailed criteria in each
main criterion in the list. I thought about each criterion separately.

The novice students showed no comparable effort to understand the meaning of
each criterion, which led to some misconceptions. For example, with respect to the
criterion "stimulates creativity", novice students felt that creativity could only be
stimulated by interesting and motivating activities such as games. However, the
criterion was actually related to providing opportunities for learner discovery and
problem-solving. The same situation occurred with "evidence of effectiveness":
although this criterion should be evaluated via students’ test results, all groups
measured it based on games, colouring, writing activities or stories in the software.
During focus group interviews, no novice student said anything explicit about
contemplating criterion meaning.

Another difference between expert and novice methods involved the consideration of
literature and curriculum. For the "match with curriculum" criterion, the students only
weighed whether the topics corresponded with the official curriculum, but ignored
how the topic was treated in terms of curricular goals and skills to be acquired by
learners. On the other hand, experts investigated the curriculum first, and only then
evaluated whether the software was suitable for each curricular goal.

The two groups’ methods also varied in terms of handling criteria. The experts defined
detailed criteria and created a primary checklist, evaluating the software from many
sides. However, the novices identified only one main property. For example, the
inclusion of a game earned a high grade for "arouse motivation/maintain interest",
without an assessment of variety and quality. For "learner participation", interaction
through answering questions was marked as high by the novices, while the experts
checked the software in terms of learner control, interactivity, maintenance of curiosity
and confidence, effectiveness of feedback and means of participation.

The grading policy presented a fourth difference between the experts and novices. The
experts’ diaries showed that they created a detailed scale based on Heinich et al.’s
(2002) checklist, incorporating the same measures for each criterion to increase
consistency and reliability. They assigned each criterion a positive or negative sign
with respect to software properties. In evaluating design principles, for example, the
experts defined 12 detailed sub-points that they evaluated for each software program
as negative, positive, or neutral (see Appendix). To categorise "design principles" as
high, medium, or low, the experts counted the numbers of negative, positive and
neutral signs for the detailed criteria. One expert explained her grading policy:

While grading the software, I defined detailed criteria for each main criterion in
Heinich et al.’s (2002) checklist. I checked whether it had the necessary properties with
respect to the each detailed criterion. Then I marked each criterion as positive or
negative with respect to the software properties. If the negative signs were 25% or less
I signed the main criterion as positive; if the positive signs were 25% or less I signed
the main criterion as negative. If the negative and positive signs were nearly equal, I
signed the main criterion as neutral.
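
The rule this expert describes can be sketched in Python. The quote fixes only the two 25% thresholds and the "nearly equal" case. It does not say how detailed criteria marked neutral are counted or what happens between those cases, so the sketch assumes that shares are taken over positive and negative signs only and that every remaining case is treated as neutral. The function name and the sign counts are hypothetical.

```python
def grade_main_criterion(positive: int, negative: int) -> str:
    """Combine detailed-criterion signs into one sign for the main criterion."""
    total = positive + negative
    if total == 0:
        # Assumption: with no positive or negative signs, fall back to neutral.
        return "neutral"
    if negative / total <= 0.25:
        return "positive"
    if positive / total <= 0.25:
        return "negative"
    # Assumption: everything in between, including "nearly equal", is neutral.
    return "neutral"


# Hypothetical sign counts for the 12 detailed "design principles" sub-points.
print(grade_main_criterion(positive=10, negative=2))  # negative share is about 17%
print(grade_main_criterion(positive=6, negative=6))   # signs are equal
print(grade_main_criterion(positive=2, negative=10))  # positive share is about 17%
```

Writing the rule down in this form makes the unspecified middle range explicit, which is the kind of shared grading guideline the novice groups lacked.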

However, observation notes showed that the novices did not use any rubric during the
process. They graded each criterion as high, medium or low, but they did not provide
details about specific properties in their evaluation checklist or during their
presentations. Moreover, in focus group interviews, all novice groups stated that they
evaluated the software according to general impressions. For example, while
evaluating design principles, the novice groups focused on “consistency of the colours,
texts or objects” and “use of contrast colours between letters/ pictures/ elements and
background in the software”. They ignored important criteria such as “avoiding
excessive use of highlighting techniques on letters, pictures, animation”, “avoiding
excessive use of texts, pictures, animations, sounds, etc.”, “appropriateness of
computer-screen layout and ease of reading/ following”, “appropriate use of margins
for consistency” and “indication of completed sections of the software”.


It is important to emphasise the differences between experts' and novices' definitions
of two more elements of the evaluation checklist: cost-effectiveness and target
audience. When judging cost-effectiveness, three out of four novice groups presented a
general misunderstanding that cost-effectiveness is based solely on literal cost: if
something is free, it automatically qualifies as cost-effective. They tended to ignore the
fact that effectiveness must also measure the degree of fulfilment of targeted goals.
Moreover, three out of four novice groups presented a poor match between the content
of the software and appropriate target audience, as they were not aware of the benefits
of considering the literature.

Compatibility between novices’ and experts’ scores using the checklist

Novice and expert scores on Heinich et al.’s (2002) checklist were analysed
descriptively to assess score compatibility in terms of the degree of consistency and
percentage distribution of standards. Table 1 gives the agreement rates of scores based
on the software evaluation checklist. The general agreement rate for all software was
48%. For "clear and concise language", participants showed 100% agreement, whereas
they presented no agreement on "stimulates creativity".

Table 1: Novice-expert agreement rates on software evaluation

Criteria                             Agreement  Disagreement  Agreement rate (%)
Match with curriculum                        1             3                  25
Accurate and current                         3             1                  75
Clear and concise language                   4             0                 100
Arouse motivation/maintain interest          2             2                  50
Learner participation                        1             3                  25
Technical quality                            3             1                  75
Evidence of effectiveness                    1             3                  25
Free from objectionable bias                 3             1                  75
User guidance/documentation                  3             1                  75
Clear directions                             1             3                  25
Stimulates creativity                        0             4                   0
Design principles                            1             3                  25
Total                                       23            25                  48

As shown in Table 1, the compatibility rate between novice and expert scores in terms
of consistency was high for five criteria: "accurate and current", "clear and concise
language", "technical quality", "free from objectionable bias" and "user guidance/
documentation". For agreed items, respondents rated 74% of criteria as "high", whereas
they rated 22% as "medium" and 4% as "low".
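
The agreement rates in Table 1 follow directly from the agreement and disagreement counts. The Python sketch below recomputes them from the table's figures. The variable and function names are illustrative, and the 75% cut-off used to pick out the "high" criteria is an assumption inferred from the five criteria named above, not a threshold stated in the study.

```python
# (agreements, disagreements) per criterion, copied from Table 1.
TABLE_1 = {
    "Match with curriculum": (1, 3),
    "Accurate and current": (3, 1),
    "Clear and concise language": (4, 0),
    "Arouse motivation/maintain interest": (2, 2),
    "Learner participation": (1, 3),
    "Technical quality": (3, 1),
    "Evidence of effectiveness": (1, 3),
    "Free from objectionable bias": (3, 1),
    "User guidance/documentation": (3, 1),
    "Clear directions": (1, 3),
    "Stimulates creativity": (0, 4),
    "Design principles": (1, 3),
}


def agreement_rate(agree: int, disagree: int) -> float:
    """Percentage of comparisons in which novice and expert scores matched."""
    return 100.0 * agree / (agree + disagree)


rates = {name: agreement_rate(a, d) for name, (a, d) in TABLE_1.items()}
total_agree = sum(a for a, _ in TABLE_1.values())
total_disagree = sum(d for _, d in TABLE_1.values())
overall = agreement_rate(total_agree, total_disagree)

# Assumed cut-off: treat a rate of 75% or more as "high" agreement.
high_agreement = [name for name, rate in rates.items() if rate >= 75]

print(f"overall agreement: {overall:.0f}%")
print("high-agreement criteria:", ", ".join(high_agreement))
```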

Table 2 gives the distribution of scores for four educational software programs, as
indicated by the evaluation checklist. In general, students marked the software as
highly compatible with each criterion, rating 75% of the criteria as high, 23% as
medium and 2% as low. For all software, students marked "match with curriculum",
"accurate and current" and "free from objectionable bias" as high. Only one group of
students rated one software product as low, because it did not provide any guidance
or a manual. The experts, on the other hand, did not place any software at a single level.
They differed from the novice students by giving more low marks under "match with
curriculum" and "evidence of effectiveness". This suggests that the students had weak
knowledge of the content and skills presented in the related curriculum, and did not
have a clear understanding of the criteria in the evaluation checklist.


Table 2: Distribution of novice-expert evaluation scores based on levels

                                     Novice               Expert
Criterion                            High  Medium  Low    High  Medium  Low
Match with curriculum                   4       0    0       1       1    2
Accurate and current                    4       0    0       3       1    0
Clear and concise language              3       1    0       3       1    0
Arouse motivation/maintain interest     4       0    0       2       2    0
Learner participation                   3       1    0       1       3    0
Technical quality                       3       2    0       1       3    0
Evidence of effectiveness               1       2    0       1       1    2
Free from objectionable bias            4       0    0       3       1    0
User guidance/documentation             2       1    1       0       2    2
Clear directions                        3       1    0       2       1    1
Stimulates creativity                   2       1    0       0       2    2
Design principles                       2       2    0       3       1    0
Total                                  35      11    1      20      19    9
%                                      75      23    2      41      40   19

Although "high" was also the experts’ most frequent rating, at 41%, their ratings were
more evenly distributed across the levels (Figure 1). In general, 48% of novice-expert
scores were in agreement. Novices gave the higher score in 46% of cases and the lower
score in 6%. Clearly, novice students tended to grade more highly than experts.
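
The percentage row of Table 2 can be recomputed from the "Total" row. The Python sketch below does so, normalising each group by its own total, since the novice column sums to 47 ratings and the expert column to 48. The names are illustrative, and the computed values can differ from the printed percentages by a point depending on how rounding was applied.

```python
# Counts of high/medium/low ratings, copied from the "Total" row of Table 2.
TOTALS = {
    "novice": {"high": 35, "medium": 11, "low": 1},
    "expert": {"high": 20, "medium": 19, "low": 9},
}


def percent_distribution(counts: dict[str, int]) -> dict[str, float]:
    """Share of each rating level as a percentage of that group's own total."""
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}


for group, counts in TOTALS.items():
    shares = percent_distribution(counts)
    summary = ", ".join(f"{level} {share:.1f}%" for level, share in shares.items())
    print(f"{group}: {summary}")
```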

[Figure 1: chart comparing novice and expert scores at the low, medium and high levels; vertical axis: % (0 to 80)]

Figure 1: Percent distribution of novice and expert scoring

Discussion and conclusion

Among the main findings of the study was that while using Heinich et al.’s (2002)
checklist, novice students presented weaknesses in understanding the meaning of each
main criterion, supporting the process with related literature, handling each criterion
and grading. The experts tended to weigh the meaning of each criterion and search the
literature for further understanding, whereas novices simply discussed the criteria and
started to assess. Similar results were found by Swanson, O'Connor and Cooney (1990)
in a study that aimed to investigate differences between expert and novice teachers in
solving classroom discipline problems. The expert teachers identified priorities while
defining the problem, but novice teachers tended to represent problems strictly in
terms of possible solutions (Swanson, O'Connor & Cooney, 1990). Moreover, in Stepich
and Ertmer's (2009) study, which investigated ill-structured problem-solving strategies
of novices and experts, the experts spent more time problem-finding and relating
details of the main issues. Similarly, in the current study, the experts defined detailed
standards for each main criterion. Figure 2 shows the differences between experts and
novices during the software evaluation process. The experts began by individually
searching the literature to improve their understanding of the meaning of each
criterion, then came together to discuss and clarify their understanding, developed a
detailed criterion list to handle each main criterion and graded the educational software
according to the number of detailed criteria it met. The novices, on the other hand,
first discussed how to handle each criterion and then graded the software according to
their impressions.

[Figure 2: flow diagram; recoverable labels: Experts, Novices, Meaning of each criterion, Search related literature, Discussion, Handling criterion, Grading according to number of criterion provided, Grading according to impressions]

Figure 2: Differences between experts and novices
during the software evaluation process

The expert-novice agreement rate was as low as 48%, and mostly based on highly rated
items: novices generally graded each criterion higher than the experts. Meyer (2004)
explains one of her study's findings about the difference between novice and expert
teachers’ conceptions of prior knowledge by pointing to “an apparent mismatch
between the novice teachers’ beliefs about their urban students’ life experiences and
prior knowledge and the wealth of knowledge the expert teachers found to draw
upon” (p. 970). In the current study, differences may have been caused by students
misinterpreting criteria, by the limited methods for evaluating each criterion, by
students’ sparse knowledge of content and skills addressed in the curriculum and
by the lack of a common grading strategy.

The findings of the current study have many implications for understanding the gap
between novice and expert, with regard to software evaluation. Educators can fill this
gap and hasten the development of pre-service or new teachers by taking into account
the process differences identified in this study. As Stepich and Ertmer (2009) state,
expertise can be gained via practice. For that reason, more students should be given
the opportunity to be involved in software evaluation, selection and development.

This study will help future researchers understand how experts and novices act
differently during the process of educational software evaluation. In an earlier study,
Kozma and Russell (1997) investigated what could be done to bridge the gap between
student and expert thinking in chemistry curriculum, instruction and assessment. The
results of this study are also beneficial for researchers in the field of mathematics
education who want to investigate how to close the gap between novice and expert,
with regard to software evaluation. Additional research is needed to investigate how
the results of this study would differ if novices had access to the detailed criteria table
created by the experts and were provided with general grading guidelines.

References
Ayres, L., Kavanaugh, K. & Knafl, K. A. (2003). Within-case and across-case approaches to
qualitative data analysis. Qualitative Health Research, 13(6), 871-883.
http://dx.doi.org/10.1177/1049732303013006008

Baumgartner, P. & Payr, S. (1996). Learning as action: A social science approach to the
evaluation of interactive media. In P. Carlson & F. Makedon (Eds.), Proceedings of ED-MEDIA
96, World Conference on Educational Multimedia and Hypermedia, Charlottesville, VA, 31-37.

Cennamo, K., Ross, J. & Ertmer, P. (2010). Technology integration for meaningful classroom use: A
standards-based approach. Belmont, CA: Wadsworth, Cengage Learning.

Chi, M. T. H., Feltovich, P. J. & Glaser, R. (1981). Categorization and representation of physics
problems by experts and novices. Cognitive Science, 5(2), 121-152.
http://dx.doi.org/10.1207/s15516709cog0502_2

Chi, M. T. H., Glaser, R. & Rees, E. (1982). Expertise in problem solving. In R. J. Sternberg (Ed.),
Advances in the Psychology of Human Intelligence (Vol. 1). Hillsdale, NJ: Erlbaum.

Crossley, M. & Vulliamy, G. (1984). Case-study research methods and comparative education.
Comparative Education, 20(2), 193-207. http://www.jstor.org/stable/3098564

De Corte, E., Verschaffel, L. & Lowyck, J. (1994). Computers and learning. In T. Husen & T. N.
Postlethwaite (Eds.), The international encyclopedia of education (2nd ed.). Elsevier Science.

De Villiers, R. (2004). Usability evaluation of an e-learning tutorial: Criteria, questions and case
study. In G. Marsden, P. Kotze & A. Adesina (Eds.), Proceedings of the 2004 Annual Research
Conference of the South African Institute of Computer Scientists and Information Technologists on IT
Research in Developing Countries (pp. 284-291). http://dl.acm.org/citation.cfm?id=1035092



Fadde, P. J. (2009). Expertise-based training: Getting more learners over the bar in less time.
Technology, Instruction, Cognition and Learning, 7(2), 171-197. [verified 10 Sep 2012]
http://peterfadde.com/Research/xbttraining.pdf

Heinich, R., Molenda, M., Russell, J. D. & Smaldino, S. E. (2002). Instructional media and
technologies for learning (7th ed.). Upper Saddle River, NJ: Prentice-Hall.

Kurz, T. L., Middleton, J. A. & Yanik, H. B. (2005). A taxonomy of software for mathematics
instruction. Contemporary Issues in Technology and Teacher Education, 5(2), 123-137.
http://www.citejournal.org/articles/v5i2mathematics1.pdf

Kozma, R. & Russell, J. (1997). Multimedia and understanding: Expert and novice responses to
different representations of chemical phenomena. Journal of Research in Science Teaching, 34(9),
949-968. http://dx.doi.org/10.1002/(SICI)1098-2736(199711)34:9<949::AID-TEA7>3.0.CO;2-U

Livingston, C. & Borko, H. (1990). High school mathematics review lessons: Expert-novice
distinctions. Journal for Research in Mathematics Education, 21(5), 372-387.
http://www.jstor.org/stable/749395

Maxwell, J. A. (1996). Qualitative research design: An integrative approach. Thousand Oaks, CA:
Sage.

Meyer, H. (2004). Novice and expert teachers’ conceptions of learners’ prior knowledge. Science
Education, 88(6), 970-983. http://dx.doi.org/10.1002/sce.20006

Miles, M. B. & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.).
Thousand Oaks, CA: Sage.

Niederhauser, D. S. & Stoddart, T. (2001). Teachers' instructional perspectives and use of
educational software. Teaching and Teacher Education, 17(1), 15-31.
http://dx.doi.org/10.1016/S0742-051X(00)00036-6

Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA:
Sage.

Reiser, R. A. & Kegelmann, H. W. (1994). Evaluating instructional software: A review and
critique of current methods. Educational Technology Research and Development, 42(3), 63-69.
http://dx.doi.org/10.1007/BF02298095

Sancar Tokmak, H. & Karakus, T. (2011). ICT pre-service teachers’ opinions about the
contribution of initial teacher training to teaching practice. Contemporary Educational
Technology, 2(4), 319-332. http://www.cedtech.net/articles/24/245.pdf

Smaldino, S. E., Lowther, D. L. & Russell, J. D. (2008). Instructional technology and media for
learning (9th ed.). Upper Saddle River, NJ: Pearson Merrill.

Squires, D. & Preece, J. (1996). Usability and learning: Evaluating the potential of educational
software. Computers & Education, 27(1), 15-22.
http://dx.doi.org/10.1016/0360-1315(96)00010-3

Stepich, D. A. & Ertmer, P. A. (2009). Teaching instructional design expertise: Strategies to
support students’ problem-finding skills. Technology, Instruction, Cognition and Learning, 7,
147-170.



Stylianou, D. A. & Silver, E. A. (2004). The role of visual representations in advanced
mathematical problem solving: An examination of expert-novice similarities and differences.
Mathematical Thinking and Learning, 6(4), 353-387. http://dx.doi.org/10.1207/s15327833mtl0604_1

Swanson, H. L., O'Connor, J. E. & Cooney, J. B. (1990). An information processing analysis of
expert and novice teachers' problem solving. American Educational Research Journal, 27(3),
533-556. http://dx.doi.org/10.3102/00028312027003533

Virvou, M., Katsionis, G. & Manos, K. (2005). Combining software games with education:
Evaluation of its educational effectiveness. Educational Technology & Society, 8(2), 54-65.
http://www.ifets.info/journals/8_2/5.pdf

Wallen, N. E. & Fraenkel, J. R. (2001). Educational research: A guide to the process (2nd ed.).
Mahwah, NJ: Erlbaum.

Yanpar Yelken, T. (2011). Ogretim teknolojileri ve materyal tasarımı (2nd ed.). Ankara: Ani
Yayincilik.

Appendix: Experts’ educational software evaluation checklist

Criterion 1. Match with curriculum
Detailed criteria:
a. Meeting the curriculum goals
b. Appropriateness of curriculum philosophy
   • Critical thinking
   • Creative thinking
   • Communication
   • Decision-making
   • Problem-solving
   • Exploring
   • Entrepreneurship
Rating: Low ( )   Medium ( )   High ( )

Criterion 2. Accurate and current
Detailed criteria:
a. Accuracy of definitions
b. Completeness of definitions
c. Currency of definitions
d. Accuracy of notations
e. Completeness of notations
f. Currency of notations
g. Accuracy of symbols
h. Completeness of symbols
i. Currency of symbols
Rating: Low ( )   Medium ( )   High ( )

Criterion 3. Clear and concise language
Detailed criteria:
a. Appropriateness to target learners’ reading level
b. Sufficiency of character sizes
c. Understandability of texts/speeches, etc.
d. Avoidance of jargon
e. Consistency in clear use of fonts
Rating: Low ( )   Medium ( )   High ( )

Criterion 4. Arouse motivation/maintain interest
Detailed criteria:
a. Learner control
b. Variety of interactivity
c. Maintenance of curiosity and confidence
d. Effectiveness of feedback to students' responses
   • Positive and negative feedback
   • Correctness of feedback
   • Indication of students' correct response
   • Re-teaching the lesson or concept after multiple incorrect attempts
   • Student performance report (print out or on-screen)
e. Engagement with daily life
   • Through objects/stories/cartoons
   • Through activities such as problem-solving, creating characters
f. Appropriateness to learners' age/gender etc.
Rating: Low ( )   Medium ( )   High ( )



Criterion 1. Learner participation
Detailed criteria:
a. Requiring active participation of target users
b. Variety of participation
   • Selection
   • Entering answers
   • Matching
   • Repeating through writing or spelling
Rating: Low ( )   Medium ( )   High ( )

Criterion 2. Technical quality
Detailed criteria:
a. Proper functioning of all links
b. Proper functioning of all multimedia resources at all times
c. Ease of installing software
d. Availability of clear directions for access or installation
e. Use with peripherals
f. Booting of program
g. Adjustable sound level
h. Minimum level of technical expertise to run program
i. Cost-effectiveness
j. Program operation without crashing
k. Multiple means to operate program
l. Terminating the program from any place at any time
Rating: Low ( )   Medium ( )   High ( )

Criterion 3. Evidence of effectiveness
Detailed criteria:
a. Test results
b. Ongoing monitoring of target learners’ progress
c. Variety of assessments
Rating: Low ( )   Medium ( )   High ( )

Criterion 4. Free from objectionable bias
Detailed criteria:
a. Avoiding stereotypes
b. Avoiding biases (gender, race, culture, etc.)
Rating: Low ( )   Medium ( )   High ( )

Criterion 1. User guidance/documentation
Detailed criteria:
a. Completeness of user guidance/documentation
b. Correctness of user guidance/documentation
c. Understandability of user guidance/documentation
d. Currency of user guidance/documentation
e. Accessibility of user guidance/documentation from anywhere in software
Rating: Low ( )   Medium ( )   High ( )

Criterion 2. Clear directions
Detailed criteria:
a. Correctness of grammar and spelling
b. Appropriateness of directions to target learners’ level
c. Sufficiency of directions
d. Evidence of functions of navigational elements/icons
Rating: Low ( )   Medium ( )   High ( )

Criterion 3. Stimulates creativity
Detailed criteria:
a. Existence of discovery
b. Discovery through automatic guidance
c. Discovery through optional guidance
Rating: Low ( )   Medium ( )   High ( )

Criterion 4. Design principles
Detailed criteria:
a. Consistency in letters, colours, buttons, pictures, etc. used
b. Avoiding excessive use of highlighting techniques on letters, pictures, animation
c. Avoiding excessive use of texts, pictures, animations, sounds, etc.
d. Use of contrasting colours between letters/pictures/elements and background
e. Appropriate spacing between elements
f. Appropriateness of size of letter/picture/animation etc. to target learners’ age
g. Appropriateness of computer-screen layout and ease of reading/following
h. Appropriate use of margins for consistency
i. Appropriate use of chunks
j. Use of same colours for elements in the software as ones in real life
k. Equal distribution of elements on screen
l. Indication of completed sections of the software
m. Providing more than one way of navigating through the content
Rating: Low ( )   Medium ( )   High ( )

Authors: Assistant Professor Hatice Sancar Tokmak
Computer Education and Instructional Technology Department
Mersin University, Turkey
Email: haticesancarr@gmail.com Web: http://haticetokmak.wordpress.com/

Assistant Professor Lutfi Incikabi
Mathematics Education Department, Mersin University, Turkey
Email: lutfiincikabi@yahoo.com

Professor Tugba Yanpar Yelken
Educational Sciences Department, Mersin University, Turkey
Email: tyanpar@gmail.com Web: http://www.mersin.edu.tr/apbs/tyanpar

Please cite as: Sancar Tokmak, H., Incikabi, L. & Yanpar Yelken, T. (2012). Differences
in the educational software evaluation process for experts and novice students.
Australasian Journal of Educational Technology, 28(8), 1283-1297.
http://www.ascilite.org.au/ajet/ajet28/sancar-tokmak.html