Australasian Journal of Educational Technology, 2017, 33(4).

Towards an automatic classification system for supporting the development of critical reflective skills in L2 learning

Gary Cheng
Department of Mathematics and Information Technology, The Education University of Hong Kong

This study aimed to develop an automatic classification system, namely ACTIVE, for generating immediate and individualised feedback on students’ reflective entries about their second language (L2) learning experiences. It also aimed to explore students’ attitudes towards using the system to support the development of their reflective skills in L2 learning. A total of 466 undergraduate students took part in the study. One hundred and twenty-seven participants were involved in the development phase, where their reflective entries were manually annotated according to a classification framework for critical reflection on L2 learning, and the annotated entries were then used to develop the ACTIVE system. The remaining participants were asked to generate automated feedback reports on their reflective entries for improvement by using the system. To solicit their views towards the system, the participants were administered an online questionnaire and some of them were also invited to attend a semi-structured interview. The overall results indicate that the classification accuracy of the system is comparable to that of human annotators. They also suggest that both teacher and machine feedback types have strengths and limitations, highlighting the need to further explore the use of multi-channel, multi-layer feedback in improving students’ reflective skills in L2 learning.

Introduction

Reflective writing has been widely used as a pedagogical strategy to help English as a second language (ESL) students develop their critical thinking skills in second language (L2) learning (Hyland, 2003; Scott, 2005). In the process of reflective writing, students are expected to think about and make sense of the connections between what they have learnt and what they are learning, between what they are doing to learn and why they choose to do it, and between what they would like to achieve and what they actually get. This practice provides an opportunity for students to critically reflect on their learning experiences and identify their existing strengths and areas for future development. Research suggests that reflective writing can potentially help ESL students to reinforce their understanding of L2 acquisition, to raise their self-awareness with respect to their L2 learning progress and to apply strategies to improve their L2 proficiency (Baturay & Daloğlu, 2010; Chau & Cheng, 2010).

Despite the potential of reflective writing for student learning, students’ reflective entries are commonly written at a descriptive rather than an analytical level. For example, Lai and Calandra (2007) studied the problems of preservice teachers in their reflection. The results of their study showed that most reflective entries were merely descriptive of an event or experience, suggesting that the preservice teachers had a very limited understanding and practice of critical reflection. Power (2012) found a similar issue with undergraduate students from a university language program. He suggested that a clear framework for critical reflection should be developed to promote students’ understanding of reflection and encourage their participation in reflection through scaffolding.
In this light, previous studies have proposed a number of multi-level frameworks to evaluate students’ reflective writing and then classify their reflective ability in various professions, such as accounting (Bisman, 2011), healthcare (Smith, 2011), teaching (Granberg, 2010) and L2 learning (Chau & Cheng, 2012). There are two main challenges facing the use of a classification framework to evaluate students’ reflection: the considerable time and effort teachers must put into understanding and applying the assessment criteria, and the need for consistency in assessing a large number of reflective entries across multiple teachers (Wade, Abrami, & Sclater, 2005; Wetzel & Strudler, 2006). Given these challenges, it is worth considering whether existing technology can be adopted to offer immediate and adaptive feedback on students’ reflection in a helpful and consistent way. Therefore, the current study was undertaken to meet two objectives: (1) to develop an automatic classification system for generating immediate and individualised feedback on students’ reflective entries about their L2 learning experiences; and (2) to explore students’ attitudes towards using the system to support the development of their reflective skills in L2 learning. The findings from this study can provide insights into the design of technology-assisted reflective practice and inform the direction of future research in this area.

Related research

Reflective learning

The definition of reflective learning varies considerably across the literature. Dewey (1933) defined reflection as “active, persistent and careful consideration of any belief or supposed form of knowledge in light of the grounds that support it and the further consequences to which it leads” (p. 9). He argued that people learn more from the process of reflecting upon their former experiences and adapting to the environment than from the experience itself. This concept provides a cognitive basis for reflective learning, but with little consideration of learners’ emotional aspects. From the perspective of learners, Boud, Keogh, and Walker (1985) considered reflective learning as “those intellectual and affective activities in which individuals engage to explore their experiences in order to lead to a new understanding and appreciation” (p. 19). In this sense, reflective learning is not just an approach for critically reviewing prior experiences to guide future actions or responses, but also a process of re-examining the experience in order to gain new insights into self and practice (Mezirow, 1998).

The benefits of reflective learning are widely recognised in the field of education and training, especially in L2 education. To promote reflective practice in L2 learning, reflective writing is one of the most widely used pedagogical strategies (Abednia, Hovassapian, Teimournezhad, & Ghanbari, 2013). Reflective writing can be described as a piece of students’ written work in which they document their thoughts and feelings in response to their personal learning experience in a specific domain or profession. It can also be viewed as a critical analysis of the learning experience and a discussion of its implications for future applications (Chau & Cheng, 2012).
Research indicates that reflective writing offers the potential for students to establish links between new knowledge and existing knowledge (Cochran-Smith & Lytle, 2001), to reinforce the application of critical thinking skills (Kuiper & Pesut, 2004), to promote self-evaluation and professional growth (Lee, 2008) and to develop confidence and competence in their organisational and writing skills (Chang & Lin, 2014). However, research has also highlighted challenges associated with the implementation of reflective writing in higher education. Chief among them is that students often encounter difficulties in choosing what to write in their reflection. For example, in Lai and Calandra’s (2007) study where preservice teachers’ difficulties in reflective writing were explored, two themes emerged from the interview data: “struggle in understanding reflection” and “technical and repetitive reflection writing assignments; in most cases, not reflection writing at all” (p. 73). Their study showed that the participants had very little knowledge of reflection and there was a lack of specific requirements and guidance on how to critically examine one’s own experiences. The findings are consistent with those of other studies (Greiman & Covington, 2007; Martin, 2005), underlining the need to help students develop a better understanding of reflection. An issue arising from this is to consider whether introducing a classification framework for reflective skills would be beneficial to students.

Classification frameworks for reflective skills

One research area in reflective writing is the design of classification frameworks for evaluating students’ reflective skills. Some classification frameworks have been developed for specific domains to distinguish lower-level reflective skills from more advanced ones. Despite differences in their terminology and presentation, the classification frameworks share a common understanding of what constitutes a low level of reflection (e.g., a mere description of learning experiences) and a high level of reflection (e.g., interpretation and transformation of learning experiences into practice). The frameworks are described below.

McNeill, Brown, and Shaw (2010) proposed a framework to evaluate the reflective entries of specialist trainees with varying reflective skills in medical training. The framework classifies a reflective entry into one of three levels: the most reflective level (i.e., level 1), the intermediate reflective level (i.e., level 2) and the least reflective level (i.e., level 3). Each level has its own set of descriptors designed by a group of experts in the fields of reflective learning and medical education. Critical thinking, context description, feelings and emotion, literature connection, evidence of learning and action planning are identified as key aspects for assessing the level of a reflective entry. According to the framework, an entry is considered most reflective if it demonstrates a high level of critical analysis with strong evidence of prior learning and with an action plan for future learning. In contrast, an entry is considered least reflective if it is just a mere description of an event with no evidence of learning from the past and no planning for the future.
The results of the study indicated that 10% of the trainees’ reflective entries were identified as level 1 (most reflective) and nearly one-third were classified into level 3 (least reflective), suggesting that more guidance and support on reflective writing should be available to students.

Hegarty (2011) designed a framework to analyse reflective writing of students undertaking a master’s program in teacher education. The framework is represented in a five-level hierarchical structure, where each level corresponds to one of five categories: descriptive, explanatory, supported, contextual and critical reflection. Descriptive reflection represents the lowest level of the framework. At this level, students simply describe and narrate events in their lives without providing justification for their actions. Explanatory reflection is at the second level, where students analyse their experiences from a personal or professional perspective. The third level, namely supported reflection, refers to a reflective entry demonstrating the connection between an event (or action) and its supporting evidence from the literature. The fourth level, namely contextual reflection, suggests that an entry should include a consideration of different views and a comparison of those views with one’s own point of view. Critical reflection is the highest level of reflection, which contains in-depth analysis of one’s own experiences from multiple perspectives and relates the experiences to future learning. In Hegarty’s (2011) study, descriptive reflection and explanatory reflection were found most frequently, but the other categories were rarely found. This finding is consistent with that of McNeill et al. (2010), pointing to the need for promoting students’ understanding of critical reflection.

Ryan and Ryan (2013) proposed a classification framework for reflection, namely 4Rs, intended to enhance students’ lifelong learning skills and professional practice in higher education. The framework distinguishes four levels of reflection: reporting and responding, relating, reasoning and reconstructing. The levels vary in complexity of thinking, ranging from lower-order thinking skills (e.g., reporting an incident and responding to it by making observations, expressing opinions or asking questions) to higher-order thinking skills (e.g., reconstructing for reframing future practice or professional understanding). Ryan and Ryan (2013) emphasised that reflective writing is never straightforward, and its success requires pedagogical support with reference to an effective evaluation framework. Ryan (2011) suggested an approach of teaching students how to compare and contrast the features of critical reflection with those of descriptive reflection. Power (2012) adopted a similar approach, where students were asked to individually identify and colour-code critical reflection in their reflective entries, followed by a face-to-face discussion with the class teacher on the colour-coded sections. He found that this approach could encourage students to develop self-directed learning skills, and also provide the teacher with an opportunity to give timely, personalised feedback on students’ reflection. He noted, however, that this approach may have a significant impact on teacher workload, especially in a large class. A possible way to address this problem would be to capitalise on technological advances to assist the ongoing process of evaluation and feedback.
Automatic classification of student texts

Latent semantic analysis (LSA) is a computational approach used to determine document similarity in the field of information retrieval. It is known as a method to extract and represent the contextual-usage meaning of words by applying statistical computations to a large corpus of texts (Landauer, Foltz, & Laham, 1998). The underlying concept of LSA is to first aggregate all word contexts in which a word does and does not appear, and then to generate a set of constraints that largely determine the similarity in meaning between two words or groups of words. The constraints are solved by linear algebra methods such as singular value decomposition (SVD), and the solution is used to guide the judgement of meaning similarity between words or documents.

LSA starts by counting the frequency of each term (word) in each document of a text corpus, and subsequently constructs a term-document co-occurrence matrix where cell(x,y) contains the term frequency–inverse document frequency (tf-idf) weight of the term x in the document y. The next step is to decompose the term-document matrix by SVD into three matrices, of which one is a diagonal matrix. Small singular values in the diagonal matrix are eliminated, and the corresponding rows and columns in the other two matrices are ignored. As a result, the original term-document matrix can be transformed into a lower-dimensional matrix. This truncated term-document matrix represents the correlational structure between terms and documents in a lower-dimensional semantic space. For automatic classification, a new document will first be transformed into a vector of features in the semantic space. The cosine similarity between the vector for the document and that for each known document in the corpus will then be calculated. In extreme cases, the similarity value between two documents is 1 if they are identical and 0 if they are completely different. A new document will be classified into the category of the known document with which its similarity is highest.

LSA has successfully been applied to text classification for educational purposes such as grading student essays based on their content (Jorge-Botana, León, Olmos, & Escudero, 2010; Lemaire & Dessus, 2001), evaluating brief summaries of narrative and expository texts (Olmos, León, Escudero, & Jorge-Botana, 2011), assessing free-text answers with reference to standard solutions (Pérez-Martin, Pascual-Nieto, & Rodriguez, 2009) and providing adaptive feedback on student summaries in intelligent tutoring systems (He, Hui, & Quan, 2009; Kintsch et al., 2000). The studies reported a high level of agreement between LSA and human judgements (with a coefficient of correlation close to or greater than 0.70), indicating that LSA could provide a good simulation of human performance in classifying student texts.
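To make this recipe concrete, the following minimal sketch implements the standard tf-idf, SVD and cosine-similarity steps in Python with scikit-learn. It illustrates the general LSA technique rather than the ACTIVE system’s actual code: the toy documents, labels and the number of retained dimensions are assumptions, and the sketch assigns a new text the label of its most similar training unit, whereas the system described below aggregates training units into per-category vectors.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy training units and A-S-E-R labels (illustrative only)
docs = [
    "I listened to English radio every night to practise listening.",
    "This strategy improved my vocabulary because I met new words in context.",
    "My tutor's comments helped me see what I need to improve.",
]
labels = ["S2", "S4", "E2"]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)        # document-term tf-idf matrix

# Truncated SVD keeps only the largest singular values, projecting the
# tf-idf space into a lower-dimensional latent semantic space
svd = TruncatedSVD(n_components=2)        # dimensionality is corpus-dependent
X_lsa = svd.fit_transform(X)

new = ["I watched English movies with subtitles to practise listening."]
new_lsa = svd.transform(vectorizer.transform(new))

sims = cosine_similarity(new_lsa, X_lsa)[0]   # 1 = identical, 0 = unrelated
best = int(np.argmax(sims))
print(labels[best], round(float(sims[best]), 3))
```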
Proposed approach for classifying reflective skills in L2 learning

Classification framework

Chau and Cheng (2012) introduced a framework called A-S-E-R to classify students’ reflective skills in L2 learning by analysing their reflective entries. Four key elements of reflective skills are considered in the framework: analysis, reformulation and future application; strategy application; external influences; and report of events or experiences. Each element is further divided into four hierarchical levels, progressing from level 1 (i.e., developing) to level 4 (i.e., competent). An identified category is denoted by an element’s symbol and a level. For example, S4 represents level 4 on the scale assessing strategy application (S), indicating a critical analysis of the effectiveness of applied or alternative strategies for language learning. This study adopted the A-S-E-R framework to discriminate between levels of reflective skills with respect to L2 learning. Details of the classification framework are illustrated in Table 1.

Table 1
The A-S-E-R classification framework (adapted from Chau & Cheng, 2012)

Analysis, reformulation and future application (A): to analyse, reformulate, and refocus the experience; comprehensive discussion of implications of the experience in the context of future applications
   A4 (Clear ability): Discuss implications of the experience in the context of future applications and explain how the experience would benefit future language learning
   A3 (Some ability): Evaluate positive and/or negative effects of the experience on current practices of language learning
   A2 (Limited ability): Explain how the experience is connected to the objectives or challenges of language learning
   A1 (Very limited ability): Describe and comment on the experience with little or no justification

Strategy application (S): to analyse effectiveness of applied or alternative strategies for language learning
   S4 (Critical analysis): Analyse and evaluate effectiveness of applied or alternative strategies on improving language learning
   S3 (Logical explanation): Identify language problems and explain how applied or alternative strategies can address the problems
   S2 (Relevant discussion): Describe choice and application of strategies for language learning
   S1 (Superficial description): Briefly describe choice of strategies for language learning

External influences (E): to make comments about external influences (e.g., circumstances, others’ perspectives) on the experience
   E4 (Insightful and constructive comments): Comment on how external influences can help change attitudes and behaviours towards language learning
   E3 (Constructive comments): Comment on how external influences can help identify language learning needs and ways to make progress
   E2 (Some comments): Comment on how external influences can help identify language learning needs
   E1 (Very few comments): Briefly comment on external influences with little focus on language learning

Report of events or experiences (R): to report significant aspects of events or experiences
   R4 (Detailed and analytical report): Report significant aspects of the experience and make connections between language knowledge and skills gained from the experience
   R3 (Detailed report): Report significant aspects of the experience and identify language knowledge and skills gained from the experience
   R2 (Coherent report): Report some aspects of the experience and highlight language learning issues arising from the experience
   R1 (Disjointed report): Briefly report the experience with little relevance to language learning
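Because the system reports category codes such as S4 alongside their descriptors, Table 1 lends itself to a machine-readable form that a report generator can look up by code. The sketch below is a hypothetical Python representation; the names ELEMENTS, DESCRIPTORS and describe are illustrative assumptions, not part of the published ACTIVE system.

```python
# Hypothetical machine-readable form of the A-S-E-R framework (Table 1).
# Only two descriptors are filled in here for brevity.
ELEMENTS = {
    "A": "Analysis, reformulation and future application",
    "S": "Strategy application",
    "E": "External influences",
    "R": "Report of events or experiences",
}

DESCRIPTORS = {
    "S4": "Analyse and evaluate effectiveness of applied or alternative "
          "strategies on improving language learning",
    "R2": "Report some aspects of the experience and highlight language "
          "learning issues arising from the experience",
    # ... remaining 14 categories omitted
}

def describe(code: str) -> str:
    """Gloss a category code such as 'S4' for use in a feedback report."""
    element, level = code[0], code[1:]
    return f"{ELEMENTS[element]}, level {level}: {DESCRIPTORS[code]}"

print(describe("S4"))
```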
Automatic classification system

Figure 1 illustrates the design of the automatic system for classifying students’ reflective L2 learning skills in this study. The system was implemented using the Natural Language Toolkit (NLTK) in Python (Bird, Klein, & Loper, 2009). It comprises the following four processing stages:

(1) Compiling a corpus of annotated texts: The final corpus used in this study was compiled from 748 reflective entries created by 398 students during the academic years 2013–14 and 2014–15. The total length of the reflective entries is 178,772 words (or 10,002 sentences), with an average length of 239 words (or 13 sentences) per entry. Based on the A-S-E-R framework, each entry was manually and independently annotated by two researchers. Differences in annotation results between the researchers were identified and discussed to reach a consensus on the annotation standards. The basic unit of annotation was either a single sentence or an aggregate of consecutive sentences signifying the emergence of a category. If a unit has the properties of more than one category, it is labelled with multiple categories. Figure 2 shows a sample annotated text in which units are marked up with one or more categories.

(2) Pre-processing texts: Data pre-processing is performed to first remove words with little or no semantic value, and then to transform meaningful words into a standardised form for further processing. The tasks include spelling correction (i.e., to correct misspelled words), tokenisation (i.e., to tokenise annotated sentences into bags of words, where each bag represents one category), stop word removal (i.e., to filter out non-essential terms like punctuation marks, numbers, non-letter characters, and meaningless words such as “the”, “is”, “in” and “on”), and stemming (i.e., to convert terms into their root forms, like converting “connection”, “connective” or “connecting” into “connect”).

(3) Selecting features: The process of feature selection is carried out to extract significant features from each bag of words. It starts by counting the frequency of each word (term) in a bag of words that corresponds to a certain category. LSA is subsequently applied to build a vector of term weights for each category, to generate a term-by-category matrix representing an N-dimensional feature space where N is the total number of distinct terms, and to truncate the matrix for capturing the latent semantic structure of different categories in a lower-dimensional semantic space.

(4) Classifying a new text: Every sentence of a new reflective entry is represented by a vector of term weights in the feature space. At the sentence level, a sentence s is classified into a category c if the cosine similarity between the vectors for c and s is greater than the average similarity between the vectors for c and the training sentences labelled with c. As such, it is possible for a sentence to be labelled with more than one element (e.g., A and S). For each element, however, a sentence will only be assigned the level (e.g., A1 or A2) that yields the highest similarity value. At the entry level, a piece of reflective writing will be assigned a weighted average level for each element based on the classification results of its constituent sentences. For example, Figure 2 shows a sample text containing one occurrence of R2 and one occurrence of R3. The weighted average level of R is computed by adding up all annotated levels of R (i.e., 2 + 3 = 5) and then dividing the sum by the total number of occurrences of R (i.e., 1 + 1 = 2), giving 2.5, which rounds to 3. The sample text will therefore be categorised into R3.
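A minimal sketch of stages (2) and (4) is given below, reusing the LSA space idea from the earlier sketch. The pre-processing uses NLTK, which the system is reported to build on, but the specific helper names (preprocess, classify_sentence, entry_level) and data structures (centroid, threshold) are illustrative assumptions; note that the half-up rounding is made explicit because Python’s built-in round would turn 2.5 into 2 rather than the 3 used in the example above.

```python
import nltk
import numpy as np
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Stage (2), sketched with NLTK (assumes the 'punkt' and 'stopwords' data
# have been downloaded): tokenise, drop stop words and non-letter tokens,
# and stem what remains.
stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

def preprocess(sentence):
    tokens = nltk.word_tokenize(sentence.lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop]

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Stage (4), under assumed data structures: `vec` is the LSA vector of one
# sentence, `centroid[c]` the category vector for category c (e.g., 'A2'),
# and `threshold[c]` the average similarity between centroid[c] and the
# training sentences labelled with c.
def classify_sentence(vec, centroid, threshold):
    best = {}  # element (A/S/E/R) -> (level, similarity)
    for code, cvec in centroid.items():
        sim = cosine(vec, cvec)
        if sim > threshold[code]:
            element, level = code[0], int(code[1])
            # within an element, keep only the best-scoring level
            if element not in best or sim > best[element][1]:
                best[element] = (level, sim)
    return {f"{e}{lvl}" for e, (lvl, _) in best.items()}

def entry_level(labels_per_sentence, element):
    """Weighted average level of one element over an entry, e.g., one R2
    and one R3 give (2 + 3) / 2 = 2.5, rounded half up to R3."""
    levels = [int(lab[1]) for labs in labels_per_sentence for lab in labs
              if lab.startswith(element)]
    if not levels:
        return 0  # element not detected in this entry (level 0)
    return int(sum(levels) / len(levels) + 0.5)  # 2.5 -> 3, unlike round()
```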
Figure 1. Design of the automatic classification system

Figure 2. A sample annotated text

Research method

Context of study

This study was carried out as part of a government-funded project, namely Automatic Classification Techniques in Virtual Environments (ACTIVE). The project was proposed in response to the aforementioned concerns about students’ understanding of critical reflective skills as well as teachers’ workload in assessing and giving feedback on reflective writing. The main purpose of the project was to explore the effectiveness of using technological methods to address these concerns in the context of L2 learning. This study was specifically designed to meet this purpose, with the aims of developing a web-based automatic classification system to generate automated feedback on student reflections and examining students’ attitudes towards using the system to facilitate the development of critical reflective skills. Prior to the study, an ethical review application was approved by the university. An informed consent form was designed with a clear description of the purpose of the study and what each participant was expected to do in the study.

The study took place at the English Language Centre (ELC) of the Hong Kong Polytechnic University during the academic years 2013–14 and 2014–15. Since 2007, the ELC has developed a web-based e-portfolio platform (http://eportfolio.elc.polyu.edu.hk/) to promote independent L2 learning in its English language enhancement courses, where students have the opportunity to develop, self-evaluate and reflect on their L2 learning experiences. Participants of this study were students enrolled on a 13-week, credit-bearing language enhancement course entitled Advanced English for University Studies (AEUS). The AEUS course aims to help learners study more effectively in the university’s English medium learning environment and to improve and develop their English language proficiency. In the course, students are required to (1) plan, research for, write and revise a position argument essay; (2) present and justify views effectively in a mini oral defence; and (3) reflect on their English learning experiences and achievement and document the reflection in written form. A written reflective entry should be a minimum of 170 words and be submitted to the e-portfolio platform every 3 or 4 weeks. Failure to meet this requirement results in a grade deduction.

Participants

During the 2013–14 and 2014–15 academic years, 466 students (223 males and 243 females) taking the AEUS course voluntarily accepted an invitation to participate in the experimental study. At the beginning of the study, all participants were asked to sign an informed consent form and fill in a demographic data collection form. Table 2 provides an overview of the demographic information of participants. Of the participants, 82% were freshmen and 18% were sophomores. Their ages ranged from 17 to 26 years (M = 18.6 and SD = 1.1). They came from eight academic disciplines: Applied Sciences, Business, Construction and Environment, Engineering, Fashion and Textiles, Health Sciences, Hotel and Tourism Management, and Social Sciences. The majority attained level 4 or above in the Hong Kong Diploma of Secondary Education (HKDSE) English Language Examination. A benchmarking study (Hong Kong Examinations and Assessment Authority, 2013) showed that level 4 in the HKDSE English Language Examination is equivalent to a band score range of 6.31 to 6.51 in the International English Language Testing System (IELTS).
Table 2
Demographic information of participants

1. Number of participating students: 466 (100%)
2. Gender
   Male: 223 (47.9%)
   Female: 243 (52.1%)
3. Age
   17: 35 (7.5%)
   18: 232 (49.8%)
   19: 132 (28.3%)
   20: 47 (10.1%)
   21 or above: 20 (4.3%)
4. Year of study
   Year 1: 380 (81.5%)
   Year 2: 86 (18.5%)
5. Academic discipline
   Applied Sciences: 58 (12.4%)
   Business: 72 (15.5%)
   Construction and Environment: 47 (10.1%)
   Engineering: 84 (18.0%)
   Fashion and Textiles: 22 (4.7%)
   Health Sciences: 123 (26.4%)
   Hotel and Tourism Management: 35 (7.5%)
   Social Sciences: 25 (5.4%)
6. HKDSE English language exam result
   5**: 3 (0.6%)
   5*: 29 (6.2%)
   5: 65 (13.9%)
   4: 252 (54.1%)
   3: 1 (0.2%)
   No answer: 116 (24.9%)

Measures

Performance evaluation of the automatic classification system

By definition, accuracy measures the proportion of classifications that agree with the manually annotated results in the testing data set (Witten, Frank, & Hall, 2011). It was used to evaluate the effectiveness of the automatic classification system at the entry level. Besides accuracy, Cohen’s kappa (K) was also used to measure the agreement between human and machine annotation, while taking the possibility of chance agreement into account (Landis & Koch, 1977). Additionally, five-fold cross-validation was applied to evaluate the overall performance of the automatic classification system. During the cross-validation, all annotated entries in the corpus were randomly split into five equal-sized groups. Four groups were used as training data while the remaining group was used as testing data. This process was repeated five times until all groups had been used for both training and testing purposes. The overall result was calculated by averaging the individual results derived from the five iterations.

Students’ attitudes towards using the automatic classification system to support reflection

An online questionnaire was designed to solicit students’ attitudes towards using the automatic classification system to support reflection. The questionnaire consists of seven self-report items. The first five items are fixed-response questions, of which the first two concern the number of entries submitted and feedback reports received. The next three measure, on a 5-point Likert scale, levels of agreement on the effectiveness of the feedback and willingness to receive the feedback again. The last two items are open-ended questions which allow students to provide their written comments. Details of the questionnaire items will be discussed in the ensuing section.

To cross-validate the questionnaire results and to elicit further views on the automatic classification system, participants were invited to attend a focus group interview. The interview protocol comprises four guiding questions: (1) Do you think that the automatic classification system can help you improve your reflective writing? (2) What are the main differences between the feedback from the automatic classification system and the feedback from the teacher? (3) What do you like most and least about the automatic classification system? (4) Do you have any suggestions to improve the automatic classification system?

Procedure

In essence, this study can be divided into three phases: the development phase, the implementation phase and the evaluation phase. All participants involved in the study were first introduced to the A-S-E-R framework.
A total of 127 students participated in the first phase, namely the development phase. This phase was initiated in the academic year 2013–14 to compile a corpus of manually annotated reflective entries and to build an online system, namely ACTIVE (http://gp.ied.edu.hk/active), for the provision of automatic classification and feedback. The second phase, namely the implementation phase, involved a total of 339 participants studying the AEUS course during the academic year 2014–15. Participants in this phase were granted access to the ACTIVE system and were encouraged to use the system to generate feedback reports on drafts of their reflective entries. A feedback report contains four sections: annotated entry, scoring and descriptors, remarks on total score and suggestions for improvement (see Figure 3). The final phase, namely the evaluation phase, started at the end of the AEUS course. In this phase, an online questionnaire was administered to the participants involved in the implementation phase and six interview sessions were arranged. Twenty-seven participants were randomly selected to attend the interviews in order to share their views on and experiences with the ACTIVE system.

Figure 3. Layout of a feedback report generated by the ACTIVE system

Results and discussion

Performance evaluation of the automatic classification system

As stated earlier, a corpus of reflective entries collected from participants was manually annotated. The quality of the reflective entries varies among participants. The following is a sample reflective entry written by a student with good reflective skills, especially in the aspect of strategy application (S):

{Most of us may agree that learning a foreign language is a challenging job, especially learning English which is a totally different language system when compared with Chinese. I had a hard time in learning English and disliked English. However, when you find the correct method, you can learn English in an effective way.} A2. {My strategy in learning English mainly follows the rule of practice makes perfect. I have read a sentence on a book: to learn a language, the best way is to soak yourself in it. This sentence inspired me a lot, as I agree that if you put yourself into the environment with an unknown language, you are forced to learn and speak that language in order to survive in that place. The more you practise that language, the more familiar you will be. The effect must be better than just learning from language books or sitting in classroom.} S4. {Therefore, I listen to the English radio channel every night and watch English movies with subtitles so as to get used to an English environment and practise listening. Moreover, I read English newspaper instead of Chinese newspaper to practise my reading. I have tried various methods to put myself in an English environment.} S3, R3.

In contrast, the sample student entry shown below does not demonstrate consideration of strategy application (S):

{As I caught an infectious disease before week 1, I was not supposed to go anywhere and so I did not attend the first English lesson. I was rather nervous before attending the lesson in week 2 since I did not know any classmates. But instead of feeling awkward, I found myself enjoying the lesson.
{Although the topic 'research' was not really my favourite type of study, the way that all of us interacted in class was relaxing and enjoyable.} A1.} R1. {I had a great time in class but somehow I feel pressure when thinking about the course. The content and assignments of the course are quite challenging. The essay we need to work on is an obstacle to me because every single word I use should be selected very carefully.} A2. {But I know that I am not alone. My classmates are doing exactly the same thing as I do. I am going to overcome all the obstacles with unflagging determination.} E1. Hope that everyone in the class will work hard and play hard together this year.

As expected, some students reflected on their experiences beyond the context of L2 learning. Their entries would thus be unclassified according to the A-S-E-R framework. A representative student sample is given as follows:

It was finally the last week of my first semester in university. During these 14 weeks, I have been worried, anticipated, moody and even anxious about the quizzes, assignments, presentations and tests of all the subjects. But at the end of this week, I realised that all of these components enriched my university life. I would also like to point out that there are actually quite a lot of differences between my expectation and the reality. People always claim that university is a place with great freedom and what we have to do is to enjoy our youth there. However, they never tell that when we are enjoying more freedom, we ought to take more responsibilities. For instance, while I tried to skip class, I had to spend more time on self-study consequently. While I did not hand in assignment on time, I had to bare the consequence of losing some proportion of marks. There are always consequences to take and be responsible of. In this semester, I started to get used to the life and systems of university. I might not do all my best in all courses, but the semester was more likely a great experience for me to have better management skill for next semester.

The corpus of reflective entries was then automatically annotated by the ACTIVE system. Table 3 shows the confusion matrix and Cohen’s kappa (K) obtained from the five-fold cross-validation on the corpus to assess the performance of the ACTIVE system. For each element of reflective skills (i.e., A, S, E and R), four hierarchical levels (i.e., 1, 2, 3 and 4) are included in the column and row headings together with an unclassified level (i.e., 0). The value at row x and column y of the confusion matrix represents the number of reflective entries classified into category x by human and category y by machine. Hence, the diagonal values of the matrix represent the numbers of matched results between human and machine annotation. As can be seen from Table 3, the results of human annotation are mostly consistent with those of machine annotation. This finding indicates that the performance of machine annotation is comparable to that of human annotation. Furthermore, the agreement between human and machine annotation is found to be substantial because all K values in the table are within or close to the range of substantial agreement (0.61–0.80) (Landis & Koch, 1977).
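To illustrate how figures of this kind can be produced, the sketch below runs the five-fold protocol with scikit-learn, computing per-fold accuracy and Cohen’s kappa for one element and averaging them. The fit and predict callables stand in for the ACTIVE training and entry-level classification steps described earlier; they, and the variable names, are assumptions for illustration rather than the system’s published code.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, cohen_kappa_score

def evaluate(entries, gold, fit, predict, seed=0):
    """Five-fold cross-validation for one element (A, S, E or R).

    `entries` is the list of reflective entries and `gold[i]` the human
    entry-level label (0-4) of that element for entries[i].
    """
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    accs, kappas = [], []
    for train_idx, test_idx in kf.split(entries):
        model = fit([entries[i] for i in train_idx],
                    [gold[i] for i in train_idx])
        y_true = [gold[i] for i in test_idx]
        y_pred = [predict(model, entries[i]) for i in test_idx]
        accs.append(accuracy_score(y_true, y_pred))
        # kappa = (p_o - p_e) / (1 - p_e): observed agreement corrected
        # for the agreement expected by chance
        kappas.append(cohen_kappa_score(y_true, y_pred))
    # report the average of the five iterations, as in Tables 3 and 4
    return float(np.mean(accs)), float(np.mean(kappas))
```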
Table 3
Confusion matrix and Cohen’s kappa (K) of the ACTIVE system (rows: human annotation; columns: machine annotation)

Element A (K = 0.60)
      A0   A1   A2   A3   A4
A0    40    4    1    0    0
A1    18  115    8    1    0
A2    25   23  260   44    4
A3     7    2   48  118    6
A4     0    0    4   11    9

Element S (K = 0.73)
      S0   S1   S2   S3   S4
S0   102    5    3    0    0
S1    40  298   43    0    0
S2     6   14  120   10    0
S3     1    0   12   91    0
S4     1    0    2    0    0

Element E (K = 0.67)
      E0   E1   E2   E3   E4
E0    99    2    1    1    0
E1    37  254   29   15    0
E2     9   23  167   20    0
E3    12    9   15   53    0
E4     2    0    0    0    0

Element R (K = 0.70)
      R0   R1   R2   R3   R4
R0    90    0    3    1    0
R1    18  142   11    1    0
R2    26   11  144   33    0
R3    26    0   30  199    3
R4     2    0    0    4    4

Table 4 compares the classification accuracy of the ACTIVE system with two commonly used baseline methods, namely naïve Bayes (NB) and binary relevance (BR). NB is a simple probabilistic classifier based on Bayes’ theorem, which assumes that the features in a class are independent of each other. This method is particularly efficient and useful for classifying texts from very large datasets (Sebastiani, 2002). BR is a fundamental approach to the multi-label classification problem, which constructs a set of binary classifiers to distinguish each single class from all other classes (Tsoumakas, Katakis, & Vlahavas, 2010). From Table 4, it can be seen that the classification accuracy of the ACTIVE system ranges from 72% to 82% over the elements of reflective skills and outperforms that of the baseline methods. The empirical results demonstrate the feasibility of applying existing technology to the automatic classification of reflective skills in L2 learning with satisfactory performance.

Table 4
Classification accuracy of different methods

Reflective element    NB    BR    ACTIVE
A                     60%   56%   72%
S                     52%   66%   82%
E                     54%   63%   77%
R                     61%   64%   77%

Students’ attitudes towards using the automatic classification system to support reflection

Results of the online questionnaire

The online questionnaire was emailed to all participants involved in the implementation phase. A total of 203 completed questionnaires were returned, yielding a response rate of 60%. The results of the fixed-response questions and open-ended questions are summarised in Table 5 and Table 6 respectively.

Table 5 shows that most respondents (93%) submitted one to three reflective entries in the AEUS course, and a similar percentage (89%) used the ACTIVE system one to three times to generate feedback on their reflective entries. Only a minority (less than 8%) did not submit their reflective entries or use the ACTIVE system. When asked about the quality of the feedback offered by the system, about 70% of respondents agreed or strongly agreed that the system’s feedback could help them identify the strengths and weaknesses of their reflective skills in L2 learning. Nearly the same proportion of respondents (67%) agreed or strongly agreed that the system’s feedback could help improve their reflective skills in L2 learning. A substantial percentage (65%) were willing, to a large or full extent, to continue receiving the system’s feedback on their reflective entries in the near future.

Table 5
Results of the fixed-response questions in the online questionnaire

(1) How many reflective entries did you submit in the AEUS course?
    More than 3: 7 (3.4%)
    3: 81 (39.9%)
    2: 62 (30.5%)
    1: 45 (22.2%)
    0: 8 (3.9%)

(2) How often did you use the automatic classification system to generate feedback on your reflective entries?
    More than 3: 6 (3.0%)
    3: 54 (26.6%)
    2: 68 (33.5%)
    1: 59 (29.1%)
    0: 16 (7.9%)

(3) How much do you agree that the system’s feedback could help you identify the strengths and weaknesses of your reflective skills in L2 learning?
    Strongly agree: 10 (4.9%)
    Agree: 133 (65.5%)
    Neutral: 46 (22.7%)
    Disagree: 10 (4.9%)
    Strongly disagree: 4 (2.0%)

(4) How much do you agree that the system’s feedback could help you improve your reflective skills in L2 learning?
    Strongly agree: 8 (3.9%)
    Agree: 128 (63.1%)
    Neutral: 49 (24.1%)
    Disagree: 15 (7.4%)
    Strongly disagree: 3 (1.5%)

(5) To what extent are you willing to continue receiving the system’s feedback on your reflective entries in the near future?
    To a full extent: 26 (12.8%)
    To a large extent: 106 (52.2%)
    To some extent: 46 (22.7%)
    To a small extent: 18 (8.9%)
    Not at all: 7 (3.4%)

In addition to the fixed-response questions, two open-ended questions were asked in the questionnaire to elicit respondents’ views on what they like and dislike most about the ACTIVE system. Table 6 shows the questions, students’ response categories, descriptive statistics and examples. A total of 58 respondents gave 74 written responses, of which 57 and 17 answered the first and second questions respectively. As shown in Table 6, the answers to the first question identify students’ favourite features of the system: (1) comprehensive and detailed comments; (2) convenience and ease of use; and (3) facilitation of reflection. On the other hand, the answers to the second question point to students’ least favourite features of the system: (1) incapability to fully understand human language; (2) brief comments and suggestions; and (3) an unclear scheme for scoring reflection.

Table 6
Results of the open-ended questions in the online questionnaire

(1) What do you like most about the system?
    M1: Comprehensive and detailed comments (32 responses, 56.1%)
        Examples: “The generated comments are very detailed.” “Some good examples of reflection are given.”
    M2: Convenience and ease of use (13 responses, 22.8%)
        Examples: “It is really convenient to use because feedback could be generated with just a click.” “It is easy to generate the feedback. The process is fast and simple.”
    M3: Facilitation of reflection (12 responses, 21.1%)
        Examples: “It enables me to reflect on my language ability.” “It can tell me the weaknesses of my reflective journal so I can see where to improve.”

(2) What do you like least about the system?
    L1: Incapability to fully understand human language (6 responses, 35.3%)
        Examples: “Some content is misunderstood by the system.” “I doubt if the system can really analyse our feelings and then give reasonable marks.”
    L2: Brief comments and suggestions (6 responses, 35.3%)
        Examples: “Suggestions are not concrete enough.” “The comments given are a bit too general.”
    L3: Unclear scheme for scoring reflection (5 responses, 29.4%)
        Examples: “I do not know how to mark my reflection.” “Sometimes I don't understand how it marks my reflective journals.”

Table 7 shows a 3 x 3 contingency table of frequency counts according to the response categories identified in the two open-ended questions. Row percentages of the counts are given in brackets. As illustrated in Table 7, there were 16 respondents answering both questions. Due to this small sample size, Fisher’s exact test was employed to analyse the cross-tabulation. The test result suggests that there is no evidence of an association between the response categories for question 1 and those for question 2 (p > 0.05).
However, students who were in favour of a specific feature tended not to report a particular problem. This finding can be attributed to the contrast between two opposing categories. For example, it can be seen from Table 7 that students who liked M1 (comprehensive and detailed comments) did not raise concerns about L2 (brief comments and suggestions) and L3 (unclear scheme for scoring reflection). This result is reasonable because L2 is exactly the opposite of M1, while L3 is quite inconsistent with M1. A similar explanation can be used to interpret the result that students who liked M3 (facilitation of reflection) did not raise concerns about L3. If students really found that the ACTIVE system could facilitate reflection, it was likely that they could easily understand the scoring scheme and identify the aspects with low scores for improvement in their next reflection.

Table 7
A contingency table of frequency counts according to the response categories identified in open-ended questions (row percentages in brackets)

           L1          L2          L3          Total
M1     3 (100%)       0           0          3 (100%)
M2     2 (18.2%)   4 (36.4%)   5 (45.5%)    11 (100%)
M3     1 (50%)     1 (50%)       0           2 (100%)
Total  6 (37.5%)   5 (31.3%)   5 (31.3%)    16 (100%)

As seen in Table 7, students who reported different favourite features (M1 to M3) of the ACTIVE system shared a common problem of L1. This is understandable because using computer technology to interpret human language is never error free. Given that the design of the ACTIVE system was based on the extraction of common features shared by a limited set of reflective entries, the occurrence of uncommon features that cannot be captured by the system may cause misinterpretation. The second and third problems (L2 and L3) identified by some students, however, do not coincide with the favourite feature (M1) identified by some other students. Such a discrepancy warrants further investigation, as discussed in the next section.

Results of the focus group interview

Qualitative data was mainly obtained from six post-event interview sessions with a total of 27 randomly chosen participants (S1 to S27, 17 females and 10 males). Each session lasted approximately 30 to 45 minutes. All interview sessions were audio-recorded, transcribed verbatim, summarised and thematically categorised (Braun & Clarke, 2006). After analysing the interview transcripts, four major themes emerged as significant for understanding students’ views on the use of the automatic classification system to support their reflection: elements of reflective skills in L2 learning; scoring for reflection; content of the feedback report; and arguments for and against automatic classification of reflective skills. The themes are discussed below along with student quotes from the interview transcripts.

The first recurring theme expressed by the interviewees concerns the benefits of using key elements of reflective skills in L2 learning to monitor and evaluate their reflective proficiency. Although the feedback report on each reflective entry was standardised in its format, it was not standardised in its content: different reflective entries prepared by the same student might be classified into different categories of reflective skills. In this regard, most interviewees (S1–S10, S13–S18, S20–S22, S25–S26) considered the feedback report useful in supporting the improvement of their reflection.
Such usefulness can be ascribed to the effectiveness of the A-S-E-R framework in identifying the strengths and weaknesses of students’ reflective skills. As an interviewee (S1) observed:

The feedback is effective and helpful because the feedback highlights the strengths and weaknesses in different aspects of my reflective skills in L2 learning. It helps me to see which areas I need to improve on to get into a deeper reflection.

Another interviewee (S2) supported the same view:

The four aspects [A-S-E-R] and their scores in my feedback provide good directions for me to improve the quality of my next reflection. … It is good to see that the sentences in my reflection are labelled with elements and also graded for their levels. The feedback shows my proficiency in each aspect and which type of reflective student I am.

The second theme expressed by some interviewees is associated with scoring for reflection. While most interviewees recognised the usefulness of the system’s feedback in improving their reflection, some (S11, S12, S19, S23, S24) were not satisfied with the scores they attained in the assessment of their reflection. They particularly raised concerns that a low reflection score would undermine their motivation and confidence to enhance their reflective entries, so they preferred to receive qualitative feedback only. An interviewee (S11) commented on this:

Actually, I was a little bit disappointed that I got a low reflection score. I did put much effort in drafting my reflection ... I agree that the feedback could point out my weaknesses, but the score could not encourage me to work harder to improve my reflective skills. Given a low score, I just feel that I would not be able to achieve a great improvement in my reflection even though I try to do more.

The third theme is about the content of the feedback report. The majority of interviewees (S1–S10, S13–S18, S20–S22, S25, S26) agreed that the feedback report could help identify their strengths and weaknesses in relation to reflective skills in L2 learning. Some interviewees (S2–S5, S13–S16, S25, S26) also appreciated that good examples of reflection were given in their feedback reports. They said that they could learn good reflective practices from those examples and that the examples could help them consider how to improve their own reflection. An interviewee (S3) reported:

I like reading the examples given in my feedback report. From the examples, it was easier for me to understand what constitutes a good reflection and how to effectively reflect on my English learning experiences. I believe that the examples would also help other students improve their reflection.

However, a few interviewees (S11, S12, S23) pointed out that the feedback report should provide more specific suggestions on how to further enhance their reflective skills at sentence level. For example, the following interviewee (S23) noted that strategy application (S) is a key element of reflective skills in L2 learning (see Table 1) and he asked for suggestions about strategies for L2 learning without a specific context (level 1):

I want to have more feedback about what strategies should be adopted to enhance my English learning ability. The feedback report could suggest something that we can try and do to make progress in developing our reflective skills. For example, talking more with international students may be an effective strategy to improve my English speaking ...
I would like to see this type of assistance [effective strategy] offered in the feedback and step-by-step instructions on how to express the suggested ideas in sentences of my reflection.

The following interviewee (S12) echoed the same concern but focused on achieving a higher level of reflective skills:

It was clear from the system’s feedback that some sentences in my reflection were at a low level [of reflective skills]. However, I had no ideas of how those sentences could be revised to attain at a higher level. It would be perfect if the system could give me specific suggestions on how to modify the sentences, just like the instructions given by the teacher during consultation hours.

Both interviewees hoped that the system could provide detailed instructions on how to modify their reflection at sentence level. Their request, however, is beyond the present capabilities of the system and remains a huge challenge for current technology and research areas like automated essay evaluation (Shermis & Burstein, 2013).

The last theme is about the arguments for and against automatic classification of reflective skills. Two-thirds of the interviewees (S1–S6, S8–S10, S13–S17, S21, S22, S25, S26) expressed support for the development of the automatic system for classifying their reflective skills. They noted that their teachers were busy and could not afford to provide timely feedback on their reflective entries. To address this challenge, the interviewees believed that applying automatic classification technology to reflective writing could be a feasible alternative. This view was corroborated by the following interviewee’s comment (S3):

In the past I had to wait for a long time before receiving teacher comments on my reflective entry. This would stall my progress in reviewing and improving my reflective skills. I found that it was a good experience to use the automatic classification system for preparing my reflective entry. Getting immediate feedback on my reflection was just a click away.

Despite the potential for the automatic classification system to improve students’ reflective skills, there was a perception among the interviewees (S7, S11, S12, S18–S20) that machine feedback can only supplement but not replace teacher feedback. The following quotation (S7) illustrates this point:

Although the automatic classification system could provide a timely and systematic evaluation of my reflective skills, I would also prefer to receive teacher feedback because it shows more concerns about my personal feelings and needs. Teacher feedback is also more positive and encouraging, which makes me feel more motivated to keep writing my reflective entries.

Taken together, the analysis of student responses collected via the online questionnaire and focus group interview confirms the benefits of the automatic classification system in supporting the development of students’ reflective skills. In particular, the system offers three advantageous features: it is convenient to generate immediate and systematic feedback on student reflection; the feedback highlights one’s strengths and weaknesses in reflective skills in L2 learning; and students are provided with good examples of reflection to help them learn reflective skills from others. On the other hand, the analysis also reveals that there are concerns about the use of the automatic classification system for self-reflection.
They include the demotivating effect of giving a reflective entry a low score; the need to provide more specific suggestions on how to improve reflection; and the indispensable role of teacher feedback in caring about the feelings of students and encouraging them to continue with the practice of reflective writing.

The qualitative results clearly show that both teacher and machine feedback have their own strengths and limitations. Teacher feedback could be specific and encouraging, but could also be slow, inconsistent among teachers and occasionally unavailable for many reasons. Machine feedback could be quick, consistent and always available on request. However, it could also be general and rigid. In this connection, there is no single type of feedback (teacher or machine) that can best suit the needs of all students. This issue underscores the need to explore the use of multi-channel, multi-layer feedback in improving students’ reflective skills.

Implications and future work

The results of this study have two implications for researchers and practitioners of technology-assisted reflective learning. First, this study contributes to the development of an automatic classification system for generating immediate and individualised feedback on students’ reflective skills in L2 learning. It lends further support to the view that technology could play a beneficial role in the process of evaluating the quality of student submissions and then providing automated feedback on them. The findings of this study indicate that the classification accuracy of the ACTIVE system is comparable to that of human annotators. Additionally, they show that the ACTIVE system can provide students with immediate feedback on their reflective entries and with opportunities to make revisions to the entries at an early stage, before receiving teacher feedback at a later stage. Researchers can further explore this research direction to gain insights into how automatic classification technology can support and enhance the feedback process within different learning contexts.

Second, this study also adds evidence to confirm that most students see the potential and benefits of automated feedback on their reflective entries. Specifically, they find the feedback helpful in systematically identifying their strengths, weaknesses and areas for improvement according to the A-S-E-R classification framework. With a high level of classification accuracy, this kind of feedback would furnish students with evaluation results to reflect upon their own learning. Given that the automated feedback can be generated at different points in time, students can monitor and review their progress towards particular goals. They can also adopt repair strategies to support the attainment of those goals. In the long run, the formative development of reflective skills would be facilitated through a self-directed, technology-assisted and performance-based learning process. To validate the positive role of the automated feedback, its impact on the quality of students’ reflective entries and on their academic achievement will be investigated in future research.

Conclusion

The results of this study indicate that the ACTIVE system could achieve a satisfactory level of classification accuracy over the key elements of reflective skills.
Conclusion

The results of this study indicate that the ACTIVE system could achieve a satisfactory level of classification accuracy over the key elements of reflective skills. This finding not only suggests that the results of machine classification can be substantially consistent with those of human classification, but also demonstrates the feasibility of using existing technology to automatically evaluate and classify the reflective skills of ESL students. In addition, the results show that students were generally satisfied with using the ACTIVE system to receive immediate and systematic feedback on the strengths and weaknesses of their reflective skills. However, the results also indicate that automated feedback cannot replace teacher feedback. Some students reported that teacher feedback was very important for them to continue with reflective practice because it provided them with tailor-made advice, care and encouragement. The overall results of this study confirm that machine feedback and teacher feedback each play a unique role in meeting the various needs of students. For this reason, researchers and practitioners should consider how to integrate both feedback types into the process of reflection in order to maximise the improvement of students' reflective skills in L2 learning. In a broader sense, the findings should stimulate dialogue and advance further work exploring the use of multi-channel, multi-layer feedback in improving reflective learning.

As with all research, this study has limitations, some of which are noted here. First, the sample was drawn from a single university in Hong Kong. To generalise the findings, further studies should be carried out across different universities in both English-speaking and non-English-speaking countries. Second, the reflective entries used in this study were collected from a single English language enhancement course. Research could also be undertaken across various professional development programmes (e.g., teacher education and nursing) in which critical reflection is considered a highly valued ability for practitioners. Third, this study focuses on the accuracy of the automatic classification system and on its benefits and limitations as perceived by students. Further research could investigate the impact of the system on the quantity and quality of students' reflective entries, as well as on students' persistence in their reflective practice and on their academic achievement.

Acknowledgments

This research is financially supported by the General Research Fund of Hong Kong (No. 840913).

References

Abednia, A., Hovassapian, A., Teimournezhad, S., & Ghanbari, N. (2013). Reflective journal writing: Exploring in-service EFL teachers' perceptions. System, 41(3), 503–514. doi:10.1016/j.system.2013.05.003
Baturay, M. H., & Daloğlu, A. (2010). E-portfolio assessment in an online English language course. Computer Assisted Language Learning, 23(5), 413–428. doi:10.1080/09588221.2010.520671
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. Sebastopol, CA: O'Reilly Media.
Bisman, J. (2011). Engaged pedagogy: A study of the use of reflective journals in accounting education. Assessment & Evaluation in Higher Education, 36(3), 315–330. doi:10.1080/02602930903428676
Boud, D., Keogh, R., & Walker, D. (1985). Reflection: Turning experience into learning. London: Kogan Page.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. doi:10.1191/1478088706qp063oa
Chang, M. M., & Lin, M. C. (2014). The effect of reflective learning e-journals on reading comprehension and communication in language learning. Computers & Education, 71, 124–132. doi:10.1016/j.compedu.2013.09.023
Chau, J., & Cheng, G. (2010). Towards understanding the potential of e-portfolios for independent learning: A qualitative study. Australasian Journal of Educational Technology, 26(7), 932–950. doi:10.14742/ajet.1026
Chau, J., & Cheng, G. (2012). Developing Chinese students' reflective second language learning skills in higher education. The Journal of Language Teaching and Learning, 2(1), 15–32. Retrieved from http://www.jltl.org/index.php/jltl/article/view/54/21
Cochran-Smith, M., & Lytle, S. L. (2001). Beyond certainty: Taking an inquiry stance on practice. In A. Lieberman & L. Miller (Eds.), Teachers caught in the action: Professional development that matters (pp. 45–58). New York, NY: Teachers College Press.
Dewey, J. (1933). How we think. Lexington, MA: Heath. doi:10.1037/10903-000
Granberg, C. (2010). Social software for reflective dialogue: Questions about reflection and dialogue in student teachers' blogs. Technology, Pedagogy and Education, 19(3), 345–360. doi:10.1080/1475939X.2010.513766
Greiman, B. C., & Covington, H. K. (2007). Reflective thinking and journal writing: Examining student teachers' perceptions of preferred reflective modality, journal writing outcomes, and journal structure. Career and Technical Education Research, 32(2), 115–139. doi:10.5328/CTER32.2.115
He, Y., Hui, S. C., & Quan, T. T. (2009). Automatic summary assessment for intelligent tutoring systems. Computers & Education, 53(3), 890–899. doi:10.1016/j.compedu.2009.05.008
Hegarty, B. (2011). Is reflective writing an enigma? Can preparing evidence for an electronic portfolio develop skills for reflective practice? In G. Williams, P. Statham, N. Brown, & B. Cleland (Eds.), Proceedings ascilite Hobart 2011: Changing demands, changing directions (pp. 580–593). Retrieved from http://www.ascilite.org/conferences/hobart11/downloads/papers/Hegarty-full.pdf
Hong Kong Examinations and Assessment Authority. (2013). Results of the benchmarking study between IELTS and HKDSE English Language Examination [Press release]. Retrieved from http://www.hkeaa.edu.hk/DocLibrary/MainNews/press_20130430_eng.pdf
Hyland, K. (2003). Second language writing. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511667251
Jorge-Botana, G., León, J. A., Olmos, R., & Escudero, I. (2010). Latent semantic analysis parameters for essay evaluation using small-scale corpora. Journal of Quantitative Linguistics, 17(1), 1–29. doi:10.1080/09296170903395890
Kintsch, E., Steinhart, D., Stahl, G., LSA Research Group, Matthews, C., & Lamb, R. (2000). Developing summarization skills through the use of LSA-based feedback. Interactive Learning Environments, 8(2), 87–109. doi:10.1076/1049-4820(200008)8:2;1-B;FT087
Kuiper, R. A., & Pesut, D. J. (2004). Promoting cognitive and metacognitive reflective reasoning skills in nursing practice: Self-regulated learning theory. Journal of Advanced Nursing, 45(4), 381–391. doi:10.1046/j.1365-2648.2003.02921.x
Lai, G., & Calandra, B. (2007). Using online scaffolds to enhance preservice teachers' reflective journal writing: A qualitative analysis. International Journal of Technology in Teaching and Learning, 3(3), 66–81. Retrieved from http://sicet.org/web/journals/ijttl/issue0703/5_Lai_Calandra.pdf
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis. Discourse Processes, 25(2–3), 259–284. doi:10.1080/01638539809545028
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. doi:10.2307/2529310
Lee, I. (2008). Fostering preservice reflection through response journals. Teacher Education Quarterly, 35(1), 117–139. Retrieved from http://www.jstor.org/stable/23479034
Lemaire, B., & Dessus, P. (2001). A system to assess the semantic content of student essays. Journal of Educational Computing Research, 24(3), 305–320. doi:10.2190/G649-0R9C-C021-P6X3
Martin, M. (2005). Reflection in teacher education: How can it be supported? Educational Action Research, 13(4), 525–542. doi:10.1080/09650790500200343
McNeill, H., Brown, J. M., & Shaw, N. J. (2010). First year specialist trainees' engagement with reflective practice in the e-portfolio. Advances in Health Sciences Education, 15(4), 547–558. doi:10.1007/s10459-009-9217-8
Mezirow, J. (1998). On critical reflection. Adult Education Quarterly, 48(3), 185–198. doi:10.1177/074171369804800305
Olmos, R., León, J. A., Escudero, I., & Jorge-Botana, G. (2011). Using latent semantic analysis to grade brief summaries: Some proposals. International Journal of Continuing Engineering Education and Life-Long Learning, 21(2/3), 192–209. doi:10.1504/IJCEELL.2011.040198
Pérez-Marín, D., Pascual-Nieto, I., & Rodríguez, P. (2009). Computer-assisted assessment of free-text answers. The Knowledge Engineering Review, 24(4), 353–374. doi:10.1017/S026988890999018X
Power, J. B. (2012). Towards a greater understanding of the effectiveness of reflective journals in a university language program. Reflective Practice: International and Multidisciplinary Perspectives, 13(5), 637–649. doi:10.1080/14623943.2012.697889
Ryan, M. (2011). Improving reflective writing in higher education: A social semiotic perspective. Teaching in Higher Education, 16(1), 99–111. doi:10.1080/13562517.2010.507311
Ryan, M., & Ryan, M. (2013). Theorising a model for teaching and assessing reflective learning in higher education. Higher Education Research and Development, 32(2), 244–257. doi:10.1080/07294360.2012.661704
Scott, T. (2005). Creating the subject of portfolios: Reflective writing and the conveyance of institutional prerogatives. Written Communication, 22(1), 3–35. doi:10.1177/0741088304271831
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1–47. doi:10.1145/505282.505283
Shermis, M. D., & Burstein, J. (2013). Handbook of automated essay evaluation: Current applications and future directions. New York, NY: Routledge. doi:10.4324/9780203122761
Smith, E. (2011). Teaching critical reflection. Teaching in Higher Education, 16(2), 211–223. doi:10.1080/13562517.2010.515022
Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook (pp. 667–685). Berlin: Springer.
Wade, A., Abrami, P. C., & Sclater, J. (2005). An electronic portfolio to support learning. Canadian Journal of Learning and Technology, 31(3), 33–50. Retrieved from http://www.cjlt.ca/index.php/cjlt/article/view/26489/19671
Wetzel, K., & Strudler, N. (2006). Costs and benefits of electronic portfolios in teacher education: Student voices. Journal of Computing in Teacher Education, 22(3), 99–108. doi:10.1080/10402454.2006.10784544
Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques (3rd ed.). Burlington, MA: Morgan Kaufmann.

Corresponding author: Gary Cheng, chengks@eduhk.hk

Australasian Journal of Educational Technology © 2017.

Please cite as: Cheng, G. (2017). Towards an automatic classification system for supporting the development of critical reflective skills in L2 learning. Australasian Journal of Educational Technology, 33(4), 1–21. https://doi.org/10.14742/ajet.3029