

 
 

Journal of Effective Teaching in Higher Education, vol. 1, no. 2 

Using Text Mining and Data Mining Techniques for Applied Learning 
Assessment 

Jessica Cook, University of North Carolina Wilmington 
Cuixian Chen, University of North Carolina Wilmington, chenc@uncw.edu  

Angelia Reid-Griffin, University of North Carolina Wilmington 
 

Abstract. In a society where first-hand work experience is greatly valued, many 
universities and institutions of higher education have designed their Quality 
Enhancement Plan (QEP) to address student applied learning. This paper presents 
results from one university's QEP, called Experiencing Transformative Education 
Through Applied Learning (ETEAL). It highlights research that used text mining and 
data mining techniques to analyze a dataset of 672 student evaluations collected 
from 40 different applied learning courses from fall 2013 to spring 2015, in order 
to evaluate the impact on instructional practice and student learning. Text mining 
techniques are applied through the NVivo text mining software to find the 100 most 
frequent terms and create a document-term matrix in Excel. Then, the document-term 
matrix is merged with the manual interpretation scores received to create the 
applied learning assessment data. Lastly, data mining techniques are applied to 
evaluate the performance, including Random Forest, K-nearest neighbors, Support 
Vector Machines (with linear and radial kernels), and 5-fold cross-validation. Our 
results show that the proposed text mining and data mining approach can provide 
prediction rates of around 67% to 85%, while the decision fusion approach improves 
these to 69% to 86%. Our study demonstrates that automatic quantitative analysis 
of student evaluations can be an effective approach to applied learning assessment. 
 
Keywords: Text mining, data mining, applied learning assessment, short answer 
questions, student evaluation 
 
Text mining, sometimes referred to as text data mining, is the action of obtaining 
patterns or interesting knowledge from text-based documents. Text mining can 
become very complicated and time-consuming when original text documents lack 
structure (Tan, 1999). The process of text mining consists of two main phases: 
refining the original text documents to some chosen form and extracting knowledge 
from the text documents through patterns (Delgado, 2002). Mining a text-based 
document after it has been refined to the chosen form finds critical patterns and 
relationships seen across all documents (Tan, 1999). 
 
Student evaluations of teaching (SET) are seen from two different perspectives: 
informal and formal (Scriven, 1967; Stake, n.d.). Formal evaluation is done by 
conducting standardized testing of students. This study will focus heavily on the 
informal perspective of student evaluations. A student evaluation is perceived as 
informal when it relies on casual observation and subjective judgment. The focus on 
the informal perspective provides a personalized approach to evaluating the course 
and the learning objectives that the educator emphasizes in the course. 
One study revealed that most educators feared that scorers would not pay adequate 
attention to the characteristics the educator deems most important. The best 
teachers continually use what they learn from student evaluations to improve their 
teaching practices (Ramsden, 2003). 

 
Educational Evaluations: SETs 

 
Student Evaluation of Teaching (SET) is a common tool used in numerous institutions 
of higher education to provide evidence of teaching effectiveness and a reflection 
of students' learning (Wagner, Rieger, & Voorvelt, 2016). In terms of evaluating 
teaching effectiveness, students are assumed to be intuitively knowledgeable about 
actual effectiveness. Oftentimes, however, students lack guidance on how to assess 
teaching effectiveness, which is problematic when SET scores are used for promotions 
and contract renewals (Boring, 2017). Because SETs have a history of bias in areas 
of race/ethnicity and gender (Boring, 2017; Wagner, Rieger, & Voorvelt, 2016), this 
study focuses on the informal perspective of SETs and how it measures instructional 
practices and student learning in 40 different applied learning courses from fall 
2013 to spring 2015. 
 
SETs are typically designed as a rating form on which students rank the instructor 
and/or course on numerous specific characteristics of effectiveness (Uttl, White, & 
Gonzalez, 2017). They are administered at the end of the semester and are often 
optional for students to complete. However, some higher education institutions have 
required completion of SETs to improve response rates of the instrument (Boring, 
2017). 
 
Students have been reported to show no objection to completing evaluations and are 
often honest (Douglas & Carroll, 1987; Gal & Gal, 2014). According to Gal and Gal's 
(2014) study on knowledge bias in student evaluations of an Economics course, 
students believed their role in evaluating courses is special, as it positions them 
to provide feedback that reflects teaching quality. Other researchers support the 
claim that student evaluations are more reliable than other measures of teacher 
effectiveness, such as peer ratings and observations (Heller & Clay, 1993; Fike, 
Fike, & Zhang, 2015). Research by Galbraith, Merrill, and Kline (2012) on the 
validity of student evaluations of teaching effectiveness (SETE) in measuring 
student outcomes in business classes found that "student rating of learning outcome 
problem from different statistical perspectives, resulted in a high degree of 
consistency with respect to validity" (p. 368). Recent studies on SET indicate that 
students demonstrate some bias in terms of teacher background and behaviors rather 
than the quality of course instruction (Wagner, Rieger, & Voorvelt, 2016). Often, 
students believe that evaluations are effective and that teachers value the input 
from student evaluations and do not rank based on personal biases or grades. 
Students also believe that evaluations are a critical way to improve or adjust 
faculty teaching methods and improve course quality (Scriven, 1967; Wagner, Rieger, 
& Voorvelt, 2016). A study found that students prefer mid-semester evaluations over 
those that take place at the end of the semester, because they are able to see 
changes applied from the evaluations (Abbott et al., 1990). 



Student evaluation is a strong measure of how effective a faculty’s teaching 
practices are and can reflect student learning (Beleche, Fairris, & Marks, 2012).  It 
is important that students are motivated to actively participate and provide honest 
input that contributes to the success of evaluation systems. Research conducted by 
Chen and Hoshower (2003) found that students consider improvement in the implemented 
teaching practices to be the most attractive outcome of the evaluation system. The 
second most attractive outcome is seeing change to improve the course content. Chen 
and Hoshower (2003) also find that students are more motivated to participate in 
evaluations when they believe their feedback is seen as meaningful. The quality of 
student evaluations is essential in obtaining meaningful student feedback to provide 
areas of opportunity to improve teaching methods and effectiveness. 
 
Teaching and learning in higher education are inextricably and elaborately linked. 
Good teachers continually use what they learn from their students to improve their 
own practice. The assumption that the primary goal of teaching is to improve student 
learning and teaching leads to the argument that a reflective approach would be 
effective. Thus, student evaluation is an essential means of improving faculty 
teaching methods and course content, leading to increased student learning 
(Ramsden, 2003). The feedback role SETs play in higher education also contributes 
to students' satisfaction with courses and to retention and completion at the 
institution. When student course evaluations are matched with student-specific 
objectives for courses, there can be positive, statistically significant associations 
between students' learning and the course evaluation (Beleche, Fairris, & Marks, 
2012). 
 
While numerous studies have examined student evaluation of faculty instruction using 
quantitative, meta-analytic practices (Evans, 2013; Uttl, White, & Gonzalez, 2017; 
Zhao & Gallant, 2012), this study provides a timely and unique approach that uses 
text mining and data mining techniques to examine the validity and reliability of 
student evaluations in assessing teacher effectiveness and student learning. Taking 
into account previous literature on student evaluation, we use this practice to 
provide a thorough critique of assessment and to gain insight into the extent to 
which the classroom environment or other related factors affect student evaluation 
of faculty instruction in the applied learning courses (Zhao & Gallant, 2012). 
 
Abd-Elrahman et al. (2010) consider automatic text mining techniques a good method 
to investigate student course evaluations in a qualitative, open-ended manner. These 
techniques aim to identify unrevealed aspects affecting the student learning process 
and to develop a quantitative tool for these aspects. After preprocessing, each 
evaluation is categorized by the negative and positive comments made regarding the 
course. Then text mining is utilized to create two major groups: one for positive 
words and one for negative words. Their study shows that students' written course 
responses can be analyzed through text mining to understand the effectiveness of 
teaching. 

 

 



Applied Learning Assessment 

Like many universities, the higher education institution in this study aims to engage 
students in the research process or in creative scholarly activity in meaningful 
ways. Following this commitment, and as part of the Quality Enhancement Plan, the 
Experiencing Transformative Education through Applied Learning (ETEAL) program has 
been initiated to have a positive impact on student learning through an applied 
learning experience in three areas: critical thinking, thoughtful expression, and 
inquiry. The ETEAL-supported pedagogy initiatives offer many opportunities, 
resources, and funds for faculty to explore innovative pedagogies in applied learning 
and/or implement high-impact pedagogies in new disciplines, to promote the 
involvement of undergraduate students in faculty members' scholarly and creative 
work, and to enrich interdisciplinary collaboration across campus. Since fall 2013, 
over a hundred ETEAL-supported initiatives have been implemented campus wide. 
Enormous efforts have been made to promote applied learning among departments of 
traditional sciences, social sciences, humanities, arts, etc. 
 
Three years after the ETEAL initiatives started, it is pressing to review the 
assessment data to evaluate their impact on instructional practice and student 
learning. Such data include faculty surveys, student surveys, and scores of student 
artifacts from ETEAL-supported initiatives, as well as from non-ETEAL-supported 
Exploration Beyond the Classroom (EBC) activities in classes, projects, internships, 
study abroad, etc. Therefore, it is critical to formulate and evaluate the 
influence of applied learning experiences to determine analytically whether the 
ETEAL-supported applied learning techniques are effective in comparison to non-
ETEAL Exploration Beyond the Classroom experiences. The statistical analysis 
outcomes will provide scientific evidence of student learning and program 
effectiveness, with assessment foci on both student learning outcome and program 
outcome. By comparing the assessment data from ETEAL and non-ETEAL 
Exploration Beyond the Classroom (EBC), we aim to determine whether there is any 
statistically significant difference between ETEAL and EBC in terms of student 
learning and program effectiveness, and discover the related factors if such a 
difference exists. Specifically, applied learning courses at the university are assessed 
by student evaluations completed throughout the length of the course. At the start 
of the semester, students complete an intention reflection articulating their 
expectations, the purpose, and/or goals of the experience in terms of personal 
educational development (EBC 1). Upon completion of the course, students submit 
a final reflection synthesizing: (i) knowledge drawn from their coursework to 
address challenges involved in the experience (EBC 2), (ii) the impact of the 
experience on personal educational development (EBC 3A), and (iii) the impact of 
the experience in the profession or in the field of study (EBC 3B). A sample of 
guidance for both the initial reflection and the final reflection for ETEAL supported 
pedagogy initiatives is illustrated in Appendix A. 
 
In order to evaluate the impact on instructional practice and student learning, all 
student evaluations are manually interpreted and scored on a scale of 0 to 4 based 
on a provided scoring rubric by scorers who must first go through a mandatory 
training process. A sample of the scoring rubric is illustrated in Appendix B. For 
the training, 
each scorer is required to participate in two parts of an event. The first part 
consists of a five-hour session during which the rubric is reviewed and each person 
begins scoring with a partner. The second part consists of completing the scoring 
of student work on one's own, which can last up to approximately five hours. At the 
end of the event, each scorer is asked to provide feedback regarding the process and 
rubric for continual improvement in the scoring process. Attending at least one 
event is mandatory, but scorers are invited to attend as many as they like. 
Scorers are allowed to pick from events covering topics including student critical 
thinking skills, student-written communication skills, and student evaluation skills. 
It is noted that the human manual scoring process is very complicated and time 
consuming. 
 
It is believed that the more in-depth evaluation leads to a better understanding of 
instructional practice and student learning outcomes. Therefore, even though 
intensive human manual scoring to analyze student evaluations is important, 
automatic quantitative analysis of student evaluation can be an alternative efficient 
approach to analyze students' text responses. In this paper, both text mining and 
data mining techniques are investigated on students' text-based course evaluations 
to identify unrevealed aspects of instructional practice and student learning and 
develop a quantification tool to formulate and evaluate the influence of applied 
learning experiences.  
 

Data Gathering and Cleaning 

All original PDF files are provided by the institution’s General Education Assessment 
Office. These PDF files cover student evaluations of applied learning experiences 
from both ETEAL and EBC courses, consisting of scanned handwritten documents 
and scanned typed documents. As a pre-processing step, the answers from the 
original scanned PDF files are transcribed into .txt files by three students and a 
faculty member, which proved to be a very time-consuming process. Many issues 
come with the case of scanned handwritten files, including sloppy handwriting and 
faded handwriting. For some files, human judgment is used to best make out the 
writing that is illegible or has become extremely faded after being scanned in as a 
PDF file. In the case of scanned typed files, a PDF file converter is used to convert 
the PDF files into a document that could easily be copied and pasted into a .txt file. 
The PDF file converter can only convert one file at a time, so it is a time-consuming 
process. A drawback of using the PDF file converter is the spelling and grammatical 
errors introduced by the converter program. To fix these errors, each file is 
manually checked for spelling and grammar mistakes. A few of the original PDF files 
are not used because they are written in a different language (e.g., French). 
 
Our final dataset consists of 672 student evaluation .txt files. All student 
evaluations are collected over the cycle of two academic years (fall 2013 to spring 
2015). Among them, part of the student evaluations are collected from 21 different 
courses in the academic year of fall 2013 to spring 2014, while the rest are from 
19 different courses in the academic year of summer 2014 to spring 2015. These 
courses include traditional sciences, social sciences, humanities, and arts. All but 
four of the applied learning courses covered are ETEAL-supported courses. 
 

 

 

Figure 1: Each pie chart shows the distribution of the scores all student evaluations 
received for each category of EBC 1, EBC 2, EBC 3A, and EBC 3B. The notation used 
above shows the score received and a count of the student evaluations that received 
that score. For example, (1, 97) represents 97 student evaluations that received a 
score of 1. 
 
As mentioned previously, student evaluations are scored on four separate criteria. 
In this study, pie charts for EBC 1, EBC 2, EBC 3A, and EBC 3B are created to better 
visualize the manually assigned scores, which are shown in Figure 1. It is clear 
that most student evaluations are scored with a 1 or 2. Note that the student 
evaluations are scored by human scorers using the provided scoring rubric. Also, 
when a student evaluation is scored as 0, this can imply either that the student 
evaluation was written poorly or that no student evaluation was ever received. 
 



Methodology of Text Mining Techniques 

Text mining techniques are performed on the cleaned student evaluation data using 
both the statistical programming language R and NVivo. The characteristics, 
including strengths and weaknesses, of both software packages are compared in detail 
below. 

Challenges of Text Mining with R 

In our text mining investigation, we begin analyzing student evaluations in the 
statistical computing software R. In order to perform text mining analysis, 21 
required packages must first be installed in R. A directory is set up where all 
original .txt files are loaded into R to begin analysis. Next, the files are loaded from 
the directory as the source of the files making up the corpus. The function Corpus 
in R uploads all the files. To begin with, these files are named original documents so 
they can later be used for comparison. To prepare for text analysis, more pre-
processing of the documents needs to be done. First, all numbers and punctuation are 
removed from the original documents. When numbers, punctuation, and stop words are 
removed, they are replaced by white space where the word, number, or symbol had 
originally been in the corpus. To remove this white space, we use a command in R 
that strips any extra remaining white space. All text 
characters in the documents are converted to lowercase characters. Next, all 
English stop words are removed. English stop words are common words found in 
the English language. There exist 174 common stop words in the English language. 
Before moving forward to stemming and stem completion, it is important to check 
all student evaluations for spelling errors. This may seem trivial, yet it is essential 
in order to yield an accurate result. Correcting a spelling error in R requires a new 
line of code for each correction. To avoid this, all evaluations are manually checked 
for spelling errors and updated.  
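
To make these pre-processing steps concrete, the sketch below shows one way they could be carried out with the R tm package; the directory path and object names are illustrative rather than the authors' actual code.

library(tm)

# Load all transcribed .txt student evaluations from a directory (path is illustrative)
docs <- VCorpus(DirSource("evaluations_txt"))

docs <- tm_map(docs, removeNumbers)                       # remove all numbers
docs <- tm_map(docs, removePunctuation)                   # remove all punctuation
docs <- tm_map(docs, content_transformer(tolower))        # convert text to lowercase
docs <- tm_map(docs, removeWords, stopwords("english"))   # remove the 174 English stop words
docs <- tm_map(docs, stripWhitespace)                     # strip the leftover white space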
 
Table 1: Term Frequency Table 

Least frequent terms 
  Term frequency:      1     2     3     4     5     6     7     8     9    10 
  Number of terms:  3256  1127   682   444   334   244   195   172   138   114 

Most frequent terms 
  Term frequency:   1690  1737  1743  1755  1956  2006  2216  2467  2625  3190 
  Number of terms:     1     1     1     1     1     1     1     1     1     1 

Note: This table provides a brief summary of the frequency distribution of terms 
appearing in the student evaluations, for the least frequent and most frequent terms 
as computed by R. For example, the table shows that there are 3,256 terms that 
appear only once in the evaluations. At the other extreme, there is one term that 
appears 3,190 times. 

Lastly, in the pre-processing phase, stemming and stem completion is done on all 
documents. Stemming is the process of reducing words to their base form. 
Sometimes a word is stemmed to a phrase that is not a base form itself and stem 
completion completes the phrase back to a base form. Stem completion uses a 
dictionary created by the original documents. For example, “argue”, “argued”, 
“argues”, and “arguing” reduce to the stem “argu”. Then, R refers to the dictionary 
to stem complete “argu” back to a base form. At this stage, R tends to have 
difficulties with stemming and stem completion. To list a couple examples, “many” 
is stemmed into “maniac” and “really” is stemmed into “reallife.” Outputting both 
the results after stemming and the stem completion into an Excel file allows us to 
compare with the original documents and find where mistakes are made.  
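
As a rough illustration of this step, the following sketch applies stemming and stem completion with the tm and SnowballC packages, using the unstemmed corpus as the completion dictionary; the helper function and object names are hypothetical.

library(tm)
library(SnowballC)

dictionary <- docs                           # unstemmed corpus kept as the completion dictionary
docs_stemmed <- tm_map(docs, stemDocument)   # e.g., "argue", "argued", "argues" -> "argu"

# Complete each stem back to the most prevalent matching word in the original documents
complete_stems <- function(doc, dict) {
  words <- unlist(strsplit(content(doc), "\\s+"))
  paste(stemCompletion(words, dictionary = dict, type = "prevalent"), collapse = " ")
}
docs_completed <- lapply(docs_stemmed, complete_stems, dict = dictionary)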
 

 

Figure 2: A bar graph of the 20 most frequent terms by R. This graph allows for a 
better visualization of the terms that appear most frequently throughout the 
evaluations. 
 
A document-term matrix (dtm) is obtained, as a matrix with the 672 student 
evaluations as the rows and the terms found in the student evaluations as the 
columns. Each cell in the matrix is a frequency/count. Inspecting the dtm shows the 
distribution of the terms and the percentage of sparsity found in the matrix. To 
obtain the distribution of term frequencies, the dtm must be converted into a 
regular matrix and then the sum of columns is taken. Ordering the term frequencies 
allows a list to easily be created showing the least and most frequent terms, with a 
sample shown in Table 1, for easier interpretation. At first inspection of the dtm, it 
is revealed that the dtm contains 98% sparsity. Sparsity refers to infrequent terms 
occurring in the student evaluations. For example, in Table 1, there are 3,256 
terms that only appear once in the student evaluations. R has a function to remove 
a selection of sparse terms. After sparse terms are removed, the dtm now contains 
37% sparsity, which is a huge improvement.  
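
A sketch of how the document-term matrix, the term-frequency summary in Table 1, and the sparsity reduction could be produced with the tm package is shown below; "docs" stands for the pre-processed corpus from the earlier sketches, and the sparsity threshold is illustrative since the exact value used is not reported.

dtm <- DocumentTermMatrix(docs)              # 672 evaluations as rows, terms as columns

freq <- sort(colSums(as.matrix(dtm)))        # total frequency of each term (cf. Table 1)
head(freq, 10)                               # least frequent terms
tail(freq, 10)                               # most frequent terms

dtm_small <- removeSparseTerms(dtm, 0.99)    # drop very sparse terms (threshold illustrative)
inspect(dtm_small)                           # reports dimensions and remaining sparsity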

 

Figure 3: A visual representation of the word cloud produced by R. 
 
After all the pre-processing is done and the dtm is created, we can analyze the data 
to get better visual representations. Figure 2 shows the 20 most frequent words 
found in all student evaluations. It appears that some of these words may be deemed 
insignificant for what we are interested in (e.g., will and also, among others). To 
better understand the significance of these terms, it is important to look at the 
context in which the terms are used. A better visualization of the most frequent 
terms is shown in the word cloud produced by R in Figure 3. 
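
Plots like Figures 2 and 3 can be generated with base R and the wordcloud package, as in the sketch below; the plotting parameters are illustrative.

library(wordcloud)
library(RColorBrewer)

freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
barplot(freq[1:20], las = 2, main = "20 most frequent terms")      # cf. Figure 2
wordcloud(names(freq), freq, max.words = 100,
          colors = brewer.pal(8, "Dark2"), random.order = FALSE)   # cf. Figure 3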
 
Alternative Text Mining with NVivo 
  
As previously mentioned, R lacks an efficient way to examine the context of a term 
and performed poorly in the pre-processing stage of stemming and stem completion. 
These drawbacks steer us away from R and toward the qualitative analysis software 
NVivo. NVivo can create a flowchart of a term across all the student evaluations, 
which allows a deeper look at the context of the term in question than human manual 
interpretation when categorizing a term as significant or insignificant. NVivo can 
also produce word clouds with a chosen number of significant terms faster and more 
efficiently than R. In addition, NVivo has an option to group together like terms 
(stemming and stem completion) by simply clicking a button. This fixes 
the mistakes caused by R, after running stemming and stem completion on all 
student evaluations.  
 
Hereafter, NVivo is used as the primary software for all text-based analysis. The 
NVivo option of word count query is used to produce the word cloud shown in 
Figure 4, which includes the 100 most frequent words that appear in all student 
evaluations. In the word cloud, different words are depicted with different colors 
and font sizes. The font size is directly related to the frequency of the 100 most 
frequent words found. NVivo also has a word count query that allows us to search for 
each of the most frequent words across all student reflections and provides a count 
of how many times each word appears in a given reflection. This function was used to 
generate the document-term matrix for all data mining classification techniques 
conducted below. 
 

 

 

Figure 4: A word cloud produced quickly and efficiently by NVivo that shows the 
100 most frequent words found in all student evaluations.  
 
As mentioned above, we use the "stemmed words" option to group together like terms 
so that no term appears more than once in the word cloud. We see very similar 
results when comparing the word clouds produced by R and NVivo. Table 2 illustrates 
the count of a term and the terms that are grouped together under it, to give a 
better idea of how NVivo performs stemming and stem completion. It is important to 
note that "experience" is included in the word cloud by R, whereas "experiments" is 
shown in the NVivo word cloud; note that in Table 2, "experience" is grouped 
together with the term "experiments". NVivo is able to quickly produce a flowchart 
of the context of a term across all evaluations. However, it is a large flowchart 
that requires time to sift through. Hereafter, it is assumed that all of the most 
frequent terms are used in a positive and significant context. 



Table 2: Word frequency provided by NVivo 

Word          Count   Similar Words 
Learn         3181    learn, learned, learning, learns 
Works         2641    work, worked, working, workings, works 
Experiments   2288    experience, experiences, experiment, experimented, 
                      experimenting, experiments 
Helps         2000    help, helped, helpful, helping, helps 

Note: NVivo software has a “stemmed words” option that groups together like 
terms when calculating word frequency. 
 
R is able to produce a document-term matrix (dtm), which is a matrix containing a 
count of the number of times each term appears in each of the student evaluations. 
NVivo has a similar function under its word search query option. This option allows 
the user to input a term, and NVivo produces a list of all evaluations in which the 
term is located and a count of the term's occurrences in each individual evaluation. 
A drawback of this option in NVivo is that it does not include the evaluations in 
which the term is not found. This creates a difficulty when generating a larger 
matrix that includes all evaluations as rows and the most frequent terms as columns, 
with a count of each frequent term in each cell. Another drawback of this NVivo 
option is that it only allows the user to search for one word at a time. Due to 
these drawbacks, the matrix had to be entered manually, which proves to be a 
time-consuming process. Once this document-term matrix is created for the 100 most 
frequent terms and laid out in Excel to show how many times these 100 most frequent 
words occur in each individual student evaluation, data mining techniques are used 
to further analyze the student evaluations quantitatively. 
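
Once the 672 x 100 matrix has been assembled in Excel, it can be read back into R for the data mining step, for example with the readxl package; the file name below is hypothetical.

library(readxl)

# Rows: the 672 student evaluations; columns: the 100 most frequent terms from NVivo
dtm_nvivo <- read_excel("nvivo_document_term_matrix.xlsx")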
 
Methodology of Data Mining Techniques 

In this paper, after the document-term matrix (dtm) is obtained from the text 
mining techniques, we first consider four different classifiers to assess the 
classification performance: Random Forest, K-nearest neighbors (KNN), and Support 
Vector Machines (SVM) with a linear kernel and with a radial kernel. 
Suppose there are n observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where 
$x_i \in \mathbb{R}^p$ and $y_i \in \{0, 1\}$, representing a score of Low or High. 
Random forest is a statistical classifier 
developed by Breiman (2001). Random forest builds a number of decorrelated 
decision trees, and then uses the mode of the predictions from the decision trees as 
the model output. Breiman (2001) suggests that as the number of the trees in the 
forest increases, the generalization error of random forest converges almost surely 
to a limit. Thus, the weak but unbiased decision trees produce relatively efficient 
predictions. In order to decorrelate the trees, a random sample of predictors is 
chosen from the full set of predictors at each split in a tree.   
 
Let $n$ be the number of data observations and let $d$ be the number of predictors. 
Suppose the number of decision trees to be built is $N_T$, with minimum node size 
$n_{node}$. The algorithm for random forest classification is as follows: 

(1) Draw a bootstrap sample of size $n$ from the training observations. 
(2) With the bootstrapped data, grow a tree by repeating the following steps: 
    i. Select $m$ variables at random from the $d$ predictors. 
    ii. Find the best variable among the $m$ selected variables, as well as the best 
        split point for classification. 
    iii. Split the node into two descendant nodes according to the chosen split. 
    iv. Stop growing the tree when the minimum node size $n_{node}$ is reached for 
        all terminal nodes. 
(3) Repeat steps (1) and (2) $N_T$ times to obtain a collection of trees 
    $\{T_i\}_{i=1}^{N_T}$. 
(4) For any input vector $x$, let $G_i(x)$ be the class prediction from the $i$th 
    random forest tree. The prediction from the random forest is 
    $G(x) = \text{mode of } \{G_i(x)\}_{i=1}^{N_T}$. 
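
A sketch of how such a random forest could be fit in R with the randomForest package follows; "alldata" stands for the document-term matrix merged with the High/Low response, and the variable names and number of trees are illustrative.

library(randomForest)

set.seed(1)
# ebc1_highlow must be a factor (High/Low) for classification; names are illustrative
rf_fit  <- randomForest(ebc1_highlow ~ ., data = alldata, ntree = 500)   # N_T = 500 trees
rf_pred <- predict(rf_fit, newdata = testdata)                           # majority vote over trees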
 
The K-nearest neighbors classifier is memory-based. Given a query point, say $x_0$, 
assume we find the K training points closest in distance to $x_0$ among the n 
observations, say $(x_1^*, y_1^*), (x_2^*, y_2^*), \ldots, (x_K^*, y_K^*)$, which 
satisfy 

$\|x_1^* - x_0\| \le \|x_2^* - x_0\| \le \cdots \le \|x_n^* - x_0\|$, 

where $\|\cdot\|$ represents the Euclidean distance. Let $H(x_0)$ be the class 
prediction for the query point $x_0$. Then 
$H(x_0) = \text{mode of } \{y_1^*, y_2^*, \ldots, y_K^*\}$, by the majority vote of 
its K nearest training points. K can take any integer value up to the sample size. 
To determine the best K for our experiments, 5-fold Cross-Validation (CV) is applied 
to choose the K value that minimizes the Cross-Validation prediction error: 
$\min_K CV_{error}(K)$. 
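
A sketch of choosing K by 5-fold cross-validation and then classifying with K-nearest neighbors is shown below, using the caret and class packages; the candidate K values and object names (train_x, train_y, test_x) are illustrative.

library(caret)
library(class)

set.seed(1)
# train_x: document-term matrix predictors; train_y: High/Low factor labels
knn_cv <- train(x = train_x, y = train_y, method = "knn",
                tuneGrid = data.frame(k = 1:25),
                trControl = trainControl(method = "cv", number = 5))
best_k <- knn_cv$bestTune$k                          # K that minimizes the CV prediction error
knn_pred <- knn(train = train_x, test = test_x, cl = train_y, k = best_k)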
 
The technique of Support Vector Machines is considered as a method of classifying 
the data into the newly created High/Low variable. In the binary setting, suppose 
there are n observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where 
$x_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$. SVM aims to find a separating 
hyperplane that best separates the two classes and produces a low classification 
error. The optimal hyperplane is the one that passes farthest from all training 
observations, with a maximum-margin separating hyperplane $w \cdot x + b = 0$ in the 
feature space obtained through the quadratic program 

$\min_{w,b} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$, subject to 
$y_i(w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0, \ \forall i$, 

where $\|\cdot\|$ represents the $\ell_2$ vector norm, $w$ is the normal vector to 
the hyperplane, and the parameter $b/\|w\|$ determines the offset of the hyperplane 
from the origin. The constant $C > 0$ is a "cost" parameter that must be carefully 
tuned against the "counts" of feature points $\sum_{i=1}^{n} \xi_i$ that lie within 
the margin or on the wrong side of the hyperplane. In the case that the data are 
linearly separable, we select two parallel hyperplanes that separate the two classes 
of data and maximize the distance between them. Geometrically, the distance between 
the two parallel hyperplanes defined above is $2/\|w\|$, so maximizing this distance 
is achieved by minimizing $\|w\|$. 

To extend the method of SVM to cases in which the data are not linearly separable, 
we can consider a kernel function $\kappa(x, x')$: 

$\kappa(x, x') = \Phi(x) \cdot \Phi(x') = \exp(-\gamma \|x - x'\|^2), \ \forall x, x' \in \mathbb{R}^p$, 

where $\gamma$ is a positive constant and $\Phi$ is a function that maps the training 
examples into some feature space $\mathcal{F}$ such that 
$\Phi: \mathbb{R}^p \mapsto \mathcal{F}$. 
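
The linear- and radial-kernel SVMs can be fit in R with the e1071 package, as in the sketch below; the cost and gamma values are illustrative and would be tuned in practice, and "alldata" again denotes the merged dtm and High/Low response.

library(e1071)

svm_linear <- svm(ebc1_highlow ~ ., data = alldata, kernel = "linear", cost = 1)
svm_radial <- svm(ebc1_highlow ~ ., data = alldata, kernel = "radial", cost = 1, gamma = 0.01)
svm_pred   <- predict(svm_radial, newdata = testdata)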
 
Furthermore, 5-fold Cross-Validation is considered to evaluate the performance of 
these four different classifiers. In 5-fold Cross-Validation, the dataset is randomly 
divided into five folds with approximately equal size. Then one fold is held out and 
treated as a validation set, while the remaining four folds are treated as a training 
set to build a classification system. This procedure is repeated five times, with a 
different fold of observations treated as a validation set until all folds have been 
used as a test dataset. 
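
The sketch below illustrates this 5-fold procedure for one classifier; the fold assignment, data object, and response name are illustrative.

set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(alldata)))    # randomly assign each evaluation to a fold
accuracy <- numeric(5)
for (k in 1:5) {
  train_set <- alldata[folds != k, ]                     # four folds form the training set
  test_set  <- alldata[folds == k, ]                     # the held-out fold is the validation set
  fit  <- randomForest::randomForest(ebc1_highlow ~ ., data = train_set)
  pred <- predict(fit, newdata = test_set)
  accuracy[k] <- mean(pred == test_set$ebc1_highlow)
}
mean(accuracy)                                           # overall accuracy across the five folds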

 
Ensemble Learning to Improve Classification Performance 
 
To further improve the overall performance, ensemble learning, which fuses multiple 
predictive decisions into a final decision, is a potential way to obtain a more 
robust decision (Polikar, 2006; Moreno-Seco et al., 2006). For example, classifier 
ensembles with different combination techniques have been widely explored in recent 
years. These methods have been shown to reduce the error rate in classification 
tasks compared to an individual classifier in a broad range of applications. In 
decision fusion with ensemble-based systems, it is 
important to consider the diversity of decisions to be fused, with respect to diverse 
classifiers. In our analysis, we consider fusing independent classifiers among 
Random Forest, K-nearest neighbors, and Support Vector Machines with radial 
kernel.   
 
For the $i$th observation $x_i$, let $G(x_i)$, $H(x_i)$, and $J(x_i)$ be the class 
predictions from Random Forest, K-nearest neighbors, and Support Vector Machines, 
respectively. Then the final class prediction from ensemble learning is given by 
$F(x_i) = \text{mode of } \{G(x_i), H(x_i), J(x_i)\}$. 
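
A minimal sketch of this majority-vote fusion in R is shown below; the three prediction vectors are assumed to come from the fitted Random Forest, KNN, and radial-kernel SVM models in the earlier sketches.

# Final label = mode of the three classifiers' predictions (no ties with 3 voters and 2 classes)
fuse_predictions <- function(g, h, j) {
  votes <- cbind(as.character(g), as.character(h), as.character(j))
  apply(votes, 1, function(v) names(which.max(table(v))))
}
fused_pred <- fuse_predictions(rf_pred, knn_pred, svm_pred)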
 
Results of Data Mining Techniques 

Using the document-term matrix (dtm), data mining techniques can now be applied to 
classify these student evaluations into two categories, High or Low. All data mining 
techniques are performed in R. To achieve this, a new response variable of High or 
Low is first created for EBC 1, based on both the distribution of scores shown in 
the pie charts in Figure 1 and the criteria of the applied learning scoring rubric 
shown in Appendix B. This procedure is then repeated for EBC 2, EBC 3A, and EBC 3B, 
creating four new response variables. With these factors in mind, all student 
evaluations that received a score of 2 or below 
are classified in the Low class, and student evaluations receiving a score of 3 or 4 
are classified as High. The aforementioned document-term matrix (dtm) is then merged 
with each student evaluation's corresponding EBC 1, EBC 2, EBC 3A, and EBC 3B 
scores, yielding four response variables of High or Low. 
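
A sketch of this binarization and merging step is shown below; "scores" is a hypothetical data frame of the manual EBC scores aligned with the dtm rows, and the same construction is repeated for EBC 2, EBC 3A, and EBC 3B.

dtm_df <- as.data.frame(dtm_nvivo)        # 672 evaluations x 100 most frequent terms

# Scores of 0-2 are labeled Low, scores of 3-4 are labeled High (EBC 1 shown; others analogous)
dtm_df$ebc1_highlow <- factor(ifelse(scores$EBC1 <= 2, "Low", "High"))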
 
Random Forest, K-nearest neighbors, and Support Vector Machines (with either a 
linear or a radial kernel) are all considered as classification techniques, using 
5-fold Cross-Validation, to analyze the free-style text of student evaluations. 
Each classification method is run separately on the High or Low category that a 
student evaluation receives for each of EBC 1, EBC 2, EBC 3A, and EBC 3B. The 
overall accuracies are shown in Table 3. Compared to EBC 2, EBC 3A, and EBC 3B, the 
overall prediction accuracy for student reflection EBC 1 scores is the lowest, at 
around 65-68%. On the other hand, the EBC 2 student evaluation scores have a 
stronger accuracy of 78-82%. EBC 3A scores hold around a 70-73% overall prediction 
accuracy, and EBC 3B scores have the highest overall prediction accuracy of 83-85%. 
A graphical visualization of the overall accuracies produced by each classification 
method is shown in Figure 5. From Figure 5, it is interesting that the 
classification results for EBC 1 show outliers consistently across all four 
classification methods applied. A possible reason why the overall accuracies for 
EBC 1 are lower than those for EBC 2, EBC 3A, and EBC 3B is that students' 
expectations, purposes, and/or goals of the experience in terms of personal 
educational development can draw on a larger range of terms and/or be less 
associated with the terms from the document-term matrix. 
 
Table 3: Prediction rates table 

                          EBC 1    EBC 2    EBC 3A   EBC 3B 
Random Forest             0.671    0.813    0.723    0.840 
K-nearest neighbors       0.674    0.809    0.728    0.845 
SVM with Linear Kernel    0.657    0.784    0.701    0.838 
SVM with Radial Kernel    0.668    0.807    0.707    0.845 
Ensemble Learning         0.685    0.817    0.707    0.856 
Note: 5-fold Cross-Validation is run on the four models of Random Forest, K-nearest 
neighbors, and Support Vector Machines with either a Linear or a Radial Kernel. Each 
classification method is run four different times, using each of the EBC scores 
(EBC 1, EBC 2, EBC 3A, and EBC 3B) as the response variable. The overall accuracies 
across the five folds are shown in the table. The Ensemble Learning row shows 
accuracies after decision fusion is used to combine the classifiers KNN, Random 
Forest, and SVM with a radial kernel. For KNN, a different K (the number of 
neighbors) is chosen each time after running 5-fold Cross-Validation to determine 
the best K. 
 
Decision fusion is further considered as a method to improve the classification 
performance. Random Forest, K-nearest neighbors, and Support Vector Machines with 
radial kernel are used in the decision fusion approach. It is important in decision 
fusion that all methods are independent of one another and that an odd number of 
methods is used so that no ties are created. It is revealed 
that the majority of the misclassified observations in the data are the observations 
with a ground truth of High that are misclassified as Low. Decision fusion is once 
again run on all EBC scores. The accuracies are illustrated in Table 3. Overall 
prediction accuracies are improved slightly. It is shown that the decision fusion 
approach results in higher accuracy than any of the individual classifiers for 
EBC 1, EBC 2, and EBC 3B. For EBC 3A, even though the decision fusion approach does 
not lead to the highest accuracy, the accuracy is still competitive compared to the 
individual classifiers. These results indicate that decision fusion with an ensemble 
is effective in this text mining task. 
 

 

Figure 5: Boxplots of the overall accuracies for the four classification methods of 
Random Forest, K-nearest neighbors (KNN), and Support Vector Machines with Linear 
and Radial Kernels, using EBC 1, EBC 2, EBC 3A, and EBC 3B as response variables. 
 

Conclusion and Recommendations 

Our dataset from two academic years (fall 2013 to spring 2015) is studied 
systematically to provide preliminary analysis results. The results of our 
experiments show that text mining is a promising technique to analyze open-ended, 
free-style, text-based student reflections quantitatively and automatically. Text 
mining can be an effective way to analyze text responses and to predict how a 
student evaluation will score, which in turn reveals how well a course and/or 
instructor is performing. Converting these text-based student evaluations into 
quantitative information allows one to gain additional insights to evaluate student 
performance, instructor performance, and course performance. One can also gain a 
deeper understanding of individual schools at the university, departments, and 
majors as well, and eventually evaluate the impact on the implemented 
instructional practice and the student learning outcomes.  
 
Data mining classification methods show promising overall prediction accuracies for 
all EBC scores of student evaluations. Decision fusion is implemented to further 
improve classification accuracies, and while it does so, the change in overall 
prediction accuracy over the three individual classification methods is modest. 
Although accuracies held fairly steady after decision fusion, the method does allow 
for a deeper understanding of the data being analyzed. Decision fusion identifies 
the individual student evaluations that are misclassified, which can reveal which 
departments or majors have the most incorrect classifications and suggest how 
students in those departments or majors perform on, or are motivated by, 
evaluations. Providing faculty and administrators with this information enables them 
to interpret results more critically and to make rational and fair decisions 
regarding teaching effectiveness (Hou, Lee, & Gunzenhauser, 2017). 
 
Analyzing student evaluations by terms is a significant way to analyze the applied 
learning program as a whole, as well as the effectiveness of the applied learning 
program on overall student learning. Analyzing text-based student evaluations 
provides additional insights. For example, a higher EBC score is associated with 
greater student performance and/or understanding of the course, and we expect that 
such motivated students will provide more in-depth and meaningful evaluations. In 
addition, this well-trained text mining and data mining system can be applied to 
future applied learning student evaluations. In that case, new student evaluations 
would be pre-processed with the same text mining steps described previously and fed 
into the data mining system, and the scores for EBC 1, EBC 2, EBC 3A, and EBC 3B 
would be produced automatically. This framework can be an efficient way to provide 
a quick preliminary analysis for program evaluation of instructional practice and 
student learning. 
 
Abd-Elrahman et al. (2010) support the claim of automatic text mining techniques 
as a good method to investigate open-ended student course evaluation. This study 
extends the original two categories of negative and positive comments made 
regarding the course for each evaluation into four categories of benchmark, 
milestone-I, milestone-II, and capstone. Data Mining techniques are incorporated in 
our study for quantitative analysis. The promising overall prediction accuracies 
demonstrate that such automatic quantitative analysis of student evaluations can 
be an effective approach to applied learning assessment. 
 
Hou, Lee, and Gunzenhauser (2017) support the claim that these evaluations are 
instruments that can support transformative decisions for improving the quality of 
teaching. Valuing the contributions of students and faculty in this process could 
help prevent erroneous decisions based on biased student feedback. 
 



References 

Abbott, R.D., Wulff, D.H., Nyquist, J.D., Ropp, V.A. & Hess, C.W. (1990). 
Satisfaction with processes of collecting student opinions about instruction: 
The student perspective. Journal of Educational Psychology, 82, 201-206.   

Abd-Elrahman, A., Andreu, M., & Abbott, T. (2010). Using text data mining  
techniques for understanding free-style question answers in course 
evaluation forms. Research in Higher Education Journal, 9, 11–21. 

Beleche, T., Fairris, D., & Marks, M. (2012). Do course evaluations truly reflect 
student learning? Evidence from an objectively graded post-test. Economics 
of Education Review, 31, 709-719. 

Boring, A. (2017). Gender biases in student evaluations of teaching. Journal of 
Public Economics, 145, 27-41. 

Chen, Y., & Hoshower, L. B. (2003). Student evaluation of teaching effectiveness: 
An assessment of student perception and motivation. Assessment & 
Evaluation in Higher Education, 28(1), 71-88. 

Cohen, K. B., & Hunter, L. (2008). Getting started in text mining. PLoS 
Computational Biology, 4(1), e20. doi:10.1371/journal.pcbi.0040020 

Cronbach, L. J. (1963). Course improvement through evaluation. Teachers College 
Record, 64, 672-683. 

Fike, D. S., Fike, R., & Zhang, S. (2015). Teacher qualities valued by students: A 
pilot validation of the teacher qualities (T-Q) instrument. Academy of 
Educational Leadership Journal, 19(3), 115-125. 

Gal, Y., & Gal, A. (2014). Knowledge bias: Is there a link between students' 
feedback and the grades they expect to get from the lecturers they have 
evaluated? A case study of Israeli colleges. Journal of the Knowledge 
Economy, 5(3), 597-615. doi:10.1007/s13132-014-0188-5 

Galbraith, C. S., Merrill, G. B., & Kline, D. M. (2012). Are student evaluations of 
teaching effectiveness valid for measuring student learning outcomes in 
business related classes? A neural network and Bayesian analyses. Research 
in Higher Education, 53(3), 353-374. 
doi:http://dx.doi.org.liblink.uncw.edu/10.1007/s11162-011-9229-0  

Delgado, M., Martín-Bautista, M.J., Sánchez, D., & Vila, M.A. (2002, September). 
Mining text data: special features and patterns. Proceedings of EPS 
Exploratory Workshop on Pattern Detection and Discovery in Data Mining, 
London.  

Douglas, P.D. & Carroll, S.R. (1987). Faculty evaluations: Are college students 
influenced by differential purposes? College Student Journal, 21(4). 

Evans, C. (2013). Making sense of assessment feedback in higher education. 
Review of Educational Research, 83(1), 70-120. 
doi:10.3102/0034654312474350 

Hastie, T., Tibshirani, R. & Friedman, J. (2016). The Elements of Statistical 
Learning: Data Mining, Inference, and Prediction. New York: Springer.  

Heller, H.W., & Clay, R.J. (1993). Predictors of teaching effectiveness: The efficacy 
of various standards to predict the success of graduates from a teacher 
education program. ERS Spectrum, 11, 7-11. 

Hou, Y., Lee, C., & Gunzenhauser, M.G. (2017). Student evaluation of teaching as a 
disciplinary mechanism: A Foucauldian analysis. The Review of Higher 
Education, 40(3), 325-352. 

Marsh, H.W. (1984). Students’ evaluations of university teaching: Dimensionality, 
reliability, validity, potential biases and utility. Journal of Educational 
Psychology, 76(5), 707-754.  

Marsh, H.W. (1987). Students’ evaluation of university teaching: Research findings, 
methodological issues and directions for future research. International 
Journal of Educational Research, 11(2), 253-388.  

Moreno-Seco, F., Inesta, J. M., de León, P. J. P., and Micó, L. (2006). Comparison 
of classifier fusion methods for classification in pattern recognition tasks. In 
Structural, Syntactic, and Statistical Pattern Recognition (pp. 705-713). 
Berlin, Germany: Springer-Verlag. 

Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and 
Systems Magazine, 6(3), 21-45. 

Ramsden, P. (2003). Learning to teach in higher education (2nd ed.) London: 
Routledge Falmer.  

Scriven, M. (1967). The methodology of evaluation. In R.W. Tyler, R. M. Gagné, & 
M. Scriven (Eds.), Perspectives of curriculum evaluation. Chicago, IL: Rand 
McNally. 

Stake, R. E. (n.d.). The Countenance of Educational Evaluation. Center for 
Instructional Research and Curriculum Evaluation, University of Illinois.  

Tan, A. (1999). Text mining: The state of the art and the challenges. In 
Proceedings, PAKDD ’99 Workshop on Knowledge Discovery from Advanced 
Databases (KDAD ’99). 

Uttl, B., White, C., & Gonzalez, D. (2017). Meta-analysis of faculty’s teaching 
effectiveness: Student evaluation of teaching ratings and student learning 
are not related. Studies in Educational Evaluation, 54, 22-42. 

Wagner, N., Rieger, M., & Voorvelt, K. (2016). Gender, ethnicity and teaching 
evaluations: Evidence from mixed teaching teams. Economics of Education 
Review, 54, 79-94.  

Zhao, J. & Gallant, D. (2012). Student evaluation of instruction in higher education: 
Exploring issues of validity and reliability. Assessment & Evaluation in Higher 
Education, 37(2), 227-235.  
 
 
 
 
 
 
 

 
  



Appendix A 
 
Here is a sample of guidance for both the initial reflection and the final reflection for 
ETEAL supported pedagogy initiatives: (Note: SLO represents Student Learning 
Outcome) 
 
Intention reflection prompts (at the start of the semester): 
Explains in depth the purpose for engaging in the experience and directly links it to 
personal educational development through expected educational outcomes. Your 
intention reflection should be typed in one page, answering the following questions. 
(SLO1) a. Articulate your expectation from, and the reason for participation in this 
project. 
(SLO1) b. Examine and explain what you hope to gain from this experience in 
terms of personal, educational, and/or career goals. 
(SLO1) c. Explain what statistical methods, presentation and communication skills, 
and use of technology you hope to learn from this project. 
(SLO1) d. Explain the impact (on others or on the field) that you hope to make 
through this project. 
 
 
Final reflection prompts (upon completion of the course): 
(SLO2) Summarize the relevant theories, ideas and skills you were able to apply in 
this project. 
(SLO2) Demonstrate how you applied what you learned from other courses to complete 
this project. 
(SLO3) Summarize your teamwork and/or leadership experience through this 
project. 
(SLO3) Over the several presentation occasions, explain how you addressed questions 
from people of different fields, and the lessons you have learned to improve your 
oral presentation and communication skills. 
(SLO3) Summarize the significance of your work in the field from this project. 
(SLO3) Summarize a personal challenge and how you overcame it during this 
project. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Appendix B 
 
Here is a sample of the scoring rubric for human manual scoring (Revised October 
2014):