Journal of Applied Engineering and Technological Science Vol 3(2) 2022 : 90-97 90 APPLICATION OF C5.0 ALGORITHM IN PREDICTION OF LEARNING OUTCOMES IN CALCULUS SUBJECT Fida Nafisah Giustin1*, Betha Nurina Sari2, Tesa Nur Padilah3 Program Studi Teknik Informatika, Universitas Singaperbangsa Karawang fida.nafisah17099@student.unsika.ac.id¹*, betha.nurina@staff.unsika.ac.id ², tesa.nurpadilah@staff.unsika.ac.id3 Received : 01 May 2022, Revised: 19 June 2022 , Accepted : 20 June 2022 *Corresponding Author ABSTRACT Calculus is one of the basic subject that must be studied at the computer science faculty of the informatics engineering study program. For some students, especially in the Faculty of Informatics Engineering, calculus is a subject that is considered quite difficult, even though this subject is important for them. And the resulted for some students having to repeat this subject. For this reason, predictions of calculus learning outcomes are carried out by applying the data mining process and using the C5.0 method for the prediction process based on the classification concept that will be carried out. This study applies the Cross Industry Standard Process for Data Mining (CRISP – DM) methodology with the C5.0 algorithm. The results are in the form of a decision tree (Decision tree) and the rules in it using the attributes of guardian, number of family members, status of residence, internet, activity, desire to continue study, the last education of parents (father and mother), parents' occupations, grades on assignments, UAS, and UTS. The C5.0 algorithm is able to predict the results of learning calculus. The evaluation results show that the applied C5.0 algorithm has an accuracy of 95%. Keywords : data mining, prediction, C5.0 algorithm, CRISP-DM, student achievements 1. Introduction Education is a conscious and planned effort to create a learning atmosphere and learning process so that students actively develop their potential to have religious spiritual strength, self- control, personality, intelligence, noble character and skills needed by themselves, society, nation and state (UU National Education System No. 20 of 2003). Educational institutions are places where learning activities take place with the aim of changing individual behavior for the better through interaction with the surrounding environment (Rahman, K, 2018). One type is formal educational institutions such as universities. Currently there are many universities in Indonesia. One of them is the Singaperbangsa Karawang University. Singaperbangsa Karawang University is one of the universities located in West Java. Founded on February 2, 1982 by the Pangkal Perjuangan Higher Education Foundation and became a state university on October 6, 2014. There are several study programs running at Unsika, one of which is Informatics Engineering. Informatics Engineering is one of the study programs that focuses on dealing with the problem of transformation or data processing by making optimal use of computer technology through logical processes. In it, there are basic courses that must be studied, namely calculus. For some students, especially in the Faculty of Informatics Engineering, calculus is a subject that is considered quite difficult, even though this course is very important for them. This resulted in some students having to repeat this course. At computer science faculty Unsika itself, from 2016 -2019 there were still students who got a final score of less than 5 or a quality score below D and repeated the course to improve their calculus scores. In 2016 there were 5 students who repeated the calculus course. In 2017, 6 students repeated the calculus course. In 2018 the number of computer science faculty students who repeated was 4 people and in 2019 as many as 6 students repeated calculus courses from a total of 684 computer science students. From several previous studies, according to P. Sokkhey & T. Okazaki there are 10 important factors that influence student learning outcomes (Sokkhey, P., Navy, S., Tong, L., & Okazaki, T, 2020). According to Benedict et al, prediction of student academic performance using the decision tree method achieved an accuracy of 71.661% (Benediktus, N., & Oetama, R. S, Gustin et. al… Vol 3(2) 2022 : 90-97 91 2020). According to Zhang X, by using a decision tree on the student dataset, a pattern was found where current learning outcomes depend on previous results (Zhang, X., Xue, R., Liu, B., Lu, W., & Zhang, Y, 2018). According to Ulfi et al, using Decision Tree C5.0 to perform early detection of drop out students with GPA and attendance as attributes for classifying. The results obtained are 93% accuracy with an error rate of 5% (Aesyi, U. S., Lahitani, A. R., Diwangkara, T. W., & Kurniawan, R. T., 2021). Sari predicts student learning outcomes in mathematics, there are 25 variables that are effectively used Sari B. N, 2017). So this research was conducted to predict the calculus learning outcomes of the faculty of computer science in the Informatics Engineering study program by applying the data mining process and using the C5.0 method for the prediction process based on the classification concept to be carried out. The difference between this research and previous research is that the object of research uses the latest data from the calculus course and uses the C5.0 method(Damanik, et al., 2019; Cherfi, et al., 2018; Fabriantono, et al., 2020). 2. Research Method This research applies the Cross Industry Standard Process for Data Mining (CRISP – DM) methodology. The CRISP – DM method is a standardized data mining methodology compiled by three data mining market initiators, namely Daimler Chrysler (Daimler-Benz), SPSS (ISL) and NCR (Hanin, N. A., 2022). There are 6 stages of this methodology which are shown in Figure 1. Fig. 1. Research Methodology flow 2.1 Business Understanding At this stage an understanding of the substance of the data mining activities that will be carried out is carried out as well as determining goals and preparing strategies to achieve these goals (Hanin, N. A., 2022). 2.2 Data Understranding Data understanding is the process of collecting initial data and studying the data to understand what the data can do. For this reason, student data was collected, both by distributing questionnaires and final grade data from the university. 2.3. Data Preparation Data preparation is to create a new database that will be used for the data mining process. The data obtained is still in the form of unstructed data, in which the contents of the data still contain noise. Therefore, in the data preparation process, data cleaning is carried out including eliminating data duplication, and correcting errors in data (Fahmi, R. N, 2021). 2.4 Modelling Modeling is the stage of selecting and applying various modeling techniques and some of the parameters will be adjusted to get the optimal value (Renaldi, D, 2020). The data modeling process uses the C5.0 method. 2.5 Evaluation Evaluation model is the stage of determining whether the model built is in accordance with the objectives set in the initial phase (Business understanding). At this stage, the evaluation of the model is done using a confusion matrix. Confusion matrix is a method that is usually used to Gustin et. al… Vol 3(2) 2022 : 90-97 92 perform accuracy calculations on data mining concepts or Decision Support Systems (Han, J., Kamber, M., & Mining, D, 2006). 2.6 Deployment Make a report about the knowledge gained from or pattern recognition in the data mining process which is presented in the form of graphs or descriptions that are easy to understand (Renaldi D, 2020). 3. Result and Discussion The implementation of the C5.0 model is carried out on the data of computer science students from the informatics engineering study program who have taken calculus subject. 3.1 Business Understanding At this stage, search for material by reading journals related to decision trees with the C5.0 method, journals related to educational data mining, and processing questionnaire data. 3.2 Data Understanding At the data understanding stage, questionnaires were distributed and final grade data were collected from lecturers who were in charge of the calculus course. 3.3 Data Preparation Data preparation is the stage of cleaning raw data from noise, cleaning duplicate data, correcting data errors, and data transformation. In this process, the obtained questionnaire data is cleaned and the selected attributes are taken during the validity and reliability testing process. The initial dataset is shown in table 1. Table 1 – Initial Dataset Guardian Family size Pstatus Internet Desire to continue studies Father More than 3 Living together Yes Yes Father More than 3 Living together Yes Yes Mother Less than 3 Living together No No Father More than 3 Living together Yes No ... ... ... ... ... Mother More than 3 Living together No No Activity Medu Fedu Mjob Fjob Yes Associate Degree / Bachelor Associate Degree / Bachelor Civil servant / armed forces Entrepreneur No High school equal Associate Degree / Bachelor Civil servant / armed forces Civil servant / armed forces No High school equal High school equal Housewife Other No Associate Degree / Bachelor Associate Degree / Bachelor Civil servant / armed forces Other .... .... .... .... .... No High school equal Associate Degree / Bachelor Housewife Civil servant / armed forces Assigment Midterm exam Exam NA Description 53.3 31.0 5.0 27.9 Fail 67.3 28.0 9.0 28.5 Fail 64.7 22.0 12.0 29.3 Fail 59.3 48.0 5.0 31.9 Fail .... .... .... .... .... 77.0 28.0 13.0 33.8 Fail It can be seen in table 1, there are several attributes that have values with data types that do not match what is needed, for this reason, changes are made to data types and deletion of attributes Gustin et. al… Vol 3(2) 2022 : 90-97 93 that are not used in the dataset so as to produce a new dataset that is ready to be used in the modeling process. The latest dataset can be seen in table 2. Table 2 – Data After Processing Guardian Family size Pstatus Internet Desire to continue studies Father GT3 T Yes Yes Father GT3 T Yes Yes Mother LE3 T No No Father GT3 T Yes No ... .... .... .... .... Mother GT3 T Yes No Activity Medu Fedu Mjob Fjob Yes 5 5 Civil servant / armed forces Entrepreneur No 4 5 Civil servant / armed forces Civil servant / armed forces No 4 4 Housewife Other No 5 5 Civil servant / armed forces Other .... .... .... .... .... No 4 5 Housewife Civil servant / armed forces Assigment Midterm exam Exam 53.3 31.0 5.0 67.3 28.0 9.0 64.7 22.0 12.0 59.3 48.0 5.0 .... .... .... 77.0 28.0 13.0 Table 2 is the final stage of the data preparation process. It can be seen, there are several attributes, namely NA (final grade) and description (pass / repeat) which are deleted and the data type of data on the attributes of Medu (mother's last education) and Fedu (father's last education) is changed from data of type factor to number. After the dataset is created, the next step is to divide the data into training data and testing data. Splitting dataset is done with the input shown in Figure 2. Fig. 2. Input Splitting Dataset After inputting into Rstudio, it will produce the output shown in Figure 3. Fig. 3. Output Splitting Dataset Gustin et. al… Vol 3(2) 2022 : 90-97 94 In Figure 3, the distribution of data is carried out with a ratio of 70: 30 where the train data has 144 observations with 15 attributes and the testing data has 60 observations. 3.4 Modelling This stage is the process of applying certain methods to the prepared dataset. the method used is the classification of training data using the C5.0 algorithm, data processing is carried out with the help of the Rstudio application where the initial stage is to classify the dataset into pass and repeat classes. Input from modeling with C5.0 can be seen in Figure 4. Fig. 4. Modelling Dataset From Figure 4 it can be seen that all attributes are used and the dataset used is training data. Modeling was carried out 3 times using training data with different amounts according to the number of datasets that had been built in the data preparation process. The output generated from the modeling process is shown in Figure 5. Fig. 5. Output Modelling Dataset In Figure 5, 144 observations in the training data are applied to build the C5.0 model which produces 129 observations in the graduating class and 13 observations in the repeat class. At this stage, a prediction process is also carried out using the testing data that has been shared made in the data preparation process. Prediction input can be seen in Figure 6. Gustin et. al… Vol 3(2) 2022 : 90-97 95 Fig. 6. Prediciton input Seen in Figure 6, the results of the modeling that have been carried out on the training data are applied to the input predictions by using data testing with the output in the form of a crosstable shown in Figure 7. Fig. 7. Output of predictions on testing data In Figure 7 on the testing data, as many as 60 observations on the testing data are applied in predicting the results of learning calculus. A total of 54 observations were applied to the prediction of passing, 52 observations worth passing were successfully predicted and 2 failed to be predicted. A total of 5 observations worth repeating were correctly predicted and 1 observation failed to be predicted from a total of 6 observations. 3.5 Evaluation Evaluation is the stage of evaluating the C5.0 model that has been built. The evaluation is carried out using a confusion matrix whose input can be seen in Figure 8. Fig. 8. Input Confusion Matrix On Rstudio The results of the confusion matrix calculation in Figure 8 are presented in tabular form which can be seen in the table 3. Table 3 – Confusion Matrix No Confusion matrix 70 : 30 1. Accuracy 95% 2. Precision (positive predict value) 98.11% 3. Sensitivity 96.3% 4. Spesificity 83.3% 3.6 Deployment At this stage, the knowledge or information that has been obtained is presented in a special form, one of which is a simple report that describes the final results of the entire data mining process that has been carried out so that it can be used by users. The presentation of knowledge can be seen in Figure 9. Gustin et. al… Vol 3(2) 2022 : 90-97 96 Fig. 9. Decision tree Figure 9 is a decision tree of the model that has been built. Next, build a rule model to understand the decision tree that has been built. The input rule model can be seen in Figure 10. Fig. 10. Rule model C5.0 In Figure 10, there are 3 rules, namely when the Exam (UAS) value is > 57 then it is included in the pass class, when the assignment is > 81 then it is included in the graduating class, and when the assignment is <81 and UAS <57 then it is included in the repeat class (fail). 4. Conclusion From the research that has been done, it can be concluded that the C5.0 algorithm can be used to predict calculus learning outcomes. The prediction process is carried out using a classification method with the C5.0 algorithm with the attributes of guardians, number of family members, residence status, internet, activity, desire to continue studies, parents' last education (father and mother), parents' occupations, assignment scores, UAS, and UTS. The final result of the C5.0 classification process forms a decision tree with 3 rules in it. The performance of the C5.0 algorithm gets an accuracy of 95%. References Aesyi, U. S., Lahitani, A. R., Diwangkara, T. W., & Kurniawan, R. T. (2021). Deteksi Dini Mahasiswa Drop Out Menggunakan C5. 0. JISKA (Jurnal Informatika Sunan Kalijaga), 6(2), 113-119. Gustin et. al… Vol 3(2) 2022 : 90-97 97 Benediktus, N., & Oetama, R. S. (2020). The decision tree c5. 0 classification algorithm for predicting student academic performance. Ultimatics: Jurnal Teknik Informatika, 12(1), 14-19. Cherfi, A., Nouira, K., & Ferchichi, A. (2018). Very fast C4. 5 decision tree algorithm. Applied Artificial Intelligence, 32(2), 119-137. Damanik, I. S., Windarto, A. P., Wanto, A., Andani, S. R., & Saputra, W. (2019, August). Decision tree optimization in C4. 5 algorithm using genetic algorithm. In Journal of Physics: Conference Series (Vol. 1255, No. 1, p. 012012). IOP Publishing. Febriantono, M. A., Pramono, S. H., Rahmadwati, R., & Naghdy, G. (2020). Classification of multiclass imbalanced data using cost-sensitive decision tree C5. 0. IAES International Journal of Artificial Intelligence, 9(1), 65. Fahmi, R. N. (2021). Implementasi Metode K-Means Clustering dalam Analisis Persebaran UMKM di Jawa Barat. JOINS (Journal of Information System), 6(2), 211-220. Han, J., Kamber, M., & Mining, D. (2006). Concepts and techniques. Morgan Kaufmann, 340, 94104-3205. Hanin, N. A. (2022). Analisis Layanan Informasi Berbasis Chatbot Menggunakan Framework Rasa Open Source Pada Objek Wisata Candi Prambanan (Doctoral dissertation, Institut Teknologi Telkom Purwokerto). Rahman, K. (2018). Perkembangan Lembaga Pendidikan Islam di Indonesia. Jurnal Tarbiyatuna: Kajian Pendidikan Islam, 2(1), 1-14. Renaldi, D. (2020). Penerapan Association Rule Data Mining Untuk Rekomendasi Produk Kosmetik Pada Pt. Fabiando Sejahtera Menggunakan Algoritma Apriori. Algor, 2(1), 1-11. Sari, B. N. (2017). Prediksi Performa Akademik Siswa Pada Pelajaran Matematika Menggunakan Bayesian Networks dan Algoritma Klasifikasi Machine Learning. KNPMP II, Universitas Muhammadiyah Surakarta. ISSN, 2502-6526. Sokkhey, P., Navy, S., Tong, L., & Okazaki, T. (2020). Multi-models of educational data mining for predicting student performance in mathematics: a case study on high schools in Cambodia. IEIE Transactions on Smart Processing and Computing, 9(3), 217-229. Zhang, X., Xue, R., Liu, B., Lu, W., & Zhang, Y. (2018, July). Grade prediction of student academic performance with multiple classification models. In 2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC- FSKD) (pp. 1086-1090). IEEE.