Sebuah Kajian Pustaka: IT Journal Research and Development (ITJRD) Vol.8, No.1, August 2023, E-ISSN : 2528-4053 | P-ISSN : 2528-4061 DOI : 10.25299/itjrd.2023.12412 1 Journal homepage: http://journal.uir.ac.id/index/php/ITJRD Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imbalanced Dataset Muhammad Asyroful Nur Maulana Yusuf1, Meida Cahyo Untoro2* Terknik Informatika, Institut Teknologi Sumatera, Lampung, Indonesia muhammad.119140026@student.itera.ac.id1, cahyo.untoro@if.itera.ac.id2 Article Info ABSTRACT Article history: Received Mar 08, 2023 Revised May 22, 2023 Accepted Aug 18, 2023 Classification is a model for making predictions based on existing data. Unbalanced data leads to misclassification or modeling errors where the data is irrelevant and results in poor classification modeling. The poor classification model is caused by an imbalance in the data on the classification label, so it is necessary to balance the data as a solution to overcome this problem. The methods used to deal with data imbalance are Random Undersampling and MWMOTE. The aim is to see the implementation of Random Undersampling and MWMOTE work well in dealing with unbalanced datasets and to know the performance and accuracy in modeling. The dataset used is an open source dataset from Kaggle which consists of Diabetes data, Bank Turnover data, Stroke data, and Credit Card data with various data ratios, with the aim of overcoming the problem of data imbalance. Model evaluation was carried out using confusion matrix and decision tree algorithms by looking at the values of precision, recall, f-measure, and accuracy of the original data, the Random Undersampling method, and MWMOTE. In the original data with 48.86% precision, 54.90% recall, 51.73% f-measure, and 85.30% accuracy. Random Undersampling can overcome data imbalance problems with 76.28% precision, 76.74% recall, 76.48% f-measure, and 76.21% accuracy. MWMOTE can solve data imbalance problems with 86.04% precision, 87.30% recall, 86.66% f-measure, and 86.61% accuracy. It can be concluded that the MWMOTE method is better than the Random Undersampling method because the evaluation average of the Confusion Matrix Random Undersampling method is smaller than the MWMOTE method. Keyword: Imbalanced Data Random Undersampling MWMOTE Confusion Matrix Β© This work is licensed under a Creative Commons Attribution- ShareAlike 4.0 International License. Corresponding Author: Muhammad Asyroful Nur Maulana Yusuf and Meida Cahyo Untoro Teknik Informatika Institut Teknologi Sumatera Jati Agung, Lampung Selatan, Indonesia Email: muhammad.119140026@student.itera.ac.id and cahyo.untoro@if.itera.ac.id 1. INTRODUCTION In machine learning, classification is a model for making predictions based on existing data. The classification performed in making predictions is designed with a large and balanced data set in order to maximize classification accuracy. Data processing with a large data set and with various classes often results in an imbalanced class [1]. Unequal distribution of samples in different mailto:muhammad.119140026@student.itera.ac.id IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 2 classes is a problem in class imbalance data, where most of the samples have several classes and the rest belong to other classes. If the sample only has two classes, then the class that has the most of the sample is called the majority class and the others are the minority class [2]. Imbalanced data or Imbalanced dataset is a condition in which the classification label on a dataset experiences a large discrepancy between the majority class data and minority class data. Classification of data with unbalanced classes results in lower classification accuracy [3]. Classification with unbalanced data will tend to the majority class data compared to the minority class data. Data imbalance results in misclassification or modeling errors where data is irrelevant and results in poor classification modeling [4]. The imbalance of data greatly affects several classifications such as credit data [5], stroke data , [6]online [7]news data , biomedical data [8], diarrhea case data for toddlers [9], poor household classification data [10], and other data. There needs to be special handling of imbalanced datasets prior to data analysis. A bad classification model is caused by an imbalance in the data on the classification label, it is necessary to balance the data as a solution to solving this problem. The undersampling and oversampling methods are solutions for data imbalance [11]. Undersampling method is a method which reduces the majority class data and stores all minority class data [12]. One method that is often used to overcome imbalanced datasets by reducing the majority class is the random undersampling method [13]. This method is used by randomly selecting and deleting majority class data until the number of majority class data and minority data is the same. The advantage of this method is to do it randomly without certain conditions in eliminating or reducing the value of the majority class. The weakness of the undersampling method can eliminate important parts of the majority class data so that it affects classification performance [14]. Oversampling method is a method by making data replication on minority class data until the number of minority class data and majority data is the same [15]. The weakness of the oversampling method is that there is overfitting because the synthesis data is too racing on the training data so that it cannot make accurate predictions [16]. Overfitting can be overcome by using the MWMOTE method. The stages of creating synthetic data in MWMOTE consist of three stages, namely identifying the majority class and minority class, measuring the minority class, and grouping data using the clustering method . The creation of minority class data replication depends on synthetic data. Misclassification becomes a problem in creating synthetic data when unbalanced data causes certain classes [18]. Undersampling and oversampling methods in dealing with imbalanced datasets have advantages and disadvantages between the two. Therefore, this research wants to compare and analyze two imbalanced dataset methods, namely random undersampling and MWMOTE by looking at the accuracy, precision, recall, and f-measure values . The purpose of this research is to find the best method for handling imbalanced datasets between random undersampling and MWMOTE methods. 2. RESEARCH METHOD 2.1. Data Mining Data mining is often referred to as knowledge discovery in database is a process of taking , processing , and data analysis with purpose make it information for retrieval decisions in the form of classification, clustering, association, and prediction [21]. There is various stages in carrying out the process of data mining namely data selection, data cleaning, data transformation, data integration, pattern evaluation, and knowledge. Classification in the data can be measurement, categorical, signal, or image based on the class on the data label. Machine learning is one method of analysis in data mining. Machine learning programs learn patterns in some interrelated data linkage . So, simply machine learning is learning machine to achieve appropriate and desired results based on a set of existing data [22]. Analysis results from data mining processing can be in the form of description, association, clustering, prediction and classification [10]. IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 3 2.2. Imbalanced Imbalanced data is data that has ratio differs from first class data to other class data. Where by general Imbalanced dataset is the ratio of the number of unbalanced majority data with minority data. Application machine learning in analyzing or Data processing is often a problem in imbalanced datasets [23]. Unbalanced data can result classification error which has an impact on the accuracy value decreases as well as allows the minority class considered as outliers [24]. Methods of classification and application agortima being one way to deal with unbalanced data . Reducing the amount of majority data using the undersampling method and using synthetic data in adding minority classes using the oversampling method can be a solution to overcoming unbalanced data. 2.3. Random Undersampling Undersampling is the simplest method to deal with unbalanced data. Random undersampling calculate the difference between majority and minority class data so that later the majority data class is selected and deleted in a manner random to the sum of the minority class equal to the number of the majority class [5]. In figure 1 the minority class marked with a dotcolored orange. Point colored blue represents the majority class data, period blue is the blocked sample in a manner random until the number of majority data classes and minority data balanced. Deleting data will reduce storage and upgrade processing time. However, by deleting in a manner random majority class data then it can cause loss important and influential information accuracy results in classification [13]. Here's a representation graphic undersampling using random undersampling method in the image below following: Figure 1. Representation Graphical Random Undersampling 2.4. MWMOTE MWMOTE or Majority Weighted Minority Oversampling Technique is a method of creating synthetic data based on the minority class on a data label [25]. MWMOTE is a repair method from SMOTE via synthetic data generation. Stages synthetic data creation on MWMOTE consists from three stages that is identify the majority class and the minority class, measure the minority class, and group the data using the clustering method [17]. Stages MWMOTE synthetic IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 4 data creation starts with a selection sample from the majority class and minority class, then carry out the measurement process or weighting on the minority class in order to know the position of the minority class adjacent to the borderline, then every minority data given weight as needed as well as data interests, and the last one is to carry out the clustering process aim to _ results from synthetic data are in a group or cluster [18]. Illustration MWMOTE stages in the image below following: Figure 2. MWMOTE Making synthetic data using the MWMOTE method in dealing with imbalanced datasets consists of 4 phases, namely: Phase 1: Identifying minority class samples. 1. Separate the minority data sample (Smin) and the majority data sample (Smaj). 2. Create a set of Sminf by finding the nearest neighbor node in every xi in Smin. Search for nodes using KNN in predicting noise in the minority class and looking for the minority class in the majority class. π‘†π‘šπ‘–π‘›π‘“ = π‘†π‘šπ‘–π‘› βˆ’ {π‘₯𝑖 ∈ π‘†π‘šπ‘–π‘›: 𝑁 𝑁 (π‘₯𝑖 )} (1) 3. Determine the boundary line for the majority class which will later be useful in sharing information about minority data. π‘†π‘π‘šπ‘Žπ‘— = βˆͺπ‘₯π‘–βˆˆ π‘†π‘šπ‘–π‘›π‘“ π‘π‘šπ‘Žπ‘—(π‘₯𝑖 ) (2) 4. Form an informative minority class data sample. Phase 2: Weighting of minority class data 1. Perform weight calculations on the Simin sample (Iw), each xi that is in Simin will be given a sample weight (Sw) 𝑆𝑀(π‘₯𝑖 ) = βˆ‘ 𝐼𝑀(𝑦𝑖π‘₯𝑖 ) (3) 2. Convert each Sw(xi) into a probability sample (Sp) 𝑆𝑝 (π‘₯𝑖 ) = 𝑆𝑀(π‘₯𝑖 ) / βˆ‘ 𝑆𝑀(𝑍𝑖) (4) Phase 3: Making synthetic data on the minority class using the clustering method. Phase 4: Balanced data with the addition of synthetic data from the minority class. IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 5 2.5. Classification Classification as a performance evaluator from method comparison is a process carried out to identify and compare the effectiveness of various methods used in measuring or evaluating the performance of a particular system or process. Classification refers to the division or grouping of these methods based on certain characteristics or attributes. By classifying, we can organize these methods into relevant groups, making it easier to analyze and evaluate performance[12]. As a "performance evaluator", the role of this classifier is to assess and compare these methods in terms of their ability to measure or evaluate the performance of the system or process being observed. In this case, these methods can be algorithms, statistical models, or other approaches used to collect data, make measurements, and analyze performance results. In making comparisons, the classification takes into account various factors, such as accuracy, precision, recall, computation time, efficiency, reliability, and practicality of the method[14]. Taking these factors into account, the classification will assist in determining which method is most suitable and can provide the most accurate and reliable performance evaluation results. In addition, classification as a performance evaluator can also help in understanding the strengths and weaknesses of each method, as well as gaining better insight into situations where certain methods are more effective than others. As such, classifications can provide valuable guidance in the selection and use of the optimal performance evaluation method for specific needs. 2.6. Confusion Matrix Confussion matrix is a matrix used in measurement performance from machine learning classification. There are 4 terms in the confusion matrix, including True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) [18]. The confusion matrix table can be seen in the table below following : Table 1. Confusion Matrix Predicted Positive Predicted Negatives Actual Positive TP FN Actual Negatives FP TN Result of grouping based on the number of positive data or negative data then do a comparison between mark actually with the predicted value based on the evaluation matrix. The evaluation matrix consists from accuracy, precision, recall, and f-measure [26]values. This matrix provides helpful information about model performance how well the model classifies the correct data. Accuracy measures how many correct predictions from the overall prediction of the model. Accuracy is calculated by dividing the number of correct predictions by the total number of predictions. However, accuracy is not the best evaluation matrix when the data is unbalanced, therefore it is necessary to evaluate precision, recall and f-measure to be more informative. The accuracy value is obtained by the following formula: Accuracy = TP+TN TP+TN+FP+FN (5) Precision measures how many positive predictions are true of all positive predictions. Precision is very important in classifying data because where positive prediction error own bad consequences. The precision value is obtained by the following formula: Precision = 𝑻𝑷 𝑻𝑷+𝑭𝑷 (6) IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 6 Recall measures how much many positive predictions are correct from the total number of true positive classes. Recall is very important in classifying data because where positive prediction error own bad consequences. The recall value is obtained by the following formula: Recall = 𝑻𝑷 𝑻𝑷+𝑭𝑡 (7) F-Measure is the average of precision and recall, where is the range from F-Measure is 0 to 1. The f-measure value is obtained by the following formula: F-Measure =𝟐. π’‘π’“π’†π’„π’Šπ’”π’Šπ’π’ . 𝒓𝒆𝒄𝒂𝒍𝒍 π’‘π’“π’†π’„π’Šπ’”π’Šπ’π’+𝒓𝒆𝒄𝒂𝒍𝒍 (8) 2.7. Research Workflow In this study, the flow used in the final evaluation task research was random undersampling and Majority Weighted Minority Oversampling Technique in overcoming Imbalanced dataset can be seen in Figure 3: Figure 3. Modeling Design At this stage is a modeling design that will be used to test the model on several unbalanced datasets. In this modeling design is divided into several processes, which can be seen in Figure 3 The modeling design process as following. After the four datasets perform the data preprocessing stages The next step is to check Imbalanced dataset based on the results of the classification of the IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 7 Bank Turnover, Diabetes, Stroke, and Credit Fraud datasets as shown in the table and plotting below following: Table 2. Description of Datasets Datasets Amount of data Attribute Minority Class Majority Class Ratio Diabetes 768 9 268 500 34% : 66% Bank Turnovers 10,000 13 2037 7,963 20% : 80% Strokes 5.110 19 249 4,861 4% : 96% Credit card 284,807 30 492 284.315 1% : 99% Next, check the distribution of the data by using scatter plotting from the 4 datasets used. The Diabetes dataset is checked for data distribution by looking at the correlation of values between the Age and BMI attributes, which can be seen in Figure 4. The Bank Turnover dataset is checked for data distribution by looking at the correlation between the Age and CreditScore attributes, which can be seen in Figure 5. The Stroke dataset is distributed by looking at the correlation between the avg_glucose_level and BMI attributes, which can be seen in Figure 6. The Credit Card dataset is distributed by looking at the correlation of values between attributes V1 and V28 which can be seen in Figure 7. Figure 4. Distribution of Diabetes Data Figure 5. Distribution of Bank Turnover Data Figure 6. Distribution of Stroke Data Figure 7. Distribution of Credit Card Data 3. RESULTS AND ANALYSIS 3.1. Random Undersampling The data used to train the Random Undersampling model uses data after preprocessing where later the amount of data from the majority class will be the same as the amount of data from the minority class. Furthermore, balancing the dataset using the Random Undersampling method where by changing the amount of data from the majority class will be equal to the amount of data IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 8 from the minority class. There are 2 stages in balancing data with the Random Undersampling method, which is to separate the minority and majority class data, then resample the majority data class randomly as much as the minority class data. The results of balancing data using the Random Undersampling method can be seen in Table 3: Table 3. 1Distribution of Balanced Data Random Undersampling Method Datasets Amount of data Attribute Minority Class Majority Class Ratio Diabetes 536 9 268 268 50% : 50% Bank Turnovers 4,074 13 2037 2037 50% : 50% Strokes 498 19 249 249 50% : 50% Credit card 984 30 492 492 50% : 50% Next, check the distribution of the data by using scatter plotting from the 4 datasets used. The Diabetes dataset is checked for data distribution by looking at the correlation of values between the Age and BMI attributes, which can be seen in Figure 8. The Bank Turnover dataset is checked for data distribution by looking at the correlation between the Age and CreditScore attributes, which can be seen in Figure 9. The Stroke dataset is distributed by looking at the correlation between the avg_glucose_level and BMI attributes, which can be seen in Figure 10. The Credit Card dataset is distributed by looking at the correlation of values between attributes V1 and V28 which can be seen in Figure 11. Figure 8. Distribution of Diabetes Data Figure 9. Distribution of Bank Turnover Data Figure 10. Distribution of Stroke Data Figure 11. Distribution of Credit Card Data 3.2. MWMOTE Oversampling The data used to train the MWMOTE Oversampling modeling uses data after preprocessing where the amount of minority class data will be the same as the amount of majority class data. Furthermore, balancing the dataset using the MWMOTE method where by changing the IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 9 amount of minority class data will be the same as the amount of majority class data with the growth of synthetic data. There are 3 stages in balancing data with the MWMOTE method, namely by separating minority data into the majority data class, doing weighting, and doing clustering in making synthetic data. The results of balancing data using the Random Undersampling method can be seen in Table 4. Table 4.2Balanced Data Distribution of the MWMOTE Method Datasets Amount of data Attribute Minority Class Majority Class Ratio Diabetes 1,000 9 500 500 50% : 50% Bank Turnovers 15,926 13 7,963 7,963 50% : 50% Strokes 9,722 19 4,861 4,861 50% : 50% Credit card 568,630 30 284,315 284,315 50% : 50% Next, check the distribution of the data by using scatter plotting from the 4 datasets used. The Diabetes dataset is checked for data distribution by looking at the correlation of values between the Age and BMI attributes, which can be seen in Figure 12. The Bank Turnover dataset is checked for data distribution by looking at the correlation between the Age and CreditScore attributes, which can be seen in Figure 13. The Stroke dataset is distributed by looking at the correlation between the avg_glucose_level and BMI attributes, which can be seen in Figure 14. The Credit Card dataset is distributed by looking at the correlation of values between attributes V1 and V28 which can be seen in Figure 15. Figure 12. (a)Distribution of Diabetes Data Figure 13. (b)Distribution of Bank Turnover Data Figure 14. (c)Distribution of Stroke Data Figure 15. (d)Distribution of Credit Card Data 3.3. Model Evaluation The following is a comparison table of the evaluation results of the method in balancing the dataset. IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 10 Table 5. Comparison of Confusion Matrix Evaluation No Rebalancing and Random Undersampling Datasets No Rebalancing Random Undersampling Precision Recall F- Measure Accuracy Precision Recall F- Measure Accuracy Diabetes 56,25% 65,00% 60,46% 70,56% 70,37% 75,00% 72,61% 73,29% Bank Turnover 49,75% 52,39% 51,04% 80,43% 74,83% 72,03% 73,40% 72,69% Stroke 19,38% 21,34% 20,32% 90,28% 69,73% 67,94% 68,83% 68,00% Credit Fraud 70,06% 80,88% 75,08% 99,91% 90,19% 92,00% 91,08% 90,87% Avg 48,86% 54,90% 51,73% 85,30% 76,28% 76,74% 76,48% 76,21% Table 6. Comparison of Confusion Matrix Evaluation No Rebalancing and MWMOTE Datasets No Rebalancing MWMOTE Precision Recall F- Measure Accuracy Precision Recall F- Measure Accuracy Diabetes 56,25% 65,00% 60,46% 70,56% 73,20% 74,66% 73,92% 73,66% Bank Turnover 49,75% 52,39% 51,04% 80,43% 83,86% 83,61% 83,73% 83,82% Stroke 19,38% 21,34% 20,32% 90,28% 87,65% 91,33% 89,45% 89,44% Credit Fraud 70,06% 80,88% 75,08% 99,91% 99,44% 99,59% 99,52% 99,52% Avg 48,86% 54,90% 51,73% 85,30% 86,04% 87,30% 86,66% 86,61% Imbalanced data consists of majority class and minority class. Using the Random Undersampling method, the process is carried out by changing the amount of the majority data to be equal to the number of minority data randomly. Using the MWMOTE method is done by changing the amount of minority data to be equal to the amount of majority data by adding synthetic data. Making synthetic data using the MWMOTE method goes through 3 stages, namely by separating minority data into the majority data class, doing weighting, and doing clustering in making synthetic data. In Table 5 and Table 6 the evaluation results use the decision tree algorithm by testing 3 types of data, namely without rebalancing, Random Undersampling, and MWMOTE with the aim of evaluating and comparing the performance of the imbalanced dataset method. Random Undersampling can overcome the problem of unbalanced data with a precision value of 76.28%, 76.74% recall, 76.48% f-measure, and 76.21% accuracy. MWMOTE can overcome the problem of unbalanced data with a precision value of 86.04%, 87.30% recall, 86.66% f-measure, and 86.61% accuracy. In the Diabetes dataset, the results of the model evaluation test using the Random Undersampling and MWMOTE methods show that the values for precision, recall, f-measure, and accuracy are relatively the same, but different from the other 3 datasets which produce relatively different values. This shows that the ratio in a dataset has an influence on the process of balancing the dataset. The diabetes dataset has a ratio of minority data and majority data of 34%: 66% where the dataset has an almost balanced ratio of differences between minority data and majority data. The significant difference to the other 3 datasets regarding the ratio of minority data and majority data causes the balancing of datasets using the Random Undersampling and MWMOTE methods to have significantly different results. For the precision value of 4 datasets used in the research, it has increased when you have done data balancing. Such a significant change occurred in the Storke dataset with a change value of 68.27% from the original data to the MWMOTE oversampling data. The change in the optimal precision value in the Diabetes dataset is 16.95% from the original data to the MWMOTE oversampling data. The change in the optimal precision value in the Bank Turnover dataset is 34.11% from the original data to the MWMOTE oversampling data. The change in the optimal precision value in the Credit Card dataset is 29.38% from the original data to the MWMOTE oversampling data. IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 11 For the recall value of the 4 datasets used in the research, it has increased when balancing the data. Such a significant change occurred in the Storke dataset with a change value of 69.99% from the original data to the MWMOTE oversampling data. The change in the optimal recall value in the Diabetes dataset is 10% from the original data to Random Undersampling data. The change in the optimal recall value in the Bank Turnover dataset is 31.22% from the original data to the MWMOTE oversampling data . Changes in the optimal recall value on the Credit Card dataset of 18.71% from the original data to the MWMOTE oversampling data. For the value of f-measure 4, the dataset used in the research has increased when balancing the data. Such a significant change occurred in the Storke dataset with a change value of 69.13% from the original data to the MWMOTE oversampling data. The change in the optimal f-measure value in the Diabetes dataset is 13.46% from the original data to the MWMOTE oversampling data. The change in the optimal f-measure value in the Bank Turnover dataset is 32.69% from the original data to the MWMOTE oversampling data. The change in the optimal f-measure value in the Credit Card dataset is 24.44% from the original data to the MWMOTE oversampling data. Accuracy value of 2 datasets used in the research has increased and the other 2 datasets have decreased when balancing the data. Such a significant change occurred in the Stroke dataset by experiencing a change in value of -22.28% from the original data to Random Undersampling data. Changes in the accuracy value on the Credit Card dataset experience a change value of - 9.04% from the original data to Random Undersampling data. The change in the optimal accuracy value for the Diabetes dataset is 3.10% from the original data to the MWMOTE oversampling data. The change in the optimal accuracy value in the Bank Turnover dataset is 3.39% from the original data to the MWMOTE oversampling data. 4. CONCLUSION Based on the results of the research and discussion in the previous chapter, the following conclusions can be drawn: 1. Random Undersampling Method and the Majority Weighted Minority Oversampling Technique (MWMOTE) can overcome imbalanced dataset problems. Where is the Random Undersampling method changing the number of majority data will be equal to the number of minority class data. There are 2 stages in balancing data with the Random Undersampling method, which is to separate the minority and majority class data, then resample the majority data class randomly as much as the minority class data. The MWMOTE method where by changing the number of minority class data will be equal to the amount of majority class data with the growth of synthetic data. There are 3 stages in balancing data with the MWMOTE method, namely by separating minority data into the majority data class, doing weighting, and doing clustering in making synthetic data. 2. The evaluation results of the confusion matrix using the decision tree algorithm by looking for precision, recall, f-measure, and accuracy values for the Random Undersampling and MWMOTE methods have increased over the stages without dataset rebalancing . The small amount of data and attributes does not really affect the process of balancing the dataset because the results of testing the two methods have significant average values of precision, recall, f-measure, and accuracy . However, differences in data ratios affect the performance results of the Random Undersampling and MWMOTE methods where the datasets with minority data ratios and majority data are not as significant as in the diabetes dataset, the performance results or model evaluation of the Random Undersampling method are relatively the same as the MWMOTE method. The results are different when the dataset with the ratio of minority data and majority data is so significant that the evaluation results or performance of the MWMOTE method are better than the Random Undersampling method. IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 12 3. In the original data with 48.86% precision, 54.90% recall, 51.73% f-measure, and 85.30% accuracy. Random Undersampling can overcome data imbalance problems with 76.28% precision, 76.74% recall, 76.48% f-measure, and 76.21% accuracy. MWMOTE can solve data imbalance problems with 86.04% precision, 87.30% recall, 86.66% f-measure, and 86.61% accuracy. It can be concluded that the MWMOTE method is better than the Random Undersampling method because the evaluation average of the Confusion Matrix Random Undersampling method is smaller than the MWMOTE method. ACKNOWLEDGEMENTS The author expresses his deepest gratitude to Teknik Infromatika, Institut Teknologi Sumatera who has guided and provided input on this research. Hopefully this will be a spirit in moving forward to develop research that is much better and can be useful for others. REFERENCES [1] P. Agustia Rahayuningsih and P. Studi Sistem Informasi Akuntansi Kampus Kota Pontianak, β€œPenerapan Teknik Sampling Untuk Mengatasi Imbalance Class Pada Klasifikasi Online Shoppers Intention,” Jurnal Teknik Informatika Kaputama (JTIK), vol. 4, no. 1, 2020. [2] Hudori, β€œResampling Neural Network untuk Penanganan Class Imbalance pada Prediksi Klaim Asuransi A. PENDAHULUAN,” vol. 10, no. 1, pp. 57–64, 2020, doi: 10.36350/jbs.v10i1. [3] D. Chen, X. J. Wang, C. Zhou, and B. Wang, β€œThe Distance-Based Balancing Ensemble Method for Data With a High Imbalance Ratio,” IEEE Access, vol. 7, pp. 68940–68956, 2019, doi: 10.1109/ACCESS.2019.2917920. [4] I. Pratama, A. Y. Chandra, and P. T. Presetyaningrum, β€œSeleksi Fitur dan Penanganan Imbalanced Data menggunakan RFECV dan ADASYN,” Jurnal Eksplora Informatika, vol. 11, no. 1, pp. 38–49, Jan. 2022, doi: 10.30864/eksplora.v11i1.578. [5] A. Syukron and A. Subekti, β€œPenerapan Metode Random Over-Under Sampling dan Random Forest untuk Klasifikasi Penilaian Kredit,” JURNAL INFORMATIKA, vol. 5, no. 2, 2018. [6] S. Mutmainah, β€œPenanganan Imbalance Data Pada Klasifikasi Kemungkinan Penyakit Stroke,” 2021. [Online]. Available: https://library.uii.ac.id/osr [7] S. Keputusan Dirjen Penguatan Riset dan Pengembangan Ristek Dikti, A. Nikmatul Kasanah, U. Pujianto, T. Elektro, F. Teknik, and U. Negeri Malang, β€œTerakreditasi SINTA Peringkat 2 Penerapan Teknik SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Objektivitas Berita Online Menggunakan Algoritma KNN,” masa berlaku mulai, vol. 1, no. 3, pp. 196–201, 2017. [8] H. Ali, N. A. Samat, and H. M. Ashgher, β€œAdaptive Semi-Unsupervised Weighted Oversampling with Sparsity Factor for Imbalanced Biomedical Data,” Journal of Soft Computing and Data Mining, vol. 01, no. 01, Mar. 2020, doi: 10.30880/jscdm.2020.01.01.003. [9] P. Statistika STIS, P. M. Statistika STIS Alfa Rizki, and P. Statistika STIS Rani Nooraeni, β€œPenerapan Metode Resampling dalam Mengatasi Imbalanced Data Pada Determinan Kasus Diare Pada Balita di Indonesia Andriansyah Muqiit WS Intan Putri Ananda Zahrotin Dwi Hapsari.” [10] T. Purwa, β€œPerbandingan Metode Regresi Logistik dan Random Forest untuk Klasifikasi Data Imbalanced (Studi Kasus: Klasifikasi Rumah Tangga Miskin di Kabupaten Karangasem, Bali Tahun 2017),” Jurnal Matematika, Statistika dan Komputasi, vol. 16, no. 1, p. 58, Jun. 2019, doi: 10.20956/jmsk.v16i1.6494. [11] S. Bagui and K. Li, β€œResampling imbalanced data for network intrusion detection datasets,” J Big Data, vol. 8, no. 1, Dec. 2021, doi: 10.1186/s40537-020-00390-x. IT Jou Res and Dev, Vol.8, No.1, August 2023 : 1 - 13 Muhammad Asyroful Nur Maulana Yusuf, Evaluate of Random Undersampling Method and Majority Weighted Minority Oversampling Technique in Resolve Imabalanced Dataset 13 [12] M. Bach, A. Werner, and M. Palt, β€œThe proposal of undersampling method for learning from imbalanced datasets,” in Procedia Computer Science, 2019, vol. 159, pp. 125–134. doi: 10.1016/j.procs.2019.09.167. [13] S. Mishra, β€œHandling Imbalanced Data: SMOTE vs. Random Undersampling,” International Research Journal of Engineering and Technology, 2017, [Online]. Available: www.irjet.net [14] A. Fauzi, β€œKomparasi Algoritma Dengan Pendekatan Random Undersampling Untuk Menangani Ketidakseimbangan Kelas Pada Prediksi Cacat Software,” Maret, vol. 15, no. 1, p. 27, 2019, [Online]. Available: www.nusamandiri.ac.id [15] I. Nekooeimehr and S. K. Lai-Yuen, β€œAdaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets,” Expert Syst Appl, vol. 46, pp. 405–416, Mar. 2016, doi: 10.1016/j.eswa.2015.10.031. [16] S. Barua, M. M. Islam, X. Yao, and K. Murase, β€œMWMOTE - Majority weighted minority oversampling technique for imbalanced data set learning,” IEEE Trans Knowl Data Eng, vol. 26, no. 2, pp. 405–425, Feb. 2014, doi: 10.1109/TKDE.2012.232. [17] M. C. Untoro and J. L. Buliali, β€œPenanganan imbalance class data laboratorium kesehatan dengan majority weighted minority oversampling technique,” Register: Jurnal Ilmiah Teknologi Sistem Informasi, vol. 4, no. 1, pp. 23–29, Jan. 2018, doi: 10.26594/register.v4i1.1184. [18] M. C. Untoro, M. Praseptiawan, M. Widianingsih, I. F. Ashari, A. Afriansyah, and Oktafianto, β€œEvaluation of Decision Tree, K-NN, Naive Bayes and SVM with MWMOTE on UCI Dataset,” in Journal of Physics: Conference Series, 2020, vol. 1477, no. 3. doi: 10.1088/1742-6596/1477/3/032005. [19] P. Y. Saputra, M. Z. Abdullah, and A. P. Kirana, β€œImprovisasi Teknik Oversampling MWMOTE Untuk Penanganan Data Tidak Seimbang,” JURNAL MEDIA INFORMATIKA BUDIDARMA, vol. 5, no. 2, p. 398, Apr. 2021, doi: 10.30865/mib.v5i2.2811. [20] M. C. Untoro, β€œMWMOTE optimization for imbalanced data using complete linkage,” Jurnal Teknologi dan Sistem Komputer, vol. 9, no. 2, pp. 77–82, Apr. 2021, doi: 10.14710/jtsiskom.2021.13748. [21] M. Iqbal Ramadhan, β€œPenerapan Data Mining Untuk Analisis Data Bencana Milik Bnpb Menggunakan Algoritma K-Means Dan Linear Regression,” 2017. [22] Y. Pristyanto, β€œPenerapan Metode Ensemble Untuk Meningkatkan Kinerja Algoritme Klasifikasi Pada Imbalanced Dataset,” 2019. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/User+Knowledge [23] C. E. Puspita, O. N. Pratiwi, and E. Sutoyo, β€œPerbandingan Algoritma Klasifikasi Support Vector Machine Dan Naive Bayes Pada Imbalance Data,” JURTEKSI (Jurnal Teknologi dan Sistem Informasi), vol. 8, no. 1, pp. 11–18, Dec. 2021, doi: 10.33330/jurteksi.v8i1.1185. [24] M. Koziarski, β€œRadial-Based Undersampling for imbalanced data classification,” Pattern Recognit, vol. 102, Jun. 2020, doi: 10.1016/j.patcog.2020.107262. [25] J. Wei, H. Huang, L. Yao, Y. Hu, Q. Fan, and D. Huang, β€œNI-MWMOTE: An improving noise-immunity majority weighted minority oversampling technique for imbalanced classification problems,” Expert Syst Appl, vol. 158, Nov. 2020, doi: 10.1016/j.eswa.2020.113504. [26] T. Purwa, β€œPerbandingan Metode Regresi Logistik dan Random Forest untuk Klasifikasi Data Imbalanced (Studi Kasus: Klasifikasi Rumah Tangga Miskin di Kabupaten Karangasem, Bali Tahun 2017),” Jurnal Matematika, Statistika dan Komputasi, vol. 16, no. 1, p. 58, Jun. 2019, doi: 10.20956/jmsk.v16i1.6494.