Journal of Applied Engineering and Technological Science Vol 4(1) 2022 : 375-389 375 ANALYSIS OF TWITTER SENTIMENT TOWARDS MADRASAHS USING CLASSIFICATION METHODS Supriadi Panggabean1* , Windu Gata2, Tri Agus Setiawan3 Computer Science, Nusa Mandiri University Jakarta, Indonesia12 STIKOM Cipta Karya Informatika, Indonesia3 14002471@nusamandiri.ac.id Received : 07 October 2022, Revised: 05 December 2022, Accepted : 05 December 2022 *Corresponding Author ABSTRACT Several incidents of sexual violence, the emergence of radical Islamic issues, terrorism, intolerance of changes in the character of students and so on have recently become a highlight for madrasahs. To find out how the sentiment of social media users towards madrasahs, research on twitter sentiment towards madrasahs was conducted using text mining techniques. The methods used are Naïve Bayes (NB), Decision Tree (DT) and K – Nearest Neighbor (K-NN) which aim to classify public sentiment towards Madrasahs on Twitter. The dataset used is a tweet in Indonesian with the keyword "Madrasah" as many as 3288 tweets. The techniques used to build classification and sentiment analysis are text mining, transformation, tokenize, stemming and classification, etc. Gataframework tools, execute Python script and RapidMiner are also used to help create sentiment analysis in measuring classification values. The results obtained by the optimization using Particle Swam Optimization (PSO) using the Naïve Bayes algorithm and the accuracy value obtained was 80.80%, with a precision value of 83.03%, a recall value of 78.68%, and an AUC of 0.739. Keywords : Data Mining, Sentiment Analysis, Classification 1. Introduction In today's digital era, the influence and use of the internet has become a necessity, especially in Indonesian, internet users in Indonesia in early 2021 reached 202.6 million people. This number increased by 15.5 percent or 27 million people when compared to January 2020. The total population of Indonesia at this time is 274.9 million people. This means that internet penetration in Indonesia in early 2021 reached 73.7 percent. This is reported in a recent report released by content management service HootSuite, as well as social media marketing agency We Are Social in a report titled" Digital 2021". Internet activities that are very popular with Indonesian internet users are social media. Currently, there are 170 million Indonesians who are active users of social media. On average, they spend 3 hours and 14 minutes on the network platform social (Riyanto, 2021). Social media that are often used in Indonesia include Instagram, Facebook, and Twitter. Although twitter is not as big as Facebook and Instagram at this time, it is sourced from data that the researcher read from tekno.kompas.com published on April 14, 2021 at 20.42 WIB reported that the development of Twitter at this time is getting better. In the first quarter of 2020, there was a surge in its daily active users from 134 million in the first quarter of 2019 to 166 million users or face an increase of 24 percent. In the second quarter of 2020, this figure increased again to 186 million users. The number of active users every day exceeded the forecasts of analysts, who initially estimated that they were only trying to reach 176 million users (Pratomo, 2021). In Indonesia, Twitter users have unique characteristics compared to other countries. Indonesian users, use Twitter as a medium to express comments. Not only that, Twitter users also tend to notify the events that are intertwined around them. In a very short span of time an opinion or expression of a person will be so easy to see by many parties. Starting from that argument, there should be other arguments or opinions on the issue. Many researchers also use social media as a reference for data taken for an important source of information about opinions, or community responses, and measure the level of popularity and become a benchmark for the services of these agencies, institutions, or companies. At this time the process of taking data through social media twitter where this platform is believed to be a platform whose opinions can and has value to be processed in several algorithms, it's just that to measure a comment sentiment on one of the social media is difficult. Panggabean et al… Vol 4(1) 2022 : 375-389 376 Sentiment analysis can be interpreted as the process of extracting, processing, and understanding data automatically in the form of unstructured text to retrieve sentiment information contained in opinions or opinion sentences (Brahimi et al., 2019). The use of sentiment analysis to evaluate the trend of an opinion against negative opinions and positive opinions on a topic (Rozi et al., 2012). Sentiment analysis is a computational-based detection and learning of opinions or views (sentiments), emotions, and subjectivity in the text. As a special text mining application, sentiment analysis is related to the automatic extraction of positive or negative opinions from the text (He et al., 2015). Text mining is one of the techniques that can be used to classify documents, where text mining is a variation of data mining that seeks to find interesting patterns from a large set of text data. One of the classification methods that can be used in doing text mining is the Naive bayes method. Naïve Bayes is a classification using probability and statistical methods (Suryanto et al., 2019) (Suryanto et al., 2019). The advantage of using Naïve Bayes is that this method only requires a small amount of training data to determine the estimated parameters required in the classification process. Naïve Bayes often works much better in most complex real-world situations than expected (S.A Pattekari, 2012). Naïve Bayes classifier’s research intends to carry out the process of classifying the results of netizens' comments on the application of technology that has gone through a process of sentiment analysis. Another method used in this research is decision tree. Decision Tree is a very popular and practical approach in machine learning to solve classification problems(G. Wahyuningtyas, 2014). Apart from the fact that the construction is relatively fast, the results of the built model are easy to understand (Y. Sunoto, 2014). Next is the K-Nearest Neighbor (KNN) method which is often also used to analyze sentiment. The KNN method is the process of grouping data into predefined classes based on the closest distance/degree of similarity of that data to an existing dataset/training data(Deng & Yu, 2013). Several incidents of sexual violence that occurred in the madrasa environment as reported in the media, the emergence of radical Islamic issues which he said were the fruit of thoughts from the madrasa environment, terrorism which was also said to come from misinterpreting knowledge from madrasahs, intolerance to different religions, changes in the character of madrasah students and so on will cause negative thoughts towards madrasahs. Until now, there has not been much research on sentiment towards madrasas. Based on the background that has been described, in this study raised the title of the thesis entitled " Analysis of Twitter Sentiment Towards Madrasahs Using Classification Methods". The method used to process data from twitter opinions, the author tries to use five methods as a comparison of which one is more accurate and can be processed data. These methods are Naïve Bayes (NB), Decision Tree (DT) and K – Nearest Neighbor (K-NN) using the RapidMiner application. Problem Identification: Based on the description of the background of the problem above, in this study identifying problems that can be used as the object of research is how sentiment analysis of Twitter data regarding opinions on madrasas using methods are Naïve Bayes (NB), Decision Tree (DT) and K – Nearest Neighbor (K-NN). The purpose of writing this thesis is to get the best classifier in determining the classification of sentiment analysis on social media twitter Indonesian texts about Madrasah. Scope of Research : In order for the discussion in this study to be more directed, the writing provides a limitation of the problem, namely: The sentiment category used includes positive sentiment and negative sentiment, the dataset used is data from social media twitter with Indonesian text that has a narrative related to Madrasah, The algorithm used for sentiment analysis is Naïve Bayes (NB), Decision Tree (DT) and K – Nearest Neighbor (K-NN) with a K-fold testing model Cross validation and compared results are only the results of the Accuracy pattern and AUC performance on the ROC curve to measure the model, to improve the performance of the classification method can be done using the Particle Swarm Optimization (PSO) feature selection and the data mining method used is the Cross Industry Standard Process for Data Mining (CRISP- DM). 2. Literature Review Panggabean et al… Vol 4(1) 2022 : 375-389 377 Data Mining is a term used to describe the discovery of knowledge in a database. Data mining is a process that uses statistical techniques, mathematics, artificial intelligence, and machine learning to extract and identify useful information and related knowledge from various large databases(Turban, 2005). Data classification is a process that finds the same properties in a set of objects in a database and classifies them into different classes according to the established Classification model. Text Mining is mining carried out by a computer to obtain something new, something previously unknown or rediscover implicitly implied information, derived from information extracted automatically from different sources of text data Text mining is a technique used to deal with classification, clustering, information extraction and information retrieval problems (Xiaojun, 2011). In data mining to measure or there are several ways to measure the performance of the resulting model, one of which is using a confusion matrix (accuracy). Confusion matrix is a method used to perform accuracy calculations on the concept of data mining. Precision or confidence is the proportion of positive predicted cases that are also positive in the actual data. Recall or sensitivity is the proportion of actual positive cases that are correctly predicted to be positive. Table 1 - Model Confusion Matrix Correct classification Classified as + - + True positive False negative - False positive True negative Source : (Ibrahim, 2017) Sentiment analysis is extracting people's opinions, sentiments, evaluations, and emotions about a particular topic written using natural language processing techniques. A number of other major works mention sentiment analysis focusing on specific applications that classify positive, negative and neutral opinions(Alita et al., 2019). Sentiment analysis or also known as mining opinion is an analysis that aims to see the opinion of the community or group regarding certain entities (Safitri et al., 2021). The preprocessing stage is needed to clean the data from unnecessary text, where the unstructured text data will be converted into structured or semi-structured text data. The stages of preprocessing to process data are case folding, convert emoticons, cleansing, tokenizing, stop word removal and stemming (Aditia Rakhmat Sentiaji et al., 2014) Social media is a new set of communication and collaboration tools that enable many types of interactions that were previously unavailable to ordinary people. The most important thing about this technology is the shift in the way people know, read and share news, and search for information and content. There are hundreds of social media channels operating around the world today, with the top three on Facebook, LinkedIn, and Twitter (Dailey, 2009). Social media has several special characteristics including: Reach, Accessibility, Usability, Actuality and Permanently (Purnama, 2011). Twitter is the most popular microblogging in Indonesia. This microblogging allows users to send and read messages called tweets, in the form of a maximum of 140 characters of text displayed on the user's profile page(Badri, 2011). The Naive Bayes approach is a classification method that refers to Bayes' theorem. Bayes' theorem is used to calculate the probability of data uncertainty (Peter Norvig, 2010). The Naïve Bayes Classifier approach process assumes that the presence or absence of a feature in a class is not related to the presence or absence of other features in the same class (Setiawan et al., 2021). The equation of Bayes' theorem is 𝑃(𝐻|𝑋) = 𝑃(𝐻|𝑋).𝑃(𝐻) 𝑃(𝑋) (Muktamar et al., 2015). Panggabean et al… Vol 4(1) 2022 : 375-389 378 The Decision Tree is a tree-like flowchart structure, where each internal node represents an attribute test, each branch represents the test result, and the leaf node represents a class or class distribution(Kasih, 2019). The K-Nearest Neighbor (KNN) method is the simplest of all other classification methods for solving classification problems. The technique used in this K-NN is to classify the data using objects with adjacent closest values. The results obtained from this process are higher or best when the weighting of the similarity of Cosine Similarity is used in the calculation of each tribe. Text classification with the K-NN method gives a better value when the expression Cosine Similarity is used to weigh each word in the text document being processed before calculating the value of Cosine Similarity, after the word weighting is completed the steps of the word weighting process are carried out, namely tf, df, idf, tfidf, and use the Cosine Similarity formula to perform similarities between documents (Nurjanah et al., 2017). Particle Swarm Optimization (PSO) is often used in research, because PSO has similar properties to genetic algorithms (GA). The advantage of PSO is that it is easy to implement and there are several parameters to adjust. The PSO system is initiated by a random solution population and then finds the optimum point by updating each generation result. The approach used is more systematically mathematical to find solutions. Particle Swarm Optimization (PSO) was formulated by Edward and Kennedy in 1995. The thought process behind this algorithm is inspired by the social behavior of animals, such as birds in groups or groups of fish (Evanko, 2010). Study Review Some of the existing studies related to this study are as follows: Table 2 - Related Research No Title Author Results Description 1 Sentiment Analysis Of Teacher's Room App On Twitter Using Classification Algorithm Angelina Puput Giovani, Ardiansyah Ardiansyah, Tuti Haryanti, Laela Kurniawati, Windu Gata This study compares the NB, SVM, K-NN methods without using feature selection with the NB, SVM, K-NN methods which use feature selection and compares the Area Under Curve (AUC) values of these methods to find out the most optimal algorithm. The test results found that the best optimization application in this model was an SVM-based PSO algorithm with an accuracy value of 78.55% and an AUC of 0.853. This research managed to get the effective and best algorithm in classifying positive comments and negative comments related to the Ruang Guru application. (Giovani et al., 2020) Jurnal Teknoinfo, Vol. 14, No. 2, 2020, 116- 124, ISSN: 2615-224X DOI:10.33365 / jti.v14i2.679 2 Text Mining Accuracy Using K-Nearest Neighbor Algorithm on SMS News Content Data Windu Gata, Purnomo The results of the research conducted obtained results on the accuracy of the ya prediction selection of 772 correct and not in accordance with the number of 32, so that the precision was 96.02%. Meanwhile, predictions do not have a result of 0 errors and 14 correct in the prediction of NO. So that the accuracy results obtained are 96.15%.(Gata, 2017) www.neliti.co m Journal Format Volume 6 Number 1 of 2017: ISSN : 2089 -5615 3 Twitter Sentiment Analysis Of Post Natural Disasters Using Comparative Classification Algorithm Support Vector Ainun Zumarniansyah , Rangga Pebrianto, Normah, Windu Gata In calculating the natural disaster sentiment analysis using a comparison of the Support Vector Machine and the Naive Bayes algorithm, the difference in accuracy is 3.07% where the support vector machine results are greater than the Naive Bayes. (Zumarniansyah et al., 2020) Jurnal Pilar Nusa Mandiri Vol 16 No 2 (2020): Publishing Period for September 2020. http://ejournal.nusamandiri.ac.id/index.php/pilar/issue/view/55 http://ejournal.nusamandiri.ac.id/index.php/pilar/issue/view/55 http://ejournal.nusamandiri.ac.id/index.php/pilar/issue/view/55 http://ejournal.nusamandiri.ac.id/index.php/pilar/issue/view/55 http://ejournal.nusamandiri.ac.id/index.php/pilar/issue/view/55 http://ejournal.nusamandiri.ac.id/index.php/pilar/issue/view/55 Panggabean et al… Vol 4(1) 2022 : 375-389 379 No Title Author Results Description Machine And Naïve Bayes https://doi.org / 10.33480/pilar . v16i2.1423 4 Sentiment Analysis of Covid-19 Information using Support Vector Machine and Naïve Bayes Ratino, Noor Hafidz, Sita Anggraeni, Windu Gata There are several classification algorithms used, namely Naïve Bayes with an accuracy of 78.02% and an AUC of 0.714, while the Support Vector Machine produces an accuracy of 80.23% and an AUC of 0.904. It has an accuracy difference of 2.21%. After optimization with the Particle Swarm Optimization operator, the Naïve Bayes (PSO) algorithm produces an accuracy of 79.07% and an AUC of 0.729, while the Support Vector Machine (PSO) algorithm produces an accuracy of 81.16% and an AUC of 0.903. It has an accuracy difference of 2.09%. Algorithm test results, PSO-based Support Vector Machine or not, can always result in higher accuracy.(Ratino et al., 2020) JUPITER Journal (Journal of Computer Science and Technology Research) Vol 12 No 2 (2020): JUPITER October 2020 5 Sentiment Analysis of the House of Representatives with Particle Swarm Optimization- Based Classification Algorithm Anas Faisal, Yuris Alkhalifi, Achmad Rifai, Windu Gata The study was conducted using two algorithms, namely the Support Vector Machine (SVM) Algorithm and Naive Bayes (NB). The two algorithms are each optimized using Particle Swarm Optimization (PSO). The results of the SVM and NB k-fold cross validation tests obtained accuracy values of 71.04% and 70.69% with Area Under the Curve (AUC) values of 0.817 and 0.661. While the results of the k-flod cross validation test using PSO, for SVM and NB, they received accuracy values of 75.03% and 73.49% respectively with AUC values of 0.808 and 0.719. The use of PSO is able to increase the accuracy value of the SVM algorithm by 3.99% and 2.8% in the NB algorithm. The result of testing the two algorithms the highest accuracy value was SVM with a PSO of 75.03%.(Faisal et al., 2020) Jurnal JOINTECS (Journal of Information Technology and Computer Science) Vol 5, No 2 (2020) DOI: https:// doi.org/ 10.31328/ jointecs.v5i2.1 362 6 Sentiment Analysis of National Exam Removal on Twitter Using Support Vector Machine and Naïve Bayes- based Particle Swarm Optimization Yuris Alkhalifi, Windu Gata, Arfhan Prasetya, Imam Budiawan The test was carried out using k-Fold Cross Validation to obtain accuracy values, confusion matrix tables and area under curve. The test results obtained an accuracy value of 92.92% and an AUC of 0.977 for SVM without PSO. Then the accuracy value is 94.81% and the AUC is 0.974 for SVM with PSO. The accuracy value is 85.93% and the AUC is 0.645 for NB without PSO. As well as an accuracy value of 86.92% and an AUC of 0.715 for NB with PSO. In this study, the SVM method with PSO was best for classifying positive and negative sentiments related to the elimination of UN.(Alkhalifi et al., 2020) CoreIT Journal Vol 6, No 2 December 2020 ISSN 2460-738X (Print) ISSN 2599-3321 (Online) 7 Internet Sentiment Analysis on AMIK BSI Tegal Social Media Ahmad Fauzi, Amin Nur Rais Muhammad Faittullah The NAIVE BAYES algorithm and its methods will be tested with two inputs using positive (100 commentary comments) and negative (100 text comments), the accuracy obtained by the NAIVE BAYES algorithm is Jurnal SEMNATI Vol 1 (2018): SEMNATI 2018 https://doi.org/%2010.33480/pilar.%20v16i2.1423 https://doi.org/%2010.33480/pilar.%20v16i2.1423 https://doi.org/%2010.33480/pilar.%20v16i2.1423 https://doi.org/%2010.33480/pilar.%20v16i2.1423 http://prosiding.uika-bogor.ac.id/index.php/semnati/issue/view/2 http://prosiding.uika-bogor.ac.id/index.php/semnati/issue/view/2 http://prosiding.uika-bogor.ac.id/index.php/semnati/issue/view/2 Panggabean et al… Vol 4(1) 2022 : 375-389 380 No Title Author Results Description Using Naive Bayes Algorithm Akbar, Windu Gata 76.50%+/-7.76%(micro:76.50). The results showed that NAIVE BAYES (NB) got the best and accurate results.(Fauzi et al., 2018) 8 Sentiment Analysis of DKI Jakarta Governor Candidate 2017 on Twitter Ghulam Asrofi Buntoro The data used is tweets in Indonesian with the keywords AHY, Ahok, Anies, with a total dataset of 300 tweets. The result of this study is an analysis of sentiment towards the 2017 DKI Jakarta gubernatorial candidate. The highest accuracy was obtained when using the Naà ̄ve Bayes Classifier (NBC) classification method, with an average accuracy value of 95%, a precision value of 95%, a recall value of 95% a TP rate value of 96.8% and a TN rate value of 84.6%. (Buntoro, 2017) Jurnal INTEGER: Journal of Information Technology Vol 2, No 1 (2017). 9 Sentiment Analysis of Online Learning on Twitter during the COVID-19 Pandemic Using the Naïve Bayes Method Samsir, Ambiyar, Unung Verawardina, Firman Edi, Ronal Watriantho The analysis was conducted on Twitter by mining document-based text interpreted using the Naïve Bayes algorithm. The results showed that online learning had a positive sentiment of 30 percent, a negative sentiment of 69 percent, and a neutral of 1 percent during the period. Due to public dissatisfaction about online learning, many negative sentiments were created. Some tweets show disappointment with the words 'stress' and 'lazy' in conversations that become high-frequency words. (Samsir, Ambiyar, Unung Verawardina, Firman Edi, 2021) JOURNAL OF INFORMATI CS MEDIA BUDIDARM A Vol 5, No 1 (2021). DOI: http://dx. doi.org/10.308 65 /mib.v5i1.258 0 10 Twitter Sentiment Analysis Of Post- Covid-19 Online Lectures Using Support Vector Machine Algorithm and Naive Bayes Hendrik Setiawan, Ema Utami, Sudarmawan For sentiment analysis, researchers applied the Bayes nave algorithm and support vector machine (SVM) with performance results obtained on the Bayes algorithm with an accuracy of 81.20%, a time of 9.00 seconds, a recall of 79.60% and a precision of 79.40% while for the SVM algorithm it obtained an accuracy value of 85%, a time of 31.60 seconds, a recall of 84% and a precision of 83.60%, the performance results were obtained at iteration 1 for nave Bayes and the 423rd iteration for the SVM algorithm.(Setiawan et al., 2021) Journal of Mathematics (Computing and Informatics) Vol 5 No 1 (2021) https://doi.org /10.31603/ko mtika.v5i1.51 89 Panggabean et al… Vol 4(1) 2022 : 375-389 381 Thinking Framework Fig. 1. Frame of Mind Source: Research Results (2022) 3. Research Methods Research in general can be interpreted as an effort to seek knowledge or an investigative process that is carried out actively, diligently, and systematically, which aims to find information on a particular topic. Therefore, good research methods are needed to find solutions to the problems raised. The research method that will be proposed in this study is to use the Cross-Industry Standard Process for Data Mining (CRISP-DM) model. The Cross-Industry Standard Process for Data Mining (CRISP-DM) method consists of several 6 stages in CRISP-DM, namely Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment. Fig. 2. CRIPS-DM Methods Source: (Shafique & Qaiser, 2014) 4. Results and Discussions Panggabean et al… Vol 4(1) 2022 : 375-389 382 Results and Discussion is a section that contains all scientific findings obtained as research data. This section is expected to provide a scientific explanation that can logically explain the reason for obtaining those results that are clearly described, complete, detailed, integrated, systematic, and continuous. The discussion of the research results obtained can be presented in the form of theoretical description, both qualitatively and quantitatively. In practice, this section can be used to compare the results of the research obtained in the current research on the results of the research reported by previous researchers referred to in this study. Scientifically, the results of research obtained in the study may be new findings or improvements, affirmations, or rejection of a scientific phenomenon from previous researchers. Business Understanding The Business Understanding stage is the initial stage in research to understand the scope of the problem and determine the objectives of the research. In this study, opinions or opinions related to madrasahs from social media are very diverse, this can be used and helps to find out the public's views on madrasahs, to reveal factors that influence the results of research and produce appropriate solutions. Data Understanding At the data understanding stage, a process of understanding the data that will be used as research material is carried out. At this stage, the process of retrieving the original data is carried out in accordance with the required attributes. The dataset to be used is an opinion or opinion from the social media platform Twitter. The data collected is only Indonesian-language tweets from April 10, 2022 to April 16, 2022. The madrasa query parameter is set to 5000 and use the latest or latest type. Then save the popular file to Microsoft Excel. Using Twitter's RapidMiner Studio Tools API version 9.10, Twitter's social media crawl method is used to retrieve tweet data. The data preparation stage is the stage of the data preparation process that aims to make the data clean and ready for research. The initial data obtained from crawling data was comments on social media twitter related to madrasahs as many as 3288 pieces of data. In addition, the process carried out is a cleanup process, such as deleting duplicate data, deleting data with narratives that are not related to the research topic, and producing 458 pieces of data. Here's one example of a view or opinion on Social Media Twitter: Fig. 3. Sample of Opinions on Twitter Social Media (Source: Research Results (2022) Using the data source obtained through the cleaning process, a data set is created with attribute text, which contains opinions or opinions narrated by the waiter that are considered consistent with the population document, then determined the class attribute. In this study, three attributes or class labels will be used in this study, namely positive and negative. In the process Panggabean et al… Vol 4(1) 2022 : 375-389 383 of determining these attributes, it is carried out by the Madrasah Supervisor Working Group in South Jakarta. Table 3 - Labeling Process Results Table Labeling Process Results Label Sum Positive 233 Negative 225 Total 458 Source: Research Results (2022) Data Preparation The preprocessing stage is needed to clean the data from unnecessary text, where the unstructured text data will be converted into structured or semi-structured text data. The stages of preprocessing to process data are case folding, convert emoticons, cleansing, tokenizing, stop word removal and stemming. Pthere is a first stage this researcher uses Gataframework by accessing the link http:// www. gataframework.com/, here's how it looks: Fig. 4. Data Preprocessing Model Design Drawing using Gataframework Source: Research Results (2022) Due to system limitations, where Gataframework can only preprocess a maximum of 100 data, the researcher uses the Excecute Python source code connected from the RapidMiner application to Gataframework. Fig. 5. Data Preprocessing using Execute Python script on Rapidminer application Source: Research Results (2022) Modeling At this stage, datasets that have gone through preprocessing will be used as input in the classification algorithm, and used as training and testing datasets. According to the previous chapter, this study will use four Algorithms at once as comparative material, namely Naïve Bayes (NB), Decision Tree (DT) and K – Nearest Neighbor (K-NN). After the Preprocessing process with the Rapid Miner tool, then proceed with Tokenization, Stop word Filter (Dictionary), Token Filter (by Length) and 10 Cross fold validation. With the 10-Fold Cross validation method, the dataset is divided into 10 areas, with each aspect providing the same information percentage of http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ http://www.gataframework.com/ Panggabean et al… Vol 4(1) 2022 : 375-389 384 each type of data. 9/10 of the data area is used in the Training process to form a model, while 1/10 of the area is used in the Testing process. Training to produce models and testing performance. Fig. 6. Validation Testing Model Using Naïve Bayes (NB), Decision Tree (DT), K – Nearest Neighbor (K-NN) Naïve Bayes (NB) PSO, Decision Tree (DT) PSO and K – Nearest Neighbor (K-NN) PSO Source: Research results (2022) Evaluation The comparison of accuracy, precision, recall and AUC between the Naïve Bayes, Decision Tree, k-Nearest Neighbor, Naïve Bayes PSO, Decision Tree PSO, and k-Nearest Neighbor PSO algorithms with the 10-Fold Cross Validation model has been carried out as follows: Table 4 - Comparison of Accuracy, Precision, Recall and AUC Validation Algorithm Accuracy Precision Recall AUC Cross Validation NB 73.84% 79.42% 73.84% 0.712 German 61.38% 57.21% 97.01% 0.607 K-NN 74.70% 74.08% 78.53% 0.853 NB PSO 80.80% 83.03% 78.64% 0.739 DT PSO 65.27% 59.75% 98.68% 0.647 K-NN PSO 67.24% 81.97% 52.44% 0.764 Source: Research Results (2022) Based on the results of the comparison of research in table 4.13 from tweet processing as many as 458 data shows that the results of the accuracy pattern of classification of the Naïve Bayes PSO algorithm outperform other algorithms, namely Naïve Bayes, Decision Tree, k- Nearest Neighbor, Decision Tree PSO, and k-Nearest Neighbor PSO. Table 5 - Confusion matrix with Naïve Bayes PSO model Accuracy: 80.80% +/- 4.86% (micro average: 80.79%) true NEGATIVE true POSITIVE class precision Pred. NEGATIF 187 50 78.90% Pred. POSITIVE 38 183 82.81% class recall 83.11% 78.54% Source: Research Results (2022) AUC: 0.739 +/- 0.105 (micro average: 0.739) (positive class: POSITIF) Fig. 7. AUC model NB PSO Source: Research results (2022) Based on the results of the study using the Naïve Bayes PSO algorithm, an Accuracy Panggabean et al… Vol 4(1) 2022 : 375-389 385 pattern was obtained, which was 80.80%, where from 233 data predicted positive, it turned out that 183 data were correctly predicted positive (TP) while 50 data turned out to be predicted negative (FP) and from 225 data predicted negative it turned out that 187 were correctly predicted negative (TN) while 38 data turned out to be predicted positive (FN), precision 83.03%, Recall 78.64% and model performance from AUC on ROC curve is 0.739. Deployment Based on the evaluation results of the model testing process between the Naïve Bayes algorithm, Decision Tree, k-Nearest Neighbor, Naïve Bayes PSO, PSO Decision Tree, and k- Nearest Neighbor PSO, it was found that the highest model test results from all algorithm testing results is the Naïve Bayes PSO model. Therefore, the weights that will be used in the application modeling research are based on the results of testing the Naïve Bayes PSO algorithm. Fig. 8. Application Flowchart Source: Research Results (2022) Fig 9. Get Tweet with Twitter API Source: Research Results (2022) The picture above shows the deployment results to get tweets to twitter using the twitter API (Application Programming Interface) by mentioning madrasahs. At this stage the tweet data is taken so that it can be carried out in the next step, namely text preprocessing. Panggabean et al… Vol 4(1) 2022 : 375-389 386 Fig. 10. Text Preprocessing Source: Research Results (2022) In the picture above, after the tweet data is taken, the next step is to preprocess and clean the text, using the Remove @annotation, Remove URL, Tokenize Regexp, Stemming, Not Transformation Negative, Stop word, Remove _ to Space techniques. After the tweet data has been preprocessed and cleaned, the next step is to calculate the word weight. Where the word weight is obtained from the test results of the Naïve Bayes PSO model, because Nave Bayes PSO is an algorithm that has the highest accuracy compared to the Naïve Bayes algorithm, Decision Tree, k-Nearest Neighbor, Naïve Bayes PSO, PSO Decision Tree, and k-Nearest Neighbor. PSO. Fig. 11. Probability of word Source: Research Results (2022) In the picture above, it can be seen that household has a weight of 0, Baharuddin has a weight of 0, a mosque has a negative weight: 0.0028278477625407737 a positive weight: 0.003904555804386555, an island has a negative weight: 0 a positive weight: 0.0012392906127454187, a bargain has a weight of 0, a mat has a weight of 0, a shine has a weight 0, Isytihar has a weight 0, madrasa has a weight 0, the chairman has a negative weight: 0.0011593064344059666 positive weight: 8, maulana has a negative weight: 0 positive weight: 0. 0.0011118029544961404 and hel has a weight of 0. The results of the calculation of these weights are negative weights: 0.0039871541969467 and a positive weight of 8.0062556493716. Thus, the results of the calculations for these categories produce positive conclusions. Panggabean et al… Vol 4(1) 2022 : 375-389 387 Fig. 12. Result Summary Prediction category Source: Research Results (2022) In the picture above, shows the graphic data and the results of the summary predictions of categories that have been categorized based on tweet data and the weight of each word in each category has been calculated. In addition, a graph function to monitor the number of tweet data that has been categorized so that the progress of the data can be monitored. 5. Conclusion From the research above, it can be concluded that the application of Data Mining for the case of sentiment analysis towards madrasahs using 3 classification algorithms of Naïve Bayes (NB), Decision Tree (DT) and K – Nearest Neighbor (K-NN). To improve the performance of the classification method can be done using the Particle Swarm Optimization (PSO) selection feature. The test results of the algorithm Naïve Bayes (NB) PSO get the highest accuracy when compared to the algorithms of Naïve Bayes, Decision Tree, k-Nearest Neighbor, Decision Tree PSO, and k- Nearest Neighbor PSO which is 80.80%, so that with this it can be applied to analyze an opinion. References Aditia Rakhmat Sentiaji, A. M. B., Sarjana, P. S., Statistika, D., Matematika, F., Ilmu, D. A. N., & Alam, P. (2014). Analisis Sentimen Terhadap Acara Televisi Berdasarkan Opini Publik. Jurnal Ilmiah Komputer Dan Informatika (KOMPUTA). Alita, D., Priyanta, S., & Rokhman, N. (2019). Analysis of Emoticon and Sarcasm Effect on Sentiment Analysis of Indonesian Language on Twitter. Journal of Information Systems Engineering and Business Intelligence, 5(2), 100. https://doi.org/10.20473/jisebi.5.2.100- 109 Alkhalifi, Y., Gata, W., Prasetyo, A., & Budiawan, I. (2020). Analisis Sentimen Penghapusan Ujian Nasional pada Twitter Menggunakan Support Vector Machine dan Naïve Bayes berbasis Particle Swarm Optimization. CoreIT, 6(2), 71–78. http://ejournal.uin- suska.ac.id/index.php/coreit/article/view/9723 Badri, M. (2011). Corporate Marketing and Communication. Universitas Mercu Buana. Brahimi, B., Touahria, M., & Tari, A. (2019). Improving sentiment analysis in Arabic: A combined approach. Journal of King Saud University - Computer and Information Sciences, 33(10), 1242–1250. https://doi.org/10.1016/j.jksuci.2019.07.011 Brogan, C. (2011). Social Media 101: Tactics and Tips to Develop Your Business Online. Buntoro, G. A. (2017). Analisis Sentimen Calon Gubernur DKI Jakarta 2017 Di Twitter. Integer Journal, 2(1), 32–41. https://t.co/jrvaMsgBdH Dailey, P. R. (2009). Social Media: Finding Its Way into Your Business Strategy and Culture. Linkage. Deng, L., & Yu, D. (2013). Deep learning: Methods and applications. Foundations and Trends in Signal Processing, 7(3–4), 197–387. https://doi.org/10.1561/2000000039 Evanko, D. (2010). Optical imaging of the native brain. Nature Methods, 7(1), 34. https://doi.org/10.1038/nmeth.f.284 Panggabean et al… Vol 4(1) 2022 : 375-389 388 Faisal, A., Alkhalifi, Y., Rifai, A., & Gata, W. (2020). Analisis Sentimen Dewan Perwakilan Rakyat Dengan Algoritma Klasifikasi Berbasis Particle Swarm Optimization. JOINTECS (Journal of Information Technology and Computer Science), 5(2), 61. https://doi.org/10.31328/jointecs.v5i2.1362 Fauzi, A., Rais, A. N., Akbar, M. F., & Gata, W. (2018). Analisis Sentimen Berinternet Pada Media Sosial AMIK BSI Tegal Dengan Menggunakan Algoritma Naive Bayes. Seminar Nasional Teknologi Informasi Universitas Ibn Khaldun Bogor, 46–54. G.Wahyuningtyas, I. M. and S. (2014). Aplikasi Data Mining untuk Penilaian Kredit Menggunakan Metode Fuzzy Decision Tree. Jurnal Sains Dan Seni Pomits, 1(1), 1–6. Gata, W. (2017). Akurasi Text Mining Menggunakan Algoritma K-Nearest Neighbour pada Data Content Berita SMS. 6, 1–13. Giovani, A. P., Ardiansyah, A., Haryanti, T., Kurniawati, L., & Gata, W. (2020). Analisis Sentimen Aplikasi Ruang Guru Di Twitter Menggunakan Algoritma Klasifikasi. Jurnal Teknoinfo, 14(2), 115. https://doi.org/10.33365/jti.v14i2.679 He, W., Wu, H., Yan, G., Akula, V., & Shen, J. (2015). A novel social media competitive analytics framework with sentiment benchmarks. Information and Management, 52(7), 801–812. https://doi.org/10.1016/j.im.2015.04.006 Ibrahim, D. (2017). Analisis Hubungan antar Faktor dan Komparasi Algoritma Klasifikasi pada Penentuan Penundaan Penerbangan. 2017, September, 15– 17. Kasih, P. (2019). Pemodelan Data Mining Decision Tree Dengan Classification Error Untuk Seleksi Calon Anggota Tim Paduan Suara. Innovation in Research of Informatics (INNOVATICS), 2, 63–69. Keilany, Z. (1978). Book Reviews: Book Reviews. Review of Social Economy, 36(2), 228–229. https://doi.org/10.1080/00346767800000037 Muktamar, B. A., Setiawan, N. A., & Adji, T. B. (2015). Pembobotan Korelasi Pada Naïve Bayes Classifier. Seminar Nasional Teknologi Informasi Dan Multimedia 2015, 2, 43–47. Nurjanah, W. E., Perdana, R. S., & Fauzi, M. A. (2017). Analisis Sentimen Terhadap Tayangan Televisi Berdasarkan Opini Masyarakat pada Media Sosial Twitter menggunakan Metode K-Nearest Neighbor dan Pembobotan Jumlah Retweet. Jurnal Pengembangan Teknologi Informasi Dan Ilmu Komputer (J-PTIIK) Universitas Brawijaya, 1(12), 1750–1757. Peter Norvig, R. (2010). Artificial intelligence—a modern approach by Stuart. Cambridge University Press. Pratomo, Y. (2021). Sejarah Twitter, Jejaring Sosial yang Terinspirasi dari SMS. Tekno.Kompas.Com. https://tekno.kompas.com/read/2021/04/14/20420077/sejarah- twitter-jejaring-sosial-yang-terinspirasi-dari-sms?page=all Purnama, H. (2011). Media Sosial Di Era Pemasaran 3.0. Corporate and Marketing Communication. Jakarta : Pusat Studi Komunikasi Dan Bisnis Program Pasca Sarjana Universitas Mercu Buana, Pp 107-124. Ratino, Hafidz, N., Anggraeni, S., & Gata, W. (2020). Sentimen Analisis Informasi Covid-19 menggunakan Support Vector Machine dan Naïve Bayes. Jurnal JUPITER, 12(2), 1–11. Riyanto, G. P. (2021). Jumlah Pengguna Internet Indonesia 2021 Tembus 202 Juta. Tekno.Kompas.Com. https://tekno.kompas.com/read/2021/02/23/16100057/jumlah- pengguna-internet-indonesia-2021-tembus-202-juta Rozi, I., Pramono, S., & Dahlan, E. (2012). Implementasi Opinion Mining (Analisis Sentimen) Untuk Ekstraksi Data Opini Publik Pada Perguruan Tinggi. Jurnal EECCIS, 6(1), 37–43. S.A Pattekari, A. P. (2012). Prediction system for heart disease using Na ̈ıve Bayes. International Journal of Advanced Com-Puter and Mathematical Sciences, 3(3), 290–294. Safitri, S. I., Suhery, C., & Bahri, S. (2021). Implementasi Algoritma K–Means Untuk Clustering Sentimen Pada Opini Kualitas Pelayanan Jasa Penerbangan. Coding Jurnal Komputer Dan Aplikasi, 09(02), 186–197. https://jurnal.untan.ac.id/index.php/jcskommipa/article/view/47377 Samsir, Ambiyar, Unung Verawardina, Firman Edi, R. W. (2021). Analisis Sentimen Pembelajaran Daring Pada Twitter di Masa Pandemi COVID-19 Menggunakan Metode Naïve Bayes. Jurnal Media Informatika Budidarma, 5(1), 157–163. Panggabean et al… Vol 4(1) 2022 : 375-389 389 https://doi.org/10.30865/mib.v5i1.2604 Setiawan, H., Utami, E., & Sudarmawan, S. (2021). Analisis Sentimen Twitter Kuliah Online Pasca Covid-19 Menggunakan Algoritma Support Vector Machine dan Naive Bayes. Jurnal Komtika (Komputasi Dan Informatika), 5(1), 43–51. https://doi.org/10.31603/komtika.v5i1.5189 Shafique, U., & Qaiser, H. (2014). A Comparative Study of Data Mining Process Models ( KDD , CRISP-DM and SEMMA ). International Journal of Innovation and Scientific Research, 12(1), 217–222. http://www.ijisr.issr-journals.org/ Suryanto, A., Alfarobi, I., Tutupoly, T. A., & Fauziahti, R. (2019). Optimasi Naive Bayes Menggunakan Optimize Weights Dan Stratified Pada Data Kredit Koperasi. Mantik Penusa, 3(1), 211–219. Turban, E. (2005). Decision Support Systems and Intelligent Systems Edisi Bahasa Indonesia. Andi. Xiaojun, Z. (2011). Michael W. Berry and Jacob Kogan (eds.): Text mining: applications and theory. Information Retrieval, 14(2), 208–211. https://doi.org/10.1007/s10791-010-9153-5 Y. Sunoto, B. W. (2014). Analisis Testimonial Wisatawan Menggunakan Text Mining Dengan Metode Naive Bayes DanDecision Tree, Studi Kasus Pada Hotel Hotel Di Jakarta. Jurnal Informatika Dan Bisnis ANALISIS, 3(2), 39–49. Zumarniansyah, A., Pebrianto, R., & ... (2020). Twitter Sentiment Analysis of Post Natural Disasters Using Comparative Classification Algorithm Support Vector Machine and …. Jurnal Pilar Nusa …, 169–174. http://ejournal.nusamandiri.ac.id/index.php/pilar/article/view/1423