Knowledge Engineering and Data Science (KEDS)
Vol 3, No 1, July 2020, pp. 50–59
pISSN 2597-4602, eISSN 2597-4637
https://doi.org/10.17977/um018v3i12020p50-59
©2020 Knowledge Engineering and Data Science | W: http://journal2.um.ac.id/index.php/keds | E: keds.journal@um.ac.id
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)
S. Sendari et al. / Knowledge Engineering and Data Science 2020, 3 (1): 50–59

Opinion Analysis for Emotional Classification on Emoji Tweets using the Naïve Bayes Algorithm

Siti Sendari a,1,*, Ilham Ari Elbaith Zaeni a,2, Dian Candra Lestari a,3, Hanny Prasetya Hariyadi b,4
a Department of Electrical Engineering, Universitas Negeri Malang, Jl. Semarang No. 5 Malang 65145, Indonesia
b Graduate School of Information, Production and Systems (IPS), Waseda University, 1-104 Totsukamachi, Shinjuku City, Tokyo 169-8050, Japan
1 siti.sendari.ft@um.ac.id*; 2 ilham.ari.ft@um.ac.id; 3 dnivers@gmail.com; 4 prasetya.hanny@fuji.waseda.jp
* corresponding author

ARTICLE INFO
Article history: Received 28 July 2020; Revised 9 August 2020; Accepted 11 August 2020; Published online 17 August 2020
Keywords: Opinion Analysis; Twitter; Emoji; Classification; Naïve Bayes

ABSTRACT
Opinion analysis of social media is needed because content can become a trending topic and have a significant impact on social life. One of the social media platforms that contributes greatly to cyberspace and information development is Twitter. In the Twitter application, users can insert images that represent emotions, facial expressions, or icons. An Emoji is a graphic symbol in the form of an image that expresses something; with an Emoji, a text can be read and understood according to its meaning because the image represents it. For these reasons, the researchers studied the classification of tweet content based on the use of Emojis. This study aims to determine the emotional tendency of Twitter users in one period. Every tweet on the Twitter timeline that contains both text and Emojis is classified into one of several categories. The algorithm used was Naïve Bayes, which calculated the probability of each Emoji tweet to classify text with Emojis. The emotions were grouped into three categories, namely "angry," "joy," and "sad." The results showed that "joy" was the emotional trend of Twitter users, with the Emoji x1f60a appearing most often. Meanwhile, the algorithm reached an accuracy of 90% with a 70:30 holdout technique.

I. Introduction

Social media is a tool for interacting or communicating digitally that can be accessed while connected to the internet. Examples include Twitter, Facebook, and Instagram. Twitter is an open platform, so developers can do research and development on it [1][2]. According to the online media eBizMBA, Twitter was ranked 4th with 375 million active users in September 2019 [3]. Twitter has several features, namely tweets, hashtags, and Emojis. Of these, the Emoji is a particularly interesting feature to study. Emojis are the latest generation of emoticons. They emerged at the end of the 20th century, created by Shigetaka Kurita in 1999 with the aim of beautifying messages. In other words, Emojis are graphic symbols included in Unicode that express facial expressions or represent an object as a simple illustration when conveying an idea [4][5].

Using Emojis on their own, apart from a message, can lead to miscommunication. However, an Emoji attached to a message can maximize understanding between the writer and the reader [6]. Opinions in text form are often ambiguous about the emotion conveyed, including the Emojis they contain. Sentiment analysis is needed to see the tendency of users' opinions on a problem [7]. This affects the psychology of users interacting through social media. In the book The Emoji Code, the cognitive linguist Vyvyan Evans states that Emojis carry non-verbal language in non-face-to-face interactions [8]. Therefore, Emojis play a role in representing mood and emotion within communication. Emojis can also add the user's personality to text and generate empathy, which is important for effective communication. Thus, it is essential to find the deeper meaning of Emojis within the field of sentiment analysis.

Sentiment analysis is a study of opinion mining, carried out to obtain information about opinions and emotions on a topic. Emotions can be used as a benchmark for the happiness of society and as a consideration in decision making. Emotion detection can be used to check message content so as to minimize misunderstanding between the reader and the sender [9][10]. An opinion can represent feelings and emotions; thus, the classification of emotions is needed to see the emotional tendency in the content implied by the opinion. Emotion classification is a form of sentiment classification that focuses on the emotional meaning of the content.

Several aspects serve as reference points in this study, namely the Naïve Bayes algorithm, the exploration of Emojis on Twitter for opinion analysis, and the exploration of the classification of emotions of Emoji users.
Research relevant to this study includes sentiment classification with Emojis using an Emoji training heuristic [11][12], multilingual Emoji prediction [13][14], differences in perception when using Emojis [15][16][17], and sentiment analysis with Emojis [18][19].

II. Methods

This research established a system for classifying emotions based on tweets that contain Emojis. The study applied the Naïve Bayes Classifier (NBC) algorithm for text-based emotion classification. The Emojis contained in a tweet are identified as terms, so text and Emojis are treated alike as terms. The experiments were carried out under two conditions, namely tweets with Emojis ignored and tweets with Emojis processed. The design of the research flow is shown in Figure 1.

Fig. 1. Emotional classification research design flow

Developing the system consists of the following processes: (1) collecting data through Twitter data retrieval, (2) pre-processing the data so that it fits the research boundaries, (3) classifying the opinion data to identify emotions and measure the performance of the algorithm, (4) evaluating the accuracy of the algorithm, and (5) analyzing and visualizing the frequency of the texts and Emojis that appear most often.

At the Twitter data retrieval stage, the method used was crawling, which collected each tweet on the timeline. The crawling was done using the R language in the RStudio software, which is flexible and adaptable for other applications [20]. The pre-processing stage consists of several steps, namely data preparation, cleaning, stopword removal, and stemming [21]. Data preparation selects the collected data that fit the research limitations and eases the workload of the system [22]. It was done manually by selecting only original tweets (not retweets), tweets containing text and Emojis, and tweets in English. Next, the cleaning stage deletes components of the tweet that are not needed in the classification process [23]. These components include @username, punctuation, and numbers. The stopword removal stage then deletes conjunctions and words that are not unique [24]. The last step, stemming, turns every word in a sentence into a basic word that matches the dictionary in the stemmer [25].

In the classification process that follows, text and Emojis (which are treated as terms) are used to calculate word polarity and determine the emotional class [26][27]. After the classification process was complete, the evaluation was carried out to see the accuracy of the Naïve Bayes algorithm in processing opinion data [28]. Finally, the results were analyzed and the term and Emoji frequencies were visualized [29].

III. Results and Discussions

The pre-processing stage comprised data preparation, case folding, cleaning, stopword removal, and stemming. Pre-processing turns the raw data from crawling into data that is ready for the classification process. Data preparation selects the data: the selected tweets are those that are not retweets, that contain Emojis, and that are in English. This process was done manually by scanning the data. In this process, the emotion classes were also labelled manually, with an expert review from the Faculty of Psychology Education, Universitas Negeri Malang (State University of Malang), as data verification. The selection yielded 305 tweets. The attributes used in the research were also selected in this process. Unused attributes were deleted and ignored.
Furthermore, attributes were added to support dataset processing. Table 1 presents the attributes used in the research. These attributes were chosen because each is a factor needed in the classification of emotions. The Text and X1f600–X1f637 attributes were used to calculate the probability values that determine the tendency of the emotional classes. The Emoji count attribute is used to see which Emojis are used most often. The No, ID, and Emoji attributes identify the Text and X1f600–X1f637 attributes.

Table 1. Dataset details

| No | Attribute Name | Data Type | Explanation |
|---|---|---|---|
| 1 | No | Numeric | Data sequence number |
| 2 | ID | Numeric | Tweet identity |
| 3 | Text | String | Tweet |
| 4 | Emoji | Character | Emojis (Unicode) in the form of characters |
| 5 | X1f600 – X1f637 | Binary | Emoji contained in tweets |
| 6 | Emoji count | Numeric | Total Emojis contained in one tweet |

The cleaning process removed punctuation marks, numbers (0 to 9), links (http / https), and usernames (@user1) from tweets because they do not carry information about emotions. Figure 2 is an example of a tweet that was reduced in the cleaning process. Cleaning produced the following tweet: "To every sunrise and sunset and everything in between of we're excited for you (beer)."

Stopword removal deleted conjunctions and words that are not unique. Tweets were scanned against a database containing conjunctions. If a word in the tweet matched a word in the database, the word was deleted. The following is an example of the stopword removal process: "To every sunrise and sunset and everything in between of we're excited for you (beer)." After stopword removal, it became: "Every sunrise, sunset everything excited you (beer)."

Stemming is the process of changing a word into its basic word. Words that contain prefixes and suffixes were changed to a word stem. A collection of word stems was stored in a database called the Porter stemmer. Every word in the tweet was matched against the Porter stemmer dictionary, so words containing an affix were changed according to the dictionary. The following is an example of the stemming process: "Every sunrise, sunset everything excited you (beer)". After stemming, it becomes: "Every sunrise, sunset everything excites you (beer)".

In calculating the probability using the Naïve Bayes algorithm, the values of the categories are compared, and the emotion category with the highest probability value is taken as the dominant category. Probability describes an event that can be known or predicted by looking at the pattern of previous events based on facts. Put simply, probability is the chance that an event occurs given previous events.

The program measures emotions based on text variables and Emoji variables, the latter identified by their hexadecimal codes. From these variables, the probability of each emotion category (joy, anger, sadness) and the probability of each word given each emotion are obtained. The variable n is the number of words/terms. After the variables are formed, the Naïve Bayes algorithm applies a formula to find the emotional probability based on the text and obtain the resulting emotion. The calculation steps and the way the variables are obtained are detailed below.
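The cleaning, stopword removal, and Emoji-as-term steps described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation (the paper reports using R): the function names `emoji_to_term` and `preprocess` and the short `STOPWORDS` set are hypothetical, the stopword list is inferred from the worked example rather than taken from the paper's stopword database, and stemming is left out. The Emoji range U+1F600 to U+1F637 follows the X1f600–X1f637 attribute of Table 1.

```python
import re

# Hypothetical, abbreviated stopword list inferred from the worked example;
# the paper's actual stopword database is not given.
STOPWORDS = {"to", "and", "in", "between", "of", "for", "we're"}

EMOJI_FIRST, EMOJI_LAST = 0x1F600, 0x1F637  # range of the X1f600-X1f637 attributes


def emoji_to_term(ch):
    """Map an Emoji character to a hexadecimal term such as 'x1f60a'."""
    return f"x{ord(ch):x}"


def is_emoji(ch):
    """True if the character lies in the Emoji range used by the dataset."""
    return EMOJI_FIRST <= ord(ch) <= EMOJI_LAST


def preprocess(tweet):
    """Clean a tweet and return lowercase word terms followed by Emoji terms."""
    # Pull the Emojis out first so that later cleaning cannot remove them.
    emojis = [emoji_to_term(ch) for ch in tweet if is_emoji(ch)]
    text = "".join(ch for ch in tweet if not is_emoji(ch))
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"@\w+", " ", text)          # usernames
    text = re.sub(r"[0-9]", " ", text)         # numbers
    text = re.sub(r"[^\w\s']", " ", text)      # punctuation (apostrophes kept)
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return words + emojis


terms = preprocess(
    "@user1 Will remote work for another company!! 123 https://t.co/abc \U0001F60A"
)
```

Because Naïve Bayes treats a tweet as a bag of terms, appending the Emoji terms after the words does not affect the later probability calculation.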
Based on these terms, an equation that gives the probability value of an event can be written as (1)

$$P(A) = \frac{N}{M} \tag{1}$$

where P(A) is the probability of an event, N is the number of events, and M is the size of the sample space. Conditional probability is the probability of an event that occurs after another event has occurred. More precisely, it concerns an event that depends on another event. For example, P(B | A) is read as the probability of event B given condition A. At classification time, the algorithm looks for the highest probability value among all the categories tested [30]. The basic Naïve Bayes theorem is given in (2)

$$\Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)} \tag{2}$$

where Pr(B|A) is the probability of class B given object A, Pr(A|B) is the probability of object A occurring given class B, Pr(B) is the probability of occurrence of class B, and Pr(A) is the probability of object A.

Table 2 contains four sample data with emotional category labels, together with one new tweet whose category is to be determined. Using the probability formulas, the tendency of the tweet's emotion category is read from the probability values. The example calculation is as follows. First, the probability of each category is determined from the training data. In the example, each category has the probability (3)

$$\Pr(c_i) = \frac{N_{d_i}}{N_d} \tag{3}$$

where Pr(c_i) is the probability of the appearance of class i, N_{d_i} is the amount of data in class i, and N_d is the total amount of training data.

Fig. 2. Example tweets before the cleaning process

To obtain the value of Pr(c_i), the amount of data in class i is divided by the total amount of training data. This gives Joy = 2/4, Sad = 1/4, Angry = 1/4.
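As a minimal sketch of (3), assuming the four labelled tweets of Table 2 (two Joy, one Sad, one Angry) are the only training data, the class probabilities can be computed with exact fractions. The name `class_priors` is illustrative and does not come from the paper.

```python
from collections import Counter
from fractions import Fraction

# Emotion labels of the four training tweets in Table 2.
labels = ["Joy", "Joy", "Sad", "Angry"]


def class_priors(labels):
    """Equation (3): number of tweets in each class divided by all training tweets."""
    counts = Counter(labels)
    total = len(labels)
    return {label: Fraction(count, total) for label, count in counts.items()}


priors = class_priors(labels)
```

`Fraction` reduces 2/4 to 1/2 automatically, so the values match the hand calculation while avoiding floating-point rounding.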
Table 2. Examples of labeled tweets for emotional classification

| No | Tweet | Emotional category |
|---|---|---|
| 1 | Exhaust good I work paycheck just collect x1f60a | Joy |
| 2 | Sun shine I work home can see daylight x1f60d | Joy |
| 3 | Good day move forward something stuck hand feel exhaust mind many hour work hard try plate always full x1f629 x1f629 x1f629 | Sad |
| 4 | Just lazy people want remote work x1f603 | Angry |
| 5 | Will remote work another company x1f60a | ? |

Next, the probability of each word in the tweet that serves as testing data is calculated as follows [31]:

$$\Pr(w_k \mid c_i) = \frac{n_{k,i} + 1}{\sum_{j} n_{j,i}} \tag{4}$$

The probability of a word given a category is obtained by counting the occurrences of the word w_k in category c_i (n_{k,i}), adding 1, and dividing by the total number of words in that category (∑_j n_{j,i}). Before doing these calculations, the totals are determined:
• Total words in category Joy = 17 words
• Total words in category Sad = 21 words
• Total words in category Angry = 7 words

For convenience, the values are presented in Table 3.

Table 3. Probability value of word with emoji

| Word | Joy | Sad | Angry |
|---|---|---|---|
| Will | (0 + 1)/17 | (0 + 1)/21 | (0 + 1)/7 |
| Remote | (0 + 1)/17 | (0 + 1)/21 | (1 + 1)/7 |
| Work | (2 + 1)/17 | (1 + 1)/21 | (1 + 1)/7 |
| Another | (0 + 1)/17 | (0 + 1)/21 | (0 + 1)/7 |
| Company | (0 + 1)/17 | (0 + 1)/21 | (0 + 1)/7 |
| x1f60a | (1 + 1)/17 | (0 + 1)/21 | (0 + 1)/7 |

After obtaining the probability values of the words and Emojis, the probability of the tweet in each category is calculated in order to find the category with the highest value for the tweet:

$$\Pr(c_i \mid d) = \Pr(c_i) \prod_{k=1}^{n} \Pr(w_k \mid c_i) \tag{5}$$

where Pr(c_i | d) is the probability of category c_i for tweet d, Pr(c_i) is the probability of tweets in category c_i, n is the number of words in the tweet, and Pr(w_k | c_i) is the probability of word w_k given category c_i. The denominator Pr(A) of (2) is left out because it is identical for every category and does not change which category scores highest.

The Naïve Bayes algorithm is useful for finding the highest class probability value in a tweet.
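The word probabilities of Table 3 and the product in (5) can be reproduced with a short sketch. It assumes the Table 2 tweets, lowercased and split on whitespace, are the whole training set; the names `train`, `word_prob`, and `score` are illustrative only. The smoothing follows Table 3 as given, (count + 1) divided by the total terms in the class, which differs from textbook Laplace smoothing, where the vocabulary size is also added to the denominator.

```python
from collections import Counter
from fractions import Fraction

# Training tweets of Table 2, already pre-processed; Emojis are ordinary terms.
TRAIN = [
    ("exhaust good i work paycheck just collect x1f60a", "Joy"),
    ("sun shine i work home can see daylight x1f60d", "Joy"),
    ("good day move forward something stuck hand feel exhaust mind many hour "
     "work hard try plate always full x1f629 x1f629 x1f629", "Sad"),
    ("just lazy people want remote work x1f603", "Angry"),
]


def train(data):
    """Return class priors, per-class term counts, and per-class term totals."""
    doc_counts = Counter(label for _, label in data)
    term_counts = {label: Counter() for label in doc_counts}
    for text, label in data:
        term_counts[label].update(text.split())
    priors = {c: Fraction(n, len(data)) for c, n in doc_counts.items()}
    totals = {c: sum(tc.values()) for c, tc in term_counts.items()}
    return priors, term_counts, totals


def word_prob(term, label, term_counts, totals):
    """Equation (4) as used in Table 3: (count + 1) / total terms in the class."""
    return Fraction(term_counts[label][term] + 1, totals[label])


def score(text, label, priors, term_counts, totals):
    """Equation (5): class prior times the product of the word probabilities."""
    result = priors[label]
    for term in text.split():
        result *= word_prob(term, label, term_counts, totals)
    return result


priors, term_counts, totals = train(TRAIN)
tweet5 = "will remote work another company x1f60a"
scores = {c: score(tweet5, c, priors, term_counts, totals) for c in priors}
predicted = max(scores, key=scores.get)
```

For tweet 5 this sketch selects the Angry category, in line with the worked example. For longer tweets, a practical implementation would usually sum logarithms instead of multiplying many small probabilities.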
The probability of data appearing in a category, Pr(c_i), is obtained by dividing the number of tweets in that category by the total number of tweets. The probability of the tweet given a category is obtained by multiplying the word probabilities. With the values of Table 3:

$$\begin{aligned}
\Pr(\text{Joy} \mid \text{will remote work another company x1f60a}) &= \Pr(\text{Joy}) \cdot \Pr(\text{will} \mid \text{Joy}) \cdot \Pr(\text{remote} \mid \text{Joy}) \cdot \Pr(\text{work} \mid \text{Joy}) \cdot \Pr(\text{another} \mid \text{Joy}) \cdot \Pr(\text{company} \mid \text{Joy}) \cdot \Pr(\text{x1f60a} \mid \text{Joy}) \\
&= \left(\tfrac{2}{4}\right)\left(\tfrac{1}{17}\right)\left(\tfrac{1}{17}\right)\left(\tfrac{3}{17}\right)\left(\tfrac{1}{17}\right)\left(\tfrac{1}{17}\right)\left(\tfrac{2}{17}\right) = 1242.9 \times 10^{-10}
\end{aligned}$$

$$\begin{aligned}
\Pr(\text{Sad} \mid \text{will remote work another company x1f60a}) &= \Pr(\text{Sad}) \cdot \Pr(\text{will} \mid \text{Sad}) \cdot \Pr(\text{remote} \mid \text{Sad}) \cdot \Pr(\text{work} \mid \text{Sad}) \cdot \Pr(\text{another} \mid \text{Sad}) \cdot \Pr(\text{company} \mid \text{Sad}) \cdot \Pr(\text{x1f60a} \mid \text{Sad}) \\
&= \left(\tfrac{1}{4}\right)\left(\tfrac{1}{21}\right)\left(\tfrac{1}{21}\right)\left(\tfrac{2}{21}\right)\left(\tfrac{1}{21}\right)\left(\tfrac{1}{21}\right)\left(\tfrac{1}{21}\right) = 58.3 \times 10^{-10}
\end{aligned}$$

$$\begin{aligned}
\Pr(\text{Angry} \mid \text{will remote work another company x1f60a}) &= \Pr(\text{Angry}) \cdot \Pr(\text{will} \mid \text{Angry}) \cdot \Pr(\text{remote} \mid \text{Angry}) \cdot \Pr(\text{work} \mid \text{Angry}) \cdot \Pr(\text{another} \mid \text{Angry}) \cdot \Pr(\text{company} \mid \text{Angry}) \cdot \Pr(\text{x1f60a} \mid \text{Angry}) \\
&= \left(\tfrac{1}{4}\right)\left(\tfrac{1}{7}\right)\left(\tfrac{2}{7}\right)\left(\tfrac{2}{7}\right)\left(\tfrac{1}{7}\right)\left(\tfrac{1}{7}\right)\left(\tfrac{1}{7}\right) = 84998.6 \times 10^{-10}
\end{aligned}$$

After obtaining the probability values of the tweet, the three categories were compared to find the one with the highest value, so that the tweet could be assigned to the Joy, Sad, or Angry emotional category. The calculation shows that the value for Angry exceeds those for Joy and Sad. Thus, tweet number 5 belongs to the Angry emotional category.

Holdout evaluation is an evaluation method that divides the data into training and testing data according to a specified percentage. The accuracy of the emotion classification by the system was 90% with a 70%:30% holdout. To verify and strengthen this system-generated result, manual calculations (accuracy, precision, recall, and specificity) were performed based on the confusion matrix. The training data comprised 214 data and the testing data 91 data. Of the testing data, 28 data were prediction errors.
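The holdout arithmetic can be checked with a small sketch. The paper does not say how the split sizes were rounded or whether the data were shuffled, so the round-half-up rule and the seeded shuffle below are assumptions, chosen because they reproduce the reported split of 305 tweets into 214 training and 91 testing data. The function names are illustrative.

```python
import random


def holdout_sizes(n_total, train_percent):
    """Training and testing sizes; the training size is rounded half up (assumed rule)."""
    n_train = (n_total * train_percent + 50) // 100
    return n_train, n_total - n_train


def holdout_split(items, train_percent, seed=0):
    """Shuffle with a fixed seed (an assumption) and cut into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, _ = holdout_sizes(len(shuffled), train_percent)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(n_samples, n_errors):
    """Share of correctly predicted samples, in percent."""
    return 100.0 * (n_samples - n_errors) / n_samples


n_train, n_test = holdout_sizes(305, 70)
train_ids, test_ids = holdout_split(range(305), 70)
test_accuracy = accuracy(n_test, 28)  # 28 prediction errors on the testing data
```

Integer arithmetic is used for the split sizes so that the result does not depend on floating-point rounding of values such as 305 × 0.7.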
This can be illustrated with the confusion matrix shown in Figure 3. The overall accuracy is calculated as

$$\text{Accuracy} = \frac{\sum \text{correct predictions}}{\sum \text{all predictions}} \times 100\% = 90.81\%$$

Fig. 3. Confusion matrix of training data

The algorithm performance calculation was a test on the Emoji tweet data. The performance of the algorithm on the 91 testing data was 69%, obtained from the confusion matrix shown in Figure 4. With 28 prediction errors among the 91 testing data, the testing accuracy is

$$\text{Accuracy} = \frac{91 - 28}{91} \times 100\% = \frac{63}{91} \times 100\% \approx 69\%$$

To see the performance of the Naïve Bayes algorithm for emotion classification on tweet data with different percentages of training and testing data, an experiment was conducted using the holdout method with three schemes, namely 70:30, 80:20, and 90:10. The results of the three schemes are as follows.

The first scheme (70:30) split the 305 data into 214 training data and 91 testing data. The Naïve Bayes calculation for emotion classification was made on these data. The accuracy calculated over the overall data is 90%, while the accuracy of the Naïve Bayes algorithm on the testing data is 69%.

The second scheme (80:20) divided the 305 data into 244 training data and 61 testing data. The Naïve Bayes calculation for emotion classification was made on these data. The accuracy calculated over the overall data is 93%, while the accuracy of the Naïve Bayes algorithm on the testing data is 67%.

The last scheme (90:10) divided the 305 data into 275 training data and 30 testing data. The Naïve Bayes calculation for emotion classification was made on these data. The accuracy calculated over the overall data is 95%.
Meanwhile, the accuracy of the Naïve Bayes algorithm on the testing data is 53%.

Based on the three schemes, the results show that the amount of training data affects the level of accuracy. With more training data, the probability of words with Emojis is also higher, because the frequency of words with Emojis affects the algorithm's calculations. This is evidenced by the schemes 70:30, 80:20, and 90:10, whose overall accuracies are 90%, 93%, and 95%, respectively. The corresponding accuracies on the testing data are 69%, 67%, and 53%.

For text tweets, that is, tweets processed without Emojis, 45 of the testing data were prediction errors. The accuracy over the overall data is 85%, while the Naïve Bayes algorithm has an accuracy of 50% on the testing data. The comparison between text tweets and Emoji tweets was made on the testing data of the first scheme, 70:30. Over the overall data, the accuracy rose from 85% for text tweets to 90% for Emoji tweets, an increase of 5 percentage points. On the testing data, the accuracy rose from 50% for text tweets to 69% for Emoji tweets, an increase of 19 percentage points.

Figure 5 shows the visualization in the form of a Word Cloud, where the word "day" is at the center of the Word Cloud [32]. This word has the highest frequency compared to other words; thus, "day" dominates. The left panel (a) is a square-shaped Word Cloud, and the right panel (b) is a circular Word Cloud. Both carry the same information and differ only in the word cloud model.

Fig. 4. Confusion matrix of training data

Figure 6 shows a histogram of the words, ordered with the highest frequency on the left side. The horizontal axis (x-axis) identifies the words/terms.
The vertical axis (y-axis) shows the number of occurrences of each word on the x-axis. The word "day" has the highest frequency, with a value of 39.

(a) (b)
Fig. 5. (a) Square word cloud and (b) circle word cloud

Fig. 6. Word frequency histogram

IV. Conclusion

The pre-processing includes a stemming stage that uses the logic of the Porter stemmer, which reduces words to basic, single words. However, its lack of sensitivity and adaptation to words makes some of the resulting word changes imprecise. Whether a word obtains a high probability value depends on the frequency of the word in each emotional category. The Naïve Bayes algorithm was tested with the holdout method by dividing the 305 data into 70% training data and 30% testing data, that is, 214 training data and 91 testing data, which gave an accuracy of 90%. The precision was 0.99 for Joy, 0.90 for Sad, and 0.72 for Angry, and the recall was 0.88 for Joy, 0.91 for Sad, and 0.98 for Angry. Including Emojis in the probability calculation increased the accuracy of emotion classification on the testing data by 19 percentage points: the accuracy on testing data was 69% for text tweets with Emojis and 50% for text tweets without Emojis. This is made clear by the number of prediction errors, which was 28 for text data with Emojis and 45 for text data without Emojis.

Future work can implement other classification algorithms to compare the performance of classification algorithms and of methods for handling Emojis. The stemming stage of pre-processing can use other logic to convert words into basic, single words that correspond to the actual basic words. Punctuation can affect the emotional state of a text. Hence, subsequent studies can be extended by including punctuation marks to see their effect on tweet emotions.

Acknowledgement

This research was supported by Universitas Negeri Malang and Waseda University. We thank our colleagues from both institutions who provided insight and expertise that greatly assisted the research, although they may not agree with all of the interpretations/conclusions of this paper. We thank Dr. Aji P. Wibawa for suggestions on the methodology and for comments that greatly improved the manuscript.

Declarations

Author contribution
All authors contributed equally as the main contributor of this paper. All authors read and approved the final paper.

Funding statement
This paper is a part of research which has been supported by the DRPM research grant of the Indonesian Government.

Conflict of interest
The authors declare no conflict of interest.

Additional information
No additional information is available for this paper.

References

[1] N. Alias, M. S. Sabdan, K. A. Aziz, M. Mohammed, I. S. Hamidon, and N. Jomhari, "Research Trends and Issues in the Studies of Twitter: A Content Analysis of Publications in Selected Journals (2007 – 2012)," Procedia - Soc. Behav. Sci., vol. 103, pp. 773–780, 2013.
[2] A. Uhl, N. Kolleck, and E. Schiebel, "Twitter data analysis as contribution to strategic foresight-The case of the EU Research Project 'Foresight and Modelling for European Health Policy and Regulations' (FRESHER)," Eur. J. Futur. Res., vol. 5, no. 1, 2017.
[3] Statista Research Department, "Twitter: number of users in Indonesia 2019 | Statista," 2019. [Online]. Available: https://www.statista.com/statistics/490591/twitter-users-malaysia/.
[4] P. K. Novak, J. Smailović, B. Sluban, and I. Mozetič, "Sentiment of emojis," PLoS One, vol. 10, no. 12, pp. 1–21, 2015.
[5] Y. Tang and K. F. Hew, "Emoticon, emoji, and sticker use in computer-mediated communication: A review of theories and research findings," Int. J. Commun., vol. 13, pp. 2457–2483, 2019.
[6] H. Miller, D. Kluver, J. Thebault-Spieker, L. Terveen, and B. Hecht, "Understanding emoji ambiguity in context: The role of text in emoji-related miscommunication," Proc. 11th Int. Conf. Web Soc. Media, ICWSM 2017, pp. 152–161, 2017.
[7] D. Bandorski et al., "Contraindications for video capsule endoscopy," World J. Gastroenterol., vol. 22, no. 45, pp. 9898–9908, 2016.
[8] V. Evans, The Emoji Code: The Linguistics Behind Smiley Faces and Scaredy Cats Handbook, 2017.
[9] I. Ileri and P. Karagoz, "Detecting user emotions in twitter through collective classification," IC3K 2016 - Proc. 8th Int. Jt. Conf. Knowl. Discov. Knowl. Eng. Knowl. Manag., vol. 1, no. Ic3k, pp. 205–212, 2016.
[10] M. S. Asriadie, M. S. Mubarok, and Adiwijaya, "Classifying emotion in Twitter using Bayesian network," in Journal of Physics: Conference Series, 2018, vol. 971, no. 1.
[11] F. Hallsmar and J. Palm, "Multi-class Sentiment Classification on Twitter using an Emoji Training Heuristic," pp. 1–27, 2016.
[12] S. Narr, M. Hulfenhaus, and S. Albayrak, "Language-independent Twitter sentiment analysis," Knowl. Discov. Mach. Learn. (KDML), LWA, pp. 12–14, 2012.
[13] F. Barbieri et al., "SemEval 2018 Task 2: Multilingual Emoji Prediction," pp. 24–33, 2018.
[14] H. W. Raj and S. Balachandran, "Future Emoji Entry Prediction Using Neural Networks," Journal of Computer Science, vol. 16, no. 2, pp. 150–157, Feb. 2020.
[15] J. Berengueres and D. Castro, "Differences in emoji sentiment perception between readers and writers," Proc. - 2017 IEEE Int. Conf. Big Data, Big Data 2017, vol. 2018-Janua, pp. 4321–4328, 2018.
[16] S. Lau, "The effect of smiling on person perception," J. Soc. Psychol., vol. 117, no. 1, pp. 63–67, 1982.
[17] J. Berengueres and D. Castro, "Sentiment Perception of Readers and Writers in Emoji use," 2017.
[18] G. Guibon, M. Ochs, and P. Bellot, "From Emojis to Sentiment Analysis," 2016.
[19] S. Ayvaz and M. O. Shiha, "The Effects of Emoji in Sentiment Analysis," Int. J. Comput. Electr. Eng., vol. 9, no. 1, pp. 360–369, 2017.
[20] S. Khalil and M. Fakir, "RCrawler: An R package for parallel web crawling and scraping," SoftwareX, vol. 6, pp. 98–106, 2017.
[21] M. Desai and M. A. Mehta, "Techniques for sentiment analysis of Twitter data: A comprehensive survey," Proceeding - IEEE Int. Conf. Comput. Commun. Autom. ICCCA 2016, no. April 2016, pp. 149–154, 2017.
[22] A. S. Raamkumar, M. Erdt, H. Vijayakumar, E. Rasmussen, and Y. L. Theng, "Understanding the Twitter usage of humanities and social sciences academic journals," Proc. Assoc. Inf. Sci. Technol., vol. 55, no. 1, pp. 430–439, 2018.
[23] V. A. and S. S. Sonawane, "Sentiment Analysis of Twitter Data: A Survey of Techniques," Int. J. Comput. Appl., vol. 139, no. 11, pp. 5–15, 2016.
[24] J. K. and J. R., "Stop-Word Removal Algorithm and its Implementation for Sanskrit Language," Int. J. Comput. Appl., vol. 150, no. 2, pp. 15–17, 2016.
[25] M. Adriani, J. Asian, B. Nazief, S. M. M. Tahaghoghi, and H. E. Williams, "Stemming Indonesian," ACM Transactions on Asian Language Information Processing, vol. 6, no. 4, pp. 1–33, Dec. 2007.
[26] H. Pajupuu, R. Altrov, and J. Pajupuu, "Identifying Polarity in Different Text Types," Folklore: Electronic Journal of Folklore, vol. 64, pp. 125–142, Jun. 2016.
[27] G. Yurtalan, M. Koyuncu, and Ç. Turhan, "A polarity calculation approach for lexicon-based Turkish sentiment analysis," Turkish J. Electr. Eng. Comput. Sci., vol. 27, no. 2, pp. 1325–1339, 2019.
[28] F. C. Permana, Y. Rosmansyah, and A. S. Abdullah, "Naive Bayes as opinion classifier to evaluate students satisfaction based on student sentiment in Twitter Social Media," Journal of Physics: Conference Series, vol. 893, p. 012051, Oct. 2017.
[29] E. Hauthal, D. Burghardt, and A. Dunkel, "Analyzing and visualizing emotional reactions expressed by emojis in location-based social media," ISPRS Int. J. Geo-Information, vol. 8, no. 3, 2019.
[30] Li-guo Duan, D. Peng, and Ai-ping Li, "A New Naive Bayes Text Classification Algorithm," TELKOMNIKA Indonesian Journal of Electrical Engineering, vol. 12, no. 2, Feb. 2014.
[31] M. S. Saputri, R. Mahendra, and M. Adriani, "Emotion Classification on Indonesian Twitter Dataset," in International Conference on Asian Language Processing, 2018, no. November.
[32] B. Tessem, S. Bjørnestad, W. Chen, and L. Nyre, "Word cloud visualisation of locative information," J. Locat. Based Serv., vol. 9, no. 4, pp. 254–272, 2015.