Bulletin of Social Informatics Theory and Application ISSN 2614-0047 Vol. 7, No. 1, March 2023, pp. 24-31 24 https:doi.org/10.31763/businta.v7i1.606 Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee consumption Laksono Budiarto a,1,*, Nissa Mawada Rokhman a,2, Wako Uriu b,3 a Universitas Negeri Malang, Jl. Semarang No. 5 Malang 65145, Jawa Timur, Indonesia b Chikushi Jogakuen University,2-chōme-12-1 Ishizaka, Dazaifu, Fukuoka 818-0118, Japan 1 laksono.budiarto@um.ac.id; 2 nissa.mawarda@um.ac.id; 3 ue2017119@chikushi-u.ac.jp * corresponding author 1. Introduction In previous research, it has been found that coffee consumption is generally safe at normal intake levels, based on a summary indicating low to no increased health risk for consumption of three to four cups per day, and may even be more beneficial than harmful to health. Importantly, outside of pregnancy, available evidence suggests that coffee can be tested as an intervention without significant risk of harm [1]. Caffeine, trigonelline, chlorogenic acid, amino acids, carbohydrates, lipids, organic acids, minerals, and volatile aroma compounds are just a few numbers of many chemical components contained in coffee that have both positive and negative impacts on the health of coffee drinkers [2], [3]. Recent research on coffee consumption and consumer purchasing patterns may help us better understand food choices and lifestyles. It is also possible to further improve and establish consumption recommendations and food purchasing behavior by integrating knowledge of food nutrient quality [4]. Evidence shows that consuming coffee may reduce the risk of noncommunicable diseases (NCDs) [1]. Understanding modifiable risk factors through unhealthy dietary patterns could help the World Health Organization (WHO) achieve its goal of reducing the relative risk of premature deaths from NCDs by 25% by025 [5]. Although there have been numerous studies on the general impact of coffee on health, there is still limited discussions on public opinion regarding to the effects of coffee on health. Opinions on the health effects of coffee vary widely among coffee drinkers, depending on their physical health and knowledge of the beverage itself. The same cup of coffee can have different opinions on each individual, not only in terms of taste but also its effects on their body, which sometimes contradicts the facts. According to Liu [6], there are two main types of textual information: facts and opinions. A R T I C L E I N F O A B S T R A C T Article history Received January 24, 2023 Revised February 6, 2023 Accepted February 26, 2023 The increase in coffee consumption among the public is due to several reasons, including health and lifestyle. Awareness of coffee consumption’s positive and negative effects has also increased. This research is a sentiment analysis that aims to investigate Twitter users’ opinions about the impact of coffee consumption on their health. The method involves data collection using the RapidMiner application, which utilizes the Twitter Application Programming Interface (API) function connected to a prepared Twitter account. The obtained data underwent data cleaning, saved as an Excel file type, training and testing, and model evaluation. Then, the data were classified into Negative, Neutral, and Positive Opinions. The results showed that less than 10% of opinions were positive, 19% were neutral, and 73% were negative. The opinions obtained are useful information for stakeholders in the coffee industry. They can also be used to determine better steps in educating the public about coffee. This is an open access article under the CC–BY-SA license. Keywords Sentiment analysis Twitter Coffee effect Negative opinion RapidMiner https://doi.org/10.31763/businta.v7i1.606 mailto:laksono.budiarto@um.ac.id http://creativecommons.org/licenses/by-sa/4.0/ http://creativecommons.org/licenses/by-sa/4.0/ ISSN 2614-0047 Bulletin of Social Informatics Theory and Application 25 Vol. 7, No. 1, March 2023, pp. 24-31 Budiarto et.al (Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee) Current processing techniques (such as search engines) work with facts (assuming the information is true), which can be expressed through topic keywords. However, search engines are not used to search for opinions because opinions are difficult to express with just a few keywords [7]. To mine opinions, it is more appropriate to use a dataset consisting of messages collected from Twitter, which contains a large number of short messages created by users of this microblogging platform [8]. Twitter is a well-known social media platform with 18.45 million users as of January 2022 [9]. Twitter is a popular social media platform widely used to express opinions and thoughts in short messages [10]. The limited character count on Twitter makes it an ideal platform for opinion mining. By analyzing tweets and opinions about coffee, it is possible to determine the sentiment or polarity of these opinions, whether they are positive, negative, or neutral. This information is valuable for understanding how users perceive and feel about coffee. By conducting sentiment analysis, a more accurate picture can be obtained regarding public opinion about coffee, particularly its impact on health. If the results show a higher negative score, education about coffee should be improved by emphasizing more research facts about the effects of coffee on the body. This research was conducted on Twitter posts, both recent and popular posts. The data was collected in October 2022, on Indonesian-language posts. 2. Method Sentiment analysis, sometimes called opinion mining, is a method or process that classifies text resulting in Natural Language Processing (NLP) [11]. This is a common method of defining and grouping opinions about goods, services, or concepts, involving data mining applications, artificial intelligence (AI), and machine learning (ML) to mine text with a sentiment or subjective meaning. Opinion analysis is one type of study used in social media because its content can become a trending topic and significantly impact social life [12]. Text Mining is one of the main subfields of data mining. Its goal is to uncover previously undiscovered but potentially useful information from semi-structured or unstructured text data [13]. The mining process uses the RapidMiner application, one of the most popular, comprehensive and adaptable data mining tools that can be accessed, with over 400 data mining modules or operators. RapidMiner is an open-source data mining tool that was one of the top three data mining tools overall in 2007 and 2008, according to a survey conducted by the well-known data mining website KDnuggets.com among several hundred data mining specialists. Data loading, preprocessing, visualization, interactive mining process design and review, automatic modeling, automatic parameter and process optimization, automatic feature creation and feature selection, evaluation, and implementation are all supported by RapidMiner. RapidMiner is a powerful platform for mining and analyzing data [14], [15]. With the RapidMiner application, the analysis results will be processed through several modeling methods, one of which is the Naïve Bayes method [16]. The Bayes’ theorem forms the basis of the Naive Bayes classification. Classification is a set of algorithms for classification. Naive Bayes classification is based on the idea that the features being classified are all independent of each other [17], [18]. As shown in Fig. 1, the steps of the process can be explained as follows, according to the RapidMiner Studio manual book [19]. Fig. 1. The Stages of the Sentiment Analysis Process 26 Bulletin of Social Informatics Theory and Application ISSN 2614-0047 Vol. 4, No. 2, July 2018, pp. 24-31 Budiarto et.al (Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee) 2.1. Data collection with API Twitter • Testing the prepared query on the Twitter search section. The obtained results can at least provide a definite picture of the relevance of the resulting message posts to the research objective. • Selecting Twitter as the connection type to input the Twitter API access code in the token input field. • Coming into the Twitter search operator, which is located under the Data Access, Application, and Twitter option. • Select the connection type created in the connection entry option, and enter the prepared query in the query entry field. 2.2. Data Cleaning • The cleaning process involves using the Select Attributes operator under the Blending, Attributes, and Selection options. Select “single” for the attributes file type option and “Text” for the attributes option. • Enter the “Remove Duplicates” operator, which is located in the Cleansing, Duplicates option. Select “Single” in the attributes file type options, and “Text” in the attributes option. 2.3. Save as Excell • The mining result data is saved in an Excel format file using the Write Excel operator located in the Data Access, Files, Write option. • Specify the filename to save the results by filling in the filename in the Excel file input field. 2.4. Train and Test • Adding a label column to the Excel file to enter Negative, Neutral, and Positive inputs. • RapidMiner handles this process by using the XValidation operator. This operator automatically divides the data into various subsets needed for cross-validation. Several sample experiments can be found, including experiments that use XValidation for performance measurement, which are available under the Sample, Repository option. 2.5. Model and Evaluate • By default, AutoModel uses multi-hold-out-set validation instead of cross-validation to validate the model. • RapidMiner creates the resulting process after running AutoModel to see how the model’s performance is estimated. The data obtained through mining consists of the date, time, sender, message content, and other determined attributes. The reading process is done by utilizing the Application Programming Interface (API) service provided by Twitter. Connection entry is done by entering the given API token and then entering the “Search Twitter” operator with the query input “minum kopi” sakit and “minum kopi” aman, result type “recent or popular”. A total of 328 data were obtained. The author did not use the query “minum kopi” sehat as an antonym of sakit because even though the results obtained were more (357 data), the negative value was higher than the positive value. It is because the word “sehat” (healthy) is more often associated with “tidak sehat” (unhealthy) or “kurang sehat” (rather sick). The process is shown in Fig. 2. It starts with searching data based on the query entered in the search Twitter operator, followed by combining data using the union operator. The next step is to select the attribute to be processed in the text column containing user tweets. The following step is to remove duplicate data that may occur because users only retweet. The resulting data is then saved in Excel format for further processing. ISSN 2614-0047 Bulletin of Social Informatics Theory and Application 27 Vol. 7, No. 1, March 2023, pp. 24-31 Budiarto et.al (Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee) Fig. 2. Data mining process in RapidMiner To obtain a data comparison, this study also conducted an online survey. Two options were provided in this survey system, a) Drinking coffee makes you sick, and b) Drinking coffee is healthy. The survey will be distributed by sharing a tweet link. This survey was conducted for seven days, in accordance with Twitter’s rules [20]. The polling was conducted by creating a tweet containing a message introducing the purpose and objective of the survey. Then, the poll feature was selected, options for the survey were filled in, and the maximum duration of the survey, which could last up to 7 days, was determined. The process can be seen in Fig. 3. Fig. 3. The Process of Creating a Poll on a Twitter’s Tweet 3. Results and Discussion In Table 1, there is an opinionated statement that drinking coffee has a negative impact on health. The text part of the sentence explicitly states that the cause of illness is drinking coffee. Table.1 Data Classification of Negative Opinions ID Text Label 1585067228881641472 jangan minum kopi pagi2 sakit perut wkwk Negative 1584980056065339392 yaolo yaolo niat minum kopi biar ga ngantuk eh malah sakit perut wkwk Negative 1584976005495943168 ih asik banget kakak bisa minum kopi. aku sekalinya nyeruput kafein langsung sakit perut masa, kak. ?? Negative 28 Bulletin of Social Informatics Theory and Application ISSN 2614-0047 Vol. 4, No. 2, July 2018, pp. 24-31 Budiarto et.al (Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee) Meanwhile, Table 2 shows that the sentences are categorized as neutral statements and are mostly dominated by questions. Table.2 Data classification of neutral opinions ID Text Label 1583559221081440256 kalo bsk pagi gue minum kopi pait kr2 aman gak yaa,, Neutral 1583459347073445890 maybe lebih aman makanan yaa. ga usah mahal mahal gapapa ko. soalnya blm tentu bisa minum kopi. Neutral 1583345106140397569 iyaa, kadang malah minum kopi dari biji salak. kalo itu masih aman. Neutral In Table 3, there are affirmations that drinking coffee is safe and categorized as positive statements. Table.3 Data Classification of Positive Opinions ID Text Label 1584774800689405953 gue pasti sama dia sama-sama minum kopi terus? aman sejauh ini! lagi dan selalu berusaha diimbangi dengan makan terus. Positive 1584724633646817280 pagiku aman kalau udah sarapan plus minum kopi. Positive 1584462034921754624 yg pasti perut udh diisi dulu sm makanan si ceu biar lambung aman. abis itu mangga klo mau ngopi. biar gak insomnia jangan minum kopi diatas jam 6 sore si kyknya Positive In Table 4, show the results of several modeling. Table.4 The Results of Several Modeling Model Classificati on Error Standard Deviation Gai ns Total Time Training Time (1,000 Rows) Scoring Time (1,000 Rows) Naive Bayes 0.4 0.0 0. 0 999.0 149.4 4297.7 Generalized Linear Model 0.4 0.1 0. 0 4199.0 6957.3 6694.7 Fast Large Margin 0.4 0.1 - 6. 0 2464.0 122.0 6251.9 Decision Tree 0.4 0.0 0. 0 1125.0 149.4 3404.6 Random Forest 0.4 0.0 - 2. 0 3661.0 280.5 3587.8 Gradient Boosted Trees 0.3 0.0 14 .0 377175. 0 3192.1 2511.5 Support Vector Machine 0.3 0.0 2. 0 1945.0 332.3 3709.9 The Naïve Bayes sentiment analysis result can be seen in Fig. 4, where the positive opinion is minimal, below 10%, while the neutral opinion is 19%, and the negative opinion is 73%. The words “tiap, makin, and mood” are part of the support for the negative opinion. Meanwhile, contradicting negative opinion is obtained from the words “dada, asli, beli, tidur”. For example, it can be seen in the sentence "anjir tiap minum kopi malem2 pasti langsung mual + sakit perut??" from ID "1583154503926575105"., “Kalo stress minum kopi aja kali yah..biar makin sakit!!!” from ID ”1583567191580246016", become part of supports negative opinion, while those that become part of contradicts negative opinion, in the sentence "Hari-hari minum kopi malah mules, biasanya deg- degan, dada sakit, sekarang mules wkaakakaka" from ID "1582013788210876416", "udah tau ga bisa minum kopi pake ngide segala beli kopi, akhirnya sakit kan perutnya:(“ from ID “1582321855889043463”. ISSN 2614-0047 Bulletin of Social Informatics Theory and Application 29 Vol. 7, No. 1, March 2023, pp. 24-31 Budiarto et.al (Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee) Fig. 4. Diagram and factors of negative opinion The Naïve Bayes performance as seen in Fig. 5 Fig. 5. Naïve Bayes – Performance As a comparison, an online survey was conducted in this study. As seen in Fig. 6, the result of choosing drinking coffee as healthy obtained a value of 67.7%, much larger than the choice of drinking coffee causing illness at 32.3%. The survey participants amounted to 167 respondents, starting from December 13, 2022, until December 20, 2022. Fig. 6. The results of the opinion poll on Twitter 30 Bulletin of Social Informatics Theory and Application ISSN 2614-0047 Vol. 4, No. 2, July 2018, pp. 24-31 Budiarto et.al (Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee) These different results may be influenced by the limited scope of dissemination, which tends to be obtained from the account’s circle of friends, as well as the respondents’ subjective level towards the poll creator’s profile. 4. Conclusion The results obtained from this sentiment analysis provide an overview of the imbalance between coffee education, the promotion of coffee benefits, and the growing negative opinion regarding the impact of coffee on health. It may also be due to the increasing number of coffee shops that do not apply proper serving methods and the importance of coffee education by baristas to customers. These possibilities also serve as suggestions for further research development, and the results of online polls can be improved with wider dissemination. References [1] R. Poole, O. J. Kennedy, P. Roderick, J. A. Fallowfield, P. C. Hayes, and J. Parkes, “Coffee consumption and health: umbrella review of meta-analyses of multiple health outcomes,” BMJ, vol. 359, p. j5024, Nov. 2017, doi: 10.1136/bmj.j5024. [2] J. V. Higdon and B. Frei, “Coffee and Health: A Review of Recent Human Research,” Crit. Rev. Food Sci. Nutr., vol. 46, no. 2, pp. 101–123, Mar. 2006, doi: 10.1080/10408390500400009. [3] F. Bastian et al., “From Plantation to Cup: Changes in Bioactive Compounds during Coffee Processing,” Foods, vol. 10, no. 11, p. 2827, Nov. 2021, doi: 10.3390/foods10112827. [4] A. Samoggia and B. Riedel, “Consumers’ Perceptions of Coffee Health Benefits and Motives for Coffee Consumption and Purchasing,” Nutrients, vol. 11, no. 3, p. 653, Mar. 2019, doi: 10.3390/nu11030653. [5] WHO, “Global action plan for the prevention and control of noncommunicable diseases 2013-2020.,” World Heal. Organ., p. 102, 2013, [Online]. Available at: https://www.who.int/publications/i/item/9789241506236. [6] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, Aug. 2004, pp. 168–177, doi: 10.1145/1014052.1014073. [7] G. Uddin and F. Khomh, “Automatic Mining of Opinions Expressed About APIs in Stack Overflow,” IEEE Trans. Softw. Eng., vol. 47, no. 3, pp. 522–559, Mar. 2021, doi: 10.1109/TSE.2019.2900245. [8] M. Imran, P. Mitra, and C. Castillo, “Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages,” Proc. 10th Int. Conf. Lang. Resour. Eval. Lr. 2016, pp. 1638–1643, May 2016, Accessed: Jun. 22, 2023. [Online]. Available: https://arxiv.org/abs/1605.05894v2. [9] A. O. P. Dewi, A. Isnaini, and M. F. Lestari, “An Analysis of the post-COVID-19 Information Distribution on Social Media,” E3S Web Conf., vol. 359, p. 02037, Oct. 2022, doi: 10.1051/e3sconf/202235902037. [10] N. Öztürk and S. Ayvaz, “Sentiment analysis on Twitter: A text mining approach to the Syrian refugee crisis,” Telemat. Informatics, vol. 35, no. 1, pp. 136–147, Apr. 2018, doi: 10.1016/j.tele.2017.10.006. [11] C. J. Rameshbhai and J. Paulose, “Opinion mining on newspaper headlines using SVM and NLP,” Int. J. Electr. Comput. Eng., vol. 9, no. 3, p. 2152, Jun. 2019, doi: 10.11591/ijece.v9i3.pp2152-2163. [12] S. Sendari, I. A. E. Zaeni, D. C. Lestari, and H. P. Hariyadi, “Opinion Analysis for Emotional Classification on Emoji Tweets using the Naïve Bayes Algorithm,” Knowl. Eng. Data Sci., vol. 3, no. 1, pp. 50–59, Aug. 2020, doi: 10.17977/um018v3i12020p50-59. [13] Z. Ding, Z. Li, and C. Fan, “Building energy savings: Analysis of research trends based on text mining,” Autom. Constr., vol. 96, pp. 398–410, Dec. 2018, doi: 10.1016/j.autcon.2018.10.008. [14] M. Bjaoui, H. Sakly, M. Said, N. Kraiem, and M. S. Bouhlel, “Depth insight for data scientist with RapidMiner « an innovative tool for AI and big data towards medical applications»,” in Proceedings of the 2nd International Conference on Digital Tools & Uses Congress, Oct. 2020, pp. 1–6, doi: 10.1145/3423603.3424059. https://doi.org/10.1136/bmj.j5024 https://doi.org/10.1080/10408390500400009 https://doi.org/10.3390/foods10112827 https://doi.org/10.3390/nu11030653 https://www.who.int/publications/i/item/9789241506236 https://doi.org/10.1145/1014052.1014073 https://doi.org/10.1109/TSE.2019.2900245 https://arxiv.org/abs/1605.05894v2 https://doi.org/10.1051/e3sconf/202235902037 https://doi.org/10.1016/j.tele.2017.10.006 https://doi.org/10.11591/ijece.v9i3.pp2152-2163 https://doi.org/10.17977/um018v3i12020p50-59 https://doi.org/10.1016/j.autcon.2018.10.008 https://doi.org/10.1145/3423603.3424059 ISSN 2614-0047 Bulletin of Social Informatics Theory and Application 31 Vol. 7, No. 1, March 2023, pp. 24-31 Budiarto et.al (Uncovering negative sentiments: a study of indonesian twitter users' health opinions on coffee) [15] P. Ristoski, C. Bizer, and H. Paulheim, “Mining the Web of Linked Data with RapidMiner,” J. Web Semant., vol. 35, pp. 142–151, Dec. 2015, doi: 10.1016/j.websem.2015.06.004. [16] Z. E. Rasjid and R. Setiawan, “Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques,” Procedia Comput. Sci., vol. 116, pp. 107–112, Jan. 2017, doi: 10.1016/j.procs.2017.10.017. [17] K. Yadav and R. Thareja, “Comparing the Performance of Naive Bayes And Decision Tree Classification Using R,” Int. J. Intell. Syst. Appl., vol. 11, no. 12, pp. 11–19, Dec. 2019, doi: 10.5815/ijisa.2019.12.02. [18] M. Singh, M. Wasim Bhatt, H. S. Bedi, and U. Mishra, “WITHDRAWN: Performance of bernoulli’s naive bayes classifier in the detection of fake news,” in Materials Today: Proceedings, Dec. 2020, p. 1, doi: 10.1016/j.matpr.2020.10.896. [19] RapidMiner, RapidMiner Studi o Manual, p.116, 2012. [Online]. Available at: https://docs.rapidminer.com/downloads/RapidMiner-v6-user-manual.pdf. [20] Twitter, “Twitter Polls – how to create and how to vote.”. [Online]. Available at: https://help.twitter.com/en/using-twitter/twitter-polls. https://doi.org/10.1016/j.websem.2015.06.004 https://doi.org/10.1016/j.procs.2017.10.017 https://doi.org/10.5815/ijisa.2019.12.02 https://doi.org/10.1016/j.matpr.2020.10.896 https://docs.rapidminer.com/downloads/RapidMiner-v6-user-manual.pdf https://help.twitter.com/en/using-twitter/twitter-polls