FACTA UNIVERSITATIS
Series: Economics and Organization Vol. 16, No 1, 2019, pp. 59 - 73
https://doi.org/10.22190/FUEO1901059G
© 2019 by University of Niš, Serbia | Creative Commons Licence: CC BY-NC-ND
Original Scientific Paper

CONTEMPORARY DATA ANALYSIS TECHNIQUES FOR ONLINE REPUTATION MANAGEMENT IN HOSPITALITY AND TOURISM
UDC 338.48:004.738.5

Olivera Grljević, Zita Bošnjak, Saša Bošnjak
University of Novi Sad, Faculty of Economics in Subotica, Serbia

Abstract. Knowing what attracts tourists to a destination or deters them from it, which products to offer them, and which products deserve special attention is crucial for good economic results. Such knowledge can be obtained by analyzing the online comments and reviews that tourists leave on travel websites (such as Booking, TripAdvisor, Trivago, etc.). This paper describes the value that information about the opinions and emotions hidden in online reviews has for managers, especially knowledge of users' (dis)satisfaction with certain aspects of the tourist offer. Knowledge uncovered from online reviews provides a chance to take advantage of the strong points and to correct the shortcomings through timely corrective measures and actions. Contemporary approaches and methods of analyzing online reviews, and the opportunities for development they provide in the tourism industry, are described through a case study conducted over a subset of 20491 hotel reviews from TripAdvisor. We have conducted sentiment analysis of reviews with the goal of building an automated model which will successfully distinguish positive from negative reviews. The Logistic Regression classifier had the best performance: it correctly classified positive reviews in 90% of cases and negative reviews in 83% of cases. We have illustrated how association rules can help management to uncover relationships between concepts under discussion in negative and positive reviews.
Key words: online reputation management, e-word-of-mouth, online reviews, automated classification, sentiment analysis, association rules
JEL Classification: C55, Z3, Z32
Received December 17, 2018 / Accepted February 18, 2019
Corresponding author: Olivera Grljević
Faculty of Economics Subotica, Segedinski put 9-11, 24000 Subotica, Serbia
E-mail: oliverag@ef.uns.ac.rs

1. INTRODUCTION

Serbian tourism potentials have not been sufficiently exploited, but with adequate professional effort and willingness to seize the opportunities, the Republic of Serbia can successfully break into the map of important world tourism destinations. The first and fundamental step in forming the desired set of competitive tourism products is valorizing the market potential of existing tourism products. In order to find out which tourism products are the most competitive and which tourism offers are in line with the expectations and demands of potential users, it is necessary to know the preferences and opinions of the users. Nowadays there are numerous tourism websites where users state their opinions in the form of text reviews and/or ratings. The subjects of reviews and ratings in the field of tourism and hospitality are entities related to travel, stay, and the services provided. These freely given comments are sources of useful information about tourism entities, both for prospective users, who rely on them when deciding on a particular tourism destination, and for the management of tourism organizations and hotels, who can adapt their offer to the requirements and expectations of users (Hu, 2004; Tripp, 2011; Bing, 2016). The number of reviewing websites and the volume of user generated content grow every day. According to the Statista¹ portal, 600 million reviews and opinions had been posted on the TripAdvisor website alone by 2017, with a year-on-year increase of approximately 135 million reviews.
This amount of user generated content cannot be efficiently processed manually. Methods of semantic analysis and machine learning techniques for opinion mining and sentiment analysis can be used to automatically distinguish positive from negative reviews (i.e. to determine the sentiment polarity of a review) and to identify the dimensions along which tourists value an offer or a service (Pang, 2002; Dave, 2003; Pang, 2008; Medhat, 2014). Good reviews, conveying a positive attitude, can attract new tourists, while bad ones, conveying negative emotions, can damage the reputation of the organization, so it is important that tourism organizations constantly monitor the online activities of their guests and promptly respond to expressed problems, concerns or dissatisfaction (Lei, 2015). Particular attention should be given to negative comments, as they can indicate deficiencies in service provision and in the fulfillment of users' expectations (Racherla, 2013), so that tourism organizations can correct them and better position themselves by focusing on their users' needs and preferences, as expressed in online comments. The next chapter first describes the background on different approaches to and levels of sentiment analysis of user generated content. It then gives an overview of sentiment analysis research in the hospitality and tourism industry. Chapter 3 sheds light on the relation between sentiment analysis and online reputation management in tourism and hospitality. Chapter 4 describes the case study of sentiment analysis over the dataset from TripAdvisor. Through a series of experiments in which different methods of sentiment analysis were applied, we investigated how successfully these methods classified online reviews into the positive and the negative class.
Also, by generating association rules over the positive and the negative class of reviews, we indicate their helpfulness for management, as well as their potential to uncover relationships between concepts under discussion in negative and positive reviews. The conclusion highlights the results of the conducted study, the benefits of the proposed approach in the field of tourism and hotel management, and the directions for further research.

¹ https://www.statista.com/statistics/684862/tripadvisor-number-of-reviews/

2. RELATED WORK

Consumer behavior has changed significantly as offline sources of information about products and services are replaced by electronic word-of-mouth marketing (eWOM) (Gruen et al, 2006). As defined in (Litvin, 2008), eWOM refers to all informal communication with consumers, particularly related to the usage or characteristics of goods and services, through Internet-based technology. In e-commerce generally, customers have been shown to believe and trust the opinions of other people more than the promotional campaigns of a company (Pitt, 2002; Berthon et al, 2012), and they put equal trust in online reviews and in personal recommendations of friends (Park, 2007; Gligorijevic, 2012). Similar behavior is observed within the tourism and hospitality sector. Customers have more trust in websites with reviews than in professional guides and travel agencies, and they perceive social media as more credible and trustworthy than traditional marketing communications (Leung, et al, 2013). As pointed out in (Dijkmans, et al., 2015), service companies are more susceptible to the influence of social media due to the nature of their products.
Services are intangible, non-standardized and must be consumed in order to be evaluated, which increases the possibility of discrepancies between customer expectations, their perceptions and the services themselves, and consequently can lead to more frequent complaints on social media. To turn publicly available reviews into a competitive advantage and a tool for online reputation management, a company, particularly in the service industry, should apply sentiment analysis techniques to such data. An opinion expressed through a review comprises an entity, an aspect of the entity, a sentiment expression on that particular aspect, the opinion holder, and the time when the opinion is expressed (Liu, 2012). The same author defines sentiment as positive, negative, or neutral, expressed with different intensities (e.g., the 1-5 stars commonly used on reviewing sites). Sentiment analysis studies opinions that express or point to a positive or negative sentiment. Unlike facts, opinions and feelings are highly subjective. For this reason, it is necessary to analyze the set of opinions of different people instead of a single opinion that expresses the subjective view of an individual (Grljević, 2016). Sentiment analysis can thus aggregate the overall public opinion using summarization techniques, and characterize variations in affect over a certain period of time. In the remainder of this section, we discuss approaches to sentiment analysis (Section 2.1) and applications of sentiment analysis in the tourism and hospitality sector (Section 2.2).

2.1. Approaches to sentiment analysis

Reviewing websites represent a major data source for sentiment analysis and opinion mining. These websites provide numerical ratings (on a star or Likert-type scale) and/or textual reviews. Analyses based on a numerical rating system cannot uncover the nuanced opinions that are often expressed in textual content. Only text reviews provide an understanding of the opinions consumers express (Puri, 2017).
In strict numerical ratings they are generally lost. However, numerical ratings can be used as indicators of sentiment (Pang et al, 2002; Racherla et al, 2013; Dave, 2003; Turney, 2002). There are two approaches to sentiment analysis of text reviews (Aye & Aung, 2018). The lexicon-based approach compares individual words from the sentences with the sentiment words listed in lexicons in order to determine whether the words used in a review convey any sentiment (positive or negative) or not. The authors in (Hu & Liu, 2004) regarded a sentence as positive/negative if there was a majority of positive/negative opinion words in the sentence. In the case where there was the same number of positive and negative opinion words in the sentence, they predicted the orientation using the average orientation of the closest opinion words for a feature in an opinion sentence or the orientation of the previous opinion sentence. In (Turney, 2002), the semantic orientation of a phrase that contains adjectives or adverbs was calculated as "the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor"".

The machine learning approach uses unsupervised or supervised learning. The former compares each word of the text with a positively or negatively valued word selected as a cluster centre. The sentiment orientation of a review is then predicted by the average semantic orientation of the words in the review. Various supervised machine learning algorithms can be used for sentiment classification. In such a case, it is necessary to provide several thousand examples of labeled text for the training of the classification algorithm. The associated labels could be "positive", "negative", or "neutral", but depending on the purpose of the analysis, some additional labels could be used, such as the sentiment target (Hu, 2004).
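As an illustration only, the lexicon-based majority rule attributed to (Hu & Liu, 2004) in this section might be sketched as follows. The two word lists and the function name `sentence_polarity` are hypothetical toy assumptions, not the lexicon or code used by those authors, and the tie case is only flagged here rather than resolved from context.

```python
# Toy lexicons: illustrative assumptions, not the opinion-word lists
# compiled by Hu & Liu (2004).
POSITIVE = {"great", "clean", "friendly", "helpful", "excellent"}
NEGATIVE = {"dirty", "rude", "noisy", "poor", "broken"}


def sentence_polarity(sentence: str) -> str:
    """Label a sentence by majority vote over lexicon hits."""
    # Lowercase each token and strip surrounding punctuation.
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Equal counts: the method described in the text falls back on
    # neighbouring opinion words or the previous sentence; this sketch
    # only reports the tie.
    return "tie"


if __name__ == "__main__":
    for s in ("Great location and friendly staff.",
              "The room was dirty and the staff rude!",
              "Clean but noisy."):
        print(s, "->", sentence_polarity(s))
```

A production lexicon would also need to handle negation ("not clean") and multi-word expressions, which this sketch deliberately ignores.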
Sentiment analysis can be performed at the document level (Pang & Vaithyanathan, 2002; Dave et al, 2003; Bibi, 2017), sentence level (Dos Santos & Gatti, 2014) or word level (Kim & Hovy, 2004). Document-level classification takes into account the whole document (i.e. the whole review), and starts from the premise that a document discusses only one topic. Bearing in mind that the feedback mechanism of online reviews provided after service consumption plays a critical role in the online sales of the hospitality and tourism industry (Schuckert & Law, 2014), document-level classification often does not provide enough detail about the prevailing consumer opinion on the various aspects of the entity being monitored (Medhat, 2014). Since each tourism product should be evaluated based on its own characteristics, aspect-oriented sentiment analysis is more suitable for the hospitality and tourism industry (Tjahyanto & Sisephaputra, 2017; Marrese-Taylor et al, 2013; Bucur, 2015). An aspect-oriented analysis deals with the classification of sentiment at the sentence level (Hu & Liu, 2004; Broß, 2013; Pontiki et al, 2016) or phrase level, according to various aspects of the tourism product (Li et al, 2018). Aspects usually correspond to arbitrary topics considered important or representative of the text that is being analyzed. The third level of sentiment analysis involves classifying a word or phrase according to the polarity of the sentiment (Zhang et al, 2014; Agarwal et al, 2009; Ikeda et al, 2008). The polarity of sentiment, also referred to as sentiment orientation, points to a positive or negative expression of sentiment. Almost all previous work in sentiment analysis is based on a single positive/negative category or scale, such as star ratings. Such a one-dimensional scale does not accurately reflect the complexity of human emotions and sentiments.
Sentiment analysis is often performed on text labelled on a more fine-grained sentiment scale, such as Ekman's six basic emotions: joy, sadness, anger, fear, disgust, and surprise (Holzman & Pottenger, 2003; Alm et al, 2005; Strapparava & Mihalcea, 2007; Mohammad, 2012), or Plutchik's scale, which is based on Ekman's with the addition of trust and anticipation (Brooks et al., 2013; Mohammad, 2012a; Suttles & Ide, 2013). The bag-of-words representation that is often used in current baseline methods cannot properly capture more complex linguistic phenomena in sentiment analysis. The method described in (Nakagawa et al, 2010) uses many manually constructed domain-specific resources (sentiment lexica, parsers, polarity-shifting rules), which, on the one hand, increase the precision of the model in the given domain, while, on the other hand, limit its applicability to a broader range of tasks and languages. In (Socher et al, 2011), the authors introduce a model that, instead of using a bag-of-words representation, exploits the hierarchical structure and uses compositional semantics to understand sentiment. The system can be trained both on unlabeled domain data and on supervised sentiment data, and does not require any language-specific sentiment lexica, parsers, etc. Rather than limiting sentiment to a positive/negative scale, the authors predicted a multidimensional distribution over several complex, interconnected sentiments. Visual content such as photos and videos represents the next boundary to be explored with cutting-edge technological tools. Deep learning techniques recently developed in natural language processing and, especially, computer image processing appear to be an ideal tool for many of the problems related to user-generated content on the Internet.
Long Short-Term Memory networks, a popular module in deep learning architectures, provide an effective way of sequentially composing the semantic understanding of texts (Ma et al, 2018).

2.2. Research on sentiment analysis in tourism and hospitality sector and impact on a business

A positive relationship between customer ratings and online sales of hotels is noted in (Öğüt & Onur, 2012). According to the findings, a 1% increase in online customer rating increased sales per room by up to 2.68% in hotels in Paris and by up to 2.62% in hotels in London. The authors also identified a correlation between customer ratings and prices. Higher customer ratings resulted in higher hotel prices, and the prices of high-star hotels were more sensitive to online customer ratings. Xie et al (2017) have shown that the overall rating, the ratings of attributes such as purchase value, location and cleanliness, the variation and volume of consumer reviews, and the number of management responses are significantly associated with hotel performance. This study showed that providing timely and lengthy responses enhances future financial performance, whereas responses provided by hotel executives and responses that simply repeat topics from the online review lower future financial performance. Lee and Song (2010) suggested that company responses that include an apology, compensation, or corrective action may help restore the company's positive image. The results showed that informational factors, such as vividness² and consensus³, facilitated consumers' attribution of responsibility for the negative events to companies, and subsequently led consumers to change their evaluation of the company. In addition, the authors found that the corporate response strategy to online complaints should be different from conventional response strategies. In (Sparks, 2016), an experimental approach is used to test the effect of providing or not providing a response to a negative online review.
It was shown that four variables influence potential customers' trust in and concern regarding the hotel: the source of the response (Guest Service Agent or General Manager), the communication style (professional or colloquial), the efficiency of the response (fast, moderate, or slow), and the action frame of the response (corrective action was taken in the past or is promised to be taken in the future). Previous eWOM tourism studies assume a direct relationship between online consumer content, online reviews and tourism performance, with empirical studies adopting a bi-variate methodology. The study of Phillips (2015) used artificial neural networks, which went beyond linear and bi-variate investigations, and provided evidence to suggest that online reviews together with traditional hotel characteristics should be considered as salient determinants of hotel performance.

² Vividness refers to the capacity of information to attract and hold attention and to excite the imagination.
³ When people encounter negative information about products or services, they often consider other individuals' reactions to the information before judging for themselves what the causes of the problem are.

3. REPUTATION MANAGEMENT BY SENTIMENT ANALYSIS

Travel-related entities and services are the subject of online reviews and/or ratings provided by tourists after their travel experience. In order to study the textual content of the reviews, identify the various dimensions on which consumers made an evaluation, and determine the polarity of the expressed opinions or emotions, these reviews should be processed and the relevant information summarized for companies. Automated sentiment analysis requires either a dataset with clearly marked examples of positive and negative reviews or sentiment dictionaries, both provided in advance.
These resources are unavailable or scarce for under-researched languages, such as Serbian, or even for the English language in the case of under-researched domains. When datasets or sentiment dictionaries are unavailable, numerical ratings (star ratings, thumbs up, thumbs down) obtained from a reviewing website are a starting point for sentiment analysis, as they can be used to determine the sentiment orientation of a review (Pang et al, 2002; Racherla et al, 2013; Dave, 2003; Turney, 2002). In (Grljević & Bošnjak, 2018) the authors described a method for determining the sentiment polarity of online reviews with an assigned numerical rating of 3 on the one-to-five scale, as this category of reviews is identified as a "mixed" category and a source of sentiment ambiguity. The presented method checks the similarity, in terms of vocabulary and writing style, between this "mixed" category of reviews (reviews that convey both positive and negative sentiment and should not be mistaken for neutral reviews) and positive reviews (with numerical ranks 4 or 5) or negative reviews (with ranks 1 or 2). Once a dataset with clearly marked learning examples of positive and negative reviews is prepared, the subsequent sentiment analysis can be conducted. As pointed out in (Grljević & Bošnjak, 2018), automated sentiment analysis models can aggregate the overall satisfaction or dissatisfaction of customers by summarizing positive and negative online comments. Although both positive and negative reviews enhance consumer awareness of hotels, positive reviews tend to improve the overall attitude towards the hotels, and this considerably affects lesser-known hotels (Öğüt & Onur, 2012). Automated sentiment analysis can help hotel managers to improve their services. It provides summarized feedback on how their hotel is seen by customers and on which services they liked or disliked (Bucur, 2015).
Sentiment analysis models are also a useful tool for benchmarking and for analyzing public opinion towards key competitors, through analysis of the competitors' online reputation and of the public stance towards their key products, brands, or services. By implementing sentiment analysis models, any business can monitor variations in public opinion through time. Gaining a better understanding of the associations within the various attributes of the properties, and of traveler reviews in general, may lead to an improvement of the services provided and a decrease in the posting of negative reviews. Another interesting consumer behavior is revealed by the analysis of online reviews in (Racherla et al, 2013). Even when consumers have given very low ratings to a certain property, they have not completely given up on it and may be willing to return to it in the future. According to the authors' findings, approximately 9% of the consumers who give low ratings to a property are willing to give it a second chance if service providers are willing to take the negative feedback into consideration and ensure that service delivery is significantly improved. Consequently, managers must develop strategies that improve consumers' perceptions of their responsiveness, i.e. their willingness to take the reviews, both positive and negative, into consideration and to enhance their service. Appropriate company response strategies to online complaints are necessary to protect or improve the company's reputation. "No action" strategies risk allowing negative information about the company to stand unchallenged, which in turn may damage the company's reputation (Davidow, 2003). Having employees dedicated to responding to online reviews requires substantial human and financial resources.
Understanding how such investment leads to financial outcomes can provide strong justification for investment in offering management responses (Xie et al, 2017).

4. CASE STUDY ON APPLICATION OF SENTIMENT ANALYSIS IN THE HOSPITALITY AND TOURISM INDUSTRY

Sentiment analysis is basically a classification task. User generated content is classified according to the expressed sentiment, usually into the positive or the negative class (Grljević & Bošnjak, 2018). In the case study described in this section, an input set of 20491 reviews collected from the TripAdvisor website (Alam, 2016)⁴ was used for the analysis. The input set was divided into the training and the test set in a 75:25 ratio. Besides the textual comments on hotels, the reviewers provided numerical ratings on a one-to-five Likert-type scale, which expressed their overall satisfaction. Rank 1 denoted the lowest degree of satisfaction, while rank 5 denoted the highest. Since the reviews in the dataset were not labelled with sentiment polarity, we used these numerical ratings to denote the polarity of each review, as explained in detail in (Grljević & Bošnjak, 2018). Reviews ranked with marks 1 and 2 were labeled as examples of hotel reviews with negative sentiment. Reviews ranked with marks 4 and 5 were labeled as examples of hotel reviews with positive sentiment. The label of reviews ranked with mark 3 was determined from a sentiment score. The sentiment score of a review was calculated as the sum of z-scores for similarities of words in the review with the frequent single words, bigrams and trigrams observed in the positive or negative category of reviews (Grljević & Bošnjak, 2018). We experimented with different supervised classification algorithms with the goal of building a model that will successfully distinguish positive from negative reviews.
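The rating-based labeling and the 75:25 split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function names `label_from_rating` and `split_75_25` are hypothetical, the shuffling seed is arbitrary, and reviews with mark 3 are only flagged as "mixed" because the z-score-based sentiment score is defined in (Grljević & Bošnjak, 2018) and is not reproduced here.

```python
import random


def label_from_rating(rating: int) -> str:
    """Map a one-to-five rating to a polarity label, as described in the text."""
    if rating in (1, 2):
        return "negative"
    if rating in (4, 5):
        return "positive"
    if rating == 3:
        # Resolved in the study by a separate sentiment score; this sketch
        # only marks such reviews for that later step.
        return "mixed"
    raise ValueError(f"rating outside the 1-5 scale: {rating}")


def split_75_25(items, seed=0):
    """Shuffle a copy of the items and split it into 75% train, 25% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed value is an arbitrary assumption
    cut = len(shuffled) * 3 // 4
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Stand-in for the 20491 reviews: only the indices are split here.
    train, test = split_75_25(range(20491))
    print(len(train), len(test))
    print([label_from_rating(r) for r in (1, 2, 3, 4, 5)])
```

Whether the original split was random or stratified by class is not stated in the text; a stratified split may be preferable for a class distribution as skewed as the one reported for this dataset.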
As suggested in the papers (Jurafsky, 2018; Yang, 2018; Medhat, 2014), we used well established algorithms for sentiment analysis: Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machines (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB). The performance of the classifiers was measured over the test data using Accuracy (an indicator of the overall success rate), Precision (the ratio of reviews correctly classified as positive to the total number of reviews classified as positive), and Recall (the percentage of positive reviews correctly classified as positive) as basic measures, and the F1-measure (the harmonic mean of Precision and Recall) and the Precision-Recall curve (PR, the tradeoff between precision and recall for different thresholds) as compound measures. Evaluation measures are presented in more detail in (Ballabio et al, 2018; Berrar, 2019; Visa & Salembier, 2014). Figure 1 illustrates the performance measures (Precision, Recall, and F-measure) for each classifier.

⁴ Accessible at: zenodo.org/record/1219899/#.W9CbtXszapp

Fig. 1 Evaluation of classifiers

The results indicate that the SVM classifier, with 66.52% correctly classified positive reviews, is the least appropriate for classification of the collected reviews. Furthermore, the SVM algorithm successfully classified only 11% of negative reviews. NB achieved the best accuracy (88.54%). The F1-measures have shown that NB and LR resulted in the best classification models, as they correctly classified positive reviews at the levels of 91% and 90% respectively, and correctly classified negative reviews at somewhat lower levels, but still over 83%.
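To make the basic and compound measures named above concrete, the following sketch computes them from the four cells of a binary confusion matrix, treating the positive class as the class of interest. The function name `binary_metrics` and the example counts are hypothetical; the counts are not results from this study.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 for the positive class.

    tp: positive reviews classified as positive
    fp: negative reviews classified as positive
    fn: positive reviews classified as negative
    tn: negative reviews classified as negative
    """
    total = tp + fp + fn + tn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only.
    print(binary_metrics(tp=80, fp=20, fn=10, tn=90))
```

Swapping the roles of the two classes (counting negative reviews as the class of interest) yields the corresponding measures for the negative class, which is how per-class figures such as those in Figure 1 are typically obtained.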
The use of the PR curve is particularly advised in the case of a largely skewed class distribution, which is the case with our dataset, as positive reviews are more frequent in the dataset than negative ones (80.61% of positive reviews and 19.39% of negative reviews). The analysis of the PR curve indicates that the LR model has the best overall performance. Experiments and results are presented in more detail in (Grljević & Bošnjak, 2018). Once a satisfactory classification model is built, i.e. a model which successfully distinguishes positive from negative reviews, different visualization and summarization techniques can be applied over the results. These techniques facilitate the decision making process and the identification of the favorable and bottleneck aspects of the business and, consequently, help management to undertake corrective actions. Also, a sentiment analysis model can be combined with other machine learning techniques to uncover additional knowledge. In this paper we illustrate the application of association rules to generate a network showing the relationships between terms which occur in the positive and the negative category of reviews. Visualization of the resulting associations could help managers to understand relationships between concepts under discussion in negative and positive reviews. Association rules are a data mining technique that is successfully used in determining consumer behavior by identifying rules that indicate frequent data, correlations, and data dependencies.
The evaluation of the resulting association rules is based on the following parameters: support (the significance of the rules), confidence (the reliability of the rules or degree of confidence), gain (the difference between the confidence and the support of the consequent), lift (or interest, which measures the degree of dependence between the item sets), and conviction (which takes into account both the support of the antecedent and the support of the consequent of the rule) (Jimenez et al, 2010). We used the RapidMiner⁵ tool to conduct association rule mining over a set of positive reviews and a set of negative reviews. Each association is labelled with a name (Rule 1, Rule 2, etc.), and each rule is assigned its values of support and confidence. Association rules over the positive category of reviews are presented in Figure 2. We have generated 147 association rules. Due to the large amount of data, we have visualized only the rules with higher values of the lift measure, i.e. the more interesting rules. Concepts discussed in positive reviews revolve around hotel, room, staff, or location. Based on the relationships presented in Figure 2, we can see that people positively evaluate mostly the expected aspects, such as great location, great service, friendly or helpful staff, or clean rooms.

Fig. 2 Relationships between terms which occur in positive reviews

⁵ RapidMiner https://rapidminer.com/

Since negative reviews convey the sources of consumer dissatisfaction, we look more closely at the resulting association rules over the negative category of reviews. We have generated 89 association rules with a 55% confidence. Based on the results, some concepts which travelers discuss in negative reviews differ from the concepts in positive reviews. These concepts mostly revolve around the overall stay in the hotel and the food, while the hotel and room are discussed in a different manner in negative than in positive reviews.
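As a rough illustration of three of the rule measures listed above (support, confidence, and lift), the sketch below computes them over a handful of invented "reviews", each reduced to the set of terms it mentions. It uses the standard textbook definitions, which may differ in detail from RapidMiner's implementation; the function name `rule_measures` and the toy data are hypothetical and unrelated to the study's results.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of terms, one set per review.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a <= t for t in transactions)          # reviews containing A
    n_c = sum(c <= t for t in transactions)          # reviews containing C
    n_ac = sum((a | c) <= t for t in transactions)   # reviews containing both
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the baseline frequency of C.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


if __name__ == "__main__":
    # Invented toy data: terms mentioned in six reviews.
    reviews = [
        {"beach", "food"},
        {"beach", "food", "room"},
        {"food", "room"},
        {"room", "staff"},
        {"beach"},
        {"food", "staff"},
    ]
    print(rule_measures(reviews, {"beach"}, {"food"}))
    print(rule_measures(reviews, {"food"}, {"beach"}))
```

The two calls show that confidence is not symmetric: a rule and its reverse share the same support but generally have different confidence values, because each is normalized by the frequency of its own antecedent.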
Figure 3 illustrates the rules with mentions of room and hotel. While in positive reviews travelers put the emphasis on the hotel, in negative reviews the emphasis is on the room. The presented results point to the following sources of dissatisfaction among travelers: unclean bathroom, check-in process, problems with staff or room service, no water, cleanliness, or the fact that the room was not ready upon the traveler's arrival (to reach this conclusion we inspected in more detail the reviews containing the word told; in most cases travelers were told upon arrival that the room was not ready). A positive evaluation of food stretches through negative reviews (with support 0.189 and confidence level of 58.3%). Among the analyzed reviews, if travelers mention beach, they will also mention food. This rule has a confidence of 70.5%. Conversely, if a review contains a mention of food, we can say with 59.3% confidence that there will be a mention of beach as well.

Fig. 3 Relationships between terms which occur in negative reviews

The results of the experiments led to the conclusion that all selected algorithms, except the SVM, are adequate for sentiment analysis of the collected reviews in the hospitality and tourism sector. However, they have also highlighted that it is possible to obtain different results on the same data, and therefore good analytic skills are necessary in order to conduct a large number of experiments with diversified classification algorithms, model parameters, and performance measures. Although we have not conducted aspect-based sentiment analysis, the application of association rules over the positive and the negative category of reviews, together with their visualization, revealed useful insights regarding the preferences of travelers.

5.
CONCLUSION

User-generated content on the Web, in the form of reviews and/or ratings, conveys consumers' opinions and feelings towards products, services or other entities. Prospective online customers use this freely provided data in their decision making process. With the expansion of reviewing websites, online reviews have become the third most influential factor in purchase decisions, after coupons and discounts (Yang et al, 2015). Textual comments provide fine-grained information about the service provider's reputation that is likely to engender a buyer's trust in the service provider's competence and credibility. Consequently, companies should monitor the online activities and sentiment of their customers and promptly respond to customer comments and problems. This is even more pronounced in the hospitality and tourism industry, which provides subjective services, heterogeneous in nature, where no "try before you buy" or "return if not satisfied" features exist, so the perceived risk for the consumers is even higher. With such a strong impact on purchasing decisions, online reviews affect online sales as well, and they should be treated as a strategic tool in hospitality and tourism management, particularly in promotion, online sales, and management of online reputation (Schuckert & Law, 2014). Due to the constant growth of the number of websites with tourists' reviews and the sheer amount of feedback information on their expectations, preferences, (dis)satisfaction, etc., automated techniques are needed to process all the reviews in order to gain full insight into tourists' sentiment. Opinion mining and sentiment analysis enable tourism organizations to assess the influence of aggregated good or bad comments on accommodation choices, and to acquire information on guests' perception of the tourism and business related entities (such as the staff, comfort of rooms, intensity of noise, quality of food or other aspects of the tourism offer).
Such insights are vital for leveraging the services and the organization of the business.
The case study we have conducted has shown that not all methods at our disposal are suitable for sentiment analysis, and that some of the developed automated approaches are more adequate than others. Skillful analysts can, by means of classification algorithms (LR, NB, RF, XGB, SVM) and association rules, reveal emotions, criticism, and (dis)satisfaction in user-generated reviews and present them to management, who can then act accordingly to improve online reputation.
In the past several years, our understanding of the impacts of online reviews has developed to a great extent, due to technical advancements in our capacity to process and analyze new forms of data in increasingly large quantities. Analytical methods for processing textual data (such as reviews) have evolved into more sophisticated machine learning tools, such as sentiment analysis and topic modelling, that extract deep meanings from large quantities of text. As social media websites continue to evolve and user-generated contents become more diverse and richer in terms of both content and format, our ability to understand managerial problems will likely be defined by the technical tools available to process, analyze, and interpret these new data.
REFERENCES
Agarwal, A., Biadsy, F. & McKeown, K.R. (2009). Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 24-32.
Alam, M.H., Ryu, W.-J. & Lee, S. (2016). Joint multi-grain topic sentiment: modeling semantic aspects for online reviews. Information Sciences, 339, 206–223.
Alm, C.O., Roth, D. & Sproat, R. (2005). Emotions from text: Machine learning for text-based emotion prediction.
In Proceedings of the Joint Conference on HLT–EMNLP, Vancouver, Canada.
Aye, Y.M. & Aung, S.S. (2018). Senti-Lexicon and Analysis for Restaurant Reviews of Myanmar Text. International Journal of Advanced Engineering, Management and Science (IJAEMS), 4 (5).
Ballabio, D., Grisoni, F. & Todeschini, R. (2018). Multivariate comparison of classification performance measures. Chemometrics and Intelligent Laboratory Systems, 174, 33-44. doi: 10.1016/j.chemolab.2017.12.004
Berrar, D. (2019). Performance Measures for Binary Classification. Encyclopedia of Bioinformatics and Computational Biology, 1, 546-560. doi: 10.1016/B978-0-12-809633-8.20351-8
Berthon, P.R., Pitt, L.F., Plangger, K. & Shapiro, D. (2012). Marketing meets Web 2.0, social media, and creative consumers: Implications for international marketing strategy. Business Horizons, 55 (3), 261-271.
Bibi, M. (2017). Sentiment Analysis at Document Level. Retrieved from https://www.researchgate.net/publication/320729882_Sentiment_Analysis_at_Document_Level, uploaded on 31 October 2017.
Bing, P. & Yang, Y. (2016). Monitoring and Forecasting Tourist Activities with Big Data. In: Management Science in Hospitality and Tourism. Theory, Practice, and Applications, New York, Apple Academic Press.
Broß, J. (2013). Aspect-Oriented Sentiment Analysis of Customer Reviews Using Distant Supervision Techniques, Dissertation at Freie Universität Berlin, Berlin.
Brooks, M., Kuksenok, K., Torkildson, M.K., Perry, D., Robinson, J.J., Scott, T.J., Anicello, O., Zukowski, A., Harris, P. & Aragon, C.R. (2013). Statistical affect detection in collaborative chat. In Proceedings of the 2013 conference on Computer supported cooperative work, pp. 317–328. ACM.
Bucur, C. (2015). Using Opinion Mining Techniques in Tourism, 2nd Global Conference on Business, Economics, Management and Tourism, Prague, Czech Republic. Procedia Economics and Finance, 23, 1666-1673.
Dave, K., Lawrence, S. & Pennock, D.M. (2003).
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. In WWW '03 Proceedings of the 12th international conference on World Wide Web. ACM New York, 519-528.
Davidow, M. (2003). Organizational responses to customer complaints: What works and what doesn’t. Journal of Service Research, 5, 225-250.
Dos Santos, C.N. & Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In COLING, 69–78.
Gligorijevic, B. & Luck, E. (2012). Engaging Social Customers – Influencing New Marketing Strategies for Social Media Information Sources. In Contemporary Research on E-business Technology and Strategy, 25-40. Springer Berlin Heidelberg.
Grljević, O. (2016). Sentiment u sadržajima sa društvenih mreža kao instrument unapređenja poslovanja visokoškolskih institucija [Sentiment in social media content as an instrument for improving the operations of higher education institutions]. University of Novi Sad, Faculty of Economics in Subotica, doctoral dissertation.
Grljević, O. & Bošnjak, Z. (2018). Sentiment analysis of customer data. International Journal of Strategic Management and Decision Support Systems in Strategic Management, 23(3), 38-49.
Grljević, O. & Bošnjak, Z. (2018). Evaluating customer satisfaction through online reviews and ratings. In V. Bevanda & S. Štetić (Eds.) 3rd International Thematic Monograph – Thematic Proceedings: Modern Management Tools and Economy of Tourism Sector in Present Era. Belgrade, Serbia: Association of Economists and Managers of the Balkans in cooperation with the Faculty of Tourism and Hospitality, Ohrid, Macedonia. ISBN: 978-86-80194-14-1
Gruen, T.W., Osmonbekov, T. & Czaplewski, A.J. (2006). eWOM: The impact of customer-to-customer online know-how exchange on customer value and loyalty. Journal of Business Research, 59 (4), 449-456.
Holzman, L.E. & Pottenger, W.M. (2003). Classification of emotions in internet chat: An application of machine learning using speech phonemes. Tech.
rep., Lehigh University.
Hu, M. & Liu, B. (2004). Mining Opinion Features in Customer Reviews. Proceedings of the 19th International Conference on Artificial Intelligence AAAI'04, 755-760.
Ikeda, D., Takamura, H., Ratinov, L.-A. & Okumura, M. (2008). Learning to Shift the Polarity of Words for Sentiment Classification. Transactions of the Japanese Society for Artificial Intelligence, 25 (1), 50-57.
Jimenez, A., Berzal, F. & Cubero, J.C. (2010). Interestingness Measures for Association Rules within Groups. In E. Hullermeier, R. Kruse, and F. Hoffmann (Eds.): IPMU 2010, Part I, CCIS 80, pp. 298–307. Available at: https://pdfs.semanticscholar.org/40f9/fd7259b15bd09f6dc0552c4e54cebfbe92fb.pdf
Jurafsky, D. & Martin, J.H. (2018). Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Draft of September 23, 2018.
Kim, S.M. & Hovy, E. (2004). Determining the Sentiment of Opinions. In COLING '04 Proceedings of the 20th international conference on Computational Linguistics.
Lee, Y.L. & Song, S. (2010). An empirical investigation of electronic word-of-mouth: Informational motive and corporate response strategy. Computers in Human Behavior, 26 (5), 1073-1080.
Lei, S. & Law, R. (2015). Content Analysis of TripAdvisor Reviews on Restaurants: A Case Study of Macau. Journal of tourism, 16 (1), 17-28.
Leung, D., Law, R., van Hoof, H. & Buhalis, D. (2013). Social media in tourism and hospitality: A literature review. Journal of Travel and Tourism Marketing, 30 (1-2), 3–22.
Li, X., Bing, L., Li, P., Lam, W. & Yang, Z. (2018). Aspect Term Extraction with History Attention and Selective Transformation. IJCAI 2018, Computation and Language, arXiv:1805.00760
Litvin, S., Goldsmith, R.E. & Pan, B. (2008). Electronic word-of-mouth in hospitality and tourism management. Tourism Management, 29 (3), 458–468.
Ma, Y., Xiang, Z., Du, Q. & Fan, W. (2018).
Effects of user-provided photos on hotel review helpfulness: An analytical approach with deep leaning. International Journal of Hospitality Management, 71, 120-131.
Marrese-Taylor, E., Velásquez, J.D., Bravo-Marquez, F. & Matsuo, Y. (2013). Identifying Customer Preferences about Tourism Products using an Aspect-Based Opinion Mining Approach, Procedia Computer Science 22, 182-191, 17th International Conference in Knowledge Based and Intelligent Information and Engineering Systems - KES2013.
Medhat, W., Hassan, A. & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5, 1093-1113.
Mohammad, S. (2012). Portable features for classifying emotional text. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 587–591, Montreal, Canada.
Mohammad, S.M. (2012a). #emotional tweets. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, SemEval ’12, pp. 246–255, Stroudsburg, PA.
Nakagawa, T., Inui, K. & Kurohashi, S. (2010). Dependency tree-based sentiment classification using CRFs with hidden variables. In NAACL HLT.
Öğüt, H. & Onur, T.B.K. (2012). The influence of internet customer reviews on the online sales and prices in hotel industry. The Service Industries Journal, 32 (2), 197-214.
Pang, B. & Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2 (1-2), 1-135.
Pang, B., Lee, L. & Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Volume 10. Association for Computational Linguistics, 79–86.
Park, D.-H., Lee, J. & Han, I. (2007).
The Effect of On-Line Consumer Reviews on Consumer Purchasing Intention: The Moderating Role of Involvement. International Journal of Electronic Commerce, 11 (4), 125-148.
Phillips, P., Zigan, K., Santos Silva, M.M. & Schegg, R. (2015). The interactive effects of online reviews on the determinants of Swiss hotel performance: A neural network analysis. Tourism Management, 50, 130-141.
Pitt, L.F., Berthon, P.R., Watson, R.T. & Zinkhan, G.M. (2002). The Internet and the birth of real consumer power. Business Horizons, 45 (4), 7-14.
Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Mohammad, A. S. & Hoste, V. (2016). SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), 19-30.
Puri, C.A., Kush, G. & Kumar, N. (2017). Opinion Ensembling for Improving Economic Growth through Tourism. Procedia Computer Science 122, 237-244.
Racherla, P., Connolly, D.J. & Christodoulidou, N. (2013). What Determines Consumers' Ratings of Service Providers? An Exploratory Study of Online Traveler Reviews. Journal of Hospitality Marketing and Management, 22 (2), 135-161, doi: 10.1080/19368623.2011.645187
Socher, R., Pennington, J., Huang, E.H., Ng, A.Y. & Manning, C.D. (2011). Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27–31, 151–161, http://www.aclweb.org/anthology/D11-1014
Schuckert, M.L. & Law, X.R. (2014). Hospitality and tourism online reviews: Recent trends and future directions. Journal of Travel and Tourism Marketing, 32 (5), 608-621, doi: 10.1080/10548408.2014.933154
Sparks, B.A., Fung So, K.K. & Bradley, G.L. (2016).
Responding to negative online reviews: The effects of hotel responses on customer inferences of trust and concern. Tourism Management, 53, 74-85.
Strapparava, C. & Mihalcea, R. (2007). Semeval-2007 task 14: Affective text. In Proceedings of SemEval-2007, pp. 70–74, Prague, Czech Republic.
Suttles, J. & Ide, N. (2013). Distant supervision for emotion classification with discrete binary values. In Computational Linguistics and Intelligent Text Processing, pp. 121–136. Springer.
Tjahyanto, A. & Sisephaputra, B. (2017). The Utilization of Filter on Object-Based Opinion Mining in Tourism Product Reviews. Procedia Computer Science, 124, 38-45.
Tripp, T. M. & Grégoire, Y. (2011). When Unhappy Customers Strike Back on the Internet. MIT Sloan Management Review, 52 (3), 37-44.
Turney, P.D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL'02), Philadelphia, Pennsylvania, USA.
Visa, G.P. & Salembier, P. (2014). Precision-Recall-Classification Evaluation Framework: Application to Depth Estimation on Single Images. In D. Fleet et al. (Eds.): ECCV 2014, Part I, LNCS 8689, pp. 648–662. Available at: https://imatge.upc.edu/web/sites/default/files/pub/cPalou14_0.pdf
Xie, K.L., Kam Fung So, K. & Wang, W. (2017). Joint effects of management responses and online reviews on hotel financial performance: A data-analytics approach. International Journal of Hospitality Management, 62, 101-110.
Yang, C.-S., Chen, C.-H. & Chang, P.-C. (2015). Harnessing consumer reviews for marketing intelligence: a domain-adapted sentiment classification approach. Information Systems and e-Business Management, 13 (3), 403-419.
Yang, Y. & Loog, M. (2018). A benchmark and comparison of active learning for logistic regression. Pattern Recognition, 83, 401-415.
Zhang, Y., Lai, G., Zhang, M., Zhang, Y., Liu, Y. & Ma, S. (2014).
Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 83-92.
CONTEMPORARY DATA ANALYSIS TECHNIQUES FOR ONLINE REPUTATION MANAGEMENT IN HOSPITALITY AND TOURISM
Knowing what attracts tourists to a visit and what deters them from it, which products deserve special attention, and which products to offer is of crucial importance for achieving good economic results. Such knowledge can be obtained by analyzing the online comments and reviews that contemporary tourists leave on websites (such as Booking.com, TripAdvisor, Trivago, etc.) after their tourist experience. The paper describes the importance of online reviews for management, which through them obtains information about the opinions and emotions of the users of its tourist services, and especially about their (dis)satisfaction with certain aspects of the offer, and is thus given the opportunity to exploit the observed strengths and correct the shortcomings by taking timely corrective measures and actions. Through a case study over 20491 reviews from TripAdvisor, contemporary approaches and methods for the analysis of user-generated content are described, along with the opportunities for improvement that they bring to the hospitality and tourism domain. Sentiment analysis was carried out over the collected online reviews with the aim of building an automated model that successfully distinguishes positive from negative reviews. The classification model based on logistic regression exhibits the best performance: it correctly classifies positive reviews in 90% of cases and negative reviews in 83% of cases. In addition to the application of sentiment analysis, the use of association rules is illustrated as an aid to management in uncovering relationships between the concepts that visitors discuss in positive and in negative reviews.
Key words: online reputation management, e-word-of-mouth marketing, online reviews, automated classification, sentiment analysis, association rules