Vol. 4, No. 1 | January - June 2020

SJCMS | E-ISSN: 2520-0755 | Vol. 4 | No. 1 | © 2020 Sukkur IBA University

Aspect Based Sentimental Analysis of Hotel Reviews: A Comparative Study

Sindhu Abro1, Sarang Shaikh1, Rizwan Ali1, Sana Fatima1, Hafiz Abid Mahmood Malik2

Abstract:

The increasing use of the internet enables users to share their opinions about what they like and
dislike regarding products and services. For efficient decision making, these reviews need to be
analyzed. Sentiment analysis, or opinion mining, is commonly used to detect the polarity
(positive or negative) of reviews, but it does not reveal which aspect the text is about. In
this study, we employ state-of-the-art approaches to perform three tasks on the SemEval
dataset. Tasks A and B predict the aspect of restaurant reviews, whereas Task C predicts their
polarity. Additionally, this study compares the performance of two feature engineering
techniques and five machine learning algorithms on a publicly available dataset, SemEval-2016
Task 5. The experimental results show that word2vec features used with the support vector
machine algorithm performed best, giving overall accuracies of 76%, 72%, and 79% for Task A,
Task B, and Task C respectively. Our comparative study holds practical significance and can be
used as a baseline study in the domain of aspect-based sentiment analysis.

Keywords: Aspect-Based Sentiment Analysis; Sentiment Analysis; Text Classification; Natural
Language Processing (NLP); Word2Vec; Machine Learning

1. Introduction

In recent years, user-generated content on the internet has grown rapidly. The web
enables users to share their reviews of and experiences with services and products.
Moreover, customers increasingly consult the available reviews before purchasing a
product or service [1]. Therefore, sellers and organizations need to analyze these
reviews for effective decision making. Analyzing reviews manually is labor-intensive
and time-consuming. Hence, techniques such as sentiment analysis or opinion analysis
are commonly used to extract information from reviews. Sentiment analysis, a subfield
of natural language processing, is used to determine the general opinion (e.g. positive
or negative) of a group of individuals regardless of the topic or entity (e.g. food,
price, location, etc.) [2]. Therefore, it is recommended to use aspect-based sentiment
analysis (ABSA), which decomposes the problem into two tasks, namely aspect
identification and sentiment analysis [3]. In the first task, the aspect of an entity
is identified, and in the second task, the polarity is estimated for each identified
aspect. Sentiment analysis at the aspect level thus provides an in-depth analysis of
reviews [4].

1Department of Computer Science, Sukkur IBA University, Pakistan
2Department of Computer Science, Arab Open University, Bahrain

Sindhu Abro (et al.), Aspect Based Sentiment Analysis of Hotel Reviews: A Comparative Study (pp. 11 - 20)


For example, for restaurant reviews, ABSA returns not only the overall sentiment of a
review but also the entity the sentiment refers to, such as food, price, location, or
service. Thus, the results generated by this technique give a better understanding of
what reviewers like and dislike about the topic [1]. Moreover, it may help customers
decide whether to purchase a product or use a service. Additionally, ABSA enables
manufacturers to improve the quality of their products and services. Therefore, in this
study, we have used ABSA to identify the aspects of restaurant reviews and their
polarity.

The proposed solution employs different feature engineering techniques and machine
learning (ML) algorithms to classify restaurant reviews by entity, attribute, and
polarity. Despite the extensive amount of existing work, it remains difficult to compare
the performance of these approaches for classifying hotel review text. To the best of
our knowledge, existing studies lack a comparative analysis of different feature
engineering techniques and ML algorithms on restaurant reviews. Therefore, this study
contributes to solving this problem by comparing two feature engineering techniques and
five ML classifiers on the standard dataset provided by SemEval. This study will serve
future researchers in the field of automatic ABSA.

The rest of the paper is organized as follows. Section 2 highlights the related works.
Section 3 discusses the methodology. Sections 4 and 5 explain the experimental setup and
results. Finally, Section 6 presents the conclusion and future work.

2. Related Works

Kiritchenko et al. [5] classified reviews using lexicon and linguistic features.
Castellucci et al. [6] used bag-of-words features learned from external data. Hu and
Liu [7] used an association rule-based system for aspect identification. Additionally,
Liu's book [8] highlights four methods to extract aspects, namely frequent phrases,
opinion and target relations, supervised learning, and topic models. Jakob and
Gurevych [9] employed conditional random fields for aspect term extraction.

Mukherjee and Bhattacharyya [11] developed a system that uses dependency parsing rules
for opinion extraction. Many researchers have used a hybrid approach (i.e. NLP with
statistical methods) to improve system performance. In SemEval 2014, Kiritchenko et al.
[5] used an in-house entity tagging system to extract outside and aspect terms. Toh
and Wang [12] used a tagging approach with WordNet and word clusters. Socher et al. [13]
employed grammatical cues with deep learning. Carrascosa's study [14] showed that
ensemble learning can also be applied to sentiment analysis. In the Aspect Category
Polarity Detection task of SemEval 2014, Mohammad et al. [15] achieved the best
performance by using different linguistic features together with publicly available
sentiment lexicons.

Broadly, ABSA methods can be divided into two categories: those that use
domain-independent solutions [16] and those that use domain-specific knowledge [4] to
improve the results. A common approach is to treat aspect extraction and polarity
classification independently [17], but others have trained a single model to solve both
problems [18].

3. Methodology

3.1. Overview

This section presents the overall research methodology followed to perform ABSA. Fig 1
shows the steps required to train the model. As shown there, our research methodology is
composed of six key steps, namely data collection, data preprocessing, feature
engineering, data selection, classification model construction, and classification model
evaluation. The details of each step are discussed in the subsequent sections.

Fig 1. Overall Proposed Methodology


3.2. Data Collection

In this study, we have used a publicly available dataset from SemEval-2016 Task 5
(see footnote 3). This dataset contains reviews of laptops and restaurants. In this
study, we focus only on the restaurant reviews. There are 3658 restaurant instances:
2799 for training and the remaining 859 for testing. In this dataset, the reviews can be
categorized on the basis of aspects (i.e. category, entity, or attribute) and their
polarities. By entity, the reviews can be labeled into six distinct classes, namely
food, restaurant, service, ambiance, drinks, and location. By attribute, they can be
labeled as general, quality, prices, style-options, or miscellaneous. Their polarities
can be positive, negative, or neutral. The distribution of reviews in the training data
by entity, attribute, and polarity is shown in Fig 2, Fig 3, and Fig 4 respectively.

3.3. Text Preprocessing

Several studies show that data must be cleaned to obtain better classification results
[19]. Therefore, we applied several preprocessing techniques to remove uninformative
features from the data. In this step, we dropped the 292 instances with blank values.
Additionally, we dropped the columns that are not required for text classification,
i.e. review-id, sentence-id, target, and category. After dropping the empty cells and
selecting the required attributes, we converted the text of the 2507 remaining
instances to lower case. Using regular expressions and pattern matching techniques, we
removed white space, punctuation, and stop words. In addition, we applied tokenization
and lemmatization to the preprocessed text. In tokenization, each sentence is converted
into tokens (words); the words are then converted to their root forms using the WordNet
lemmatizer, e.g. "posts" becomes "post".
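The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word set `STOP_WORDS` is a tiny hypothetical list (the paper does not name the list it used), the row field name `text` is invented for the example, and the WordNet lemmatization step is left out because it needs external lexical data.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "to", "of", "it"}


def drop_blank(rows):
    """Drop instances whose review text is missing or only white space."""
    return [row for row in rows if row.get("text") and row["text"].strip()]


def preprocess(text):
    """Lowercase, strip punctuation, collapse white space, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)      # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of white space
    tokens = text.split(" ") if text else []
    return [tok for tok in tokens if tok not in STOP_WORDS]


rows = drop_blank([{"text": "The food was GREAT, but the service... slow!"},
                   {"text": ""},
                   {"text": None}])
tokens = [preprocess(row["text"]) for row in rows]
```

A lemmatizer would then be applied to each token list; it is omitted here so that the sketch stays free of external dependencies.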

3 The dataset is available at:
http://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools

Fig. 2. Entity base distribution

Fig. 3. Attribute Base Distribution

Fig. 4 Review base distribution


3.4. Feature Engineering

ML algorithms cannot learn classification rules from raw text; they need numerical
vectors. Therefore, feature engineering is one of the key steps in classification. This
step extracts the key features from raw text and represents them in numerical form. In
this study, we have applied two feature engineering techniques, namely n-gram [20] with
TFIDF [21], and Word2vec [22].
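To make the first representation concrete, the following minimal sketch builds bigram TFIDF vectors from tokenized reviews. It assumes the textbook weighting tf x log(N / df); the paper does not state which TFIDF variant or library was used, and library implementations typically apply smoothing and normalization on top of this. The function names are illustrative only.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs joined by a space."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_bigram_vectors(docs):
    """Map tokenized documents to dense TF-IDF vectors over a bigram vocabulary."""
    doc_bigrams = [bigrams(doc) for doc in docs]
    vocab = sorted({bg for doc in doc_bigrams for bg in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each bigram.
    df = Counter(bg for doc in doc_bigrams for bg in set(doc))
    idf = {bg: math.log(n_docs / df[bg]) for bg in vocab}
    vectors = []
    for doc in doc_bigrams:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for one-word reviews
        vectors.append([counts[bg] / total * idf[bg] for bg in vocab])
    return vocab, vectors


docs = [["good", "food", "here"], ["good", "food", "bad", "service"]]
vocab, vectors = tfidf_bigram_vectors(docs)
```

In this toy corpus the bigram "good food" occurs in both documents, so its weight is zero, while bigrams unique to one review receive a positive weight.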

3.5. Data Selection

In this step, we used two approaches to build the models: a train-test split and the
whole dataset. In the first approach, we followed the Pareto principle, according to
which “80% of effects come from 20% of causes” [28]. This principle is also called the
80:20 rule. Accordingly, we split the preprocessed data in this ratio, i.e. 80% for
training and 20% for testing. Table 1, Table 2, and Table 3 show the class-wise
distribution by entity, attribute, and polarity respectively, together with the
train-test split. The training data is used to train the classification models, whereas
the test data is used to evaluate the trained models.

Table 1: Approach I (Entity)

Class       Label  Total  Train  Test
Ambience    0      255    204    51
Drinks      1      99     79     20
Food        2      1076   861    215
Location    3      28     22     6
Restaurant  4      600    480    120
Service     5      449    359    90
Total              2507   2005   502

Table 2: Approach I (Attribute)

Class          Label  Total  Train  Test
General        0      1154   923    231
Miscellaneous  1      98     78     20
Prices         2      190    152    38
Quality        3      896    717    179
Style_options  4      169    135    34
Total                 2507   2005   502

Table 3: Approach I (Polarity)

Class     Label  Total  Train  Test
Negative  0      749    599    150
Neutral   1      101    81     20
Positive  2      1657   1325   332
Total            2507   2005   502
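The class-wise 80/20 split described above can be sketched as follows. The paper does not say how the split was implemented; a per-class (stratified) split is an assumption here, chosen because it is consistent with the per-class entity counts. The entity class sizes are those of the 2507 preprocessed training instances (the Train column of Table 4), and the function name and seed are illustrative.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx) with about train_frac of each class in train."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)                          # random order within the class
        n_train = round(train_frac * len(idx))    # nearest integer to 80%
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return train_idx, test_idx


# Entity class sizes of the 2507 preprocessed instances.
ENTITY_COUNTS = {"ambience": 255, "drinks": 99, "food": 1076,
                 "location": 28, "restaurant": 600, "service": 449}
labels = [name for name, n in ENTITY_COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
```

Splitting per class keeps rare classes such as location represented in both partitions, which a purely random split of the whole dataset does not guarantee.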

In the second approach, we used the whole data (i.e. 2507 instances) to train the
model, and a separate test set (i.e. 859 instances) was used for evaluation. Table 4,
Table 5, and Table 6 show the distribution of the data by entity, attribute, and
polarity respectively.

Table 4: Approach II (Entity)

Class       Label  Total  Train  Test
Ambience    0      321    255    66
Drinks      1      137    99     38
Food        2      1467   1076   391
Location    3      41     28     13
Restaurant  4      796    600    196
Service     5      604    449    155


Table 5: Approach II (Attribute)

Class          Label  Total  Train  Test
General        0      1530   1154   376
Miscellaneous  1      131    98     33
Prices         2      238    190    48
Quality        3      1231   896    335
Style_options  4      236    169    67
Total                 3366   2507   859

Table 6: Approach II (Polarity)

Class     Label  Total  Train  Test
Negative  0      953    749    204
Neutral   1      145    101    44
Positive  2      2268   1657   611
Total            3366   2507   859

3.6. Machine Learning Models

According to the “no free lunch” theorem [23], no single classifier performs best on
all types of datasets. Therefore, it is advisable to apply several classifiers to a
master feature vector to see which one achieves better results. Hence, in approach 1 we
chose five different classifiers: Naïve Bayes (NB) [24], Support Vector Machine (SVM)
[25], Random Forest (RF) [26], Logistic Regression (LR) [27], and an Ensemble. In
approach 2, we used the SVM and NB classifiers.
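As an illustration of the simplest of these classifiers, the sketch below implements a multinomial Naïve Bayes with add-one (Laplace) smoothing over raw token counts. It is not the implementation used in this study, which applied the classifiers to TFIDF and word2vec vectors and most likely relied on a library; the class name and the toy training data are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.token_counts[c].values())
            score = math.log(self.class_counts[c] / self.n_docs)  # log prior
            for tok in doc:
                if tok in self.vocab:  # ignore tokens never seen in training
                    likelihood = (self.token_counts[c][tok] + 1) / (total + len(self.vocab))
                    score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy training data (hypothetical reviews, not taken from the dataset).
train_docs = [["great", "food"], ["tasty", "food"], ["slow", "service"], ["rude", "service"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train_docs, train_labels)
```

Calling `model.predict(["tasty", "great"])` selects the class whose smoothed token likelihoods best explain the review; the other classifiers in the comparison would be trained and queried through the same fit and predict pattern.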

3.7. Classifier Evaluation

In this step, the constructed classifiers were used to predict the class of unlabeled
text in the test sets. Classifier performance is evaluated by counting true positives
(TP), false positives (FP), true negatives (TN), and false negatives (FN). These four
numbers constitute a confusion matrix, as in Fig 5. Different performance metrics, such
as precision, recall, F-measure, or accuracy, can be used to assess the constructed
classifiers. The details of these performance measures are given in [29]. In this study,
we have used the most common measure, accuracy, to evaluate the constructed classifiers.
The details of this performance measure are given below.

Fig. 5 Confusion Matrix

Accuracy

This evaluation metric is the proportion of all instances that are correctly classified
by the trained model. Refer to (1).

\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}
\]
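Equation (1) is written for the binary case. For the multi-class tasks in this study, the same quantity is the sum of the diagonal of the confusion matrix divided by the sum of all its cells. The minimal sketch below computes it; the matrices are hypothetical values chosen for illustration and the function name is an assumption.

```python
def accuracy(confusion):
    """Accuracy of a square confusion matrix (rows: true class, columns: predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


# Hypothetical binary confusion matrix laid out as [[TN, FP], [FN, TP]].
binary = [[50, 10],
          [5, 35]]

# Hypothetical three-class confusion matrix.
multi = [[2, 0, 0],
         [0, 3, 1],
         [1, 0, 3]]

binary_acc = accuracy(binary)  # (TP + TN) / all instances
multi_acc = accuracy(multi)    # diagonal sum / all instances
```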

4. Experimental Setup

As mentioned in Section 3.2, the reviews can be categorized on the basis of aspects and
their polarities. In this study, we performed three tasks. In Task A, we classified the
reviews according to entity type (i.e. food, restaurant, service, ambiance, drinks, and
location). In Task B, reviews are categorized according to attribute and labeled as
general, quality, prices, style-options, or miscellaneous. In Task C, we classified
reviews according to their polarity, i.e. positive, negative, or neutral.

For all these tasks, we used two master feature representations, namely n-gram (bigram)
with TFIDF [21] and Word2Vec [22]. Using these master feature representations, we
followed two approaches to train the models. In approach 1, we used the train-test split
to train the five classifiers and evaluated their performance on the test split. In
approach 2, we trained the models that performed best in approach 1 on the whole dataset
and evaluated their performance on the separate test data.

5. Results

This section reports the results of all three tasks. Table 7, Table 8, and Table 9 show
the accuracy using approach 1 (i.e. train-test split) for Tasks A, B, and C
respectively. As shown in the three tables, the highest accuracies for Task A (0.71),
Task B (0.69), and Task C (0.81) were obtained by SVM with word2vec.

Table 7: Approach I Results (Train-Test Split) - Task A

Classifier  Bigram (TFIDF)  Word2Vec
LR          0.59            0.70
NB          0.55            0.65
RF          0.58            0.57
SVM         0.63            0.71
Ensemble    0.61            0.67

Table 8: Approach I Results (Train-Test Split) - Task B

Classifier  Bigram (TFIDF)  Word2Vec
LR          0.60            0.67
NB          0.61            0.58
RF          0.54            0.56
SVM         0.58            0.69
Ensemble    0.57            0.66

Among the text-classification models, the SVM classifier achieved the highest accuracy
in most settings. If we evaluate the performance of the five classifiers with respect to
each master feature representation, Table 7 and Table 9 show that for Task A and Task C
the SVM classifier obtained the highest accuracy with both master feature
representations. For Task B, Table 8 shows that NB obtained the highest accuracy with
bigram TFIDF features (0.61) and SVM with word2vec features (0.69). Therefore, in
approach 2, we trained 6 models (3 tasks x 2 master feature representations) on the
whole dataset. The details of all combinations are shown in Table 10.

Table 9: Approach I Results (Train-Test Split) - Task C

Classifier  Bigram (TFIDF)  Word2Vec
LR          0.73            0.80
NB          0.75            0.74
RF          0.73            0.74
SVM         0.78            0.81
Ensemble    0.75            0.80
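The selection of one classifier per task and feature representation for approach 2 amounts to taking the highest accuracy in each column of Tables 7 to 9. A minimal sketch of that step, with the accuracies copied from those three tables and with illustrative key and function names, is:

```python
# Approach 1 accuracies from Tables 7-9: task -> feature -> classifier -> accuracy.
RESULTS = {
    "A": {"bigram_tfidf": {"LR": 0.59, "NB": 0.55, "RF": 0.58, "SVM": 0.63, "Ensemble": 0.61},
          "word2vec":     {"LR": 0.70, "NB": 0.65, "RF": 0.57, "SVM": 0.71, "Ensemble": 0.67}},
    "B": {"bigram_tfidf": {"LR": 0.60, "NB": 0.61, "RF": 0.54, "SVM": 0.58, "Ensemble": 0.57},
          "word2vec":     {"LR": 0.67, "NB": 0.58, "RF": 0.56, "SVM": 0.69, "Ensemble": 0.66}},
    "C": {"bigram_tfidf": {"LR": 0.73, "NB": 0.75, "RF": 0.73, "SVM": 0.78, "Ensemble": 0.75},
          "word2vec":     {"LR": 0.80, "NB": 0.74, "RF": 0.74, "SVM": 0.81, "Ensemble": 0.80}},
}


def best_classifiers(results):
    """Return {task: {feature: (classifier, accuracy)}} with the top scorer per cell."""
    return {
        task: {feat: max(scores.items(), key=lambda kv: kv[1])
               for feat, scores in feats.items()}
        for task, feats in results.items()
    }


selected = best_classifiers(RESULTS)
for task, feats in selected.items():
    for feat, (clf, acc) in feats.items():
        print(f"Task {task} | {feat}: {clf} ({acc:.2f})")
```

With these numbers, SVM is selected for every word2vec column, and NB is selected only for the bigram TFIDF column of Task B.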

Table 10: Model Selection for Approach II

Task  Bigram (TFIDF)  Word2Vec
A     SVM             SVM
B     NB              SVM
C     SVM             SVM

We evaluated all these models on the test data (i.e. 859 instances). Table 11 shows the
results of approach 2. Word2vec features achieved higher accuracy than bigram features
with TFIDF on all three tasks.

Table 11: Approach 2 Results for Task A, B & C

Task  Bigram (TFIDF)  Word2Vec
A     0.70            0.76
B     0.67            0.72
C     0.78            0.79


Furthermore, Fig 6, Fig 7, and Fig 8 show the confusion matrices of the best-performing
models. Fig 6 shows the confusion matrix of the SVM classifier using word2vec for Task
A. As shown there, 651 of the 859 instances were correctly classified. Of these 651
instances, 47, 11, 341, 140, and 112 were classified as ambiance, drinks, food,
restaurant, and service respectively. All 13 instances of the location class were
misclassified.

Fig. 6 Task A (Feature: Word2Vec, Classifier: SVM)

Fig 7 shows the confusion matrix of the SVM classifier using word2vec features for Task
B. As shown there, 621 of the 859 instances were correctly classified (i.e. General:
336 out of 376, Miscellaneous: 0 out of 33, Prices: 19 out of 48, Quality: 262 out of
335, and Style-options: 4 out of 67).
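The per-class counts reported for Task B can be turned into per-class recall and overall accuracy with a few lines of arithmetic. The sketch below only restates the reported counts; the variable and function names are illustrative.

```python
# (correctly classified, total test instances) per attribute class, as reported for Task B.
TASK_B = {
    "General": (336, 376),
    "Miscellaneous": (0, 33),
    "Prices": (19, 48),
    "Quality": (262, 335),
    "Style-options": (4, 67),
}


def per_class_recall(counts):
    """Recall per class: correctly classified instances divided by class size."""
    return {name: correct / total for name, (correct, total) in counts.items()}


def overall_accuracy(counts):
    """Overall accuracy: all correct predictions divided by all instances."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total


recall = per_class_recall(TASK_B)
acc = overall_accuracy(TASK_B)  # 621 / 859
```

The overall accuracy rounds to 0.72, while recall ranges from about 0.89 for the General class down to 0 for Miscellaneous, so the headline accuracy is driven mainly by the two largest classes.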

For Task C, the confusion matrix is shown in Fig 8. It shows that the SVM classifier
with word2vec features correctly classified 681 out of 859 instances: 124 as negative
and the remaining 557 as positive. As shown there, its performance was lowest for class
1 (i.e. neutral).

Fig. 7 Task B (Feature: Word2Vec, Classifier: SVM)

Fig. 8 Task C (Feature: Word2Vec, Classifier: SVM)

6. Conclusion

This study applied automated text classification techniques to classify restaurant
reviews according to their aspects and polarities. Moreover, this study compared two
feature engineering techniques and five ML algorithms on three tasks: a) classification
of restaurant reviews according to entity type, b) classification of restaurant reviews
according to attribute, and c) classification of restaurant reviews according to
polarity. The experimental results showed that word2vec features gave better results
for all tasks than bigram features represented through TFIDF. Moreover, the SVM
algorithm showed better results than NB, LR, RF, and Ensemble for all three tasks. The
lowest results were observed for NB, RF, and LR in Task A, Task B, and Task C
respectively. The outcomes of our study hold practical significance because they can
serve as a baseline for comparing future research on automatic text classification
methods. In the future, the accuracy of the proposed system can be increased through the
following two strategies. First, deep learning-based approaches will be explored and
evaluated by comparing them with current state-of-the-art results. Secondly, more
instances will be collected and used in the experiments so that the classification rules
can be learned more effectively.

REFERENCES

[1] Ekawati, D. and M.L. Khodra. Aspect-based sentiment analysis for Indonesian restaurant reviews. in 2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA). 2017. IEEE.
[2] Wang, J., Encyclopedia of Data Warehousing and Mining, (4 Volumes). 2009: IGI Global.
[3] Schouten, K. and F. Frasincar, Survey on aspect-level sentiment analysis. IEEE Transactions on Knowledge and Data Engineering, 2015. 28(3): p. 813-830.
[4] Thet, T.T., J.-C. Na, and C.S. Khoo, Aspect-based sentiment analysis of movie reviews on discussion boards. Journal of Information Science, 2010. 36(6): p. 823-848.
[5] Kiritchenko, S., et al. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 2014.
[6] Castellucci, G., et al. Unitor: Aspect based sentiment analysis with structured learning. in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 2014.
[7] Hu, M. and B. Liu. Mining opinion features in customer reviews. in AAAI. 2004.
[8] Liu, B., Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 2012. 5(1): p. 1-167.
[9] Jakob, N. and I. Gurevych. Extracting opinion targets in a single- and cross-domain setting with conditional random fields. in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. 2010. Association for Computational Linguistics.
[10] Zhuang, L., F. Jing, and X.-Y. Zhu. Movie review mining and summarization. in Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006.
[11] Mukherjee, S. and P. Bhattacharyya. Feature specific sentiment analysis for product reviews. in International Conference on Intelligent Text Processing and Computational Linguistics. 2012. Springer.
[12] Toh, Z. and W. Wang. Dlirec: Aspect term extraction and term polarity classification system. in Association for Computational Linguistics and Dublin City University. 2014. Citeseer.
[13] Socher, R., et al. Recursive deep models for semantic compositionality over a sentiment treebank. in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013.
[14] Carrascosa, R., An entry to Kaggle's 'Sentiment Analysis on Movie Reviews' competition. 2014.
[15] Mohammad, S.M., S. Kiritchenko, and X. Zhu, NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242, 2013.
[16] Lin, C. and Y. He. Joint sentiment/topic model for sentiment analysis. in Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009.
[17] Brody, S. and N. Elhadad. An unsupervised aspect-sentiment model for online reviews. in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. 2010. Association for Computational Linguistics.
[18] Jo, Y. and A.H. Oh. Aspect and sentiment unification model for online review analysis. in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. 2011.


[19] Shaikh, S. and S.M. Doudpotta, Aspects Based Opinion Mining for Teacher and Course Evaluation. Sukkur IBA Journal of Computing and Mathematical Sciences, 2019. 3(1): p. 34-43.
[20] Cavnar, W.B. and J.M. Trenkle. N-gram-based text categorization. in Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval. 1994. Citeseer.
[21] Ramos, J. Using tf-idf to determine word relevance in document queries. in Proceedings of the First Instructional Conference on Machine Learning. 2003. Piscataway, NJ.
[22] Mikolov, T., et al. Distributed representations of words and phrases and their compositionality. in Advances in Neural Information Processing Systems. 2013.
[23] Ho, Y.-C. and D.L. Pepyne, Simple explanation of the no-free-lunch theorem and its implications. Journal of Optimization Theory and Applications, 2002. 115(3): p. 549-570.
[24] Lewis, D.D. Naive (Bayes) at forty: The independence assumption in information retrieval. in European Conference on Machine Learning. 1998. Springer.
[25] Joachims, T. Text categorization with support vector machines: Learning with many relevant features. in European Conference on Machine Learning. 1998. Springer.
[26] Xu, B., et al., An Improved Random Forest Classifier for Text Categorization. JCP, 2012. 7(12): p. 2913-2920.
[27] Wenando, F.A., T.B. Adji, and I. Ardiyanto, Text classification to detect student level of understanding in prior knowledge activation process. Advanced Science Letters, 2017. 23(3): p. 2285-2287.
[28] Dunford, R., Q. Su, and E. Tamang, The Pareto principle. 2014.
[29] Seliya, N., T.M. Khoshgoftaar, and J. Van Hulse. A study on the relationships of classifier performance metrics. in 2009 21st IEEE International Conference on Tools with Artificial Intelligence. 2009. IEEE.