Engineering, Technology & Applied Science Research Vol. 7, No. 6, 2017, 2296-2302 2296  
  

www.etasr.com Mir et al.: Aspect Based Classification Model for Social Reviews

 
Aspect Based Classification Model for Social Reviews

 
Jibran Mir
Computer Science Dept
Shaheed Zulfikar Ali Bhutto Institute of Science and Technology
Islamabad, Pakistan
jibmir@gmail.com

Azhar Mahmood
Computer Science Dept
Shaheed Zulfikar Ali Bhutto Institute of Science and Technology
Islamabad, Pakistan
azharmahmood8@hotmail.com

Shaheen Khatoon
College of Computer Science and Information Technology
King Faisal University
Al Ahsa, Saudi Arabia
sha.xin@live.com
 

Abstract—Aspect based opinion mining investigates in depth the
emotions expressed about the individual aspects of an entity.
Identifying aspects and opinion words is its core task. In previous
studies, aspect based opinion mining has been applied to the service
or product domain. Product reviews are short and simple, whereas
social reviews are long and complex. This study therefore introduces
an efficient model for social reviews that classifies aspects and
opinion words in the social domain. The main contributions of this
paper are the auto tagging and data training phase, the feature set
definition and the dictionary usage. The proposed model is compared
with the CR model and the Naïve Bayes classifier on the same dataset.
It achieves an accuracy of 98.17% and a precision of 96.01%, while
recall and F1 are 96.00% and 96.01% respectively. The experimental
results show that the proposed model performs better than the CR
model and the Naïve Bayes classifier.

Keywords-POS; Chunking; Word Case; Feature Set; 
Dictionary; NER; IOB tagging 

I. INTRODUCTION 
Opinions are key elements that influence a person’s decision making.
It is natural for people to consult friends or family whenever they
make a decision, and so to gain access to diverse opinions.
Similarly, to gain better business insight, organizations conduct
surveys, opinion polls and focus group discussions to better
understand their customers’ opinions about a product or service. In
the same way, an individual customer, before buying a product, tries
to find out the opinion of family or friends about it [1]. Hence,
gathering public and customer opinions for marketing, public
relations and political campaigns has become a profitable business.
Widespread internet access now allows anyone to express an opinion
about something with a single click on social networks or other sites
[2]. As a result, one can find huge amounts of user experience with
products and services online. Instead of consulting friends or
family, people now prefer to read reviews online before making a
purchase decision about a product or service. Consequently, companies
can utilize these reviews to understand their customers’ response to
a product instead of conducting traditional surveys to collect
customer feedback. However, due to the huge amount of user generated
content, it is almost impossible for companies to go through each
review and respond accordingly. Therefore, there is a need to extract
useful information from this huge amount of data. Given the size and
growth rate of this kind of data, this is only possible with an
automated system. The opinions expressed on social networking sites
or product review sites have already helped reshape political and
business structures. For example, social networking sites like
Facebook changed the political thinking of Arab people in 2011 [1].

Consequently, sentiment analysis is a new trend in information
science. Sentiment analysis is the process of identifying and
classifying the opinions related to a topic. It is conducted at three
main levels: document level, sentence level and aspect level [3].
Document level analysis determines the overall sentiment of a
document about a product as a whole, whereas sentence level analysis
evaluates whether each sentence expresses a positive, negative or
neutral opinion. In contrast, aspect level sentiment analysis
investigates opinions on the basis of a specific aspect. For
instance, in “the price of the Canon is very reasonable”, “Canon” is
a product, “price” is an aspect and “very reasonable” is a sentiment
expression. Aspect based opinion mining thus investigates opinions
about specific features of a product or service. However, identifying
features in a sentence is a challenge in itself. Therefore, this
research work focuses on aspect based opinion mining. There are
different types of aspect based opinion mining, such as regular and
comparative, explicit and implicit, and multi-aspect sentiment
analysis. Each type comes with huge challenges due to natural
language processing issues. The analysis of human text is so complex
that no single aspect based opinion mining technique can accommodate
all the types mentioned above. That is why researchers in aspect
based opinion mining have worked in different directions: some work
on domain adaptability, some only on sentiment analysis, but most of
them on aspect identification. There are several challenges in aspect
based opinion mining:

The first challenge is the identification of implicit features.
Implicit features are those that are not used in a sentence
directly; instead, they are implied. For instance, in “this mobile is
too expensive” the author uses “expensive” to refer to the aspect
“price”. The second challenge is multi-aspect sentence detection,
where a sentence holds one explicit and one implicit aspect, two
explicit aspects or two implicit aspects. For instance, “this phone’s
battery life is great but it is too expensive” is a multi-aspect
sentence that contains one explicit and one implicit aspect. The
third challenge is the detection of comparative sentences, such as
“the Canon camera is better than the Nikon camera”. Another challenge
is domain and language adaptability. If an aspect based opinion
mining algorithm is designed for the Chinese language, it should be
equally applicable to other languages. Similarly, if it is designed
for the product domain, it should be easily applicable to other
domains such as hotel accommodation. Since aspect based opinion
mining has so far been applied in the business domain, this study
introduces an aspect based opinion mining model that classifies the
aspects and features of social reviews. The proposed model classifies
explicit aspects efficiently and can identify and classify aspects in
complex social reviews. The rest of the paper is organized as
follows: Section II reviews the literature on aspect based opinion
mining algorithms. Section III presents the proposed model design.
Section IV shows the experimental results and Section V offers the
conclusion.

II. LITERATURE REVIEW 
Authors in [4] proposed feature based opinion mining using ontologies
and introduced a vector based sentiment analysis method. The domain
of the dataset is very important: if it is known in advance, it helps
to find the domain aspects and sentiment words. However, there is no
clear way to determine the dataset domain before sentiment analysis.
The main contributions of that research are divided into four parts:
NLP, ontology based aspect detection, polarity assignment and opinion
mining. It successfully addresses semantic relations in the aspect
identification process and develops a mathematical solution for
sentiment analysis. However, it discusses neither implicit aspect
identification nor multi-aspect and comparative sentence detection. A
domain independent method has been developed for electronic products
in [5]. The focus of that study is domain independent identification
of explicit and implicit aspects from online customer reviews of
electronic products. It has a few limitations: no method is defined
for handling multi-aspect sentences, it is domain independent only
for cameras, mobile phones and DVD players, and it is not language
adaptable. It does not group together semantically related aspect
terms [6] and it utilizes a supervised approach with a balanced
dataset [7]. Aspect identification and classification is a basic step
in sentiment analysis, and aspect classification is important for
making a method domain adaptable. Therefore, authors in [8]
introduced an approach that classifies aspects with respect to
domain. The main purpose of that research is to develop a model that
establishes an association between product features and domains,
which helps to build a domain independent solution. There is no
discussion of implicit feature identification. In addition, no
mechanism is defined to handle multi-aspect and comparative
sentences, nor is the approach language adaptable. The strengths of
this model are aspect identification and sentiment analysis. One
weakness is that polarity assignment relies on a global lexicon,
which is unable to define the polarity of polysemous words [9].

Today, most researchers in aspect based opinion mining work on
explicit and implicit aspect detection. Authors in [10] introduced a
semi-supervised model for aspect identification. They proposed two
statistical models, the “Seeded Aspect and Sentiment Model” and the
“Maximum Entropy SAS Model”, in order to discover explicit and
implicit features. This approach is not automatic, as it requires
users to provide some prior knowledge, and a user may not be aware of
that domain knowledge [11]. General aspects can be found by this
method, but it is unable to find domain specific aspects [12].
According to the authors in [13], although facts, whose retrieval is
measured by precision and recall, play an important role in
information retrieval, opinions are also crucial for understanding
the sentiment about the searched item. Therefore, they built a search
engine that not only retrieves facts about the searched items but
also mines opinions about them. The limitations of this study are
that it discusses neither implicit feature identification nor
multi-aspect sentence detection and that it is neither domain nor
language adaptable. A further weakness is that sentiment analysis
cannot be evaluated properly with the TF-IDF algorithm [14]. As
mentioned above, there are many challenges in aspect based opinion
mining. Authors in [15] proposed a method for implicit aspect
identification in Chinese reviews only. There is no discussion of
multi-aspect sentence detection, and the method is not domain or
language adaptable. It has not been tested on bigger corpora and
should be made adaptable to different domains and languages [16].

Aspect based opinion mining is widely used in business intelligence,
where the interest is to discover customers’ opinions about a
product. Authors in [17] described a model that finds a product’s
weaknesses and its competitive features from online Chinese reviews.
It is domain specific and supports a single language. The authors
manually discriminated positive and negative opinion words. The
method proposed in that paper is not able to properly discover
customer satisfaction [18]. So far, aspect based opinion mining has
been used in the product domain; authors in [19], however, adapted
the aspect based opinion mining technique presented in [20] to the
tourism domain. Most work has been conducted on physical items, while
there is no opinion mining system for intangible services. Reviews of
physical items differ from reviews of service products such as
hotels: reviews of physical products are usually short and easy to
handle, whereas service domain reviews are verbose and difficult to
handle. This technique is domain specific and supports a single
language. It discusses neither implicit nor multi-aspect
identification, and no mechanism is defined to handle comparative
sentences. The model described in [19]
is unable to identify confusing and ambiguous terms in tourism
reviews [21]. Moreover, barely 35% of the explicit aspects are
detected by this model [22].

Aspect identification is an important step in aspect based opinion
mining because it is fundamental to fine-grained sentiment analysis.
Most research has addressed explicit aspect identification and
ignored the detection of implicit aspects. Therefore, authors in [23]
focused on the identification of implicit aspects in Chinese reviews
using hybrid association rule mining. There is no discussion of
comparative sentences, the model is not language adaptable and some
of the features are extracted manually. This model uses the most
frequently occurring explicit features as implicit feature
indicators, but it ignores features that occur less often, even if
they are more important [24]. Aspect based sentiment analysis is an
emerging field and different researchers have adopted different ways
to identify implicit and explicit aspects. Earlier studies require
extensive prior knowledge of the dataset domain, whereas studies like
[24] state that the problem can be solved without it. The main
contribution of [24] is an algorithm that discriminates sentences
that contain an implicit feature from those that do not, using a
threshold defined on the value of an aspect. There is no discussion
of multi-aspect sentence detection or of comparative sentences, and
the method is neither language nor domain adaptable. The removal of
misspellings and the reduction of similar implicit features were done
by manual clustering. A detailed comparison of model based and
statistical methods is given in [25]. It must be noted that this
research is biased towards model based methods; the authors
implemented and tested a CRF model based technique. The results are
quite satisfying compared with other model based and statistical
methods. Their main contributions are the classification of feature
words, opinion words and intensifiers and the feature set definition.
The disadvantages of this research are that the model does not give
good results when applied to other domains and that there is no
detailed discussion of implicit aspect identification and comparative
sentences. The biggest disadvantage of this method is that dataset
training is very complicated and requires great care.

III. PROPOSED MODEL 
Most previous studies developed aspect based opinion mining models
for the product and service domains [1-8]. In this study, an aspect
based opinion mining model for social reviews is proposed. It
consists of five main phases in which information flows top down,
similar to the waterfall approach, as shown in Figure 1. Social
reviews are more complex and it is difficult to extract aspects from
them. For this reason the model structure is more complex and is
divided into five phases, namely Pre-processing, Auto Tagging,
Training and Classifier, Testing and Dictionary Phase. The selection
of the classifier is very important in any machine learning problem.
This research uses Conditional Random Fields (CRF) [26], which
require a well-prepared training dataset and a feature set as input.
The model’s performance is highly dependent on these two inputs. The
core reason for using this classifier is to solve the NER (Named
Entity Recognition) problem. CRF has proven effective for NER in
plain text [26]. Since the dataset consists of social movie reviews
that contain the names of movies, actors, directors and writers, CRF
is a suitable choice for this NER problem.

A. Pre-processing 
The dataset of social movie reviews has been crawled from social
websites and each review has been saved in a separate text file.
Approximately 2000 reviews have been crawled for 2000 different
movies. These movie reviews are recent and written by different
writers. They are complex and detailed, in contrast with product
reviews, which are usually short and simple, for instance “this phone
is light weight and cheaper in price”. This is a short and simple
sentence. The review talks about two aspects of a particular phone,
“weight” and “price”, and “light” and “cheaper” are the opinion
words. In contrast, an example of a social review is: “The
Mayan Empire grew from about the year 400 to 900. At their
height they became a people very advanced in science. Mayan
notation for numbers made arithmetic easier for Mayan
children than our numbers make it for our children. The
Dresden Codex shows that they may not have understood
exactly what eclipses were, but they knew when they were
coming”. This is complex and detailed compared to a common product
review. Therefore, finding feature words and opinion words in social
reviews requires a great deal of effort.

B. Auto Tagging and Dataset Training 
Supervised machine learning methods are effective but they require a
well-defined example dataset for training, and preparing such a
dataset is usually a manual and tedious task [26]. Therefore, an
automated process has been developed to prepare the dataset. It
involves five subtasks: tokenizing, POS tagging, chunking, word case
tagging and IOB (Inside Outside Beginning) tagging. The NLP tasks of
tokenizing, POS tagging and chunking are all performed with the
OpenNLP software [27]. In the first step, each review is tokenized
into a list of tokens and saved into a text file for further
processing. Next, a Part of Speech (POS) tag is assigned to each
token, which is required by the next step to identify named entities.
In the third step, the POS tagged tokens are used to detect entities
through chunking. In the fourth step, a word case tag is assigned to
each token: if all the letters of the token are capitals, the word is
tagged UC (Upper Case); if only the token starts with a capital
letter, it is tagged TC (Title Case); all other tokens are tagged LC
(Lower Case). The reason for using the word case tag is to identify
movie and person names, since it is observed that people use title
case for person names and upper case for movie names. The last column
of Figure 2 presents the IOB tagging. The IOB label is assigned based
on the POS and word case information of the given token. For
instance, if the POS tag of the token “Cornell” is “NNP” and its word
case is “TC”, then its IOB tag should be “B-PERS”. Twenty one (21)
such patterns are derived, as shown in Table I. As a result of
tagging, each token is annotated with token name,
POS tagger, chunk, word case and IOB tagging. Moreover,
Figure 2 is an excerpt of the trained file. The annotation of the
dataset with these five columns is the major contribution of this
work. The same process is repeated for the test file.
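The word case subtask can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `word_case` is hypothetical, and treating a single capital letter such as "V" as upper case is an assumption made so that all-capital movie titles receive a consistent tag.

```python
def word_case(token: str) -> str:
    """Return the word case tag for a token: UC, TC or LC."""
    letters = [ch for ch in token if ch.isalpha()]
    if letters and all(ch.isupper() for ch in letters):
        # Every letter is a capital, e.g. "VENDETTA".
        # Assumption: a single capital such as "V" also counts as UC.
        return "UC"
    if token[:1].isupper():
        # Only the first character is a capital, e.g. "Cornell".
        return "TC"
    # Everything else, including punctuation and numbers.
    return "LC"


tokens = ["Cornell", "VENDETTA", "movie", "V", ","]
tags = [word_case(t) for t in tokens]
print(list(zip(tokens, tags)))
```

The upper case test runs first because an all-capital token also starts with a capital letter and would otherwise be tagged TC.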

 
Fig. 1.  The Proposed model of aspect identification and classification for social reviews 

 
Fig. 2.  Example of IOB tagging  

 
Fig. 3.  Example of output file 

C. Training Classifier 
The CRF++ command line software [28] has been used to train and test
the proposed model. For training, the classifier takes two inputs,
the template file and the trained file, and outputs the model file,
as shown in Figure 1. The trained file is the output of the previous
phase, while the template file is developed according to the trained
file. The template file contains a set of rules for training the
classifier: the classifier reads the trained file by following the
feature set, i.e. the set of rules, defined in the template file.
Table II shows an excerpt of the template file. Forty feature sets
have been written in the template file. The model file is a binary
file and is used for testing. To perform testing, the CRF++ software
takes two inputs, the model file and the test file. The output of
testing is a six column text file, as shown in Figure 3. The last
column in Figure 3 is labeled by the CRF classifier and the previous
column is labeled by the IOB tagging subtask in Phase II. These two
columns can therefore be compared in order to calculate accuracy.
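The column comparison can be sketched in Python as follows. The column order (token, POS, chunk, word case, Phase II IOB tag, CRF tag) is inferred from the description above, and the sample rows are hypothetical, not taken from the dataset.

```python
def token_accuracy(output_text: str) -> float:
    """Share of tokens whose last column equals the column before it."""
    correct = total = 0
    for line in output_text.splitlines():
        cols = line.split()
        if len(cols) < 2:
            continue  # skip blank lines that separate sequences
        total += 1
        if cols[-1] == cols[-2]:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical six column rows in the assumed order.
sample = """\
The DT B-NP TC O O
movie NN I-NP LC B-FEATURE B-FEATURE
seems VBZ B-VP LC B-OPINION O
amazing JJ B-ADJP LC B-OPINION B-OPINION
"""
print(f"{token_accuracy(sample):.2%}")
```

In this sample three of the four tokens carry matching labels, so the token level accuracy is 0.75.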

D. Testing  
The output text file produced by the Training Classifier phase is
used for evaluation. This file is passed to a Perl script, which
calculates precision, recall and
f1 measures for movie name, person name, opinion words and
feature words. Performance evaluation is crucial for establishing the
accuracy of the proposed model.
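The Perl script itself is not shown in the paper. As a hedged illustration only, the following Python sketch computes precision, recall and F1 per entity type. It assumes token level scoring with the B-/I- prefixes stripped, which may differ from what the script does; all function names and sample labels are hypothetical.

```python
from collections import Counter


def entity_type(tag: str) -> str:
    """Strip the B-/I- prefix: 'B-PERS' -> 'PERS', 'O' -> 'O'."""
    return tag.split("-", 1)[1] if "-" in tag else tag


def prf_per_type(gold, pred):
    """Token level (precision, recall, F1) for every type except 'O'."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        g, p = entity_type(g), entity_type(p)
        if g == p:
            if g != "O":
                tp[g] += 1
        else:
            if p != "O":
                fp[p] += 1  # predicted a type that is not in the gold label
            if g != "O":
                fn[g] += 1  # missed the gold type
    scores = {}
    for t in sorted(set(tp) | set(fp) | set(fn)):
        precision = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        recall = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[t] = (precision, recall, f1)
    return scores


# Hypothetical gold (Phase II) and predicted (CRF) labels.
gold = ["B-PERS", "I-PERS", "O", "B-MOVIE", "B-OPINION", "O"]
pred = ["B-PERS", "O", "O", "B-MOVIE", "B-OPINION", "B-OPINION"]
scores = prf_per_type(gold, pred)
for t, (p, r, f) in scores.items():
    print(f"{t}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

F1 here is the harmonic mean 2PR/(P+R) of precision and recall.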

TABLE I.  A LIST OF PATTERNS DISCOVERED FROM DATASET 

Sr. No. | Pattern | Entity Name
1 | IF WC[0] = UC | Movie
2 | IF POS[0] = JJ OR POS[0] = NNP OR POS[0] = NN AND WC[0] = TC | Person
3 | IF POS[0] = POS AND POS[-1] = NNP OR POS[-2] = NNP AND WC[0] = TC | Person
4 | IF POS[0] = JJ OR POS[0] = NNP OR POS[0] = NN AND WC[0] = TC AND WD[-1] = by AND CK[-2] = I-VB OR CK[-1] = B-VB | Person
5 | IF POS[0] = JJ OR POS[0] = NNP OR POS[0] = NN AND WC[0] = TC AND WD[-1] = by AND POS[-2] = RB AND CK[-3] = B-VP AND CK[-3] = I-VP | Person
6 | IF POS[0] = JJ OR POS[0] = NNP OR POS[0] = NN AND WC[0] = TC AND WD[-1] = by AND POS[-2] = NN | Person
7 | IF POS[0] = JJ OR POS[0] = NNP OR POS[0] = NN AND WC[0] = TC AND POS[-1] = JJ OR POS[-1] = NN AND POS[-2] = DT | Person
8 | IF CK[0] = B-VP AND NOT CK[-3] = B-NP AND WC[0] = TC | Person
9 | IF POS[0] = NN AND POS[-1] = NNP AND WC[0] = TC | Person
10 | IF POS[0] = CC AND POS[-1] = NNP AND POS[-2] = NNP AND WC[-1] = TC AND WC[-2] = TC | Person
11 | IF POS[0] = CC AND POS[+1] = NNP AND POS[+2] = NNP AND WC[+1] = TC AND WC[+2] = TC | Person
12 | IF WD[0] = , AND POS[+1] = NNP AND POS[+2] = NNP AND WC[+1] = TC AND WC[+2] = TC | Person
13 | IF WD[0] = , AND POS[-1] = NNP AND POS[-2] = NNP AND WC[-1] = TC AND WC[-2] = TC | Person
14 | IF WD[0] = , AND POS[+1] = NNP AND WC[+1] = TC | Person
15 | IF WD[0] = , AND POS[-1] = NNP AND WC[-1] = TC | Person
16 | IF CK[0] = B-NP AND WC[0] = LC OR WD[-1] = . AND POS[0] = DT AND POS[+1] = NN AND WC[+1] = LC | Feature Word
17 | IF POS[+1] = JJ AND WC[+1] = LC | Feature Word
18 | IF POS[0] = NN | Feature Word
19 | IF POS[0] = DT AND CK[0] = B-NP AND WC[0] = LC AND POS[+1] = RBS AND WC[+1] = LC | Opinion Word
20 | IF CK[0] = B-AD AND WC[0] = LC | Opinion Word
21 | IF CK[0] = B-VP AND WC[0] = LC | Opinion Word
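To illustrate how patterns of this kind can be applied to the tagged columns, the following Python sketch implements only patterns 1, 9, 20 and 18 of Table I. The order in which the rules are tried, the `Row` type and the sample rows are assumptions made for illustration; the paper does not specify rule precedence.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    wd: str   # token text (WD)
    pos: str  # POS tag (POS)
    ck: str   # chunk tag (CK)
    wc: str   # word case tag (WC): UC, TC or LC


def at(rows, i, offset):
    """Row at relative position `offset` from i, or None if out of range."""
    j = i + offset
    return rows[j] if 0 <= j < len(rows) else None


def entity_for(rows, i) -> Optional[str]:
    """Entity name for token i using a small subset of the Table I patterns."""
    cur, prev = rows[i], at(rows, i, -1)
    if cur.wc == "UC":                                  # pattern 1
        return "Movie"
    if (cur.pos == "NN" and cur.wc == "TC"
            and prev is not None and prev.pos == "NNP"):  # pattern 9
        return "Person"
    if cur.ck == "B-AD" and cur.wc == "LC":             # pattern 20
        return "Opinion Word"
    if cur.pos == "NN":                                 # pattern 18
        return "Feature Word"
    return None  # no implemented pattern matched


# Hypothetical tagger output, not taken from the dataset.
rows = [
    Row("John", "NNP", "B-NP", "TC"),
    Row("Smith", "NN", "I-NP", "TC"),
    Row("in", "IN", "B-PP", "LC"),
    Row("VENDETTA", "NNP", "B-NP", "UC"),
    Row("very", "RB", "B-AD", "LC"),
    Row("story", "NN", "B-NP", "LC"),
]
labels = [entity_for(rows, i) for i in range(len(rows))]
print(labels)
```

Tokens such as "John" stay unlabeled here only because the patterns that would cover them (for example pattern 2) are left out of the sketch.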

TABLE II.  A LIST OF FEATURE SET DEFINITION 

Sr. No. | Feature Set | Word Token
1 | U00:%x[-2,0] | The
2 | U01:%x[-1,0] | movie
3 | U02:%x[0,0] | seems
4 | U03:%x[1,0] | amazing
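In a CRF++ template, the macro %x[row,col] selects the value in column col of the token at relative position row. The Python sketch below mimics that expansion for the four rows of Table II, taking "seems" as the current token. It illustrates the addressing scheme only and is not the CRF++ code; the `_PAD` placeholder for positions outside the sequence is an assumption of this sketch.

```python
import re

# Matches macros such as %x[-2,0]: relative row offset, column index.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template: str, rows, i: int, pad: str = "_PAD") -> str:
    """Expand every %x[row,col] macro in `template` for the token at index i."""
    def lookup(match):
        j = i + int(match.group(1))  # row relative to the current token
        col = int(match.group(2))    # column index within that row
        return rows[j][col] if 0 <= j < len(rows) else pad
    return MACRO.sub(lookup, template)


# One column per row here (the token itself); real files have more columns.
rows = [["The"], ["movie"], ["seems"], ["amazing"]]
templates = ["U00:%x[-2,0]", "U01:%x[-1,0]", "U02:%x[0,0]", "U03:%x[1,0]"]
features = [expand(t, rows, 2) for t in templates]
print(features)
```

With the current token at index 2, the four templates yield U00:The, U01:movie, U02:seems and U03:amazing, which matches the Word Token column of Table II.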

 
E. Dictionary Phase  
At this level, the model classifies all the explicit aspects, but it
is not able to discriminate a person as actor, actress, director or
writer. The reason is that the review text provides little
information about a person’s gender and job title. For that reason, a
dictionary has been used to identify gender and job title.
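A minimal Python sketch of such a dictionary lookup is given below. The structure of the dictionary is not described in the paper, so the mapping from name to gender and job title, the entries and the function name are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical entries: lower-cased name -> (gender, job title).
PERSON_DICTIONARY: Dict[str, Tuple[str, str]] = {
    "jane doe": ("female", "actor"),
    "john roe": ("male", "director"),
    "alex poe": ("female", "writer"),
}


def refine_person(name: str) -> str:
    """Map a name tagged as a person to actor, actress, director or writer."""
    entry = PERSON_DICTIONARY.get(name.strip().lower())
    if entry is None:
        return "person"  # not in the dictionary: keep the generic label
    gender, job = entry
    if job == "actor" and gender == "female":
        return "actress"
    return job


for name in ["Jane Doe", "John Roe", "Somebody Else"]:
    print(name, "->", refine_person(name))
```

Gender matters only for the actor and actress distinction in this sketch; directors and writers are returned by job title alone.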

IV. EXPERIMENTAL RESULTS 
The dataset consists of 2000 movie reviews crawled from the official
website of the Internet Movie Database (IMDb),
imdb.com/reviews/index. The proposed model, the CR model and Naïve
Bayes have been implemented on this movie reviews dataset. The
dataset has been annotated with the IOB tagging scheme. Nine
different tags are used: B-MOVIE, I-MOVIE, B-PERS, I-PERS, B-OPINION,
I-OPINION, B-FEATURE, I-FEATURE and O. They stand for movie name,
person name, opinion word and feature word. Here “B-” marks the
beginning of an entity name, “I-” marks the continuation of an entity
name and “O” shows that the token does not belong to any entity name.
For example, the movie name “V FOR VENDETTA” is annotated as “V
B-MOVIE FOR I-MOVIE VENDETTA I-MOVIE”.

For experimental purposes, 119 movie reviews have been taken for
training and 51 for testing, a ratio of 70% to 30%. At this stage,
170 reviews have been used in total and the overall accuracy of the
proposed model is 97.48%. From this point on, there is no need to
train the classifier using the IOB tagging subtask of Phase II or to
define any new feature set, because the CRF classifier accomplishes
this job on its own. In other words, the last column in Figure 2 is
labeled by the CRF classifier and not by Phase II. This whole process
is called self-tagging. A manual quality check has been done after
every 100 self-tagged reviews. If the CRF classifier self-tagged the
100 reviews correctly, without errors, these 100 reviews are added to
the already trained dataset. In this way, 700 reviews have been used
for training and the remaining 1300 reviews were kept for testing.
The overall accuracy of the proposed model on the 2000 reviews was
98.17%.

Tables III - V show the classified aspects for the proposed model,
the CR model and Naive Bayes. The proposed model and the CR model
both use the CRF classifier for aspect classification. The results
show that the proposed model outperforms the CR model and the Naive
Bayes classifier. The CR model does not perform well on this social
review dataset because it was not designed for the social domain. It
does not provide a well-defined method for dataset training, no
method is defined for the named entity recognition problem and
finally there is no
feature set definition in it. That is why, the feature word and
opinion word precision, recall and F1 of the CR model are fairly
good, while its movie name and person name precision and recall are
not satisfactory. Similarly, the results show that the Naive Bayes
classifier is not effective for the NER problem. The proposed model
incorrectly identified only 6,491 aspects, whereas the CR model
incorrectly identified 37,388 aspects and the Naive Bayes classifier
incorrectly classified 71,121 aspects, as shown in Figure 4. Table VI
depicts the overall precision, recall, F1 and accuracy for the
proposed model, the CR model and the Naive Bayes classifier. The
proposed model’s overall accuracy is 98.17%, which is considerably
better than that of the CR model (91.25%) and the Naïve Bayes
classifier (68.9%). Figure 5 shows the graphical representation of
the overall accuracy.
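The IOB scheme used for annotation in this section can be made concrete with a short Python sketch that groups tagged tokens back into entities, using the movie title example. The function name is hypothetical, and treating an “I-” tag that does not continue an open entity as the start of a new one is a leniency assumption of this sketch.

```python
def decode_iob(tokens, tags):
    """Group tokens into (entity_type, text) spans using their IOB tags."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        # A span starts at "B-", or at an "I-" that does not continue
        # the entity that is currently open (lenient handling).
        starts_new = prefix == "B" or (prefix == "I" and etype != current_type)
        if prefix == "O" or starts_new:
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            current_type = etype
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["V", "FOR", "VENDETTA", "is", "amazing"]
tags = ["B-MOVIE", "I-MOVIE", "I-MOVIE", "O", "B-OPINION"]
entities = decode_iob(tokens, tags)
print(entities)
```

For this input the three title tokens are merged into one MOVIE entity and "amazing" forms a separate OPINION entity, while the "O" token is dropped.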

TABLE III.  2000 REVIEWS RESULTS FOR PROPOSED MODEL 

Aspects | Precision | Recall | F1
Feature Word | 96.12% | 95.74% | 95.93%
Movie Name | 96.15% | 96.19% | 96.17%
Opinion Word | 97.34% | 97.53% | 97.43%
Person Name | 91.08% | 91.36% | 91.22%

TABLE IV.  2000 REVIEWS RESULTS FOR CR MODEL 

Aspects | Precision | Recall | F1
Feature Word | 82.72% | 89.20% | 85.84%
Movie Name | 54.11% | 19.42% | 28.58%
Opinion Word | 76.46% | 81.93% | 79.10%
Person Name | 73.83% | 81.94% | 77.67%

TABLE V.  2000 REVIEWS FOR NAIVE BAYES CLASSIFIER 

Aspects | Precision | Recall | F1
Feature Word | 43.5% | 79% | 55.6%
Movie Name | 74.5% | 45.8% | 56.6%
Opinion Word | 56.5% | 65.2% | 59.8%
Person Name | 57.37% | 65.47% | 57.8%

TABLE VI.  COMPARISON OF RESULTS FOR THE PROPOSED MODEL, CR MODEL AND NAIVE BAYES CLASSIFIER

Model | Precision | Recall | F1 | Accuracy
PM | 96.01% | 96.00% | 96.01% | 98.17%
CR | 77.95% | 81.35% | 79.62% | 91.25%
NB | 57.37% | 65.47% | 57.8% | 68.9%

 
Fig. 4.  Number of incorrectly identified aspects for proposed model, CR model and Naive Bayes

 
Fig. 5.  A comparison of the proposed model, CRF model and Naive Bayes classifier

V. CONCLUSION 
Nowadays more people are engaged in generating online data. With the
availability of plenty of data, the need has emerged for a mechanism
that extracts useful information automatically. This has opened the
doors for aspect based opinion mining. This research has implemented
an aspect based opinion mining method for identifying aspects in a
dataset of social movie reviews. The main contributions of this
research are data training (Phase II), feature set definition (Phase
III) and the dictionary (Phase V). The overall accuracy of the
proposed method is 98.17% and its precision, recall and F1 are
96.01%, 96.00% and 96.01% respectively. The experimental results show
that the proposed model performs much better than the CR model and
the Naive Bayes classifier. Future work involves the implementation
of a model that identifies implicit aspects and performs aspect-wise
sentiment analysis. Moreover, we will avoid the dictionary usage and
find patterns for deep aspect classification. Other issues are more
challenging and tedious, for instance comparative sentences, the
specific writing style of a person and the number of times an entity
reappears in a dataset. These challenges will be the focus of future
research.

REFERENCES 
[1] B. Liu, Sentiment analysis and opinion mining, Synthesis Lectures on
Human Language Technologies, Vol. 5, Morgan & Claypool, 2012

[2] J. Mir, M. Usman, “An effective model for aspect based opinion mining 
for social reviews,” Tenth International Conference on Digital 
Information Management,  pp. 49-56, 2015 

[3] T. Chinsha, S. Joseph, “A syntactic approach for aspect based
opinion mining,” IEEE International Conference on Semantic 
Computing, pp. 24-31, 2015 

[4] I. Penalver-Martinez, F. Garcia-Sanchez, R. Valencia-Garcia, M.
A. Rodríguez-García, V. Moreno, A. Fraga, J. L. Sanchez-Cervantes, 
“Feature-based opinion mining through ontologies”, Expert Systems 
with Applications, Vol. 41, No. 13, pp. 5995-6008, 2014 

[5] A. Bagheri, M. Saraee, F. De Jong, “Care more about customers: 
unsupervised domain-independent aspect detection for sentiment 
analysis of customer reviews”, Knowledge-Based Systems, Vol. 52, pp. 
201-213, 2013 

[6] A. Bagheri, M. Saraee, F. De Jong, “ADM-LDA: An aspect detection 
model based on topic modelling using the structure of review 
sentences”, Journal of Information Science, Vol. 40, No. 5, pp. 621-636, 
2014 

[7] F. Tian, F. Wu, K.-M. Chao, Q. Zheng, N. Shah, T. Lan, J. Yue, “A 
topic sentence-based instance transfer method for imbalanced sentiment
classification of Chinese product reviews”, Electronic Commerce 
Research and Applications, Vol. 16, pp. 66-76, 2015 

[8] C. Quan, F. Ren, “Unsupervised product feature extraction for feature-
oriented opinion determination”, Information Sciences, Vol. 272, pp. 16-
28, 2014 

[9] M. Zimmermann, E. Ntoutsi, M. Spiliopoulou, “Extracting opinionated 
(sub) features from a stream of product reviews using accumulated 
novelty and internal re-organization”, Information Sciences, Vol. 329, 
pp. 876-899, 2016 

[10] A. Mukherjee, B. Liu, “Aspect extraction through semi-supervised
modeling”, 50th Annual Meeting of the Association for Computational
Linguistics: Long Papers, Vol. 1, pp. 339-348, 2012

[11] Z. Chen, B. Liu, “Mining topics in documents: standing on the shoulders 
of big data”, 20th ACM SIGKDD International Conference on 
Knowledge Discovery and Data Mining, pp. 1116-1125, 2014 

[12] L. Zhang, B. Liu, “Aspect and entity extraction for opinion mining”, in
Data Mining and Knowledge Discovery for Big Data, pp. 1-40, Springer, 
2014 

[13] M. Eirinaki, S. Pisal, J. Singh, “Feature-based opinion mining and 
ranking”, Journal of Computer and System Sciences, Vol. 78, No. 4, pp. 
1175-1184, 2012 

[14] L. Lizhen, S. Wei, W. Hanshi, L. Chuchu, L. Jingli, “A novel feature-
based method for sentiment analysis of Chinese product reviews”, 
Communications, China, Vol. 11, No. 3, pp. 154-164, 2014 

[15] H. Xu, F. Zhang, W. Wang, “Implicit feature identification in Chinese 
reviews using explicit topic mining model”, Knowledge-Based Systems, 
Vol. 76, pp. 166-175, 2015 

[16] K. Ravi, V. Ravi, “A survey on opinion mining and sentiment
analysis: tasks, approaches and applications”, Knowledge-Based 
Systems, Vol. 89, pp. 14-46, 2015 

[17] W. Zhang, H. Xu, W. Wan, “Weakness Finder: Find product weakness 
from Chinese reviews by using aspects based sentiment analysis”, 
Expert Systems with Applications, Vol. 39, No. 11, pp. 10283-10291, 
2012 

[18] D. Kang, Y. Park, “Review-based measurement of customer satisfaction 
in mobile service: Sentiment analysis and VIKOR approach”, Expert 
Systems with Applications, Vol. 41, No. 4, Part 1, pp. 1041-1050, 2014 

[19] E. Marrese-Taylor, J. D. Velasquez, F. Bravo-Marquez, “A novel 
deterministic approach for aspect-based opinion mining in tourism 
products reviews”, Expert Systems with Applications, Vol. 41, No. 17, 
pp. 7764-7775, 2014 

[20] B. Liu, Web data mining: exploring hyperlinks, contents, and usage data, 
Springer Science & Business Media, 2007 

[21] M. Afzaal, M. Usman, “A novel framework for aspect-based opinion 
classification for tourist places”, Tenth International Conference on 
Digital Information Management, pp. 1-9, 2015 

[22] S. Y. Ganeshbhai, B. K. Shah, “Feature based opinion mining: A 
survey”, IEEE International Advance Computing Conference, pp. 919-
923, 2015 

[23] W. Wang, H. Xu, W. Wan, “Implicit feature identification via hybrid 
association rule mining”, Expert Systems with Applications, Vol. 40, 
No. 9, pp. 3518-3531, 2013 

[24] K. Schouten, F. Frasincar, “Finding implicit features in consumer
reviews for sentiment analysis”, in Web Engineering, pp. 130-144,
Springer, 2014

[25] L. Chen, L. Qi, F. Wang, “Comparison of feature-level learning methods 
for mining online consumer reviews”, Expert Systems with 
Applications, Vol. 39, No. 10, pp. 9588-9601, 2012 

[26] C. Sutton, A. McCallum, An introduction to conditional random fields,
Now Publishers, 2012 

[27] J. Baldridge, “The opennlp project”, url: https://opennlp.apache.org, 
(accessed 2 February 2012), 2005 

[28] T. Kudo, “CRF++: Yet another CRF toolkit”, Software available at
http://crfpp.sourceforge.net, 2005