Engineering, Technology & Applied Science Research Vol. 7, No. 5, 2017, 2014-2016 2014  
  

www.etasr.com Madhukar and Verma: Hybrid Semantic Analysis of Tweets: A Case Study of… 
 

Hybrid Semantic Analysis of Tweets:  
A Case Study of Tweets on Girl-Child in India 

 
Mani Madhukar 
IBM India Pvt Ltd 
Bangalore, India 

manimmadhukar@gmail.com  

Seema Verma 
Banasthali University 

Rajasthan, India 
seemaverma3@yahoo.com 

 
Abstract—Social networks have become one of the major and 
important parts of daily life. Besides letting users share their
views, social networking sites can be used to gauge the behavior
and attitude of individuals towards posts. Public mood on a
particular social issue can be assessed by several methods. This
paper investigates public mood towards particular news items as
expressed in tweets. The key objective of this research is to
increase the accuracy and effectiveness of classification using
Natural Language Processing (NLP) techniques, with a focus on
semantics and Word Sense Disambiguation. The classification
process combines the effects of several independent classifiers
on one classification problem. The data available as tweets on
Twitter can readily give insight into the public attitude towards
a particular topic. The proposed work implements a hybrid method
that includes hybrid K clustering and boosting. A comparison of
this scheme with a K-means/SVM approach is provided. Results are
shown and discussed.

Keywords-natural language processing (NLP); sentiment 
analysis; social networking analysis; social networking sites (SNS) 

I. INTRODUCTION  
With the continuous increase in the use of social networking
sites (SNS), it has become possible to judge the sentiments and
reactions of the public towards any news item or tweet posted, a
fact with interesting possibilities [1]. Every conversation that
occurs on an SNS draws several different reactions that reflect
the effectiveness of the post and help analyze the public
sentiment towards it [2]. Sentiment analysis is an
interdisciplinary field that crosses natural language processing,
artificial intelligence, and text mining. Since most opinions are
available in text format, which is easier to process than other
formats, sentiment analysis has emerged as a subfield of text
mining [3]. Sentiment analysis first appeared in the literature in
the 1990s and became a major research topic in the 2000s.
Classifying the polarity of a given text as positive or negative
is the basic task of sentiment analysis. Due to its many aspects,
it is referred to by different names such as opinion mining,
sentiment classification, and sentiment extraction. It is widely
believed that sentiment analysis is needed and useful, while it is
also widely accepted that extracting sentiment from text is a hard
semantic problem even for human beings. Additionally, sentiment
analysis is domain specific: the polarity of some terms depends on
the context in which they are used. For example, “small” as a
value of the feature “size” is positive for electronic products
but negative for agricultural products such as fruit. Sentiment
analysis is used in domains such as shopping, entertainment,
politics, education, marketing, and research and development.
Public sentiment towards SNS posts can be expressed as positive or
negative, or on an n-point scale such as very good, good,
satisfactory, bad and very bad [3]. Text mining is a well-known
way to analyze and understand the sentiment that people attach to
posted content; its methods are machine learning,
statistical/quantitative techniques, and natural language
processing [4]. Sentiment analysis is of two kinds, supervised or
unsupervised. In [5], a method that applies Machine Learning
Theory (MLT), based on the Maximal Discrepancy concept, to the
problem of SVM model selection is detailed. Sentiment analysis
approaches are reported in [6-8] and an overall review can be
found in [9]. Social media insights, especially from Twitter, are
also essential [10-13]. This paper, following [14], focuses on
sentiment classification for social issues, that is, on
determining the public’s reaction to a particular tweet in order
to understand the public reaction towards the news and its
effects.

II. IMPLEMENTING A HYBRID APPROACH
Social Issues Sentiment Analysis automatically analyzes tweets on
social issues. It identifies positive, negative or neutral
opinion. The conceptual framework comprises four stages: data
collection and cleaning; preprocessing; sentiment analysis; and
finally experiments and results. The architecture of the proposed
framework for sentiment analysis is presented in Figure 1.

A. Text Collection and Cleaning Stage 
For the analysis of tweets on social issues, data is collected
first. The input data is raw text from tweets on social causes in
India, in particular on “JNU agitation” and “Intolerance in
India”. The motivation for these topics was derived from a web
interview with the Twitter India Director in 2016, as the topics
had the power to polarize the entire country and to shape the
opinion of ordinary citizens, leading to a sharp divide in Indian
society. To create the corpus, tweets are fetched from the Twitter
database based on hashtags (#), using the Twitter API for
connecting and authenticating. The collected text is noisy, so the
data must be cleaned and parsed to form a corpus for further
processing.
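
The paper does not list its individual cleaning steps. As a minimal
sketch, assuming that the noise consists of retweet markers, URLs,
user mentions and the hashtag symbol, the cleaning of already fetched
tweets into a corpus could look as follows in Python. The names
clean_tweet and build_corpus and the regular expressions are
illustrative assumptions, not the authors' implementation.

```python
import re

# Assumed noise patterns; the paper does not specify them.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+:?")
RETWEET_RE = re.compile(r"^RT\s+", re.IGNORECASE)


def clean_tweet(text):
    """Strip common Twitter noise from one tweet (illustrative steps only)."""
    text = RETWEET_RE.sub("", text.strip())
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the marker
    return re.sub(r"\s+", " ", text).strip()


def build_corpus(raw_tweets):
    """Clean every tweet; drop empty results and exact duplicates, keeping order."""
    seen = set()
    corpus = []
    for raw in raw_tweets:
        cleaned = clean_tweet(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            corpus.append(cleaned)
    return corpus


corpus = build_corpus([
    "RT @user: Students protest again #JNU https://t.co/abc",
    "Students protest again JNU",
    "@someone",
])
```

Here the first two inputs clean to the same text and the third cleans
to an empty string, so the corpus holds a single entry. Dropping
duplicates is a design choice of this sketch, made so that retweets do
not dominate the corpus.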

B. Preprocessing Stage 
At this stage, the corpus is transformed into feature vectors.
Strings are converted into words using filtration techniques. We
adopted simple feature selection as a preprocessing method to
tokenize the text stream into words. This method consists of the
following sequence of tasks: removing delimiters, removing
numbers, and removing stop words. For stop word removal, a list of
stop words is supplied to the filtration process.
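
The filtration tasks above can be sketched in Python as follows. This
is an illustration under assumptions: the stop word list is a tiny
hypothetical one (the paper's list is not given), text is lower-cased
and treated as ASCII, and the feature vector is a plain term-count
vector, which the paper does not confirm. The names tokenize and
to_feature_vectors are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; the paper's list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lower-case, split on delimiters, then drop numbers and stop words."""
    # Any run of characters other than a-z and 0-9 counts as a delimiter.
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [t for t in tokens if t and not t.isdigit() and t not in stop_words]


def to_feature_vectors(corpus):
    """Map each document to a term-count vector over a shared, sorted vocabulary."""
    tokenized = [tokenize(doc) for doc in corpus]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = to_feature_vectors(
    ["Students protest", "protest, protest in Delhi 2016"]
)
# vocabulary -> ['delhi', 'protest', 'students']
# vectors    -> [[0, 1, 1], [1, 2, 0]]
```

A shared sorted vocabulary keeps every vector the same length and in
the same column order, which the later clustering stage needs in order
to compare documents.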

 
Fig. 1.  Architecture of the Proposed Methodology 

C. Sentiment Analysis Stage
This stage of the framework handles polarity measurement and
sentiment detection. We approach these tasks with a hybrid machine
learning method that combines clustering with classification. For
clustering, a hybrid approach is used that combines features of
the k-modes and k-medoid algorithms. The Hybrid Clustering works
as follows:

• Phase 1: The cluster size is fixed, and the output of this phase
forms the initial clusters. The input array of elements is scanned
and split into sub-arrays, which represent the initial clusters.

• Phase 2: The cluster sizes vary, and the output of this phase is
the finalized clusters. The initial clusters are the input to this
phase. The centroids of the initial clusters are computed first,
and the distance from each centroid to the data elements is
calculated. A data element whose distance to the centroid of its
own cluster is less than or equal to its distance to the other
centroids remains in the same cluster; otherwise it is moved to
the appropriate cluster. The process continues until no changes in
the clusters are detected.

• For classification, Improved Bagging is used. The clustered data
is classified with an improved bagging technique, which decreases
the variance of the prediction by drawing combinations with
repetitions from the dataset to produce multisets of the same size
as the dataset. For each multiset, the boosting learning algorithm
is applied to classify the instances; a model is created and a
vote for that model is generated. The average of all the predicted
votes is taken as the result of the classifier.
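
The two clustering phases and the bagging step can be sketched in
Python as below. This is an illustration under stated assumptions, not
the authors' implementation: data elements are numeric tuples, the
distance is squared Euclidean, labels are +1 and -1, and a
nearest-class-mean rule stands in for the boosting learner, whose
details the paper does not give. All function names are hypothetical.

```python
import random
from statistics import mean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_clusters(points, size):
    """Phase 1: scan the input and cut it into fixed-size sub-arrays."""
    return [points[i:i + size] for i in range(0, len(points), size)]


def refine_clusters(clusters, max_iter=100):
    """Phase 2: reassign points to the nearest centroid until nothing moves."""
    for _ in range(max_iter):
        centers = [centroid(c) for c in clusters]
        updated = [[] for _ in centers]
        for current, cluster in enumerate(clusters):
            for p in cluster:
                # On equal distance the point stays in its current cluster.
                best = min(range(len(centers)),
                           key=lambda i: (sq_dist(p, centers[i]), i != current))
                updated[best].append(p)
        updated = [c for c in updated if c]  # discard clusters that emptied out
        if updated == clusters:
            break
        clusters = updated
    return clusters


def fit_nearest_mean(bag):
    """Stand-in base learner (NOT boosting): pick the label with the closest class mean."""
    rows_by_label = {}
    for features, label in bag:
        rows_by_label.setdefault(label, []).append(features)
    centers = {label: centroid(rows) for label, rows in rows_by_label.items()}
    return lambda x: min(centers, key=lambda label: sq_dist(x, centers[label]))


def bagged_vote(train, x, fit=fit_nearest_mean, n_bags=15, seed=0):
    """Fit one model per bootstrap multiset of `train` and average the votes for `x`."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_bags):
        bag = rng.choices(train, k=len(train))  # draw with repetition, same size
        votes.append(fit(bag)(x))
    return mean(votes)


points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
clusters = refine_clusters(initial_clusters(points, size=3))
train = [(p, -1 if p[0] < 5 else 1) for p in points]
score = bagged_vote(train, (8, 8))  # the sign of the averaged vote gives the class
```

In this toy run the fixed-size split first mixes the two groups of
points, and the second phase separates them again. Any learner that
returns a numeric vote can be passed as `fit`, so a boosting model
could replace the stand-in without changing the bagging loop.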

III. EXPERIMENTAL RESULTS AND DISCUSSIONS
A. Performance Parameters 

• Accuracy: The percentage of test-set examples that the
classifier classifies correctly, that is, the proportion of all
predictions that are correct.

Accuracy = (TP + TN) / (TP + FP + TN + FN)   (1)

where TP is True Positive, TN is True Negative, FP is False
Positive, and FN is False Negative.

• Precision: The percentage of instances assigned to a class that
actually belong to that class.

Precision = (TP) / (TP + FP)       (2)

• Recall: The percentage of instances actually belonging to a
class that are assigned to that class.

Recall = (TP) / (TP + FN)      (3)

• Confusion Matrix: Also known as a contingency table or error
matrix in supervised learning; in unsupervised learning it is
called a matching matrix. It tabulates actual against predicted
classes, and various measures can be defined based on its values.
The TP and FP rates computed from it are the quantities plotted in
an ROC curve.

• True Positive (TP) and False Positive (FP) Rate: The TP rate,
also called sensitivity, measures how often the test is positive
when the condition is present, for example how often a test is
positive for a person who has the disease. A false positive is an
error in which a test result wrongly indicates the presence of a
condition; the FP rate is the proportion of actual negatives that
receive such a result.

TPR = (TP) / (TP + FN)                 (4)

FPR = (FP) / (FP + TN)    (5)

• F-measure: A measure that combines precision and recall.

F-measure = 2 * (Precision * Recall) / (Precision + Recall)   (6)
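
As a worked illustration of (1)-(6), the measures can be computed from
actual and predicted labels as sketched below in Python. The function
names, the choice of 1 as the positive label, and the convention of
returning 0.0 when a denominator is zero are assumptions of this
sketch, not part of the paper.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN with `positive` treated as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division; 0.0 for an empty denominator is a convention of this sketch."""
    return numerator / denominator if denominator else 0.0


def metrics(tp, fp, tn, fn):
    """Evaluate Eqs. (1)-(6) from the four confusion-matrix counts."""
    precision = ratio(tp, tp + fp)                             # Eq. (2)
    recall = ratio(tp, tp + fn)                                # Eq. (3)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),         # Eq. (1)
        "precision": precision,
        "recall": recall,
        "tpr": recall,                                         # Eq. (4), same ratio as recall
        "fpr": ratio(fp, fp + tn),                             # Eq. (5)
        "f_measure": ratio(2 * precision * recall, precision + recall),  # Eq. (6)
    }


actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)   # (3, 1, 3, 1)
result = metrics(*counts)
```

For these eight labels TP = 3, FP = 1, TN = 3 and FN = 1, so accuracy,
precision, recall and F-measure all equal 0.75 and the FP rate is
0.25.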

So, considering every 6th frame leads to good enough
results and speeds up the task. The output images of these are
then fused together for the final text detection. Thus, the fusion
results obtained are more informative when every 6th frame is
fused together. Figures 2-4 show comparison graphs of the TP rate,
FP rate, precision, recall and F-measure of the proposed technique
against previous techniques.

 
Fig. 2.  Classification accuracy comparison

 
Fig. 3.  Number of classified instances comparison. 

 
Fig. 4.  Precision, Recall and F-measure comparison.

IV. CONCLUSION AND FUTURE SCOPE 
Sentiment analysis, as an interdisciplinary field that crosses
natural language processing, artificial intelligence, and text
mining, recognizes the opinions of people regarding a product,
service, object, or social issue expressed in a given text. Until
recently most, if not all, research in sentiment analysis has
addressed products and services. Determining public opinion on
social issues is important for governance and decision making.
This paper has focused on sentiment analysis of social issues and
proposed a new hybrid method for it. The method extracts the
opinions from each sentence, constructs the corresponding opinion
structures, and then determines their orientation with respect to
the social issue. The algorithm performs better than a previous
k-means/SVM approach. In the future, the work can be extended by
using a larger dataset and considering more instances, which may
contribute to more accurate prediction.

REFERENCES 
[1] A. Shaikh, T. Pritam, P. Ankita, W. Shital, T. Pooja, “Stock Exchange
Market Prediction”, International Journal of Advances in Computer
Science and Technology, Vol. 3, No. 5, pp. 349-351, 2014

[2] A. Joshi, A. R. Balamurali, P. Bhattacharyya, R. Mohanty, “C-Feel-It: A 
Sentiment Analyzer for Micro-blog”, HLT '11 Proceedings of the 49th 
Annual Meeting of the Association for Computational Linguistics: 
Human Language Technologies: Systems Demonstrations, pp. 127-132, 
Portland, Oregon, June 21, 2011  

[3] A. Abbasi, “Intelligent feature selection for opinion classification”,  
IEEE Intelligent Systems, Vol. 25, No. 4, pp. 75-79, 2010 

[4] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice 
Hall, 1998 

[5] D. Anguita, A. Ghio, N. Greco, L. Oneto, S. Ridella, “Model Selection 
for Support Vector Machines: Advantages and Disadvantages of the 
Machine Learning Theory”, Proc. of the International Joint Conference 
on Neural Networks (IJCNN), Barcelona, Spain, 2010 

[6] D. Rajput, M. Madhukar, S. Verma, M. Sharma, “Sentiment Analysis on 
Big Data using Machine Learning for Holiday Destinations”, 2015 IEEE 
European Modelling Symposium , Spain, October 6-8, 2015 

[7] A. Kumar, T. Mary, “Sentiment Analysis: A Perspective on its Past, 
Present and Future”,  International Journal of Intelligent Systems and 
Applications, Vol. 10, pp.1-14, 2012 

[8] A. Smola, S. V. N. Vishwanathan, Introduction to Machine Learning,
Cambridge University Press, 2009

[9] A. M. Kaplan, M. Haenlein, “Users of the World, Unite! The Challenges
and Opportunities of Social Media”, Business Horizons, Vol. 53, No. 1,
pp. 59-68, 2010

[10] N. Anitha, B. Anitha, S. Pradeepa, “Sentiment Classification 
Approaches”, International Journal of Innovation Engineering and 
Technology, Vol. 3, No. 1, pp. 22-31, 2013 

[11] B. J. Jansen, M. Zhang, K. Sobel, A. Chowdury, “Twitter Power: Tweets
as Electronic Word of Mouth”, Journal of The American Society for
Information Science and Technology, Vol. 60, No. 11, pp. 2169–2188,
2009

[12] B. Pang, L. Lee, S. Vaithyanathan, “Thumbs up? Sentiment 
Classification using Machine Learning Techniques”, Proceedings of the 
Conference on Empirical Methods in Natural Language Processing, 
Philadelphia, pp. 77-86, 2002 

[13] C. Vicient, A. Moreno, “Unsupervised Topic Discovery in micro-
blogging networks”, Expert Systems with Applications, Vol. 42, pp. 
6472–6485, 2015 

[14] S. Verma, M. Sharma, D. Rajput, M. Madhukar, V. Mittal, R. Singh, 
“Disclosing Tweet Polarity using feature representation factor”, 
International Journal Of Latest Trends In Engineering and Technology, 
Vol. 5, No. 2, 2015