J. Nig. Soc. Phys. Sci. 3 (2021) 385–394
Journal of the Nigerian Society of Physical Sciences

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

V. Umarani a,∗, A. Julian a, J. Deepa b

a Department of Computer Science and Engineering, Saveetha Engineering College, Chennai, Tamil Nadu, India
b Department of Information Technology, VelTech Institute of Technology, Chennai, Tamil Nadu, India

Abstract

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process is automated by using machine learning techniques, which analyse text patterns faster. Supervised machine learning is the most used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as Multinomial Naive Bayes, Bernoulli Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest, K-nearest neighbor, and Decision tree, and deep learning techniques such as Long Short-Term Memory and Convolution Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time and k-fold cross validation. They help in appreciating the novelty of the deep learning techniques and give the user an overview for choosing the right technique for their application.
DOI:10.46481/jnsps.2021.308

Keywords: Convolution neural network, Long short term memory, Sentiment analysis, Supervised machine learning, Deep learning

Article History:
Received: 21 July 2021
Received in revised form: 04 October 2021
Accepted for publication: 05 October 2021
Published: 29 November 2021

©2021 Journal of the Nigerian Society of Physical Sciences. All rights reserved.
Communicated by: W. A. Yahya

∗Corresponding author tel. no: +918610352527
Email address: umaranibharathy@gmail.com (V. Umarani)

1. Introduction

At present, social media is a popular technology that uses micro blogging platforms to connect millions of people. People can freely express their thoughts, ideas, and views as short messages called tweets on many micro blogging platforms in social networks (like Twitter) and on business websites or web forums [1]. Researchers gather these unstructured tweets and use a variety of methods to extract information from them. This analysis of tweets or opinions provides predictions or measures in a variety of application domains such as business, government, education, sports, tourism, biomedicine and telecommunication services [2].

Sentiment analysis or opinion mining is the study of opinions and prediction. Sentiment analysis (SA) is one of the text mining approaches that use natural language processing for binary text classification. Sentiment analysis can be performed at four levels based on the scope of the text: document-level, sentence-level, aspect-level, and word-level sentiment analysis [6]. In document-level SA, the overall opinion of the document about a single entity is classified as positive or negative. In sentence-level SA, the opinion expressed in a sentence is classified as either positive or negative. In aspect-level SA, opinions about entities are grouped based on specific entity elements.
In word-level SA, opinions about entities are grouped based on a specific word. In the proposed work, a word-level sentiment analysis model is developed for a restaurant review data set, based on machine learning and deep learning algorithms, to classify sentiments as positive or negative automatically.

The sections in this paper are organized as follows: Section 2 examines the various sentiment analysis models that have been discussed in the literature. Section 3 discusses the various machine learning techniques. Section 4 analyzes the performance of machine learning techniques used in sentiment analysis and finally discusses the conclusion and the future work that needs to be carried out in sentiment analysis.

2. Related Work

Researchers have created several sentiment analysis models based on data collected from social networks, business websites, and dataset providers. Examples of the data used to build these models include Amazon reviews, tweets from the Twitter micro blogging site, Yelp travel recommendations, and movie reviews from IMDB, Kaggle, and others [3]. This section discusses some existing work as well as the benefits of the proposed models.

Endsuy [4] used Twitter datasets to conduct exploratory data analysis on the US Presidential Election 2020. They compare the sentiment of location-based tweets to that of on-the-ground public opinion. They collect features such as latitude, longitude, city, country, continent, and state code using the OpenCage API and scispaCy NER. They use two datasets from Kaggle about Donald Trump and Joe Biden, both dated November 18, 2020. For lexicon-based feature extraction, they use the Valence Aware Dictionary and sEntiment Reasoner (VADER), and for classification, they use logistic regression.

Bibi et al. [5] created a Cooperative Binary-Clustering Framework for sentiment analysis on indigenous data sets using Twitter. They use majority voting to partition the data and combine single linkage, complete linkage, and average linkage approaches. Based on the confusion matrix, they divide the clusters into positive and negative. For feature selection, they use unigram, TF-IDF, and word polarity mechanisms. According to this analysis, the cooperative clustering approach outperforms the individual partitioning techniques (75%).

Cekik et al. [6] use a filter-based feature selection method called Proportional Rough Feature Selector (PRFS) and test it with various classifiers such as SVM, DT, KNN, and Naive Bayes. PRFS uses rough set theory to determine whether documents belong to a specific class or not. It improves classifier performance at a 95% confidence level.

Peng et al. [7] proposed an adversarial learning method for sentiment word representations. It is made up of three parts: a generator, a discriminator, and a sentiment classifier. To obtain efficient semantic and sentiment information, the generator uses a multi-head attention mechanism. The discriminator measures the similarity of sentiment polarity between the generative vector and the global vector produced by the generator and the classifier, respectively. It uses gradients to update the weights in the generator, resulting in high-quality word embeddings. Word vectors with opposing sentiment contexts are classified as fake vectors in this setting. This avoids the issue of words with similar contexts but opposite sentiments.

To improve the performance of aspect-based sentiment analysis, Tan et al. [8] devised an Aligning Aspect Embedding (AAE) method. Using the cosine similarity measure, the AAE method discovers the relationship between aspect categories and aspect terms. The AAE method effectively solves the misalignment problem in aspect-based sentiment analysis and improves sentiment analysis performance.

Jain et al.
[9] use the Apriori algorithm to minimize the feature set for sentiment analysis and developed a feature selection approach based on association rule mining. For experiments, they used supervised classification methods such as naive Bayes, support vector machine, random forest, and logistic regression.

Kalaivani et al. [10] developed a hybrid feature selection method for sentiment analysis. For feature extraction, they use the unigram and bigram models, as well as the TF-IDF weighting technique. They use information gain to shrink the subset of features. To select optimal features, this method employs a genetic algorithm. This hybrid model is then put into practice using various classifiers such as naive Bayes, logistic regression, and support vector machine (SVM). Based on this analysis, the SVM classifier outperformed the other classifiers.

Using statistical feature selection methods, Ghosh et al. [11] proposed an ensemble feature selection technique to improve the performance of the sentiment analysis process. To find the best feature set, they use information gain, the Gini index, and the Chi-square method. They utilized five distinct classifiers: Multinomial Naive Bayes, KNN, Maximum Entropy, Decision Tree, and Support Vector Machine.

Rodrigues et al. [12] formed a pattern-based method for extracting aspects and analyzing sentiment. Pattern analysis is used in this case to extract the explicit aspect syntactic pattern from product sentiments. It extracts the bigram features and uses SentiWordNet to determine the sentence's sentiment polarity. According to this analysis, the multi-node clustering approach outperforms the single-node clustering approach.

For tweet sentiment classification, Jianqiang et al. [13] created GloVe-DCNN (Global Vectors for Word Representation – Deep Convolution Neural Network). They presented a method of word embedding that combines unigram and bigram feature vectors.
A subset of sentiment features is formed by combining the Twitter-specific feature vector and word sentiment polarity features. This feature set is used to train and predict sentiment classification labels in a deep convolution neural network. It resolves the data sparseness issue.

Imran et al. [15] use tweets from Twitter and the Sentiment140 dataset. They utilize the LSTM model to estimate sentiment polarity and emotion. For sentiment analysis, Li et al. [14] developed lexicon-integrated CNN family models. They implemented a sentiment padding approach to ensure that input data sizes are consistent and that the percentage of sentiment information in each row is increased. During neural network learning, the gradient vanishing problem between the input layer and the first hidden layer is solved using the sentiment padding method.

The detailed literature survey on machine learning methods illustrates that the suggested methods help in analyzing text patterns faster and can be used to automate the sentiment analysis process [16, 17]. Deep learning handles large volumes of data, uses artificial neural networks to analyze text patterns faster, and can solve the misalignment problem by extracting local features from a sentiment. As a result, a deep learning algorithm takes word embeddings as input and provides high accuracy on these kinds of tasks. In the proposed work, the most widely used machine learning algorithms have been deployed as baselines against which the performance of the deep learning models LSTM and CNN is analyzed.

3. Methodology

The sentiment analysis process is carried out in several steps. The first step is to collect data and perform preprocessing. In the preprocessing step, the data set is structured by normalization, stop word removal, stemming, and tokenization [18]. The next step in sentiment analysis is to extract and select the most relevant text features from the opinion.
Feature extraction and feature selection methods are used for this purpose. These methods are used to reduce the number of input variables, avoid overfitting, decrease computational complexity or training time, and improve model accuracy [19]. Vectorization and word embedding methods are used for feature extraction [20]. Finally, machine learning techniques are used to classify or categorize text as positive, negative, or neutral based on the sentiment polarity of opinions. Machine learning techniques classify the sentiments based on training and test data sets [21].

Figure 1: Sentiment analysis Process

In this section, the most widely used machine learning classifiers, namely Naive Bayes, Logistic Regression, Random Forest, Linear SVC (Support Vector Classifier), K-nearest neighbor and Decision tree, and deep learning techniques such as Convolution Neural Network and Long Short Term Memory have been investigated. The basics of these methods are discussed in detail.

Theoretical Background

Naive Bayes

It is an intuitive approach and works well with small data, with low computation time for training and prediction [22]. It uses Bayes' theorem for finding the probability of an event. The Naive Bayes classifier is based on equation (1).

$$P(Y \mid X_1, X_2, \ldots, X_n) = \frac{P(X_1, X_2, \ldots, X_n \mid Y) \times P(Y)}{P(X_1, X_2, \ldots, X_n)} \tag{1}$$

Here $P(Y \mid X_1, X_2, \ldots, X_n)$ is called the posterior probability of an output class $Y$ given input features $X_1, X_2, \ldots, X_n$. $P(X_1, X_2, \ldots, X_n \mid Y)$ is the likelihood of the input features $X_1, X_2, \ldots, X_n$ given their class $Y$. $P(Y)$ is the prior probability of class $Y$. $P(X_1, X_2, \ldots, X_n)$ is the marginal probability. The common distributions used in the Naive Bayes classifier are the normal (Gaussian), multinomial, and Bernoulli distributions.

Logistic Regression (LR)

LR is a widely used binary classifier which uses the logistic function to predict the probability of the observation output value ($Y_i$) given input $X_i$ [23].
If $P(Y_i \mid X_i)$ is greater than 0.5, class 1 is predicted; otherwise class 0 is predicted. The logistic function is defined by equation (2).

$$P(Y_i \mid X) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x)}} \tag{2}$$

where $\beta_1$ and $\beta_2$ are learning parameters, $x$ is the training data, $Y_i$ is the observation output value and $e$ is Euler's number.

Support Vector Classifier (SVC)

SVC classifies the data by finding the hyperplane that maximizes the margin between the classes in the training data [24]. The support vector classifier is represented by equation (3).

$$f(x) = \beta_0 + \sum_{j \in S} \alpha_j \times K(x, x_j) \tag{3}$$

where $K$ is the kernel function, which compares the similarity of the observations $x$ and $x_j$, $\beta_0$ and $\alpha_j$ are learning parameters, and $S$ is the set of support vector observations. SVC uses a linear kernel or a radial basis function kernel to create a hyperplane decision boundary.

Decision tree classifier

Decision tree classification is a non-parametric supervised method suitable for both classification and regression problems. A decision tree contains a set of decision nodes. Each node specifies a decision rule that creates a split to decrease the impurity, and splitting is applied recursively until every leaf node belongs to a specific class. The decision tree classifier uses split quality measures such as Gini impurity, entropy or MAE (Mean Absolute Error) to decrease the impurity at a node. By default, it uses Gini impurity, which is calculated by equation (4).

$$G(n) = 1 - \sum_{i=1}^{c} p_i^2 \tag{4}$$

where $G(n)$ is the Gini impurity at node $n$, $c$ is the number of classes and $p_i$ specifies the proportion of observations of class $i$ at node $n$.

Random Forest

Random Forest is an ensemble learning method in which many decision trees are trained, each on a bootstrapped sample of observations. The observations left out of a tree's sample are called out-of-bag observations. For every observation, the Random Forest learning algorithm calculates the overall score by comparing the observation's true value with the prediction from the subset of trees not trained using that observation.
This overall score is taken as a measure of random forest performance.

K-Nearest Neighbor

KNN is a simple and widely used classifier in supervised machine learning [24,25]. The KNN algorithm first identifies the K closest neighbors of an observation based on a distance metric and then predicts the observation's class from the classes of those K neighbors. The most widely used distance metrics are the Euclidean, Manhattan and Minkowski distances. They are defined by equations (5), (6) and (7).

$$d_{euclid} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{5}$$

$$d_{manhat} = \sum_{i=1}^{n} |x_i - y_i| \tag{6}$$

$$d_{minkowski} = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} \tag{7}$$

where $x_i$ and $y_i$ are the observations and $p$ is a hyperparameter.

Convolution Neural Network

It is a type of feed-forward neural network, also called ConvNet (Convolution Network), which uses a hierarchical structure for fast feature extraction and classification [26,27]. The most important layers that form a CNN are shown in Figure 2.

Figure 2: Layers in Convolution Neural Network

1. Convolution Layer - It extracts the features and creates feature maps. Convolution is a mathematical operation which measures the overlap of two functions. It uses filters or kernels to detect important features in the input. The number of output features ($N_o$) is calculated by equation (8) based on the number of input features ($N_i$) and convolution properties such as kernel size $k$, padding size $p$ and stride size $s$.

$$N_o = \left\lfloor \frac{N_i + 2p - k}{s} \right\rfloor + 1 \tag{8}$$

Here, kernel size specifies the size of the filter, stride specifies the step by which the kernel moves across the input, and padding adjusts the size of the input according to the requirement of the input matrix. When the kernel detects an important feature, it is stored in a feature map. Zero padding adds zeros to fit the input matrix to the kernel size. If zero padding is used in the convolution layer, it is called wide convolution; otherwise it is called narrow convolution. Another variant is valid padding, which keeps only the valid parts of the input and ignores the rest.

2. Pooling layer - It progressively reduces the size of the input representation and the number of weights needed. Hence it reduces the overall number of input features and the computation in the network, and it controls overfitting. The pooling layer reduces the dimensionality of the feature map without discarding important information. It uses max pooling, average pooling or sum pooling to reduce dimensionality. Max pooling takes the maximum value from the feature map, average pooling finds the average value from the feature map and sum pooling finds the sum of all values from the feature map. The new values form a matrix called the pooled feature map. The pooling process is also called sub sampling or down sampling.

3. Flattening Layer - It converts the multidimensional data to single-dimensional data. The flattening layer is connected to the fully connected layer for classification.

4. Fully Connected Layer - In the fully connected layer, neurons have full connections to all activations in the previous layer. Fully connected layers classify the features obtained from the convolution and pooling layers into classes based on the training set. The output layer uses the softmax activation function for text classification [33]. The softmax function [24] is defined as follows. It takes an input vector of $n$ real numbers with components $x_j$ and normalizes each output into the range [0,1].

$$softmax(x_j) = \frac{e^{x_j}}{\sum_{k=1}^{n} e^{x_k}}$$

Long Short Term Memory

Figure 3: Hidden state Transformation in LSTM cell

LSTM is a type of feedback or recurrent neural network designed to solve various sequential and time series problems [28,29]. LSTM takes the current observation and the previous observation as input and stores information in a gated cell or memory cell. Input data is propagated through the LSTM layers to make a prediction.
An LSTM layer contains a set of recurrently connected blocks, each of which carries a memory cell and three gates, namely the input, output and forget gates. Each memory cell has an input weight, an output weight and a hidden state, which are used to process the input data. LSTM controls the data flow through the cell with the help of the gates. The input gate determines how much of the newly calculated state for the present input is let through, the forget gate determines how much of the previous state is let through, and the output gate determines how much of the internal state is passed to the next layer. These gates help to adjust the LSTM hidden state. At every time step, the cell state $c_t$ is calculated from the internal hidden state $g$, the previous cell state, the forget gate and the input gate. Finally, the hidden state at time $t$ is calculated from the cell state and the output gate. The transformations applied to the hidden state are shown in Figure 3. The calculations done by the hidden state transformation are described by the following equations.

$$i = \sigma\left((W_i \times h_{t-1}) + (U_i \times x_t)\right) \tag{9}$$

$$f = \sigma\left((W_f \times h_{t-1}) + (U_f \times x_t)\right) \tag{10}$$

$$o = \sigma\left((W_o \times h_{t-1}) + (U_o \times x_t)\right) \tag{11}$$

$$g = \tanh\left((W_g \times h_{t-1}) + (U_g \times x_t)\right) \tag{12}$$

$$c_t = (c_{t-1} \circ f) + (g \circ i) \tag{13}$$

$$h_t = \tanh(c_t) \circ o \tag{14}$$

Here $i$, $f$, $o$, $g$, $c_t$ and $h_t$ refer to the input gate, forget gate, output gate, internal hidden state, cell state at time $t$ and hidden state at time $t$, and $\circ$ denotes element-wise multiplication. $\sigma$ is the sigmoid function, which keeps the output of the gates between 0 and 1. $W$ and $U$ are the weight matrix and transition matrix, which help to reduce the number of parameters learned by LSTM. $x_t$ refers to the input at time $t$. The present input $x_t$ and the hidden state $h_{t-1}$ are used to calculate the internal hidden state $g$. Finally, the hidden state $h_t$ is calculated from $c_t$ and the output gate value $o$.

4. Results and Discussion

Restaurant reviews from Kaggle were taken to examine the various supervised learning techniques for the sentiment analysis process. This data set contains 1000 restaurant text reviews.
The Anaconda Python platform is used to pre-process and evaluate the restaurant reviews. 70% of the reviews are used for training, while 30% are used to test the supervised learning techniques. NLTK is used for pre-processing, and Keras with the TensorFlow backend is used to create the LSTM (RNN with memory) and CNN neural network models [30]. Experiments are carried out in Google Colaboratory, which provides a Python development environment and runs code in the Google cloud. Figure 4 shows the first five rows of the restaurant data set, which contains 1000 reviews and their opinion classification as positive or negative. The statistical summary of the restaurant data set is shown in Figure 5.

Figure 4: First five rows of Restaurant Review Data set

Figure 5: Statistical summary of dataset

Evaluation Parameters

Precision, recall, F1-score, accuracy, AUC score, and training time were used to assess the classifiers' performance [31]. They are calculated by

$$Precision = \frac{TP}{TP + FP} \tag{15}$$

$$Recall = \frac{TP}{TP + FN} \tag{16}$$

$$F1\ score = \frac{2 \times (Recall \times Precision)}{Recall + Precision} \tag{17}$$

$$Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \tag{18}$$

Here True Positive (TP) refers to restaurant reviews that were initially labelled as positive and are also predicted to be positive. False Positive (FP) refers to reviews that were initially labelled as negative but are predicted to be positive. True Negative (TN) refers to restaurant reviews that were initially labelled as negative and that the classifier predicted to be negative. False Negative (FN) refers to restaurant reviews that were initially labelled as positive but are predicted to be negative. The AUC score specifies the area under the ROC curve computed from the prediction scores.
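As a minimal sketch of equations (15) to (18), the four scores can be computed directly from the confusion-matrix counts. The function name `classification_metrics` and the counts used below are hypothetical and are not taken from the experiments; the sketch also assumes that none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                             # equation (15)
    recall = tp / (tp + fn)                                # equation (16)
    f1 = 2 * (recall * precision) / (recall + precision)   # equation (17)
    accuracy = (tp + tn) / (tp + fp + fn + tn)             # equation (18)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts chosen only for illustration.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```

In practice these scores are usually computed per class, as in the classification reports, by treating each class in turn as the positive class.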
Result analysis of Machine learning Technique for Sentiment Analysis

Initially, pre-processing is carried out on the sentiment reviews by removing non-character data such as digits and symbols, as well as punctuation, and converting each sentence to lowercase. After preprocessing, the cleaned text reviews are converted to numerical data, called a feature vector, that contains sentiment tokens and sentiment scores. The feature vector is formed by a TF-IDF vectorizer. TF-IDF stands for Term Frequency times Inverse Document Frequency, and it assigns a weight to each word based on how often it appears in the review text. After extracting features with the vectorizer, six machine learning classifiers are used for sentiment analysis: naive Bayes, logistic regression, random forest, linear SVC (Support Vector Classifier), K-nearest neighbor and decision tree. The performance of each classifier is assessed by precision, recall, F1-score, accuracy, AUC score, ROC curve and training time. The classification report of the classifiers is shown in Table 1. From this table, the highest AUC score, 0.7642, is obtained by the Multinomial Naive Bayes classifier. So the Naive Bayes model provides better prediction than the other machine learning classifiers for this restaurant data set. The table also shows that the training time of the Naive Bayes model is low compared to the other machine learning algorithms. The ROC curves of the machine learning classifiers are depicted in Figures 6a to 6f.

Figure 6a: ROC curve of Multinomial Naive Bayes

Figure 6b: ROC curve of Bernoulli Naive Bayes

Result analysis of deep learning techniques in Sentiment Analysis

Sentiment analysis is carried out by the deep learning techniques CNN and LSTM. After pre-processing, the review text is converted into tokens. The result of the tokenizer is shown in Figure 7.
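The cleaning and tokenization steps described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `clean_review` and `tokenize` and the sample sentence are hypothetical, whitespace splitting stands in for whichever tokenizer was actually used, and stop word removal and stemming are left out.

```python
import re


def clean_review(text):
    """Replace every run of non-letter characters with a space, then lowercase."""
    letters_only = re.sub(r"[^a-zA-Z]+", " ", text)
    return letters_only.lower().strip()


def tokenize(text):
    """Split a cleaned review into whitespace-separated tokens."""
    return clean_review(text).split()


sample = "Wow... Loved this place!! 10/10"  # hypothetical review text
print(tokenize(sample))
```

A regular expression keeps the sketch dependency-free; a library tokenizer would typically handle contractions and non-English characters more carefully.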
Following tokenization, the sentiment text is passed to a word2vec model, which converts words to vectors. The training data contains 5118 words in total, with a vocabulary size of 1839 and a maximum sentence length of 18, while the test data contains 574 words, with a vocabulary size of 415 and a maximum sentence length of 15.

Table 1: Performance Analysis of Machine Learning Classifiers for Sentiment analysis

| Classifier | Class | Precision | Recall | F1-score | Support | Accuracy (Train) | Accuracy (Test) | AUC Score | Time to train (Seconds) |
|---|---|---|---|---|---|---|---|---|---|
| Naive Bayes Multinomial | 0 | 0.74 | 0.78 | 0.76 | 256 | 0.76 | 0.7633 | 0.7642 | 0.009278 |
| | 1 | 0.79 | 0.75 | 0.77 | 244 | | | | |
| Naive Bayes Bernoulli | 0 | 0.72 | 0.79 | 0.76 | 256 | 0.76 | 0.756 | 0.758 | 0.020505 |
| | 1 | 0.79 | 0.73 | 0.76 | 244 | | | | |
| Logistic Regression | 0 | 0.67 | 0.87 | 0.75 | 256 | 0.73 | 0.73 | 0.736 | 0.06633 |
| | 1 | 0.83 | 0.61 | 0.70 | 244 | | | | |
| Random Forest | 0 | 0.65 | 0.93 | 0.76 | 256 | 0.72 | 0.7233 | 0.732 | 0.57832 |
| | 1 | 0.89 | 0.54 | 0.67 | 244 | | | | |
| Linear SVC | 0 | 0.69 | 0.82 | 0.75 | 256 | 0.74 | 0.74 | 0.743 | 0.014712 |
| | 1 | 0.80 | 0.67 | 0.73 | 244 | | | | |
| KNN | 0 | 0.70 | 0.72 | 0.71 | 256 | 0.72 | 0.716 | 0.716 | 0.079166 |
| | 1 | 0.74 | 0.71 | 0.72 | 244 | | | | |
| Decision Tree | 0 | 0.65 | 0.80 | 0.72 | 256 | 0.70 | 0.70 | 0.704 | 0.1707437 |
| | 1 | 0.77 | 0.61 | 0.68 | 244 | | | | |

Figure 6c: ROC curve of Logistic Regression

Figure 6d: ROC curve of Decision Trees

Figure 6e: ROC curve of KNN

Figure 6f: ROC curve of Random Forest

Result of Tokenizer

Figure 7a: ROC curve of CNN

Figure 7b: ROC curve of LSTM

The CNN model passes the input data through a series of layers. It uses a convolution layer to extract features from the input data, a max pooling layer to reduce the dimensionality of the trainable parameters and a sigmoid activation function in the output dense layer. Similarly, the input data in the LSTM model is passed through an embedding layer, spatial dropout, an LSTM layer, and an output layer. The spatial dropout rate is set to 0.2 to avoid overfitting.
For compilation, both the CNN and LSTM models use the Adam optimizer. The classification report of the CNN and LSTM classifiers is shown in Table 2. The ROC curves of CNN and LSTM are shown in Figures 7a and 7b.

Finally, K-fold cross validation (K=10) is carried out to evaluate the machine learning and deep learning algorithms with random seed = 20. The accuracy scores obtained for each algorithm are shown in Figure 8. This box plot shows the spread of the accuracy score across the cross validation folds for these classifiers. From this box plot, the mean accuracy score obtained for Bernoulli Naive Bayes (BNB) is 0.7714, Multinomial Naive Bayes (MNB) is 0.767, Logistic Regression (LR) is 0.762, Random Forest (RF) is 0.738, Linear Support Vector Classifier (LSVC) is 0.748, K-Nearest Neighbor (KNN) is 0.752 and Decision Tree (DT) is 0.72. The mean accuracy score for LSTM is 0.823 and for CNN is 0.828. Hence the deep learning algorithms (LSTM and CNN) provide higher prediction accuracy than the machine learning classifiers.

Figure 8: Accuracy score of Machine learning and Deep Learning algorithms

According to this experimental study, machine learning classifiers such as naive Bayes, logistic regression, SVC, and KNN are faster to train than deep learning models such as CNN and LSTM. The results of the experiment show that LSTM and CNN take longer to train but reach higher accuracy on both training data (up to 98%) and test data (up to 84%) for this restaurant dataset.

Table 2: Performance Analysis of Deep learning techniques for Sentiment Analysis

| Classifier | Class | Precision | Recall | F1-score | Support | Accuracy (Train) | Accuracy (Test) | AUC Score | Time to train (Seconds) |
|---|---|---|---|---|---|---|---|---|---|
| CNN | 0 | 0.83 | 0.79 | 0.81 | 256 | 0.98 | 0.845 | 0.84 | 7.3 |
| | 1 | 0.76 | 0.80 | 0.78 | 244 | | | | |
| LSTM | 0 | 0.80 | 0.67 | 0.73 | 256 | 0.96 | 0.77 | 0.748 | 9.08 |
| | 1 | 0.73 | 0.76 | 0.74 | 244 | | | | |

5. Conclusion

Sentiment analysis on a restaurant data set is carried out to classify each opinion as positive or negative. Initially, preprocessing is carried out to reduce the features and speed up the classification task. The classification task is performed by machine learning methods (Multinomial Naive Bayes, Bernoulli Naive Bayes, Logistic Regression, Random Forest, SVC, KNN, Decision Tree) and deep learning methods (LSTM and CNN). The machine learning methods use a bag-of-words approach (TF-IDF vectorizer) to convert text into vector form, and the deep learning methods use word embeddings for converting text into vector form. Classifier performance is evaluated by various metrics such as precision, recall, F1-score, AUC score and time taken for training. The accuracy score of each classifier is tested by k-fold cross validation. According to these experimental findings, neural network-based learning has a higher training accuracy and a longer running time than the machine learning classifiers. Future work will aim to improve classifier performance by adopting various feature selection techniques and to analyze text prediction on multilingual texts.

Acknowledgments

The authors gratefully appreciate the anonymous reviewers for their constructive contributions to this work.

References

[1] P. Poomka, N. Kerdprasop & K. Kerdprasop, “Machine Learning Versus Deep Learning Performances on the Sentiment Analysis of Product Reviews”, International Journal of Machine Learning and Computing 11 (2021) 103, doi: 10.18178/ijmlc.2021.11.2.1021.
[2] K. Klimiuk, A. Czoska, K. Biernacka & L. Balwicki, “Vaccine Misinformation on Social Media–Topic-Based Content and Sentiment Analysis of Polish Vaccine-Deniers Comments on Facebook”, Human Vaccines & Immunotherapeutics 17 (2021) 2026.
[3] H. Tsaniya, R. Rosadi & A. S.
Abdullah, “Sentiment Analysis towards Jokowis Government Using Twitter Data with Convolutional Neural Network Method”, Journal of Physics Conference Series, 1722 (2021) 012017,doi:10.1088/1742-6596/1722/1/012017. [4] R. D Endsuy, “Sentiment Analysis between VADER and EDA for the US Presidential Election 2020 on Twitter Datasets”, Journal of Applied Data Sciences 2 (2021) 8. [5] M. Bibi, W. Aziz, M. Almaraashi, I. H. Khan, M. S. A. Nadeem & N. Habib, “A Cooperative Binary-Clustering Framework Based on Majority Voting for Twitter Sentiment Analysis”, IEEE Access 8 (2020) 68580. [6] R. Cekik & S. Telceken, “A New Classification Method Based on Rough Sets Theory”, Soft Computing 6 (2018) 1881. [7] B. Peng, J. Wang & X. Zhang, “Adversarial Learning of Sentiment Word Representations for Sentiment Analysis”, Information Sciences 541 (2020) 426. [8] X. Tan, Y. Cai, J. Xu, H. F Leung, W. Chen & Q. Li, “Improving Aspect- Based Sentiment Analysis via Aligning Aspect Embedding”, Neuro com- puting 383 (2020) 336. [9] A. Jain & V. Jain, “Sentiment Classification Using Hybrid Feature Se- lection and Ensemble Classifier” Journal of Intelligent & Fuzzy Systems, 4(2021) 221. [10] P. Kalaivani & K. L. Shunmuganathan, “Sentiment Classification of Movie Reviews by Supervised Machine Learning Approaches”, Indian Journal of Computer Science and Engineering 4 (2013) 285. [11] M. Ghosh & G. Sanyal, “Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis”, Applied Computational Intelligence and Soft Computing 2018 (2018) 10. [12] A. P. Rodrigues & N. N. Chiplunkar, “A New Big Data Approach for Topic Classification and Sentiment Analysis of Twitter Data”, Evolution- ary Intelligence 2 (2019)11. [13] Z. Jianqiang, G. Xiaolin & Z. Xuejun, “Deep Convolution Neural Net- works for Twitter Sentiment Analysis”, IEEE Access 6 (2018) 23253. [14] W. Li, L. Zhu, Y. Shi, K. Guo, & E. 
Cambria, “User reviews: Sentiment analysis using lexicon integrated two-channel CNN–LSTM family models”, Applied Soft Computing 94 (2020) 106435.
[15] A. S. Imran, S. M. Daudpota, Z. Kastrati & R. Batra, “Cross-Cultural Polarity and Emotion Detection Using Sentiment Analysis and Deep Learning on COVID-19 Related Tweets”, IEEE Access 8 (2020) 181074.
[16] S. Rani, N. S. Gill & P. Gulia, “Survey of Tools and Techniques for Sentiment Analysis of Social Networking Data”, International Journal of Advanced Computer Science and Applications 12 (2021) 222.
[17] R. Cekik & A. K. Uysal, “A novel filter feature selection method using rough set for short text data”, Expert Systems with Applications 160 (2020) 113691.
[18] I. S. Ahma, A. B. Azuraliza & M. R. Yaakub, “A review of feature selection in sentiment analysis using information gain and domain specific ontology”, International Journal of Advanced Computer Research 9 (2019) 283.
[19] C. Albon, “Machine Learning with Python Cookbook: Practical solutions from preprocessing to deep learning”, O'Reilly Media (2018) 366.
[20] Z. Wu & S. King, “Investigating gated recurrent networks for speech synthesis”, IEEE International Conference on Acoustics, Speech and Signal Processing (2016) 5140.
[21] B. Peng, J. Wang & X. Zhang, “Adversarial learning of sentiment word representations for sentiment analysis”, Information Sciences 541 (2020) 426.
[22] Z. Jianqiang, G. Xiaolin & Z. Xuejun, “Deep convolution neural networks for Twitter sentiment analysis”, IEEE Access 6 (2018) 23253.
[23] N. Isnaini, M. S. Mubarok & M. Y. A. Bakar, “A multi-label classification on topics of Indonesian news using K-Nearest Neighbor”, Journal of Physics: Conference Series 1192 (2019) 012027.
[24] T. Anuprathibha & C. S. KanimozhiSelvi, “Enhanced Medical Tweet Opinion Mining using Improved Dolphin Echolocation Algorithm Based Feature Selection”, International Journal of Innovative Technology and Exploring Engineering 2 (2019) 20.
[25] H.
Zikang, Y. Yong, Y. Guofeng & Z. Xinyu, “Sentiment analysis of agricultural product ecommerce review data based on deep learning”, International Conference on Internet of Things and Intelligent Applications 27 (2020) 7.
[26] M. M. Ali, “Arabic sentiment analysis about online learning to mitigate COVID-19”, Journal of Intelligent Systems 30 (2021) 524.
[27] U. Naseem, I. Razzak, M. Khushi, P. W. Eklund & J. Kim, “Covidsenti: A large-scale benchmark Twitter data set for COVID-19 sentiment analysis”, IEEE Transactions on Computational Social Systems 29 (2021) 175.
[28] Z. Wang, H. Wang, Z. Liu & J. Liu, “Rolling Bearing Fault Diagnosis Using CNN-based Attention Modules and Gated Recurrent Unit”, Global Reliability and Prognostics and Health Management 7 (2020) 6.
[29] A. Vieira & W. Brandao, “Evaluating Acceptance of Video Games using Convolutional Neural Networks for Sentiment Analysis of User Reviews”, Proceedings of the 30th ACM Conference on Hypertext and Social Media 2 (2019) 273.
[30] K. Hirota & F. Masahiro, “Efficient Attention Mechanism by Softmax Function with Trained Coefficient”, IEICE Technical Report 339 (2021) 52.
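
Illustrative sketch. The evaluation workflow summarised in the conclusion (vectorise the text, train a classifier, score it with precision, recall and F1, and check accuracy by k-fold cross validation) can be outlined in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in the experiments: the eight reviews are invented for illustration, the classifier is a hand-written Multinomial Naive Bayes on raw word counts rather than TF-IDF weights, and all function names (train_nb, predict_nb, k_fold_accuracy, precision_recall_f1) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy, made-up reviews (NOT the restaurant data set); 1 = positive, 0 = negative.
REVIEWS = [
    ("the food was great and the service was friendly", 1),
    ("loved the pasta great place", 1),
    ("amazing taste and friendly staff", 1),
    ("great food will come again", 1),
    ("the food was terrible and the service was slow", 0),
    ("awful pasta never again", 0),
    ("slow service and rude staff", 0),
    ("terrible taste would not recommend", 0),
]


def tokenize(text):
    """Lower-case and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on raw word counts with Laplace smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)
        for tok in tokenize(doc):
            if tok in vocab:
                score += math.log(
                    (word_counts[label][tok] + alpha) / (total + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(data, k):
    """Per-fold accuracy; fold i is tested on items i, i+k, i+2k, ..."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [item for j, item in enumerate(data) if j % k != i]
        model = train_nb([t for t, _ in train], [y for _, y in train])
        correct = sum(predict_nb(model, t) == y for t, y in test)
        scores.append(correct / len(test))
    return scores


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    fold_scores = k_fold_accuracy(REVIEWS, k=4)
    print("fold accuracies:", fold_scores)
    print("mean accuracy:", sum(fold_scores) / len(fold_scores))
```

On a corpus this small the fold accuracies say nothing about real performance; the sketch only shows how the pieces fit together. In practice the same steps would usually be delegated to a library such as scikit-learn.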