International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol 17 No 09 (2023)
Paper—Evaluation of Hotel Performance with Sentiment Analysis by Deep Learning Techniques

Evaluation of Hotel Performance with Sentiment Analysis by Deep Learning Techniques

https://doi.org/10.3991/ijim.v17i09.38755

Rafeef A. Hameed¹, Wael J. Abed², Ahmed T. Sadiq¹
¹ Computer Sciences Department, University of Technology, Baghdad, Iraq
² Computer Techniques Engineering Department, Al-Mustaqbal University College, Hillah, Iraq
cs.21.23@grad.uotechnology.edu.iq

Abstract—Sentiment analysis of social media content has developed significantly as people increasingly rely on social media for advertising and marketing, especially since the COVID-19 pandemic. Arabic is so widely used that it is considered one of the most important languages in the world. User comments reveal whether opinions are positive or negative, but comments are numerous, and evaluating a place or product by reading each one in detail takes considerable work. This study therefore applies deep learning approaches to classify the comments in a dataset automatically. Arabic sentiment analysis is used to give a percentage for each positive and negative comment. Eight deep learning models are used, all with fastText embeddings except AraBERT: a transformer (AraBERT); four RNNs, namely long short-term memory (LSTM), bidirectional LSTM (BI-LSTM), gated recurrent units (GRU), and bidirectional GRU (BI-GRU); two CNNs (an AlexNet-like model and a proposed CNN); and an ensemble model (CNN with BI-GRU). The Hotel Arabic Reviews Dataset was used to test the models. The results are as follows. The AraBERT model reaches an accuracy of 96.442%.
The AlexNet-like CNN model reaches an accuracy of 93.78%, the proposed CNN 94.43%, the LSTM 95%, the BI-LSTM 95.11%, the GRU 95.07%, the BI-GRU 95.02%, and the proposed ensemble of CNN with BI-GRU 94.52%. AraBERT therefore outperformed the other approaches in accuracy, because it has already been pre-trained on some Arabic Wikipedia entries. The LSTM, BI-LSTM, GRU, and BI-GRU models, on the other hand, had comparable outcomes.

Keywords—Arabic sentiment analysis, NLP, deep learning, embedding, CNN, RNN, AraBERT

1 Introduction

Social media today offers a platform for expressing opinions and exchanging firsthand knowledge about events, goods, and services. Such information sources are of great interest to internet users choosing the best service or product to buy. Opinions are significant because they are impartial, independent, and founded on actual user experiences with a particular good or service. User feedback is also valuable to businesses, since it allows them to gauge customer satisfaction and improve the quality of their goods and services. As a result, data are easy to gather, and distinguishing between positive and negative sentiment has become an active research topic. Comparatively few studies address sentiment analysis of Arabic text, both because Arabic is a difficult language and because most of the sentiment analysis literature focuses on English. Arabic Sentiment Analysis (ASA) has been an important research subject due to the recent enormous […] more challenging [1].
There is little research on Arabic-related feelings, attitudes, emotions, and ideas [2]. The main goal of the ASA task is to assign Arabic text to predetermined classes based on its content. Text representation is an important step that affects ASA performance; because contextual embedding models can account for both context and word meaning, they are useful for learning universal sentence representations. Sentiment analysis (SA), often known as opinion mining, is the process of determining whether a writer has a negative or positive attitude toward a certain thing [3]. The main contribution of this paper is a proposed CNN model for ASA. In addition, the paper evaluates other deep learning models for ASA based on AraBERT, CNN, LSTM, BiLSTM, GRU, and BiGRU. fastText [4, 5] embeddings are used for text representation. Datasets from various sources were used to train the deep learning models. The rest of this article is organized as follows: Section 2 reviews related work; Section 3 outlines the deep learning techniques; Section 4 presents the proposed methodology; Section 5 reports the experimental results; and Section 6 gives the conclusion.

2 Related work

In [6], a deep learning model for Arabic sentiment analysis combined a one-layer CNN architecture with two LSTM layers. fastText word embeddings support the input layer of this design. Experiments on a multi-domain corpus showed that the model performed very well in terms of precision, recall, F1-score, and accuracy, scoring 89.10%, 92.14%, 92.44%, and 90.75%, respectively. The study carefully examined the impact of word embedding techniques on Arabic sentiment classification and found that the fastText model is a better choice for learning semantic and syntactic information. NB and KNN classifiers were used to evaluate the effectiveness of the proposed model.
The results showed that SVM was the best-performing classifier, with an accuracy improvement of up to +3.92%, owing to the effectiveness of the CNN in feature extraction and the recurrent nature of the LSTM.

In [7], the authors employed a long short-term memory recurrent neural network (LSTM), a convolutional neural network (CNN), and an ensemble model combining both to extract semantic information from short Arabic text at the word and character levels. The models were trained and tested on a dataset of dialectal Arabic and Modern Standard Arabic corpora gathered from Twitter. The values obtained ranged from 69.7% to 89.7%. The ensemble model had the highest test-set accuracy, 96.7%.

In [8], the authors compared and evaluated various sentiment analysis models on Arabic tweets. The performance of four deep learning models (CNN, LSTM, BI-LSTM, and GRU) and a hybrid model (BI-LSTM + GRU) was evaluated empirically with three text representation techniques (AraVec, fastText, and AraBERT). The proposed hybrid model (BI-LSTM + GRU) with AraBERT achieved the best accuracy of these models, 0.9434. The examination of the model outputs shows that, for their dataset, the hybrid network achieves higher accuracy than the other models across the different word embeddings.

In [9], the authors presented a labeled corpus of 40k Arabic tweets on a variety of subjects, such as politics, sports, health, sarcasm, proverbs, and poetry. The article also applied three deep learning methods to the proposed corpus. In particular, it tested how well CNN, LSTM, and RCNN performed on the corpus.
With word embeddings as the input layer to the three models, the LSTM outperformed the CNN, which reached an accuracy of 75.72%, and the RCNN, which reached 78.46%. After a data augmentation strategy was applied to the corpus, the LSTM improved further, to an accuracy of 88.05%.

In [10], the authors addressed Arabic text sentiment analysis. The study exploits the performance gains a deep learning model can bring to an Arabic sentiment analysis system. To predict the sentiment of Arabic text, they employed the BI-LSTM deep learning model, which can extract contextual information. Experiments were run on six benchmark datasets to gauge how well the proposed methodology performs. The outcomes demonstrate the efficiency of BI-LSTM in handling both forward and backward dependencies in feature sequences when modeling sequential data and extracting contextual information. Comparisons with various state-of-the-art baseline techniques show that the deep learning model is generally more efficient and effective in terms of classification quality. The model also significantly outperforms the previous models in terms of accuracy and F1-measure.

In [11], the authors implemented an ensemble model based on the AraBERT and CAMeLBERT transformer language models. A balanced dataset of Modern Standard Arabic book reviews was used to evaluate the proposed ensemble model. The proposed model was also trained on the Twitter dataset, the Gold dataset, and the ASTD dataset in order to further examine its performance compared to the two independent transformer-based models and majority vote.

In [12], the authors carried out binary sentiment classification of Arabic text. They first applied preprocessing to clean up the incoming texts. The texts were then represented as numerical vectors by a word embedding layer and fed to an LSTM layer.
After that, a softmax layer was added to predict the text's polarity. The experiments achieved accuracies ranging from 80% to 82%, which are reasonably good results.

3 Deep learning

Deep learning techniques have produced a major breakthrough in artificial intelligence in general and in natural language processing in particular. Many deep learning techniques are used to analyze Arabic sentiment, including CNNs, RNNs (LSTM, BI-LSTM, GRU, and BI-GRU), and transformers, all of which have advanced the field of sentiment analysis.

CNN: Convolutional neural networks (CNNs) are feedforward neural networks that were initially created for computer vision applications [13, 14] and have since demonstrated success in NLP [6]. They use a layer that applies convolving filters locally. Convolution replaces the generic matrix multiplication that characterizes conventional neural networks. The resulting decrease in the number of weights, and hence in network complexity, makes the CNN one of the fastest-running deep learning algorithms. CNNs also have the benefit of requiring less preprocessing. This opened the way for their use in many other areas, including NLP and voice, handwriting, and image recognition.

LSTM: Long short-term memory (LSTM) networks, a form of recurrent neural network (RNN), are effective at learning tasks that involve sequential input. They handle such tasks by capturing long-range temporal dependencies. Owing to its more complex, repeated module structure, the LSTM is resistant to the optimization issues that affect the basic form of the RNN [15]. The basic building blocks of the LSTM architecture are a memory cell that maintains its state across time and nonlinear gating units that control the flow of information into and out of the cell [16].
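For reference, one common formulation of the LSTM cell is written out below. The symbols are notation assumed here rather than taken from the paper: $x_t$ is the input at time step $t$, $h_t$ the hidden state, $c_t$ the memory cell, $i_t$, $f_t$, and $o_t$ the gate activations, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $U$, $b$ learned weights and biases.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```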
The three most important gates are the input, forget, and output gates. The input block is linked to every gate as well as to the output block. Figure 1 illustrates the components of the LSTM framework.

Fig. 1. Component of the LSTM framework

Bi-LSTM: A bidirectional LSTM (BiLSTM) layer learns long-term bidirectional dependencies between the time steps of time series or sequence data. These dependencies can be useful [17, 18] when the network should learn from the full time series at each time step.

GRU: The gated recurrent unit (GRU) framework was proposed in 2014 by Cho et al. [19]. Like the LSTM, the GRU has gating units that control the flow of information. In the GRU, all memory content is fully exposed, in contrast to LSTM networks, where a gate limits how much of the memory may be used by other network units. It has been noted that the GRU outperforms the LSTM in all areas except language modeling [20]. Additionally, the performance gap between LSTM and GRU networks can be narrowed by initializing the forget gate bias of the LSTM to one. GRUs have already been applied to Arabic NLP tasks, notably in [21]. Figure 2 illustrates the components of the GRU framework.

Fig. 2. Component of the GRU framework

Bi-GRU: The GRU neural network uses recurrent structures to store and retrieve data over long periods of time, but because it only accesses past data, its performance may not be as strong in practice as it is in theory [22]. To get around this problem, the bidirectional GRU (Bi-GRU) network adds a second layer that processes the data sequence in the opposite direction. The network thus employs two hidden layers, both connected to the output layer, in order to gather information from both the past and the future [23].
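In symbols, a bidirectional GRU can be sketched as follows. The notation is assumed here, not taken from the paper: $\mathrm{GRU}(\cdot)$ stands for one GRU update, the arrows mark the forward and backward passes, and $[\,\cdot\,;\,\cdot\,]$ denotes concatenation. Concatenation is only one common way of joining the two directions; the paper does not state which merge is used.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t &= \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t+1}\right) \\
h_t &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right]
\end{aligned}
```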
These characteristics allow the bidirectional structure to help recurrent neural networks extract additional information, which increases the efficiency of the learning process [24, 25].

Transformer: The transformer (TRANS) was originally proposed in [26]. It consists of an encoder and a decoder. The encoder maps the input sequence into a higher-dimensional space, and the decoder then generates the output sequence from the mapped input. For translation tasks, it is reported to train far more quickly than recurrent and convolutional systems [26]. With the explicit goal of predicting words from context, transformers (a feedforward architecture) allow fast training on huge datasets. Even though creating such models is expensive, many of them have been made public and are useful for related fields such as SA. It has been suggested that smaller, supervised datasets can be used to fine-tune these models.

4 Proposed methodology

The proposed system consists of several stages. First, the dataset was chosen. This paper uses the HARD dataset, which contains two labels, positive and negative. Second, the dataset was preprocessed; the preprocessing procedures required to prepare the data for the sentiment analysis task are discussed below. Third, the appropriate embedding was chosen. In this paper, fastText embeddings were used with all deep learning techniques except AraBERT. Fourth and finally, various deep learning methods were used to complete the sentiment analysis task. AraBERT was used as the transformer model. For the CNNs, two models were built: an AlexNet-like model and the proposed CNN. For the RNNs, four types were built: LSTM, BI-LSTM, GRU, and BI-GRU. Finally, an ensemble model combining a CNN and an RNN was built, called the CNN with BI-GRU model.
Figure 3 illustrates all stages of the proposed system, and the deep learning models used in our evaluation are described below.

Fig. 3. Proposed methodology

4.1 Datasets

HARD. The dataset used is the Hotel Arabic Reviews Dataset (HARD) [27]. It contains 93700 Arabic-language hotel reviews, collected from the Booking.com website in June and July 2016. The reviews are written in both dialectal Arabic and Modern Standard Arabic. This study uses a balanced dataset (illustrated in Figure 4) with both positive and negative reviews. Ratings of 4 and 5 are mapped to the positive class, and ratings of 1 and 2 to the negative class. The dataset contains no neutral reviews. Table 1 gives the number of reviews per class. Table 2 shows some reviews from the HARD dataset.

Table 1. Statistics for the HARD dataset

Rating        Number of reviews   Class
1             14382               Negative
2             38467               Negative
4             26450               Positive
5             26399               Positive
All reviews   105698

Fig. 4. Balanced HARD dataset

Table 2. Sample of Hotel Arabic Review Dataset with English Translation

Rating   Review (with English translation)
1        "لا انصح". لم یعجبني شي. لم یعجبني شي
         "I do not recommend." I didn't like it. I didn't like it.
2        ضعیف. . عدم الأمان لأني فقدت مبلغ مالي وجوالي ولم یتم التعاون معي
         Weak. Insecurity, because I lost my money and my mobile phone and no cooperation was done with me.
4        "جید". ممتاز من جمیع النواحي. لا یوجد مواقف للسیارات
         "Good". Excellent in all aspects. There is no parking.
5        "ممتازة". الخدمة والغرفة واسعة و المنظر الرائع و الھدوء
         "Excellent". The service and the room are spacious, and the view is wonderful and quiet.

4.2 Preprocessing

Reviews contain many words that do not affect the sentiment, whether negative or positive. Removing them shortens the reviews and thus reduces the size of the word embedding input. The data was cleaned and prepared for processing using the following steps.

─ Step 1: Read the dataset and check for missing values.
─ Step 2: Keep the review and rating columns and drop the rest of the columns.
─ Step 3: Map each rating value to its class by converting the values 4 and 5 to positive and the values 1 and 2 to negative.
─ Step 4: Remove Arabic stop words, except for negation words.
─ Step 5: Apply the Arabic version of fastText.
─ Step 6: Remove diacritics and punctuation.
─ Step 7: Normalize Arabic by converting [ إأآا] to [ ا] and [ ي] to [ ى] and so on.
─ Step 8: Remove repeated characters, such as [ فنااااادق] to [فنادق].

4.3 CNN model

In this paper, two CNN models were built. The first, the proposed model, consists of three convolutional layers with one fully connected layer. The second was built like AlexNet and consists of five convolutional layers with three fully connected layers.

The first model has one fully connected layer after three convolutional layers. The first layer is a convolution layer that receives its input from the embedding layer. Its kernel size is 5, and it has 256 filters. It is followed by batch normalization, the ReLU activation function, and a MaxPooling layer (pool size 2, strides 2). The second layer is also a convolution layer and takes its input from the first layer. It has a kernel size of 4, 512 filters, and a stride of 1. It is followed by a MaxPooling layer (pool size 2, strides 2), batch normalization, and the ReLU activation function. The third layer is a convolution layer that takes its input from the layer before it. It has 1024 filters, a kernel size of 5, and a stride of 1, followed by batch normalization and the ReLU activation function.
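To make the shapes concrete, the short sketch below traces the sequence length through the three convolutional blocks just described. It is an illustration under stated assumptions, not the authors' code: the paper does not give the padding mode or the input length, so 'valid' (no) padding and a hypothetical padded review length of 100 tokens are assumed, and the helper names are invented for this example.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' (no) padding."""
    return (length - kernel_size) // stride + 1


def pool1d_out_len(length, pool_size=2, stride=2):
    """Output length of 1-D max pooling with 'valid' padding."""
    return (length - pool_size) // stride + 1


def trace_proposed_cnn(input_len):
    """Sequence length after each conv/pool stage of the three blocks."""
    lengths = [input_len]
    # Block 1: kernel size 5, stride 1, then pooling (pool 2, strides 2)
    lengths.append(conv1d_out_len(lengths[-1], kernel_size=5))
    lengths.append(pool1d_out_len(lengths[-1]))
    # Block 2: kernel size 4, stride 1, then pooling (pool 2, strides 2)
    lengths.append(conv1d_out_len(lengths[-1], kernel_size=4))
    lengths.append(pool1d_out_len(lengths[-1]))
    # Block 3: kernel size 5, stride 1, no pooling
    lengths.append(conv1d_out_len(lengths[-1], kernel_size=5))
    return lengths


INPUT_LEN = 100  # hypothetical padded review length (assumption)
lengths = trace_proposed_cnn(INPUT_LEN)
flattened_features = lengths[-1] * 1024  # the third block has 1024 filters
print(lengths)             # [100, 96, 48, 45, 22, 18]
print(flattened_features)  # 18432
```

Under these assumptions, the flattened output of the third block would hold 18 x 1024 = 18432 values; a different padding mode or input length changes these numbers but not the method.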
The fourth layer is the fully connected block, made up of a Dense layer (200 units), batch normalization, the ReLU activation function, and a Dropout layer to avoid overfitting. It takes its input from the flatten layer that follows the third layer. The last layer, the output layer, uses softmax activation. Figure 5 displays the proposed CNN model.

Fig. 5. Proposed Model of CNN

The second model, which resembles AlexNet, has three fully connected layers after five convolutional layers. The first layer is a convolution layer that receives its input from the embedding layer. It has 96 filters, a kernel size of 11, and a stride of 4, and is followed by batch normalization, the ReLU activation function, and a MaxPooling layer (pool size 2, strides 2). The second layer is likewise a convolution layer that takes its input from the first layer. It has 256 filters, a kernel size of 5, and a stride of 1, followed by batch normalization, the ReLU activation function, and MaxPooling (pool size 2, strides 2). The third layer is a convolution layer that receives its input from the second layer. It has 384 filters, a kernel size of 3, and a stride of 1, followed by batch normalization and the ReLU activation function. The fourth layer is also a convolution layer, which receives its input from the preceding layer. It has 384 filters, a kernel size of 3, and a stride of 1, followed by batch normalization and the ReLU activation function. The fifth layer is also a convolution layer, which receives its input from the preceding layer. It has 256 filters, a kernel size of 3, and a stride of 1.
Batch normalization, the ReLU activation function, and a MaxPooling layer (pool size 2, strides 2) follow this layer. The sixth layer, the first of the fully connected layers, is composed of a Dense layer (4096 units), followed by batch normalization, the ReLU activation function, and a Dropout layer to prevent overfitting. It receives its input from the flatten layer that follows the fifth layer. The seventh layer likewise comprises a Dense layer (4096 units), batch normalization, the ReLU activation function, and a Dropout layer to prevent overfitting. The eighth layer consists of a Dense layer (1000 units), batch normalization, the ReLU activation function, and a Dropout layer to prevent overfitting. The last layer, the output layer, uses the softmax activation function. The AlexNet-like model is shown in Figure 6.

Fig. 6. Proposed model similar to AlexNet

4.4 RNN model

Four RNN models were built. All share the same structure except for the recurrent layer, which is exactly one of LSTM, Bi-LSTM, GRU, or Bi-GRU. The recurrent layer (256 units) takes its input from the embedding layer. It is followed by a Dropout layer to avoid overfitting and then by a MaxPooling layer (pool size 2, strides 2). Next come the fully connected layers: a Dense layer (128 units), a ReLU activation function, and a Dropout layer to prevent overfitting. This Dropout layer is followed by a Dense layer (32 units), which is then followed by a Dense layer (64 units), a ReLU activation function, and finally a Dropout layer. The last layer is the output layer, which uses softmax activation. Figure 7 shows the RNN model.

Fig. 7.
RNN Model

4.5 Ensemble CNN with BI-GRU model

In this model, a CNN is integrated with a BI-GRU. The CNN part starts with a convolution layer (100 filters, kernel size 3, stride 1), followed by a MaxPooling layer (pool size 2, strides 2) and then a Dropout layer. Next come the fully connected layers, which take their input from the flatten layer and consist of a Dense layer (100 units) followed by a Dropout layer. The second part consists of a BI-GRU layer (256 units) followed by a Dropout layer. The two parts are then merged into one model. Figure 8 shows the Ensemble CNN with BI-GRU model.

Fig. 8. Ensemble CNN with BI-GRU model

4.6 AraBERT model

This model uses bert-base-arabertv02-twitter. Emojis have been added to the model's vocabulary, along with common words that were not previously present. AraBERTv0.2-Twitter-base/large are new models for Arabic dialects and tweets, developed by continuing pre-training on about 60 million Arabic tweets (filtered from a collection of 100M) using the MLM task. Figure 9 shows the AraBERT model.

Fig. 9. AraBERT Model

5 Experimental results

The system was evaluated by calculating its accuracy, precision, recall, and f1-score. According to the evaluation of the models, the accuracy, precision, recall, and f1-score of the AraBERT model are 96.442%, 95.5%, 97.3%, and 96.4%, respectively. The confusion matrix is displayed in Figure 10 (cells ordered TN, FP, FN, TP). Table 3 displays the classification report of AraBERT.

Fig. 10. Confusion matrix (AraBERT)

Table 3.
Classification report of AraBERT

            Precision   Recall   f1-score
Negative    0.97        0.96     0.96
Positive    0.96        0.97     0.96
Accuracy    -           -        0.96
macro avg   0.96        0.96     0.96

For the AlexNet-like CNN model, the accuracy is 93.78%, precision is 90.073%, recall is 96.5%, and the f1-score is 93.182%. The confusion matrix is displayed in Figure 11 (cells ordered TN, FP, FN, TP). Table 4 displays the classification report of the AlexNet-like model.

Fig. 11. Confusion matrix (AlexNet)

Table 4. Classification report of CNN Like AlexNet model

            Precision   Recall   f1-score
Negative    0.96        0.89     0.93
Positive    0.90        0.97     0.93
Accuracy    -           -        0.93
macro avg   0.93        0.93     0.93

For the proposed CNN model, the accuracy is 94.43%, precision is 93.352%, recall is 95.571%, and the f1-score is 94.448%. The confusion matrix is displayed in Figure 12 (cells ordered TN, FP, FN, TP). Table 5 displays the classification report of the proposed CNN.

Fig. 12. Confusion matrix (proposed CNN)

Table 5. Classification report of proposed CNN model

            Precision   Recall   f1-score
Negative    0.96        0.93     0.94
Positive    0.93        0.96     0.94
Accuracy    -           -        0.94
macro avg   0.94        0.94     0.94

For the proposed LSTM model, the accuracy is 95%, precision is 94.259%, recall is 95.752%, and the f1-score is 95%. The confusion matrix is displayed in Figure 13 (cells ordered TN, FP, FN, TP). Table 6 displays the classification report of LSTM.

Fig. 13. Confusion matrix (LSTM)

Table 6. Classification report of proposed LSTM model

            Precision   Recall   f1-score
Negative    0.96        0.94     0.95
Positive    0.94        0.96     0.95
Accuracy    -           -        0.95
macro avg   0.95        0.95     0.95

The accuracy, precision, recall, and f1-score of the proposed BI-LSTM model are 95.11%, 94.937%, 95.218%, and 95.077%, respectively.
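As a consistency check on these figures, the f1-score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below does this for the BI-LSTM numbers and also shows how all four metrics follow from confusion-matrix counts. The function names are invented for this example, and the counts in the second part are hypothetical values for illustration only, not the paper's confusion matrix.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the positive class."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = f1_from_precision_recall(precision, recall)
    return accuracy, precision, recall, f1


# Reported BI-LSTM precision and recall, expressed as fractions.
bilstm_f1 = f1_from_precision_recall(0.94937, 0.95218)
print(round(bilstm_f1 * 100, 3))  # 95.077

# Hypothetical counts (assumption, not taken from the paper's figures).
accuracy, precision, recall, f1 = metrics_from_counts(tn=90, fp=10, fn=5, tp=95)
print(accuracy, recall)  # 0.925 0.95
```

The recomputed value agrees with the reported f1-score of 95.077%. The sketch does not guard against empty classes, where a denominator would be zero.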
The confusion matrix is displayed in Figure 14 (cells ordered TN, FP, FN, TP). Table 7 displays the classification report of BI-LSTM.

Fig. 14. Confusion matrix (BI-LSTM)

Table 7. Classification report of proposed BI-LSTM model

            Precision   Recall   f1-score
Negative    0.95        0.95     0.95
Positive    0.95        0.95     0.95
Accuracy    -           -        0.95
macro avg   0.95        0.95     0.95

The accuracy, precision, recall, and f1-score of the proposed GRU model are 95.07%, 94.199%, 95.593%, and 95.068%, respectively. The confusion matrix is displayed in Figure 15 (cells ordered TN, FP, FN, TP). Table 8 displays the classification report of GRU.

Fig. 15. Confusion matrix (GRU)

Table 8. Classification report of proposed GRU

            Precision   Recall   f1-score
Negative    0.96        0.94     0.95
Positive    0.94        0.96     0.95
Accuracy    -           -        0.95
macro avg   0.95        0.95     0.95

For the proposed BI-GRU model, the accuracy is 95.02%, precision is 93.89%, recall is 96.22%, and the f1-score is 95.041%. The confusion matrix is displayed in Figure 16 (cells ordered TN, FP, FN, TP). Table 9 displays the classification report of BI-GRU.

Fig. 16. Confusion matrix (BI-GRU)

Table 9. Classification report of proposed BI-GRU model

            Precision   Recall   f1-score
Negative    0.96        0.94     0.95
Positive    0.94        0.96     0.95
Accuracy    -           -        0.95
macro avg   0.95        0.95     0.95

For the proposed Ensemble CNN with BI-GRU model, the accuracy is 94.52%, precision is 95.4%, recall is 93.452%, and the f1-score is 94.416%. The confusion matrix is displayed in Figure 17 (cells ordered TN, FP, FN, TP). Table 10 displays the classification report of the proposed Ensemble CNN with BI-GRU.

Fig. 17. Confusion matrix (Ensemble CNN with BI-GRU)

Table 10.
Classification report of proposed Ensemble CNN with BI-GRU

            Precision   Recall   f1-score
Negative    0.94        0.96     0.95
Positive    0.95        0.93     0.94
Accuracy    -           -        0.95
macro avg   0.95        0.95     0.95

The results achieved by the different models compare as follows. AraBERT obtained the highest accuracy, 96.442%, with closely matching precision, recall, and f1-score values. The RNN techniques followed: the LSTM accuracy reached 95%, the BI-LSTM accuracy 95.11%, the GRU accuracy 95.07%, and the BI-GRU accuracy 95.02%. The Ensemble CNN with BI-GRU achieved an accuracy of 94.52%. Finally, the CNN techniques achieved lower accuracy: 94.43% for the proposed CNN model and 93.78% for the AlexNet-like CNN model.

Fig. 18. Comparison of different models

6 Conclusion

In this study, a detailed comparison of deep learning methods for Arabic sentiment analysis was carried out. This comparison is the first of its kind, because other studies took only some of the strategies considered here into account. The most important feature of the fastText representation is that it processes big data quickly and reliably. Because of the power of fastText embeddings, the CNN and RNN techniques showed good results. The proposal used various models, such as AraBERT, CNN, and RNN. AraBERT achieved the highest accuracy, 96.442%, because it is a language model pre-trained on Arabic. The RNN techniques followed: the LSTM accuracy reached 95%, the BI-LSTM accuracy 95.11%, the GRU accuracy 95.07%, and the BI-GRU accuracy 95.02%. The Ensemble CNN with BI-GRU achieved an accuracy of 94.52%. Finally, the CNN techniques achieved lower accuracy.
In the proposed CNN model, the ac- curacy is 94.43%, while in CNN, like the AlexNet model, the accuracy is 93.78%. The results showed that the like AlexNet model did not achieve high accuracy because the AlexNet was originally designed to process images and was not allocated to texts, but our work used it in the text. In future work, we suggest using an ensemble model be- tween transformer techniques and also suggest adding top layers to AraBERT and freezing some layers that are not very useful, which reduces time. 7 References [1] B. Brahimi, M. Touahria, and A. Tari, “Improving sentiment analysis in Arabic: A combined approach,” J. King Saud Univ. - Comput. Inf. Sci., vol. 33, no. 10, pp. 1242–1250, 2021. https://doi.org/10.1016/j.jksuci.2019.07.011 [2] J. K. Alwan, A. J. Hussain, D. H. Abd, A. T. Sadiq, M. Khalaf, and P. Liatsis, “Political Arabic articles orientation using rough set theory with sentiment lexicon,” IEEE Access, vol. 9, pp. 24475–24484, 2021. https://doi.org/10.1109/ACCESS.2021.3054919 iJIM ‒ Vol. 17, No. 09, 2023 85 https://doi.org/10.1016/j.jksuci.2019.07.011 https://doi.org/10.1109/ACCESS.2021.3054919 Paper—Evaluation of Hotel Performance with Sentiment Analysis by Deep Learning Techniques [3] A. Khalid Al-Mashhadany, A. T. Sadiq, S. Mazin Ali, and A. Abbas Ahmed, “Healthcare assessment for beauty centers using hybrid sentiment analysis,” Indones. J. Electr. Eng. Comput. Sci., vol. 28, no. 2, p. 890, 2022. https://doi.org/10.11591/ijeecs.v28.i2.pp890-897 [4] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning word vectors for 157 languages,” arXiv [cs.CL], 2018. [5] R. I. Farhan, A. T. Maolood, and N. Hassan, “Hybrid feature selection approach to improve the deep neural network on new flow-based dataset for NIDS,” wjcm, vol. 1, no. 1, pp. 66– 83, 2021. https://doi.org/10.31185/wjcm.Vol1.Iss1.10 [6] A. H. Ombabi, W. Ouarda, and A. M. 
Alimi, “Deep learning CNN–LSTM framework for Arabic sentiment analysis using textual information shared in social networks,” Soc. Netw. Anal. Min., vol. 10, no. 1, 2020. https://doi.org/10.1007/s13278-020-00668-1
[7] A. Alwehaibi, M. Bikdash, M. Albogmi, and K. Roy, “A study of the performance of embedding methods for Arabic short-text sentiment analysis using deep learning approaches,” J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 8, pp. 6140–6149, 2022. https://doi.org/10.1016/j.jksuci.2021.07.011
[8] N. Habbat, H. Anoun, and L. Hassouni, “A Novel Hybrid Network for Arabic Sentiment Analysis using fine-tuned AraBERT model,” Int. J. Electr. Eng. Inform., vol. 13, no. 4, pp. 801–812, 2021. https://doi.org/10.15676/ijeei.2021.13.4.3
[9] A. Mohammed and R. Kora, “Deep learning approaches for Arabic sentiment analysis,” Soc. Netw. Anal. Min., vol. 9, no. 1, 2019. https://doi.org/10.1007/s13278-019-0596-4
[10] H. Elfaik and E. H. Nfaoui, “Deep Bidirectional LSTM Network learning-based Sentiment Analysis for Arabic text,” J. Intell. Syst., vol. 30, no. 1, pp. 395–412, 2020. https://doi.org/10.1515/jisys-2020-0021
[11] I. E. Karfi and S. E. Fkihi, “An ensemble of Arabic transformer-based models for Arabic sentiment analysis,” Int. J. Adv. Comput. Sci. Appl., vol. 13, no. 8, 2022. https://doi.org/10.14569/IJACSA.2022.0130865
[12] A. Q. Al-Bayati, A. S. Al-Araji, and S. H. Ameen, “Arabic Sentiment Analysis (ASA) using deep Learning approach,” J. Eng., vol. 26, no. 6, pp. 85–93, 2020. https://doi.org/10.31026/j.eng.2020.06.07
[13] Y. Lecun, “A theoretical framework for back-propagation,” in Proceedings of the 1988 Connectionist Models Summer School, CMU, Pittsburg, PA, Oxford, England: Morgan Kaufmann, 1988, pp. 21–28.
[14] J. Q. Kadhim, I. A. Aljazaery, and H. T. H. S. ALRikabi, “Enhancement of online education in engineering college based on mobile wireless communication networks and IOT,” Int. J. Emerg. Technol. Learn., vol. 18, no. 01, pp. 176–200, 2023.
https://doi.org/10.3991/ijet.v18i01.35987
[15] J. Schmidhuber, “Deep learning in neural networks: an overview,” Neural Netw., vol. 61, pp. 85–117, 2015. https://doi.org/10.1016/j.neunet.2014.09.003
[16] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222–2232, 2017. https://doi.org/10.1109/TNNLS.2016.2582924
[17] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 13–15 May 2010, vol. 9, pp. 249–256.
[18] H. T. S. Alrikabi and H. Tuama Hazim, “Secure chaos of 5G wireless communication system based on IOT applications,” Int. J. Onl. Eng., vol. 18, no. 12, pp. 89–105, 2022. https://doi.org/10.3991/ijoe.v18i12.33817
[19] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, “On the properties of neural machine translation: Encoder–decoder approaches,” in Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014.
https://doi.org/10.3115/v1/W14-4012
[20] R. Jozefowicz, W. Zaremba, and I. Sutskever, “An empirical exploration of recurrent network architectures,” in Proceedings of the 32nd International Conference on Machine Learning, 07–09 Jul 2015, vol. 37, pp. 2342–2350.
[21] S. Al-Azani and E.-S. El-Alfy, “Emojis-based sentiment classification of Arabic microblogs using deep recurrent neural networks,” in 2018 International Conference on Computing Sciences and Engineering (ICCSE), 2018. https://doi.org/10.1109/ICCSE1.2018.8374211
[22] Y. Deng, H. Jia, P. Li, X. Tong, X. Qiu, and F. Li, “A deep learning methodology based on bidirectional gated recurrent unit for wind power prediction,” in 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2019. https://doi.org/10.1109/ICIEA.2019.8834205
[23] X. Luo, W. Zhou, W. Wang, Y. Zhu, and J. Deng, “Attention-based relation extraction with bidirectional gated recurrent unit and highway network in the analysis of geological data,” IEEE Access, vol. 6, pp. 5705–5715, 2018. https://doi.org/10.1109/ACCESS.2017.2785229
[24] D. Zhang, L. Tian, M. Hong, F. Han, Y. Ren, and Y. Chen, “Combining convolution neural network and bidirectional gated recurrent unit for sentence semantic classification,” IEEE Access, vol. 6, pp. 73750–73759, 2018. https://doi.org/10.1109/ACCESS.2018.2882878
[25] A. Saleh Hussein, R. Salah Khairy, S. M. Mohamed Najeeb, and H. T. S. Alrikabi, “Credit card fraud detection using fuzzy rough nearest neighbor and sequential minimal optimization with logistic regression,” Int. J. Interact. Mob. Technol., vol. 15, no. 05, p. 24, 2021. https://doi.org/10.3991/ijim.v15i05.17173
[26] A. Vaswani et al., “Attention is all you need,” arXiv [cs.CL], 2017.
[27] A. Elnagar, Y. S. Khalifa, and A. Einea, “Hotel Arabic-reviews dataset construction for sentiment analysis applications,” in Intelligent Natural Language Processing: Trends and Applications, Cham: Springer International Publishing, 2018, pp. 35–52.
https://doi.org/10.1007/978-3-319-67056-0_3

8 Authors

Rafeef Abd Al-Ameer obtained a Bachelor's degree in Computer Science from the University of Baghdad in 2014. She is an M.Sc. student at the University of Technology (UOT), Iraq.

Wael J. Abed is a Prof. Dr. in the Computer Techniques Engineering Department, Al-Mustaqbal University College.

Ahmed T. Sadiq is a Professor in the Computer Science Department, University of Technology, Iraq. He received B.Sc., M.Sc., and Ph.D. degrees in Computer Science from the University of Technology.

Article submitted 2023-02-10. Resubmitted 2023-03-28. Final acceptance 2023-03-29. Final version published as submitted by the authors.