Machine Learning-Based Fake News Detection with Amalgamated Feature Extraction Method

Muhammad Bux Alvi 1*, Majdah Alvi 1, Rehan Ali Shah 1, Adnan Akhter 1, Mubashira Munir 1, Rakesh Kumar 2, Kavita Tabbassum 3

1 Department of Computer Systems Engineering, Faculty of Engineering, The Islamia University of Bahawalpur, Pakistan
2 Freelancer and Researcher
3 Information Technology Center, Sindh Agriculture University Tandojam, Sindh, Pakistan
* Corresponding Author: mbalvi@iub.edu.pk

Abstract: Fake product reviews are increasing as the trend shifts toward online sales and purchases. Fake review detection is critical and challenging for both researchers and online retailers. As new techniques are introduced to catch non-organic reviewers, their evasion approaches evolve in turn. In this paper, different features are amalgamated along with sentiment scores to design a model whose performance is checked under different classifiers. For this purpose, six supervised learning algorithms are utilized to build fake review detection models using LIWC, unigram, and sentiment score features. Results show that the amalgamation of selected features is a better approach to counterfeit review detection, achieving an accuracy score of 88.76%, which is promising compared with similar work.

Keywords: fake reviews, machine learning, amalgamated features, LIWC, sentiment score

1 Introduction

A fake review is a false judgment or an opinionated text about a product or a service. Reviews can significantly affect buyers' decisions when shopping online. According to Statista, e-commerce sales in the United States increased by 6% from 2013 to 2020 [1]. As online purchasing grows, so does the competition among online retail giants, so retailers and manufacturers take these reviews seriously. Fake reviewers capitalize on this opportunity to artificially devalue or promote products and services [2][3]. Hence, fake review prediction has become a critical research area as online purchases increase. With the explosive growth of online businesses, the quantity and importance of reviews continue to grow, and fake reviews severely threaten researchers [4] and online retailers [5]. Fake reviews can be positive, manipulating users into purchasing more on an online platform, or negative, deterring purchasers. It is estimated that 80% of users trust posted product reviews before purchasing any product [6]. Negative fake reviews are used to defame competitors' reputations; the people who post them are usually freelancers whose services companies hire for writing fake reviews. Giant retailers like Amazon consider these fake reviews a severe threat to their reputation and have filed complaints against review spamming [7].

Fake review prediction can be performed manually or automatically. Research on manual opinion spam prediction has been carried out for several years [8]. Early methods of fake review prediction were rudimentary.
Many text analysis-based approaches are found in the literature [9]. Building on this research, commercial platforms developed opinion spam filtering systems to detect deceptive reviews. Nevertheless, these systems pushed fake reviewers to enhance their review quality and deceive the detection systems [10]. As time elapsed, the traditional approaches stopped working efficiently because fake reviewers started behaving like regular users.

Therefore, the focus of manual fake review prediction shifted from text-based analysis to pattern and feature analysis, such as time [11], topics [12], ranking patterns [13], activity volume [14], and geolocation [15]. However, manual methods are slow, expensive, and of low accuracy. Automated methods based on machine learning can also identify opinion spam and spammers by analyzing review features. Text mining and natural language processing (NLP) together give rise to the concept of content mining, and review spam detection falls under this concept. Additional review characteristics, such as review timing, reviewer id, and the deviation of a review from other reviews in the same category, are also considered in spam review detection. Jindal and Liu [16] used machine learning techniques and showed that an amalgam of features is more robust than a single feature for fake review prediction. Li et al. [17] showed that combining a bag of words (BOW) with more general features performs better than BOW alone. Mukherjee et al. [18] used machine learning with abnormal behavioral features of reviewers and showed that this technique outperforms techniques based on linguistic features.

The significant contribution of this research is to develop a fake review detection model that uses machine learning techniques and employs a heuristic optimization algorithm for the contributing features, and to test its reliability and robustness against existing techniques. Such a model, when employed, can help retailers and large businesses shield themselves against fake reviews and reviewers.

2 Literature Review

Researchers have advanced fake review detection by introducing new techniques and methods that improve accuracy and performance. So far, reviews have been marked as spam based on either review spam detection or reviewer spam detection; both techniques are helpful in fake review detection. The former deals with content mining and natural language processing (NLP), whereas the latter is applied to the reviewer's identity and behavior. Jindal and Liu [16] were the first to study opinion spamming using supervised learning. The authors divided reviews into three categories (fake opinions, brand-only reviews, and non-reviews) and detected opinion spamming by finding duplicate reviews using the w-shingling method. They used a dataset from Amazon with more than 5 million product reviews, applied their technique with a logistic regression algorithm, and achieved an AUC of 78%. Lim, Nguyen, Jindal, Liu, and Lauw [19] proposed a behavioral methodology for revealing review spammers. They identified spammer habits, such as targeting specific goods to maximize their effect, and suggested a model focused on specific rating patterns to identify rating spammers. Ott et al. [20][21] created a data set for review spam detection analysis.
The data set comprises positive opinion spam paired with truthful positive reviews and negative opinion spam paired with truthful negative reviews. The authors applied n-gram and linguistic features to find fake reviews under a supervised learning mechanism, and the results were verified against human performance. Feng et al. [4] framed a model based on the normal distribution of opinions to detect fake reviews; in their view, the reviews of a product or a service follow such a distribution. Shojaee et al. [9] suggested a novel technique for fake review detection by combining lexical and syntactic features. Elmurngi and Gherbi [22] proposed a text classification and sentiment analysis approach for different machine learning algorithms, evaluated both with and without stop words, and also applied a decision tree algorithm to improve their results. Shah, Ahsan, Kafi, Nahian, and Hossain [23] combined supervised and active learning to create a model for detecting spamming, using both fictitious and real-life data for spam analysis. Table I summarizes these state-of-the-art techniques.

TABLE I. STATE-OF-THE-ART SPAM REVIEW DETECTION TECHNIQUES

Reference | Year | Data set | Learning type | Technique / Algorithm | Results | Limitations
[4] Feng et al. | 2012 | Ott et al. data set with modification | Supervised learning | LIBSVM classifier / term frequency | Accuracy 72.5% | Specific kind of dataset
[9] Shojaee et al. | 2013 | Ott et al. data set | Supervised learning | SVM / Naïve Bayes / stylometric features | F-measure 84% | Limited to a specific domain
[16] Jindal and Liu | 2007 | Manufactured-product data set only | Supervised learning | Logistic regression | Average AUC 78% | Lacks accuracy on a real-world data set
[19] Lim et al. | 2010 | Amazon data set | Supervised learning | Behavioral features of spammers | Accuracy 78% | Limited data set for supervised learning
[21] Ott et al. | 2013 | Ott et al. data set | Supervised learning | Support vector machine (SVM) | Accuracy 86% | Human judgments can be imperfect and biased
[22] Elmurngi and Gherbi | 2017 | Movie review data set | Supervised learning | DT (J48) / SVM / KNN | Accuracy 81.75% | Feature selection methods not used
[23] Ahsan et al. | 2017 | Ott data set | Active / supervised learning | Hybrid classifier using NB / SVC / DT / maximum entropy | Accuracy 95% | Small-scale dataset used for a specific domain

3 Proposed Approach

This section describes the proposed method for accomplishing fake review detection. This research uses two features as classification criteria together with a sentiment score feature (an additional feature). These individual features and their combinations are used to train various classifiers, which are then tested against evaluation metrics. Reviews are classified as fake or not fake. This study uses six classification algorithms: Naive Bayes, decision tree, instance-based KNN, support vector machine (SVM), logistic regression, and random forest. 80% of the data is used for training, and 20% is set aside for testing, with a 5-fold cross-validation technique. Figure 1 presents the research method adopted for this work.

3.1 Data Acquisition and Pre-processing

The data set selected for this research contains 1600 reviews combined from two hotel review data sets. The data sets were created by Ott et al. and are available from [20][21]. The combined data set contains eight hundred truthful reviews, of which four hundred are positive and four hundred are negative. Similarly, it includes 800 spam reviews, of which half are positive and half are negative. Preprocessing of the data set significantly affects the accuracy of the results [24][25][26]; it also curbs the feature vector space. Therefore, preprocessing techniques such as missing value management, tokenization, stop word removal, and n-gram generation are applied to obtain a cleaner data set.

3.2 Features

Features are pieces of text that have semantic significance. In text data systems, features strongly influence the effectiveness of the developed model.

Fig. 1. Proposed Machine Learning Approach with Amalgamated Features for Fake Reviews Detection

3.2.1 N-Grams

In this feature extraction method, n adjacent tokens are picked from the review contents as a feature: a unigram if a single token is selected, a bigram if two adjacent words are selected, and a trigram with three adjacent words at a time. These features can effectively help model all the content within the text. In this research work, unigrams are used as features.
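As an illustration of the unigram feature extraction described above, the following minimal sketch builds a unigram count matrix with scikit-learn. It is not the authors' exact pipeline; the file name and column names (hotel_reviews.csv, review, label) are assumptions for illustration.

```python
# Minimal unigram feature extraction sketch (not the authors' exact pipeline).
# Assumes a CSV with "review" and "label" columns; adjust names to your data.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

reviews = pd.read_csv("hotel_reviews.csv")  # hypothetical file name

# ngram_range=(1, 1) keeps unigrams only; English stop words are dropped,
# mirroring the preprocessing described in Section 3.1.
vectorizer = CountVectorizer(ngram_range=(1, 1), stop_words="english")
X_unigram = vectorizer.fit_transform(reviews["review"])

print(X_unigram.shape)                          # (n_reviews, vocabulary_size)
print(vectorizer.get_feature_names_out()[:10])  # first few unigram features
```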
3.2.2 LIWC

Linguistic Inquiry and Word Count (LIWC) is a text analysis method. It can analyze eighty different features covering, for example, psychological concerns such as emotion, functional aspects of the text, and personal and perceptual concerns such as religion [27].

3.2.3 Sentiment Score

It has been observed that spammers writing negative reviews generally use more negative words, such as "bad" and "dissatisfied", so the degree of negative sentiment is higher than in a non-spam negative review. Likewise, spammers writing positive reviews generally use positive terms such as "good", "great", "nice", and "gorgeous", and therefore show more positive sentiment than a non-spam positive review. The sentiment score of a review can be calculated by the following formula [28]:

$$ SC(rt) = \sum_{i} \frac{(-1)^{n}\, S(W_i)}{\mathrm{Distance}(fet_i, W_i)} \quad (1) $$

where rt is the review text, S(W_i) is the sentiment polarity of word W_i (+1 or -1), n denotes the number of negation words applying to a feature (default 0), fet_i refers to a feature in a review sentence, and Distance(fet_i, W_i) is the distance between the feature and the word.
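The following toy sketch shows one way to read Equation 1 in code, under stated assumptions: a tiny +/-1 polarity lexicon, a small negation word list, and token-index distance. The authors' actual lexicon and distance measure are not specified in the paper.

```python
# Toy sketch of Equation 1: sum of (-1)^n * S(Wi) / Distance(fet, Wi)
# over opinion words Wi, where n counts negations between feature and word.
POLARITY = {"good": 1, "great": 1, "nice": 1, "gorgeous": 1,
            "bad": -1, "dissatisfied": -1}   # illustrative lexicon, not LIWC
NEGATIONS = {"not", "no", "never"}           # illustrative negation list

def sentiment_score(tokens, feature):
    """Compute SC(rt) for one feature term using token-index distance."""
    fet_pos = tokens.index(feature)
    score = 0.0
    for i, word in enumerate(tokens):
        if word in POLARITY and i != fet_pos:
            # n = negation words strictly between the feature and the word
            between = tokens[min(i, fet_pos) + 1 : max(i, fet_pos)]
            n = sum(1 for t in between if t in NEGATIONS)
            score += ((-1) ** n) * POLARITY[word] / abs(i - fet_pos)
    return score

# "not good" flips polarity: the negated positive word yields a negative score.
print(sentiment_score("the room was not good".split(), "room"))  # -0.333...
```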
3.3 Classification Algorithms

Six classification algorithms are used in this paper to determine the effect of the different features and their combinations on classification accuracy and performance.

3.3.1 Naïve Bayes (NB)

NB is based on the Bayes theorem [29]. It is a probabilistic multiclass classification algorithm that assumes feature independence to predict the output class. Under this assumption, the joint probability of the feature set factorizes as in Equation 2:

$$ P(x) = P(x_1)\, P(x_2)\, P(x_3) \cdots P(x_n) \quad (2) $$

where x = (x_1, x_2, …, x_n) is the set of features. The probability of the feature set belonging to a particular class C_k may be calculated as given in Equation 3:

$$ P(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)} \quad (3) $$

3.3.2 Decision Tree (DT)

The working principle of DT is a hierarchical breakdown of the training data set. In this classifier, features label the tree nodes, the branches between them carry weights representing the occurrence of the feature in the test data, and class names are assigned to the leaves. The data set is divided by the presence or absence of features, recursively, until the leaf nodes are reached. Splits are chosen using the entropy formula:

$$ \mathrm{Entropy} = -\sum_{j=1}^{m} P_{ij} \log_2 P_{ij} \quad (4) $$

3.3.3 Random Forest (RF)

RF is a voting method in which many decision trees are grown simultaneously. The input features are fed to the individual trees in the forest, and the final classification is based on the majority vote across all trees [30]. The mean squared error of a random forest is calculated as:

$$ \mathrm{MSE} = \frac{1}{S} \sum_{s=1}^{S} (f_s - y_s)^2 \quad (5) $$

where S denotes the number of data points, f_s is the value returned by the model, and y_s is the actual value of the data point.

3.3.4 Support Vector Machine (SVM)

SVM is a classification algorithm that finds the maximum-margin hyperplane separating the feature vectors X_i of the two classes, where the target y_i takes the value 0 or 1.

3.3.5 K-Nearest Neighbor (KNN)

KNN is an instance-based algorithm built on the assumption that similar things exist in close proximity. A sample is classified by the plurality vote of its neighbors, determined by computing their distances using the Euclidean distance formula:

$$ D = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \quad (6) $$

3.3.6 Logistic Regression

Logistic regression is a model-based algorithm often used when the dependent variable is dichotomous, although it can be tuned for multiclass classification tasks as well. Logistic regression describes the data set and defines the relationship between one dependent binary variable and one or more independent variables.

3.4 Testing Metrics

Accuracy, precision, recall, and F-score are used to evaluate model performance. These metrics are defined as:

$$ \mathrm{Accuracy} = \frac{TuP + TuN}{TuP + TuN + FaP + FaN} \quad (7) $$

$$ \mathrm{Precision} = \frac{TuP}{TuP + FaP} \quad (8) $$

$$ \mathrm{Recall} = \frac{TuP}{TuP + FaN} \quad (9) $$

$$ \mathrm{F\text{-}Score} = \frac{2 \times (\mathrm{Recall} \times \mathrm{Precision})}{\mathrm{Recall} + \mathrm{Precision}} \quad (10) $$

where TuN, TuP, FaN, and FaP denote true negatives, true positives, false negatives, and false positives, respectively.
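To make the protocol of Sections 3.3 and 3.4 concrete, the sketch below amalgamates the three feature blocks, holds out 20% of the data for testing, and scores the six classifiers with 5-fold cross-validation. It is a minimal sketch, not the authors' implementation: it continues from the unigram sketch above (X_unigram, reviews), while X_liwc and sentiment are random stand-ins for LIWC outputs and Equation 1 scores, and the label encoding is an assumption.

```python
# Sketch of the evaluation protocol: stack the feature blocks, split 80/20,
# and score six classifiers with 5-fold CV plus precision/recall/F1 on test.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import minmax_scale
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n_docs = X_unigram.shape[0]
X_liwc = rng.random((n_docs, 80))    # stand-in for the 80 LIWC features
sentiment = rng.normal(size=n_docs)  # stand-in for Equation 1 scores

# MultinomialNB requires non-negative inputs, so the sentiment column is
# min-max scaled into [0, 1] before the three blocks are stacked.
X = hstack([X_unigram,
            csr_matrix(X_liwc),
            csr_matrix(minmax_scale(sentiment).reshape(-1, 1))], format="csr")
y = reviews["label"].to_numpy()      # assumed encoding: 1 = fake, 0 = truthful

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": LinearSVC(),
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, clf in classifiers.items():
    cv_acc = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")
    clf.fit(X_train, y_train)
    print(f"{name}: 5-fold CV accuracy {cv_acc.mean():.4f}")
    print(classification_report(y_test, clf.predict(X_test)))
```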
4 Experimental Results, Discussion and Evaluation

This section presents the experimental results, discusses them, and evaluates the developed model quantitatively. Six machine learning algorithms (Naïve Bayes, decision tree, random forest, SVM, K-nearest neighbor, and logistic regression) were used to develop the models with three feature extraction techniques (LIWC, n-gram (unigram), and sentiment score).

The results of the individual features and their combinations are shown in Table II. An accuracy of 63.22% is achieved when LIWC is used alone, rising to 70.35% when combined with the sentiment score. A classification model using the unigram feature alone gives an accuracy of 73.55%, which increases to 80.34% when combined with the sentiment score. The maximum accuracy of 88.76% is attained by combining LIWC, unigrams, and sentiment scores. This study therefore supports the initial hypothesis that amalgamated features improve the results of machine-learning-based classification models.

Fig. 2. Quantitative Comparison with Previous Similar Work

4.1 Qualitative Evaluation

To strengthen the hypothesis, Figure 2 shows a quantitative comparison of previous work [4][9][22] with the method proposed in this work for detecting fake reviews on the hotel reviews dataset. It indicates that the present work surpasses the earlier work. Since a balanced dataset is used, the accuracy score is sufficient for a quantitative performance comparison of the developed machine-learning-based models with previous work. The improvement is at least 4-5 percentage points over previous similar work.

5 Conclusion

This work was an effort to determine an effective combination of features that performs well for fake review detection. The study used n-gram, LIWC, and sentiment score features for training, and different classifiers were trained on these features; the chosen classification algorithms are described in Section 3.3. In the experimental work, logistic regression outperformed all other machine-learning-based models. As far as feature performance is concerned, the unigram feature proved better than LIWC when applied in isolation. However, the combination of both (unigram + LIWC) with the sentiment score performed best, giving a maximum accuracy of 88.76%. This result is better than several other techniques described in Section 2. For future work, it is suggested to use semi-supervised learning to check the accuracy and performance of unigram and LIWC features for fake review detection; in this way, possible performance enhancement can be measured, and the limitation of labeled data sets for supervised learning will also be eased.

TABLE II. PERFORMANCE EVALUATION OF DIFFERENT FEATURE EXTRACTION METHODS

Approach (features) | Maximum accuracy (%) | Precision | Recall | F-score
LIWC | 63.22 | 58.00 | 64.43 | 61.05
Sentiment score + LIWC | 70.35 | 62.50 | 73.88 | 67.71
Unigram | 73.55 | 78.00 | 74.50 | 76.21
Sentiment score + unigram | 80.34 | 91.50 | 77.59 | 83.97
Sentiment score + LIWC + unigram | 88.76 | 92.00 | 82.61 | 87.05

REFERENCES

[1] "U.S. e-commerce share of retail sales 2021-2025," Statista. https://www.statista.com/statistics/379112/e-commerce-share-of-retail-sales-in-us/.
[2] F. Li, M. Huang, Y. Yang, and X. Zhu, "Learning to identify review spam," IJCAI Int. Jt. Conf. Artif. Intell., pp. 2488–2493, 2011, doi: 10.5591/978-1-57735-516-8/IJCAI11-414.
[3] R. Y. K. Lau, S. Y. Liao, R. Chi-Wai Kwok, K. Xu, Y. Xia, and Y. Li, "Text mining and probabilistic language modeling for online review spam detection," ACM Trans. Manag. Inf. Syst., vol. 2, no. 4, Dec. 2011, doi: 10.1145/2070710.2070716.
[4] S. Feng, L. Xing, A. Gogar, and Y. Choi, "Distributional footprints of deceptive product reviews," ICWSM 2012 - Proc. 6th Int. AAAI Conf. Weblogs Soc. Media, pp. 98–105, 2012.
[5] J. Sussin and E. Thompson, "The consequences of fake fans, 'likes' and reviews on social networks," Gartner Res., vol. 2091515, 2012.
[6] "Amazon sues to block alleged fake reviews on its website," Reuters. https://www.reuters.com/article/us-amazon-com-lawsuit-fake-reviews-idUSKBN0N02LP20150410.
[7] "Local Consumer Review Survey 2022: Customer Reviews and Behavior," BrightLocal. https://www.brightlocal.com/research/local-consumer-review-survey/.
[8] N. Spirin and J. Han, "Survey on web spam detection: principles and algorithms," vol. 13, no. 2, pp. 50–64.
[9] S. Shojaee, M. A. A. Murad, A. B. Azman, N. M. Sharef, and S. Nadali, "Detecting deceptive reviews using lexical and syntactic features," pp. 219–223, 2013.
[10] Y. Yao, B. Viswanath, J. Cryan, H. Zheng, and B. Y. Zhao, "Automated crowdturfing attacks and defenses in online review systems," Proc. ACM Conf. Comput. Commun. Secur., pp. 1143–1158, 2017, doi: 10.1145/3133956.3133990.
[11] K. C. Santosh and A. Mukherjee, "On the temporal dynamics of opinion spamming: case studies on Yelp," 25th Int. World Wide Web Conf. WWW 2016, pp. 369–379, 2016, doi: 10.1145/2872427.2883087.
[12] S. Nilizadeh et al., "POISED: spotting Twitter spam off the beaten paths," Proc. ACM Conf. Comput. Commun. Secur., pp. 1159–1174, Oct. 2017, doi: 10.1145/3133956.3134055.
[13] H. Chen, "Toward detecting collusive ranking manipulation attackers in mobile app markets," pp. 58–70, 2017.
[14] D. Y. T. Chino, A. F. Costa, A. J. M. Traina, and C. Faloutsos, "VOLTIME: unsupervised anomaly detection on users' online activity volume," Proc. 17th SIAM Int. Conf. Data Mining, SDM 2017, pp. 108–116, 2017, doi: 10.1137/1.9781611974973.13.
[15] R. Deng, N. Ruan, R. Jin, Y. Lu, and W. Jia, "SpamTracer: manual fake review detection for O2O commercial platforms by using geolocation features," pp. 1–20.
[16] N. Jindal and B. Liu, "Review spam detection," 16th Int. World Wide Web Conf. WWW 2007, pp. 1189–1190, 2007, doi: 10.1145/1242572.1242759.
[17] J. Li, M. Ott, C. Cardie, and E. Hovy, "Towards a general rule for identifying deceptive opinion spam," pp. 1566–1576, 2014.
[18] A. Mukherjee, V. Venkataraman, B. Liu, and N. Glance, "What Yelp fake review filter might be doing?," Proc. Int. AAAI Conf. Web Soc. Media, vol. 7, no. 1, pp. 409–418, 2013.
[19] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw, "Detecting product review spammers using rating behaviors," pp. 939–948, 2010.
[20] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, "Finding deceptive opinion spam by any stretch of the imagination," ACL-HLT 2011 - Proc. 49th Annu. Meet. Assoc. Comput. Linguist. Hum. Lang. Technol., vol. 1, pp. 309–319, 2011.
[21] M. Ott, C. Cardie, and J. T. Hancock, "Negative deceptive opinion spam," NAACL HLT 2013 - Proc. Main Conf., pp. 497–501, 2013.
[22] E. Elmurngi and A. Gherbi, "An empirical study on detecting fake reviews using machine learning techniques," 7th Int. Conf. Innov. Comput. Technol. INTECH 2017, pp. 107–114, 2017, doi: 10.1109/INTECH.2017.8102442.
[23] M. N. I. Ahsan, T. Nahian, A. A. Kafi, M. I. Hossain, and F. M. Shah, "An ensemble approach to detect review spam using hybrid machine learning technique," 19th Int. Conf. Comput. Inf. Technol. ICCIT 2016, pp. 388–394, 2017, doi: 10.1109/ICCITECHN.2016.7860229.
[24] W. Etaiwi and G. Naymat, "The impact of applying different preprocessing steps on review spam detection," Procedia Comput. Sci., vol. 113, pp. 273–279, 2017, doi: 10.1016/j.procs.2017.08.368.
[25] M. B. Alvi, N. A. Mahoto, M. A. Unar, and M. A. Shaikh, "An effective framework for tweet level sentiment classification using recursive text preprocessing approach," Int. J. Adv. Comput. Sci. Appl. (IJACSA), 2019, doi: 10.14569/IJACSA.2019.0100674.
[26] M. B. Alvi, N. A. Mahoto, M. Alvi, M. A. Unar, and M. A. Shaikh, "Hybrid classification model for Twitter data - a recursive preprocessing approach," 5th Int. Multi-Topic ICT Conf. IMTIC 2018 - Proc., 2018, doi: 10.1109/IMTIC.2018.8467221.
[27] C. G. Harris, "Detecting deceptive opinion spam using human computation," AAAI Workshops - Tech. Rep., vol. WS-12-08, pp. 87–93, 2012.
[28] P. Cavallo et al., Journal of Software, vol. 9, no. 8, 2018.
[29] M. Ben-Bassat, K. L. Klove, and M. H. Weil, "CALO ( x = ALO ( x ) e," vol. 2, no. 3, pp. 261–266, 1980.
[30] A. Akhter, M. B. Alvi, and M. Alvi, "Forecasting Multan estate prices using optimized regression techniques," Univ. Sindh J. Inf. Commun. Technol., vol. 5, no. 4, Apr. 2022. https://sujo.usindh.edu.pk/index.php/USJICT/article/view/4340.
[31] G. Mujtaba and E. S. Ryu, "Client-driven personalized trailer framework using thumbnail containers," IEEE Access, vol. 8, pp. 60417–60427, 2020, doi: 10.1109/ACCESS.2020.2982992.