International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol. 15, No. 16, 2021

Intelligent Security Schema for SMS Spam Message Based on Machine Learning Algorithms

https://doi.org/10.3991/ijim.v15i16.24197

Ali Alshahrani(*)
Arab Open University, Kingdom of Saudi Arabia
a.shahrani@arabou.edu.sa

Abstract—SMS spam messages represent one of the most serious threats to current traditional networks. These messages have been particularly prevalent overseas and are harmful to various types of devices. The filtering schemes employed in conventional systems fail to detect a large number of these messages. To resolve this issue, a new intelligent security system is proposed to reduce the number of spam messages. It can detect novel spam messages that have a direct and negative impact on networks. The proposed system relies on machine learning to classify various types of messages. The primary achievement of our study is an increase in accuracy together with a reduction in the number of false alarms. According to the experimental results, the system detects a large number of spam messages with high accuracy.

Keywords—security, protection, internet, SMS spam, intrusion detection, attacks

1 Introduction

Security systems are considered one of the most important issues in scientific research [1]. Many modern applications suffer from a lack of protection techniques, which exposes them to various attacks. Short Message Service (SMS), or mobile text messaging, is a communication service component of phone, web, or mobile communication systems that uses standardized communication protocols for the exchange of short text messages between fixed-line or mobile phone devices [2][3].
Mobile text messages are used for communication between cell phone users when voice communication is undesirable or impossible. However, some of the text messages that are forwarded to the user’s device are bothersome and unwanted, and these are called SMS spam. Users store personal and confidential information on their smartphones, such as contact lists, numbers, passwords, and credit card information. Thus, using SMS spam, hackers can attack users’ devices and exploit this information. Privacy invasion and unauthorized access to sensitive information are the main problems arising from spam messages. The privacy of the user is violated by individuals commonly known as spammers, who use various unethical activities to access user data stored on smartphones without the knowledge of the user [4].

Spam messages are unwanted but are often unavoidable. SMS spam resembles unsolicited email, but it is delivered as text messages to mobile devices [5]. These messages are used by some businesses to promote and advertise their products in order to increase their audience. Besides promoting services or products, SMS spam can threaten users’ privacy and can lead to identity theft and fraud [6]. Spam messages originate from all regions worldwide; however, China is reported as a larger source of these messages than other countries [7]. Recently, the popularity of SMS has increased with the growth of different communities of mobile users, which has in turn given rise to various techniques and tools for spamming mobile phones.

Problems associated with SMS spam have inspired researchers to present different techniques for the effective detection and prevention of spam SMS. The general phases of SMS spam filtering are outlined in Figure 1.

[Figure 1: Text Messages → Filter of Spam Messages → Non Spam to User Inbox / Spam to Trash]

Fig. 1. SMS spam filtering

In the detection of spam messages, the SMS datasets available for training and testing are still limited in size. Moreover, the number of features that can be extracted for spam detection is low because of the short length of text messages. Different machine learning algorithms and techniques have been used to sort and filter spam messages. The objective of this paper is to present a system that addresses the problems associated with spam message detection. The proposed system relies on the random forest algorithm and the decision tree as classifiers.

The rest of this paper is organized as follows: Section 2 presents related works. Section 3 discusses SMS spam security. The methodology of the proposed approach is described in Section 4. The experimental results are given in Section 5. Finally, Section 6 presents conclusions and future directions.

2 Related works

In classifying and detecting SMS spam, many studies have been presented and a variety of related issues have been discussed. This section summarizes the earlier research in this field.

In [4], the authors present a review comparing the performance of machine learning algorithms. The review reports that support vector machine (SVM) and Naive Bayes classifiers perform efficiently. Another review of SMS spam classification/filtering is presented in [8].
This review focuses on expanding the number of features used for SMS classification and considers how the number of selected features affects the rate of accuracy. The authors aimed to contribute to determining spam’s impact level or risk.

Machine learning has been widely used in SMS spam classification, and many works have been presented. In [9], a survey examines the performance of SVMs in identifying and filtering spam messages. In [10], various methods are used to analyze SMS spam, and a new pre-processing technique is used to obtain an actual dataset of SMS spam. Different algorithms were applied to this dataset to find the one most suitable in terms of both recall and accuracy. The results show that the Random Forest algorithm is capable of classifying a new dataset into ham and spam. To filter SMS spam, the Random Forest algorithm and Term Frequency–Inverse Document Frequency (TF–IDF) are used in [7]. The experimental results show that the Random Forest algorithm achieves effective performance, with an accuracy of 97.50%. In [11], different machine learning models, such as LightGBM, XGBoost, and Bernoulli Naive Bayes, are used to achieve greater speed and efficiency in the classification of SMS spam with low latency. The results show that Bernoulli Naive Bayes, followed by LightGBM with the TF–IDF matrix, generated the highest accuracies: 96.5% in 0.157 seconds and 95.4% in 1.708 seconds, respectively. A Recurrent Neural Network with SVM is used in [12] to detect botnet spam emails using a spam dataset. The results show that the presented solution achieves better performance in the detection of spam emails, reaching 98.7%. Machine learning techniques are used in [13] to develop a system for spam filtering and identification of legitimate emails. The results show that the classification system achieves 92% and 94% correct spam identification, with a low false-positive rate of 1.0%.
A framework for spam detection and risk estimation is presented in [14] based on data stream clustering and classification. The authors used the Multinomial Naive Bayes and K-nearest neighbor algorithms for classification. In addition, the K-means algorithm is used in the clustering phase for SMS spam detection. For system evaluation, several metrics are used to assess the classification and clustering methods. WEKA is used as a means of spam message classification/filtering in [15]. Different algorithms are applied to the SMS dataset, and metrics such as accuracy, error rate, and time are computed to select the best one. In [16], Bayesian filtering techniques are analyzed to determine to what extent these techniques, which are used for blocking email spam, can be applied to SMS. The authors built two SMS spam test sets of significant size, containing some specific words, and evaluated machine learning algorithms on them. The results show that Bayesian filtering techniques can be efficiently transferred from email to SMS spam.

A review of modern research on SMS spam filtering is presented in [17]. This paper also studies data collection, analyzes a large corpus of SMS spam, and provides results. In [18], the Naive Bayes algorithm was used to propose a spam classification model for mobile devices. This system aims to correctly filter incoming SMS received by users. The proposed model obtained efficient and dependable results. A new public, large, real, and non-encoded SMS spam collection was used in [3] with a comprehensive analysis, in which the performance of several machine learning techniques is compared. A novel machine learning system for SMS spam message detection is proposed in [19].
This paper bases the proposed system on feature extraction and decision making. The results show that the proposed system achieves high detection rates in terms of classification accuracy and F-measure compared with other proposed approaches.

In contrast to the systems reviewed above, our paper presents a detection system that uses the Random Forest algorithm and a decision tree as classifiers to solve the problems associated with spam message detection.

3 SMS spam security

Threats to SMS security include message disclosure, man-in-the-middle attacks, SMS viruses, and SMS spamming. In SMS spamming, SMS is used as a valid marketing channel, and many people are annoyed by receiving SMS spam. The availability of bulk SMS broadcasting makes it easy for virtually anyone to send out mass SMS messages [20]. Spam SMS detection is relied upon to solve this problem.

The most prevalent malware uses email to spread and targets logins and passwords in order to steal confidential data. The top 10 malicious programs spread by email are as follows [21]:

• Trojan-Spy.HTML.Fraud.gen
• Email-Worm.Win32.Bagle.gt
• Email-Worm.Win32.Mydoom.m
• Email-Worm.Win32.Mydoom.I
• Trojan-Banker.HTML.Agent.p
• Trojan-Spy.Win32.Zbot.Ibda
• Worm.Win32.Mabezat.b
• Trojan-PSW.Win32.Tepfer.hjva
• Email-Worm.Win32.NetSky.q
• Trojan.Win32.Bublik.aknd

One type of threat that can lead to a hazardous situation and exploit vulnerabilities is spam, when the probability of the potential risk occurring is high [22]. SMS spam management relies on three phases: spam detection, classification, and determination of the severity level [23].
Risk management processes are necessary for SMS spam detection: risk identification, risk assessment, risk response, and risk monitoring. To manage spam, there are three main processes: spam classification, spam clustering, and determination of the spam’s severity level [24].

4 Methodology

In this paper, the main objective of the proposed work is to classify SMS messages as either ham (legitimate) or spam. The proposed system starts from the collection of a dataset generated from real-world messages and ends with a classification decision that labels each message as normal or abnormal. The proposed work includes the processes presented in Figure 2 below.

Fig. 2. Processes of SMS spam classification

In this paper, two machine learning tools are used to distinguish ham from spam messages. These techniques play an important role in providing a secure environment for users who participate in various networks.

4.1 Dataset Source

The SMS spam dataset used in this research is obtained from Kaggle, a machine learning repository [15]. It contains two columns, v1 and v2, and 5572 instances. The v2 column holds the input messages, each of which is either ham or spam. The predicted label (v1) has two classes: 0 = ham and 1 = spam. In the data, 4900 instances are ham and 672 are spam. The dataset is presented in Table 1 [24].

Table 1. SMS spam dataset description

| Instance number | Input message (v2) | Predicted label (v1) |
|---|---|---|
| 1 | Cine there got a more wat… | Ham |
| 2 | Go until Jurong point, crazy… Available only in bugis n great world la e buffet… | Ham |
| 3 | Free entry in 2 a wkly comp to win FA cup final tkts 21st May 2005. Text FA to 87121 to receive entry question (std txt rate) T&C’s apply 08452810075over18’s. | Spam |
| … | … | … |
| 5572 | Rofl. It is true to its name… | Ham |

Table 1 shows sample contents of the messages sent and received between users.

4.2 SMS message spam classification

The random forest (RF) algorithm and the decision tree are machine learning algorithms suited to large datasets with various feature types, such as numerical, binary, and categorical [25]. In this system, RF and decision trees are used to classify ham and spam SMS messages efficiently. RF combines a set of decision trees to reduce overfitting in the training phase. In the RF algorithm, each tree operates on randomly chosen attributes and can produce predictions that differ from those of the other trees [26]. As a result, each tree achieves a different level of performance, and the trees’ outputs are averaged to produce the overall result. The processes of RF are presented in Figure 3.

Fig. 3. The Processes of Random Forest (RF) Algorithm

4.3 Performance measurement

In this paper, several metrics are used to measure the efficiency of the proposed system. The metrics, which are accuracy, the confusion matrix, and recall, can be calculated as follows [27]:

$$\text{Accuracy} = \frac{\text{Number of correctly classified patterns}}{\text{Total number of patterns}} \tag{1}$$

To measure and evaluate the proposed system’s performance, a confusion matrix is calculated, which involves four categories: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). The corresponding rates can be calculated by the following equations [27]:

$$\text{TP rate} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{TN rate} = \frac{TN}{TN + FP} \tag{3}$$

$$\text{FN rate} = \frac{FN}{FN + TP} \tag{4}$$

$$\text{FP rate} = \frac{FP}{FP + TN} \tag{5}$$

• True positive (TP) is the number of correctly predicted spam messages.
• False positive (FP) is the number of messages wrongly predicted as spam.
• True negative (TN) is the number of correctly predicted ham messages.
• False negative (FN) is the number of messages wrongly predicted as ham.

Recall is the last metric used in this paper, calculated as follows:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

All these metrics are employed in this paper to evaluate the performance of the proposed security system.

5 Experimental results and discussion

To compare the performance of the algorithms used in this experiment, this paper provides performance evaluation measures such as accuracy, precision-recall, f1 score support, and time. Table 2 shows the performance evaluation of the random forest algorithm.

Table 2. Random forest algorithm performance

| Metric | Random Forest Classifier |
|---|---|
| Accuracy | 98.6% |
| Precision recall | 86% |
| f1 score support | 94% |
| Time | 51 s |

Meanwhile, Table 3 shows the efficiency of the security system that is based on the decision tree.

Table 3. Decision tree classifier performance

| Metric | Decision Tree Performance |
|---|---|
| Accuracy | 94.8% |
| Precision recall | 79% |
| f1 score support | 83% |
| Time | 42 s |

As presented in Tables 2 and 3, both the random forest and decision tree machine learning algorithms achieved high accuracy in the classification of SMS spam, with the random forest the more accurate of the two. Moreover, we achieved a 98.2% accuracy rate with the random forest machine-learning algorithm. Table 4 shows a comparison of our proposed model with an earlier model using the decision tree and random forest machine learning algorithms.

Table 4. Algorithm comparison

| Reference | Algorithm | Accuracy |
|---|---|---|
| [6] | Decision Tree | 96.57% |
| [6] | Random Forest | 97.50% |
| Proposed approach | Decision Tree | 94.8% |
| Proposed approach | Random Forest | 98.6% |

According to Table 4, our RF-based proposal is more accurate than the other listed approaches.
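As an illustration of how the measures in Section 4.3 relate to the confusion matrix, the following minimal Python sketch computes accuracy, recall, and the rates of Equations (2) to (5) from lists of true and predicted labels. It assumes that spam is the positive class (label 1) and ham is label 0, following Section 4.1. The names `ConfusionCounts`, `count_outcomes`, and `rates`, as well as the toy labels, are hypothetical and are not taken from the experiments reported in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for a binary ham/spam task (spam = positive)."""
    tp: int  # spam correctly predicted as spam
    fp: int  # ham wrongly predicted as spam
    tn: int  # ham correctly predicted as ham
    fn: int  # spam wrongly predicted as ham


def count_outcomes(y_true, y_pred, positive=1):
    """Count TP, FP, TN, and FN by comparing each prediction with its true label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def rates(c):
    """Return accuracy, recall, and the TN/FN/FP rates derived from the counts."""
    def ratio(num, den):
        # Guard against an empty class so the sketch never divides by zero.
        return num / den if den else 0.0

    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": ratio(c.tp + c.tn, total),
        "recall": ratio(c.tp, c.tp + c.fn),  # identical to the TP rate
        "tn_rate": ratio(c.tn, c.tn + c.fp),
        "fn_rate": ratio(c.fn, c.fn + c.tp),
        "fp_rate": ratio(c.fp, c.fp + c.tn),
    }


# Hypothetical toy labels: 3 spam messages (1) and 7 ham messages (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

counts = count_outcomes(y_true, y_pred)
metrics = rates(counts)
print(counts)
print(metrics)
```

In this toy example, one spam message is missed and one ham message is flagged, so accuracy is 8/10 while recall is only 2/3. On an imbalanced dataset such as the one described in Section 4.1, accuracy alone can therefore look favorable even when a noticeable share of spam is missed, which is one reason to report recall and the false-alarm (FP) rate alongside it.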
6 Conclusion and future directions

Nowadays, SMS spam detection is a major challenge due to the increase in the use of text messaging. In this paper, a technique for SMS spam detection is proposed based on the random forest and decision tree algorithms. The dataset used in this work consists of 4900 ham instances and 672 spam instances. The experimental results show that the classification system using the random forest algorithm presents the best results, with a 98.2% accuracy rate. In future work, other machine learning methods will be employed to achieve an improved accuracy rate in the SMS spam classification field.

7 Acknowledgments

The author would like to thank Arab Open University, Saudi Arabia for supporting this study.

8 References

[1] H. Naman, N. Hussien, M. Al-dabag, and H. Alrikabi, “Encryption System for Hiding Information Based on Internet of Things,” International Journal of Interactive Mobile Technologies, vol. 15, no. 2, 2021. https://doi.org/10.3991/ijim.v15i02.19869
[2] José. M. G. Hidalgo, T. A. Almeida, and A. Yamakami, “On the Validity of a New SMS Spam Collection,” 2012 11th International Conference on Machine Learning and Applications, 2012, pp. 240–245. https://doi.org/10.1109/ICMLA.2012.211
[3] T. Almeida, J. M. G. Hidalgo, and T. P. Silva, “Towards sms spam filtering: Results under a new dataset,” International Journal of Information Security Science, vol. 2, no. 1, pp. 1–18, 2013.
[4] S. Alqahtani and D. Alghazzawi, “A survey of Emerging Techniques in Detecting SMS Spam,” Trans. Mach. Learn. Artif. Intell., vol. 7, no. 5, pp. 23–35, 2019. https://doi.org/10.14738/tmlai.75.7116
[5] Tiago A. Almeida and A. Y. Almeida, J. M. Gómez, “Contributions to the Study of SMS Spam Filtering: New Collection and Results,” p. 7, 2011. https://doi.org/10.1145/2034691.2034742
[6] M. Ghulam and M. Y. Mujtaba, “SMS Spam Detection Using Simple Message Content Features,” J. Basic Appl. Sci. Res, vol. 4, 2014.
[7] Amir N. N. Sjarif, N. F. Mohd Azmi, S. Chuprat, H. M. Sarkan, Y. Yahya, and S. M. Sam, “SMS spam message detection using term frequency-inverse document frequency and random forest algorithm,” Procedia Comput. Sci., vol. 161, pp. 509–515, 2019. https://doi.org/10.1016/j.procs.2019.11.150
[8] K. Zainal and M. Z. Jali, “A review of feature extraction optimization in SMS spam messages classification,” Commun. Comput. Inf. Sci., vol. 652, pp. 158–170, 2016. https://doi.org/10.1007/978-981-10-2777-2_14
[9] Z. S. Torabi, “Efficient Support Vector Machines for Spam Detection: A Survey,” vol. 13, no. 1, pp. 11–28, 2015.
[10] S. S. Ali and J. Maqsood, “Net library for SMS spam detection using machine learning: A cross platform solution,” Proc. 2018 15th Int. Bhurban Conf. Appl. Sci. Technol. IBCAST, vol. 2018-January, pp. 470–476, 2018. https://doi.org/10.1109/IBCAST.2018.8312266
[11] A. Ora, “Spam Detection in Short Message Service Using Natural Language Processing and Machine Learning Techniques,” 2020.
[12] M. Alauthman, “Botnet spam e-mail detection using deep recurrent neural network,” Int. J. Emerg. Trends Eng. Res., vol. 8, no. 5, pp. 1979–1986, 2020. https://doi.org/10.30534/ijeter/2020/83852020
[13] S. J. Delany, “ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift,” pp. 1–5, 1826.
[14] K. S. Adewole, N. B. Anuar, and A. Kamsin, “Ensemble based streaming framework for spam detection and risk assessment in microblogging social networks,” 2016.
[15] D. R. Kawade, “SMS Spam Classification using WEKA,” vol. 5, no. ICICC, pp. 43–47, 2015.
[16] G. Sethi and V. Bhootna, “SMS Spam Filtering Application Using Android,” Int. J. Comput. Sci. Inf. Technol., vol. 5, no. 3, pp. 4624–4626, 2014.
[17] S. J. Delany, M. Buckley, and D. Greene, “SMS spam filtering: Methods and data,” Expert Syst. Appl., vol. 39, no. 10, pp. 9899–9908, 2012. https://doi.org/10.1016/j.eswa.2012.02.053
[18] S. M. S. Spam, F. For, and M. Mobile, “Sms Spam Filtering for Modern Mobile Devices,” vol. 13, no. 1, pp. 177–185, 2017.
[19] A. B. Saeid, M. T. Kheirabadi, “An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network,” Int. J. Eng. Trans. B Appl., vol. 33, no. 2, pp. 221–228, 2020. https://doi.org/10.5829/ije.2020.33.02b.06
[20] N. Zalpuri and M. Arora, “An Efficient Model for S.M.S Security and SPAM Detection: A Review,” Int. J. Comput. Sci. Eng., vol. 3, no. 12, pp. 1–6, 2015.
[21] D. Gudkova, “Kaspersky Security Bulletin-Spam Evolution 2013,” pp. 1–22, 2013.
[22] K. Zainal and M. Z. Jali, “A Perception Model of Spam Risk Assessment Inspired by Danger Theory of Artificial Immune Systems,” Procedia Comput. Sci., vol. 59, no. Iccsci, pp. 152–161, 2015. https://doi.org/10.1016/j.procs.2015.07.530
[23] M. Z. Sulaiman and Jali, “Integrated Mobile Spam Model Using Artificial Immune System Algorithms,” Knowl. Manag. Int. Conf., pp. 405–409, 2014.
[24] “Datasets.” [Online]. Available: https://www.kaggle.com/datasets. [Accessed: 15-Mar-2021].
[25] M. Al-dabag, H. S. ALRikabi, and R. Al-Nima, “Anticipating Atrial Fibrillation Signal Using Efficient Algorithm,” International Journal of Online and Biomedical Engineering (iJOE), vol. 17, no. 2, 2021. https://doi.org/10.3991/ijoe.v17i02.19183
[26] N. Choudhary and A. K. Jain, “Towards filtering of SMS spam messages using machine learning based technique,” Commun. Comput. Inf. Sci., vol. 712, no. July, pp. 18–30, 2017. https://doi.org/10.1007/978-981-10-5780-9_2
[27] A. S. Hussein, R. S. Khairy, S. M. M. Najeeb, and H. T. ALRikabi, “Credit Card Fraud Detection Using Fuzzy Rough Nearest Neighbor and Sequential Minimal Optimization with Logistic Regression,” International Journal of Interactive Mobile Technologies, vol. 15, no. 5, 2021. https://doi.org/10.3991/ijim.v15i05.17173

9 Author

Ali Alshahrani is an associate professor in the computer science studies faculty, Arab Open University, Saudi Arabia. He received his B.Sc. degree in Information Technology and Computing in 2008. He received his M.Sc. and Ph.D. in Computer Science from the University of Essex, UK, in 2015. His research interests include network security, image processing, e-learning and mobile systems.

Article submitted 2021-05-15. Resubmitted 2021-06-18. Final acceptance 2021-06-20. Final version published as submitted by the authors.