International Journal of Interactive Mobile Technologies (iJIM) – eISSN: 1865-7923 – Vol  17 No  04 (2023)


Paper—Fake Accounts Identification in Mobile Communication Networks Based on Machine Learning 

Fake Accounts Identification in Mobile Communication 
Networks Based on Machine Learning 

https://doi.org/10.3991/ijim.v17i04.37645  

Ahdi Hassan1(✉), Abdalilah G. I. Alhalangy2, Fahad Alzahrani3 
1 Global Institute for Research Education & Scholarship, Amsterdam, Netherlands 

2 Department of Computer Science, Qassim University, Buraydah, Saudi Arabia  
3 Department of Languages and Translation, University of Tabuk, Tabuk, Saudi Arabia 

ahdihassan441@gmail.com  

Abstract—Fake accounts on online social networks are increasing today 
with the growing number of active social network users. Social media websites 
allow users to share thoughts, facts, and views and to re-share them across 
various networks. Social media platforms provide users with an enormous amount 
of valuable information, but with this volume of information, issues such as 
fake profiles and online hacking have also grown. Fake profiles on online 
social sites create fake news and share unwanted material containing spam links 
that affect genuine users. Spam is a major issue in social media communication 
networks, and identifying fake profiles is necessary to stop it. In this paper, 
a supervised machine learning algorithm, the support vector machine (SVM), is 
used together with a Random Forest classifier to identify fake online profiles 
automatically. The approach can be applied online to screen millions of 
accounts that cannot be examined manually. The model is compared with other 
identification techniques, and the results show that the proposed algorithm 
performs better, with high precision and recall. The method helps safeguard 
social media networks from online threats and attacks. 

Keywords—social media, SVM, random forest classifier, fake profile, mobile 
network 

1 Introduction  

Online social networks such as Facebook, Instagram, Twitter, and LinkedIn have 
become widely prevalent in the last few years. Because the volume of content on these 
platforms is expanding so quickly, it is becoming increasingly challenging for social 
media providers to identify fake accounts through manual inspection. Fake accounts 
may mislead readers, provoke social panic, and even incite violence, which could be 
prevented by early identification methods that detect fake accounts in a timely 
manner. The people behind fake profiles purposely design content to attract the 
attention of other users. Because social networks allow quick and low-cost 
distribution, a significant amount of news content is disseminated there rapidly [1].  

As the usage of social networks grows, users are concerned that malicious users 
will try to violate their privacy and create fake profiles using their names and 
login information. Social network service providers are therefore attempting to 
identify malicious individuals and fake accounts in order to remove them from social 
networking environments. More crime is committed through fake accounts on social 
networks than through any other type of cybercrime [2]. Through online social 
networks, people can communicate, share information, plan and organize, and even 
run their online businesses.  

Online social networks have an impact on various domains such as science, education, 
business, and employment. A common technique used by fake accounts on social media 
to attract users is catchy headlines. In this work, the dataset is preprocessed to 
identify fake profiles on mobile social communication sites. A random forest 
classifier combined with a support vector machine is used to find fake accounts. The 
precision rates of several machine learning methods for detecting fake accounts are 
compared, and the method with the highest accuracy is suggested [3].   

1.1 Threats  

Due to widespread use of online social networks, many users are vulnerable to both 
privacy and security issues. These threats can be divided into four main categories: 
Conventional threats, Modern threats, Combination threats, and Targeted threats as 
shown in Figure 1.  

 
Fig. 1. Various Threats to Users of Online Social Network by Fake Accounts 

Conventional threats. The prevalence of the internet has raised concerns about 
conventional threats. Threats involving malware, spam, phishing, identity theft, and 
cross-site scripting appear to be a recurring issue [4]. Due to the structure and 
prevalence of online social networks, these threats have gone viral and could spread 
quickly among network users.  

Malware. Malware is software designed to disrupt a device's operation in order 
to record user passwords and access personal information. Social network malware 
uses the online social network framework to spread among users and their contacts. 

Cross-site scripting. This is a web-based attack. An attacker who uses cross-site 
scripting abuses the trust of the site's users and installs spyware on the user's 
mobile device or computer to collect private information.  

Identity theft. The attacker steals another person's identity by using their social 
security number, contact number, and address without their consent. 

Spam & Phishing. Unwanted bulk electronic messages are termed spam. Email is the 
usual channel for disseminating spam, but social networking sites are also highly 
effective for this purpose. Phishing is a type of social engineering attack in which 
the attacker obtains personal and sensitive data such as usernames, passwords, and 
credit card details through fake accounts, emails, and websites that appear to be 
real [5].  

Modern threats. Modern threats are attacks that penetrate user accounts using 
advanced methods, whereas targeted attacks are directed at a specific user and can 
be carried out by any user for a variety of personal reasons. 

Clickjacking. Clickjacking involves tricking a person into clicking on something 
other than what they intended to click. 

De-anonymization. In order to reveal a user's true identity, deanonymization 
attacks use methods such as cookie monitoring, network design, and community 
affiliation. 

Fake profiles. Fake profiles, also referred to as sybils or social bots, are 
automatic or semi-automatic profiles that mimic human behavior in online social 
networks (OSNs). Bogus accounts are also used to approach users of social networking 
sites for personal information. 

Inference attacks. An inference attack uses other data that the user posts on a 
social networking site to infer the user's confidential information. 

Combination Threats. The combination of conventional and modern threats to 
create a more complex threat is known as combination threats.  

Targeted Threats. Children and teens using online social networks are affected 
through these targeted threats.  

Cyberbullying. This is the use of electronic media such as chats, emails, mobile 
messaging, and online social networks to bully a user. Unlike classic bullying, it 
can be sustained continuously through social media.  

Online predators. Cyberpredators, often known as child predators online, are the 
biggest threat to the privacy of children's private information. 

Risky behaviors. Children may engage in risky behaviors such as interacting with 
strangers on the Internet, using chat boards to arrange meetings with strangers, 
holding sexually suggestive chats with strangers, and sharing private information 
and pictures with strangers. 

Cyber grooming. The primary goal of cyber grooming is to win the trust of the 
child so that the child will provide sensitive and personal information. 

1.2 Objective  

The number of mobile users is increasing, and these users can easily access social 
media from the mobile devices in their hands. Fake accounts are created to target 
this large group of users with the intention of causing harm. The main objective of 
this paper is to develop a framework that automatically finds fake online accounts 
or profiles. A support vector machine is combined with a random forest for the 
automatic detection of fake profiles. This can protect users' social lives, and with 
this automatic detection method, websites can manage an enormous number of profiles, 
which is not possible manually.  

2 Literature review 

Numerous studies have addressed the identification and removal of fake accounts in 
social networks.  

[6] proposed an AI-based spam recognition technique for Twitter. In this work, the 
authors employed an artificial neural network (ANN), a support vector machine (SVM), 
and a random forest (RF) method. Compared with the RF and ANN techniques, the 
proposed SVM algorithm achieved the highest precision, recall, and F-measure. The 
result is used to manage and track public images on social media for the detection 
of offensive material and fake photos, as well as to protect social media from 
online threats and attacks. 

[7] suggested a set of features for use with two well-known machine learning 
algorithms, support vector machines and neural networks. The system was built to 
identify fake users of the Twitter social network. Accuracy in recognizing fake 
accounts is evaluated across the classification algorithms. Using the suggested 
features with consistent weighting, the results show the highest accuracy for the 
two classification algorithms.  

[8] developed a system that recognizes fake profiles automatically with high 
efficiency. The system uses random forest classification to separate profiles into 
genuine and fake. This automatic detection method is useful for the millions of 
social media profiles that cannot be checked manually.  

The identification of fake profiles on LinkedIn is addressed in [9]. The authors 
first identified the minimal set of profile data essential for recognizing fake 
profiles on LinkedIn and then suggested a suitable data mining method for fake 
profile identification. Even with limited profile data, their method identified fake 
profiles with 87% accuracy and a 94% true negative rate, in comparison with 
approaches that use larger profile data. However, the result provides only 
approximately 14% accuracy when compared with methods using the same amounts and 
types of data.  

[10] presented a possible approach to mitigate the threat of the fake profile 
attack, in which an attacker impersonates a victim on an online social network where 
the victim does not have a profile.  

3 Methodology 

Collecting the dataset, pre-processing it, choosing features, and applying machine 
learning techniques to it are the proposed tasks. The suggested system's architecture is 
depicted in Figure 2.  

Selection of Dataset: The main aim of this paper is to use the proposed SVM-RF 
algorithm to decide whether accounts are genuine or fake. The Instagram dataset 
considered in this research was collected from the Kaggle website. The proposed 
SVM-RF algorithm is able to categorize the accounts of the training dataset 
appropriately. The training dataset undergoes data pre-processing and feature 
extraction before the machine learning technique is applied. After the algorithm is 
applied, the model identifies whether a given account is genuine or not.  

Data Pre-processing: To ensure that the system can interpret the input and 
generate the best feasible model, the collected data must be preprocessed in 
several steps before it enters any classifier.  

Tokenization: Tokenization is the process of breaking down text into individual 
tokens, such as words, phrases, symbols, or other fundamental parts. The main goal 
is to examine the individual words in a sentence.  

Stop word removal: Stop words are very common words such as “are” and “and”. 
They are not relevant to the substance of the collected data and must therefore be 
removed. 

Stemming: Stemming is a simple heuristic technique that chops off the ends of 
words in the hope of reducing each word to its base form correctly most of the 
time. It typically involves removing prefixes and suffixes, which are common in 
English. 
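The three pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the stop-word list, the suffix list, and the function names are assumptions chosen for the example, and the stemmer is a naive suffix stripper, not a full algorithm such as Porter's.

```python
import re

# Assumed, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in"}

# Assumed suffixes for a naive stemmer; this is not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("Scammers are sharing links and posting the copied photos"))
```

Note that a crude stemmer produces truncated stems such as "shar" for "sharing"; this is expected, since stemming only needs to map related word forms to the same string.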

Feature Selection: User-based features, or traits that are particular to each 
user, are used to describe the behavior of Instagram users. Content-based features 
are characteristics of the content associated with user accounts. Regular users 
typically do not share duplicate content, whereas scammers frequently do.   
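As an illustration of user-based features, the sketch below derives the eleven input columns of the dataset (named as in Table 1) from a raw profile record. The paper does not define how each column is computed, so the interpretations here are assumptions: for example, "nums/length username" is taken to be the fraction of digits in the username. The input field names (`username`, `has_profile_pic`, and so on) are hypothetical.

```python
def digit_ratio(text):
    """Fraction of characters in text that are digits (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(ch.isdigit() for ch in text) / len(text)


def extract_features(profile):
    """Map a raw profile dict to the Table 1 feature columns (label excluded)."""
    username = profile["username"]
    fullname = profile["fullname"]
    return {
        "profile pic": int(profile["has_profile_pic"]),
        "nums/length username": round(digit_ratio(username), 2),
        "fullname words": len(fullname.split()),
        "nums/length fullname": round(digit_ratio(fullname), 2),
        "name==username": int(fullname.lower() == username.lower()),
        "description length": len(profile["description"]),
        "external URL": int(bool(profile["external_url"])),
        "private": int(profile["is_private"]),
        "No. of posts": profile["posts"],
        "No. of followers": profile["followers"],
        "No. of follows": profile["follows"],
    }


# Hypothetical profile record, invented purely for illustration.
example = {
    "username": "user1234",
    "fullname": "",
    "has_profile_pic": False,
    "description": "",
    "external_url": "",
    "is_private": False,
    "posts": 0,
    "followers": 19,
    "follows": 30,
}

features = extract_features(example)
print(features)
```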

 
Fig. 2. Proposed System Architecture  

Support Vector Machine: SVM is a supervised machine learning algorithm. It is 
regarded as one of the most fundamental and practical approaches to data 
classification and training. It is a simple classification model computed using 
Equation (1). 

\[ s(x) = \operatorname{sign}\left[ \sum_{m=1}^{n} \delta_m s_m \varphi(x, x_m) + r \right] \tag{1} \]

where $\delta_m$ is a positive real constant and $r$ is a real constant.  
The SVM maximizes the margin by locating the optimal hyperplane in the feature 
space. The margin is obtained by computing the distance between the hyperplane and 
the nearest vectors and then doubling it.  
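The decision rule in Equation (1) can be evaluated directly once the constants are known. The sketch below is a minimal illustration under stated assumptions: the paper does not name the kernel, so a Gaussian (RBF) kernel is assumed for the kernel function, the $x_m$ are taken to be training vectors with class labels $s_m$ in {-1, +1}, and all numeric values are hand-picked rather than fitted to data.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, vectors, labels, deltas, r, kernel=rbf_kernel):
    """Evaluate s(x) = sign(sum_m delta_m * s_m * kernel(x, x_m) + r).

    A total of exactly zero is mapped to +1 by convention.
    """
    total = sum(
        delta * label * kernel(x, x_m)
        for delta, label, x_m in zip(deltas, labels, vectors)
    )
    return 1 if total + r >= 0 else -1


# Hypothetical, hand-picked values purely for illustration (not fitted to data).
vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
deltas = [1.0, 1.0]
r = 0.0

print(svm_decision((1.9, 2.1), vectors, labels, deltas, r))
print(svm_decision((0.1, -0.1), vectors, labels, deltas, r))
```

A query point is assigned the class of the vectors it lies closest to, because the kernel value decays with distance and each term is weighted by its label.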

Random Forest Classifier Algorithm: Random forests, also referred to as random 
decision forests, can solve both classification and regression problems. The random 
forest method produces many decision trees and combines their predictions. Each 
decision tree is constructed from a subset of the features. To increase the accuracy 
of the random forest technique, each decision tree votes for a single class and the 
votes are then aggregated. A decision is made at each node. The steps are as follows: 

1. Compute the information gain of each candidate split of the data. 
2. Select the node condition with the highest gain for splitting. 
3. Continue splitting as long as the entropy is greater than zero. 
4. After repeating the above procedure several times, select the class of a sample 
by majority voting. 
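The steps above rely on entropy, information gain, and voting. The following sketch shows these building blocks for a single numeric feature; it is an illustration under assumptions, not the paper's implementation. The function names are hypothetical, and the toy input reuses the "No. of posts" values and labels of three real and three fake profiles from Table 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting parent into left and right."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left)
    weighted += (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Return (threshold, gain) of the split 'value <= threshold' with highest gain."""
    best = (None, 0.0)
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best


def majority_vote(tree_predictions):
    """Combine per-tree class predictions into one class by majority voting."""
    return Counter(tree_predictions).most_common(1)[0][0]


# "No. of posts" and labels (0 = real, 1 = fake) for six profiles from Table 1.
posts = [35, 3, 319, 0, 0, 0]
labels = [0, 0, 0, 1, 1, 1]

print(best_threshold(posts, labels))
print(majority_vote([1, 0, 1]))
```

On this toy sample, splitting at zero posts separates the two classes completely, so both child nodes have zero entropy and no further splitting is needed.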

4 Result & discussion 

The text-based Instagram dataset was obtained from the Kaggle website. The testing 
dataset contains a total of 120 records. The first 10 real and 10 fake profiles are 
listed in Table 1.  

Table 1.  Instagram Dataset Sample 

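| profile pic | nums/length username | fullname words | nums/length fullname | name==username | description length | external URL | private | No. of posts | No. of followers | No. of follows | Fake or real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.33 | 1 | 0.33 | 1 | 30 | 0 | 1 | 35 | 488 | 604 | 0 |
| 1 | 0 | 5 | 0 | 0 | 64 | 0 | 1 | 3 | 35 | 6 | 0 |
| 1 | 0 | 2 | 0 | 0 | 82 | 0 | 1 | 319 | 328 | 668 | 0 |
| 1 | 0 | 1 | 0 | 0 | 143 | 0 | 1 | 273 | 14890 | 7369 | 0 |
| 1 | 0.5 | 1 | 0 | 0 | 76 | 0 | 1 | 6 | 225 | 356 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 6 | 362 | 424 | 0 |
| 1 | 0 | 1 | 0 | 0 | 132 | 0 | 1 | 9 | 213 | 254 | 0 |
| 1 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 19 | 552 | 521 | 0 |
| 1 | 0 | 2 | 0 | 0 | 96 | 0 | 1 | 17 | 122 | 143 | 0 |
| 1 | 0 | 1 | 0 | 0 | 78 | 0 | 1 | 9 | 834 | 358 | 0 |
| 0 | 0.05 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
| 1 | 0.27 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 45 | 64 | 1 |
| 0 | 0.07 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 30 | 1 |
| 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 69 | 694 | 1 |
| 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 22 | 82 | 1 |
| 0 | 0.22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 31 | 124 | 1 |
| 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 25 | 1 |
| 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 69 | 694 | 1 |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 33 | 1 |
| 0 | 0.62 | 1 | 0.4 | 0 | 0 | 0 | 0 | 1 | 17 | 34 | 1 |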

 
Evaluation Parameters: This section evaluates how correctly positive (P) and 
negative (N) instances are classified. A fake profile correctly identified as fake 
is a true positive (TP), or hit; a real profile correctly identified as real is a 
true negative (TN); a real profile incorrectly flagged as fake is a false positive 
(FP), or false hit; and a fake profile incorrectly accepted as real is a false 
negative (FN). Accuracy is the ratio of correctly classified profiles to the total 
number of profiles, as given in Equation (2). 

\[ \text{Overall Accuracy (\%)} = \frac{TP + TN}{TP + FP + TN + FN} \tag{2} \]

Precision is the proportion of returned hits that are true positives, i.e., correct 
hits. 

\[ \text{Precision } (P) = \frac{TP}{TP + FP} \tag{3} \]

Recall is the proportion of actual positives that are returned, i.e., how many of 
the correct hits were discovered.  

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{4} \]
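Equations (2) to (4) translate directly into code. The sketch below is a minimal illustration; it assumes that the positive class is "fake" (label 1, as in Table 1), and the function names and the ten-sample prediction vector are invented for the example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the fake class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Equation (2): correctly classified profiles over all profiles."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Equation (3): true positives over all predicted positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation (4): true positives over all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels and predictions for ten profiles (1 = fake, 0 = real).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(accuracy(tp, fp, tn, fn), precision(tp, fp), recall(tp, fn))
```

Multiplying each value by 100 gives the percentages reported in the results table.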

Table 2.  Accuracy Results Obtained Through 

| S. No | Algorithm | Accuracy | Precision | Recall |
|---|---|---|---|---|
| 1 | Logistic Regression | 96% | 93% | 94% |
| 2 | Artificial Neural Network | 94% | 93% | 92% |
| 3 | Naïve Bayes | 90% | 88% | 89% |
| 4 | SVM with Random Forest | 98% | 97% | 98% |

 
When the proposed SVM-RF algorithm is compared with the logistic regression, 
artificial neural network, and Naïve Bayes algorithms, it has the highest accuracy, 
precision, and recall.  

 
Fig. 3. Accuracy Comparison 

[Bar chart: accuracy (y-axis from 86% to 100%) for Logistic Regression, ANN, Naive Bayes, and SVM-RF]

 
Fig. 4. Precision Graph Comparison 

 
Fig. 5. Recall Graph Comparison 

Amongst the four classification methods compared, support vector machine 
learning algorithm with random forest provides the highest accuracy of 98%, 
precision of 97%, and recall of 98%. The proposed classification algorithm 
automatically identifies fake accounts in online social networks.  

[Fig. 4 bar chart: precision (y-axis from 82% to 98%) for Logistic Regression, ANN, Naïve Bayes, and SVM-RF]

[Fig. 5 bar chart: recall (y-axis from 84% to 100%) for Logistic Regression, ANN, Naïve Bayes, and SVM-RF]

5 Conclusion 

The proposed work analyzes essential approaches for identifying fake accounts on 
online social network websites, using the Instagram dataset as an example. Users 
have recently found it increasingly challenging to locate reliable information 
because of the vast amount of information available on social media sites. The 
SVM-RF machine learning model was developed to classify an input as genuine or fake 
in order to combat growing fraud on the internet. Many social media platforms are 
working to integrate such technologies in order to stop the spread of fraudulent 
activities. The accuracy, precision, and recall of the proposed model were compared 
with those of logistic regression, an artificial neural network, and Naïve Bayes, 
and the results show that the proposed algorithm identifies fake accounts better 
than all three.  

6 References 

[1] Song, C., Ning, N., Zhang, Y., & Wu, B. (2021). Knowledge augmented transformer for 
adversarial multidomain multiclassification multimodal fake news detection. 
Neurocomputing, 462, 88-100. https://doi.org/10.1016/j.neucom.2021.07.077  

[2] Mohammadrezaei, M., Shiri, M. E., & Rahmani, A. M. (2018). Identifying fake accounts 
on social networks based on graph analysis and classification algorithms. Security and 
Communication Networks, 2018. https://doi.org/10.1155/2018/5923156  

[3] Hemeida, A. M., Alkhalaf, S., Mady, A., Mahmoud, E. A., Hussein, M. E., & Eldin, A. M. 
B. (2020). Implementation of nature-inspired optimization algorithms in some data mining 
tasks. Ain Shams Engineering Journal, 11(2), 309-318. 
https://doi.org/10.1016/j.asej.2019.10.003  

[4] Kumari, S., & Singh, S. (2015, April). A critical analysis of privacy and security on social 
media. In 2015 Fifth International Conference on Communication Systems and Network 
Technologies (pp. 602-608). IEEE. https://doi.org/10.1109/CSNT.2015.21  

[5] Fire, M., Goldschmidt, R., & Elovici, Y. (2014). Online social networks: Threats and 
solutions. IEEE Communications Surveys & Tutorials, 16(4), 2019-2036. 
https://doi.org/10.1109/COMST.2014.2321628  

[6] Prabhu Kavin, B., Karki, S., Hemalatha, S., Singh, D., Vijayalakshmi, R., Thangamani, 
M., ... & Adigo, A. G. (2022). Machine learning-based secure data acquisition for fake 
accounts detection in future mobile communication networks. Wireless Communications 
and Mobile Computing, 2022. https://doi.org/10.1155/2022/6356152  

[7] Kasliwal, N., Bachhav, T., Sonavane, D., Shinde, S., & Nivangune, M. (2019). Detection 
of fake accounts of Twitter using SVM and NN algorithms. IEEE Transactions on 
dependable and secure computing, 5(1), 37-48. 

[8] Reddy, S. D. P. (2019). Fake profile identification using machine learning. International 
Research Journal of Engineering and Technology (IRJET), 6(12), 1145-1150. 

[9] Adikari, S., & Dutta, K. (2020). Identifying fake profiles in linkedin. arXiv preprint 
arXiv:2006.01381. 

[10] Conti, M., Poovendran, R., & Secchiero, M. (2012, August). Fakebook: Detecting fake 
profiles in on-line social networks. In 2012 IEEE/ACM International Conference on 
Advances in Social Networks Analysis and Mining (pp. 1071-1078). IEEE. 
https://doi.org/10.1109/ASONAM.2012.185  

7 Authors 

Ahdi Hassan is a Researcher at the Global Institute for Research Education & 
Scholarship, Amsterdam, Netherlands. He is a Commissioning Editor at IGI Global, a 
publisher of more than 170 journals issued quarterly and semi-annually; a Researcher 
on "Vanishing Languages and Cultural Heritage" at the Austrian Academy of Sciences; 
a representative of Imperial English UK (A Trusted British Brand in English 
Language) and Independent Research International [IRI]; and an Advisor on Scholarly 
Journal Management. He has been Associate or Consulting Editor of numerous journals 
and has served on editorial review boards from 2013 to the present. He has 
published a number of research papers in various domains. He is the founder of 
Pakistani languages corpora and earned his master’s degree in Linguistics from 
Quaid-i-Azam University, Islamabad, in 2013. His contributions include using modern 
scientific techniques to work with the sounds and meanings of words, studying the 
relationship between the written and spoken forms of various Asian and European 
languages, developing artificial languages in coherence with modern English, and 
scientifically examining ancient written material to trace its origin. He teaches 
topics connected, but not limited, to communication, such as English for Young 
Learners, English for Academic Purposes, English for Science, Technology and 
Engineering, English for Business and Entrepreneurship, Business Intensive Course, 
Applied Linguistics, interpersonal communication, verbal and nonverbal 
communication, cross cultural competence, language and humor, intercultural 
communication, culture and humor, language acquisition and language in use (email: 
ahdihassan.41@gmail.com, Orcid: 
https://orcid.org/my-orcid?orcid=0000-0003-1734-3168). 

Dr. Abdalilah Alhalangy is an assistant professor of information systems. He 
received a bachelor’s degree in information technology from Al-Sharq Private 
College, and a master’s in information technology and a PhD in information systems 
from Al-Neelain University in Sudan. He has taught in higher education in Sudan 
(University of Kassala, Faculty of Computer Science and Information Technology) 
and Saudi Arabia (Qassim University) since 2006. At the University of Kassala, 
Sudan, he held several positions. He has taught courses in the departments of 
computer science, information technology, and information systems (email: 
a.alhalangy@qu.edu.sa, ORCID ID: https://orcid.org/0000-0003-2735-8208). 

Dr. Fahad Alzahrani is a university professor of applied linguistics with 
educational backgrounds in linguistics and computer science. His major research 
interests fall under discourse analysis, computational linguistics, natural language 
processing, and corpus linguistics (email: fahad_alzahrani7@hotmail.com, ORCID: 
https://orcid.org/0000-0002-4270-8598). 

Article submitted 2022-11-24. Resubmitted 2023-01-04. Final acceptance 2023-01-06. Final version 
published as submitted by the authors. 
