
Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 12(1):e11, 2020 

OJPHI 

Improving Accuracy for Diabetes Mellitus Prediction by Using Deepnet 

Riyad Alshammari1*, Noorah Atiyah2, Tahani Daghistani1, Abdulwahhab Alshammari1 

1Health Informatics Department, College of Public Health and Health Informatics 

King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS) 

King Abdullah International Medical Research Center (KAIMRC) 

Ministry of National Guard Health Affairs, Riyadh, KSA 

2 Faculty of Health Sciences, Simon Fraser University, Burnaby British Columbia, Canada 

Abstract: 

Diabetes is a salient issue and a significant health care concern for many nations. The forecast for the 
prevalence of diabetes is on the rise. Hence, building a prediction machine learning model to assist in 
the identification of diabetic patients is of great interest. This study aims to create a machine learning 
model that is capable of predicting diabetes with high performance. This study used the BigML 
platform to train four machine learning algorithms, namely Deepnet, Models (decision tree), Ensemble 
and Logistic Regression, on data sets collected from the Ministry of National Guard Health Affairs 
(MNGHA) in Saudi Arabia between 2013 and 2015. The four algorithms were compared on 
Accuracy, Precision, Recall, F-measure and PhiCoefficient. 
Results show that the Deepnet algorithm achieved higher performance than the other machine 
learning algorithms on various evaluation metrics. 

Keywords: Diabetes, Artificial Intelligence, Deep Learning 

*Corresponding Author: riyadalshamamri@gmail.com 

DOI: 10.5210/ojphi.v12i1.10611 

Copyright ©2020 the author(s) 

This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. 
Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy 
and the copy is used for educational, not-for-profit purposes. 

Introduction: 

Diabetes is a severe disease that affects all genders and ages [1]. Diabetes is a metabolic and 

systemic disease in which a disruption in the metabolism of carbohydrates occurs because of 

insufficient insulin production for the body's metabolic needs [2]. There are two main types of 

diabetes. Type 1, or insulin-dependent diabetes, results from the destruction of 

insulin-producing pancreatic cells [2]. Type 2, or non-insulin-dependent diabetes, is associated with obesity and 

results from a relative lack of insulin [2]. This disease is a result of an individual's carbohydrate 

intake exceeding the capacity of their pancreas's production of insulin [2]. The gravity of the 

condition is evident in the form of complications [3]. Common complications of diabetes include, 

but are not limited to, heart disease, stroke, and kidney disease, which can result in higher mortality 

[3]. At the patient level, an individual may fail to recognize they have the disease or fail to receive 

prompt appropriate care resulting in poor prognosis [3]. 

The global prevalence of diabetes for adults aged more than 18 years old was 8.5% in 2014 per 

the World Health Organization (WHO) report [4]. Of the population impacted by diabetes, 80% 

of the people lived in low-income and middle-income countries, with the highest diagnosis being 

Type 2 diabetes; however, there is an alarming rise in the prevalence of both Type 1 and 2 diabetes 

[1]. Parallel to the increasing prevalence rate of the disease, there is an increase in associated 

consequences due to the complications of diabetes, i.e. increase in heart disease, stroke, and poor 

health [3]. Therefore, mortality rates as a result of diabetes and its comorbid health problems are 

rising proportionally [5]. In 2015, an estimated 1.6 million deaths were directly caused by 

diabetes [1]. The International Diabetes Federation reported that the disease affects one in 11 adults 

worldwide, with one person dying of the disease every six seconds [1]. WHO anticipates 

that diabetes will be the seventh leading cause of death in 2030 [4]. In Saudi Arabia, the 

prevalence of diabetes is high, with an estimated rise of more than 2.5 million patients having the 

disease by 2030 [6]. 

Early prediction of Type 2 diabetes is a prominent health research topic in Saudi Arabia. Diabetes 

Risk Score was the most convenient tool for diabetes prediction [7]. However, this method needs 

human intervention in decision-making. Computational models that predict the risk of 

diabetes can significantly support healthcare providers with decision-making and assist 

disease self-management, which, in turn, can potentially decrease disease-associated mortality 

rates [8]. Therefore, machine learning is gaining attention in the health field, as these techniques 

can achieve high performance in predicting diabetes. 

Specifically, these models can help identify those who are at high risk of having diabetes and for 

whom early prevention and control programs can improve health outcomes [7,9]. At the same time, 

these techniques reduce human error in healthcare decisions, thus decreasing the health 

burden and making better use of health service resources [5]. Ideally, further development of models that 

incorporate prior knowledge would be promising for diabetes prediction [10]. The availability of 

a patient's health data could help to extract meaningful information and hidden knowledge to better 

the prognosis of the individuals affected by this disease. 

Background 

The Biology of Diabetes 

Type 1 diabetes is an abnormal immune reaction controlled by a portion of the HLA-D region 

genes and works directly against molecules expressed only on the β-cells [11]. The pathway for 

immunological response systems is complex but involves mounting a response towards foreign 

antigens [11]. In Type 1 diabetes, similar attacks occur on certain pancreatic β-cells resulting in an 

insulin deficiency [11]. 


Type 2 diabetes is one of the most frequent metabolic disorders, and it is a heterogeneous disease 

distinguished by deficient insulin secretion by the pancreatic islet β-cells [12]. 

Insulin deficiency results in insulin resistance or impaired insulin sensitivity, which leads to a 

decline in patient health [12]. 

Genome-wide association studies have found more than 400 Type 2 diabetes-associated gene 

variants that influence islet function and insulin secretion [12] [13]. However, these individual 

gene variants explain less than 20% of overall diabetes risk [12] [13]. In contrast, the literature on 

lifestyle modification indicates that sedentary lifestyles, poor diet, and a myriad of social determinants 

of health (such as low socio-economic status, psychological conditions, and poor environment) all 

play a predominant role in the development of Type 2 diabetes [12] [14] [15] [16]. Furthermore, 

parental lifestyle has longitudinal impacts on the life course of an individual. In utero 

programming and early postnatal metabolic transformation correlate with the risk of diabetes due to 

DNA methylation [12] [17] [18]. 

Type 2 diabetes results from a variety of factors but is often mitigated through lifestyle changes 

and preventative measures such as diet change, increased exercise, and overall holistic integration 

of health. 

In summary, over the past 50 years, the prevalence of diabetes mellitus (commonly called diabetes) has continued to 

increase, with individuals in Western, Western Pacific, Asian and African countries all 

experiencing an increase in disease prevalence [12]. Cho and colleagues [19] predict that, globally, between 

2017 and 2045 the diabetes rate will increase by at least 50%, meaning approximately 693 million 

people will be affected by the disease, creating an estimated healthcare cost of US$850 billion per 

year. 

Diabetes in Saudi Arabia 

In the past four decades, Saudi Arabia has undergone significant socio-economic change [20]. 

Specifically, Saudi Arabia has seen an increase in an ageing population, progressive urbanization, 

decreased infant mortality rates and increased life expectancy [20]. These changes in population 

demographics are coupled with a rapid change in lifestyles, where individuals are moving towards 

westernized patterns of consumption, shown in changes in nutrition, less physical activity, higher 

rates of obesity, and increases in smoking, all resulting in a dramatic rise in the prevalence of 

diabetes [21] [22] [23] [24]. 

The WHO reported in 2016 [25] that the prevalence of diabetes in Saudi Arabia was 14.7% for males 

and 13% for females. The WHO [25] also found a high prevalence of overweight individuals (67.5% 

males; 69.2% females), obesity (29.5% males; 39.5% females), and inactivity (52.1% males; 

67.7% females). The WHO [25] further reported high mortality rates attributed to diabetes with 

1070 males and 500 females (aged 30-69) and 1460 males and 1020 females (aged 70+) dying due 

to the disease. Overall, diabetes is an important health concern for the citizens of Saudi Arabia. 

Integrating early detection and prediction models would have both national and global benefits. 


AI and Diabetes 

Dalakleidi et al. [26] applied Evolving Artificial Neural Networks (EANNs), Bayesian‐based 

algorithm, decision trees and Logistic Regression to predict the progress of diabetes and its 

complications related to cardiovascular disease. They achieved an accuracy of 80% with the 

EANNs algorithm. For the complications, they reported an accuracy of 92.86%. Meng et al. [27] 

compared three different techniques, namely Logistic Regression, decision tree and Artificial 

Neural Networks, to predict diabetes and prediabetes. They achieved the best performance with a 

decision tree with an accuracy of 77.87%, a sensitivity of 80.68% and specificity of 75.13%. Wang 

et al. [28] built a classification model to identify people at risk of developing Type 2 diabetes. They 

compared Artificial Neural Networks (ANNs) and Multivariate Logistic Regression (MLR). They 

showed that ANN outperformed MLR. Research is promising and demonstrates that AI has the 

potential to help in the diagnostic framework of diabetic or prediabetic patients. 

Methods 

This section discusses the methodology of this research article. It 

describes the gathering of the data-set and feature information, and explains the 

algorithms and evaluation criteria used in this research. 

A. Data-set and Features 

The health data-sets were collected between 2013 and 2015. The health data came 

from the Electronic Health Record databases of the Ministry of National Guard Health Affairs and covered 

all adult patients who had been tested for Hemoglobin A1c (HgbA1c). Patients were labelled 

as diabetic or non-diabetic based on the HgbA1c result. If the value of HgbA1c was higher than or equal to 

seven, patients were classified as diabetic. If the value of HgbA1c was less than seven, patients 

were classified as non-diabetic. After the pre-processing of the data-sets, the exclusion criteria 

(exempting participants from further analysis) covered those with 40% or more missing values. 

Manual inspection and domain knowledge were used 

to remove implausible values. Furthermore, to check the quality of the data, this study used R to 

analyze the given information. Table 1 shows a descriptive analysis of the attributes. The data sets 

have 17 attributes organized into three categories: 

1) Demographic attributes such as gender, age, and region; 

2) Measurement attributes such as the Body Mass Index (BMI) and blood pressure; 

3) Lab tests. 
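
The labelling and exclusion rules described above can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary record layout are hypothetical, and applying the 40% missing-value threshold per patient record is an assumption about how the exclusion criterion was used.

```python
HBA1C_THRESHOLD = 7.0         # cut-off stated in the text
MAX_MISSING_FRACTION = 0.40   # exclusion threshold stated in the text


def label_patient(hba1c: float) -> str:
    """Return the class label implied by the HgbA1c rule."""
    return "diabetic" if hba1c >= HBA1C_THRESHOLD else "non-diabetic"


def missing_fraction(record: dict) -> float:
    """Fraction of attributes in a record that have no value (None)."""
    if not record:
        return 1.0
    return sum(value is None for value in record.values()) / len(record)


def keep_record(record: dict) -> bool:
    """Keep a record only if fewer than 40% of its attributes are missing.

    Assumption: the threshold is applied to each patient record.
    """
    return missing_fraction(record) < MAX_MISSING_FRACTION
```

For example, `label_patient(7.0)` gives the diabetic label because the rule is inclusive at seven.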

 
Table 1: Descriptive Statistics of Diabetes Risk Factors 

Risk Factors                                        Data

Cities
  Riyadh                                            54141 (81.63%)
  Dammam                                            11085 (16.71%)
  Jeddah                                            1099 (1.66%)

Sex
  Male                                              36811 (55.50%)
  Female                                            29514 (44.50%)

Age Groups
  13-19                                             578 (0.87%)
  20-34                                             4067 (6.13%)
  35-44                                             4486 (6.76%)
  45-64                                             23949 (36.11%)
  65-84                                             29049 (43.80%)
  >85                                               4196 (6.33%)

Body Mass Index (BMI)                               30.77 ± 8.92

Blood pressure
  High blood pressure                               128.74 ± 18.225
  Low blood pressure                                67.71 ± 11.154

Lab Test
  eGFR                                              78.33 ± 40.83
  Mean corpuscular volume (MCV)                     86.954 ± 7.589
  Mean corpuscular hemoglobin (MCH)                 28.03 ± 2.91036
  Mean corpuscular hemoglobin concentration (MCHC)  317.55 ± 38.99
  Red cell volume distribution width (RDW)          15.23 ± 2.43
  Platelet count (Plt)                              273.70 ± 125
  Mean Platelet Volume (MPV)                        8.55 ± 1.38
  White Blood Cell Count (WBC)                      9.35 ± 5.81
  Red Blood Cell Count (RBC)                        4.17 ± 0.84
  Hemoglobin (Hgb)                                  114.56 ± 26.72
  Hematocrit (Hct)                                  0.91 ± 4.44

Values are mean ± SD and n (%). 
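
As a quick consistency check on the categorical rows of Table 1, the sketch below recomputes each percentage from the reported counts. The counts are copied from the table; the variable and function names are illustrative only.

```python
# Counts copied from Table 1 (cities, sex, age groups).
TABLE_1_COUNTS = {
    "Cities": {"Riyadh": 54141, "Dammam": 11085, "Jeddah": 1099},
    "Sex": {"Male": 36811, "Female": 29514},
    "Age Groups": {"13-19": 578, "20-34": 4067, "35-44": 4486,
                   "45-64": 23949, "65-84": 29049, ">85": 4196},
}


def percentages(counts):
    """Share of each category in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Total number of records implied by each grouping.
totals = {group: sum(counts.values()) for group, counts in TABLE_1_COUNTS.items()}
# Recomputed percentages for comparison with the table.
shares = {group: percentages(counts) for group, counts in TABLE_1_COUNTS.items()}
```

All three groupings sum to the same total of 66,325 records, and the recomputed shares agree with the percentages printed in the table.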

Males represented 55.50% of the data-set, while females represented 44.50%. Most of the data 

belonged to patients aged 45 to 84 years old. The percentage of diabetic patients in the data-set 

was 64.47%. The incidence of diabetes for both genders was highest in those aged 65-84 years, 

with males at 47.83% and females at 48.6%. By comparison, in the 45-64 age range 

the figures were 37.89% for males and 38.03% for females. The results show that the 


female patients in the data-sets had a higher Body Mass Index (BMI) and higher blood pressure 

measurements than the male patients (see Table 1). 

B. Algorithms and Evaluation Criteria 

BigML [29] is a cloud computing service that provides Machine Learning as a Service (MLaaS). 

BigML offers a collection of state-of-the-art machine learning algorithms and has demonstrated 

the ability to solve real-world challenges in various domains. BigML was used to build four 

machine learning models, namely Ensemble, Models (decision trees), Deepnet, and 

Logistic Regression. 

A model in BigML is a decision tree. Each internal node tests one of the input 

attributes (predictors), with the first node being the root. Each internal node has two 

branches, each corresponding to a condition on the value of that attribute. A leaf represents the outcome of the class 

(objective field) reached by following the chain of branches from the root to the leaf. 
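
A binary decision tree of this kind can be sketched as follows. This is a minimal illustration, not BigML's internal representation: the class names, the attribute names and the thresholds in the example tree are all hypothetical and are not taken from the trained model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    outcome: str  # predicted class (objective field)


@dataclass
class Node:
    attribute: str                 # input attribute tested at this node
    threshold: float               # split value (illustrative)
    left: Union["Node", Leaf]      # followed when value <= threshold
    right: Union["Node", Leaf]     # followed when value > threshold


def predict(tree, record):
    """Walk from the root to a leaf and return the leaf's outcome."""
    while isinstance(tree, Node):
        go_left = record[tree.attribute] <= tree.threshold
        tree = tree.left if go_left else tree.right
    return tree.outcome


# Hypothetical two-level tree; the splits are made up for illustration.
example_tree = Node(
    attribute="age", threshold=45.0,
    left=Leaf("non-diabetic"),
    right=Node(attribute="bmi", threshold=30.0,
               left=Leaf("non-diabetic"), right=Leaf("diabetic")),
)
```

Calling `predict(example_tree, {"age": 70, "bmi": 33})` follows the right branch twice and returns the outcome stored in the final leaf.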

An ensemble in BigML is a group of machine learning models joined together to make a 

more reliable model. Logistic Regression is a supervised machine learning algorithm that applies a 

logistic function to the input values to build a learning model. Deepnet in BigML is an 

optimized form of deep neural network that is suitable for classification problems. Deepnet 

is a supervised machine learning model loosely inspired by the neural circuitry of the human brain. 
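
Two of these ideas can be illustrated briefly: the logistic function used by Logistic Regression, and one common way of combining the predictions of an ensemble. The weights in the usage note are hypothetical, and plurality voting is an assumption made for illustration; the text does not state how BigML combines the members of an ensemble.

```python
import math
from collections import Counter


def logistic(z: float) -> float:
    """Logistic (sigmoid) function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability of the positive class for one record.

    The weights and bias would normally be learned from data; any values
    passed in here are purely illustrative.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


def majority_vote(predictions):
    """Combine class predictions from several models by plurality vote
    (an assumed combination rule, not necessarily the one BigML uses)."""
    return Counter(predictions).most_common(1)[0][0]
```

With features `[1.0, 2.0]`, weights `[0.5, -0.25]` and a bias of `0.0`, the weighted sum is zero and the predicted probability is exactly 0.5, the usual decision boundary.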

A 10-fold cross-validation technique was applied to evaluate the performance of each machine 

learning algorithm. It works by dividing the data set into ten equal folds. In each of 

ten iterations, one fold is held out for testing and the model is trained on the remaining nine folds. 

After the tenth iteration, the reported result is the average over all ten folds [30]. 
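
The procedure can be sketched in plain Python as below. This is a generic illustration of k-fold cross-validation, not the BigML implementation: `train_fn` and `score_fn` are hypothetical callables supplied by the caller, and the folds are only approximately equal when the number of records is not a multiple of k.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and split them into k near-equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, labels, train_fn, score_fn, k=10, seed=0):
    """Average score over k iterations, each holding out one fold for testing."""
    folds = k_fold_indices(len(records), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        # Train on the remaining k-1 folds.
        model = train_fn([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Score on the held-out fold.
        scores.append(score_fn(model,
                               [records[i] for i in test_idx],
                               [labels[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

Every record is used for testing exactly once and for training in the other nine iterations, which is why the averaged score is less sensitive to a single lucky or unlucky split.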

The best model for predicting the label classes (diabetic vs. non-diabetic) was selected 

using the following metrics: True Positive rate, False Positive rate, Precision, Recall, Area Under 

the Curve and F-measure. The calculated metrics were: 

● Accuracy: which represents the number of correctly classified records over the total 

number of evaluated records, calculated based on equation 1: 

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

● Precision: which represents the number of true positives correctly identified as 

diabetic patients over the total number of positive predictions, calculated based on 

equation 2: 

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]

● Recall: which represents the number of true positives correctly identified as diabetic 

patients over the total number of positive records, calculated based on equation 3: 

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]

● F-measure: which represents the harmonic mean of precision and recall, calculated 

based on equation 4: 

\[ \text{F-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \]

● PhiCoefficient: represents the Matthews Correlation Coefficient, calculated based on 

equation 5: 

\[ \mathrm{PhiCoefficient} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5} \]

Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives 

and false negatives, respectively. 
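
Equations 1 to 5 translate directly into code. The sketch below is a plain-Python illustration rather than the BigML implementation; it assumes all denominators are non-zero, and the counts in the usage note are hypothetical, not taken from the study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute equations (1) to (5) from the four confusion-matrix counts.

    Assumes every denominator is non-zero (no zero-division handling).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)               # eq. 1
    precision = tp / (tp + fp)                               # eq. 2
    recall = tp / (tp + fn)                                  # eq. 3
    f_measure = 2 * precision * recall / (precision + recall)  # eq. 4
    phi = (tp * tn - fp * fn) / math.sqrt(                   # eq. 5
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "phi": phi,
    }
```

For instance, with the made-up counts `tp=40, tn=45, fp=5, fn=10`, the accuracy is 0.85 and the recall is 0.8. Unlike accuracy, the Phi coefficient uses all four counts in both numerator and denominator, so it stays informative when the classes are imbalanced.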

Results and Discussion: 

The aim of this study was to build machine learning models that can predict 

diabetes with high accuracy. The BigML machine learning platform was used 

to create four machine learning models, namely Ensemble, Models (decision tree), 

Deepnet and Logistic Regression. Each algorithm uses a different learning 

technique. The overall goal of this study was to find the best-performing model and apply its 

technique to predict diabetes. The Deepnet model performed better than Models 

(decision tree), Ensemble and Logistic Regression on all of the evaluation criteria except Precision, 

where Logistic Regression scored marginally higher (88.38 vs. 88.29); see Table 2. 

Table 2: Evaluation of predicting diabetes using AI techniques. 

Metric           Ensemble   Models (decision tree)   Deepnet   Logistic Regression
Accuracy (%)     88.1       87.8                     88.48     88.19
Precision (%)    87.9       87.7                     88.29     88.38
Recall (%)       87.8       87.6                     88.36     87.63
F-Measure        0.8783     0.8761                   0.88      0.88
PhiCoefficient   0.7566     0.7522                   0.77      0.76
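
To make the comparison in Table 2 easy to reproduce, the sketch below picks the top-scoring model for each metric. The scores are copied from the table; the function and variable names are illustrative, and ties are reported at the precision printed in the table.

```python
# Scores copied from Table 2.
TABLE_2 = {
    "Accuracy": {"Ensemble": 88.1, "Models": 87.8,
                 "Deepnet": 88.48, "Logistic Regression": 88.19},
    "Precision": {"Ensemble": 87.9, "Models": 87.7,
                  "Deepnet": 88.29, "Logistic Regression": 88.38},
    "Recall": {"Ensemble": 87.8, "Models": 87.6,
               "Deepnet": 88.36, "Logistic Regression": 87.63},
    "F-Measure": {"Ensemble": 0.8783, "Models": 0.8761,
                  "Deepnet": 0.88, "Logistic Regression": 0.88},
    "PhiCoefficient": {"Ensemble": 0.7566, "Models": 0.7522,
                       "Deepnet": 0.77, "Logistic Regression": 0.76},
}


def best_models(scores):
    """Return every model that reaches the highest score for one metric."""
    top = max(scores.values())
    return [name for name, value in scores.items() if value == top]


best = {metric: best_models(scores) for metric, scores in TABLE_2.items()}
```

On these figures Deepnet leads on Accuracy, Recall and PhiCoefficient, ties with Logistic Regression on F-Measure at two decimals, and is marginally behind Logistic Regression on Precision. The margins between all four models are small, at most about one percentage point.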

The prediction of diabetic patients who may not know they have the disease is a crucial challenge 

in the healthcare domain. The machine learning technique demonstrates the ability to predict 

diabetes with high accuracy using only 17 attributes. A further advantage of this method 

is that the required information can be collected during routine checkups at a healthcare clinic. This 

allows up-to-date information to be integrated into the system, expediting medical care and easing 

the burden on healthcare workers and patients. 

Changing the healthcare workflow can enhance the early healthcare assessments of those with 

diabetes. As a result, this can decrease the prevalence of the disease and improve initial 

management practices. Furthermore, this will increase patients' satisfaction and overall quality of 

care. 

A comprehensive diagnostic framework has the potential to streamline medical services and 

empower patients. Machine learning algorithms offer a unique tool for healthcare 

professionals, from both an epidemiological and a treatment perspective. From a systems 

standpoint, the ability to centralize medical data and predict trends in population health would 

allow resource allocation to the identified gaps, which in turn, strengthens the population's health. 

Moreover, another meaningful impact of integrating machine learning is the benefit to the patients. 

The WHO [31] defines empowerment as "a process through which people gain greater control 

over decisions and actions affecting their health" and affects both individual and community levels. 

To empower the population, people must have access to their information and have it delivered in 


an understandable, transparent, and user-friendly format. Ease of use is paramount for 

patient engagement. 

Literature indicates that in the 21st century, mobile health technologies have increasingly 

connected users at the community, population, and global levels [32]. Mobile health addresses 

the rising burden of chronic diseases while encouraging health systems to shift towards patient-

centric designs [32]. Mobile health consists of medical practice supported by a portable diagnostic 

device [32]. The use of these devices at the point-of-care resulted in not only a change in healthcare 

delivery, but an increase in patient engagement, a reduction in healthcare costs, and improved 

patient prognosis [32]. 

Machine learning has the potential to increase patient empowerment via mobile health. Making 

a prediction model accessible through an app on a smartphone, desktop or other 

personal electronic device is potentially a highly useful way of capitalizing on our 

mobile interconnectedness. However, before the implementation of mobile health, guidelines to 

manage these machine learning models are essential for healthcare. 

Systematically developed statements based on research, best practices, best scientific evidence, 

and experience act as guidelines [31,33]. Guidelines support healthcare providers with an outline 

for patient care to ensure that individuals receive the same or similar patient care across healthcare 

facilities. 

Standardizing guidelines across healthcare facilities can aid authorities in bridging the gap 

between research and practice within these facilities, which helps to foster consistent services. 

Additionally, standardizing guidelines helps healthcare providers to 

identify the what, where, when, and how of the patient's health, while collecting, sharing, and 

reporting data improves and streamlines the process. The collected and reported data based on 

clinical guidelines assists healthcare and public health authorities in identifying the age groups or 

individual patients at high risk of having diabetes (Type 1 or Type 2). 

The collected and reported data assists healthcare authorities in planning prevention and treatment 

plans. The flexibility of the process allows healthcare authorities to navigate the ever-changing 

healthcare landscape. Gaps, limitations, and needs of population health are dynamic, and to avoid 

steep healthcare costs, the allocation of resources must have a basis in evidence to resolve pressing 

issues best. 

Conclusion: 

In this paper, a machine learning model for early prediction of diabetes was built 

on real health data collected from the Ministry of National Guard Health Affairs, Saudi Arabia. 

Four machine learning algorithms, namely Deepnet, Models (decision tree), 

Ensemble and Logistic Regression, were compared using 17 attributes. Under assessment, Deepnet achieved the 

best or joint-best result on four of the five evaluation criteria. This paper demonstrates that machine 

learning-based algorithms have excellent potential in predicting diabetes with high accuracy. 

Future work is to evaluate the model on a larger data-set and to use the model with Internet of 

Things devices. 


Acknowledgement: 

This study was funded by the King Abdullah International Medical Research Center (KAIMRC), 

National Guard, Health Affairs, Riyadh, Saudi Arabia, with research grant No. RC17/248/R. 

References 

1. Chan JCN, Gregg EW, Sargent J, Horton R. 2016. Reducing global diabetes burden by 

implementing solutions and identifying gaps: a Lancet Commission. Lancet. 387(10027), 

1494-95. doi:https://doi.org/10.1016/S0140-6736(16)30165-9. PubMed 

2. Porta M, Last JMLM. "Diabetes," in A Dictionary of Public Health, J. M. Last, Ed. Oxford 

University Press, 2018. 

3. Kasemthaweesab P, Kurutach W. (2012, July). Association analysis of diabetes mellitus (DM) 

with complication states based on association rules. In Industrial Electronics and Applications 

(ICIEA), 2012 7th IEEE Conference on (pp. 1453-1457). IEEE. 

4. WHO. Retrieved November 8, 2017, from  

http://www.who.int/mediacentre/factsheets/fs312/en/ 

5. Collins GS, Mallett S, Omar O, Yu LM. 2011. Developing risk prediction models for Type 2 

diabetes: a systematic review of methodology and reporting. BMC Med. 9(1), 103. PubMed 

https://doi.org/10.1186/1741-7015-9-103 

6. WHO. Retrieved November 8, 2017, from  

http://www.who.int/diabetes/facts/world_figures/en/index2.html 

7. Alshammari R, Almutairi N. 2017. Building Diabetes Early Warning System Using Data 

Mining Techniques. J Med Imaging Health Inform. 7(3), 655-59.  

https://doi.org/10.1166/jmihi.2017.2043 

8. Zarkogianni K, Litsa E, Mitsis K, Wu PY, Kaddi CD, et al. 2015. A review of emerging 

technologies for the management of diabetes mellitus. IEEE Trans Biomed Eng. 62(12), 2735-

49. PubMed https://doi.org/10.1109/TBME.2015.2470521 

9. Razavian N, Blecker S, Schmidt AM, Smith-McLallen A, Nigam S, et al. 2015. Population-

level prediction of Type 2 diabetes from claims data and analysis of risk factors. Big Data. 

3(4), 277-87. PubMed https://doi.org/10.1089/big.2015.0020 

10. Shankaracharya DO, Samanta S, Vidyarthi AS. 2010. Computational intelligence in early 

diabetes diagnosis: a review. Rev Diabet Stud. 7(4), 252. PubMed  

https://doi.org/10.1900/RDS.2010.7.252 

11. Lernmark Å. 1985. Molecular biology of Type 1 (insulin-dependent) diabetes mellitus. 

Diabetologia. 28(4), 195-203. doi:https://doi.org/10.1007/BF00282232. PubMed 

12. Roden M, Shulman GI. 2019. The integrative biology of Type 2 diabetes. Nature. 576(7785), 

51-60. doi:https://doi.org/10.1038/s41586-019-1797-8. PubMed 

13. Mahajan A, et al. 2018. Refining the accuracy of validated target identification through coding 

variant fine-mapping in Type 2 diabetes. Nat Genet. 50(4), 559-71. 

doi:https://doi.org/10.1038/s41588-018-0084-1. PubMed 

14. Lean MEJ, et al. 2019. Durability of a primary care-led weight-management intervention for 

remission of Type 2 diabetes: 2-year results of the DiRECT open-label, cluster-randomised 

trial. Lancet Diabetes Endocrinol. 7(5), 344-55. doi:https://doi.org/10.1016/S2213-

8587(19)30068-3. PubMed 

15. Bellou V, Belbasis L, Tzoulaki I, Evangelou E. 2018. Risk factors for Type 2 diabetes 

mellitus: An exposure-wide umbrella review of meta-analyses. PLoS One. 13(3), e0194127. 

doi:https://doi.org/10.1371/journal.pone.0194127. PubMed 

16. Petersen KF, Dufour S, Befroy D, Lehrke M, Hendler RE, et al. 2005. Reversal of 

nonalcoholic hepatic steatosis, hepatic insulin resistance, and hyperglycemia by moderate 

weight reduction in patients with Type 2 diabetes. Diabetes. 54(3), 603-08. 

doi:https://doi.org/10.2337/diabetes.54.3.603. PubMed 

17. "Epigenome-wide association study of body mass index, and the adverse outcomes of 

adiposity. - PubMed - NCBI." [Online]. Available:  

https://www.ncbi.nlm.nih.gov/pubmed/28002404. [Accessed: 03-Mar-2020]. 

18. Cahill GF. 1970. Starvation in man. N Engl J Med. 282(12), 668-75. doi: 

https://doi.org/10.1056/NEJM197003192821209. PubMed 

19. Cho NH, et al. 2018. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 

and projections for 2045. Diabetes Res Clin Pract. 138, 271-81. 

doi:https://doi.org/10.1016/j.diabres.2018.02.023. PubMed 

20. Alhowaish AK. 2013. Economic costs of diabetes in Saudi Arabia. J Family Community Med. 

20(1), 1-7. doi:https://doi.org/10.4103/2230-8229.108174. PubMed 

21. Alhowaish A. 2013. Economic costs of diabetes in Saudi Arabia. J Family Community Med. 

20(1), 1. doi:https://doi.org/10.4103/2230-8229.108174. PubMed 

22. El-Hazmi MA, Warsy AS. 1997. Prevalence of obesity in the Saudi population. Ann Saudi 

Med. 17(3), 302-06. doi:https://doi.org/10.5144/0256-4947.1997.302. PubMed 

23. Al-Nozha MM, et al. 2004. Diabetes mellitus in Saudi Arabia. Saudi Med J. 25(11), 1603-10. 

PubMed 

24. Al-Hamdan NA, Al-Zalabani AH, Saeed AA. 2012. Comparative study of physical activity 

of hypertensives and normotensives: A cross-sectional study of adults in Saudi Arabia. J 

Family Community Med. 19(3), 162-66. doi:https://doi.org/10.4103/2230-8229.102315. 

PubMed 




25. World Health Organization. "Diabetes country profile." Retrieved from:

https://www.who.int/diabetes/country-profiles/sau_en.pdf 

26. Dalakleidi K, Zarkogianni K, Thanopoulou A, Nikita K. 2017. Comparative assessment of 

statistical and machine learning techniques towards estimating the risk of developing Type 2 

diabetes and cardiovascular complications. Expert Syst. https://doi.org/10.1111/exsy.12214 

27. Meng XH, Huang YX, Rao DP, Zhang Q, Liu Q. 2013. Comparison of three data mining 

models for predicting diabetes or prediabetes by risk factors. Kaohsiung J Med Sci. 29(2), 93-

99. PubMed https://doi.org/10.1016/j.kjms.2012.08.016 

28. Wang C, Li L, Wang L, Ping Z, Flory MT, et al. 2013. Evaluating the risk of Type 2 diabetes 

mellitus using artificial neural network: an effective classification approach. Diabetes Res 

Clin Pract. 100(1), 111-18. PubMed https://doi.org/10.1016/j.diabres.2013.01.023 

29. Casalboni A. BigML offers a managed platform to build and share your data-sets and models. 

Cloud Academy Blog, 26 Apr 2015 [Online]. Available:  

http://cloudacademy.com/blog/bigml-machine-learning/. Accessed 20 Feb 2020 

30. Shao C, Paynabar K, Kim TH, Jin JJ, Hu SJ, et al. 2013. Feature selection for manufacturing 

process monitoring using cross-validation. J Manuf Syst. 32(4), 550-55. 

https://doi.org/10.1016/j.jmsy.2013.05.006 

31. National Health Service. (2006) Using protocols, standards, policies and guidelines to enhance 

confidence and career development. http://www.wales.nhs.uk/sitesplus/861/opendoc/184096. 

Accessed Mar 12, 2015 

32. "Mobile technology and the digitization of healthcare | European Heart Journal | Oxford 

Academic." [Online]. Available:  

https://academic.oup.com/eurheartj/article/37/18/1428/2466287. [Accessed: 03-Mar-2020]. 

33. Agency for Healthcare Research & Quality (AHRQ) Clinical Guidelines and 

Recommendations. http://www.ahrq.gov/professionals/clinicians-providers/guidelines-

recommendations/. Accessed Mar 15, 2015 

34. Alghamdi M, Al-Mallah M, Keteyian S, Brawner C, Ehrman J, et al. 2017. Predicting diabetes 

mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse 

Testing (FIT) project. PLoS One. 12(7), e0179805. PubMed  

https://doi.org/10.1371/journal.pone.0179805 

35. Al-Mallah MH, Elshawi R, Ahmed AM, Qureshi WT, Brawner CA, et al. 2017. Using 

Machine Learning to Define the Association between Cardiorespiratory Fitness and All-Cause 

Mortality (from the Henry Ford Exercise Testing Project). Am J Cardiol. PubMed 

36. Daghistani T, Alshammari R. 2016. Diagnosis of Diabetes by Applying Data Mining 

Classification Techniques. International Journal of Advanced Computer Science

and Applications. 7(7), 329-32. https://doi.org/10.14569/IJACSA.2016.070747 




37. Selvakumar S, Kannan KS, GothaiNachiyar S. 2017. Prediction of Diabetes Diagnosis

Using Classification Based Data Mining Techniques. International Journal of Statistics and 

Systems. 12(2), 183-88. 

38. WEKA Software: Machine Learning Group at the University of Waikato. 

http://www.cs.waikato.ac.nz/ml/weka/ (2017) 

39. Archer KJ, Kimes RV. 2008. Empirical characterization of random forest variable importance 

measures. Comput Stat Data Anal. 52(4), 2249-60. https://doi.org/10.1016/j.csda.2007.08.015 

40. Chang C, Verhaegen PA, Duflou JR. 2014. A comparison of classifiers for intelligent

machine usage prediction. In Intelligent Environments (IE), 2014 International Conference on,

pp. 198-201. IEEE.

41. Florkowski CM. 2008. Sensitivity, specificity, receiver-operating characteristic (ROC) curves 

and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev. 

29(Suppl 1), S83. PubMed 

42. Mani S, Chen Y, Elasy T, Clayton W, Denny J. 2012. Type 2 diabetes risk forecasting from 

EMR data using machine learning. AMIA

Annu Symp Proc. 2012, 606. PubMed 

43. Lemon SC, Roy J, Clark MA, Friedmann PD, Rakowski W. 2003. Classification and 

regression tree analysis in public health: methodological review and comparison with logistic 

regression. Ann Behav Med. 26(3), 172-81. PubMed

https://doi.org/10.1207/S15324796ABM2603_02 

44. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, et al. 2017. Machine learning 

and data mining methods in Diabetes research. Comput Struct Biotechnol J. 15, 104-16. 

PubMed https://doi.org/10.1016/j.csbj.2016.12.005 

45. El-Hazmi MA, Warsy AS, Al-Swailem AR, Al-Swailem AM, Sulaimani R, et al. 1996. 

Diabetes mellitus and impaired glucose tolerance in Saudi Arabia. Ann Saudi Med. 16(4), 381-85.

PubMed https://doi.org/10.5144/0256-4947.1996.381

46. “Diabetes mellitus in Saudi Arabia: The clinical pattern and complications in 1,000 patients. 

- PubMed - NCBI.” [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/17589143. 

[Accessed: 03-Mar-2020]. 

47. Patient empowerment and health care. World Health Organization, 2009.

48. Leong TY, Kaiser K, Miksch S. 2007. Free and open-source enabling technologies for patient-

centric, guideline-based clinical decision support: a survey. Yearb Med Inform., 74-86. 

PubMed https://doi.org/10.1055/s-0038-1638529 

 