SUBMITTED 23 JUN 22
REVISIONS REQ. 18 SEPT & 14 NOV 22; REVISIONS RECD. 23 OCT & 24 NOV 22
ACCEPTED 8 DEC 22
ONLINE-FIRST: DECEMBER 2022
DOI: https://doi.org/10.18295/squmj.12.2022.069

Machine Learning Approach for Predicting Systemic Lupus Erythematosus in Oman-based Cohort

*AlHassan AlShareedah,¹ Hamza Zidoum,¹ Sumaya Al-Sawafi,¹ Batool Al-Lawati,² Aliya Al-Ansari³

Departments of ¹Computer Science and ³Biology, College of Science, and ²Department of Medicine, College of Medicine, Sultan Qaboos University, Muscat, Oman.
*Corresponding Author's e-mail: al.hassan.satii@gmail.com

Abstract
Objectives: To design a machine learning-based framework that predicts the presence or absence of Systemic Lupus Erythematosus (SLE) in a cohort of Omani patients. Methods: Records of 219 patients seen between 2006 and 2019 were extracted from the electronic records of Sultan Qaboos University (SQU) Hospital; 138 patients had SLE and the remaining 81 had other rheumatologic diseases. Clinical and demographic features were analyzed with a focus on the early stages of the disease. Our design applies Recursive Feature Elimination (RFE) to select only the most informative features. The CatBoost classification algorithm is then used to predict SLE, and an explainer algorithm (SHAP) is applied on top of the CatBoost model to provide reasoning for each individual prediction; this reasoning was validated by rheumatologists. Results: CatBoost achieved an Area Under the ROC curve (AUC) of 0.956 and a sensitivity of 92%. According to the SHAP algorithm, four clinical features (alopecia, renal disorders, acute cutaneous lupus, and hemolytic anemia), together with the patient's age, made the greatest contribution to the prediction. Conclusion: We have designed and validated an explainable framework that predicts SLE and provides reasoning for its predictions.
Our framework may enable clinicians to intervene earlier, leading to better healthcare outcomes.

Keywords: Systemic Lupus Erythematosus; Interpretation; Machine Learning; Supervised; Clinical Decision Support System; Statistical Data; Data Analysis.

Advances in Knowledge
• The first self-explainable prediction framework for SLE specific to the Omani population was developed.
• The framework achieved an AUC of 0.956 and a sensitivity of 92%.
• It identifies patterns of clinical manifestation that are unique to the Omani population.
• The patient's age and four clinical features (renal disorders, alopecia, cutaneous lupus, and hemolytic anemia) made the greatest contribution to the model's predictions.
• Compared with other Arab ethnicities, the frequency of renal disorders in Oman was the highest, while the frequency of alopecia was the lowest.

Application to Patient Care
• The model could be used as a clinical decision support system that alerts clinicians to the possible presence of SLE, prompting further investigation until an official diagnosis is made.
• Through an interpretation algorithm, clinicians can compare the information reported by the model with their own knowledge, thereby increasing the probability of a correct diagnosis and encouraging the adoption of Machine Learning (ML) in healthcare.
• It offers a practical introduction of machine learning and interpretation tools into the medical diagnostic process that improves early detection of SLE, a crucial factor in lowering flare rates and reducing mortality.

Introduction
Systemic lupus erythematosus (SLE) is a chronic multisystem autoimmune disease.
SLE is caused by genetic and environmental factors that potentiate the production of high-titer autoantibodies directed against native DNA and other cellular components.1 These autoantibodies drive a pathological process that manifests as different medical conditions in different organ systems, from skin involvement and arthralgia to cardiovascular and renal morbidity.2 The clinical phenotype of SLE varies with race, gender, and age, which makes the disease difficult to diagnose.3 In Oman, the mortality rate is estimated at 5% and the mean prevalence at 38 per 100,000 individuals,4 which is higher than in Saudi Arabia and lower than in the UAE. Initial SLE symptoms are often nonspecific and mimic other medical conditions, increasing the risk of diagnostic delay. Additionally, the heterogeneity of manifestations makes early diagnosis even more difficult and consequently delays the start of effective treatment before organ damage occurs.

In recent years, great improvements have been made in treatment strategies for SLE. However, despite the improved prognosis, various challenges remain in the diagnosis and therapeutic management of SLE.5 One of these challenges is early diagnosis: SLE onset is gradual, and clinically evident manifestations develop over years. Moreover, a variety of conditions may mimic SLE, including infectious and hematologic diseases.6 A database analysis showed that patients diagnosed within 6 months of probable SLE onset had lower flare rates and fewer hospitalizations than patients diagnosed later.7 Late diagnosis is also associated with a risk of progressive organ damage and, consequently, higher mortality.8

This study focuses on effective SLE prediction as well as on identifying the associated clinical features.
With the aid of interpretation tools, clinicians can understand the decision-making process of Machine Learning (ML) models. This, in turn, can alert clinicians to different manifestations and symptoms at early stages and help provide better healthcare outcomes. The model is trained on a local cohort of 219 Omani patients with SLE or with other (control) diseases. Additionally, we identified the minimum set of clinical and demographic features required for an accurate prediction. Finally, an explainable approach based on the SHapley Additive exPlanations (SHAP) method was applied to generate individual explanations of the model's decisions and to rank clinical features by their contribution.

Methods
The dataset used in this study was collected from structured and unstructured sources, including the Electronic Medical Records (EMR) system of the Rheumatology clinic at Sultan Qaboos University Hospital, named TrakCare. TrakCare stores the patients' information, medical state, and medical history. Patients' demographic data were obtained directly from TrakCare, whereas clinical data were unstructured, being stored in each patient's medical history as clinical notes from each hospital visit. The entry criterion for Rheumatology patients was a positive antinuclear antibody (ANA) test, while the exclusion criteria covered all non-Omani patients as well as patients with insufficient data. To separate patients with SLE from those with control diseases, the most recent EULAR/ACR SLE classification criteria were used;9 under these criteria, patients with a score of 10 or above are classified as having SLE. A total of 219 patient records met the entry criteria: 138 patients were diagnosed with SLE and 81 with other control diseases. This labeling was also validated by a rheumatologist on a case-by-case basis.
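The additive scoring rule mentioned above can be made concrete with a small sketch. It encodes the domains and weights of the 2019 EULAR/ACR criteria as we understand them: a positive ANA (at a titer of at least 1:80) as the entry criterion, only the highest-weighted item counted within each domain, at least one clinical item required, and a total of 10 or more for classification. The item names, the `findings` set representation, and both helper functions are hypothetical; the sketch ignores the rule that an item counts only when no more likely explanation exists, and the weights should be checked against the published criteria (ref. 9) before any use. It is not the authors' labeling procedure.

```python
from typing import Dict, Set

# Hypothetical encoding of the 2019 EULAR/ACR criteria (ref. 9): domain -> {item: weight}.
# Weights transcribed for illustration; verify against the publication before use.
WEIGHTS: Dict[str, Dict[str, int]] = {
    "constitutional": {"fever": 2},
    "hematologic": {"leukopenia": 3, "thrombocytopenia": 4, "autoimmune_hemolysis": 4},
    "neuropsychiatric": {"delirium": 2, "psychosis": 3, "seizure": 5},
    "mucocutaneous": {"non_scarring_alopecia": 2, "oral_ulcers": 2,
                      "subacute_or_discoid_lupus": 4, "acute_cutaneous_lupus": 6},
    "serosal": {"pleural_or_pericardial_effusion": 5, "acute_pericarditis": 6},
    "musculoskeletal": {"joint_involvement": 6},
    "renal": {"proteinuria": 4, "ln_class_ii_or_v": 8, "ln_class_iii_or_iv": 10},
    "antiphospholipid": {"antiphospholipid_antibodies": 2},
    "complement": {"low_c3_or_low_c4": 3, "low_c3_and_low_c4": 4},
    "sle_specific_antibodies": {"anti_dsdna_or_anti_smith": 6},
}
CLINICAL_DOMAINS = {"constitutional", "hematologic", "neuropsychiatric",
                    "mucocutaneous", "serosal", "musculoskeletal", "renal"}


def eular_acr_score(findings: Set[str]) -> int:
    """Add up, per domain, only the highest-weighted item that is present."""
    total = 0
    for items in WEIGHTS.values():
        present = [weight for item, weight in items.items() if item in findings]
        if present:
            total += max(present)
    return total


def classify_sle(ana_positive: bool, findings: Set[str], threshold: int = 10) -> bool:
    """Entry criterion (positive ANA), at least one clinical item, and score >= threshold."""
    if not ana_positive:
        return False
    has_clinical = any(item in findings
                       for domain in CLINICAL_DOMAINS
                       for item in WEIGHTS[domain])
    return has_clinical and eular_acr_score(findings) >= threshold


if __name__ == "__main__":
    # Made-up example patient, not drawn from the study data.
    patient = {"joint_involvement", "anti_dsdna_or_anti_smith", "fever"}
    print(eular_acr_score(patient), classify_sle(True, patient))  # prints: 14 True
```

Note that items within one domain do not stack: alopecia plus oral ulcers still contribute only 2 points to the mucocutaneous domain.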
Our framework contains three main stages. Feature selection first reduces noisy data so that only the most informative features are used. The classifier then trains and tests the model to predict the presence of SLE. Once the model is trained, the explainer algorithm provides an individual breakdown of each prediction through informative visual plots.

In the first stage [Figure 1], the recursive feature elimination (RFE) algorithm with ten-fold cross-validation (CV) was used. RFE works by fitting a model, ranking the features by importance, eliminating the least important feature, and repeating this process on the remaining features; the cross-validation step retains the number of features that gives the best validation score. For the second stage of the framework, we implemented Categorical Boosting (CatBoost), an ensemble learning algorithm based on gradient boosting.10

For the final stage, the SHAP library was used.11 SHAP calculates a Shapley value for each feature that quantifies the feature's contribution to the final prediction: the sign of the Shapley value gives the direction of the contribution, and its magnitude gives the importance of the feature relative to the prediction. SHAP provides both local (per-prediction) and global (model-wide) interpretability. With the help of SHAP, we can break down each prediction individually. As a demonstration, we took two individuals from the testing set, one predicted to have the disease and one predicted not to have it. Three types of figures were used to show the prediction breakdown: the force plot, the waterfall plot, and the summary plot. The force plot shows how the features contributed to the model's prediction for a specific observation. Its colors indicate whether a feature pushes the prediction probability higher or lower.
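The RFE-with-CV stage described above can be sketched as a short, library-free loop. This is a minimal illustration under stated assumptions, not the authors' code: `fit_importances` is a hypothetical callable that refits a model on a feature subset and returns per-feature importances (in this framework that role could be played by a CatBoost model), and `cv_score` is a hypothetical callable returning a cross-validated score for a subset. Libraries such as scikit-learn provide a production version as `RFECV`; the toy importances and score below are made up.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: refit a model on `cols` and return one importance per feature.
FitImportances = Callable[[List[str]], Dict[str, float]]


def rfe_path(features: Sequence[str], fit_importances: FitImportances,
             min_features: int = 1) -> List[List[str]]:
    """Return the nested feature subsets visited by recursive feature elimination."""
    current = list(features)
    path = [list(current)]
    while len(current) > min_features:
        importances = fit_importances(current)             # refit on the survivors
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)                             # drop the least important
        path.append(list(current))
    return path


def rfe_cv(features: Sequence[str], fit_importances: FitImportances,
           cv_score: Callable[[List[str]], float], min_features: int = 1) -> List[str]:
    """Pick the subset on the elimination path with the best CV score.

    Ties are broken in favor of the smaller subset.
    """
    path = rfe_path(features, fit_importances, min_features)
    return max(path, key=lambda subset: (cv_score(subset), -len(subset)))


if __name__ == "__main__":
    # Made-up importances standing in for a refitted model's output.
    toy = {"renal": 0.9, "alopecia": 0.5, "age": 0.7, "fever": 0.1, "province": 0.05}
    fit = lambda cols: {c: toy[c] for c in cols}
    # Made-up CV score: reward importance, penalize subset size.
    score = lambda cols: sum(toy[c] for c in cols) - 0.3 * len(cols)
    print(rfe_cv(list(toy), fit, score))  # ['renal', 'alopecia', 'age']
```

With a real model, the importances are recomputed after every elimination, so the ranking can shift as correlated features are removed; the fixed toy importances here do not show that effect.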
The target in our model has two classes: class 1 for a positive diagnosis of SLE and class 0 for a negative diagnosis. To obtain a full list of features ranked by their contribution, we use a waterfall plot. The summary plot displays the features' effects and their importance; each point on the summary plot represents the Shapley value of one feature for one instance.

To train and validate the performance of CatBoost, the dataset was divided into training and testing sets. The former is used to train the model and the latter to test its performance. Additionally, a subset of the training set was used for cross-validation to protect the models from overfitting and to optimize their parameters. Each model underwent hyper-parameter optimization through grid search with five-fold cross-validation. To avoid reporting biased results and to limit overfitting, we report the average of each measurement over 10 repetitions for each model. Finally, three other classifiers, Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), and Random Forest, were evaluated in the same way, and their performance was compared with that of CatBoost. These classifiers were selected based on related studies that employed ML for disease prediction.12,13

Because the classes are imbalanced, the AUC (area under the ROC curve) and sensitivity metrics were used to evaluate classification performance.

The study was approved by the Ethics Committee of the College of Medicine and Health Science at Sultan Qaboos University (SQU) under protocol numbers MERC #1418 and #1650. Participant consent was not required for this study as per the regulations of Sultan Qaboos University Hospital.

Results
The extracted data cover patient records from January 2006 to December 2019.
Female patients represented 92% of the records. Patients aged from 25 to their late 30s formed the largest age group, and the mean age was 38 years. Al Batinah Governorate had the highest number of patients (37.9%), followed by Muscat (23.7%).

The initial data contained 28 clinical, demographic, and laboratory variables (called "features" in ML), with no missing values [Table 1]. The laboratory features include immunological test results such as anti-dsDNA and anti-Smith antibodies. However, these features are highly sensitive to SLE and could introduce bias into the prediction model; therefore, they were dropped. The remaining data consist of 20 clinical and demographic features. Most features take non-numerical (categorical) values, which must be transformed (encoded) into numerical values, a prerequisite for most statistical models. Ordinal encoding was therefore applied, and because the features differ in range, Min-Max normalization was also applied.14

Applying the RFE feature selection algorithm, the optimal number of features was 13, comprising three demographic and 10 clinical features. CatBoost achieved an AUC of 0.956, while the Random Forest and SVM classifiers scored 0.935 and 0.916, respectively. For sensitivity, CatBoost achieved 92%, Random Forest 89%, and SVM 86%.

Two samples from the testing set were used to generate the different SHAP plots. The first sample (Patient 1) is predicted to have SLE, and the force plot attributes this to renal disorders and the patient's age [Figure 2.a]. Since the values are normalized, we cross-referenced them with the test data and found that the patient's age is 40, which falls within the age group in which SLE is most active.
Additionally, the patient has been diagnosed with Lupus Nephritis, a kidney disease that is commonly caused by an autoimmune disorder. On the other hand, the second sample (Patient 2) shows no autoimmune manifestation [Figure 2.b] and a long disease duration, and the patient's age of 56 places him outside the age group in which SLE is most active.

In the waterfall plot for Patient 1 [Figure 3.a], the feature with the highest SHAP value is Renal, by a large margin. Owing to this high SHAP value, the presence of a renal disorder in Patient 1 made the greatest contribution to the positive prediction of SLE, followed by the age and province features. Overall, four blue features pushed the prediction probability lower, toward class 0: the absence of alopecia, hemolytic anemia, and Acute Cutaneous Lupus (ACL) in the profile of Patient 1 resulted in negative SHAP values. The remaining features had minimal impact on the prediction probability, as evidenced by their low SHAP values. In contrast, the waterfall plot for Patient 2 [Figure 3.b] indicates that age made the largest contribution toward class 0, followed by the absence of any renal disorder.

Consistent with these observations, the summary plot [Figure 4] shows that the older the patient, the less likely SLE is, as evidenced by the red dots (high age) on the negative side of the SHAP value scale. The same holds for disease duration: long disease durations without autoimmune manifestations were associated with the absence of SLE. Our results indicate that the higher the patient's age and disease duration, the less likely SLE is to be the cause. Renal disorders ranked highest in contribution, followed by alopecia, ACL, and hemolytic anemia. The lowest-contributing features were serositis, proteinuria, and leukopenia.
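To make the additivity behind the force and waterfall plots concrete, the sketch below computes exact Shapley values by enumerating every coalition of features for a toy model. Assumptions: features outside a coalition take their value from a single baseline point, and the linear "log-odds" model and its weights are invented for illustration. This shows the definition of a Shapley value, not the SHAP library's internals; for tree ensembles such as CatBoost, SHAP uses the polynomial-time Tree SHAP algorithm rather than enumerating coalitions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet

Point = Dict[str, float]


def shapley_values(predict: Callable[[Point], float], x: Point, baseline: Point) -> Point:
    """Exact Shapley values of `predict` at `x` relative to one baseline point.

    Features outside a coalition are set to their baseline value. Each
    feature needs 2**(n - 1) coalitions, so this is only feasible for small n.
    """
    names = list(x)
    n = len(names)

    def value(coalition: FrozenSet[str]) -> float:
        point = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(point)

    phi: Point = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for size in range(n):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


if __name__ == "__main__":
    # Hypothetical linear "log-odds" model; weights are illustrative only.
    def model(p: Point) -> float:
        return 2.0 * p["renal"] - 1.5 * p["age"] + 0.5 * p["alopecia"]

    x = {"renal": 1.0, "age": 0.2, "alopecia": 0.0}
    base = {"renal": 0.3, "age": 0.5, "alopecia": 0.4}
    phi = shapley_values(model, x, base)
    # Waterfall-style ranking: largest absolute contribution first.
    for name, v in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(name, round(v, 2))
    print("base + sum(phi):", model(base) + sum(phi.values()), "f(x):", model(x))
```

The values always sum to f(x) minus f(baseline), which is why a waterfall plot can start at the base value and end exactly at the model output; sorting them by absolute value gives the waterfall ranking.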
Discussion
In clinical applications, the ability to justify a prediction is as important as the prediction score itself, because of the high stakes of the medical environment, where misclassification could lead to devastating results. It is therefore challenging to trust complex ML models, for several reasons. First, the models are often designed and rigorously trained on specific diseases in a narrow setting. Second, trust depends on the user's technical knowledge of statistics and ML. Third, how the data are labeled affects the results produced by the model.15 For these and other reasons, interpretable ML has emerged as an area of research that aims to design transparent and explainable models by developing means to transform black-box ML models into white-box models. When predictions are transparent, domain experts can interpret the results meaningfully.

Through the SHAP algorithm, clinicians can follow the model's reasoning in a way that resembles clinical reasoning. Our model is intended for the early-to-mid screening stage, when physicians see minimal clinical symptoms and, consequently, have no immunological test data.16 The model can reasonably alert clinicians to investigate the presence of SLE by requesting immunological tests once SLE is suspected. Specifically, the ANA test is highly sensitive for SLE, and a positive anti-double stranded DNA (anti-dsDNA) test is highly specific.17 Additionally, an immunologist compared multiple individual prediction breakdown plots and validated the results and the model's justification.

Among the features used to profile patients were age, age at onset, and disease duration. The SHAP algorithm indicated that older patients were the least likely to have the disease.
Similarly, patients with a long disease duration without adverse manifestations such as anemia or lupus nephritis were statistically less likely to be diagnosed with SLE. Experts point out, however, that SLE activity increases and decreases at intervals that differ from patient to patient, so on rare occasions clinical symptoms might not manifest until the late phases of the disease.18 Research suggests that late-onset SLE accounts for 3-18% of SLE cases.19

According to SHAP, renal disorders made the highest contribution [Figure 4]. This is in concordance with Beckwith and Lightstone (2014), who state that about 40-70% of SLE patients develop clinically diagnosed renal involvement, known as Lupus Nephritis (LN).20 LN is commonly diagnosed through kidney biopsy; previous research identified proteinuria, urine protein-to-creatinine ratio, anti-dsDNA, and complement levels as laboratory markers of LN. However, these laboratory markers lack specificity and sensitivity for identifying renal activity and damage.21 In Oman, LN is the most frequent glomerular disease, occurring in about 30%-36% of all patients who had a renal biopsy. This is supported by Al-Adhoubi et al. (2021),4 who reported that 52% of SLE patients had developed LN. Although most of our records lack kidney biopsy information, LN is also present in 11% of patients with renal disorders.

Moreover, we found that three other clinical features had about the same influence on the prediction: alopecia, cutaneous lupus, and anemia. Alopecia is hair loss whose damage ranges from non-scarring to scarring. It is currently estimated that more than half of SLE patients develop alopecia, although most studies estimating alopecia prevalence are limited by small population sizes.
Acute cutaneous lupus includes the butterfly rash across the face, over the cheeks and the bridge of the nose. ACL is associated with markers such as VGLL3 and anti-SSA antibodies, which indicate skin damage activity caused by lupus.22 Anemia is the most common blood disorder in lupus, affecting about half of all people with active disease.23 Anemia results from a shortage of the healthy red blood cells the body needs to carry oxygen to its tissues. Hemolytic anemia, however, is not exclusive to SLE.

The prevalence of these influential clinical features in other Arab ethnicities was also investigated. Although no study has examined the differences between ethnicities within the Arab region, a few studies have collected data on local SLE populations. We looked at three cohorts, from Saudi Arabia,24 the UAE,25 and Egypt26 [Figure 5]. ACL (skin rash) was more prevalent in all the other Arab cohorts, reaching as high as 62% in the UAE. Hemolytic anemia was the most variable feature: in Egypt and the UAE it is less prevalent than in Oman, while in Saudi Arabia it is more prevalent.27 Renal disorders remained frequent, with around 50% of patients in each cohort having some renal damage, except for a decrease to 33% in Egypt. Studies also indicate that approximately 10%-36% of all renal biopsies in the Gulf region are diagnosed as LN, and LN tends to run a severe course in Gulf populations, with a high incidence of Class IV LN.28

Overall, with three of the four critical features found to be more prevalent in other Arab ethnicities, our model could potentially be extended beyond Omanis to other Arab cohorts. It is important to note that none of these clinical features is exclusive to SLE; they occur in autoimmune diseases in general. However, classification models can be trained to detect patterns specific to the Omani population, and these patterns are the basis of the model's prediction of SLE presence.
These findings help to identify patterns of clinical manifestation that are unique to the Omani population and the Arab region by employing explainable prediction. Moreover, our research highlights the CatBoost algorithm, which has received widespread attention in recent years for its fast computation, strong generalization ability, and high predictive performance.29-31 CatBoost improved the AUC by 0.021 over Random Forest and by 0.040 over SVM, which may be attributed to its ordered boosting, a permutation-driven alternative to the classic gradient boosting algorithm. This study also acknowledges the problem of evaluating imbalanced classification, where the main interest lies in performance on cases that are poorly represented in the data.32 Standard evaluation criteria tend to focus on the most frequent cases and, if applied, could lead to sub-optimal classification models. Therefore, AUC and sensitivity were selected as evaluation criteria.

Finally, by combining the framework's prediction with the interpretation algorithm, we promote self-explainable frameworks that enable physicians to make meaningful decisions based on ML-derived information combined with their own knowledge, thereby improving the probability of a correct diagnosis and encouraging the adoption of ML in healthcare. These goals, however, are hindered by the retrospective nature of the data. An ideal framework would be trained on longitudinal data of SLE patients, including pre-diagnosis profiles recorded before adverse symptoms appear. Moreover, our framework may not scale well to large datasets.
Specifically, large datasets would significantly increase the computational time of SHAP, and ordinal encoding is inefficient for categorical data with high cardinality.33 Other tools could also be applied to improve the accessibility and presentation of our model, such as presenting the outcome as a prediction probability instead of a binary value.

Conclusion
This study proposes a three-stage interpretable framework for predicting the presence or absence of SLE in an Omani cohort of 219 patients. The CatBoost classifier and the SHAP interpretation tool were implemented to make and justify individual predictions and to reduce the risk of misclassification. Four clinical features, in addition to the patient's age, were identified as having the greatest influence on the prediction. Alopecia, renal disorders, acute cutaneous lupus, and hemolytic anemia are all indicators of lupus activity at varying rates; combined with the patient's age and age at onset, they allowed the model to establish a disease profile specific to Omanis. Overall, our findings provide a practical introduction of machine learning and interpretation tools into medical diagnosis, which could increase the efficiency of medical testing and, in turn, enable the early intervention that leads to better treatment and positive healthcare outcomes.

Authors' Contribution
HZ and AA conceived the idea and designed the study. SA collected the data. BA and AA-A validated the data and results. Research experiments, implementation, and results were performed by AA with input from HA. AA drafted the manuscript with edits from HZ, AA-A and BA. All authors approved the final version of the manuscript.

Acknowledgment
We would like to thank the Oman Research Council for supporting this research.
We also 319 gratefully acknowledge the support from the Sultan Qaboos University Hospital for their 320 cooperation. 321 322 Conflict of Interest 323 The authors declare no conflicts of interest. 324 325 Funding 326 This work was supported in part by an internal Grant. 327 328 References 329 1. Gaipl U, Munoz L, Grossmayer G, Lauber K, Franz S, Sarter K et al. Clearance 330 deficiency and systemic lupus erythematosus (SLE). J Autoimmun 2007;28(2-3):114-331 121. doi: 10.1016/j.jaut.2007.02.005 332 2. Nisengard R. Diagnosis of systemic lupus erythematosus. Importance of antinuclear 333 antibody titers and peripheral staining patterns. Arch Dermatol 1975;111(10):1298-1300. 334 doi: 10.1001/archderm.111.10.1298 335 3. Lewis M, Jawad A. The effect of ethnicity and genetic ancestry on the epidemiology, 336 clinical features and outcome of systemic lupus erythematosus. Rheumatology (Oxford) 337 2016; kew399. doi: 10.1093/rheumatology/kew399 338 4. Al‐Adhoubi N, Al‐Balushi F, Al Salmi I, Ali M, Al Lawati T, Al Lawati B et al. A 339 multicenter longitudinal study of the prevalence and mortality rate of systemic lupus 340 erythematosus patients in Oman: Oman Lupus Study. Int J Rheum Dis 2021;24(6):847-341 854. doi: 10.1111/1756-185 X.14130 342 12 5. Felten R, Lipsker D, Sibilia J, Chasset F, Arnaud L. The history of lupus throughout the 343 ages. J Am Acad Dermatol 2020; doi: 10.1016/j.jaad.2020.04.150 344 6. Piga M, Arnaud L. The Main Challenges in Systemic Lupus Erythematosus: Where Do 345 We Stand?. J Clin Med 2021;10(2):243. doi: 10.3390/jcm10020243 346 7. Oglesby A, Korves C, Laliberté F, Dennis G, Rao S, Suthoff E et al. Impact of Early 347 Versus Late Systemic Lupus Erythematosus Diagnosis on Clinical and Economic 348 Outcomes. Appl Health Econ Health Policy 2014;12(2):179-190. doi: 10.1007/s40258-349 014-0085-x 350 8. Murimi-Worstell I, Lin D, Nab H, Kan H, Onasanya O, Tierce J et al. 
Association 351 between organ damage and mortality in systemic lupus erythematosus: a systematic 352 review and meta-analysis. BMJ Open 2020;10(5):e031850. doi: 10.1136/bmjopen-2019-353 031850 354 9. Aringer M, Brinks R, Dörner T, Daikh D, Mosca M, Ramsey-Goldman R et al. European 355 League Against Rheumatism (EULAR)/American College of Rheumatology (ACR) SLE 356 classification criteria item performance. Ann Rheum Dis 2021;80(6):775-781. doi: 357 10.1136/annrheumdis-2020-219373 358 10. Dorogush A, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features 359 support. arXiv [Pre-print] 2018. doi: https://doi.org/10.48550/arXiv.1810.11363 360 11. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural 361 Inf Process Syst 2017;30. doi: https://dl.acm.org/doi/10.5555/3295222.3295230 362 12. Adamichou C, Genitsaridi I, Nikolopoulos D, Nikoloudaki M, Repa A, Bortoluzzi A et 363 al. Lupus or not? SLE Risk Probability Index (SLERPI): a simple, clinician-friendly 364 machine learning-based model to assist the diagnosis of systemic lupus erythematosus. 365 Ann Rheum Dis 2021; 80(6): 758-766. https://doi.org/10.1136/annrheumdis-2020-366 219069 367 13. Nalband S, Sundar A, Prince A, Agarwal A. Feature selection and classification 368 methodology for the detection of knee-joint disorders. Comput Methods Programs 369 Biomed 2016;127:94-104. https://doi.org/10.1016/j.cmpb.2016.01.020 370 14. Patro SGK, Sahu KK. Normalization: A Preprocessing Stage. arXiv [Pre-print] 2015. 371 DOI: 10.48550/ARXIV.1503.06462 372 https://doi.org/10.48550/arXiv.1810.11363 https://dl.acm.org/doi/10.5555/3295222.3295230 https://doi.org/10.1016/j.cmpb.2016.01.020 13 15. Stiglic G, Kocbek P, Fijacko N, Zitnik M, Verbert K, Cilar L. Interpretability of machine 373 learning‐based prediction models in healthcare. Wiley Interdiscip Rev Data Min Knowl 374 Discov 2020;10(5). doi: https://doi.org/10.1002/widm.1379 375 16. Binder A, Ellis S. 
When to order an antinuclear antibody test. BMJ 2013;347:f5060. doi: 10.1136/bmj.f5060
17. Wichainun R. Sensitivity and specificity of ANA and anti-dsDNA in the diagnosis of systemic lupus erythematosus: A comparison using control sera obtained from healthy individuals and patients with multiple medical problems. Asian Pac J Allergy Immunol 2013;31(4). doi: 10.12932/AP0272.31.4.2013
18. Arnaud L, Mathian A, Boddaert J, Amoura Z. Late-Onset Systemic Lupus Erythematosus. Drugs Aging 2012;29(3):181-189. doi: 10.3899/jrheum.080957
19. Lalani S, Pope J, de Leon F, Peschken C. Clinical Features and Prognosis of Late-onset Systemic Lupus Erythematosus: Results from the 1000 Faces of Lupus Study. J Rheumatol 2009;37(1):38-44. doi: 10.3899/jrheum.080957
20. Beckwith H, Lightstone L. Rituximab in Systemic Lupus Erythematosus and Lupus Nephritis. Nephron Clin Pract 2014;128(3-4):250-254. doi: 10.1159/000368585
21. Mahajan A, Amelio J, Gairy K, Kaur G, Levy R, Roth D, et al. Systemic lupus erythematosus, lupus nephritis and end-stage renal disease: a pragmatic review mapping disease severity and progression. Lupus 2020;29(9):1011-1020. doi: 10.1177/0961203320932219
22. Yu H, Nagafuchi Y, Fujio K. Clinical and Immunological Biomarkers for Systemic Lupus Erythematosus. Biomolecules 2021;11(7):928. doi: 10.3390/biom11070928
23. Giannouli S. Anaemia in systemic lupus erythematosus: from pathophysiology to clinical assessment. Ann Rheum Dis 2006;65(2):144-148. doi: 10.1136/ard.2005.041673
24. Al Arfaj A, Khalil N. Clinical and immunological manifestations in 624 SLE patients in Saudi Arabia. Lupus 2009;18(5):465-473. doi: 10.1177/0961203308100660
25. AlSaleh J, Jassim V, ElSayed M, Saleh N, Harb D. Clinical and immunological manifestations in 151 SLE patients living in Dubai. Lupus 2008;17(1):62-66. doi: 10.1177/0961203307084297
26. El Hadidi K, Medhat B, Abdel Baki N, Abdel Kafy H, Abdelrahaman W, Yousri A, et al. Characteristics of systemic lupus erythematosus in a sample of the Egyptian population: a retrospective cohort of 1109 patients from a single center. Lupus 2018;27(6):1030-1038. doi: 10.1177/0961203317751856
27. Al Rasbi A, Abdalla E, Sultan R, Abdullah N, Al Kaabi J, Al-Zakwani I, et al. Spectrum of systemic lupus erythematosus in Oman: from childhood to adulthood. Rheumatol Int 2018;38(9):1691-1698. doi: 10.1007/s00296-018-4032-2
28. Al-Shujairi A, Elbadawi F, Al-Saleh J, Hamouda M, Vasylyev A, Khamashta M. Literature review of lupus nephritis from the Arabian Gulf region. Lupus 2022;0(0). doi: 10.1177/09612033221137248
29. Hancock J, Khoshgoftaar T. CatBoost for big data: an interdisciplinary review. J Big Data 2020;7(1). doi: 10.1186/s40537-020-00369-8
30. Prokhorenkova L, Gusev G, Vorobev A, Dorogush A, Gulin A. CatBoost: unbiased boosting with categorical features. arXiv [Preprint] 2017. Version 5. doi: 10.48550/arXiv.1706.09516
31. Anghel A, Papandreou N, Parnell T, De Palma A, Pozidis H. Benchmarking and optimization of gradient boosting decision tree algorithms. arXiv [Preprint] 2018. Version 3. doi: 10.48550/arXiv.1809.04559
32. Branco P, Torgo L, Ribeiro R. A Survey of Predictive Modeling on Imbalanced Domains. ACM Comput Surv 2016;49(2):1-50. doi: 10.1145/2907070
33. Pargent F, Pfisterer F, Thomas J, Bischl B. Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Comput Stat 2022. doi: 10.1007/s00180-022-01207-6

Table 1: Dataset features and their occurrence

Feature                       Category                 SLE, No. (%) (N=138)    Control, No. (%) (N=81)
Fever                         Yes                      41 (29.7%)              7 (8.6%)
                              No                       97 (70.2%)              74 (91.3%)
Acute cutaneous lupus (ACL)   Yes (rash)               63 (45.6%)              7 (8.6%)
                              No                       75 (54.3%)              74 (91.3%)
Chronic cutaneous lupus       Yes                      5 (3.6%)                0
                              No                       133 (96.3%)             81 (100%)
Oral ulcers                   Yes                      29 (20%)                0
                              No                       109 (79%)               81 (100%)
Alopecia                      Yes                      57 (41.3%)              4 (4.9%)
                              No                       81 (58.7%)              77 (95%)
Joint involvement             Yes                      121 (87.7%)             0
                              No                       17 (12.3%)              81 (100%)
Serositis                     Yes                      9 (6.5%)                0
                              No                       129 (93.5%)             81 (100%)
Renal disorders               Yes                      62 (44.9%)              0
                              No                       76 (55%)                81 (100%)
Lupus nephritis class         None (no kidney biopsy)  35 (25.3%)              0
                              Class II                 1 (0.4%)                0
                              Class III                4 (1.8%)                0
                              Class IV                 16 (7.3%)               0
                              Class V                  5 (2%)                  0
Proteinuria                   Yes                      51 (37%)                0
                              No                       87 (63%)                81 (100%)
Vasculitis                    Yes                      12 (8.7%)               0
                              No                       126 (91.3%)             81 (100%)
Neurologic disorder           None                     121 (87.7%)             81 (100%)
                              Psychosis                5 (3.6%)                0
                              Seizure                  12 (8.7%)               0
Hemolytic anemia              Yes                      47 (34%)                6 (7.4%)
                              No                       91 (66%)                75 (92.6%)
Leukopenia                    Yes                      18 (13%)                1 (1.2%)
                              No                       120 (86.9%)             80 (98.7%)
Thrombocytopenia              Yes                      11 (8%)                 0
                              No                       127 (92%)               81 (100%)
Anti-dsDNA                    Positive                 102 (73.9%)             2 (2.4%)
                              Negative                 36 (26%)                79 (97.5%)
Anti-Smith (Sm) antibody      Positive                 17 (12.3%)              0
                              Negative                 121 (87.7%)             81 (100%)
Antiphospholipid antibodies   Positive                 46 (33.3%)              2 (2.5%)
                              Negative                 92 (66.6%)              79 (97.5%)
C3 complement                 Positive                 95 (68.8%)              2 (2.5%)
                              Negative                 43 (31.1%)              79 (97.5%)
C4 complement                 Positive                 95 (68.8%)              2 (2.5%)
                              Negative                 43 (31.1%)              79 (97.5%)
Rheumatoid factor             Positive                 18 (13%)                0
                              Negative                 120 (86.9%)             81 (100%)
Gender                        Male                     5 (3.6%)                12 (14.8%)
                              Female                   133 (96.4%)             69 (85.2%)
Age period                    20 years or less         16 (11.6%)              1 (1.2%)
                              21-25 years              15 (10.8%)              5 (6.2%)
                              26-30 years              25 (18.1%)              9 (11.1%)
                              31-35 years              29 (21%)                11 (13.6%)
                              36-40 years              16 (11.6%)              8 (9.9%)
                              41-45 years              23 (16.6%)              7 (8.6%)
                              46-50 years              5 (3.6%)                7 (8.6%)
                              More than 50 years       6 (4.3%)                33 (40.7%)

Figure 1: Flowchart of the three-stage interpretable framework.

Figure 2: Force plot of the CatBoost model prediction (values are normalized). f(x) is the predicted probability. The arrows in each plot show the direction of influence each predictor has on the payout, i.e., the prediction. Colors indicate whether a predictor increases (red) or reduces (blue) the probability of having SLE.

Figure 3: Waterfall plot of the CatBoost model. The waterfall plot displays the SHAP values representing each feature's contribution toward a positive prediction, and thus the magnitude of influence each predictor had. Blue represents negative SHAP values and red represents positive SHAP values.

Figure 4: Summary plot of the CatBoost model. The summary plot combines feature importance with feature effects. Each point is the Shapley value of one feature for one instance. A point's position on the y-axis is determined by the feature's importance, and its position on the x-axis by the Shapley value. Like the waterfall plot, the summary plot ranks all features by their SHAP values; the major difference is that it covers the entire testing set rather than a single observation. Each dot represents an observation from the testing set, and its color reflects the value of the associated feature. For example, for the feature 'AGE', red dots correspond to patients with high values, i.e., older patients, and blue dots correspond to younger patients.

Figure 5: The frequency of the most influential features, as identified by SHAP, in cohorts across the Arab region.
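The force, waterfall, and summary plots described in the captions all rest on the additive property of SHAP values: the base value plus the sum of the per-feature contributions equals the model output f(x), and the summary plot orders features by the magnitude of those contributions across many observations. The Python sketch below illustrates that quantity by computing exact Shapley values through brute-force enumeration of feature coalitions. It is a conceptual sketch only, under stated assumptions: `toy_model`, its weights, the binary feature encoding, and the single all-zero baseline are hypothetical and are not the study's trained CatBoost model; the SHAP library's tree explainer computes these values with a dedicated algorithm against a background distribution rather than one reference point.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features (1 = present / age over 50). The names echo the
# paper's most influential features, but this encoding is an assumption.
FEATURES = ["alopecia", "renal_disorder", "acute_cutaneous_lupus", "age_over_50"]


def toy_model(z):
    """Hypothetical score in [0.10, 0.95] standing in for the trained CatBoost model.

    The weights are invented for illustration and are NOT taken from the study.
    """
    alopecia, renal, acl, old = z
    return (0.30 + 0.20 * alopecia + 0.25 * renal + 0.15 * acl
            - 0.20 * old + 0.05 * renal * acl)  # small renal x ACL interaction


def shapley_values(model, x, baseline):
    """Exact Shapley value of every feature for one instance x.

    Features outside a coalition take their value from `baseline` (a single
    reference point; the SHAP library instead averages over background data).
    Requires O(n * 2^n) model calls, so it is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalitions of the other features, size 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(model, X, baseline, names):
    """Order features by mean |SHAP value| over instances X, as in a summary plot.

    Sorting by the total of |SHAP| gives the same order as sorting by the mean.
    """
    totals = [0.0] * len(names)
    for x in X:
        for k, v in enumerate(shapley_values(model, x, baseline)):
            totals[k] += abs(v)
    order = sorted(range(len(names)), key=lambda k: totals[k], reverse=True)
    return [names[k] for k in order]


if __name__ == "__main__":
    baseline = [0, 0, 0, 0]
    patient = [0, 1, 1, 1]  # renal disorder + ACL, older than 50, no alopecia
    phi = shapley_values(toy_model, patient, baseline)
    for name, v in zip(FEATURES, phi):
        print(f"{name:>22}: {v:+.3f}")
    # Force/waterfall identity: base value + sum of contributions = f(x)
    print("base + sum(phi) =", round(toy_model(baseline) + sum(phi), 3))
    print("f(x)            =", round(toy_model(patient), 3))
    print(rank_features(toy_model, [[1, 1, 0, 0], patient], baseline, FEATURES))
```

For the example patient, the contributions work out to +0.275 for renal disorder, +0.175 for ACL, -0.200 for age over 50, and 0 for alopecia. The hypothetical renal/ACL interaction term is split equally between the two features, which is why their contributions exceed their main-effect weights; the four contributions sum to f(x) minus the base value, which is the identity a force or waterfall plot visualizes.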