J. Nig. Soc. Phys. Sci. 5 (2023) 1137
Journal of the Nigerian Society of Physical Sciences

Robust M-Estimators and Machine Learning Algorithms for Improving the Predictive Accuracy of Seaweed Contaminated Big Data

O. J. Ibidoja(a,b), F. P. Shan(b), Mukhtar(c), J. Sulaiman(d), M. K. M. Ali(b,*)

a Department of Mathematics, Federal University Gusau, Gusau, Nigeria
b School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
c I-CEFORY (Local Food Innovation), Universitas Sultan Ageng Tirtayasa, Indonesia
d School of Science and Technology, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia

Abstract

A common problem in regression analysis using ordinary least squares (OLS) is the effect of outliers or contaminated data on the parameter estimates. A robust method that is not sensitive to outliers and can handle contaminated data is needed. The objectives of this study are to determine the significant parameters for the moisture content of seaweed after drying and to develop a hybrid model that reduces the number of outliers. The data were collected with sensors from the v-Groove Hybrid Solar Drier (v-GHSD) at Semporna, on the south-eastern coast of Sabah, Malaysia. After the second-order interactions are included, there are 435 drying parameters, each with 1914 observations. First, four machine learning algorithms (random forest, support vector machine, bagging and boosting) were used to rank the parameters, selecting the 15, 25, 35 and 45 most important. Second, hybrid models were developed using the robust methods M Bi-square, M Hampel and M Huber. The results show a significant reduction in the number of outliers and better prediction by the hybrid models for the contaminated seaweed big data. For the 45 most important drying parameters, the hybrid model bagging M Bi-square performs best, with the lowest percentage of outliers, 4.08%.

DOI: 10.46481/jnsps.2023.1137

Keywords: Robust method, Hybrid model, Machine learning, Outliers, Big data

Article History:
Received: 22 October 2022
Received in revised form: 08 January 2023
Accepted for publication: 08 January 2023
Published: 04 February 2023

© 2023 The Author(s). Published by the Nigerian Society of Physical Sciences under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0). Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

Communicated by: Tolulope Latunde

1. Introduction

The purpose of regression analysis is to study the relationship between two or more independent variables and a dependent variable. Consider a multiple regression model:

$y = X\beta + \varepsilon, \qquad (1)$

where y is an n × 1 vector of response variables, X is the design matrix of order n × p, β is a p × 1 vector of unknown parameters and ε is an n × 1 vector of independent and identically distributed errors.

Ordinary least squares (OLS) is the most popular method for estimating the unknown parameters of a regression model. According to [1, 2], the OLS estimator of β is obtained as:

$\hat{\beta} = (X'X)^{-1}X'y. \qquad (2)$

Observations that deviate from the general shape or pattern of the distribution are called outliers [3].

* Corresponding author tel. no: +60 14-9543405. Email address: majidkhanmajaharali@usm.my (M. K. M. Ali)
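For concreteness, here is a minimal numpy sketch of the OLS estimator in Eq. (2); the simulated design matrix and coefficients are illustrative assumptions, not data from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                    # design matrix of order n x p
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Eq. (2): beta_hat = (X'X)^{-1} X'y, computed with a linear solve for stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true; a single gross outlier in y would distort it
```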
The relationship between the independent variables and the dependent variable can be estimated by OLS regression by minimizing the sum of squared residuals [4]. OLS also has limitations when its assumptions are violated [5]. Estimates from OLS are not precise when the variances and covariances are high [6]. The presence of outliers in the data makes the LS estimator unstable, inefficient, and unreliable [7]. Agricultural data have outliers because of factors that cannot be regulated, and these outliers increase the standard errors [4, 8]. Since the presence of outliers affects the performance of OLS, a robust regression is used instead [9]. When modelling data using regression analysis, various assumptions are tested, but these assumptions are often violated. The error structure of the model needs to be tested for the necessary assumptions before prediction [10]. The researcher can transform the variables to fulfil the assumptions, but this cannot eradicate the outliers in the data that affect the forecasts and parameter estimates [11]. Data with outliers are common in the field of agriculture [11, 12].

To overcome this problem, robust estimators have been introduced. M-estimation, introduced by [13], is the most common method of robust regression; it is a generalization of maximum likelihood estimation. Before the robust methods are used to reduce the outliers, four machine learning algorithms (random forest, support vector machine, boosting and bagging) are used to select the significant parameters that determine the moisture content of the seaweed.

The major contributions of this study are:

i. To determine the significant parameters for the moisture content removal of seaweed during drying and reduce the number of outliers.
ii. To propose a hybrid model that combines robust M-estimators and machine learning models to improve the prediction accuracy.

2. Flowchart of the study

Figure 1 shows the flowchart of the various stages in the study.

Figure 1: Flowchart of the procedure for the hybrid model

2.1. Stage I

This stage involves the enumeration of all possible models:

$\frac{n!}{(n-r)!\,r!} + \text{number of single factors}, \qquad (3)$

where n is the number of single factors and r is the order of the interaction (r = 2 in this study). Equation (3) can be used to compute the total number of parameters across all possible models; a worked sketch, together with the Stage III validation metrics, is given at the end of this section.

2.2. Stage II

Test the assumptions of linear regression. The residuals vs fitted plot, the normal Q-Q plot and the Kolmogorov-Smirnov test are used to verify the assumptions. Next, each machine learning model is used to select the 15, 25, 35 and 45 most important variables, for optimization and easy comparison, to determine the moisture content removal of the seaweed after drying. We fixed these numbers of variables because feature selection can only provide the rank of the important variables; it does not tell us the number of significant factors [14]. Similarly, there is no rule for deciding the number of parameters to include in a prediction model [15]. Furthermore, the algorithms cannot tell us the number of significant variables, only their ranks [16].

2.3. Stage III

After the selection of the significant parameters, the prediction is done and the validation metrics MAPE, SSE, MSE and R-square are computed. The outliers are also counted, and the robust method is introduced to build the hybrid model.
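As an illustration, the following Python sketch evaluates Eq. (3) and the Stage III validation metrics. The choice of n = 29 single factors is an assumption; it is consistent with the 435 drying parameters reported in Section 3.1, since 29 + 29!/(27! 2!) = 435. The metric formulas are the standard definitions.

```python
from math import comb
import numpy as np

def total_parameters(n: int, r: int = 2) -> int:
    """Eq. (3): n!/((n-r)! r!) + number of single factors."""
    return comb(n, r) + n

print(total_parameters(29))   # 406 + 29 = 435 drying parameters

def metrics(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """Stage III validation metrics: MAPE, SSE, MSE and R-square."""
    residuals = y - y_hat
    sse = float((residuals ** 2).sum())
    return {
        "MAPE": float(100 * np.mean(np.abs(residuals / y))),
        "SSE": sse,
        "MSE": sse / len(y),
        "R2": 1 - sse / float(((y - y.mean()) ** 2).sum()),
    }
```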
3. Materials and Methods

3.1. Data Description

The data were collected from 8th April 2017 to 12th April 2017, between 8:00 am and 5:00 pm, during the drying of seaweed using the v-Groove Hybrid Solar Drier (v-GHSD) at Semporna, on the south-eastern coast of Sabah, Malaysia. There are 435 parameters after the inclusion of the second-order interactions in this study.

3.2. Machine learning algorithms

Machine learning algorithms can learn from data and use what they have learned to understand and forecast the future [17]. They can be used to rank the explanatory variables that contribute significantly to the response variable. The high-ranking variables selected using variable importance can reduce the training time and the complexity of the model and improve its accuracy [18]. Four machine learning algorithms (random forest, support vector machine, bagging and boosting) are used in this study to determine the significant parameters for the moisture content removal of the seaweed.

3.2.1. Random Forest

A random forest (RF) is an ensemble of classification and regression trees (CARTs). It uses the majority vote (classification) or the mean forecast (regression) of all the trees [19]. It builds on the idea of bagging and is an ensemble learning method [20, 21]. Let $L$ be a learning set consisting of $N$ pairs of features and outputs $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_N, y_N)$, where $x_i \in X$ and $y_i \in Y$. The $p$-feature vectors $x_i$ ($i = 1, 2, \ldots, N$) form an $N \times p$ matrix $X$, whose rows correspond to the observations $x_i$ and whose columns $j = 1, 2, \ldots, p$ correspond to the features $x^j$.

Algorithm: For b = 1 to n:
1. Create a bootstrapped sample $D^*_b$ from the training set $D$.
2. Grow a tree using the bootstrapped sample $D^*_b$. For a specific node:
   i. Select m variables at random.
   ii. Identify the best split variable and value.
   iii. Divide the node using the best split variable and value.
Repeat steps 1-2 until the stopping conditions are satisfied.

3.2.2. Support Vector Machine (SVM)

Support vector machines can be used for regression and classification problems [22]. SVM has the capacity to reveal non-linear relationships through kernel functions [20, 23]. The SVM was developed by Cortes & Vapnik [24]; a good tutorial and explanations are given by [25, 26]. In support vector regression, the $\varepsilon$-insensitive loss function is usually minimized: any loss smaller than $\varepsilon$ is set to zero, and beyond this bound a straightforward linear loss is applied:

$L_\varepsilon = \begin{cases} 0, & \text{if } |y_i - f(x_i)| < \varepsilon \\ |y_i - f(x_i)| - \varepsilon, & \text{otherwise.} \end{cases} \qquad (4)$

For instance, suppose $f(x)$ is a linear function $f(x) = \beta_0 + x_i^t\beta$; then the loss function is given as

$\sum_{i=1}^{n} \max\left(|y_i - x_i^t\beta - \beta_0| - \varepsilon,\; 0\right). \qquad (5)$

Here $\varepsilon$ is a tuning parameter, and the fit can be written as the constrained optimization problem:

Minimize $\frac{1}{2}\lVert\beta\rVert^2 \qquad (6)$

subject to

$\begin{cases} y_i - x_i^t\beta - \beta_0 \le \varepsilon \\ -(y_i - x_i^t\beta - \beta_0) \le \varepsilon. \end{cases} \qquad (7)$

If there are observations that do not lie within the $\varepsilon$ band around the regression line, then the problem has no solution. Slack variables $\zeta_i$ and $\zeta_i^*$ are therefore introduced; they allow observations to fall outside the $\varepsilon$ band around the regression line:

Minimize $\frac{1}{2}\lVert\beta\rVert^2 + K\sum_{i=1}^{n}(\zeta_i + \zeta_i^*) \qquad (8)$

subject to

$\begin{cases} y_i - x_i^t\beta - \beta_0 \le \varepsilon + \zeta_i \\ -(y_i - x_i^t\beta - \beta_0) \le \varepsilon + \zeta_i^* \\ \zeta_i, \zeta_i^* \ge 0. \end{cases} \qquad (9)$
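A minimal sketch, assuming scikit-learn and a hypothetical pandas data frame `df` whose columns are the drying parameters plus a `moisture` response (both names are assumptions for illustration), of how a random forest can rank the variables by importance and an ε-insensitive SVR can then be fitted on the top k of them:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

def top_k_parameters(df: pd.DataFrame, target: str, k: int = 45) -> list:
    """Rank predictors by random forest variable importance and keep the top k."""
    X, y = df.drop(columns=[target]), df[target]
    rf = RandomForestRegressor(n_estimators=500, random_state=1).fit(X, y)
    ranked = sorted(zip(X.columns, rf.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Fit an SVR with the epsilon-insensitive loss of Eq. (4) on the selected subset:
# selected = top_k_parameters(df, "moisture", k=45)
# svr = SVR(kernel="rbf", epsilon=0.1).fit(df[selected], df["moisture"])
```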
3.2.3. Boosting

Boosting is used to improve the accuracy of learning algorithms [27]. Boosting starts from an algorithm or method that discovers rough rules of thumb, called the "base" or "weak" learning algorithm, and calls it many times. Each call produces a new weak prediction rule, and after many rounds the boosting algorithm merges these weak rules into a single prediction rule that, ideally, is significantly more accurate than any of the weak rules [28]. Suppose we have a model matrix $X = [X_1, X_2, \ldots, X_p] \in \mathbb{R}^{n \times p}$ and an outcome vector $y \in \mathbb{R}^{n \times 1}$. The vector of regression coefficients is $\beta \in \mathbb{R}^p$, the predicted value of the outcome variable is $X\beta$, and the residuals are $\varepsilon = y - X\beta$. For regression purposes, least squares boosting LSB($\varepsilon$) gives an accurate description of the data together with regularization [27]. The algorithm for LSB($\varepsilon$) is as follows:

Algorithm: LSB($\varepsilon$)
Choose a learning rate $\varepsilon > 0$ and a number of iterations $N$. Initialize $\hat{\beta}^0 = 0$, $\hat{r}^0 = y$, $k = 0$.
1. For $0 \le k \le N$:
2. Find the coefficients $\tilde{u}_n$ and the covariate index $j_k$ as below:
   $\tilde{u}_n = \arg\min_{u \in \mathbb{R}} \sum_{i=1}^{n} (\hat{r}_i^k - x_{in}u)^2$ for $n = 1, 2, 3, \ldots, p$,
   $j_k \in \arg\min_{1 \le n \le p} \sum_{i=1}^{n} (\hat{r}_i^k - x_{in}\tilde{u}_n)^2$.
3. Update the current residuals and regression coefficients:
   $\hat{r}^{k+1} \leftarrow \hat{r}^k - \varepsilon\,\tilde{u}_{j_k} X_{j_k}$,
   $\hat{\beta}^{k+1}_{j_k} \leftarrow \hat{\beta}^{k}_{j_k} + \varepsilon\,\tilde{u}_{j_k}$ and $\hat{\beta}^{k+1}_{j} \leftarrow \hat{\beta}^{k}_{j}$ for $j \neq j_k$.

3.2.4. Bagging

Breiman [29] introduced bagging (bootstrap aggregating) to decrease the variance of classification and regression tree models. It is used to improve an existing method and leads to an improvement in accuracy. Bagging is a computationally intensive method for stabilizing unstable estimators, and for high-dimensional data problems it can be used to find a good model. Suppose we have a predictor $\varphi(x, L)$ of $y$ from $x$, and a training sequence $\{L_k\}$, each set consisting of $N$ observations drawn from the same distribution as $L$. The aim is to use the $\{L_k\}$ to build a predictor that is more accurate than the single-training-set predictor $\varphi(x, L)$ [29]. If $y$ is numerical, $\varphi(x, L)$ is replaced by the average of $\varphi(x, L_k)$ over $k$. Repeated bootstrap samples $\{L^{(A)}\}$ are drawn from $L$, forming the predictors $\{\varphi(x, L^{(A)})\}$. If $y$ is continuous, the bagged predictor is $\varphi_A(x) = \mathrm{average}_A\, \varphi(x, L^{(A)})$. Each $\{L^{(A)}\}$ is a replicate dataset of $M$ cases drawn at random with replacement from $L$, so each pair $(y_m, x_m)$ may appear many times in any specific $L^{(A)}$. Whether bagging improves precision or reliability depends on the stability of the procedure used to construct $\varphi$. Theoretically, bagging is described as follows:

i. Build a bootstrap sample $L_i^* = (Y_i^*, X_i^*)$ $(i = 1, 2, 3, \ldots, m)$ centred on the empirical distribution of the pairs $L_i = (Y_i, X_i)$ $(i = 1, 2, 3, \ldots, m)$.
ii. Use the plug-in principle to obtain the bootstrapped predictor $\hat{\theta}_m^*(x)$; that is, $\hat{\theta}_m^*(x) = g_m(L_1^*, L_2^*, L_3^*, \ldots, L_m^*)(x)$.
iii. $\hat{\theta}_{m;B}(x) = E^*[\hat{\theta}_m^*(x)]$ is the bagged predictor.

The bagging algorithm (implemented in the sketch that follows) is:

Input: data $D = \{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_m, y_m)\}$; base learning algorithm $L$; number of base learners $J$.
Process:
For $j = 1, 2, \ldots, J$:
  $bs_j$ = bootstrap($D$); % create the bootstrap sample from D
  $\theta_j = L(bs_j)$; % train the base learner from the bootstrap sample
End
Output: $\frac{1}{J}\sum_{j=1}^{J}\theta_j(x)$ % for regression studies
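A minimal sketch of the bagging algorithm above, assuming scikit-learn with a regression tree as the base learner L and J = 100 bootstrap rounds (both illustrative choices):

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, J=100, seed=0):
    """Train J base learners, each on a bootstrap sample bs_j of the data D."""
    rng = np.random.default_rng(seed)
    learners = []
    for _ in range(J):
        idx = rng.integers(0, len(y), size=len(y))   # sample with replacement
        learners.append(clone(DecisionTreeRegressor()).fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """Output (1/J) * sum_j theta_j(x): the average prediction for regression."""
    return np.mean([m.predict(X) for m in learners], axis=0)
```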
3.3. Robust Estimation Method

Outliers are common in contaminated data, and identifying the affected observations is a challenge. A robust method can deal with the influence of outliers, and contaminated data can be analyzed using robust estimation [6, 30, 31, 32]. A robust method is used to overcome the problems that these outliers cause for traditional methods. To identify the best robust estimation method, the M-estimation methods M Huber, M Hampel and M Bi-square are compared.

The M-estimation method minimizes a function $\rho(\cdot)$ of the residuals. The M-estimator is defined as:

$\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho(e_i(\beta)). \qquad (10)$

Assume the scale $\sigma$ is known, and let the residuals at a candidate $\beta$ be $e_i = y_i - \beta^T x_i$. The M-estimate of $\beta$ minimizes the objective function:

$\sum_{i=1}^{n} \rho\left\{\frac{e_i(\beta)}{\sigma}\right\}. \qquad (11)$

To estimate $\sigma$ robustly, the scale $\tilde{\sigma}_M$ in the M-estimator solves:

$\frac{1}{n}\sum_{i=1}^{n}\rho\left(\frac{e_i}{\sigma}\right) = \frac{1}{n}\sum_{i=1}^{n}\rho\left(\frac{y_i - \beta^T x_i}{\sigma}\right) = k, \qquad (12)$

where $\beta$ is the $p \times 1$ parameter vector. Differentiating with respect to $\beta$ yields the estimating equations in terms of the function $\psi$:

$\sum_{i}\psi(e_i)\frac{\partial e_i}{\partial \beta_j} = 0, \quad \text{for } j = 1, 2, \ldots, p. \qquad (13)$

The derivative $\psi(e) = \partial\rho(e)/\partial e$ is called the influence function. The weight function is then defined as:

$w(e) = \frac{\psi(e)}{e}, \qquad (14)$

so that the estimating equations become:

$\sum_{i} w(e_i)\,e_i\frac{\partial e_i}{\partial \beta_j} = 0, \quad \text{for } j = 1, 2, \ldots, p, \qquad (15)$

and the objective becomes the following iterated re-weighted least squares problem:

$\min \sum_{i} w\left(e_i^{(k-1)}\right) e_i^2, \qquad (16)$

where $k$ indicates the iteration number. Table 1 summarizes the M-estimators and their respective objective and weight functions; a sketch of the resulting iteration is given below the table.

Table 1: Robust regression M-estimation

Bi-square:
  Objective function: $\rho(e) = \frac{k^2}{6}\left\{1 - \left[1 - \left(\frac{e}{k}\right)^2\right]^3\right\}$ for $|e| \le k$; $\frac{k^2}{6}$ for $|e| > k$.
  Weight function: $w(e) = \left[1 - \left(\frac{e}{k}\right)^2\right]^2$ for $|e| \le k$; $0$ for $|e| > k$.

Huber:
  Objective function: $\rho(e) = \frac{1}{2}e^2$ for $|e| \le k$; $k|e| - \frac{1}{2}k^2$ for $|e| > k$.
  Weight function: $w(e) = 1$ for $|e| \le k$; $\frac{k}{|e|}$ for $|e| > k$.

Hampel:
  Objective function: $\rho(e) = \frac{e^2}{2}$ for $0 < |e| \le a$; $a|e| - \frac{a^2}{2}$ for $a < |e| \le b$; $-\frac{a(c - |e|)^2}{2(c - b)} + \frac{a}{2}(b + c - a)$ for $b < |e| \le c$.
  Weight function: $w(e) = 1$ for $0 < |e| \le a$; $\frac{a}{|e|}$ for $a < |e| \le b$; $\frac{a(c/|e| - 1)}{c - b}$ for $b < |e| \le c$.
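The iteration in Eq. (16) can be sketched as follows, assuming numpy, with the Huber weight function of Table 1 and the conventional tuning constant k = 1.345; the MAD-based scale estimate is one common choice and is an assumption here, not necessarily the one used in this study.

```python
import numpy as np

def huber_weights(e, k=1.345):
    """Huber weight function from Table 1: 1 for |e| <= k, k/|e| otherwise."""
    abs_e = np.maximum(np.abs(e), 1e-8)          # guard against division by zero
    return np.where(abs_e <= k, 1.0, k / abs_e)

def m_estimate(X, y, n_iter=50):
    """M-estimation by iterated re-weighted least squares, Eq. (16)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        e = y - X @ beta
        s = np.median(np.abs(e - np.median(e))) / 0.6745   # robust MAD scale
        w = huber_weights(e / s)
        WX = X * w[:, None]                          # weighted least squares step:
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)   # solve (X'WX) beta = X'Wy
    return beta
```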
4. Results and Discussion

The residuals vs fitted plot in Figure 2a shows that the residuals are not spread out randomly around zero; there is evidence of non-linearity and heterogeneity. Figure 2b shows the normal Q-Q plot: the residuals are not normally distributed, which also supports the result of the Kolmogorov-Smirnov test in Table 2. The possible outliers are observations 272 and 355. Observation 272 contributes more to the moisture content removal of the seaweed than the model predicts; although it is an extreme case, it still affects the moisture content removal. Observation 355 has a negative residual and contributes less to the moisture content removal of the seaweed than the model predicts.

Figure 2: (a) Residuals vs fitted plot (b) Normal Q-Q plot

The normality assumption is checked with the two-tailed Kolmogorov-Smirnov test. From the results in Table 2, the p-value of 2.2e-16 is less than 0.05, so we have enough evidence to say that the residuals do not come from a normal distribution. This also explains the shape of the Q-Q plot in Figure 2.

Table 2: Kolmogorov-Smirnov test for normality
Test statistic value: 0.1641 | P-value: 2.2e-16 | Remarks: the residuals do not come from a normal distribution.

The results in Table 3 are the evaluation of each machine learning algorithm for the 15, 25, 35 and 45 high-ranking variables that determine the moisture content removal of the seaweed. Based on the mean absolute percentage error (MAPE), mean squared error (MSE), R² and sum of squared errors (SSE), random forest outperforms support vector machine, bagging and boosting for the 15, 25, 35 and 45 significant parameters. This confirms the results of [33], where random forest clearly performed better than the other methods. Random forest with the 45 selected significant parameters gave the best performance, with a MAPE of 2.125891, an MSE of 7.330011, an R² of 0.9732063 and an SSE of 14029.64. All the validation measures (MAPE, MSE, R-square and SSE) imply that random forest obtains significantly better results in determining the moisture content removal of the seaweed.

Table 3: Evaluation metrics for the 15, 25, 35 and 45 high-ranking important variables

Machine Learning Model | Variables selected | MAPE | MSE | R² | SSE
Random Forest | 15 | 2.458969 | 9.910512 | 0.9637737 | 18968.72
Random Forest | 25 | 2.337353 | 9.010273 | 0.9670644 | 17245.66
Random Forest | 35 | 2.174667 | 7.790909 | 0.9715216 | 14911.80
Random Forest | 45 | 2.125891 | 7.330011 | 0.9732063 | 14029.64
Support Vector Machine | 15 | 8.614626 | 45.25618 | 0.8347612 | 86620.32
Support Vector Machine | 25 | 7.980399 | 35.80985 | 0.8691446 | 68540.05
Support Vector Machine | 35 | 7.568951 | 34.00095 | 0.8757802 | 65077.81
Support Vector Machine | 45 | 7.351331 | 32.38644 | 0.8816661 | 61987.65
Bagging | 15 | 12.25897 | 74.29053 | 0.7284423 | 142192.10
Bagging | 25 | 9.778194 | 47.33173 | 0.8269861 | 90592.93
Bagging | 35 | 8.413645 | 36.41955 | 0.8668739 | 69707.02
Bagging | 45 | 8.151903 | 33.65611 | 0.8769752 | 64417.80
Boosting | 15 | 8.168942 | 142.4542 | 0.5310293 | 272657.30
Boosting | 25 | 8.697362 | 136.3236 | 0.5543729 | 260923.30
Boosting | 35 | 8.183671 | 140.1463 | 0.5368431 | 268240.10
Boosting | 45 | 8.203304 | 134.0864 | 0.5569358 | 256641.30

Table 4 summarizes the original models, fitted without a robust method, and the hybrid models, which combine the machine learning models with the robust estimation techniques. It reports the number and percentage of outliers, i.e. the observations falling outside the 2-sigma limit μ ± 2σ, for each model (see the sketch after the table). This sigma limit can improve output quality and eliminate sources of deficiency [34].

Table 4: Number (percentage) of outliers outside the 2-sigma limits, μ ± 2σ, for the original and hybrid models

Machine Learning Model | Method | 15 variables | 25 variables | 35 variables | 45 variables
Random Forest | Original | 118 (6.17%) | 113 (5.90%) | 112 (5.85%) | 118 (6.17%)
Random Forest | M Bi-Square | 118 (6.17%) | 117 (6.11%) | 75 (3.92%) | 99 (5.17%)
Random Forest | M Hampel | 72 (3.76%) | 88 (4.60%) | 92 (4.81%) | 93 (4.86%)
Random Forest | M Huber | 83 (4.34%) | 90 (4.70%) | 88 (4.60%) | 102 (5.33%)
Support Vector Machine | Original | 108 (5.64%) | 98 (5.12%) | 86 (4.49%) | 87 (4.55%)
Support Vector Machine | M Bi-Square | 64 (3.34%) | 18 (0.94%) | 84 (4.39%) | 89 (4.65%)
Support Vector Machine | M Hampel | 66 (3.45%) | 62 (3.24%) | 85 (4.44%) | 86 (4.49%)
Support Vector Machine | M Huber | 81 (4.23%) | 83 (4.34%) | 96 (5.02%) | 99 (5.17%)
Bagging | Original | 98 (5.12%) | 96 (5.02%) | 97 (5.07%) | 84 (4.39%)
Bagging | M Bi-Square | 126 (6.58%) | 97 (5.07%) | 95 (4.96%) | 78 (4.08%)
Bagging | M Hampel | 101 (5.28%) | 97 (5.07%) | 90 (4.70%) | 85 (4.44%)
Bagging | M Huber | 113 (5.90%) | 99 (5.17%) | 97 (5.07%) | 89 (4.65%)
Boosting | Original | 193 (10.10%) | 168 (8.78%) | 194 (10.12%) | 194 (10.12%)
Boosting | M Bi-Square | 77 (4.02%) | 77 (4.02%) | 133 (6.95%) | 79 (4.12%)
Boosting | M Hampel | 76 (3.97%) | 76 (3.97%) | 72 (3.76%) | 80 (4.18%)
Boosting | M Huber | 83 (4.34%) | 81 (4.23%) | 67 (3.50%) | 85 (4.44%)
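For reference, the 2-sigma outlier count reported in Table 4 can be sketched as follows, assuming numpy; the residuals array is whatever model's prediction errors are being screened.

```python
import numpy as np

def outliers_2sigma(residuals):
    """Count residuals outside mu +/- 2*sigma and return (count, percentage)."""
    mu, sigma = residuals.mean(), residuals.std()
    flag = np.abs(residuals - mu) > 2 * sigma
    return int(flag.sum()), float(100 * flag.mean())

# e.g. 78 flagged observations out of 1914 gives 100 * 78 / 1914 = 4.08%
```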
Based on the results in Table 4 for the original models: for the 15 highest important variables, the maximum is boosting with 193 outliers (10.10%), while the minimum is bagging with 98 (5.12%). For the 25 highest important variables, the maximum is boosting with 168 (8.78%), while the minimum is bagging with 96 (5.02%). For the 35 highest important variables, the maximum is boosting with 194 (10.12%), while the minimum is support vector machine with 86 (4.49%). For the 45 highest important variables, the maximum is boosting with 194 (10.12%), while the minimum is bagging with 84 (4.39%). Based on these results, bagging with 45 important variables gave the best performance among the original models because it has the lowest number of outliers, 84.

Based on the results in Table 4 for the hybrid models: for the 15 highest important variables, bagging M Bi-square has the highest number of outliers, 126 (6.58%), while support vector machine M Bi-square has the lowest, 64 (3.34%). For the 25 highest important variables, random forest M Bi-square has the highest number of outliers, 117 (6.11%), while support vector machine M Bi-square has the lowest, 18 (0.94%). For the 35 highest important variables, boosting M Bi-square has the highest number of outliers, 133 (6.95%), while boosting M Huber has the lowest, 67 (3.50%). For the 45 highest important variables, random forest M Huber has the highest number of outliers, 102 (5.33%), while bagging M Bi-square has the lowest, 78 (4.08%). Based on this result, bagging M Bi-square gave the best performance because it had the lowest number of outliers, 78, while using the highest number of high-ranking variables.

5. Conclusion

The aim of this study is to develop a hybrid model to forecast the seaweed drying parameters that determine the moisture content removal, which would enhance the quality of the seaweed. Four predictive models (random forest, support vector machine, bagging and boosting) were combined with M Huber, M Hampel and M Bi-square to develop hybrid models that can improve the predictive accuracy for the contaminated seaweed data. In summary, the best model for determining the moisture content removal of the seaweed big data is bagging M Bi-square: it gave the best performance because it had the lowest number of outliers, 78, while using the highest number of high-ranking variables. For future study, a hybrid model with imbalanced data or missing values can be investigated.

Acknowledgement

The authors are grateful to the Ministry of Higher Education Malaysia for the Fundamental Research Grant Scheme with Project Code RGS/1/2022/STG06/USM/02/13 for their assistance. We are also grateful to the Editor, associate editor, and anonymous reviewers for their insightful comments and suggestions to improve the quality and clarity of the paper.

References

[1] D. N. Gujarati & D. N. Porter, Basic Econometrics, 4th ed., The McGraw-Hill Companies, New York, USA (2004).
[2] O. G. Obadina, A. F. Adedotun, & O. A. Odusanya, "Ridge Estimation's Effectiveness for Multiple Linear Regression with Multicollinearity: An Investigation Using Monte-Carlo Simulations", Journal of the Nigerian Society of Physical Sciences 3 (2021) 278, doi: 10.46481/jnsps.2021.304.
[3] A. B. Yusuf, R. M. Dima, & S. K. Aina, "Optimized Breast Cancer Classification using Feature Selection and Outliers Detection", Journal of the Nigerian Society of Physical Sciences 3 (2021) 298, doi: 10.46481/jnsps.2021.331.
[4] H. Y. Lim, P. S. Fam, A. Javaid, & M. K. M. Ali, "Ridge regression as efficient model selection and forecasting of fish drying using v-groove hybrid solar drier", Pertanika J. Sci. & Technol. 28 (2020) 1179, doi: 10.47836/pjst.28.4.04.
[5] A. Javaid, M. T. Ismail, & M. K. M. Ali, "Comparison of Sparse and Robust Regression Techniques in Efficient Model Selection for Moisture Ratio Removal of Seaweed using Solar Drier", Pertanika J. Sci. & Technol. 28 (2020) 609.
[6] A. Javaid, M. T. Ismail, & M. K. M. Ali, "Efficient Model Selection of Collector Efficiency in Solar Dryer using Hybrid of LASSO and Robust Regression", Pertanika J. Sci. & Technol. 28 (2020) 210.
[7] I. Dawoud & M. R. Abonazel, "Robust Dawoud–Kibria estimator for handling multicollinearity and outliers in the linear regression model", J. Stat. Comput. Simul. 91 (2021) 3678, doi: 10.1080/00949655.2021.1945063.
[8] A. Rajarathinam & B. Vinoth, "Outlier Detection in Simple Linear Regression Models and Robust Regression: A Case Study on Wheat Production Data", International Journal of Scientific Research 3 (2014) 531.
[9] S. L. Jegede, A. F. Lukman, K. Ayinde, & K. A. Odeniyi, "Jackknife Kibria-Lukman M-Estimator: Simulation and Application", Journal of the Nigerian Society of Physical Sciences 4 (2022) 251, doi: 10.46481/jnsps.2022.664.
[10] B. T. Tan, P. S. Fam, R. B. R. Firdaus, T. Mou Leong, & M. S. Gunaratne, "Impact of climate change on rice yield in Malaysia: A panel data analysis", Agriculture (Switzerland) 11 (2021), doi: 10.3390/agriculture11060569.
[11] Y. Susanti, H. Pratiwi, H. Sulistijowati, & T. Liana, "M estimation, S estimation, and MM estimation in robust regression", International Journal of Pure and Applied Mathematics 91 (2014) 349, doi: 10.12732/ijpam.v91i3.7.
[12] Y. Susanti & D. Pratiwi, "Modeling of soybean production in Indonesia using robust regression", Bionatura 14 (2012) 148.
[13] P. J. Huber, "Robust Estimation of a Location Parameter", The Annals of Mathematical Statistics 35 (1964) 73.
[14] F. Drobnic, A. Kos, & M. Pustisek, "On the interpretability of machine learning models and experimental feature selection in case of multicollinear data", Electronics (Switzerland) 9 (2020), doi: 10.3390/electronics9050761.
[15] M. Z. I. Chowdhury & T. C. Turin, "Variable selection strategies and its importance in clinical prediction modelling", Fam Med Community Health 8 (2020), doi: 10.1136/fmch-2019-000262.
[16] H. Kaneko, "Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables", Heliyon 7 (2021) 1, doi: 10.1016/j.heliyon.2021.e07356.
[17] Mukhtar, M. K. M. Ali, M. T. Ismail, M. H. Ferdinand, & Alimuddin, "Machine learning-based variable selection: An evaluation of Bagging and Boosting", Turkish Journal of Computer and Mathematics Education 12 (2021) 4343.
[18] Mukhtar, M. K. M. Ali, M. T. Ismail, M. H. Ferdinand, Alimuddin, N. Akhtar, & A. Fudholi, "Hybrid model in machine learning-robust regression applied for sustainability agriculture and food security", International Journal of Electrical and Computer Engineering 12 (2022) 4457, doi: 10.11591/ijece.v12i4.pp4457-4468.
[19] S. Georganos, T. Grippa, A. N. Gadiaga, C. Linard, M. Lennert, S. Vanhuysse, N. Mboga, E. Wolff, & S. Kalogirou, "Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling", Geocarto Int. 36 (2021) 121, doi: 10.1080/10106049.2019.1595177.
[20] D. O. Oyewola, E. G. Dada, N. J. Ngozi, A. U. Terang, & S. A. Akinwumi, "COVID-19 Risk Factors, Economic Factors, and Epidemiological Factors nexus on Economic Impact: Machine Learning and Structural Equation Modelling Approaches", Journal of the Nigerian Society of Physical Sciences 3 (2021) 395, doi: 10.46481/jnsps.2021.173.
[21] V. Umarani, A. Julian, & J. Deepa, "Sentiment Analysis using various Machine Learning and Deep Learning Techniques", Journal of the Nigerian Society of Physical Sciences 3 (2021) 385, doi: 10.46481/jnsps.2021.308.
[22] R. Gandhi, "Support Vector Machine: Introduction to Machine Learning Algorithms", Towards Data Science (2018).
[23] H. H. Rashidi, N. K. Tran, E. V. Betts, L. P. Howell, & R. Green, "Artificial Intelligence and Machine Learning in Pathology: The Present Landscape of Supervised Methods", Acad. Pathol. 6 (2019) 1, doi: 10.1177/2374289519873088.
[24] C. Cortes & V. Vapnik, "Support-Vector Networks", Mach. Learn. 20 (1995) 273.
[25] A. J. Smola & B. Scholkopf, "A tutorial on support vector regression", Kluwer Academic Publishers (2004).
[26] N. Guenther & M. Schonlau, "Support vector machines", The Stata Journal 16 (2016) 917.
[27] Y. Freund, "Boosting a weak learning algorithm by majority", Inf. Comput. 121 (1995) 256.
[28] R. E. Schapire, "The Boosting Approach to Machine Learning: An Overview", MSRI Workshop on Nonlinear Estimation and Classification (2002).
[29] L. Breiman, "Bagging Predictors", Mach. Learn. 24 (1996) 123.
[30] Ö. G. Alma, "Comparison of Robust Regression Methods in Linear Regression", Int. J. Contemp. Math. Sciences 6 (2011) 409.
[31] A. E. Mohamed, H. M. Almongy, & A. H. Mohamed, "Comparison Between M-estimation, S-estimation, and MM Estimation Methods of Robust Estimation with Application and Simulation", International Journal of Mathematical Archive 9 (2018) 55.
[32] Mukhtar, M. K. M. Ali, A. Javaid, M. T. Ismail, & A. Fudholi, "Accurate and Hybrid Regularization-Robust Regression Model in Handling Multicollinearity and Outlier Using 8SC for Big Data", Mathematical Modelling of Engineering Problems 8 (2021) 547, doi: 10.18280/mmep.080407.
[33] R. C. Chen, C. Dewi, S. W. Huang, & R. E. Caraka, "Selecting critical features for data classification based on machine learning methods", J. Big Data 7 (2020) 1, doi: 10.1186/s40537-020-00327-4.
[34] C. Njeru & A. Amayo, "Evaluation of Quality Control in Clinical Chemistry Using Sigma Metrics" (2022).