The Prediction of the Rate of the Dropout of the Primary Schools Students by Using the Genetic Algorithm

Sabah Manfi Redha
Department of Statistics, College of Administration and Economics, Baghdad University, Iraq
Baghdad, Iraq, Tel. +964 1 778 7086
drsabah@coadec.uobaghdad.edu.iq

Abstract
In this research, the ARIMA time series model is applied to predict the dropout rate of male and female primary school students, using data for the period 2007-2015. The series are found to be unstable. After estimating the autocorrelation and partial autocorrelation coefficients, the appropriate models are found to be ARIMA(1,1,0) for males, ARIMA(1,1,0) for females and ARIMA(1,1,0) for males and females together. The Q statistic confirms that the selected models are appropriate and give predictions that are close to reality. Finally, the dropout rates of male and female primary school students were predicted by using the Genetic Algorithm, through the applications MATLAB R2013a (Version 8.1) and GRETL 2016.

Keywords: prediction, ARIMA model, genetic algorithm

1. Introduction
Nations progress through education and learning. When generations are educated from a young age, a civilized and educated society arises, and this society progresses in industry, agriculture, medicine and all other fields. The first stage of a person's life is the basis of education, so primary school is one of the most important stages in a student's life. If it is properly delivered, the student will be interested in studying and will complete an education that benefits his or her own society.
However, if it is improperly delivered, education becomes a burden on students and may lead them to think of dropping out of school, in addition to other reasons that may affect them. For this reason the subject selected here is the prediction of the dropout rate of primary school students. The Box-Jenkins prediction model is examined according to gender, and a Genetic Algorithm is used to produce the predictions.

2. Theoretical aspect
2.1. First: Time series
A time series is a set of observations $X_1, X_2, \ldots, X_n$ on a specific phenomenon taken over a period of time (1). The future values of a time series can be predicted exactly when it follows a fixed form that does not contain a random variable; such a series is called deterministic. Most time series, however, are stochastic: their future values follow a probability distribution, and they are described by a model that contains a random variable.
In some types of time series the values can be observed at every moment of the time period; these are called continuous time series. Most time series consist of observations of the variable at equally spaced points in time; these are called discrete time series. A discrete series can be obtained either by recording the values of the phenomenon at fixed times or by accumulating its values over fixed time intervals (2).
The components of a time series are the general trend, seasonal variations, cyclical variations and random variations (1,4). A time series is considered stationary (stable) if its data fluctuate around a constant mean and it is free from the influence of the general trend and seasonal variations.
A stationary time series has a constant mean and a constant variance, and its autocovariance depends only on the lag k, as follows (2,4):

$$E(X_t) = \mu \tag{2}$$
$$\mathrm{Var}(X_t) = E(X_t - \mu)^2 = \sigma^2 \tag{3}$$
$$\mathrm{Cov}(X_t, X_{t+k}) = E[(X_t - \mu)(X_{t+k} - \mu)] = \gamma_k \tag{4}$$

If $X_1, X_2, \ldots, X_n$ are the observed values of the time series $\{X_t\}$, the estimates of $\mu$, $\sigma^2$ and $\gamma_k$ are, respectively:

$$\bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_t \tag{5}$$
$$S^2 = \frac{1}{n}\sum_{t=1}^{n} (X_t - \bar{X})^2 \tag{6}$$
$$c_k = \frac{1}{n}\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X}) \tag{7}$$

A stationary time series can be distinguished from a non-stationary one through the values of its autocorrelation coefficients. For a stationary series they approach zero after the second or third lag, while for a non-stationary series they remain significantly different from zero until about the seventh or eighth lag. A series is called seasonal if it repeats itself after each fixed period of time, that is:

$$X_t = X_{t+S} \tag{8}$$

where S represents the length of the season. A seasonal series can be recognized through its autocorrelation coefficients, which are positive and large enough to differ significantly from zero at the seasonal lags.

2.2. Second: Autocorrelation Function (ACF):
The autocorrelation function measures the strength of the correlation between the values $X_t$ of the time series at different time periods. It is used to determine the order and form of the appropriate model and, through the residuals, to find out whether the time series is stable or unstable. The mathematical formula is as follows:

$$\rho_k = \frac{\mathrm{Cov}(X_t, X_{t+k})}{\sqrt{\mathrm{Var}(X_t)\,\mathrm{Var}(X_{t+k})}} = \frac{\gamma_k}{\gamma_0} \tag{9}$$

Since the variance of a stationary time series is constant and equal at all time periods, $\rho_k$ is estimated as follows:

$$r_k = \frac{\sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2} \tag{10}$$

BRAIN – Broad Research in Artificial Intelligence and Neuroscience, Volume 9, Issue 2 (May, 2018), ISSN 2067-3957

2.3. Third: Partial Autocorrelation Function (PACF):
The partial autocorrelation function also contributes to determining the appropriate model. It reveals the link between $X_t$ and $X_{t+k}$ with the intermediate values held fixed, and its mathematical formula is as follows (2):

$$\phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_j} \tag{11}$$

where $\phi_{k-1,j}$ represents the regression coefficients, with $\phi_{k,j} = \phi_{k-1,j} - \phi_{kk}\,\phi_{k-1,k-j}$.

2.4. Fourth: Formulation of model ARIMA
2.4.1. Autoregressive Model (AR):
This model is of great benefit for the analysis of time series and is the most frequently used. The model of order p is written as follows (1):

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + e_t \tag{12}$$

where $e_t$ represents the random error, $\phi_1, \ldots, \phi_p$ represent the model parameters and p represents the autoregressive order. Using the backshift operator B, where $B X_t = X_{t-1}$, the model can be written as follows:

$$\phi(B) X_t = e_t \tag{13}$$
$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \tag{14}$$

Therefore, the autoregressive process of order p is invertible.

2.4.2. Moving Average Model (MA):
This model represents the correlation between the current observations of the time series and the errors of the same series in previous periods. The model of order q is written as follows (4):

$$X_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q} \tag{15}$$

where $\theta_1, \ldots, \theta_q$ represent the model parameters and q represents the moving average order. By using the backshift operator B we get the following formula:

$$X_t = \theta(B) e_t \tag{16}$$

where:

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \tag{17}$$

2.4.3. Mixed Autoregressive Moving Average Model (ARMA(p,q)):
This model is a combination of the autoregressive model and the moving average model. It represents the correlation between the current values of the time series and both the previous values of the same series and the errors of the same series in previous periods. The model formula is as follows (2):

$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q} \tag{18}$$
$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q} \tag{19}$$

By using the backshift operator B the formula is as follows:

$$\phi(B) X_t = \theta(B) e_t \tag{20}$$

where $\phi(B)$ and $\theta(B)$ are as defined in (14) and (17). The model parameters are $\phi_1, \ldots, \phi_p$, $\theta_1, \ldots, \theta_q$ and $\sigma_e^2$, which are estimated from the observations of the time series. The errors $e_t$ are considered independent random errors that follow a normal distribution with a mean of zero and a constant variance $\sigma_e^2$.

2.4.4. Autoregressive Integrated Moving Average (ARIMA(p,d,q)):
Most time series are not stable. This is recognized from the autocorrelation function, whose values do not reach zero after the second or third lag but remain significant for a number of lags. Differencing is therefore used to convert the series into a stable one, by means of the backward difference operator, whose form is as follows:

$$\nabla^d X_t = (1 - B)^d X_t$$

If we substitute $\nabla^d X_t$ for $X_t$ in equation (20), we get a new model that deals with this particular type of unstable time series. It is called the unstable mixed model, and its mathematical formula is as follows:

$$\phi(B)(1 - B)^d X_t = \theta(B) e_t$$

2.5. Fifth: Building time series model (Box-Jenkins):
The Box-Jenkins approach builds a time series model from autoregressive and moving average components for a set of observations of a certain phenomenon. The construction of the model goes through several important stages, as follows (1,2):

2.5.1. Model Identification
In this stage the stability of the time series is checked and the orders p, d, q of the ARIMA model are selected. The autocorrelation function (ACF) and the partial autocorrelation function (PACF) are plotted in order to characterize the time series. If the autocorrelations lie within the 95% confidence limits, the autocorrelation coefficients are close to zero, the series is considered stationary with d = 0, and we work with the original observations without transforming them. If the autocorrelations do not fall within the 95% confidence limits, the autocorrelation coefficients are not close to zero, the series is considered non-stationary, and differences must be taken in order to convert it into a stationary one. Taking the first difference gives d = 1; if a second difference is needed, d = 2 in the ARIMA(p, d, q) model.
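The identification rule described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the procedure run in GRETL: the function names `sample_acf`, `needs_differencing` and `choose_d` are hypothetical, the default of checking the first two lags is an assumption, and the rule looks at the autocorrelations alone.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev * dev)
    if denom == 0.0:
        # A constant series has no variation; report zero autocorrelation.
        return np.zeros(max_lag)
    return np.array([np.sum(dev[:-k] * dev[k:]) / denom
                     for k in range(1, max_lag + 1)])

def needs_differencing(x, max_lag=2):
    """True if any of the first max_lag autocorrelations falls outside
    the approximate 95% limits +-1.96/sqrt(T)."""
    bound = 1.96 / np.sqrt(len(x))
    return bool(np.any(np.abs(sample_acf(x, max_lag)) > bound))

def choose_d(x, max_d=2, max_lag=2):
    """Difference the series until the ACF check passes or max_d is reached.
    Returns the chosen d and the differenced series."""
    series = np.asarray(x, dtype=float)
    d = 0
    while d < max_d and needs_differencing(series, max_lag):
        series = np.diff(series)
        d += 1
    return d, series
```

For example, applied to a synthetic linear trend such as `np.arange(20.0)`, `choose_d` selects d = 1, because the first difference of a straight line is constant. With a series as short as the one studied here the limits are wide, so a check of this kind is only a rough guide.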
The diagnostic process can be summarized in Table 1, as follows (1,2):

Table 1. Diagnosis of the set of ARIMA models
| Model | ACF | PACF |
|---|---|---|
| AR(1) | Decays geometrically starting from $\rho_1$ | Zero after $\phi_{11}$ |
| AR(2) | Decays geometrically starting from $\rho_2$ | Zero after $\phi_{22}$ |
| AR(p) | Decays geometrically starting from $\rho_p$ | Zero after $\phi_{pp}$ |
| MA(1) | Zero after $\rho_1$ | Decays after $\phi_{11}$ |
| MA(2) | Zero after $\rho_2$ | Decays after $\phi_{22}$ |
| MA(q) | Zero after $\rho_q$ | Decays after $\phi_{qq}$ |
| ARMA(1,1) | Decays geometrically starting from $\rho_1$ | Decays after $\phi_{11}$ |
| ARIMA(p,d,q) | Decays geometrically starting from $\rho_p$ | Zero after $\phi_{pp}$ |

where $\rho_k$ is the coefficient of the autocorrelation function and $\phi_{kk}$ is the coefficient of the partial autocorrelation function.

2.5.2. Estimation and Testing
At this stage the appropriate ARIMA model is estimated on the basis of the identified orders p, d, q, using suitable estimation equations. After the identification and estimation stages are accomplished, the adequacy of the estimated model must be tested. Box and Jenkins recommend checking the residuals to find out whether they are stable and random. The autocorrelation coefficients of the residuals should lie within the 95% confidence limits, according to the following formula:

$$-\frac{1.96}{\sqrt{n}} \le r_k(e_t) \le \frac{1.96}{\sqrt{n}} \tag{31}$$

where $e_t$ is white noise (the residuals follow a normal distribution with zero mean and constant variance). When this formula is satisfied, the residual autocorrelations are random, and the identified model is considered good, suitable and usable for prediction. To examine the residual series, the Box-Jenkins Q test is used to make sure that the residuals of the estimated model are random. Its formula is as follows (1,4):

$$Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2(e_t)}{n-k} \tag{32}$$

where n is the number of observations and m is the number of lags tested.

2.5.3. Prediction stage
In this stage the future values of the time series are predicted for period T+L, where $\hat{X}_T(L)$ denotes the prediction made at the origin T for L periods ahead. The prediction is obtained by writing the model for period T+L and taking its conditional expectation at the origin T, that is $E_T(X_{T+L} \mid X_T, X_{T-1}, \ldots)$. Writing equation (19) at period T+L gives:

$$X_{T+L} = \phi_1 X_{T+L-1} + \cdots + \phi_p X_{T+L-p} + e_{T+L} - \theta_1 e_{T+L-1} - \cdots - \theta_q e_{T+L-q} \tag{33}$$

By taking the conditional expectation at period T, we get the prediction equation at the origin T as follows:

$$\hat{X}_T(L) = \phi_1 E_T(X_{T+L-1}) + \cdots + \phi_p E_T(X_{T+L-p}) + E_T(e_{T+L}) - \theta_1 E_T(e_{T+L-1}) - \cdots - \theta_q E_T(e_{T+L-q}) \tag{34}$$

2.6. Sixth: Genetic Algorithm (GA)
The genetic algorithm is one of the optimization and search methods. It belongs to the evolutionary algorithms and is also called the Genetic and Evolutionary Algorithm (GEA). It is based on Darwin's view of reproduction: living creatures try to adapt to their surroundings over generations without altering the conditions around them, and those that are not capable of adjusting or evolving eventually go extinct. The fittest creatures prevail, while the least fit fade and die. Genetic mutation occurs very rarely; it is one of the factors that help to develop the characteristics determined by the genes. The genetic algorithm borrows the terms of genetics, such as generations, crossover and mutation. In this algorithm, solutions are drawn from a random population chosen according to the available data. Every solution is encoded, the codes form a chain of chromosomes whose coding is binary, octal, etc., and every gene inside the chromosome is represented by a bit.
Numbers are allocated according to the size of the function as an initial solution for the algorithm. The algorithm then proceeds in fixed steps: a new solution is formed by applying the genetic operators, and further solutions are generated within this procedure by choosing random samples, until the best value of the objective (trade-off) function is reached (3,5).

2.7. Seventh: Genetic Algorithm Steps
There are several steps in applying the Genetic Algorithm (3,5):
1. Initialization: many individual solutions are generated randomly to form the initial population of chromosomes. Its size depends on the nature of the problem, but normally there are hundreds or thousands of possible solutions.
2. Selection: in each successive generation a proportion of the current chromosomes is chosen to produce the new generation. They are chosen according to the fitness function. Another way is to choose a random set of chromosomes, but this procedure can take a very long time.
3. Reproduction: a new generation of chromosomes is produced from the chromosomes chosen in the selection process, by applying crossover and mutation to produce children.
4. The crossover process: parents chosen in the selection process are mated in pairs to produce two new children. This continues until a new group of chromosomes is formed, together with a new group of parents. There are several methods of crossover:
A. One-point crossover: one point is chosen on the parent chromosomes and the data on either side of it are exchanged, without repetition, which produces a next generation that differs from the first.
B. Two-point crossover: two points are chosen on the parent chromosomes and the data around these two points are exchanged, without repetition, which produces a next generation that differs from the first.
C. Cut-and-splice crossover: each chromosome is cut at a point that differs from the cut point of the second chromosome, which changes the length of the chromosome.
D. Mutation: a sudden change in the offspring that follows the crossover process. It alters the chromosome by changing its components (bits), so the result is not inherited from the parents. In general, reproduction produces new chromosomes, which are then evaluated by the fitness function to produce new children.
E. Termination: new generations are produced until a stopping condition is met: reaching the best solution, reaching a fixed number of generations, exhausting the allocated budget of time or money, or reaching a local minimum from which no further improvement is possible.

3. The Practical Side
3.1. First: Data Description
The data on dropout for the period 2007-2015 were used to apply one of the Box-Jenkins models to predict the proportion of students dropping out of the primary stage, for males and females. The time series data were analyzed with the GRETL 2016 program.

3.2. Second: Time Series Stability
After collecting the dropout proportions of male and female students in the primary stage, the first step of the Box-Jenkins approach is to plot the time series in order to understand its behavior.

(Plot: v1 against year, 2007-2015.)
Figure 1. Time series for male primary stage students

We notice the instability of the time series. To be more precise, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to check the stability, as shown in Figure 2.

(Plots: ACF for v1 and PACF for v1 against lag, with limits ±1.96/T^0.5.)
Figure 2. Autocorrelation and partial autocorrelation functions for male primary stage students

The unit root Dickey-Fuller test is used to check the stability of the series. The results are:
Dickey-Fuller Test: Estimated Value = 0.276151, Test Statistic = 0.87796, P-Value = 0.8984
The p-value of 0.8984 is above the 0.05 significance level, so the null hypothesis (existence of a unit root) is not rejected, which implies that the time series is unstable. After taking the first difference, the series appears stable. See Figure 3:

(Plot: d_v1 against year, 2008-2015.)
Figure 3. Time series for male primary stage students after taking the first difference

We notice the stability of the time series. To be more accurate, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to check the stability, as shown in Figure 4.

(Plots: ACF for d_v1 and PACF for d_v1 against lag, with limits ±1.96/T^0.5.)
Figure 4. Autocorrelation and partial autocorrelation functions for male primary stage students after the first difference

We use the unit root Dickey-Fuller test to check the stability of the series.
The results are:
Dickey-Fuller Test: Estimated Value = 0.233403, Test Statistic = 0.125769, P-Value = 0.6405
The p-value of 0.6405 is still above the 0.05 significance level, so strictly the test does not reject the null hypothesis of a unit root. The differenced series is nevertheless treated as stable, in line with Figures 3 and 4.
Figure 5 represents the time series of female primary stage students:

(Plot: v1 against year, 2007-2015.)
Figure 5. Time series for female primary stage students

The instability of the time series is noticed. To be more precise, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to check the stability, as shown in Figure 6.

(Plots: ACF for v1 and PACF for v1 against lag, with limits ±1.96/T^0.5.)
Figure 6. Autocorrelation and partial autocorrelation functions for female primary stage students

We use the unit root Dickey-Fuller test to check the stability of the series. The results are:
Dickey-Fuller Test: Estimated Value = 0.47242, Test Statistic = 1.22797, P-Value = 0.9445
The p-value of 0.9445 is above the 0.05 significance level, so the null hypothesis (existence of a unit root) is not rejected, which implies that the time series is unstable. After taking the first difference, the series appears stable. See Figure 7:

(Plot: d_v1 against year, 2008-2015.)
Figure 7. Time series for female primary stage students after the first difference

The stability of the time series is observed. To be more accurate, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to check the stability, as shown in Figure 8.

(Plots: ACF for d_v1 and PACF for d_v1 against lag, with limits ±1.96/T^0.5.)
Figure 8. Autocorrelation and partial autocorrelation functions for female primary stage students after the first difference

The unit root Dickey-Fuller test is used to check the stability of the series. The results are:
Dickey-Fuller Test: Estimated Value = 0.736458, Test Statistic = 0.380545, P-Value = 0.794
The p-value of 0.794 is again above the 0.05 significance level, so strictly the unit root null hypothesis is not rejected. The differenced series is nevertheless treated as stable, in line with Figures 7 and 8.
Figure 9 represents the time series of female and male primary stage students together:

(Plot: v1 against year, 2007-2015.)
Figure 9. Time series for male and female primary stage students

The instability of the time series is recognized. To be more accurate, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to check the stability, as shown in Figure 10.

(Plots: ACF for v1 and PACF for v1 against lag, with limits ±1.96/T^0.5.)
Figure 10. Autocorrelation and partial autocorrelation functions for male and female primary stage students

We use the unit root Dickey-Fuller test to check the stability of the series. The results are:
Dickey-Fuller Test: Estimated Value = 0.369693, Test Statistic = 1.01829, P-Value = 0.9194
The p-value of 0.9194 is above the 0.05 significance level, so the null hypothesis (existence of a unit root) is not rejected, which implies that the time series is unstable. After taking the first difference, the series appears stable. See Figure 11:

(Plot: d_v1 against year, 2008-2015.)
Figure 11. Time series for male and female primary stage students after the first difference

The stability of the time series is noticed. To be more precise, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to check the stability, as shown in Figure 12. We use the unit root Dickey-Fuller test to check the stability of the series. The results are:
Dickey-Fuller Test: Estimated Value = 0.215889, Test Statistic = 0.114984, P-Value = 0.719
The p-value of 0.719 is again above the 0.05 significance level, so strictly the unit root null hypothesis is not rejected. The differenced series is nevertheless treated as stable, in line with Figures 11 and 12.

(Plots: ACF for d_v1 and PACF for d_v1 against lag, with limits ±1.96/T^0.5.)
Figure 12. Autocorrelation and partial autocorrelation functions for male and female primary stage students after the first difference

3.3. Third: Identifying the time series model, estimating it and choosing the best one
Having achieved the stability of the dropout time series for both male and female students, we move to the next step, which is identifying the appropriate model to represent the male and female time series.
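The unit root checks reported in Section 3.2 rest on a regression of the differenced series on its own lagged level. A minimal sketch of that regression is given below. It assumes the variant with a constant and no lagged differences, which the text does not specify, and the function name `dickey_fuller_stat` is hypothetical. The resulting statistic does not follow a Student t distribution, so it has to be compared with Dickey-Fuller critical values (about −2.86 at the 5% level asymptotically for the constant-only case), and with only nine annual observations a test of this kind has little power.

```python
import numpy as np

def dickey_fuller_stat(x):
    """Regress diff(x)_t on a constant and x_{t-1} by ordinary least squares.

    Returns (gamma_hat, t_stat), where gamma_hat is the coefficient on the
    lagged level. A strongly negative t_stat is evidence against a unit root.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                                  # dependent variable
    X = np.column_stack([np.ones(len(dx)), x[:-1]])  # constant, lagged level
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    dof = len(dx) - X.shape[1]                       # residual degrees of freedom
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS covariance matrix
    gamma_hat = beta[1]
    return gamma_hat, gamma_hat / np.sqrt(cov[1, 1])
```

A statistic well below the critical value supports rejecting the unit root; a statistic above it, like a p-value above 0.05, does not.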
The Akaike criterion, the Hannan-Quinn criterion and the Schwarz criterion were all adopted to differentiate between the candidate models. Table 2 presents the tested models with their orders and the comparison criteria for the primary stage:

Table 2. The results of the tested models with their orders and comparison criteria for the primary stage
| Primary stage | Model order | Akaike | Hannan-Quinn | Schwarz |
|---|---|---|---|---|
| Males | ARIMA(1,1,0) | 52.30670 | 50.96962 | 52.19852 |
| Females | ARIMA(1,1,0) | 54.82685 | 53.48977 | 54.71867 |
| Males + females | ARIMA(1,1,0) | 63.28090 | 61.94382 | 63.17272 |

Table 2 shows the best model according to the comparison criteria and the significance of its parameters: it gives the smallest values of the three criteria used. The model parameters were estimated by maximum likelihood. Table 3 presents the model parameters and their significance for male primary stage students:

Table 3. The estimated parameters and their significance for male primary stage students
| Parameter | Coefficient | Std. error | z | p-value |
|---|---|---|---|---|
| phi_1 | −0.544476 | 0.277415 | −1.963 | 0.0497 ** |

The estimated model for male primary stage students is:

$$(1 + 0.544476B)(1 - B) X_t = e_t$$

Table 4 presents the model parameters and their significance for female primary stage students:

Table 4. The estimated parameters and their significance for female primary stage students
| Parameter | Coefficient | Std. error | z | p-value |
|---|---|---|---|---|
| phi_1 | −0.532785 | 0.279002 | −1.910 | 0.0562 ** |

The estimated model for female primary stage students is:

$$(1 + 0.532785B)(1 - B) X_t = e_t$$

Table 5 shows the model parameters and their significance for male and female primary stage students together:

Table 5. The estimated parameters and their significance for male and female primary stage students
| Parameter | Coefficient | Std. error | z | p-value |
|---|---|---|---|---|
| phi_1 | −0.536956 | 0.278258 | −1.930 | 0.0536 ** |

The estimated model for male and female primary stage students is:

$$(1 + 0.536956B)(1 - B) X_t = e_t$$

3.4. Fourth: Testing the Accuracy of the Model:
After identifying and estimating the models, their suitability and adequacy must be checked. For male primary stage students, the Ljung-Box Q statistic is applied to check the adequacy of the model at the 0.05 significance level. The Q value for male primary stage students is:
Ljung-Box Q' = 1.28784, with p-value = P(Chi-square(1) > 1.28784) = 0.2564
The tabulated value equals 3.841 and the Q value is less than the tabulated value, so the null hypothesis is not rejected, which indicates that the estimated model is free of the problem of residual autocorrelation.
The autocorrelation and partial autocorrelation functions of the residuals for male primary stage students can also be examined. The residual values lie within the confidence limits, which means that the residual series is random and the estimated model is good and suitable, as shown in Figure 13:

(Plots: residual ACF and residual PACF against lag, with limits ±1.96/T^0.5.)
Figure 13. Autocorrelation and partial autocorrelation functions of the residuals for male primary stage students

For female primary stage students, the Ljung-Box Q statistic is likewise applied to check the adequacy of the model at the 0.05 significance level. The Q value for female primary stage students is:
Ljung-Box Q' = 0.966626, with p-value = P(Chi-square(1) > 0.966626) = 0.3255
The tabulated value equals 3.841 and the Q value is less than the tabulated value, so the null hypothesis is not rejected, which indicates that the estimated model is free of the problem of residual autocorrelation.
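As a check on the figures above, the Ljung-Box statistic and its p-value can be sketched in Python. The function names are hypothetical. The statistic is computed as Q = n(n+2) Σ r_k²/(n−k) over the tested lags, and the p-value uses the identity P(Chi-square(1) > q) = erfc(√(q/2)), which holds only for one degree of freedom, the case reported here.

```python
import math
import numpy as np

def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic of a residual series over lags 1..lags."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    dev = e - e.mean()
    denom = np.sum(dev * dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(dev[:-k] * dev[k:]) / denom  # residual autocorrelation at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

def chi2_sf_1df(q):
    """Upper-tail probability P(Chi-square(1) > q); one degree of freedom only."""
    return math.erfc(math.sqrt(q / 2.0))
```

With these definitions, `chi2_sf_1df(1.28784)` and `chi2_sf_1df(0.966626)` reproduce the reported p-values of 0.2564 and 0.3255 to four decimal places, and `chi2_sf_1df(3.841)` is approximately 0.05, which is where the tabulated value comes from.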
The autocorrelation and partial autocorrelation functions of the residuals for female primary stage students can also be examined. The residual values lie within the confidence limits, which means that the residual series is random and the estimated model is good and suitable, as shown in Figure 14:

(Plots: residual ACF and residual PACF against lag, with limits ±1.96/T^0.5.)
Figure 14. Autocorrelation and partial autocorrelation functions of the residuals for female primary stage students

For male and female primary stage students together, the Ljung-Box Q statistic is applied to check the adequacy of the model at the 0.05 significance level. The Q value for male and female primary stage students is:
Ljung-Box Q' = 1.10306, with p-value = P(Chi-square(1) > 1.10306) = 0.2936
The tabulated value equals 3.841 and the Q value is less than the tabulated value, so the null hypothesis is not rejected, which indicates that the estimated model is free of the problem of residual autocorrelation.
The autocorrelation and partial autocorrelation functions of the residuals for male and female primary stage students can also be examined. The residual values lie within the confidence limits, which means that the residual series is random and the estimated model is good and suitable, as shown in Figure 15.

(Plots: residual ACF and residual PACF against lag, with limits ±1.96/T^0.5.)
Figure 15. Autocorrelation and partial autocorrelation functions of the residuals for male and female primary stage students

3.5. Fifth: Prediction
The time series values were predicted using the Genetic Algorithm for the next five years. The results are shown in Table 6:

Table 6. Predicted dropout rates for the primary stage
| Stage | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|
| Males | 11.1 | 11.0 | 10.8 | 11.2 | 11.2 |
| Females | 17.3 | 17.1 | 17.7 | 17.7 | 17.8 |
| Males and females | 29.8 | 30.1 | 29.9 | 31.1 | 31.6 |

(Plot: v1, forecast and 95 percent interval against year, 2012-2020.)
Figure 16. Time series for male primary stage students and its prediction

(Plot: v1, forecast and 95 percent interval against year, 2012-2020.)
Figure 17. Time series for female primary stage students and its prediction

(Plot: v1, forecast and 95 percent interval against year, 2012-2020.)
Figure 18. Time series for male and female primary stage students and its prediction

4. Conclusions and Recommendations
From the results, the study reached the following conclusions:
1. There is a noticeable rise in the dropout rate of male and female students together in the predicted years compared with the original data, so that the dropout rate would reach (32.6%) in 2020. That is a very high percentage and it would place Iraq in a very poor position in the global education ranking.
2. There is a noticeable rise in the dropout rate of male students in the predicted years compared with the original data, so that the dropout rate would reach (11.2%) in 2020.
3. There is a noticeable rise in the dropout rate of female students in the predicted years compared with the original data, so that the dropout rate would reach (17.8%) in 2020.
4. The dropout rate of female students is much higher than that of male students, since several families are convinced that girls should stay at home after primary education.
5. Some families have financial difficulties, which increases the dropout rate; fathers tend to make their children work after primary education to supplement the family income.
Therefore, we recommend the following:
1. Increase awareness of the importance of girls' education in society, through mass media and other means.
2. The government must intervene on child labor.

References
1. Kasyok, Al. (2013). Simple Steps for Fitting Arima Model to Time Series Data for Forecasting Using R. International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064.
2. Ong, C.-S., Huang, J. J., & Tzeng, G. T. (2005). Model identification of ARIMA family using genetic algorithms. Applied Mathematics and Computation, 164, 885–912.
3. Alabsi, F., & Naoum, R. (2012). Comparison of Selection Methods and Crossover Operations using Steady State Genetic Based Intrusion Detection System. Journal of Emerging Trends in Computing and Information Sciences, Vol. 3, No. 7, July 2012, ISSN 2079-8407.
4. Asadul, I. (2007). Explaining and Forecasting Investment Expenditure in Canada: Combined Structural and Time Series Approaches (1961-2000). Applied Econometrics and International Development, Vol. 7-1.
5. Razali, N. M., & Geraghty, J. (2011). Genetic Algorithm performance with different selection strategies in solving TSP. In Proceedings of the World Congress on Engineering, Vol. II, July 6-8, London, U.K., ISBN 978-988-19251-4-5.