J. Nig. Soc. Phys. Sci. 3 (2021) 191–200
Journal of the Nigerian Society of Physical Sciences

Goodness of Fit Test of an Autocorrelated Time Series Cubic Smoothing Spline Model

Samuel Olorunfemi Adams^{a,*}, Davies Abiodun Obaromi^{b}, Alumbugu Auta Irinews^{c}

a Department of Statistics, University of Abuja, Abuja, Nigeria
b Department of Statistics, Confluence University of Science and Technology, Osara, Kogi State, Nigeria
c Department of Mathematics and Statistics, Federal Polytechnic, Nasarawa, Nasarawa State, Nigeria

Abstract

We investigated the finite-sample properties and the goodness of fit of cubic smoothing spline selection methods, namely the Generalized Maximum Likelihood (GML), Generalized Cross-Validation (GCV) and Mallows' Cp criterion (MCP) estimators, for time-series observations when autocorrelation is present in the error term of the model. The Monte Carlo study used 1,000 replications with six sample sizes (30, 60, 120, 240, 480 and 960), four autocorrelation levels (0.1, 0.3, 0.5 and 0.9) and three smoothing parameters ($\lambda_{GML} = 0.07271685$, $\lambda_{GCV} = 0.005146929$, $\lambda_{MCP} = 0.7095105$). The cubic smoothing spline selection methods were also applied to a real-life dataset. The predictive mean square error, R-square and adjusted R-square criteria, used to assess finite-sample properties and goodness of fit among the competing models, showed that the performance of the estimators is affected by changes in the sample size and autocorrelation level of the simulated and real-life data sets. The study concluded that the Generalized Cross-Validation estimator provides a better fit for autocorrelated time series observations. GCV is recommended because it works well at the four autocorrelation levels and provides the best fit for time-series observations at all sample sizes considered. This study can be applied to non-parametric regression, non-parametric forecasting, and spatial, survival and econometric observations.
DOI:10.46481/jnsps.2021.265

Keywords: Cubic spline, Goodness-of-fit test, Generalized Maximum Likelihood (GML), Generalized Cross-Validation (GCV), Mallows' Cp criterion (MCP)

Article History:
Received: 18 June 2021
Received in revised form: 21 July 2021
Accepted for publication: 26 July 2021
Published: 29 August 2021

©2021 Journal of the Nigerian Society of Physical Sciences. All rights reserved.
Communicated by: T. Latunde

*Corresponding author tel. no: +23480xxxx572
Email address: samuel.adams@uniabuja.edu.ng (Samuel Olorunfemi Adams)

1. Introduction

A cubic spline is the most widely recognized example of the smoothing spline regression model. It is a piecewise cubic function that interpolates a set of observation points and ensures smoothness of the fitted curve [1]. It consists of piecewise third-degree polynomials that pass through a set of points of interest. It has continuous first and second derivatives, with continuity of order $(d - 1)$, where $d$ is the polynomial degree [2]. The model with truncated power basis function $b(t)$ transforms the variables $t_i$ and fits a model using these transformed variables, which adds non-linearity to the model and enables the splines to fit smoother and more flexible non-linear functions. It is assumed that the points $(t_i, y_i)$ and $(t_{i+1}, y_{i+1})$ are connected by a cubic polynomial $S_i(t) = a_i t^3 + b_i t^2 + c_i t + d_i$ that is valid for $t_i \le t \le t_{i+1}$, for $i = 1, 2, \dots, n-1$ [3]. The interpolation function is derived by first finding the coefficients $a_i, b_i, c_i, d_i$ for each of the cubic functions. For $n$ points, there are $n - 1$ cubic functions to find, and each cubic function requires four coefficients. Therefore we have a total of $4(n - 1)$ unknowns, which implies that $4(n - 1)$ independent equations are required.
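The piecewise construction described here can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: it assumes the system is closed with natural end conditions (zero second derivative at both ends), writes each piece in local form around its left knot (an equivalent reparametrisation of $a_i t^3 + b_i t^2 + c_i t + d_i$), and the names `natural_cubic_spline` and `evaluate` are hypothetical.

```python
def natural_cubic_spline(t, y):
    """Return one (y_i, b_i, c_i, d_i) tuple per interval [t_i, t_{i+1}].

    Each piece is stored in local form
        S_i(x) = y_i + b_i*(x - t_i) + c_i*(x - t_i)**2 + d_i*(x - t_i)**3.
    Natural end conditions (an assumption of this sketch) set the second
    derivative to zero at the first and last point.
    """
    n = len(t)
    if n < 3 or len(y) != n:
        raise ValueError("need at least three points and len(t) == len(y)")
    h = [t[i + 1] - t[i] for i in range(n - 1)]

    # Unknowns are the second derivatives M_1..M_{n-2} at the interior points;
    # M_0 = M_{n-1} = 0. Row r of the tridiagonal system belongs to point r + 1.
    m = n - 2
    lower = [h[r] for r in range(m)]
    diag = [2.0 * (h[r] + h[r + 1]) for r in range(m)]
    upper = [h[r + 1] for r in range(m)]
    rhs = [6.0 * ((y[r + 2] - y[r + 1]) / h[r + 1] - (y[r + 1] - y[r]) / h[r])
           for r in range(m)]

    # Thomas algorithm: forward elimination, then back substitution.
    for r in range(1, m):
        w = lower[r] / diag[r - 1]
        diag[r] -= w * upper[r - 1]
        rhs[r] -= w * rhs[r - 1]
    M = [0.0] * n
    for r in range(m - 1, -1, -1):
        M[r + 1] = (rhs[r] - upper[r] * M[r + 2]) / diag[r]

    pieces = []
    for i in range(n - 1):
        b = (y[i + 1] - y[i]) / h[i] - h[i] * (2.0 * M[i] + M[i + 1]) / 6.0
        c = M[i] / 2.0
        d = (M[i + 1] - M[i]) / (6.0 * h[i])
        pieces.append((y[i], b, c, d))
    return pieces


def evaluate(t, pieces, x):
    """Evaluate the spline at x, using the piece whose interval contains x."""
    i = 0
    while i < len(pieces) - 1 and x > t[i + 1]:
        i += 1
    y0, b, c, d = pieces[i]
    u = x - t[i]
    return y0 + u * (b + u * (c + u * d))


# Toy data, chosen only for illustration.
t = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 0.0, 1.0]
pieces = natural_cubic_spline(t, y)
n_unknowns = 4 * len(pieces)  # four coefficients for each of the n - 1 pieces
```

With four points the sketch produces three pieces and therefore twelve coefficients, in line with the $4(n - 1)$ count in the text.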
First, each cubic function must pass through the observations at its left and right ends:

$$S_i(t_i) = y_i, \quad i = 1, 2, \dots, n-1, \qquad S_i(t_{i+1}) = y_{i+1}, \quad i = 1, 2, \dots, n-1 \tag{1}$$

Equation (1) produces $2(n - 1)$ conditions. Next, each cubic function should join its neighbours as smoothly as possible, so we constrain the splines to have continuous first and second derivatives at the interior points $t_{i+1}$, $i = 1, 2, \dots, n-2$:

$$S'_i(t_{i+1}) = S'_{i+1}(t_{i+1}), \quad i = 1, 2, \dots, n-2, \qquad S''_i(t_{i+1}) = S''_{i+1}(t_{i+1}), \quad i = 1, 2, \dots, n-2 \tag{2}$$

Equation (2) produces a further $2(n - 2)$ conditions. Finally, $S_i(t)$ is determined by imposing two additional conditions. A common choice of boundary constraints assumes that the second derivatives are zero at the endpoints; this means that the curve is a "straight line" at the endpoints, written as

$$S''_1(t_1) = 0, \qquad S''_{n-1}(t_n) = 0 \tag{3}$$

Together, Equations (1) to (3) supply $2(n-1) + 2(n-2) + 2 = 4(n-1)$ conditions, matching the number of unknown coefficients.

There are a few studies on goodness-of-fit tests for non-linear regression models in the literature; these studies can be grouped into penalized smoothing, polynomial regression models and smoothing spline test statistics, models for paired binary data, and nonparametric regression models. [4] proposed a test statistic for testing the goodness of fit of an $m$th-order polynomial regression model. The test statistic is $\int_0^1 \left[\hat{\mu}_\lambda^{(m)}(t)\right]^2 dt$, where $\hat{\mu}_\lambda^{(m)}$ is the $m$th-order derivative of an $m$th-order smoothing spline estimator of the regression function $\mu$ and $\lambda$ is its associated smoothing parameter. The large-sample properties of the test statistic are derived under both the null and alternative hypotheses. [5] describes a goodness-of-fit method for testing the parametric form of the regression function and of the variance in a parametric nonlinear regression model.
[6] proposed likelihood ratio and restricted likelihood ratio tests for the goodness of fit of a nonlinear regression, using a first-order Taylor approximation around the maximum likelihood estimator of the regression parameter to approximate the null hypothesis, while the alternative hypothesis is modelled nonparametrically using penalized splines. [7] applied computationally efficient bootstrap techniques to assess the performance of goodness-of-fit statistics and observed that, in general, the power and type I error of the goodness-of-fit statistics depend on the model under study. [8] considered a smoothing-based test statistic and approximated its null distribution using a bootstrap procedure, proposing a goodness-of-fit test for examining parametric covariance functions against general nonparametric alternatives for both irregularly observed longitudinal data and densely observed functional data. [9] offered a goodness-of-fit test for nonparametric regression models with linear smoother form by detecting statistical dependence between the estimated error terms and the covariates using the Hilbert-Schmidt Independence Criterion (HSIC). The bootstrap is used to obtain p-values, and the appropriate type I error and power of the test are demonstrated through Monte Carlo simulation.

It is clear from the existing literature that the goodness of fit of smoothing splines for time series observations has not been investigated so far. This paper presents a goodness-of-fit test for time series observations using three classical cubic spline nonparametric regression functions. Section two discusses the cubic smoothing spline, the smoothing parameter selection methods (Generalized Cross-Validation, Generalized Maximum Likelihood and Mallows' Cp criterion) and the performance evaluation criteria. The simulation results are given in Section three, while Section four presents the results for the real-life dataset.
Finally, a discussion of the findings and the conclusion are presented in Section five.

2. Materials and Methods

2.1. Cubic Smoothing Spline

The spline smoothing model is written as

$$y_i = f(t_i) + \varepsilon_i \tag{4}$$

where $y_i$ is the response (dependent) variable, $f$ is an unknown smooth function, $t_i$ is the independent (predictor) variable and $\varepsilon_i$ is a zero-mean autocorrelated stationary process [10]. The general cubic spline function is given as

$$f(t) = a t^3 + b t^2 + c t + d + \varepsilon \tag{5}$$

where $a$, $b$, $c$ and $d$ are real coefficients with $a \neq 0$, $t$ is the independent variable, $\varepsilon$ is the error term and the degrees of freedom are $k - d - 1$ ($k$ is the number of knots and $d$ is the degree of the cubic spline). The cubic smoothing spline estimate $\hat{f}$ of the function $f$ is defined to be the minimizer (over the class of twice-differentiable functions) of

$$S(f) = \sum_{i=1}^{n} \left( y_i - \hat{f}(t_i) \right)^2 + \lambda \int_a^b \left( \hat{f}''(t) \right)^2 dt \tag{6}$$

where:
1. $\lambda > 0$ is a smoothing parameter.
2. The first term is the residual sum of squares, which measures the goodness of fit to the observations.
3. The second term is a roughness penalty, which is large when the integrated squared second derivative of the regression function, $f''(t)$, is large.
4. If $\lambda$ approaches 0, then $f(t)$ simply interpolates the observations.
5. If $\lambda$ is very large, then $f(t)$ will be chosen so that $f''(t)$ is everywhere 0, which yields an overall linear least-squares fit to the observations.

If the values of $f(t)$ are fixed at $f(t_1), \dots, f(t_n)$, the roughness $\int_a^b \left( \hat{f}''(t) \right)^2 dt$ is minimized by a natural cubic spline; this solution is written in terms of basis functions as

$$f(t) = \beta_0 + \beta_1 f_1(t) + \dots + \beta_{n+3} f_{n+3}(t) \tag{7}$$

2.2. Selection of the smoothing parameter

The smoothing parameter in cubic spline smoothing controls the smoothness of the fitted curve. To estimate the optimal value of the smoothing parameter $\lambda$, three smoothing parameter selection criteria are considered and compared in this study: Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp (MCP).

The Generalized Cross-Validation (GCV) selection method was suggested by [11, 12] as a replacement for Cross-Validation (CV), which is the best-known technique for selecting the complexity of statistical models. The basic principle of cross-validation is to leave the data points out one at a time and choose the value of $\lambda$ under which the remaining data best predict the omitted points [13, 14]. To be precise, let $\hat{g}_\lambda^{(-i)}$ be the smoothing spline computed from all the data pairs except $(t_i, y_i)$, using the value $\lambda$ for the smoothing parameter. The cross-validation choice of $\lambda$ is then the value of $\lambda$ that minimizes the cross-validation score

$$CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\{ y_i - \hat{g}_\lambda^{(-i)}(t_i) \right\}^2 \tag{8}$$

Equation (8) is similar to the criteria generally used for model assessment in regression [15]. Define a matrix $A(\lambda)$ by

$$A_{ij}(\lambda) = n^{-1} g(t_i, t_j) \tag{9}$$

Then

$$CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \frac{\left\{ y_i - \hat{g}(t_i) \right\}^2}{\left\{ 1 - A_{ii}(\lambda) \right\}^2} \tag{10}$$

[12, 16, 17] also suggest the use of a related criterion, called Generalized Cross-Validation, obtained from (10) by replacing $A_{ii}(\lambda)$ by its average value, $n^{-1}\,\mathrm{tr}\,A(\lambda)$. This gives the score

$$GCV(\lambda) = \frac{n^{-1} RSS(\lambda)}{\left( 1 - n^{-1}\,\mathrm{tr}\,A(\lambda) \right)^2} \tag{11}$$

where $RSS(\lambda)$ is the residual sum of squares, $\sum \{ y_i - \hat{g}(t_i) \}^2$. In their study, [12] also give theoretical arguments to show that Generalized Cross-Validation should, asymptotically, choose an optimal value of $\lambda$ in the sense of minimizing the average squared error at the design points. Published practical examples bear out this good performance [18].
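The GCV score in Equation (11) can be illustrated with a small, self-contained Python sketch. Several choices below are assumptions made only to keep the example short and dependency-free, and are not the paper's implementation: the smoother matrix is taken as $S(\lambda) = (I + \lambda D^{T} D)^{-1}$, where $D$ takes second differences (a discrete, Whittaker-type stand-in for the cubic smoothing spline), the noise is independent Gaussian rather than autocorrelated, the grid of $\lambda$ values is arbitrary, and all function names are hypothetical.

```python
import math
import random


def second_difference_penalty(n):
    """Return K = D^T D, where D takes second differences of a length-n vector."""
    K = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        row = {r: 1.0, r + 1: -2.0, r + 2: 1.0}  # one row of D
        for i, di in row.items():
            for j, dj in row.items():
                K[i][j] += di * dj
    return K


def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def smoother_matrix(n, lam):
    """Assumed linear smoother S(lam) = (I + lam * D^T D)^(-1)."""
    K = second_difference_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * K[i][j] for j in range(n)]
         for i in range(n)]
    return invert(A)


def gcv_score(y, lam):
    """GCV(lam) = (RSS / n) / (1 - tr(S) / n)^2, following Equation (11)."""
    n = len(y)
    S = smoother_matrix(n, lam)
    fitted = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    trace = sum(S[i][i] for i in range(n))
    return (rss / n) / (1.0 - trace / n) ** 2


# Illustrative data: a sine curve plus independent noise (an assumption).
random.seed(0)
n = 30
x = [(i - 0.5) / n for i in range(1, n + 1)]
y = [5.0 * math.sin(math.pi * xi) + random.gauss(0.0, 0.8) for xi in x]

grid = [10.0 ** k for k in range(-2, 4)]          # arbitrary candidate values
scores = {lam: gcv_score(y, lam) for lam in grid}
best_lam = min(scores, key=scores.get)            # GCV choice on this grid
```

A grid search is used here only for transparency; in practice a one-dimensional optimizer over $\log \lambda$ would usually replace it. The value $\lambda = 0$ is excluded because the smoother then reproduces the data and the denominator of the score vanishes.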
The Generalized Cross-Validation technique is well known for its optimal properties [19]. Suppose there exists an $n \times n$ influence matrix $S(\lambda)$ with the property

$$\begin{pmatrix} \hat{f}_{n,\lambda}(t_1) \\ \hat{f}_{n,\lambda}(t_2) \\ \vdots \\ \hat{f}_{n,\lambda}(t_n) \end{pmatrix} = S(\lambda)\, y \tag{12}$$

and

$$W_0(\lambda) = \sum_{k=1}^{n} \frac{\left( \sum_{j=1}^{n} a_{kj} y_j - y_k \right)^2}{\left( 1 - a_{kk} \right)^2} \tag{13}$$

where $a_{kj}$ denotes the $(k, j)$ entry of $S(\lambda)$.

Generalized Cross-Validation is a modified form of Cross-Validation, a traditional technique for estimating the smoothing parameter. The GCV score is constructed by analogy with the CV score, which is obtained from the ordinary residuals by dividing them by $1 - (S(\lambda))_{ii}$. The idea of GCV is to replace the factors $1 - (S(\lambda))_{ii}$ in Cross-Validation with their mean, $1 - n^{-1}\,\mathrm{trace}\,S(\lambda)$. Thus, summing the squared residuals and dividing by $\{1 - n^{-1}\,\mathrm{trace}\,S(\lambda)\}^2$, by analogy with ordinary cross-validation, the GCV criterion is written mathematically as

$$GCV(\lambda) = \frac{\frac{1}{n} \sum_{k=1}^{n} \left\{ y_k - \hat{f}_\lambda(x_k) \right\}^2}{\left\{ 1 - n^{-1}\,\mathrm{trace}(S_\lambda) \right\}^2} \tag{14}$$

$$GCV(\lambda) = \frac{n^{-1} \left\| (I - S_\lambda)\, y \right\|^2}{\left[ n^{-1}\,\mathrm{trace}(I - S_\lambda) \right]^2} \tag{15}$$

where $n$ is the number of observations $(x_i, y_i)$, $\lambda$ is the smoothing parameter and $(S(\lambda))_{ii}$ is the $i$th diagonal element of the smoother matrix.

Generalized Maximum Likelihood (GML) selection method: [20] proposed the GML technique for correlated data with one smoothing parameter. In a bivariate model, two smoothing parameters must be estimated simultaneously along with the covariance parameters. Following a similar derivation, GML is given as

$$GML(\lambda) = \frac{y^{T} W (I - S(\lambda))\, y}{\left[ \det{}^{+}\, W (I - S(\lambda)) \right]^{\frac{1}{n-m}}} \tag{16}$$

where $\det^{+}(I - S(\lambda))$ is the product of the $n - m$ nonzero eigenvalues of $(I - S(\lambda))$, $\lambda$ is the smoothing parameter, $W$ is the correlation structure, $S(\lambda)$ is the smoother matrix, $n$ is the number of observations ($n_1 + n_2$ pairs of measurements) and $m$ is the number of zero eigenvalues.

Mallows' Cp criterion (MCP) selection method was developed by [21] to assess the fit of a regression model estimated by ordinary least squares.
It is applied in model selection settings where several explanatory variables are available to predict a response and the aim is to find the best model based on a subset of the independent variables. The smaller the value of Cp, the more accurate the model. The Cp criterion is written as

$$MCP(\lambda) = \frac{\left\| (S_\lambda - I)\, y \right\|^2}{\mathrm{tr}(I - S_\lambda)} \tag{17}$$

where $n$ is the number of measurements or observations, $\lambda$ is the smoothing parameter and $S_\lambda$ is the smoother matrix.

The assumption underlying the application of Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp criterion (MCP) is that the observations must be well represented by the model.

2.3. Simulation Study

In this section, a simulation study is performed to assess the performance of the three cubic smoothing spline estimators, namely Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp criterion (MCP), when autocorrelation is present in the error term. Before the results were computed, datasets for the different simulation combinations were generated using code written in R [22]. The data generation procedure, with accompanying explanation, is presented in Table 1.

In the data generation process, $n$ = 30, 60, 120, 240, 480 and 960, the number of replications is 1000, $\rho$ = 0.1, 0.3, 0.5 and 0.9, and $\lambda_{GML} = 0.07271685$, $\lambda_{GCV} = 0.005146929$, $\lambda_{MCP} = 0.7095105$. In this study, a nonparametric smooth function was used to generate the data under different conditions. The function is given as

$$y(x_i) = 5 \sin(\pi x_i) + \varepsilon_{x_i} \tag{18}$$

where $x_i = \frac{i - 0.5}{n}$, $\pi = 180^{\circ}$, and $\varepsilon_x \sim N(0, \rho W^{-1})$ is a first-order autoregressive process with mean 0, standard deviation 0.8 and autocorrelation levels $\rho$ = 0.1, 0.3, 0.5 and 0.9, with a 95% confidence interval.

2.4. Performance Evaluation Criteria

A comparative analysis was performed to assess the performance and goodness of fit of the three cubic spline estimation methods (Generalized Cross-Validation (GCV), Generalized Maximum Likelihood (GML) and Mallows' Cp criterion (MCP)) in the presence of autocorrelated errors. The predictive mean squared error of a smoothing or curve-fitting procedure, according to [22, 23, 24], is the expected value of the squared difference between the fitted value given by the predictive function $\hat{f}(x_i)$ and the value of the observed function $f(x_i)$. It is used to assess the performance and quality of explanatory variables or smoothing techniques such as Cross-Validation, Generalized Cross-Validation, Generalized Maximum Likelihood and so forth. The Predictive Mean Square Error (PMSE) is written mathematically as

$$PMSE(\lambda) = E\left[ \sum_{i=1}^{n} \left( f(x_i) - \hat{f}(x_i) \right)^2 \right] \tag{19}$$

$$PMSE(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( f(x_i) - \hat{f}(x_i) \right)^2 \tag{20}$$

$$PMSE(\lambda) = \sum_{i=1}^{n} \left( E\left[ \hat{f}(x_i) \right] - f(x_i) \right)^2 + \sum_{i=1}^{n} \mathrm{var}\left[ \hat{f}(x_i) \right] \tag{21}$$

The Predictive Mean Square Error is thus split into two parts: the first part is the sum of squared biases of the fitted observations, while the second is the sum of variances of the fitted observations. Here $f(x_i)$ is the observed value and $\hat{f}(x_i)$ is the fitted (predicted, estimated) value.

Based on each estimate of the parameter, the methods were ranked according to their performance on each criterion. The evaluation of the methods was carried out at two levels, using the individual measures and the total over the criteria. At the first level, the ranks were summed for each method; the estimation methods were then ranked by this total. The smoothing procedure with the smallest sum was judged the most preferred method and the one with the largest sum the least preferred.
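The data-generating function of Equation (18) and the PMSE of Equation (20) can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code, and it rests on labelled assumptions: the value 0.8 is treated as the marginal standard deviation of the AR(1) errors (the text does not say whether it refers to the process or to its innovations), $\pi$ is taken in radians (equivalent to 180 degrees), and the names `ar1_errors`, `simulate`, `pmse` and `lag1_autocorrelation` are hypothetical.

```python
import math
import random


def ar1_errors(n, rho, sigma, rng):
    """Stationary AR(1) errors e_t = rho * e_{t-1} + u_t.

    Assumption: sigma is the marginal standard deviation, so the
    innovations u_t have standard deviation sigma * sqrt(1 - rho^2).
    """
    innov_sd = sigma * math.sqrt(1.0 - rho ** 2)
    e = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0.0, innov_sd))
    return e


def simulate(n, rho, sigma=0.8, seed=0):
    """Generate (x, f, y) with f(x) = 5 sin(pi x) and x_i = (i - 0.5) / n."""
    rng = random.Random(seed)
    x = [(i - 0.5) / n for i in range(1, n + 1)]
    f = [5.0 * math.sin(math.pi * xi) for xi in x]
    e = ar1_errors(n, rho, sigma, rng)
    y = [fi + ei for fi, ei in zip(f, e)]
    return x, f, y


def pmse(f_true, f_hat):
    """Average squared difference between true and fitted values, Equation (20)."""
    n = len(f_true)
    return sum((a - b) ** 2 for a, b in zip(f_true, f_hat)) / n


def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation, used here only as a sanity check."""
    n = len(e)
    mean = sum(e) / n
    num = sum((e[i] - mean) * (e[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in e)
    return num / den


x, f, y = simulate(n=30, rho=0.5)
baseline = pmse(f, y)  # PMSE of the trivial "fit" that returns the noisy data
```

In a full replication of the study, `f_hat` would come from a smoothing spline fitted with each selection method, and the PMSE would be averaged over the replications; the trivial baseline above only shows how the pieces connect.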
These ranks were added together over all the criteria to determine how each estimator performs for each parameter in the model. The best estimator for the model was identified by further summing the ranks over the model's parameters. An estimator is ranked as the best if it has the minimum sum of ranks. Here the rank totals were used, which give rankings identical to those obtained from the mean ranks. The results might differ, however, if the median rank had been used. The disadvantage of the median is that if further work were to be done on these ranks, the mathematical procedure would be at least slightly more complex than with the mean.

The goodness of fit of the smoothing methods describes how well the methods fit the simulated and real-life data. It also summarizes the differences between the observed values and the estimated (predicted) values. The adjusted R-square was used to determine the best-fitting smoothing method. It is written mathematically as

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p} \tag{22}$$

where $n$ = number of observations and $p$ = number of parameters.

3. Simulation Result

Table 2 presents the summary fit results of the cubic spline regression model and the model performance criteria, namely the predictive mean square error (PMSE), multiple R-square and adjusted R-square, based on six sample sizes (T = 30, 60, 120, 240, 480 and 960) and autocorrelation level $\rho$ = 0.1. The results showed that all the coefficients of the smoothing methods' parameters were significant (P-value < 0.001, < 0.01 and < 0.05).

The adjusted R-square results indicated that GCV had the highest values at all sample sizes (T = 30, 60, 120, 240, 480 and 960) and $\rho$ = 0.1, with adjusted R-squares of 0.9963, 0.9979, 0.9971, 0.9992, 0.9986 and 0.9804, respectively.
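Equation (22) is simple to apply directly. The short Python sketch below is illustrative only; the function names are hypothetical, and the choice $p = 4$ (one parameter for each of the four reported coefficients) is an assumption, although with that choice the formula reproduces the adjusted value reported for GML at n = 30 in Table 2 (R-square 0.9824, adjusted R-square 0.9804).

```python
def r_square(observed, fitted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean = sum(observed) / len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss / tss


def adjusted_r_square(r2, n, p):
    """Adjusted R-square as written in Equation (22)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


# GML row of Table 2 at n = 30; p = 4 is an assumption (four coefficients).
adj = adjusted_r_square(0.9824, n=30, p=4)
print(round(adj, 4))  # 0.9804
```

Because the penalty factor $(n - 1)/(n - p)$ tends to 1 as $n$ grows, R-square and adjusted R-square become nearly identical at the larger sample sizes.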
It can be inferred from the result above that the GCV smoothing method provides the best fit to the time-series observations at series lengths T = 30, 60, 120, 240, 480 and 960 and $\rho$ = 0.1.

Tables 3, 4 and 5 show the predictive mean square error (PMSE), R-square and adjusted R-square simulation results of GCV, GML and MCP for autocorrelation levels 0.3, 0.5 and 0.9 and sample sizes 30, 60, 120, 240, 480 and 960. The results indicated that the adjusted R-square value for GCV was greater than the values for GML and MCP; this indicates that the cubic smoothing spline chosen by Generalized Cross-Validation gives the best-fitting model.

Figures 1 to 6 compare the behaviour of the cubic smoothing spline selected by GCV, GML and MCP for sample sizes 30, 60, 120, 240, 480 and 960, respectively. It was observed that Generalized Cross-Validation provided the best fitted (estimated) values when compared to the Generalized Maximum Likelihood (GML) and Mallows' Cp criterion (MCP).

Table 1. Data generation process with explanations

| Steps | Explanation |
|---|---|
| Step 1: Obtain $n$, nrepl. and $\rho$ | The sample size of the simulated dataset, the number of replications of $n$ and the autocorrelation levels, respectively |
| Step 2: Decide on $X_i$ and $Y_i$ | Read the simulated sample data $(x_i, y_i)$ for $i = 1, \dots, T$ |
| Step 3: Produce $\lambda_t$ and $f(\lambda)$ | Determine the pre-selected smoothing parameters $\lambda_1, \dots, \lambda_t$ and calculate the corresponding set of smoothing spline estimates $f(\lambda) = \{\hat{f}_{\lambda_1}, \dots, \hat{f}_{\lambda_t}\}$ |
| Step 4: Fit the values of $f(x_i)$ and $\hat{f}(x_i)$ | For the given $\lambda$, $\sigma$ and $T$, use the data in Step 1 above to fit a curve and estimate ahead by linear extension $f(x_i)$ and $\hat{f}(x_i)$ |
| Step 5: Compute GCV, GML and MCP | Compute the values of the coefficients of GCV, GML and MCP |
| Step 6: Generate the PMSE values | Obtain the predictive mean square error $PMSE(\hat{f}_\lambda) = \sum_{i=1}^{t} \left[ \left( f(x_i) - \hat{f}(x_i) \right)^2 \right]$ for these points, and sum up all the PMSEs to get the corresponding GCV, GML and MCP scores for the given values of $n$, $\lambda$ and $\rho$ |

Table 2. Simulation results for GCV, GML and MCP at autocorrelation level = 0.1 (columns β̂0 to β̂3 are listed under the Predictive Mean Square Error criterion)

| Cubic smoothing method | Sample size | β̂0 | β̂1 | β̂2 | β̂3 | R-square | Adj. R-square |
|---|---|---|---|---|---|---|---|
| GCV | 30 | -0.0965 | 11.3553 | -33.4818 | 22.4204 | 0.9998 | 0.9963 |
| GCV | 60 | -0.1753 | 11.4502 | -33.3385 | 22.2447 | 0.9996 | 0.9979 |
| GCV | 120 | -0.1767 | 11.7285 | -34.2144 | 22.8765 | 0.9977 | 0.9971 |
| GCV | 240 | -0.2000 | 11.9830 | -34.7553 | 23.1753 | 0.9995 | 0.9992 |
| GCV | 480 | -0.1854 | 11.8463 | -34.4414 | 22.9781 | 0.9988 | 0.9986 |
| GCV | 960 | -0.1893 | 11.8804 | -34.4933 | 22.9920 | 0.9804 | 0.9804 |
| GML | 30 | -0.1278 | 11.5373 | -33.8893 | 22.6035 | 0.9824 | 0.9804 |
| GML | 60 | -0.1080 | 10.9222 | -32.1454 | 21.4069 | 0.9827 | 0.9818 |
| GML | 120 | -0.2242 | 12.0912 | -34.7044 | 22.9920 | 0.9769 | 0.9763 |
| GML | 240 | -0.1968 | 12.0617 | -34.9399 | 23.2581 | 0.9804 | 0.9802 |
| GML | 480 | -0.1815 | 11.9975 | -34.9133 | 23.3067 | 0.9770 | 0.9768 |
| GML | 960 | -0.2157 | 12.0444 | -34.8217 | 23.2068 | 0.9784 | 0.9784 |
| MCP | 30 | -0.0965 | 11.3553 | -33.4818 | 22.4204 | 0.9698 | 0.9663 |
| MCP | 60 | -0.1753 | 11.4502 | -33.3385 | 22.2447 | 0.9696 | 0.9679 |
| MCP | 120 | -0.1953 | 11.7876 | -34.1166 | 22.6815 | 0.9778 | 0.9772 |
| MCP | 240 | -0.2092 | 12.1147 | -35.0511 | 23.3478 | 0.9837 | 0.9835 |
| MCP | 480 | -0.1913 | 11.9395 | -34.6317 | 23.0761 | 0.9798 | 0.9796 |
| MCP | 960 | -0.2103 | 12.0782 | -34.9856 | 23.3324 | 0.9799 | 0.9798 |

Table 3. Simulation results for GCV, GML and MCP at autocorrelation level = 0.3 (columns β̂0 to β̂3 are listed under the Predictive Mean Square Error criterion)

| Cubic smoothing method | Sample size | β̂0 | β̂1 | β̂2 | β̂3 | R-square | Adj. R-square |
|---|---|---|---|---|---|---|---|
| GCV | 30 | -0.2800 | 13.376 | -37.869 | 24.757 | 0.9106 | 0.9003 |
| GCV | 60 | -0.3475 | 13.005 | -36.993 | 24.732 | 0.9201 | 0.9137 |
| GCV | 120 | -0.1212 | 10.645 | -31.490 | 21.112 | 0.8699 | 0.8665 |
| GCV | 240 | -0.1015 | 11.572 | -34.618 | 23.474 | 0.9037 | 0.9025 |
| GCV | 480 | -0.2467 | 12.431 | -36.011 | 24.109 | 0.9304 | 0.9297 |
| GCV | 960 | -0.1395 | 11.605 | -34.211 | 22.962 | 0.8938 | 0.8935 |
| GML | 30 | -0.0353 | 8.3035 | -25.345 | 16.764 | 0.8526 | 0.8356 |
| GML | 60 | -0.0564 | 11.046 | -32.398 | 21.367 | 0.8986 | 0.8932 |
| GML | 120 | -0.2749 | 13.134 | -37.862 | 25.307 | 0.9151 | 0.9129 |
| GML | 240 | -0.1426 | 11.686 | -34.316 | 22.989 | 0.8941 | 0.8927 |
| GML | 480 | -0.1887 | 12.008 | -34.743 | 23.089 | 0.9021 | 0.9015 |
| GML | 960 | -0.2065 | 12.082 | -35.038 | 23.379 | 0.8978 | 0.8975 |
| MCP | 30 | -0.2806 | 12.211 | -33.454 | 21.491 | 0.8358 | 0.8168 |
| MCP | 60 | -0.1306 | 8.8825 | -27.929 | 19.017 | 0.8569 | 0.8492 |
| MCP | 120 | -0.1361 | 11.428 | -33.283 | 22.081 | 0.9021 | 0.8996 |
| MCP | 240 | -0.1511 | 11.424 | -33.114 | 22.003 | 0.8881 | 0.8867 |
| MCP | 480 | -0.2311 | 12.348 | -35.816 | 23.998 | 0.8911 | 0.8904 |
| MCP | 960 | -0.2375 | 12.676 | -36.672 | 24.519 | 0.8995 | 0.8992 |

Table 4. Simulation results for GCV, GML and MCP at autocorrelation level = 0.5 (columns β̂0 to β̂3 are listed under the Predictive Mean Square Error criterion)

| Cubic smoothing method | Sample size | β̂0 | β̂1 | β̂2 | β̂3 | R-square | Adj. R-square |
|---|---|---|---|---|---|---|---|
| GCV | 30 | -0.0847 | 11.0575 | -33.6550 | 22.9325 | 0.8242 | 0.7923 |
| GCV | 60 | -0.1924 | 10.4646 | -30.4378 | 20.3870 | 0.8479 | 0.8291 |
| GCV | 120 | -0.2296 | 11.5794 | -32.9893 | 21.6783 | 0.8481 | 0.8416 |
| GCV | 240 | -0.2663 | 12.9193 | -36.9713 | 24.5493 | 0.8999 | 0.8974 |
| GCV | 480 | -0.1601 | 11.8929 | -34.5237 | 22.9573 | 0.8733 | 0.8718 |
| GCV | 960 | -0.2469 | 12.5111 | -36.1165 | 24.1212 | 0.8705 | 0.8698 |
| GML | 30 | -0.1644 | 9.7970 | -32.003 | 22.1713 | 0.8335 | 0.8143 |
| GML | 60 | -0.0666 | 11.0730 | -34.2401 | 23.5602 | 0.7834 | 0.7718 |
| GML | 120 | -0.2531 | 13.5851 | -40.1524 | 27.3208 | 0.7438 | 0.7372 |
| GML | 240 | -0.1301 | 11.9951 | -35.3193 | 25.6299 | 0.8072 | 0.8048 |
| GML | 480 | -0.1119 | 11.5210 | -34.3366 | 23.1544 | 0.7587 | 0.7571 |
| GML | 960 | -0.0844 | 11.3560 | -33.9543 | 22.9082 | 0.7652 | 0.7645 |
| MCP | 30 | -0.3708 | 12.9624 | -35.0598 | 22.4159 | 0.7934 | 0.7696 |
| MCP | 60 | -0.3290 | 13.8516 | -38.0849 | 24.5585 | 0.7509 | 0.7376 |
| MCP | 120 | -0.1718 | 12.078 | -34.8942 | 23.1811 | 0.7514 | 0.7450 |
| MCP | 240 | -0.1913 | 11.556 | -33.2619 | 22.1224 | 0.7155 | 0.7119 |
| MCP | 480 | -0.2059 | 12.1036 | -35.2078 | 23.5514 | 0.7514 | 0.7499 |
| MCP | 960 | -0.1775 | 12.3371 | -35.9588 | 24.0466 | 0.7738 | 0.7731 |

Figure 1. Cubic smoothing spline and fitted curve using GCV, GML and MCP, n = 30
Figure 2. Cubic smoothing spline and fitted curve using GCV, GML and MCP, n = 60
Figure 3. Cubic smoothing spline and fitted curve using GCV, GML and MCP, n = 120

4. Application to Real-life data

In this section, the performance of the cubic smoothing spline selection methods on a real-life dataset, the federal government capital expenditure (in billion naira) in Nigeria between 1981 and 2019, sourced from [25], is presented as our example. The series has 39 observations of the expenditure; the cubic spline was fitted for the mean function (i.e. f ∈ w) and a first-order autoregressive process AR(1) for the disturbance.
The Generalized Cross-Validation (GCV) method was used in the example because it was found to perform better than the other competing cubic smoothing spline models in the simulation study presented in Section three. The cubic smoothing spline curve presented in Figure 7 showed that the observed data are very close to the estimated data. This provided insight into the cubic smoothing spline selection method whose model produces the best fit for the time-series observations used as an example, and it validates the Generalized Cross-Validation (GCV) cubic spline selection method as the preferred model for time series observations.

Figure 4. Cubic smoothing spline and fitted curve using GCV, GML and MCP, n = 240
Figure 5. Cubic smoothing spline and fitted curve using GCV, GML and MCP, n = 480
Figure 6. Cubic smoothing spline and fitted curve using GCV, GML and MCP, n = 960
Figure 7. Smoothing curve of Nigeria Federal Government capital expenditure (in billion naira) between 1981 and 2019 (green line) and estimates (red line), with smoothing parameters chosen by GCV.

Table 5. Simulation results for GCV, GML and MCP at autocorrelation level = 0.9 (columns β̂0 to β̂3 are listed under the Predictive Mean Square Error criterion)

| Cubic smoothing method | Sample size | β̂0 | β̂1 | β̂2 | β̂3 | R-square | Adj. R-square |
|---|---|---|---|---|---|---|---|
| GCV | 30 | -0.2659 | 10.7596 | -33.8282 | 23.4447 | 0.4702 | 0.4575 |
| GCV | 60 | -0.2095 | 9.4790 | -27.5371 | 18.5292 | 0.3126 | 0.4198 |
| GCV | 120 | -0.2638 | 11.3711 | -31.8619 | 20.6751 | 0.4751 | 0.6665 |
| GCV | 240 | -0.3234 | 13.7239 | -38.8914 | 25.7508 | 0.5659 | 0.4267 |
| GCV | 480 | -0.1290 | 11.8463 | -34.4156 | 22.8385 | 0.5244 | 0.5226 |
| GCV | 960 | -0.2833 | 12.9439 | -37.2473 | 24.9099 | 0.5186 | 0.4717 |
| GML | 30 | -0.4094 | 8.4909 | -30.8536 | 22.0745 | 0.642 | 0.6007 |
| GML | 60 | -0.0169 | 10.5741 | -34.3813 | 24.2411 | 0.5504 | 0.5263 |
| GML | 120 | -0.5117 | 11.8106 | -31.1566 | 19.9740 | 0.3301 | 0.3128 |
| GML | 240 | -0.0496 | 10.5045 | -31.9189 | 21.8216 | 0.4964 | 0.4900 |
| GML | 480 | -0.3971 | 13.7294 | -39.3648 | 26.5502 | 0.4985 | 0.4954 |
| GML | 960 | -0.0487 | 11.0139 | -33.4744 | 22.7946 | 0.4958 | 0.4942 |
| MCP | 30 | 0.2659 | 10.7596 | -33.8282 | 23.4447 | 0.4702 | 0.4091 |
| MCP | 60 | -0.2095 | 9.4790 | -27.5371 | 18.5292 | 0.3126 | 0.2757 |
| MCP | 120 | -0.2638 | 11.3711 | -31.8619 | 20.6751 | 0.4751 | 0.4615 |
| MCP | 240 | -0.3234 | 13.7239 | -38.8914 | 25.7508 | 0.5659 | 0.5604 |
| MCP | 480 | -0.1290 | 11.8463 | -34.4156 | 22.8385 | 0.5244 | 0.5214 |
| MCP | 960 | -0.2834 | 12.9439 | -37.2473 | 24.9099 | 0.5186 | 0.5171 |

5. Discussion and Conclusion

This paper presents the goodness-of-fit test for time series observations using three cubic spline nonparametric regression functions. A simulation study and a real-life dataset on the total federal government capital expenditure (in billion naira) between 1981 and 2019 in Nigeria were used to demonstrate how the three classical cubic smoothing spline selection methods perform when a time series dataset possesses autocorrelation in its error term.

In the general structure of the simulated results, it was observed that an increase in the sample size and changes in the level of autocorrelation affect the performance of the three cubic smoothing spline methods (see Tables 2–5 and Figures 1–6). The adjusted R-square results indicated that GCV had the highest value, 0.9992, at n = 240 and $\rho$ = 0.1, closely followed by GML and MCP. It was found that the Generalized Cross-Validation (GCV) smoothing method provides the best model fit and proved to be more efficient than the other smoothing methods for the simulated time-series observations with autocorrelation levels ($\rho$ = 0.1, 0.3, 0.5 and 0.9) in the error term and for sample sizes n = 30, 60, 120, 240, 480 and 960.
The smoothing curve of the real-life dataset of Nigeria Federal Government capital expenditure (in billion naira) between 1981 and 2019 validated the efficiency of the Generalized Cross-Validation smoothing method, as its model provided the best fit, without any defect or shortcoming under the cubic spline functional form, when compared with the competing smoothing methods.

Our findings also revealed that the GCV smoothing spline estimator outperformed the other competing selection methods for time series observations disturbed at four autocorrelation levels ($\rho$ = 0.1, 0.3, 0.5 and 0.9). GCV is a smoothing spline method fitted without any defect or shortcoming under the cubic spline functional form, with the highest adjusted R-squares of 0.9992 ($\rho$ = 0.1, n = 240), 0.9297 ($\rho$ = 0.3, n = 480) and 0.8974 ($\rho$ = 0.5, n = 240).

This finding is corroborated by [13, 26, 27, 28], who found that GCV was somewhat better than GML for n = 64, that GCV was clearly superior for n = 128, and that for n = 32 GCV was better for smaller $\sigma^2$ while the comparison was close for larger $\sigma^2$. Other findings recommended generalized cross-validation as the best method for penalized spline smoothing parameter estimation, and showed that GCV-spline determines appropriate amounts of smoothing for fMRI time series.

Acknowledgments

The authors thank the referees for their positive comments and suggestions, which have greatly helped the improvement of this work.

References

[1] Q. Kong, T. Siauw & A. M. Bayen, Python Programming and Numerical Methods: A Guide for Engineers and Scientists, Elsevier (2020), ISBN: 978-0-12-819549-9. https://doi.org/10.1016/C2018-0-04165-1
[2] R. G. McClarren, Computational Nuclear Engineering and Radiological Science Using Python, Elsevier (2018) 439. https://doi.org/10.1016/C2016-0-03507-16
[3] J. R. Buchanan, "Cubic Spline Interpolation: MATH 375, Numerical Analysis", Banach.Millersville.edu (2010).
[4] J. Chen, "Testing goodness of fit of polynomial models via spline smoothing techniques", Statistics and Probability Letters 19 (1994) 65. https://doi.org/10.1016/0167-7152(94)90070-1
[5] N. Caouder & S. Huet, "Testing goodness-of-fit for nonlinear regression models with heterogeneous variances", Computational Statistics and Data Analysis 23 (1998) 491. https://doi.org/10.1016/S0167-9473(96)00049-1
[6] C. M. Crainiceanu & D. Ruppert, "Likelihood ratio tests for goodness-of-fit of a nonlinear regression model", Journal of Multivariate Analysis 91 (2004) 35.
[7] M. Tang, Y. Pei, W. Wong & J. Li, "Goodness-of-fit tests for correlated paired binary data", Statistical Methods in Medical Research 21 (2012) 331. https://doi.org/10.1177/0962280210381176
[8] S. T. Chen, L. Xiao & M. Staicu, "A smoothing-based goodness-of-fit test of covariance for functional data", Biometrics 75 (2019) 562. https://doi.org/10.1111/biom.13005
[9] S. J. T. Hidalgo, C. W. Michael, M. E. Stephanie & M. R. Kosorok, "Goodness-of-fit test for nonparametric regression models: smoothing spline ANOVA models as example", Computational Statistics and Data Analysis 122 (2018) 135. https://doi.org/10.1016/j.csda.2018.01.004
[10] S. O. Adams & H. U. Yahaya, "Comparative study of GCV-MCP hybrid smoothing methods for predicting time series observations", American Journal of Theoretical and Applied Statistics 9 (2020) 219. https://doi.org/10.11648/j.ajtas.20200905.15
[11] G. Wahba, "A survey of some smoothing problems and the method of generalized cross-validation for solving them", in Applications of Statistics, P. Krishnaiah (Ed.), North-Holland, Amsterdam (1977).
[12] P. Craven & G. Wahba, "Smoothing noisy data with spline functions", Numerische Mathematik 31 (1979) 377.
[13] G. Wahba, "A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem", The Annals of Statistics 4 (1985) 1378.
[14] G. Wahba, "Optimal convergence properties of variable knot kernel and orthogonal series methods for density estimation", Annals of Statistics 3 (1975) 15.
[15] R. D. Cook & S. Weisberg, "Residuals and influence in regression", Journal of the American Statistical Association 86 (1982) 328.
[16] G. Wahba, "Automatic smoothing of the log periodogram", Journal of the American Statistical Association 75 (1980) 122.
[17] B. W. Silverman, "Spline smoothing: the equivalent variable kernel method", Annals of Statistics 12 (1984) 898.
[18] D. Xiang & G. Wahba, "Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case", Proceedings of the 1997 ASA Joint Statistical Meetings, Biometrics Section (1998) 94.
[19] G. Wahba, "Spline models for observational data", CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia: SIAM 59 (1990) 1.
[20] P. J. Diggle & M. F. Hutchinson, "On spline smoothing with autocorrelated errors", Australian Journal of Statistics 31 (1998) 166.
[21] C. L. Mallows, "Some comments on Cp", Technometrics 15 (1973) 661.
[22] R Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ (2020).
[23] C. Daniel, "One at a time plans", Journal of the American Statistical Association 68 (1973) 353.
[24] S. O. Adams & R. A. Ipinyomi, "A new smoothing method for time series data in the presence of autocorrelated error", Asian Journal of Probability and Statistics (AJPAS) 04 (2019) 1. https://doi.org/10.9734/ajpas/2019/v4i430121
[25] Central Bank of Nigeria Statistical Bulletin (2019) Edition.
[26] C. Yanrong, Z. W. Tracy, L. Haiqun & Y. Yan, "Penalized spline estimation for functional coefficient regression models", Computational Statistics and Data Analysis 54 (2010) 891.
[27] M. A. Lukas, F. R. De Hoog & R. S. Anderssen, "Practical Use of Robust GCV and Modified GCV for Spline Smoothing", Computational Statistics 31 (2016) 269.
[28] A. R. Devi, I. N. Budiantara & V. Vita-Ratnasari, "Unbiased risk and cross-validation method for selecting optimal knots in multivariable nonparametric regression spline truncated (case study: the unemployment rate in Central Java, Indonesia, 2015)", AIP Conference Proceedings (2018).