The Simulation Study to Test the Performance of Quantile Regression Method With Heteroscedastic Error Variance

CAUCHY – Jurnal Matematika Murni dan Aplikasi, Volume 5(1) (2017), Pages 36-41
p-ISSN: 2086-0382; e-ISSN: 2477-3344
Submitted: 15 May 2017; Reviewed: 24 July 2017; Accepted: 2 November 2017
DOI: http://dx.doi.org/10.18860/ca.v5i1.4209

Ferra Yanuar1*, Laila Hasnah2, Dodi Devianto3
1,2,3 Jurusan Matematika FMIPA Universitas Andalas, Kampus Limau Manis 25163 Padang
* Corresponding Author. E-mail: ferrayanuar@yahoo.co.id

ABSTRACT
The least squares estimator has many limitations. It is no longer the Best Linear Unbiased Estimator (BLUE) when the variance of the error term is heteroscedastic. Quantile regression is a robust approach in situations where this limitation of the least squares estimator is present. The purpose of this study is to describe the performance of the quantile regression method in modeling a data set that contains a heteroscedasticity problem. To achieve this goal, a data set is generated and the quantile regression framework is applied to it. The consistency of the proposed model is then checked with a simulation study. The study shows that the quantile regression method produces acceptable parameter estimates, since the proposed models have a large Pseudo R² and a small mean square error (MSE) for all estimated parameters. It can be concluded that quantile regression is an unbiased estimation method and yields an acceptable model even in the presence of heteroscedastic error variance.

Keywords: Heteroscedasticity, quantile regression, simulation study, Pseudo R², mean square error (MSE).

INTRODUCTION
Modeling the relationship between covariates and a response requires a method for estimating the model parameters.
The parameters are usually estimated by Ordinary Least Squares (OLS), which minimizes the sum of squared errors. OLS is appropriate when all model assumptions are met: independent observations, linearity of the conditional mean, normality of the response variable and homogeneity of the error variance. When these assumptions hold, the OLS estimator is the Best Linear Unbiased Estimator (BLUE). However, if one or more of the assumptions are not met, the results can be misleading [1].

In regression, an error is the distance by which a point deviates from the regression line. Ideally the data are homoscedastic, i.e. the variance of the errors is constant. In many real-world applications this rarely happens, and most data are heteroscedastic by nature. Because of these limitations, an alternative to classical linear regression is needed. Quantile regression is a robust approach in situations where the limitations described above are present for the ordinary least squares estimator [2]. Quantiles divide a batch of data, sorted from smallest to largest, into equal parts [1, 3].

Quantile regression is an approach to regression analysis introduced by Koenker and Bassett [4]. In theory, quantile regression can cope with violations of the normality assumption, heteroscedasticity, multicollinearity and similar problems. The method divides the data into quantiles and estimates the parameters by assuming a conditional quantile function for the distribution of the data and minimizing an asymmetrically weighted sum of absolute errors [5].
In this paper, we adopt the quantile regression approach to model data with non-homogeneous error variance. Section 2 describes the theoretical framework of quantile regression and the indicators used to determine the goodness of fit of the proposed model. In Section 3, we illustrate the implementation of quantile regression through a simulated case study, with two covariates as the predictors of the response variable. We end with a short discussion in Section 4.

FUNDAMENTAL THEORIES AND RELATED WORKS
In strictly linear models, a simple approach to estimating the conditional quantiles is suggested by Koenker and Bassett [4]. The classical regression model is

$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{1}$$

where $y_i$ is the dependent variable for observation $i$, $\mathbf{x}_i'$ is the $i$th row of the $n \times p$ matrix of independent variables, with $E(y_i) = \mathbf{x}_i'\boldsymbol{\beta}$, $\boldsymbol{\beta}$ is the $p \times 1$ vector of model parameters, and $\varepsilon_i$ is the error for observation $i$. The classical regression estimate minimizes the sum of squared errors:

$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{2}$$

where $\hat{y}_i$ is the estimated value for observation $i$.

2.1. Quantile Regression Method
Prediction based on the median, that is, minimizing the sum of absolute errors, can be written as

$$\min \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{3}$$

Furthermore, the linear equation for the $\tau$th quantile can be written as

$$y_i = \mathbf{x}_i'\boldsymbol{\beta}_\tau + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{4}$$

Noting that $\hat{y}_i = \mathbf{x}_i'\boldsymbol{\beta}_\tau$, the parameter estimate for the $\tau$th quantile minimizes the absolute errors, with weight $\tau$ on positive errors and weight $(1 - \tau)$ on negative errors [4]. The $\tau$th ($0 < \tau < 1$) quantile of $\varepsilon_i$ is the value $Q_\tau$ for which $P(\varepsilon_i < Q_\tau) = \tau$.
The $\tau$th conditional quantile of $y_i$ given $\mathbf{x}_i$ is then simply [6, 7]

$$Q_\tau(y_i \mid \mathbf{x}_i) = \mathbf{x}_i'\boldsymbol{\beta}_\tau,$$

where $\boldsymbol{\beta}_\tau$ is a vector of coefficients that depends on $\tau$. The $\tau$th regression quantile is defined as any solution $\hat{\boldsymbol{\beta}}_\tau$ to the quantile regression minimization problem

$$\min_{\boldsymbol{\beta}_\tau \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i'\boldsymbol{\beta}_\tau), \tag{5}$$

where the loss function is

$$\rho_\tau(u) = u\,(\tau - I(u < 0)). \tag{6}$$

Equivalently, we may rewrite (5) as

$$\min_{\boldsymbol{\beta}_\tau \in \mathbb{R}^p} \left\{ \sum_{i:\, y_i \ge \mathbf{x}_i'\boldsymbol{\beta}_\tau} \tau\,|y_i - \mathbf{x}_i'\boldsymbol{\beta}_\tau| + \sum_{i:\, y_i < \mathbf{x}_i'\boldsymbol{\beta}_\tau} (1 - \tau)\,|y_i - \mathbf{x}_i'\boldsymbol{\beta}_\tau| \right\} \tag{7}$$

Equation (7) means that observations greater than the fitted quantile value are weighted by $\tau$, and observations less than the fitted quantile value are weighted by $1 - \tau$.

2.2. Goodness of Fit Using Pseudo R²
A quantile regression model with $m$ independent variables can be written as

$$Q_\tau(\hat{y} \mid \mathbf{x}) = \hat{\beta}_0(\tau) + \hat{\beta}_1(\tau)x_1 + \dots + \hat{\beta}_m(\tau)x_m \tag{8}$$

The goodness of fit of the model can be measured with the Pseudo R², defined as [2]

$$\text{Pseudo } R_\tau^2 = 1 - \frac{RAWS_\tau}{TASW_\tau} \tag{9}$$

where, writing $\hat{y}_i(\tau) = \hat{\beta}_0(\tau) + \hat{\beta}_1(\tau)x_{i1} + \dots + \hat{\beta}_m(\tau)x_{im}$,

$$RAWS_\tau = \sum_{y_i \ge \hat{y}_i(\tau)} \tau\,|y_i - \hat{y}_i(\tau)| + \sum_{y_i < \hat{y}_i(\tau)} (1 - \tau)\,|y_i - \hat{y}_i(\tau)| \tag{10}$$

and

$$TASW_\tau = \sum_{y_i \ge \hat{Q}_\tau} \tau\,|y_i - \hat{Q}_\tau| + \sum_{y_i < \hat{Q}_\tau} (1 - \tau)\,|y_i - \hat{Q}_\tau| \tag{11}$$

with $\hat{Q}_\tau$ the sample $\tau$th quantile of the response, i.e. the fit of a model without covariates. The value of $RAWS_\tau$ (Residual Absolute Sum of Weighted differences) never exceeds the value of $TASW_\tau$ (Total Absolute Sum of Weighted differences), so the Pseudo $R_\tau^2$ lies in the range 0 to 1.
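The check loss in (6) and the Pseudo R² in (9) to (11) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: all function names are hypothetical, and the reference value in TASW is assumed to be the sample τth quantile of the response, taken here as the order statistic with rank ⌈nτ⌉.

```python
import math
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - I(u < 0)), as in equation (6)."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def weighted_abs_sum(y: Sequence[float], fitted: Sequence[float], tau: float) -> float:
    """Sum of check losses of the residuals y_i - fitted_i."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))


def sample_quantile(y: Sequence[float], tau: float) -> float:
    """ASSUMED convention: the order statistic y_(k) with k = ceil(n * tau)."""
    ordered = sorted(y)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


def pseudo_r2(y: Sequence[float], fitted: Sequence[float], tau: float) -> float:
    """Pseudo R^2 = 1 - RAWS / TASW for one quantile level tau."""
    raws = weighted_abs_sum(y, fitted, tau)
    baseline = [sample_quantile(y, tau)] * len(y)  # model without covariates
    tasw = weighted_abs_sum(y, baseline, tau)
    return 1.0 - raws / tasw


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    fitted = [1.5, 2.0, 2.5, 4.0, 5.5]
    print(pseudo_r2(y, fitted, tau=0.5))  # 0.75
```

A perfect fit gives RAWS = 0 and hence a value of 1, while fitted values equal to the sample quantile give RAWS = TASW and a value of 0.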
The closer the Pseudo R² value is to one, the better the model. However, Pseudo R² cannot be used to assess the overall goodness of fit of the model; it only measures the fit at the selected quantile [1, 2].

2.3. Mean Square Error (MSE)
A parameter estimate is considered good if it has a small bias and a small variance. To assess the bias and the variance of the estimates simultaneously, we use the Mean Square Error (MSE) [8, 9, 10], formulated as follows:

$$MSE\big(\hat{\beta}_j(\tau_q)\big) = Var\big(\hat{\beta}_j(\tau_q)\big) + \Big[Bias\big(\hat{\beta}_j(\tau_q)\big)\Big]^2 \tag{12}$$

where $j = 1, 2, \dots, p$ indexes the parameters and $\tau_q$, $q = 1, 2, \dots, k$, is the selected quantile. The variance for the selected quantile is

$$Var\big(\hat{\beta}_j(\tau_q)\big) = \frac{n \sum_{i=1}^{n} \big(\hat{\beta}_j^{(i)}(\tau_q)\big)^2 - \Big(\sum_{i=1}^{n} \hat{\beta}_j^{(i)}(\tau_q)\Big)^2}{n(n-1)}$$

and the bias for the selected quantile is the mean difference between the estimated value and the expected value:

$$Bias\big(\hat{\beta}_j(\tau_q)\big) = \frac{1}{n} \sum_{i=1}^{n} \Big(\hat{\beta}_j^{(i)}(\tau_q) - \beta_j(\tau_q)\Big)$$

Here $\hat{\beta}_j^{(i)}(\tau_q)$ denotes the $i$th of the $n$ estimates of $\beta_j(\tau_q)$.

RESULT AND DISCUSSION
We illustrate our approach to quantile regression with a simulation study. We design two covariates, each with 100 observations. The response variable $y_i$ is generated from the model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i, \quad i = 1, \dots, 100 \tag{13}$$

where the covariate $x_{i1}$ is generated from a standard normal distribution and $x_{i2}$ is generated from an exponential distribution with parameter 1. The parameters $\beta_0$, $\beta_1$ and $\beta_2$ are set to 1.2, 1 and 1.7 respectively. The errors are generated with mean zero and a heteroscedastic variance. We consider the heteroscedastic normal distribution N(0, √0.01 × (Xβ)²) for the error term.
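The data-generating process of model (13) can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the authors' code: the function name and seed are hypothetical, and the error specification is read as a normal distribution whose standard deviation for observation i is sqrt(0.01 × (x_i'β)²) = 0.1 × |x_i'β|, which is one possible reading of the notation in the text.

```python
import random
from typing import List, Tuple

BETA = (1.2, 1.0, 1.7)  # beta_0, beta_1, beta_2 as set in the text


def generate_sample(n: int = 100, seed: int = 0) -> List[Tuple[float, float, float]]:
    """Return n rows (x1, x2, y) drawn from model (13) with heteroscedastic errors."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)   # standard normal covariate
        x2 = rng.expovariate(1.0)  # exponential covariate with parameter 1
        mean = BETA[0] + BETA[1] * x1 + BETA[2] * x2
        # ASSUMPTION: error standard deviation = sqrt(0.01 * mean**2) = 0.1 * |mean|
        sigma = 0.1 * abs(mean)
        y = mean + rng.gauss(0.0, sigma)
        rows.append((x1, x2, y))
    return rows


if __name__ == "__main__":
    sample = generate_sample()
    print(len(sample), sample[0])
```

Under this reading the spread of the errors grows with the magnitude of the linear predictor, which is what makes the error variance non-constant across observations.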
In this example, we choose τ = 0.10, 0.25, 0.50, 0.75 and 0.90 as the quantile points to be estimated. Table 1 below shows the parameter estimates and their corresponding standard errors.

Table 1. Quantile Regression Estimates

| τth Quantile | Parameter | Estimate | Standard error |
|---|---|---|---|
| 0.10 | b0 | 1.0251* | 0.0308 |
| | b1 | 0.8004* | 0.0220 |
| | b2 | 1.4783* | 0.0219 |
| 0.25 | b0 | 1.0891* | 0.0310 |
| | b1 | 0.8438* | 0.0222 |
| | b2 | 1.5931* | 0.0220 |
| 0.50 | b0 | 1.1840* | 0.0417 |
| | b1 | 0.9955* | 0.0298 |
| | b2 | 1.7640* | 0.0296 |
| 0.75 | b0 | 1.2829* | 0.0435 |
| | b1 | 1.0871* | 0.0312 |
| | b2 | 1.8561* | 0.0309 |
| 0.90 | b0 | 1.3215* | 0.0287 |
| | b1 | 1.1514* | 0.0205 |
| | b2 | 2.0225* | 0.0204 |

(* significant at α = 0.05)

Table 1 shows that the estimated coefficients ($\hat{b}_0$, $\hat{b}_1$ and $\hat{b}_2$) at all quantile points are close to the initial values, $b_0 = 1.2$, $b_1 = 1$ and $b_2 = 1.7$. For example, at the 0.50th quantile the parameter estimates are $\hat{b}_0 = 1.1840$, $\hat{b}_1 = 0.9955$ and $\hat{b}_2 = 1.7640$.

The next step in the analysis is a consistency test of the proposed model, which reveals how well the quantile approach and its associated algorithm recover the true parameters. The consistency test is done with a simulation study, which generates new data sets by sampling with replacement from the original data set and fits the model to each new data set [11, 12]. About 25 model fits are used to compute the standard errors and the 95% confidence intervals of all parameters in this study. The goodness of fit of each model is also calculated. Table 2 presents the results of the simulation study.

Table 2. Simulation Results of 25 Data Sets Using Quantile Regression Approach

| τth Quantile | Parameter | Estimate (Standard Error) | 95% Confidence Interval | Pseudo R² |
|---|---|---|---|---|
| 0.10 | b0 | 1.0317* (0.0439) | (1.0233 ; 1.0401) | 0.8104 |
| | b1 | 0.8974* (0.0312) | (0.8898 ; 0.9049) | |
| | b2 | 1.5091* (0.0321) | (1.4973 ; 1.5208) | |
| 0.25 | b0 | 1.0992* (0.0322) | (1.0928 ; 1.0992) | 0.8288 |
| | b1 | 0.9334* (0.0229) | (0.9270 ; 0.9398) | |
| | b2 | 1.1613* (0.0229) | (1.6000 ; 1.6226) | |
| 0.50 | b0 | 1.1912* (0.0301) | (1.1861 ; 1.1972) | 0.8477 |
| | b1 | 0.9949* (0.0220) | (0.9906 ; 0.9992) | |
| | b2 | 1.6973* (0.0218) | (1.6897 ; 1.7050) | |
| 0.75 | b0 | 1.2837* (0.0341) | (1.2747 ; 1.2928) | 0.8655 |
| | b1 | 1.0533* (0.0243) | (1.0478 ; 1.0589) | |
| | b2 | 1.8074* (0.0244) | (1.7925 ; 1.8223) | |
| 0.90 | b0 | 1.3693* (0.0414) | (1.3550 ; 1.3837) | 0.8854 |
| | b1 | 1.1064* (0.0299) | (1.0961 ; 1.1168) | |
| | b2 | 1.9164* (0.0295) | (1.8976 ; 1.9352) | |

(* significant at α = 0.05)

Table 2 shows that the estimated coefficients ($\hat{b}_0$, $\hat{b}_1$ and $\hat{b}_2$) at each quantile are close to the initial values ($b_0 = 1.2$, $b_1 = 1$ and $b_2 = 1.7$). For example, at the 0.50th quantile the estimated values of $\hat{b}_0$, $\hat{b}_1$ and $\hat{b}_2$ are 1.1916, 0.9949 and 1.6973 respectively. Based on Table 2, the estimated parameter values fall within the 95% confidence intervals obtained from the simulation study. This suggests that the quantile 95% confidence intervals work well here and that the parameter estimates are acceptable. The goodness of fit of each quantile regression model is given by the Pseudo R² value in the last column of Table 2. All Pseudo R² values obtained here exceed 80%, indicating that all proposed models are adequate and can be accepted.

In this study we also determine the Mean Square Error (MSE) to check that the parameter estimates have small bias and small variance.
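The resampling scheme and the MSE decomposition in (12) can be sketched as below. This is a hedged illustration only: `fit` is a hypothetical callable standing in for a quantile regression fit that returns one coefficient, the function names and seed are assumptions, and the variance uses the usual sample variance with divisor n - 1, which is algebraically the same as the variance formula given in Section 2.3.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def resample_estimates(
    data: Sequence[Row],
    fit: Callable[[List[Row]], float],
    n_rep: int = 25,
    seed: int = 0,
) -> List[float]:
    """Refit on n_rep data sets drawn with replacement from the original rows."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    return [fit(rng.choices(data, k=len(data))) for _ in range(n_rep)]


def bias_variance_mse(
    estimates: Sequence[float], true_value: float
) -> Tuple[float, float, float]:
    """Return (bias, variance, mse) with mse = variance + bias**2, as in (12)."""
    bias = statistics.fmean([e - true_value for e in estimates])
    variance = statistics.variance(estimates)  # sample variance, divisor n - 1
    return bias, variance, variance + bias ** 2


if __name__ == "__main__":
    # Toy stand-in for a model fit: the mean of the resampled values.
    toy_data = [1.0, 2.0, 3.0, 4.0]
    estimates = resample_estimates(toy_data, statistics.fmean, n_rep=25, seed=3)
    print(bias_variance_mse(estimates, true_value=2.5))
```

In the setting of this paper, `fit` would be replaced by a routine that fits the quantile regression at a given τ and returns the estimate of one coefficient, and `true_value` by the corresponding initial value.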
Table 3 below presents the MSE values of the quantile regression method for all three estimated parameters at the corresponding quantile points.

Table 3. The Value of MSE at Each Quantile Point

| Parameter | MSE at τ = 0.10 | τ = 0.25 | τ = 0.50 | τ = 0.75 | τ = 0.90 |
|---|---|---|---|---|---|
| b0 | 0.0011 | 0.0011 | 0.0008 | 0.0022 | 0.0054 |
| b1 | 0.0015 | 0.0012 | 0.0005 | 0.0008 | 0.0028 |
| b2 | 0.0037 | 0.0033 | 0.0015 | 0.0058 | 0.0093 |

Table 3 shows that all estimated parameters have a small MSE. These results indicate that the quantile regression method produces unbiased parameter estimates, since it has small bias and small variance. Based on these results, we conclude that the quantile regression method yields a well-fitting model even in the presence of heteroscedastic error variance.

CONCLUSIONS
The purpose of this study is to describe the performance of the quantile regression method in modeling data with non-uniform error variance (heteroscedasticity). A data set with two covariates is generated, each with 100 observations. The error term follows the heteroscedastic normal distribution N(0, √0.01 × (Xβ)²) and is used to generate the response variable. The model parameters are set to initial values. A simulation study is carried out to check the performance of the quantile regression algorithm. The study finds that all parameter estimates are close to the initial values. The Pseudo R² values of all proposed models at the selected quantile points are quite large, more than 80%. In the simulation study, the parameter estimates lie within the 95% confidence intervals, indicating that the estimates can be accepted. The study also finds that the quantile regression method produces small MSE values. Therefore, it can be concluded that quantile regression is an unbiased estimation method and yields an acceptable model even in the presence of heteroscedastic error variance.

REFERENCES
[1] Ferra Y, Hazmira Y and Izzati R. 2016. Penerapan Metode Regresi Kuantil pada Kasus Pelanggaran Asumsi Kenormalan Sisaan. Eksakta, 1 (XVII): 33-37.
[2] Davino C, Furno M and Vistocco D. 2014. Quantile Regression: Theory and Applications. John Wiley & Sons, Ltd.
[3] Arbia G. 2006. Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence. Springer, Berlin.
[4] Koenker R and Bassett G Jr. 1978. Regression Quantiles. Econometrica, 46: 33-50.
[5] Kozumi H and Kobayashi G. 2011. Gibbs sampling methods for Bayesian quantile regression. Journal of Statistical Computation and Simulation, 81 (11): 1565-1578.
[6] Feng X and Zhu L. 2016. Estimation and Testing of Varying Coefficients in Quantile Regression. Journal of the American Statistical Association, 111: 266-274.
[7] Yanuar F. 2013. Quantile Regression Approach to Determine the Indicator of Health Status. Scientific Research Journal, I (IV): 17-23.
[8] Yang Y, Wang HJ and He X. 2015. Posterior inference in Bayesian quantile regression with asymmetric Laplace likelihood. International Statistical Review, 0 (0): 1-18. doi: 10.1111/insr.12114
[9] He X and Zhu LX. 2011. A lack of fit test for quantile regression. Journal of the American Statistical Association, 98 (464): 1013-1022.
[10] Feng X, He X and Hu J. 2011. Wild Bootstrap for Quantile Regression. Biometrika, 98: 995-999.
[11] Yanuar F, Ibrahim K and Jemain AA. 2013. Bayesian structural equation modeling index in the health model. Journal of Applied Statistics, 40 (6): 1254-1269.
[12] Yanuar F. 2014. The Estimation Process in Bayesian Structural Equation Modeling Approach. Journal of Physics: Conference Series, 495, 012047.