Comparisons between Resampling Techniques in Linear Regression: A Simulation Study


CAUCHY –Jurnal Matematika Murni dan Aplikasi 
Volume 7(3) (2022), Pages 345-353 
p-ISSN: 2086-0382; e-ISSN: 2477-3344 

Submitted: December 23, 2021 Reviewed: May 25, 2022 Accepted: July 24, 2022 
DOI: http://dx.doi.org/10.18860/ca.v7i3.14550    

Comparisons between Resampling Techniques in Linear 
Regression: A Simulation Study 

Anwar Fitrianto1,*, Punitha Linganathan2 

1,*Department of Statistics, IPB University, Indonesia 
2Department of Mathematics, Universiti Putra Malaysia, Malaysia 

Email: anwarstat@gmail.com 

ABSTRACT 

Parameter estimations in linear regression need to fulfill some assumptions. Once the 
assumptions are not fulfilled, the conclusion is questionable. Bootstraps and Jackknife are 

resampling techniques that do not require assumptions in estimating the �̂�. The study aims to 
compare resampling techniques in linear regression. The data used in the study is clean, without 
any influential observations, outliers, or leverage points. The ordinary least square method was 
used as the primary method to estimate the parameters and then compared with resampling 
techniques. The variance, p-value, bias, and standard error are used as a scale to estimate the best 
method among random bootstrap, residual bootstrap and delete-one Jackknife. After all the 
analysis, it was found that random bootstrap did not perform well while residual and delete-one 
Jackknife works quite well. Random bootstrap, residual bootstrap, and Jackknife estimate better 
than ordinary least square. The study also found that residual bootstrap works well in estimating 
the parameter in the small sample. At the same time, it is suggested to use Jackknife when the 
sample size is big because Jackknife is more accessible to apply than residual bootstrap and 
Jackknife works well when the sample size is large. 

Keywords: jackknife; linear; regression; resampling 

INTRODUCTION 

Regression analysis is a statistical analysis that constructs relationships between 
dependent or response variables 𝑦 and independent or regressor variables (𝑥1,𝑥2, …, 𝑥𝑘). 

Ordinary least square (OLS) is a traditional way of finding parameter estimates, �̂� but it 
relies strongly on assumptions [1]. The reliability and validity of the conclusion in 
regression analysis are essential ([2], [3]), and they depend on how far the data follows 
the assumption and on the sample size of the data. It is easier to find the estimated 

regression coefficient, �̂� without any assumption or distribution. Bootstrap and Jackknife 

are resampling techniques that do not need any assumptions in estimating the �̂� ,([4]–[6]. 
Sahinler and Topuz [7] compared the bootstrap and Jackknife methods. Their research 
discussed strategies for building a regression model using the Jackknife and bootstrap 
method. The four methods used in their research are bootstrap based on the resampling 
observations, bootstrap based on the resampling errors, delete-one Jackknife regression 
and delete-d Jackknife regression. These methods were used to find the parameter 
estimates, bias, standard errors, and confidence intervals. Their research concluded that 

http://dx.doi.org/10.18860/ca.v7i3.14550
mailto:anwarstat@gmail.com


Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 346 

large bootstrap replicates ensure that the parameter is close to the true parameter. They 
also suggested that bootstrap replicate is sufficient for estimating the variance and 𝐵 =
1000 for estimating the standard errors.  Their research tests the accuracy of bootstrap 
and Jackknife methods in estimating the distribution of regression parameters with 
various sample sizes and various bootstrap replicates. Sahinler and Topuz [7] and Li et. 
al. [8] found that the bootstrap method is appropriate for linear regression and it is usable 
even when the error is not normally distributed. Algamal and Rasheed [9]  further develop 
resampling in linear regression. 

The advantage of bootstrap approximations is that, in general, it needs a smaller 
sample than the ordinary least square for estimating the parameter. Meanwhile, the 
disadvantages of bootstrap methods were discussed in Ma et al., [10], Wan et al., [11],  
[12], and Phaladiganon et al.,  [13] A few of the disadvantages of the methods are as 
follows: 
a) Bootstrap distribution of  is not a good approximation of 𝐹, if the sample size is small 

and with the existence of an outlier, 
b) Bootstrap is not suggested to use in dependence structure case like time series, and 
c) It is not preferable to use residual bootstrap when the assumptions are violated. 

 
Algamal and Rasheed, [10] concluded that Jackknife method perform quite well when the 
sample size is large enough (𝑛 ≥ 50). Meanwhile, recent studies by Shao, J., & Tu, D., [14] 
and Beyaztas, U., & Alin, A., [15] discussed bootstrap and Jackknife in linear regression.  

Based on that, the study is aimed to compare parameter estimates of multiple linear 
regression based on several resampling methods. There are several methods to estimate 

the �̂� in bootstrap and Jackknife.  The scope of this research is to investigate the bootstrap 
and Jackknife method with different scenariosThis research considered random 
bootstrap, residual bootstrap, and Jackknife delete-one observation. The study is limited 
to multiple linear regression model. First the sample size will be selected with different 
size and estimate the parameter. The bias and variance will be observed then the 
relationship between the bias and variance will be investigated. The distribution also will 
be observed by varying with the increase in the sample size. The value of bootstrap 
resampling with different bootstrap replicates and sample size gives less bias than 
ordinary least square. The Jackknife coefficient is calculated by using, 

�̂�𝑗 =
1

𝑛
∑ �̂�𝑗𝑖

𝑛

𝑖=𝑛

(1) 

where n is the sample size and �̂�𝑗𝑖 parameter estimate for each sample formed after 

deleting one of the observations. While the bootstrap coefficient is calculated from  

�̂�𝑏 =
1

𝐵
∑�̂�𝑏𝑟

𝐵

𝑟=1

(2) 

�̂�𝑏𝑟 = �̂�𝑜𝑙𝑠 + (𝑥
′𝑥)

−1
𝑥′𝑒𝑏𝑟 (3) 

where 𝑟 = 1,2,…,𝐵 is bootstrap replicate, 𝑒𝑏𝑟 is error of the regression,𝑥 is the 

independent variable and �̂�𝑜𝑙𝑠 is the parameter estimate from ordinary least square 
method. 

 
Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 347 

METHODS 

Data 

The data used in this study is pressure-dropping data, which is available in 
Montgomery et al., [16]. It has one dependent variable 𝑦, and four independent variables, 
that is 𝑥1,𝑥2,𝑥3 and 𝑥4. There are 62 observations in the data. The data was collected from 
research where the pressure drop was measured for two-phase flow through screen-plate 
bubble columns. The research was conducted to test the reason of the pressure drop 
through the bubble cap. A bubble column is used to observe the reaction between the gas 
and liquid.  

The first factor considered in that research is the superficial fluid velocity of the gas. 
The gas's speed and direction of motion are measured by flow in the column. The second 
factor is the kinematic viscosity. The friction caused by the thickness of gas when the gas 
moves through the liquid particles was calculated. Then the distance across the space 
between two parallel threads was considered. The last factor used in research is the 
dimensionless number, which is not associated with the physical dimension. It is 
calculated to relate the gas's superficial fluid velocity and the liquid's superficial fluid 
velocity. For building the model, the dependent variable 𝑦 denotes the dimensionless 
factor for the pressure drop through a bubble cap. The independent variables are 𝑥1 
(superficial fluid velocity of the gas (𝑐𝑚 𝑠⁄ ), 𝑥2 (kinematic viscosity), 𝑥3 (mesh opening, 
cm), and 𝑥4 (dimensionless number relating the gas's superficial fluid velocity to the 
liquid's superficial fluid velocity). 

Simulation Study Scenarios 

The original data will be analyzed using ordinary least square regression data. Then 
assumptions checkings will be conducted using the residuals of the model.  Then, using 
the sampe original data, resampling techniques using the residuals and random bootstrap 
resampling  will be conducted with four different sample sizes, which are 20, 40,50 and 
62. Each sample will be used in three different bootstrap replicates, namely 100, 1000 and 
10000.  

For the delete-one Jackknife bootstrap, the resampling will be conducted at different 
sample sizes, namely 20, 40, 50 and 62. The bias, variance, standard error and p-value will 
be calculated for each method. The best method among this three methods will be chosen 
according to the value of bias, variance, standard error and p-value. 

RESULTS AND DISCUSSION  

In this study, full model was used for the reference, which means all independent 
variables were included in the model regardless the significance of the variables. The 
fitted full regression model which was obtained based on ordinary least square using SAS 
software  is written as follows:   

�̂� = 5.88839 − 0.48460𝑥1 + 0.18263 𝑥2 + 35.39109𝑥3 + 5.92695𝑥4  

Random Bootstrap Approach 

Random bootstrap technique was first used to analyze the data. The resampling was 
conducted at different sample size 20, 40, 50 and 62. The bootstrap replication were 
applied in every sample size, namely 100, 1000 and 1000.  
 
 
Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 348 

Table 1. Summary Statistics for Multiple Linear Regression Using Random Bootstrap at Different Bootstrap  
Replicates and Sample Sizes for 𝛽0 and 𝛽3 

Parameter 
Estimate 

Bootstrap 
Replicate 

Sample 
Size 

Bias Variance p-value 
Standard 
Error 

�̂�0 

100 

20 -2.2181 85.6307 0.0001 0.9254 

40 3.3626 22.8120 <.0001 0.4776 

50 1.5437 15.1000 <.0001 0.0469 

62 -0.6707 19.9445 <.0001 0.4466 

1000 

20 -1.1044 83.9495 <.0001 0.2897 

40 2.9549 34.4348 <.0001 0.0012 

50 1.4503 18.4197 <.0001 0.1357 

62 -0.8994 19.1731 <.0001 0.1385 

10000 

20 -1.2754 203.4686 <.0001 0.0108 

40 2.6252 41.8398 <.0001 0.0647 

50 1.3527 18.8707 <.0001 0.0434 

62 -0.9042 4.9842 <.0001 0.0461 

�̂�3 

100 

20 2.6410 574.9345 <.0001 2.3978 

40 -5.7656 111.8724 <.0001 1.0577 

50 -5.3883 61.2876 <.0001 0.7829 

62 1.0310 125.2369 <.0001 1.1191 

1000 

20 2.4814 629.8633 <.0001 0.7936 

40 -5.4017 211.8649 <.0001 0.4603 

50 -4.5249 73.4070 <.0001 0.2709 

62 1.6356 116.7295 <.0001 0.3417 

10000 

20 3.1247 634.0890 <.0001 0.2518 

40 -4.5548 261.6325 <.0001 0.1618 

50 -4.2045 87.0297 <.0001 0.0933 

62 1.8947 37.2858 <.0001 0.1146 

 
Table 1 shows the changes in �̂�3 
and �̂�0 at different sample sizes and bootstrap replicates. 

For each parameter estimate, as the sample size changes, the bias changes. More 
specifically, the bias is getting smaller as the sample size increases.  

The variance of �̂�3 
decreases from 574.9345 when the sample is 20 to 61.2876 when 

the sample size is 50. But, the bias of �̂�3 increases when the sample is 62 . It can be 
observed that as the sample size increased from 20 to 62, the variance of parameter 
estimates decreased. Meanwhile, the bias decreases as the bootstrap replicate increases. 
For B was set to 100, the intercept shows bias as 1.5437. This value decreases to 1.4503 
when the number of bootstrap replicates, B, increases to 1000. When the number of 
bootstrap replicates was increased to 10000, the bias decreases again to 1.3527. From the 
results, it can be observed that the bias decreases as the replicate increases. When the 
bootstrap replicate, B increases from 100 to 1000, the variance decreases from 125.2369 
to 116.7295. It decreases further to 37.2858 when B is equal to 10000, which shows 
70.23% difference when we compare to 125.2369. 


Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 349 

Residual Bootstrap Approach 

The second resampling technique that has been used to analyze the data was residual 
bootstrap. This section displays some results such as parameter estimates, bias, and 

variances of the parameter estimates using residual bootstrap. The results of �̂�0 and �̂�1 
are shown in Table 2. In residual bootstrap, the results were more apparent than in 
random bootstrap. It shows a clear trend of parameter estimates, bias, and variance at 
different sample sizes and the number of bootstrap replicates. The bias decrease as the 
sample size increases. When 𝑛 = 20, the bias is 0.2307. Then when the sample increased 
to 40 the bias became 0.2266 and bias is 0.0684 when the sample size is 50 and at last, 
when 𝑛 is 62 the bias became 0.01368. In general, there is a noticeable difference in bias 
when the sample size increases. 
 
Table 2. Summary Statistics for Multiple Linear Regression Using Residual Bootstrap at Different Bootstrap 

Replicates and Sample Sizes for 𝛽0 and 𝛽1 

 
The resampling techniques in Table 2 show a clear decrease of the variances when the 

sample size increases. Let’s consider the changes in the variance of �̂�0 
when the bootstrap 

replicate is 1000. When the sample size is 20 the variance is 28.6300, and the value 

Parameter 
Estimate 

Bootstrap 
Replicate 

Sample 
Size 

Bias Variance p-value 
Standard 
Error 

�̂�0 

100 

20 1.5277 30.9685 <.0001 0.5565 

40 2.6535 27.0046 <.0001 0.5197 

50 1.5324 19.8861 <.0001 0.4459 

62 -0.3345 15.4073 <.0001 0.3925 

1000 

20 0.9635 28.6300 <.0001 0.1692 

40 2.3838 22.7581 <.0001 0.1509 

50 2.0622 20.0467 <.0001 0.1416 

62 0.0035 15.4785 <.0001 0.1244 

10000 

20 0.6704 30.9949 <.0001 0.0557 

40 2.2883 24.0894 <.0001 0.0491 

50 2.2491 20.2725 <.0001 0.0450 

62 -0.0193 17.0400 <.0001 0.0413 

�̂�1 

100 

20 0.2307 0.2037 <.0001 0.0451 

40 0.2266 0.1566 <.0001 0.0396 

50 0.0684 0.1098 <.0001 0.0331 

62 0.0137 0.0819 <.0001 0.0286 

1000 

20 0.1630 0.2196 <.0001 0.0148 

40 0.2061 0.1612 <.0001 0.0127 

50 0.0732 0.1322 <.0001 0.0115 

62 -0.0066 0.1025 <.0001 0.0101 

10000 

20 0.1547 0.2103 <.0001 0.0046 

40 0.2180 0.1579 <.0001 0.0040 

50 0.0608 0.1338 <.0001 0.0037 

62 -0.0024 0.1071 <.0001 0.0033 


Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 350 

becomes 22.7581 when the sample size is 40. Then the variance decrease as the sample 
size increases to 50 and 62 where the bias become 19.8861 and 15.4785, respectively. 

Now let’s observe the changes in bias caused by the bootstrap replicate, B, when it is 

increased from hundred to thousand then ten thousand. For the estimated constant, �̂�0, 
when the sample size is 40 the bias changes from 2.6535 to 2.3838, then 2.2883 when B 
increases from 100 to 1000 then 10000, respectively. The variance also decreases when 
the bootstrap replicate increases.  

Delete-one Jackknife Approach 

The third technique that was used in this research is Jackknife delete-one. The method 
was applied with different sample sizes , which are 20, 40, 50 and 62. Table 3 and Figure 
1 display the changes in bias of all parameters for delete-one Jackknife. The bias decreases 
as the sample size increases. But when sample size equal to the population size the bias 
shows an increasing state. Using the population as sample size might show this type of 
result. Plot of variance versus sample size for all parameters are shown in Figure 2. From 
the plot, it can be seen that the variance also shows a decreased state from sample 20 to 
sample 62. Small variances give a better estimation in linear regression. The bias and 
variance also not interrelated in delete-one Jackknife. The p-value also shows that all 
parameter estimates are significant. The standard error also clearly shows that the 
increase in sample size will give a better estimation.  
 

Table 3. Summary Statistics for Multiple Linear Regression Using Delete-one  
Jackknife at Different Sample Size .  

Parameter 
Estimate 

Sample 
Size 

Bias Variance p-value Standard Error 

�̂�0 

20 0.5586 2.9683 <.0001 0.3852 

40 2.2937 0.7335 <.0001 0.1354 

50 2.1648 0.3625 <.0001 0.0851 

62 -3.1721 0.2212 <.0001 0.0597 

�̂�1 

20 0.1617 0.0161 <.0001 0.0284 

40 0.2182 0.0054 <.0001 0.0117 

50 0.0662 0.0046 <.0001 0.0096 

62 0.6613 0.0029 <.0001 0.0069 

�̂�2 

20 0.0249 0.0001 <.0001 0.0017 

40 0.0006 0.0000 <.0001 0.0007 

50 -0.0045 0.0000 <.0001 0.0005 

62 0.0054 0.0000 <.0001 0.0004 

�̂�3 

20 -2.0491 18.1473 <.0001 0.9526 

40 -7.3628 3.7708 <.0001 0.3070 

50 -5.6852 1.4059 <.0001 0.1677 

62 -3.7014 0.9218 <.0001 0.1219 

�̂�4 

20 0.3589 1.9284 <.0001 0.3105 

40 0.6624 0.4470 <.0001 0.1057 

50 -0.0712 0.3889 <.0001 0.0882 

62 0.7431 0.4339 <.0001 0.0837 


Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 351 

 
Figure 1. Changes of Bias in All Parameter Estimation when Sample Size Increases in Delete-one Jackknife. 

 
Figure 2. Changes of Variance in All Parameter Estimation when Sample Size Increases in Delete-one 

Jackknife. 

The difference between residual bootstrap estimation and random bootstrap 
estimation is obvious when the sample size is 20 (small). The residual bootstrap provided 
better parameter estimation than random bootstrap in bias and variance. This shows that 
residual has a big influence in linear regression. But, as the sample size increases, both 
residual and random bootstrap methods show similar results. The increase in bootstraps 
replicates and sample size gave better parameter estimation in both methods. Jackknife 
delete-one gave a small variance, but the value of the bias was big when the sample size 
was small. The bias and variance decrease as the sample size increases. 

CONCLUSIONS 

Residual bootstrap, random bootstrap, and delete-one Jackknife were compared. 
Jackknife is not advisable to use when the sample size is small. However, when the sample 

-7

-6

-5

-4

-3

-2

-1

0

1

2

3

20 40 50 62

b
ia

s

sample size

β4

β3

β2

β1

β0

0

5

10

15

20

25

20 40 50 62

va
ri

a
n

ce

sample size

β4

β3

β2

β1

β0


Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 352 

size is big enough which is near to population size, it will give better parameter estimation 
than random bootstrap and residual bootstrap. In a situation where the sample size is 
small due to cost consideration,  it is better to use residual bootstrap than other methods 
in linear regression. In conclusion, it is advisable to use residual bootstrap when the 
sample is small. The bigger bootstrap replicates will give better parameter estimation. 
The Jackknife can be used when the sample size is big enough. This method will be useful 
when the sample size is too big which may take time to process in both random and 
residual bootstrap. 

In the future, this research can be extended to observe how these methods react when 
there is an outlier, influential point or leverage point. Moreover, the comparisons may 
involve other resampling techniques to compare which method works well in multiple 
linear regression. 

REFERENCES 

 
[1] M. Alrasheedi, “Parametric and non-parametric bootstrap: A simulation study for a 

linear regression with residuals from a mixture of Laplace distributions,” European 
Scientific Journal, vol. 9, no. 12, 2013. 

[2] R. F. Gunst and R. L. Mason, Regression analysis and its application: a data-oriented 
approach. CRC Press, 2018. 

[3] A. Althubaiti, “Information bias in health research: definition, pitfalls, and 
adjustment methods,” J Multidiscip Healthc, vol. 9, p. 211, 2016. 

[4] M. R. Chernick, “Resampling methods,” Wiley Interdisciplinary Reviews: Data Mining 
and Knowledge Discovery, vol. 2, no. 3, pp. 255–262, 2012. 

[5] R. E. McRoberts, S. Magnussen, E. O. Tomppo, and G. Chirici, “Parametric, bootstrap, 
and jackknife variance estimators for the k-Nearest Neighbors technique with 
illustrations using forest inventory and satellite image data,” Remote Sensing of 
Environment, vol. 115, no. 12, pp. 3165–3174, 2011. 

[6] R. G. Clark and S. Allingham, “Robust resampling confidence intervals for empirical 
variograms,” Mathematical Geosciences, vol. 43, no. 2, pp. 243–259, 2011. 

[7] S. Sahinler and D. Topuz, “Bootstrap and jackknife resampling algorithms for 
estimation of regression parameters,” Journal of Applied Quantitative Methods, vol. 
2, no. 2, pp. 188–199, 2007. 

[8] X. Li, W. Wong, E. L. Lamoureux, and T. Y. Wong, “Are linear regression techniques 
appropriate for analysis when the dependent (outcome) variable is not normally 
distributed?,” Invest Ophthalmol Vis Sci, vol. 53, no. 6, pp. 3082–3083, 2012. 

[9] Z. Y Algamal and K. B Rasheed, “Re-sampling in linear regression model using 
jackknife and bootstrap,” IRAQI JOURNAL OF STATISTICAL SCIENCES, vol. 10, no. 18, 
pp. 59–73, 2010. 

[10] J. Ma et al., “Probabilistic forecasting of landslide displacement accounting for 
epistemic uncertainty: a case study in the Three Gorges Reservoir area, China,” 
Landslides, vol. 15, no. 6, pp. 1145–1153, 2018. 

[11] C. Wan, Z. Xu, Y. Wang, Z. Y. Dong, and K. P. Wong, “A hybrid approach for 
probabilistic forecasting of electricity price,” IEEE Transactions on Smart Grid, vol. 
5, no. 1, pp. 463–470, 2013. 

[12] G. A. Nelson, “Cluster sampling: a pervasive, yet little recognized survey design in 
fisheries research,” Trans Am Fish Soc, vol. 143, no. 4, pp. 926–938, 2014. 


Comparisons between Resampling Techniques in Linear Regression: A Simulation Study 

Anwar Fitrianto 353 

[13] P. Phaladiganon, S. B. Kim, V. C. P. Chen, J.-G. Baek, and S.-K. Park, “Bootstrap-based 
T 2 multivariate control charts,” Communications in Statistics—Simulation and 
Computation®, vol. 40, no. 5, pp. 645–662, 2011. 

[14] J. Shao and D. Tu, The jackknife and bootstrap. Springer Science & Business Media, 
2012. 

[15] U. Beyaztas and A. Alin, “Sufficient jackknife-after-bootstrap method for detection 
of influential observations in linear regression models,” Statistical Papers, vol. 55, 
no. 4, pp. 1001–1018, 2014. 

[16] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to linear regression 
analysis. John Wiley & Sons, 2021.