

Ratio Mathematica                                                  Volume 39, 2020, pp. 7-32   

 

 
A simple goodness-of-fit test for 

continuous conditional distributions 

 
Peter Veazie* 

Zhiqiu Ye† 
 

Abstract  

This paper presents a pragmatic specification test for conditional 

continuous distributions with uncensored data.  We employ Monte 

Carlo (MC) experiments and the 2011 Medical Expenditure Panel 

Survey data to examine coverage and the power to discern 

deviations from the correct model specification in distribution and 

parameterization. We carry out MC experiments using 2000 runs 

for sample sizes 500 and 1000. The experiments show that the test 

has accurate coverage under correct specification, and that the test 

can discern deviations from the correct specification in both the 

distributional family and parameterization. The power increases as 

sample size increases. The empirical example shows the test’s 

ability to identify specific distributions from other candidates using 

real cost data. Although the test can be used as a goodness-of-fit 

test for marginal distributions, it is particularly useful as an easy-

to-use test for conditional continuous distributions, even those with 

one observation per pattern of explanatory variables. 

Keywords: Goodness-of-fit test; model specification test; 

conditional continuous distributions. 

2010 AMS subject classification: 62F03‡ 

 
* University of Rochester, Rochester New York, USA; peter_veazie@urmc.rochester.edu. 
† University of Rochester, Rochester New York, USA; sophieye999@gmail.com. 
‡ Received on June 10th, 2020. Accepted on December 17th, 2020. Published on December 31st, 

2020. doi: 10.23755/rm.v39i0.524. ISSN: 1592-7415. eISSN: 2282-8214. ©Peter Veazie et al. 

This paper is published under the CC-BY licence agreement. 


P. Veazie, Z. Ye 

 

 
1  Introduction 

To determine whether a probability model is statistically adequate for 

representing a data generating process (DGP), it is common to test whether the 

model fits with a data set produced by that process. The investigation into the 

model specification of a conditional distribution is fundamental for methods 

such as Maximum Likelihood Estimation (MLE), which is consistent and 

asymptotically efficient only if the distribution is correctly specified 

(Amemiya, 1985). However, there are two key challenges for a general test of 

continuous conditional distribution models, if it is to be broadly adopted in 

applied sciences such as social and health sciences. The first is the sparse
empirical information regarding the conditional distribution when patterns of
the explanatory variables have few corresponding observations. The second is the

ease of use: many researchers do not have the background, time, or inclination 

to engage in complicated programming in order to implement a statistical 

test—to be useful to such researchers a test must be easily implemented.   

Regarding sparse information, consider the data shown in Figure 1: 

although some data points appear close to each other, for most of the data there 

is no more than one observation at each value of x. Consequently, the 

empirical distribution of random variable Y conditioned on such a value for 

variable X is based on a trivial point mass. How then can we test a model of 

the conditional distribution of Y for such sparse data?  

Regarding ease of use, existing tests for conditional distributions require
more mathematical and computational skill than many applied researchers
have, which limits their general adoption. Some of these tests

require the use of kernel or local polynomial functions with arbitrary 

smoothing parameters (Zheng, 2000, Fan et al., 2006). Others, such as the 

Conditional Kolmogorov Test, compare model and distribution functions 

additionally incorporating the empirical distribution functions of the 

conditioning set of variables (Andrews, 1997).  Transformations to the unit 

interval have been applied to construct tests for goodness of fit such as the 

Rincon-Gallardo et al. test for multivariate normality (Rincon-Gallardo et al., 

1979). However, their method is also technically difficult and computationally 

intense in general applications due to procedures involved in the 

transformation (O'Reilly and Quesenberry, 1973). Additionally, some are 

dependent on the order of the data being transformed (O'Reilly and Stephens, 

1982); therefore, researchers may obtain variant test results if the same data 

were ordered differently. What is needed for the applied researcher who does 

not have the mathematical or programming skills to meaningfully implement 

complex algorithms is a simple pragmatic test. This paper presents a pragmatic 



 
general goodness-of-fit statistic for continuous conditional models using 

uncensored data.  

In the next section, we introduce the goodness-of-fit test and the rationale 

behind it. We then evaluate the performance of the goodness-of-fit statistic in 

Section 3 using two groups of Monte Carlo experiments. The first group of 

experiments focuses on discerning deviations from correct specification in the 

distributional family; the second group focuses on discerning deviations in 

parameterization. We choose these investigations because they represent the 

two misspecification issues in estimating conditional probability models. In 

Section 4, we apply the goodness-of-fit test to the 2011 Medical Expenditures 

Panel Survey (MEPS) dataset, modeling three health care expenditure 

outcomes as functions of patient characteristics. Finally, in Section 5, we 

conclude our paper with a summary of the findings and discussions about the 

applications of the goodness-of-fit test. The Appendix provides the expected 

value of the statistic and the procedure for the calculation of the degrees of 

freedom for the test statistics, the data generating process for each Monte 

Carlo experiment, and the analyses modelling cost data from MEPS. 

Figure 1. A typical conditional Gumbel distribution with sparse observations 

for each conditional value of observed X. 

[Figure 1: scatter plot of Y (vertical axis) against X (horizontal axis); panel title: F(Y | X) a Gumbel Distribution]



 
2  A proposed goodness-of-fit test  

The Pearson Chi-square goodness-of-fit statistic is based on comparing the 

number of actual observations within each set of a partition of the random 

variable’s range to the number of observations that would be expected to show 

up in those sets if the model correctly represents the DGP (Schervish, 1995). If 

the model is correct, then the expected number of observations is the expected 

number for the DGP; consequently, the observed and predicted number in each 

set should be different merely by random variation.   

The Chi-square goodness-of-fit statistic for continuous distributions is 

created by partitioning the range of a continuous random variable Y into K 

regions. Denote each region k ∈ {1, 2, …, K} as Rk, the number of observations with values of y in region Rk as Nk, and the total sample size as N. The probability of an observation with y in region Rk is then

$$P_k = \int_{R_k} f_Y(y)\, dy,$$

in which fY(y) is the probability density for y associated with a cumulative distribution function (CDF) FY(y). The Chi-square statistic is defined as

$$C = \sum_{k=1}^{K} \frac{(N_k - N P_k)^2}{N P_k}.$$

The corresponding sample statistic is

$$C_N = \sum_{k=1}^{K} \frac{\big(N_k - N P_k(\hat{\theta})\big)^2}{N P_k(\hat{\theta})},$$

in which $\hat{\theta}$ is a consistent estimator of θ. If FY(y; θ) accurately represents the data generating process, $C_N$ converges to C with increasing N and the corresponding asymptotic distribution of $C_N$ is a Chi-square with degrees of freedom equal to the number of groups in the partition minus one, minus the number of estimated parameters (Schervish, 1995).

If we are interested in a model of the conditional distribution F(y|x; θ), the
preceding statistic is not generally applicable because Nk can contain 

insufficient observations to inform the conditional distribution. Indeed, with x 



 
containing precisely measured continuous variables, there may be only one 

value y for some observed x values (see Figure 1 as an example).  However, 

we can take advantage of the probability integral transform and consequent 

fact that the CDF of a continuous random variable is itself a random variable 

with a uniform distribution on the unit interval.  Because the uniform 

distribution is the same regardless of underlying CDF, a set of random 

variables from independent observations with different conditional 

distributions can all be converted by their CDFs to the same uniform 

distribution.  We can use this fact to construct a test of the conditional 

distribution; even if each observation has a different conditioning value (i.e. 

the data in Figure 1 will pose no problem for this test).  

Because the CDF for each random variable has a uniform distribution, the 

CDF values of sample results from a correctly specified model for each 

random variable will produce a single realization from a uniform distribution.  

Therefore, the full sample results should together provide a histogram that 

deviates only by chance from a uniform distribution.  We can use a Pearson 

Chi-square type statistic applied to the uniform distribution to test the

specification for the conditional distributions.   
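The following minimal sketch illustrates the probability integral transform argument above. It assumes a hypothetical data generating process, Y | X = x ~ Normal(1 + 2x, 1.5), which is not one of the paper's specifications, and the helper names `pit_values` and `bin_proportions` are introduced only for illustration. Each observation has its own value of x, yet the CDF values pool into an approximately uniform histogram.

```python
import random
from statistics import NormalDist

def pit_values(n=5000, seed=1):
    """Return u_i = F(y_i | x_i) for n draws, each with its own x_i.

    Hypothetical DGP (an assumption for this sketch):
    Y | X = x ~ Normal(mean = 1 + 2x, sd = 1.5).
    """
    rng = random.Random(seed)
    us = []
    for _ in range(n):
        x = rng.uniform(-5, 10)                  # a distinct conditioning value
        dist = NormalDist(mu=1 + 2 * x, sigma=1.5)
        y = rng.gauss(dist.mean, dist.stdev)     # one observation of Y at this x
        us.append(dist.cdf(y))                   # probability integral transform
    return us

def bin_proportions(us, k=10):
    """Share of u values in each of k equal-width bins of the unit interval."""
    counts = [0] * k
    for u in us:
        counts[min(int(u * k), k - 1)] += 1      # u == 1.0 goes to the last bin
    return [c / len(us) for c in counts]

proportions = bin_proportions(pit_values())
print([round(p, 3) for p in proportions])        # each should be near 0.1
```

Because the true conditional CDF is used here, the ten proportions should differ from 0.1 only by sampling variation.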

The process is quite simple.  For each observation i we have a model 

specification for the distribution F(yi|xi) and therefore can obtain from the 

estimated model the sample quantity ui = F(yi|xi) for which (xi, yi) are the 

observed values for observation i.   The random variable underlying ui has a 

uniform distribution on the unit interval if F is correctly specified.  We can 

construct a goodness-of-fit test by partitioning the unit interval into K 

subintervals defined by equally spaced boundary points, which for K = 10 is

$$R_k = \big((k-1) \times 0.1,\; k \times 0.1\big) \quad \text{s.t. } k \in \{1, 2, \ldots, 10\}.$$

The statistic is then

$$U = \sum_{k=1}^{K} \frac{(N_k - N P_k)^2}{N P_k},$$

for which N is the total sample size, and Nk is the observed number of u values in the Rk interval. This statistic can alternatively be written as

$$U = N \sum_{k=1}^{K} \frac{(\hat{P}_k - P_k)^2}{P_k},$$

in which $\hat{P}_k$ is the observed proportion in interval Rk. Because the statistic is based on a partition of the uniform into K equal sized intervals, Pk = 1/K; therefore,

$$U = K \cdot N \sum_{k=1}^{K} \Big(\hat{P}_k - \frac{1}{K}\Big)^2.$$

As shown in the Appendix, the expected value of U, which is the degrees of 

freedom for its approximating Chi-square distribution, is equal to the degrees 

of freedom for the usual Pearson Chi-square test (i.e. K − 1) minus a factor due 

to the estimation of model parameters. 

Since Pk is known, which in the case of K = 10 intervals is 0.1, we can 

simply state the statistic for K = 10 as

$$U = 10 \cdot N \sum_{k=1}^{10} \big(\hat{P}_k - 0.1\big)^2.$$
The selection of K = 10 is arbitrary, as it is with the Hosmer-Lemeshow test 

for logistic regression (Hosmer and Lemeshow, 1980).  For other values of K, 

the degrees of freedom can be directly estimated as shown in the Appendix or 

determined by Monte Carlo simulation (see Box 2).   
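As a minimal sketch of the statistic described above, the function below computes U from a list of CDF values for any K. The name `u_statistic` and the two toy inputs are assumptions made for illustration; the bins are treated as closed on the left, which differs from the open intervals in the text only on a set of probability zero.

```python
def u_statistic(us, k=10):
    """U = K * N * sum_k (P_hat_k - 1/K)^2 for CDF values us in [0, 1]."""
    n = len(us)
    counts = [0] * k
    for u in us:
        counts[min(int(u * k), k - 1)] += 1      # u == 1.0 goes to the last bin
    return k * n * sum((c / n - 1 / k) ** 2 for c in counts)

# Toy inputs: 100 values spread evenly over the unit interval, and
# 100 values that all fall in the first interval.
even = [(i + 0.5) / 100 for i in range(100)]
piled = [0.05] * 100
print(u_statistic(even), u_statistic(piled))
```

The evenly spread values give U = 0, while piling every value into one interval gives U = N(K − 1), the largest value the statistic can take.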

The U statistic has a distribution proportional to the sum of gamma random variables with different parameters. Specifically, denoting $\hat{P}_k - 1/K$ as $z_k$, as shown in the Appendix $z_k$ is asymptotically normally distributed with mean 0 and variance $\sigma_k^2$. Consequently, the ratio of $z_k$ squared to $\sigma_k^2$ has an asymptotic Chi-square distribution with degrees of freedom 1, which is a Gamma distribution with parameters 0.5 and 2 (i.e. $\Gamma(0.5, 2)$). Therefore, $z_k^2$ has a distribution $\sigma_k^2\,\Gamma(0.5, 2)$, which is $\Gamma(0.5, 2\sigma_k^2)$. U is therefore proportional to the sum of K differently scaled gamma random variables.
Moschopoulos shows that the sum of such variates can be expressed as a gamma

series in which the series coefficients can be recursively determined 

(Moschopoulos, 1985).  The use of this recursive coefficient determination and 

gamma series is overly complex for the practical application of this statistic 

among many applied researchers.  However, ease of use is the purpose of this 

goodness-of-fit statistic.  Fortunately, the Monte Carlo experiments presented 

below indicate that for a correct specification the statistic is approximately 

Chi-square in distribution with degrees of freedom 7.5 when K = 10 and 

calculated as shown in the Appendix or as shown in Box 2 if K is not 10. 

 

 
3  Simulation experiments 

3.1 Methods 

We investigated finite sample performance of the proposed statistic using 

Monte Carlo experiments of conditional Normal, Gumbel, Gamma, and 

Weibull models, each applied to data generating processes based on the same 

set of distributions.  The first set of experiments comprised a total of sixteen 

model/DGP comparisons.  We evaluated each model/DGP pair for sample 

sizes 500 and 1000, each using 2000 Monte Carlo samples from the DGP (see 

Appendix Table A1 for parameter specifications).  We inspected rejection 

rates for significance levels spanning between 0 and 0.2 for each comparison.  

For each correct model/DGP pair (i.e. Normal/Normal, Gumbel/Gumbel, 

Gamma/Gamma, and Weibull/Weibull), the plot of the empirical cumulative 

distribution function (eCDF) of the calculated p values, across the 2000 MC 

samples, should approximately match the significance level (i.e. this plot 

should be approximately a straight line).  For example, the use of a 

significance level of 0.01 should reject the model for approximately 1 per cent 

of the 2000 samples; using a significance level of 0.05 should reject

approximately 5 per cent of the samples; and a 0.1 significance level should 

result in approximately 10 per cent rejections.  For mismatched pairs (e.g. 

Weibull/Gumbel), if the fit test is useful it should produce rejection rates that 

are higher than the significance levels and increase with sample size; 

consequently, the eCDF of the test’s p-value should be above the significance 

level. 
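The coverage check described above amounts to evaluating the eCDF of the p-values at each significance level. The sketch below is illustrative only: `rejection_rate` is a hypothetical helper, and the uniform random numbers stand in for the p-values that a correctly specified model would produce across 2000 Monte Carlo samples.

```python
import random

def rejection_rate(p_values, alpha):
    """eCDF of the p-values at alpha: the share of samples rejected at level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

# Stand-in for 2000 Monte Carlo p-values under a correct model (assumption:
# such p-values are approximately uniform on the unit interval).
rng = random.Random(0)
p_correct = [rng.random() for _ in range(2000)]

for alpha in (0.01, 0.05, 0.10, 0.20):
    print(alpha, rejection_rate(p_correct, alpha))
```

For a correctly specified model each printed rate should sit close to its significance level; for a mismatched pair the rates would be expected to lie above it.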

The second set of experiments compared models in which parameters are 

specified as linear in conditioning variables to the DGP having the same 

distributional family but with parameters quadratic in the conditioning 

variables (see Appendix Table A2 for parameter specifications). For the 

normal distribution, we estimated models with homoscedasticity and 

heteroscedasticity. In the case of heteroscedasticity both the mean and variance 

were generated as quadratic in X in the DGP, but they were modeled as linear 

in the misspecified model. Similarly, we carried out experiments for sample 

sizes 500 and 1000.   These experiments provide evidence regarding whether 

the test can identify deviations in parameterization as well as distributional 

family. In the Monte Carlo experiments reported below, we applied the steps 

presented in Box 1 to obtain p-values for each of 2000 data sets generated for 

each model/DGP being considered.  We calculated both a p-value using 

degrees of freedom equal to 7.5 and also using the mean of the 2000 calculated 

U values for each DGP considered when using the correct model (remember 

that the degrees of freedom are associated with the distribution of U given the 

model is correct). 



 
3.2 Results 

Because we tested continuous conditional distributions, it is difficult to see 

the differences between the model and DGP for all patterns of explanatory 

variables.  However, Table 1 shows the probability density functions for the 

true DGP (in the solid line) and the estimation model (the dotted line, using the 

average parameter values across the 2000 estimated models) evaluated at the 

mean of X.  This gives some sense of the differences between the distributions 

being tested in the first set of experiments; however, the deviation of the 

model from the underlying distribution that drives larger values of U may be 

from other regions of the conditioning set than at the mean of X. 

Tables 2 and 3 present the eCDFs of the statistic’s p-values for each 

indicated model applied to the indicated DGP plotted for significance levels up 

to 0.2.  Table 2 presents results for sample sizes of 500; Table 3 presents 

results for sample sizes of 1000.  The thin straight lines show the points where 

the eCDFs would be if they corresponded to the significance level.  The thick

dark lines (or curves) are the eCDFs associated with p values based on degrees 

of freedom set to 7.5.  The thick light lines (or curves) are the eCDFs 

associated with the Monte Carlo based empirical degrees of freedom.  We 

determined the 7.5 degrees of freedom approximation by the average of the 

four empirical degrees of freedom across the DGPs using sample sizes of 

1000. We also ran Monte Carlo experiments for correct model specifications using 10 correlated explanatory variables (results not presented); these experiments showed that the empirical degrees of freedom remained around 7.5 in multivariable models. Specifically, the means of the U statistics, and therefore the degrees of freedom, in these experiments for the Normal, Gamma, Weibull, and Gumbel were 7.52, 7.32, 7.54, and 7.27 respectively.

BOX 1. How to calculate U and its p-value using K = 10

Step 1. For a candidate model F(yi | xi; θ), estimate the parameters, obtaining $\hat{\theta}$.

Step 2. Calculate the CDF value ui = F(yi | xi; $\hat{\theta}$) for each observation (yi, xi) in the data.

Step 3. Calculate the proportion ($\hat{P}_k$) of ui in each of the ten intervals Rk for k ∈ {1, 2, …, 10}.

Step 4. Calculate the statistic U using the equation

$$U = 10 \cdot N \sum_{k=1}^{10} \big(\hat{P}_k - 0.1\big)^2.$$

Step 5. Calculate the p-value as the upper tail area of a Chi-square distribution with degrees of freedom set to 7.5, or set to the estimated value determined by the equations presented in the Appendix, or set to the model-specific Monte Carlo determined empirical degrees of freedom (see Box 2 for the algorithm).
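The five steps of Box 1 can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: the candidate model is a Normal distribution with mean linear in a single covariate, the simulated data are hypothetical, and the helper names (`chi2_sf`, `fit_normal_linear`, `u_test`) are introduced here. Because 7.5 is not an integer, the Chi-square tail area is computed from the regularized incomplete gamma function.

```python
import math
import random
from statistics import NormalDist

def chi2_sf(x, df, eps=1e-12, max_iter=1000):
    """Upper tail area of a Chi-square(df) distribution; df may be non-integer.

    Evaluates the regularized upper incomplete gamma function Q(df/2, x/2).
    """
    a, x = df / 2.0, x / 2.0
    if x <= 0:
        return 1.0
    log_front = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:                         # series for the lower tail P(a, x)
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_front)
    tiny = 1e-300                         # continued fraction (modified Lentz)
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return h * math.exp(log_front)

def fit_normal_linear(xs, ys):
    """Step 1: MLE of Y | x ~ Normal(b0 + b1*x, s) with a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    s = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / n)
    return b0, b1, s

def u_test(xs, ys, k=10, df=7.5):
    """Steps 1-5; the default df of 7.5 is the approximation for K = 10 only."""
    b0, b1, s = fit_normal_linear(xs, ys)
    std = NormalDist()
    us = [std.cdf((y - b0 - b1 * x) / s) for x, y in zip(xs, ys)]   # Step 2
    n = len(us)
    counts = [0] * k
    for u in us:                                                    # Step 3
        counts[min(int(u * k), k - 1)] += 1
    u_stat = k * n * sum((c / n - 1 / k) ** 2 for c in counts)      # Step 4
    return u_stat, chi2_sf(u_stat, df)                              # Step 5

# Hypothetical data: the same x values with Normal and with exponential errors.
rng = random.Random(7)
xs = [rng.uniform(-5, 10) for _ in range(1000)]
ys_normal = [1 + 2 * x + rng.gauss(0, 1.5) for x in xs]      # Normal model is correct
ys_skewed = [1 + 2 * x + rng.expovariate(1.0) for x in xs]   # Normal model is wrong
print("normal errors:     ", u_test(xs, ys_normal))
print("exponential errors:", u_test(xs, ys_skewed))
```

With exponential errors the fitted Normal model places essentially no CDF values in the lowest interval, so U is very large and the p-value is effectively zero. For other distributional families only Steps 1 and 2 change: the model is estimated by its own likelihood and its own CDF is evaluated.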

The figures on the diagonals of Tables 2 and 3 show the coverage of the test 

when the model is correctly specified.  The results fell along the line 

representing accurate coverage: the eCDF corresponds to the significance 

level.  Not surprisingly, the empirical degrees of freedom (the thick lighter 

line) were more accurate than using the approximate degrees of freedom of 

7.5; however, the differences were slight, particularly up to the 0.1 

significance level. 

The off-diagonal figures in Tables 2 and 3 show the rejection rate for the 

test of misspecified models across significance levels.  The test was 

sufficiently powerful for some of the model/DGP combinations to reject the 

model for all 2000 samples at all significance levels greater than 0.001.  

Results for these combinations are simply indicated by the phrase ‘ALL 

DATA SETS REJECTED AT SIGNIFICANCE LEVEL 0.001’.   Not 

surprisingly, comparing Table 2 to Table 3, the curve has a greater departure 

from the straight line in Table 3; it is evident that the power of the test 

increases with sample size. It is also clear that using the approximate 7.5 degrees of freedom provides results similar to those from using the Monte Carlo determined empirical degrees of freedom.

Table 1. Probability density functions of the true data generating process (solid curve) and the estimated model (dashed curve) evaluated at X=0 for the Monte Carlo simulations.

Table 2. Monte Carlo Simulation: Empirical CDFs of Experiments on Distribution Specifications (N=500).

Table 4 presents results for the second set of experiments, which tested 

deviations from correct specification in the parameterization. The upper two 

rows show results for sample sizes of 500; the lower two rows show results for 

sample sizes of 1000.  Similar to the first set of experiments, results showed 

accurate coverage for the test when the model was correctly specified and the 

ability to discern deviations from correct specification in parameterization.  As 

the sample size went up, the power of the test to discern such deviations 

increased. The approximate 7.5 degrees of freedom yields results that were 

similar to the Monte Carlo calculated empirical degrees of freedom. 

4  Example 

To present an example with real data, we used a random sample of 2000 

individuals from the Household Component of the 2011 Medical Expenditure 

Panel Survey data file (MEPS). As one of the largest national health surveys,

 
MEPS has been widely used to study the patterns of health care access, 

utilization and expenditures in the United States (Cohen et al., 2009). We 

modeled each of the three outcomes – annual total health care expenditure, 

total office-based visits expenditure, and total dental care expenditure – as a 

function of individual demographics, socioeconomic status, self-rated health 

status, common chronic conditions, presence of usual source of care provider, 

and health insurance coverage. These covariates were selected in accordance 

with prior studies focusing on modeling health care costs using MEPS survey 

data (Fenton et al., 2012, Fleishman and Cohen, 2010). 

For each model, we included all individuals who reported an expense on the 

outcome of interest and took the log of the expenditure as the dependent 

variable. There were 1527 and 1215 individuals reporting expenses on health 

care and office-based services, which represented 76.4% and 60.8% of the 

total sample, respectively. Far fewer individuals reported any expenses on

dental care (N= 724, 36.2%).  Appendix Table A3 presents the descriptive 

statistics and distribution of the outcome variables and the covariates that we 

employed in the model.   

Table 3. Monte Carlo Simulation: Empirical CDFs of Experiments on Distribution Specifications (N=1000).

We used Pregibon’s link test (Pregibon, 1980) to identify a statistically
adequate specification of the explanatory variables for each model. We then

computed U to test the hypothesis that the specified distribution was correct. 

This allows us to use the test to focus on testing deviations in the distributional 

family. We calculated the p-value based on the approximate degrees of 

freedom of 7.5 and the empirical degrees of freedom calculated from the 

parameter estimates of the specified model, based on 500 Monte Carlo 

samples. The algorithm for computing the empirical degrees of freedom is 

shown in Box 2.   Table 5 presents the results from the empirical example for 

the three health care expenditure outcomes. The test clearly discerns the 

goodness-of-fit performance of different distributions. Results for the model of 

the logarithm of total health care expenditure strongly rejected the hypothesis 

that the conditional distribution follows a Gamma, Weibull or Gumbel 

distribution (U ranges from 21.985 to 116.578, p-value < 0.001 for all), and 

unequivocally failed to reject the hypothesis for normal (U = 5.304, p-value = 

0.676 with approximate degrees of freedom of 7.5).

Table 4. Monte Carlo Simulation: Empirical CDFs of Experiments on Parameter Specifications.

For the model of office-based visits expenditure, we strongly rejected the hypotheses for the Gumbel and Weibull distributions (U equal to 59.626 and 33.776, respectively; p-value < 0.001 for both) and failed to reject the Normal (U = 12.467, p-value = 0.107) or

Gamma (U = 8.897, p-value = 0.305). For the model of dental care 

expenditure, we rejected all distributions except for the Gumbel (U = 14.333, 

p-value = 0.058 with degrees of freedom of 7.5). Figures A1-A3, in the 

Appendix, show the histograms of the residuals obtained from these models, 

standardized by the estimated standard deviations.  Figure A1 shows the 

symmetry expected of a Normal distribution, which was not rejected by the 

test that unambiguously rejected the other distributions. Figure A2 shows a 

right skewness characteristic of a Gamma distribution (Model 1), but it is

insufficiently skewed to reject the Normal at a significance level of 0.05. 

However, U is smaller in the Gamma indicating a better fit to the data.   Under 

certain circumstances (i.e. shape parameter sufficiently large, >15), the 

Gamma distribution is approximately a Normal distribution (Rothschild and 

Logothetis, 1986). In this real-data example, the estimated shape parameter 

equaled 35 in the model assuming a Gamma distribution. It is therefore not

surprising the test did not reject either the Gamma or the Normal distributions. 

Figure A3 demonstrates the clear right skewness of the residuals from the

model of dental care expenditure, which is expected of a Gumbel distribution. 

The calculated Monte Carlo empirical degrees of freedom were

approximately 7.5 for all three outcomes and therefore yielded similar results.  

As there were 21 variables in the empirical model, these results again show 

that the degrees of freedom for the statistic distribution based on 10 categories 

is approximately 7.5 in multivariable models.  

 
Table 5. Empirical Example: Goodness-of-Fit Tests on Conditional Probability 

Models for Log-Transformed Health Expenditures from MEPS. 

5  Conclusion 

In this paper, we presented a simple specification test for conditional 

continuous distributions using uncensored data (see Box 1).  We showed, 

using simulation experiments, that the test has accurate coverage under correct 

specification, and that the test can discern deviations from correct specification 

in both the distributional family as well as parameterization. The empirical 

example shows its ability to distinguish specific distributions from other 

candidates using real data. 



 
The results of our analysis indicate that U is approximately distributed Chi-

square with degrees of freedom 7.5.  We also provide a Monte Carlo method 

for an empirical determination of degrees of freedom in Box 2 and a direct 

estimator in the Appendix should the researcher not wish to use the 

approximating 7.5, for example when the p-value using the approximating 7.5 

degrees of freedom is close to the test’s designated significance level.  

However, comparing the empirical degrees of freedom to 7.5 across all Monte 

Carlo experiments and real-data analyses of our study, the differences were 

slight and not likely to impact inferences.  If a researcher does not wish to 

approximate the distribution using a Chi-square, a p-value based on the Monte 

Carlo distribution of statistic values generated in the process of Box 2 can be 

used as a parametric bootstrap test (Davison et al., 2003). 
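A minimal sketch of the parametric bootstrap p-value mentioned above: the observed U is compared directly with the U values simulated under the fitted model, so no Chi-square approximation is involved. The function name and the toy numbers are hypothetical, and the "+1" in numerator and denominator is one common convention for Monte Carlo p-values, adopted here as an assumption rather than taken from the paper.

```python
def bootstrap_p_value(u_observed, u_simulated):
    """Share of simulated U values at least as large as the observed U.

    Counts the observed value as one of the draws (the "+1" convention),
    so the p-value is never exactly zero.
    """
    exceed = sum(u >= u_observed for u in u_simulated)
    return (1 + exceed) / (1 + len(u_simulated))

# Toy values standing in for the U statistics saved in Step 7 of Box 2.
simulated = [1, 2, 3, 6, 7, 8, 9, 10, 11]
print(bootstrap_p_value(5, simulated))
```

In practice `u_simulated` would hold the several hundred statistic values saved while computing the empirical degrees of freedom, so the bootstrap p-value comes at no extra simulation cost.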

Because the test discerns deviations in parameterization as well as the 

distributional family, an extra step is required to investigate the distributional 

family alone.  Specifically, the researcher should engage in standard tests to 

identify the best parameter specification within each proposed model (e.g. we 

used Pregibon’s link test in the preceding example).  Using the best within-

family model specification, the test will then primarily be identifying 

deviations in the distributional family.  

It is important to note that our results using multiple explanatory variables 

in the models indicate the degrees of freedom for the statistic’s distribution is 

not a function of the number of estimated parameters. This is different from the direct application of the Pearson Chi-square test to distributions with multiple parameters in which the degrees of freedom depend on the number of parameters m. This is an advantage since the degrees of freedom in the latter case is typically K − m − 1, which implies m must be less than K − 1 for those applications (Schervish, 1995): our test does not have this constraint.

BOX 2. How to calculate the empirical degrees of freedom

Step 1. Obtain the parameter estimates predicted from the estimated model ($\hat{\theta}$).

Step 2. Generate outcome values as random draws from the distribution F(y | xi; $\hat{\theta}$) defined by the estimated parameters for all xi in the data.

Step 3. Re-estimate the model using the generated outcomes.

Step 4. Obtain the predicted parameter estimates from the ‘correctly’ specified model re-estimated in Step 3.

Step 5. Calculate the value of ui for each observation, using the generated outcomes and the estimates from Step 4.

Step 6. Calculate U.

Step 7. Repeat steps 2 through 6 multiple times (e.g. we repeated 500 times in the empirical example), saving the statistic values.

Step 8. Set the degrees of freedom to the mean of the calculated U values.
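The algorithm in Box 2 can be sketched as follows, again assuming for illustration a Normal model with mean linear in a single covariate and hypothetical simulated data; the helper names are not from the paper. The fitted model is treated as the truth, new outcomes are drawn from it, and the mean of the resulting U values is taken as the degrees of freedom.

```python
import math
import random
from statistics import NormalDist

def fit_normal_linear(xs, ys):
    """MLE of Y | x ~ Normal(b0 + b1*x, s) with a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    s = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / n)
    return b0, b1, s

def u_statistic_normal(xs, ys, k=10):
    """Fit the Normal model, transform to CDF values, and compute U."""
    b0, b1, s = fit_normal_linear(xs, ys)            # Steps 3-4 when ys is simulated
    std = NormalDist()
    counts = [0] * k
    for x, y in zip(xs, ys):
        u = std.cdf((y - b0 - b1 * x) / s)           # Step 5
        counts[min(int(u * k), k - 1)] += 1
    n = len(xs)
    return k * n * sum((c / n - 1 / k) ** 2 for c in counts)   # Step 6

def empirical_df(xs, ys, reps=300, k=10, seed=0):
    """Monte Carlo degrees of freedom: mean of U under the fitted model."""
    rng = random.Random(seed)
    b0, b1, s = fit_normal_linear(xs, ys)            # Step 1
    u_values = []
    for _ in range(reps):                            # Step 7
        y_sim = [rng.gauss(b0 + b1 * x, s) for x in xs]      # Step 2
        u_values.append(u_statistic_normal(xs, y_sim, k))
    return sum(u_values) / reps, u_values            # Step 8

# Hypothetical data set used only to have something to fit.
rng = random.Random(3)
xs = [rng.uniform(-5, 10) for _ in range(300)]
ys = [1 + 2 * x + rng.gauss(0, 1.5) for x in xs]
df_hat, u_values = empirical_df(xs, ys)
print(round(df_hat, 2))
```

The number of repetitions (300 here, 500 in the empirical example) trades run time against Monte Carlo error in the estimated degrees of freedom. The saved `u_values` can also serve as the reference distribution for a parametric bootstrap p-value.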

Although our test can be used as a goodness-of-fit test for marginal 

distributions, it is particularly useful as an easy-to-use model fit test of 

continuous conditional distributions for uncensored data, particularly in the 

case of few observations, indeed even one observation per pattern of 

explanatory variables, such as a time-series. 

 

 
References 

AMEMIYA, T. 1985. Advanced Econometrics, Cambridge, MA, Harvard 

University Press. 

ANDREWS, D. W. K. 1997. A Conditional Kolmogorov Test. Econometrica, 

65, 1097-1128. 

COHEN, J. W., COHEN, S. B. & BANTHIN, J. S. 2009. The medical 

expenditure panel survey: a national information resource to support 

healthcare cost research and inform policy and practice. Med. Care., 47, 

S44-50. 

DAVISON, A. C., HINKLEY, D. V. & YOUNG, G. A. 2003. Recent 

developments in bootstrap methodology. Statistical Science, 18, 141-157. 

FAN, Y. Q., LI, Q. & MIN, I. 2006. A Nonparametric bootstrap test of 

conditional distributions. Economet. Theor., 22, 587-613. 

FENTON, J. J., JERANT, A. F., BERTAKIS, K. D. & FRANKS, P. 2012. The 

cost of satisfaction: a national study of patient satisfaction, health care 

utilization, expenditures, and mortality. Arch. Intern. Med., 172, 405-11. 

FLEISHMAN, J. A. & COHEN, J. W. 2010. Using Information on Clinical 

Conditions to Predict High-Cost Patients. Health. Serv. Res., 45, 532-552. 

HOSMER, D. W. & LEMESHOW, S. 1980. A goodness-of-fit test for the 

multiple logistic regression model. Communications in Statistics Part A-

Theory and Methods, 10, 1043-1069. 

MOSCHOPOULOS, P. G. 1985. The distribution of the sum of independent 

gamma random variables. Annals of the Institute of Statistical Mathematics, 

37, 541-544. 

O'REILLY, F. J. & QUESENBERRY, C. P. 1973. The Conditional Probability 

Integral Transformation and Applications to Obtain Composite Chi-Square 

Goodness-of-Fit Tests. Ann. Statist., 1, 74-83. 

O'REILLY, F. J. & STEPHENS, M. A. 1982. Characterizations and Goodness 

of Fit Tests. J. Roy. Statist. Soc. Ser. B, 44, 353-360. 

PREGIBON, D. 1980. Goodness of Link Tests for Generalized Linear Models. 

J. Roy. Statist. Soc. Ser. C, 29, 15-24. 

RINCON-GALLARDO, S., QUESENBERRY, C. P. & O'REILLY, F. J. 1979. 

Conditional Probability Integral Transformations and Goodness-of-Fit Tests 

for Multivariate Normal Distributions. Ann. Statist., 7, 1052-1057. 

ROTHSCHILD, V. & LOGOTHETIS, N. 1986. Probability distributions, 

Wiley. 

SCHERVISH, M. J. 1995. Theory of Statistics, New York, Springer-Verlag. 

ZHENG, J. X. 2000. A consistent test of conditional parametric distributions.

Economet. Theor., 16, 667-691. 

 

 
Appendix 

A1  The Expected Value of the U-Statistic 

The expected value of U is the expected value associated with the 

distribution of the standard Pearson Chi-square goodness-of-fit statistic minus 

a factor due to estimating the parameters of the model.  In this appendix we 

provide the determination of the expected value, and we provide an estimator 

for the adjustment factor and thereby an estimator of the expected value for the 

proposed statistic. 

The expected value of U is proportional to the sum of expected values 

across the K equal-length regions of the partition of the unit interval being 

considered: 

\[
\begin{aligned}
E(U) &= E\Big[K \cdot N \sum_{k} (\hat{P}_k - P_k)^2\Big] \\
     &= K \cdot N \sum_{k} E\big[(\hat{P}_k - P_k)^2\big]
\end{aligned}
\]




 

The expected values under the summation sign on the right-hand side of this 

equation are variances. This is seen by denoting an indicator of whether 

observation i falls in region k as 

\[
I_{k,i} =
\begin{cases}
1 & u_i \in ((k-1) \times 0.1,\; k \times 0.1) \\
0 & \text{otherwise}
\end{cases}
\]


 
and noting that the expected value of the estimated proportion in category k is  

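\[
\begin{aligned}
E[\hat{P}_k] &= \int E[\hat{P}_k \mid \hat{\theta}] \, dF(\hat{\theta}) \\
&= \int E\Big[\tfrac{1}{N} \sum_{i=1}^{N} I_{k,i} \,\Big|\, \hat{\theta}\Big] \, dF(\hat{\theta}) \\
&= \int \tfrac{1}{N} \sum_{i=1}^{N} E[I_{k,i} \mid \hat{\theta}] \, dF(\hat{\theta}) \\
&= \int \tfrac{1}{N} \sum_{i=1}^{N} P_{k,i}(\hat{\theta}) \, dF(\hat{\theta}) \\
&= \int P_k(\hat{\theta}) \, dF(\hat{\theta}), \quad \text{for } P_{k,i} = P_k \text{ for all } i \\
&= E(P_k(\hat{\theta}))
\end{aligned}
\]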











 
To determine $E(P_k(\hat{\theta}))$, consider a first-order Taylor series approximation

around the true value $\theta$



 
\[
P_k(\hat{\theta}) = P_k(\theta) + \frac{\partial P_k}{\partial \theta} \cdot (\hat{\theta} - \theta),
\]

which yields 

\[
\sqrt{N}\,(P_k(\hat{\theta}) - P_k(\theta)) = \sqrt{N}\, \frac{\partial P_k}{\partial \theta} \cdot (\hat{\theta} - \theta).
\]

For an estimator, such as the maximum likelihood estimator, for which 

$\sqrt{N}(\hat{\theta} - \theta)$ converges to a normal distribution $N(0, \Sigma)$ by a central limit 

theorem, the left-hand side converges in distribution to a normal as well: 

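\[
\sqrt{N}\,(P_k(\hat{\theta}) - P_k(\theta)) \xrightarrow{d} N\Big(0,\; \frac{\partial P_k}{\partial \theta} \, \Sigma \, \frac{\partial P_k}{\partial \theta}'\Big).
\]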

Therefore, $P_k(\hat{\theta})$ has an asymptotic distribution with expected value of

$E[P_k(\hat{\theta})] = P_k(\theta)$ and variance of

\[
V[P_k(\hat{\theta})] = \frac{1}{N} \, \frac{\partial P_k}{\partial \theta} \, \Sigma \, \frac{\partial P_k}{\partial \theta}'.
\]

Consequently, since $E[P_k(\hat{\theta})] = P_k(\theta)$,

\[
E[(\hat{P}_k - P_k)^2] = V[\hat{P}_k].
\]

The expected value of U is then proportional to the sum of variances: 

\[
E\Big[K \cdot N \sum_{k} (\hat{P}_k - P_k)^2\Big] = K \cdot N \cdot \sum_{k} V[\hat{P}_k].
\]

The variance terms under the summation sign on the right-hand side are 

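\[
\begin{aligned}
V[\hat{P}_k] &= \int V[\hat{P}_k \mid \hat{\theta}] \, dF(\hat{\theta}) \\
&= \int V\Big[\tfrac{1}{N} \sum_{i=1}^{N} I_{k,i} \,\Big|\, \hat{\theta}\Big] \, dF(\hat{\theta}) \\
&= \int \tfrac{1}{N^2} \sum_{i=1}^{N} V[I_{k,i} \mid \hat{\theta}] \, dF(\hat{\theta}), \quad \text{for independent observations} \\
&= \int \tfrac{1}{N} \, P_k(\hat{\theta})\,(1 - P_k(\hat{\theta})) \, dF(\hat{\theta}) \\
&= \tfrac{1}{N} \int \big(P_k(\hat{\theta}) - P_k(\hat{\theta})^2\big) \, dF(\hat{\theta}) \\
&= \tfrac{1}{N} \big[E(P_k(\hat{\theta})) - E(P_k(\hat{\theta}))^2 - V(P_k(\hat{\theta}))\big] \\
&= \tfrac{1}{N} \big[P_k(\theta) - P_k(\theta)^2 - V(P_k(\hat{\theta}))\big]
\end{aligned}
\]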

 
Therefore, 



 
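\[
\begin{aligned}
E\Big[K \cdot N \sum_{k} (\hat{P}_k - P_k)^2\Big] &= K \cdot N \cdot \sum_{k} \tfrac{1}{N} \big[(P_k - P_k^2) - V(P_k(\hat{\theta}))\big] \\
&= K \cdot N \cdot \tfrac{1}{N} \sum_{k} \Big[\Big(\tfrac{1}{K} - \tfrac{1}{K^2}\Big) - V(P_k(\hat{\theta}))\Big] \\
&= (K - 1) - K \cdot \sum_{k} V(P_k(\hat{\theta}))
\end{aligned}
\]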

 





 
The expected value of U is the degrees of freedom for a common Pearson 

Chi-Square test statistic (i.e. K − 1) minus a factor due to estimation of the 

distribution parameters.  For K = 10, the expected value of U is then

$9 - 10 \cdot \sum_{k} V(P_k(\hat{\theta}))$.
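
The relationship between the statistic and its expected value can be sketched in Python. This is a minimal illustration, assuming U = K·N·Σ_k (P̂_k − 1/K)² as implied by the first equation of this appendix with P_k = 1/K. The function names `u_statistic` and `expected_u` and the variance values in the example are hypothetical and are not taken from the paper.

```python
def u_statistic(u_values, K=10):
    """Compute U = K * N * sum_k (P_hat_k - 1/K)^2 from values in [0, 1]."""
    n = len(u_values)
    counts = [0] * K
    for u in u_values:
        # Region k covers ((k-1)/K, k/K). Values exactly on a boundary are
        # assigned to the upper region here; min() keeps u == 1.0 in the last one.
        k = min(int(u * K), K - 1)
        counts[k] += 1
    return K * n * sum((c / n - 1.0 / K) ** 2 for c in counts)


def expected_u(variances, K=10):
    """Expected value (K - 1) - K * sum_k V(P_k(theta_hat))."""
    if len(variances) != K:
        raise ValueError("need one variance term per region")
    return (K - 1) - K * sum(variances)


# One observation at the midpoint of each region: every P_hat_k equals 1/K.
even = [(k + 0.5) / 10 for k in range(10)]
print(u_statistic(even))

# Hypothetical variance terms, used only to show the arithmetic.
print(expected_u([0.01] * 10))
```

With all variance terms equal to zero, `expected_u` returns K − 1, the degrees of freedom of the common Pearson Chi-square statistic.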

A2  Estimation of the Shrinkage Factor 

The variance terms in the shrinkage factor can be estimated by using 

consistent estimators for the derivatives $\partial P_k / \partial \theta$ and the covariance matrix $\Sigma$.  The

derivative of $P_k$ is determined by noting that

\[
P_k = \int \big[F(y^*_k(x) \mid x; \theta) - F(y^*_{k-1}(x) \mid x; \theta)\big] \, dF_x(x),
\]

for which $y^*_k$ are the critical values

\[
y^*_k(x) = F^{-1}\Big(\frac{k}{K} \,\Big|\, x\Big).
\]

Therefore, assuming we can interchange the order of integration and 

differentiation, 

\[
\frac{\partial P_k}{\partial \theta} = \int \frac{\partial}{\partial \theta} F(y^*_k(x) \mid x; \theta) \, dF_x(x) - \int \frac{\partial}{\partial \theta} F(y^*_{k-1}(x) \mid x; \theta) \, dF_x(x).
\]

Estimating the integrals on the right-hand side of the equation by sample 

means yields the estimator 

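\[
\widehat{\frac{\partial P_k}{\partial \theta}} = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial \hat{\theta}} F(y^*_k(x_i) \mid x_i; \hat{\theta}) - \frac{1}{N} \sum_{i=1}^{N} \frac{\partial}{\partial \hat{\theta}} F(y^*_{k-1}(x_i) \mid x_i; \hat{\theta}).
\]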

The estimator for the variances in the shrinkage factor is therefore 

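\[
\hat{V}[P_k(\hat{\theta})] = \frac{1}{N} \, \widehat{\frac{\partial P_k}{\partial \theta}} \, \hat{\Sigma} \, \widehat{\frac{\partial P_k}{\partial \theta}}'.
\]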



 
For the maximum likelihood estimator, note that the scaled deviation of the 

estimator converges in distribution to a normal:  

\[
\sqrt{N}\,(\hat{\theta} - \theta) \xrightarrow{d} N\Big(0,\; \big[-\tfrac{1}{N} E(H(\theta))\big]^{-1}\Big),
\]

for H denoting the matrix of second derivatives of the log-likelihood with 

respect to the parameters.  Therefore,   

\[
\begin{aligned}
\Sigma &= \big[-\tfrac{1}{N} E(H(\theta))\big]^{-1} \\
&= N \cdot \big[-E(H(\theta))\big]^{-1}
\end{aligned}
\]
 

Using the sample mean for the expectation of the Hessian, evaluated at the 

estimated parameter values, yields the estimator 

\[
\hat{\Sigma} = N^2 \cdot [-H(\hat{\theta})]^{-1}.
\]

The estimated variance of $P_k(\hat{\theta})$ is then

\[
\hat{V}[P_k(\hat{\theta})] = \widehat{\frac{\partial P_k}{\partial \theta}} \cdot N \cdot [-H(\hat{\theta})]^{-1} \cdot \widehat{\frac{\partial P_k}{\partial \theta}}'.
\]
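
The quadratic form in this expression can be evaluated without forming the inverse of the Hessian explicitly. The following Python sketch solves the linear system [−H] z = g by Gaussian elimination and then returns N · g · z, where g stands for the estimated derivative vector. The function names and the two-parameter example values are hypothetical; in practice H and g would come from the fitted model.

```python
def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def variance_of_pk(grad, hessian, n_obs):
    """Evaluate grad * N * [-H]^{-1} * grad' for one region k."""
    neg_h = [[-v for v in row] for row in hessian]
    z = solve_linear(neg_h, grad)  # z = [-H]^{-1} grad'
    return n_obs * sum(g * zi for g, zi in zip(grad, z))


# Hypothetical two-parameter example.
grad = [0.02, -0.01]
hessian = [[-50.0, 5.0], [5.0, -20.0]]
print(variance_of_pk(grad, hessian, n_obs=100))
```

Solving the linear system rather than inverting the matrix is a numerical convenience only; both routes evaluate the same expression.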

For example, consider the Weibull distribution specified in Table A1.  The 

Weibull CDF is  

\[
F(y \mid x) = 1 - e^{-\left(e^{a_0 + a_1 x} \, y^{e^{b_0 + b_1 x}}\right)}.
\]

The derivatives with respect to the parameters are 

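\[
\begin{aligned}
\frac{\partial F}{\partial a_0} &= D \\
\frac{\partial F}{\partial a_1} &= D \cdot x \\
\frac{\partial F}{\partial b_0} &= D \cdot e^{b_0 + b_1 x} \cdot \ln(y) \\
\frac{\partial F}{\partial b_1} &= D \cdot e^{b_0 + b_1 x} \cdot \ln(y) \cdot x
\end{aligned}
\]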



 
where

\[
D = y^{e^{b_0 + b_1 x}} \cdot e^{a_0 + a_1 x} \cdot e^{-e^{a_0 + a_1 x} \, y^{e^{b_0 + b_1 x}}}.
\]
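
As a check on these expressions, the analytic derivatives can be compared with central finite differences of the CDF. The Python sketch below assumes the Weibull CDF F(y | x) = 1 − exp(−e^{a0 + a1 x} · y^{e^{b0 + b1 x}}); the variable names `lam` and `kappa`, the evaluation point, and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def weibull_cdf(y, x, a0, a1, b0, b1):
    """F(y | x) = 1 - exp(-exp(a0 + a1*x) * y**exp(b0 + b1*x))."""
    return 1.0 - math.exp(-math.exp(a0 + a1 * x) * y ** math.exp(b0 + b1 * x))


def weibull_cdf_gradient(y, x, a0, a1, b0, b1):
    """Analytic derivatives of F with respect to (a0, a1, b0, b1)."""
    lam = math.exp(a0 + a1 * x)
    kappa = math.exp(b0 + b1 * x)
    d = y ** kappa * lam * math.exp(-lam * y ** kappa)  # the common factor D
    return [d, d * x, d * kappa * math.log(y), d * kappa * math.log(y) * x]


def numeric_gradient(y, x, params, h=1e-6):
    """Central finite differences, used only as a check."""
    grads = []
    for j in range(len(params)):
        up = list(params)
        down = list(params)
        up[j] += h
        down[j] -= h
        diff = weibull_cdf(y, x, *up) - weibull_cdf(y, x, *down)
        grads.append(diff / (2 * h))
    return grads


params = [-1.0, 0.3, 0.2, -0.1]  # hypothetical (a0, a1, b0, b1)
analytic = weibull_cdf_gradient(2.0, 1.5, *params)
numeric = numeric_gradient(2.0, 1.5, params)
# Largest absolute discrepancy between the two gradients.
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```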



 
Evaluating each of these derivatives for each observation in the sample $i \in \{1, \dots, N\}$ at the estimated parameter values, data values $x_i$, and the corresponding critical values $y^*_0(x_i)$ and $y^*_k(x_i)$ for each $k \in \{1, \dots, 10\}$ creates variables for which the sample means can be used to determine $\widehat{\partial P_k / \partial \theta}$.  These estimated derivatives combined with the estimated parameter covariance matrix $\hat{\Sigma}$ provide the information to calculate the shrinkage factor as shown above.   
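
The procedure just described can be sketched in Python for the Weibull example. This is an illustration under stated assumptions, not the authors' implementation: it assumes the Weibull CDF F(y | x) = 1 − exp(−e^{a0 + a1 x} · y^{e^{b0 + b1 x}}), takes the estimated covariance matrix as an input supplied by the caller, and uses hypothetical function names, data values, and parameter values.

```python
import math


def grad_at_quantile(p, x, a0, a1, b0, b1):
    """Gradient of the Weibull CDF at the critical value y with F(y | x) = p."""
    if p <= 0.0 or p >= 1.0:
        # y* is 0 or infinity, where F equals 0 or 1 for every parameter value.
        return [0.0, 0.0, 0.0, 0.0]
    lam = math.exp(a0 + a1 * x)
    kappa = math.exp(b0 + b1 * x)
    y = (-math.log(1.0 - p) / lam) ** (1.0 / kappa)  # y* = F^{-1}(p | x)
    d = y ** kappa * lam * math.exp(-lam * y ** kappa)
    return [d, d * x, d * kappa * math.log(y), d * kappa * math.log(y) * x]


def pk_gradients(xs, theta, K=10):
    """Sample-mean estimates of dP_k/dtheta for k = 1, ..., K."""
    n = len(xs)
    out = []
    for k in range(1, K + 1):
        g = [0.0] * 4
        for x in xs:
            upper = grad_at_quantile(k / K, x, *theta)
            lower = grad_at_quantile((k - 1) / K, x, *theta)
            g = [gj + (u - l) / n for gj, u, l in zip(g, upper, lower)]
        out.append(g)
    return out


def shrinkage_factor(xs, theta, sigma, K=10):
    """K * sum_k (1/N) * g_k Sigma g_k', with Sigma supplied by the caller."""
    n = len(xs)
    total = 0.0
    for g in pk_gradients(xs, theta, K):
        quad = sum(g[r] * sigma[r][c] * g[c] for r in range(4) for c in range(4))
        total += quad / n
    return K * total


xs = [0.0, 0.5, 1.0, 2.0]        # hypothetical covariate values
theta = [-1.0, 0.3, 0.2, -0.1]   # hypothetical (a0, a1, b0, b1)
sigma = [[0.5 if r == c else 0.0 for c in range(4)] for r in range(4)]
print(shrinkage_factor(xs, theta, sigma))
```

Because the K region probabilities sum to one, the estimated derivative vectors sum to zero across k, which gives a simple sanity check on any implementation.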

 Table A0 presents the means of the estimated expected value of U 

using the above equations and means of the calculated U values across 

100,000 data sets of sample sizes 100, 1000, and 10,000.   The mean estimated

E(U) was very similar to the mean of U values, rounding to 7.37 for each.  An

alternative for estimating the expected value of U (i.e. degrees of freedom for

an approximating Chi-square distribution) is the Monte Carlo method shown in

Box 2 of the main text. 

 
Table A0.  Mean estimated E(U) and mean U across 100,000 samples. 

 
A3  Additional Tables and Figures 

 

 
Table A1. Simulation process: conditional distribution of the data for the test of 

incorrect distributional family. 

 
True Data Generating Process 

Normal/Normal (homoscedasticity) 

Normal/Normal (heteroscedasticity) 

Gumbel/Gumbel: Location, Scale 

Gamma/Gamma: Shape, Scale 

Weibull/Weibull: Shape, Scale 

Table A2. Simulation process: conditional distribution 

of the data for the test of incorrect parameterization 

 

 
Table A3. Distribution of the cost-related outcome variables and patient 

characteristics. 

 

 
[Histogram. Horizontal axis: Standardized Residual (-4 to 4). Vertical axis: Density (0 to .4).]

 
Figure A1. Histogram of the Standardized Residual from the Model for Annual Total 

Health Care Expenditure. Model: MLE assuming Normal distribution with 

heteroskedasticity. 

 

 
Figure A2. Histogram of the Standardized Residual from the Model for Annual Total

Expenditures on Office-Based Visits. 

 

 
Figure A3. Histogram of the Standardized Residual from the Model for Annual Total 

Expenditures on Dental Care.