Ratio Mathematica Volume 39, 2020, pp. 7-32

A simple goodness-of-fit test for continuous conditional distributions

Peter Veazie* Zhiqiu Ye†

Abstract

This paper presents a pragmatic specification test for conditional continuous distributions with uncensored data. We employ Monte Carlo (MC) experiments and the 2011 Medical Expenditure Panel Survey data to examine coverage and the power to discern deviations from the correct model specification in distribution and parameterization. We carry out MC experiments using 2000 runs for sample sizes 500 and 1000. The experiments show that the test has accurate coverage under correct specification, and that the test can discern deviations from the correct specification in both the distributional family and parameterization. The power increases as sample size increases. The empirical example shows the test’s ability to identify specific distributions from other candidates using real cost data. Although the test can be used as a goodness-of-fit test for marginal distributions, it is particularly useful as an easy-to-use test for conditional continuous distributions, even those with one observation per pattern of explanatory variables.

Keywords: Goodness-of-fit test; model specification test; conditional continuous distributions.

2010 AMS subject classification: 62F03‡

* University of Rochester, Rochester New York, USA; peter_veazie@urmc.rochester.edu.
† University of Rochester, Rochester New York, USA; sophieye999@gmail.com.
‡ Received on June 10th, 2020. Accepted on December 17th, 2020. Published on December 31st, 2020. doi: 10.23755/rm.v39i0.524. ISSN: 1592-7415. eISSN: 2282-8214. ©Peter Veazie et al. This paper is published under the CC-BY licence agreement.

1 Introduction

To determine whether a probability model is statistically adequate for representing a data generating process (DGP), it is common to test whether the model fits a data set produced by that process. The investigation into the model specification of a conditional distribution is fundamental for methods such as Maximum Likelihood Estimation (MLE), which is consistent and asymptotically efficient only if the distribution is correctly specified (Amemiya, 1985). However, a general test of continuous conditional distribution models faces two key challenges if it is to be broadly adopted in applied sciences such as the social and health sciences. The first is the sparse empirical information regarding the conditional distribution when patterns of the explanatory variables have few corresponding observations. The second is ease of use: many researchers do not have the background, time, or inclination to engage in complicated programming in order to implement a statistical test, so to be useful to such researchers a test must be easily implemented.

Regarding sparse information, consider the data shown in Figure 1: although some data points appear close to each other, for most of the data there is no more than one observation at each value of x. Consequently, the empirical distribution of random variable Y conditioned on such a value for variable X is based on a trivial point mass. How then can we test a model of the conditional distribution of Y for such sparse data?

Regarding ease of use, existing tests for conditional distributions require more mathematical and computational skill than many applied researchers have, which limits their general adoption. Some of these tests require the use of kernel or local polynomial functions with arbitrary smoothing parameters (Zheng, 2000; Fan et al., 2006).
Others, such as the Conditional Kolmogorov Test, compare model and distribution functions while additionally incorporating the empirical distribution functions of the conditioning set of variables (Andrews, 1997). Transformations to the unit interval have been applied to construct tests for goodness of fit such as the Rincon-Gallardo et al. test for multivariate normality (Rincon-Gallardo et al., 1979). However, their method is also technically difficult and computationally intense in general applications due to the procedures involved in the transformation (O'Reilly and Quesenberry, 1973). Additionally, some are dependent on the order of the data being transformed (O'Reilly and Stephens, 1982); therefore, researchers may obtain different test results if the same data were ordered differently. What is needed for the applied researcher who does not have the mathematical or programming skills to meaningfully implement complex algorithms is a simple pragmatic test.

This paper presents a pragmatic general goodness-of-fit statistic for continuous conditional models using uncensored data. In the next section, we introduce the goodness-of-fit test and the rationale behind it. We then evaluate the performance of the goodness-of-fit statistic in Section 3 using two groups of Monte Carlo experiments. The first group of experiments focuses on discerning deviations from correct specification in the distributional family; the second group focuses on discerning deviations in parameterization. We choose these investigations because they represent the two misspecification issues in estimating conditional probability models. In Section 4, we apply the goodness-of-fit test to the 2011 Medical Expenditure Panel Survey (MEPS) dataset, modeling three health care expenditure outcomes as functions of patient characteristics. Finally, in Section 5, we conclude our paper with a summary of the findings and a discussion of the applications of the goodness-of-fit test.
The Appendix provides the expected value of the statistic and the procedure for the calculation of the degrees of freedom for the test statistic, the data generating process for each Monte Carlo experiment, and the analyses modelling cost data from MEPS.

Figure 1. A typical conditional Gumbel distribution with sparse observations for each conditional value of observed X. [Scatter plot of Y against X; panel title: F(Y | X), a Gumbel distribution.]

2 A proposed goodness-of-fit test

The Pearson Chi-square goodness-of-fit statistic is based on comparing the number of actual observations within each set of a partition of the random variable’s range to the number of observations that would be expected to show up in those sets if the model correctly represents the DGP (Schervish, 1995). If the model is correct, then the expected number of observations is the expected number for the DGP; consequently, the observed and predicted numbers in each set should differ merely by random variation.

The Chi-square goodness-of-fit statistic for continuous distributions is created by partitioning the range of a continuous random variable Y into K regions. Denote each region as $R_k$ for $k \in \{1, 2, \ldots, K\}$, the number of observations with values of y in region $R_k$ as $N_k$, and the total sample size as N. The probability of an observation with y in region $R_k$ is then

$$P_k = \int_{R_k} f_Y(y)\,dy$$

in which $f_Y(y)$ is the probability density for y associated with a cumulative distribution function (CDF) $F_Y(y)$. The Chi-square statistic is defined as

$$C = \sum_{k=1}^{K} \frac{(N_k - N \cdot P_k)^2}{N \cdot P_k}$$

The corresponding sample statistic is

$$C_N = \sum_{k=1}^{K} \frac{(N_k - N \cdot P_k(\hat{\theta}))^2}{N \cdot P_k(\hat{\theta})}$$

in which $\hat{\theta}$ is a consistent estimator of $\theta$. If $F_Y(y; \theta)$ accurately represents the data generating process, $C_N$ converges to $C$ with increasing N and the corresponding asymptotic distribution of $C_N$ is a Chi-square with degrees of freedom equal to the number of groups in the partition minus the number of estimated parameters minus one, i.e. $K - m - 1$ for m estimated parameters (Schervish, 1995).
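As a small numerical illustration of the Pearson statistic described above, the following sketch computes it for a marginal model whose parameters are fully specified rather than estimated. The exponential model, the two-region partition, and the names `exp_cdf` and `pearson_statistic` are assumptions chosen for illustration; they do not come from the paper.

```python
import math


def exp_cdf(y, rate=1.0):
    """CDF of an exponential distribution (illustrative model, not from the paper)."""
    return 1.0 - math.exp(-rate * y) if y > 0 else 0.0


def pearson_statistic(sample, cdf, edges):
    """Return (C, counts) for the partition defined by consecutive edges.

    Region k is [edges[k], edges[k + 1]); P_k is the model probability of that
    region and N_k is the number of sample values that fall in it.
    """
    n = len(sample)
    n_regions = len(edges) - 1
    counts = [0] * n_regions
    for value in sample:
        for k in range(n_regions):
            if edges[k] <= value < edges[k + 1]:
                counts[k] += 1
                break
    stat = 0.0
    for k in range(n_regions):
        p_k = cdf(edges[k + 1]) - cdf(edges[k])
        stat += (counts[k] - n * p_k) ** 2 / (n * p_k)
    return stat, counts


# Two equal-probability regions for the unit-rate exponential: split at the median.
edges = [0.0, math.log(2.0), math.inf]
stat, counts = pearson_statistic([0.1, 0.2, 0.3, 3.0], exp_cdf, edges)
print(counts, round(stat, 6))
```

With four observations and two regions of probability 0.5 each, the expected count per region is 2, so the observed counts of 3 and 1 each contribute 0.5 to the statistic.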
If we are interested in a model of the conditional distribution $F(y \mid x; \theta)$, the preceding statistic is not generally applicable because $N_k$ can contain insufficient observations to inform the conditional distribution. Indeed, with x containing precisely measured continuous variables, there may be only one value y for some observed x values (see Figure 1 as an example). However, we can take advantage of the probability integral transform and the consequent fact that the CDF of a continuous random variable is itself a random variable with a uniform distribution on the unit interval. Because the uniform distribution is the same regardless of the underlying CDF, a set of random variables from independent observations with different conditional distributions can all be converted by their CDFs to the same uniform distribution. We can use this fact to construct a test of the conditional distribution, even if each observation has a different conditioning value (i.e. the data in Figure 1 will pose no problem for this test).

Because the CDF for each random variable has a uniform distribution, the CDF value of the sample result for each random variable, obtained from a correctly specified model, is a single realization from a uniform distribution. Therefore, the full sample results should together provide a histogram that deviates only by chance from a uniform distribution. We can use a Pearson Chi-square type statistic applied to the uniform distribution to test the specification of the conditional distributions.

The process is quite simple. For each observation i we have a model specification for the distribution $F(y_i \mid x_i)$ and therefore can obtain from the estimated model the sample quantity $u_i = F(y_i \mid x_i)$, for which $(x_i, y_i)$ are the observed values for observation i. The random variable underlying $u_i$ has a uniform distribution on the unit interval if F is correctly specified.
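The probability integral transform argument can be checked numerically. In this minimal sketch, every observation has its own conditional Normal distribution, yet the values $u_i = F(y_i \mid x_i)$ computed from the true conditional CDF spread roughly evenly over the ten deciles of the unit interval. The data generating process (a mean of 1 + 2x and a standard deviation of exp(0.3x)) and the function names are illustrative assumptions, not the paper's specifications.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_pit(n, seed=0):
    """Draw (x_i, y_i) with a different Normal distribution per observation and
    return u_i = F(y_i | x_i) evaluated at the true conditional CDF."""
    rng = random.Random(seed)
    u_values = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        mean = 1.0 + 2.0 * x        # conditional mean (illustrative choice)
        sd = math.exp(0.3 * x)      # conditional standard deviation (illustrative)
        y = rng.gauss(mean, sd)
        u_values.append(normal_cdf((y - mean) / sd))
    return u_values


def decile_proportions(u_values):
    """Proportion of u values in each of ten equal-width intervals of [0, 1]."""
    counts = [0] * 10
    for u in u_values:
        counts[min(int(u * 10), 9)] += 1
    return [c / len(u_values) for c in counts]


proportions = decile_proportions(simulate_pit(5000))
print([round(p, 3) for p in proportions])
```

Each printed proportion should be close to 0.1, even though no two observations share the same conditional distribution.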
We can construct a goodness-of-fit test by partitioning the unit interval into K subintervals defined by equally spaced boundary points, which for K = 10 is

$$R_k = \big((k-1) \cdot 0.1,\; k \cdot 0.1\big) \quad \text{s.t. } k \in \{1, 2, \ldots, 10\}.$$

The statistic is then

$$U = \sum_{k=1}^{K} \frac{(N_k - N \cdot P_k)^2}{N \cdot P_k}$$

for which N is the total sample size, and $N_k$ is the observed number of u values in the $R_k$ interval. This statistic can alternatively be written as

$$U = N \cdot \sum_{k=1}^{K} \frac{(\hat{P}_k - P_k)^2}{P_k}$$

in which $\hat{P}_k = N_k / N$ is the observed proportion in interval $R_k$. Because the statistic is based on a partition of the uniform into K equal sized intervals, $P_k = 1/K$; therefore,

$$U = K \cdot N \cdot \sum_{k=1}^{K} \Big(\hat{P}_k - \frac{1}{K}\Big)^2.$$

As shown in the Appendix, the expected value of U, which is the degrees of freedom for its approximating Chi-square distribution, is equal to the degrees of freedom for the usual Pearson Chi-square test (i.e. K − 1) minus a factor due to the estimation of model parameters. Since $P_k$ is known, which in the case of K = 10 intervals is 0.1, we can simply state the statistic for K = 10 as

$$U = 10 \cdot N \cdot \sum_{k=1}^{10} (\hat{P}_k - 0.1)^2.$$

The selection of K = 10 is arbitrary, as it is with the Hosmer-Lemeshow test for logistic regression (Hosmer and Lemeshow, 1980). For other values of K, the degrees of freedom can be directly estimated as shown in the Appendix or determined by Monte Carlo simulation (see Box 2).

The U statistic has a distribution proportional to the sum of gamma random variables with different parameters. Specifically, denoting $\hat{P}_k - 1/K$ as $z_k$, as shown in the Appendix $z_k$ is asymptotically normally distributed with mean 0 and variance $\sigma_k^2$. Consequently, the ratio of $z_k^2$ to $\sigma_k^2$ has an asymptotic Chi-square distribution with 1 degree of freedom, which is a Gamma distribution with parameters 0.5 and 2 (i.e. $\Gamma(0.5, 2)$). Therefore, $z_k^2$ has a distribution $\sigma_k^2 \cdot \Gamma(0.5, 2)$, which is $\Gamma(0.5, 2\sigma_k^2)$. U is therefore proportional to the sum of K differently scaled gamma random variables. Moschopoulos shows that the sum of such variates can be expressed as a gamma series in which the series coefficients can be recursively determined (Moschopoulos, 1985).
The use of this recursive coefficient determination and gamma series is overly complex for the practical application of this statistic among many applied researchers. However, ease of use is the purpose of this goodness-of-fit statistic. Fortunately, the Monte Carlo experiments presented below indicate that for a correct specification the statistic is approximately Chi-square in distribution, with degrees of freedom 7.5 when K = 10, or with degrees of freedom calculated as shown in the Appendix or in Box 2 if K is not 10.

3 Simulation experiments

3.1 Methods

We investigated finite sample performance of the proposed statistic using Monte Carlo experiments of conditional Normal, Gumbel, Gamma, and Weibull models, each applied to data generating processes based on the same set of distributions. The first set of experiments comprised a total of sixteen model/DGP comparisons. We evaluated each model/DGP pair for sample sizes 500 and 1000, each using 2000 Monte Carlo samples from the DGP (see Appendix Table A1 for parameter specifications). We inspected rejection rates for significance levels spanning between 0 and 0.2 for each comparison. For each correct model/DGP pair (i.e. Normal/Normal, Gumbel/Gumbel, Gamma/Gamma, and Weibull/Weibull), the plot of the empirical cumulative distribution function (eCDF) of the calculated p-values, across the 2000 MC samples, should approximately match the significance level (i.e. this plot should be approximately a straight line). For example, the use of a significance level of 0.01 should reject the model for approximately 1 per cent of the 2000 samples; using a significance level of 0.05 should reject approximately 5 per cent of the samples; and a 0.1 significance level should result in approximately 10 per cent rejections. For mismatched pairs (e.g.
Weibull/Gumbel), if the fit test is useful it should produce rejection rates that are higher than the significance levels and increase with sample size; consequently, the eCDF of the test’s p-value should be above the significance level.

The second set of experiments compared models in which parameters are specified as linear in conditioning variables to the DGP having the same distributional family but with parameters quadratic in the conditioning variables (see Appendix Table A2 for parameter specifications). For the normal distribution, we estimated models with homoscedasticity and heteroscedasticity. In the case of heteroscedasticity both the mean and variance were generated as quadratic in X in the DGP, but they were modeled as linear in the misspecified model. As in the first set, we carried out experiments for sample sizes 500 and 1000. These experiments provide evidence regarding whether the test can identify deviations in parameterization as well as distributional family.

In the Monte Carlo experiments reported below, we applied the steps presented in Box 1 to obtain p-values for each of 2000 data sets generated for each model/DGP being considered. We calculated two p-values: one using degrees of freedom equal to 7.5, and one using the mean of the 2000 calculated U values for each DGP considered when using the correct model (remember that the degrees of freedom are associated with the distribution of U given the model is correct).

3.2 Results

Because we tested continuous conditional distributions, it is difficult to see the differences between the model and DGP for all patterns of explanatory variables. However, Table 1 shows the probability density functions for the true DGP (in the solid line) and the estimation model (the dotted line, using the average parameter values across the 2000 estimated models) evaluated at the mean of X.
This gives some sense of the differences between the distributions being tested in the first set of experiments; however, the deviation of the model from the underlying distribution that drives larger values of U may be from other regions of the conditioning set than at the mean of X.

Tables 2 and 3 present the eCDFs of the statistic’s p-values for each indicated model applied to the indicated DGP, plotted for significance levels up to 0.2. Table 2 presents results for sample sizes of 500; Table 3 presents results for sample sizes of 1000. The thin straight lines show the points where the eCDFs would be if they corresponded to the significance level. The thick dark lines (or curves) are the eCDFs associated with p-values based on degrees of freedom set to 7.5. The thick light lines (or curves) are the eCDFs associated with the Monte Carlo based empirical degrees of freedom. We determined the 7.5 degrees of freedom approximation by the average of the four empirical degrees of freedom across the DGPs using sample sizes of 1000. We also ran Monte Carlo experiments for correct model specifications using 10 correlated explanatory variables (results not presented); these experiments showed that the empirical degrees of freedom remained around 7.5 in multivariable models. Specifically, the means of the U statistics, and therefore the degrees of freedom, in these experiments for the Normal, Gamma, Weibull, and Gumbel were 7.52, 7.32, 7.54, and 7.27 respectively.

BOX 1. How to calculate U and its p-value using K = 10
Step 1. For a candidate model $F(y_i \mid x_i; \theta)$, estimate the parameters, obtaining $\hat{\theta}$.
Step 2. Calculate the CDF value $u_i = F(y_i \mid x_i; \hat{\theta})$ for each observation $(y_i, x_i)$ in the data.
Step 3. Calculate the proportion ($\hat{P}_k$) of $u_i$ in each of the ten intervals $R_k$ for $k \in \{1, 2, \ldots, 10\}$.
Step 4. Calculate the statistic U using the equation $U = 10 \cdot N \cdot \sum_{k=1}^{10} (\hat{P}_k - 0.1)^2$.
Step 5. Calculate the p-value as the upper tail area of a Chi-square distribution with degrees of freedom set to 7.5, or set to the estimated value determined by the equations presented in the Appendix, or set to the model-specific Monte Carlo determined empirical degrees of freedom (see Box 2 for the algorithm).

The figures on the diagonals of Tables 2 and 3 show the coverage of the test when the model is correctly specified. The results fell along the line representing accurate coverage: the eCDF corresponds to the significance level. Not surprisingly, the empirical degrees of freedom (the thick lighter line) were more accurate than using the approximate degrees of freedom of 7.5; however, the differences were slight, particularly up to the 0.1 significance level.

The off-diagonal figures in Tables 2 and 3 show the rejection rate for the test of misspecified models across significance levels. The test was sufficiently powerful for some of the model/DGP combinations to reject the model for all 2000 samples at all significance levels greater than 0.001. Results for these combinations are simply indicated by the phrase ‘ALL DATA SETS REJECTED AT SIGNIFICANCE LEVEL 0.001’. Not surprisingly, comparing Table 2 to Table 3, the curve has a greater departure from the straight line in Table 3; it is evident that the power of the test increases with sample size. It is also clear that using the approximate 7.5 degrees of freedom provides results similar to those from using the Monte Carlo determined empirical degrees of freedom.

Table 1. Probability density functions of the true data generating process (solid curve) and the estimated model (dashed curve) evaluated at X=0 for the Monte Carlo simulations.

Table 2. Monte Carlo Simulation: Empirical CDFs of Experiments on Distribution Specifications (N=500).

Table 4 presents results for the second set of experiments, which tested deviations from correct specification in the parameterization.
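Steps 3 to 5 of Box 1 can be sketched in a few lines once the CDF values $u_i$ are available (Steps 1 and 2 depend on the model being fitted). The function names below are hypothetical, and the Chi-square upper tail is computed with a hand-written regularized incomplete gamma routine only so that the sketch has no dependencies; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def u_statistic(u_values, k=10):
    """U = K * N * sum_k (P_hat_k - 1/K)^2 over K equal-width intervals of [0, 1]."""
    n = len(u_values)
    counts = [0] * k
    for u in u_values:
        counts[min(int(u * k), k - 1)] += 1
    return k * n * sum((c / n - 1.0 / k) ** 2 for c in counts)


def chi2_upper_tail(x, df):
    """Upper-tail area of a Chi-square distribution; df may be non-integer.

    Evaluates the regularized upper incomplete gamma function Q(df/2, x/2) with a
    power series for small arguments and a continued fraction otherwise.
    """
    a, x = df / 2.0, x / 2.0
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); return its complement.
        term = total = 1.0 / a
        denom = a
        for _ in range(1000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(log_prefactor)


# Toy CDF values with exactly one value per decile, so U is zero and p is one.
u_toy = [0.05, 0.12, 0.27, 0.31, 0.48, 0.55, 0.63, 0.77, 0.84, 0.96]
u_stat = u_statistic(u_toy)
p_value = chi2_upper_tail(u_stat, 7.5)

# The paper reports p = 0.676 for U = 5.304 with 7.5 degrees of freedom.
print(round(chi2_upper_tail(5.304, 7.5), 3))
```

Placing a value that falls exactly on an interior boundary into the upper interval is an arbitrary convention of this sketch; for continuous data such ties have probability zero.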
The upper two rows of Table 4 show results for sample sizes of 500; the lower two rows show results for sample sizes of 1000. Similar to the first set of experiments, results showed accurate coverage for the test when the model was correctly specified and the ability to discern deviations from correct specification in parameterization. As the sample size went up, the power of the test to discern such deviations increased. The approximate 7.5 degrees of freedom yielded results that were similar to the Monte Carlo calculated empirical degrees of freedom.

4 Example

To present an example with real data, we used a random sample of 2000 individuals from the Household Component of the 2011 Medical Expenditure Panel Survey data file (MEPS). As one of the largest national health surveys, MEPS has been widely used to study the patterns of health care access, utilization and expenditures in the United States (Cohen et al., 2009). We modeled each of the three outcomes (annual total health care expenditure, total office-based visits expenditure, and total dental care expenditure) as a function of individual demographics, socioeconomic status, self-rated health status, common chronic conditions, presence of usual source of care provider, and health insurance coverage. These covariates were selected in accordance with prior studies focusing on modeling health care costs using MEPS survey data (Fenton et al., 2012; Fleishman and Cohen, 2010). For each model, we included all individuals who reported an expense on the outcome of interest and took the log of the expenditure as the dependent variable. There were 1527 and 1215 individuals reporting expenses on health care and office-based services, which represented 76.4% and 60.8% of the total sample, respectively. Far fewer individuals reported any expenses on dental care (N = 724, 36.2%).
Appendix Table A3 presents the descriptive statistics and distribution of the outcome variables and the covariates that we employed in the model.

We used Pregibon’s link test (Pregibon, 1980) to identify a statistically adequate specification of the explanatory variables for each model. We then computed U to test the hypothesis that the specified distribution was correct. This allows us to use the test to focus on deviations in the distributional family. We calculated the p-value based on the approximate degrees of freedom of 7.5 and on the empirical degrees of freedom calculated from the parameter estimates of the specified model, based on 500 Monte Carlo samples. The algorithm for computing the empirical degrees of freedom is shown in Box 2.

Table 3. Monte Carlo Simulation: Empirical CDFs of Experiments on Distribution Specifications (N=1000).

Table 4. Monte Carlo Simulation: Empirical CDFs of Experiments on Parameter Specifications.

Table 5 presents the results from the empirical example for the three health care expenditure outcomes. The test clearly discerns the goodness-of-fit performance of different distributions. Results for the model of the logarithm of total health care expenditure strongly rejected the hypothesis that the conditional distribution follows a Gamma, Weibull or Gumbel distribution (U ranges from 21.985 to 116.578, p-value < 0.001 for all), and unequivocally failed to reject the hypothesis for the Normal (U = 5.304, p-value = 0.676 with approximate degrees of freedom of 7.5). For the model of office-based visits expenditure, we strongly rejected the hypotheses for the Gumbel and Weibull distributions (U equals 59.626 and 33.776, respectively, p-value < 0.001 for both) and failed to reject the Normal (U = 12.467, p-value = 0.107) or Gamma (U = 8.897, p-value = 0.305).
For the model of dental care expenditure, we rejected all distributions except for the Gumbel (U = 14.333, p-value = 0.058 with degrees of freedom of 7.5).

Figures A1-A3, in the Appendix, show the histograms of the residuals obtained from these models, standardized by the estimated standard deviations. Figure A1 shows the symmetry expected of a Normal distribution, which was not rejected by the test that unambiguously rejected the other distributions. Figure A2 shows a right-skewedness characteristic of a Gamma distribution (Model 1), but it is insufficiently skewed to reject the Normal at a significance level of 0.05. However, U is smaller for the Gamma, indicating a better fit to the data. Under certain circumstances (i.e. shape parameter sufficiently large, >15), the Gamma distribution is approximately a Normal distribution (Rothschild and Logothetis, 1986). In this real-data example, the estimated shape parameter equaled 35 in the model assuming a Gamma distribution. It is therefore not surprising that the test did not reject either the Gamma or the Normal distribution. Figure A3 demonstrates the clear right-skewedness of the residuals from the model of dental care expenditure, which is expected of a Gumbel distribution.

The calculated Monte Carlo empirical degrees of freedom were approximately 7.5 for all three outcomes and therefore yielded similar results. As there were 21 variables in the empirical model, these results again show that the degrees of freedom for the statistic’s distribution based on 10 categories is approximately 7.5 in multivariable models.

Table 5. Empirical Example: Goodness-of-Fit Tests on Conditional Probability Models for Log-Transformed Health Expenditures from MEPS.

5 Conclusion

In this paper, we presented a simple specification test for conditional continuous distributions using uncensored data (see Box 1).
We showed, using simulation experiments, that the test has accurate coverage under correct specification, and that the test can discern deviations from correct specification in both the distributional family and the parameterization. The empirical example shows its ability to distinguish specific distributions from other candidates using real data.

The results of our analysis indicate that U is approximately distributed Chi-square with degrees of freedom 7.5. We also provide a Monte Carlo method for an empirical determination of degrees of freedom in Box 2 and a direct estimator in the Appendix should the researcher not wish to use the approximating 7.5, for example when the p-value using the approximating 7.5 degrees of freedom is close to the test’s designated significance level. However, comparing the empirical degrees of freedom to 7.5 across all Monte Carlo experiments and real-data analyses of our study, the differences were slight and not likely to impact inferences. If a researcher does not wish to approximate the distribution using a Chi-square, a p-value based on the Monte Carlo distribution of statistic values generated in the process of Box 2 can be used as a parametric bootstrap test (Davison et al., 2003).

Because the test discerns deviations in parameterization as well as the distributional family, an extra step is required to investigate the distributional family alone. Specifically, the researcher should engage in standard tests to identify the best parameter specification within each proposed model (e.g. we used Pregibon’s link test in the preceding example). Using the best within-family model specification, the test will then primarily be identifying deviations in the distributional family.

It is important to note that our results using multiple explanatory variables in the models indicate that the degrees of freedom for the statistic’s distribution are not a function of the number of estimated parameters.
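The Monte Carlo determination of the degrees of freedom referenced above (Box 2) can be sketched for one simple case. The sketch assumes a homoscedastic Normal model with a single covariate, fitted by closed-form maximum likelihood; the model, the simulated data, and all function names are illustrative assumptions rather than the paper's specifications. Comments map each line to the Box 2 step it stands in for.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_normal_linear(x, y):
    """Maximum likelihood fit of y ~ Normal(b0 + b1 * x, sigma^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sigma = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / n)
    return b0, b1, sigma


def u_statistic(u_values, k=10):
    """U = K * N * sum_k (P_hat_k - 1/K)^2 over K equal-width intervals of [0, 1]."""
    n = len(u_values)
    counts = [0] * k
    for u in u_values:
        counts[min(int(u * k), k - 1)] += 1
    return k * n * sum((c / n - 1.0 / k) ** 2 for c in counts)


def empirical_df(x, y, reps=200, seed=0):
    """Mean of U over data sets simulated from, and refitted with, the fitted model."""
    rng = random.Random(seed)
    b0, b1, sigma = fit_normal_linear(x, y)                        # Step 1
    stats = []
    for _ in range(reps):                                          # Step 7
        y_sim = [rng.gauss(b0 + b1 * xi, sigma) for xi in x]       # Step 2
        c0, c1, s = fit_normal_linear(x, y_sim)                    # Steps 3-4
        u = [normal_cdf((yi - c0 - c1 * xi) / s)
             for xi, yi in zip(x, y_sim)]                          # Step 5
        stats.append(u_statistic(u))                               # Step 6
    return sum(stats) / reps                                       # Step 8


# Illustrative data: 300 observations from a correctly specified Normal model.
data_rng = random.Random(1)
x_obs = [data_rng.uniform(-2.0, 2.0) for _ in range(300)]
y_obs = [1.0 + 2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x_obs]
df_hat = empirical_df(x_obs, y_obs)
print(round(df_hat, 2))
```

Keeping the list `stats` instead of only its mean would also give the Monte Carlo distribution of U needed for the parametric bootstrap p-value mentioned in the text.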
This independence from the number of estimated parameters is different from the direct application of the Pearson Chi-square test to distributions with multiple parameters, in which the degrees of freedom depend on the number of parameters m. This is an advantage since the degrees of freedom in the latter case is typically K − m − 1, which implies m must be less than K − 1 for those applications (Schervish, 1995): our test does not have this constraint.

Although our test can be used as a goodness-of-fit test for marginal distributions, it is particularly useful as an easy-to-use model fit test of continuous conditional distributions for uncensored data, particularly in the case of few observations, indeed even one observation per pattern of explanatory variables, such as a time-series.

BOX 2. How to calculate the empirical degrees of freedom
Step 1. Obtain the parameter estimates predicted from the estimated model.
Step 2. Generate outcome values as random draws from the distribution defined by the estimated parameters for all $x_i$ in the data.
Step 3. Re-estimate the model using the generated outcomes.
Step 4. Obtain the predicted parameter estimates from the ‘correctly’ specified model re-estimated in Step 3.
Step 5. Calculate the CDF value $u_i$ for each observation.
Step 6. Calculate U.
Step 7. Repeat steps 2 through 6 multiple times (e.g. we repeated 500 times in the empirical example), saving the statistic values.
Step 8. Set the degrees of freedom to the mean of the calculated U values.

References

AMEMIYA, T. 1985. Advanced Econometrics, Cambridge, MA, Harvard University Press.
ANDREWS, D. W. K. 1997. A Conditional Kolmogorov Test. Econometrica, 65, 1097-1128.
COHEN, J. W., COHEN, S. B. & BANTHIN, J. S. 2009. The medical expenditure panel survey: a national information resource to support healthcare cost research and inform policy and practice. Med. Care., 47, S44-50.
DAVISON, A. C., HINKLEY, D. V. & YOUNG, G. A. 2003. Recent developments in bootstrap methodology. Statistical Science, 18, 141-157.
FAN, Y. Q., LI, Q. & MIN, I. 2006. A Nonparametric bootstrap test of conditional distributions. Economet. Theor., 22, 587-613.
FENTON, J. J., JERANT, A. F., BERTAKIS, K. D. & FRANKS, P. 2012. The cost of satisfaction: a national study of patient satisfaction, health care utilization, expenditures, and mortality. Arch. Intern. Med., 172, 405-11.
FLEISHMAN, J. A. & COHEN, J. W. 2010. Using Information on Clinical Conditions to Predict High-Cost Patients. Health. Serv. Res., 45, 532-552.
HOSMER, D. W. & LEMESHOW, S. 1980. A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics Part A - Theory and Methods, 10, 1043-1069.
MOSCHOPOULOS, P. G. 1985. The distribution of the sum of independent gamma random variables. Annals of the Institute of Statistical Mathematics, 37, 541-544.
O'REILLY, F. J. & QUESENBERRY, C. P. 1973. The Conditional Probability Integral Transformation and Applications to Obtain Composite Chi-Square Goodness-of-Fit Tests. Ann. Statist., 1, 74-83.
O'REILLY, F. J. & STEPHENS, M. A. 1982. Characterizations and Goodness of Fit Tests. J. Roy. Statist. Soc. Ser. B, 44, 353-360.
PREGIBON, D. 1980. Goodness of Link Tests for Generalized Linear Models. J. Roy. Statist. Soc. Ser. C, 29, 15-24.
RINCON-GALLARDO, S., QUESENBERRY, C. P. & O'REILLY, F. J. 1979. Conditional Probability Integral Transformations and Goodness-of-Fit Tests for Multivariate Normal Distributions. Ann. Statist., 7, 1052-1057.
ROTHSCHILD, V. & LOGOTHETIS, N. 1986. Probability distributions, Wiley.
SCHERVISH, M. J. 1995. Theory of Statistics, New York, Springer-Verlag.
ZHENG, J. X. 2000. A consistent test of conditional parametric distributions. Economet. Theor., 16, 667-691.
Appendix

A1 The Expected Value of the U-Statistic

The expected value of U is the expected value associated with the distribution of the standard Pearson Chi-square goodness-of-fit statistic minus a factor due to estimating the parameters of the model. In this appendix we provide the determination of the expected value, and we provide an estimator for the adjustment factor and thereby an estimator of the expected value for the proposed statistic.

The expected value of U is proportional to the sum of expected values across the K equal-length regions of the partition of the unit interval being considered:

$$E(U) = E\Big[K \cdot N \cdot \sum_k (\hat{P}_k - P_k)^2\Big] = K \cdot N \cdot \sum_k E\big[(\hat{P}_k - P_k)^2\big]$$

The expected values under the summation sign on the right-hand side of this equation are variances. This is seen by denoting an indicator of whether observation i falls in region k as

$$I_{i,k} = \begin{cases} 1 & u_i \in ((k-1) \cdot 0.1,\; k \cdot 0.1) \\ 0 & \text{otherwise} \end{cases}$$

and noting that the expected value of the estimated proportion in category k is

$$\begin{aligned}
E[\hat{P}_k] &= \int E[\hat{P}_k \mid \hat{\theta}]\, dF(\hat{\theta}) \\
&= \int E\Big[\tfrac{1}{N} \sum_{i=1}^{N} I_{i,k} \,\Big|\, \hat{\theta}\Big]\, dF(\hat{\theta}) \\
&= \tfrac{1}{N} \sum_{i=1}^{N} \int E[I_{i,k} \mid \hat{\theta}]\, dF(\hat{\theta}) \\
&= \tfrac{1}{N} \sum_{i=1}^{N} \int P_{i,k}(\hat{\theta})\, dF(\hat{\theta}) \\
&= \int P_k(\hat{\theta})\, dF(\hat{\theta}), \quad \text{for } P_{i,k}(\hat{\theta}) = P_k(\hat{\theta}) \text{ for all } i \\
&= E(P_k(\hat{\theta}))
\end{aligned}$$

To determine $E(P_k(\hat{\theta}))$, consider a first order Taylor series approximation around the true value $\theta$

$$P_k(\hat{\theta}) \approx P_k(\theta) + \frac{\partial P_k}{\partial \theta'} \cdot (\hat{\theta} - \theta),$$

which yields

$$\sqrt{N} \cdot (P_k(\hat{\theta}) - P_k(\theta)) \approx \frac{\partial P_k}{\partial \theta'} \cdot \sqrt{N} \cdot (\hat{\theta} - \theta).$$

For an estimator, such as the maximum likelihood estimator, for which $\sqrt{N} \cdot (\hat{\theta} - \theta)$ converges to a normal distribution $N(0, \Sigma)$ by a central limit theorem, the left-hand side converges in distribution to a normal as well:

$$\sqrt{N} \cdot (P_k(\hat{\theta}) - P_k(\theta)) \xrightarrow{d} N\Big(0,\; \frac{\partial P_k}{\partial \theta'} \cdot \Sigma \cdot \frac{\partial P_k}{\partial \theta}\Big).$$
Therefore, $P_k(\hat\theta)$ has an asymptotic distribution with expected value $E[P_k(\hat\theta)] = P_k(\theta)$ and variance

$$V[P_k(\hat\theta)] = \frac{1}{N} \cdot \frac{\partial P_k}{\partial \theta'}\, \Sigma\, \frac{\partial P_k}{\partial \theta}.$$

Consequently, since $E[P_k(\hat\theta)] = P_k(\theta)$, $E[(\hat{P}_k - P_k)^2] = V[\hat{P}_k]$. The expected value of U is then proportional to the sum of variances:

$$E\Big[K \cdot N \cdot \sum_k (\hat{P}_k - P_k)^2\Big] = K \cdot N \cdot \sum_k V[\hat{P}_k].$$

The variance terms under the summation sign on the right-hand side are

$$\begin{aligned}
V[\hat{P}_k] &= \int V[\hat{P}_k \mid \hat\theta]\, dF(\hat\theta) \\
&= \int V\Big[\tfrac{1}{N} \sum_{i=1}^{N} I_{i,k} \,\Big|\, \hat\theta\Big]\, dF(\hat\theta) \\
&= \tfrac{1}{N^2} \sum_{i=1}^{N} \int V[I_{i,k} \mid \hat\theta]\, dF(\hat\theta), \quad \text{for independent observations} \\
&= \tfrac{1}{N} \int P_k(\hat\theta)\big(1 - P_k(\hat\theta)\big)\, dF(\hat\theta) \\
&= \tfrac{1}{N} \int \big(P_k(\hat\theta) - P_k(\hat\theta)^2\big)\, dF(\hat\theta) \\
&= \tfrac{1}{N} \big[E(P_k(\hat\theta)) - E(P_k(\hat\theta))^2 - V(P_k(\hat\theta))\big] \\
&= \tfrac{1}{N} \big[P_k(\theta) - P_k(\theta)^2 - V(P_k(\hat\theta))\big]
\end{aligned}$$

Therefore,

$$\begin{aligned}
E\Big[K \cdot N \cdot \sum_k (\hat{P}_k - P_k)^2\Big] &= K \cdot N \cdot \sum_k \tfrac{1}{N} \big[(P_k - P_k^2) - V(P_k(\hat\theta))\big] \\
&= K \cdot N \cdot \sum_k \tfrac{1}{N} \Big[\Big(\tfrac{1}{K} - \tfrac{1}{K^2}\Big) - V(P_k(\hat\theta))\Big] \\
&= (K - 1) - K \cdot \sum_k V(P_k(\hat\theta))
\end{aligned}$$

The expected value of U is the degrees of freedom for a common Pearson Chi-Square test statistic (i.e. K − 1) minus a factor due to estimation of the distribution parameters. For K = 10, the expected value of U is then $9 - 10 \cdot \sum_k V(P_k(\hat\theta))$.

A2 Estimation of the Shrinkage Factor

The variance terms in the shrinkage factor can be estimated by using consistent estimators for the derivatives $\partial P_k / \partial \theta$ and the covariance matrix $\Sigma$. The derivative of $P_k$ is determined by noting that

$$P_k = \int_x \big[F(y^*_k(x) \mid x; \theta) - F(y^*_{k-1}(x) \mid x; \theta)\big]\, dF(x),$$

for which $y^*_k$ are the critical values $y^*_k(x) = F^{-1}\big(\tfrac{k}{K} \mid x\big)$. Therefore, assuming we can interchange the order of integration and differentiation,

$$\frac{\partial P_k}{\partial \theta} = \int_x \frac{\partial F(y^*_k(x) \mid x; \theta)}{\partial \theta}\, dF(x) - \int_x \frac{\partial F(y^*_{k-1}(x) \mid x; \theta)}{\partial \theta}\, dF(x).$$
Estimating the integrals on the right-hand side of the equation by sample means yields the estimator

$$\frac{\partial \hat{P}_k}{\partial \theta} = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial F(y^*_k(x_i) \mid x_i; \hat\theta)}{\partial \theta} - \frac{1}{N} \sum_{i=1}^{N} \frac{\partial F(y^*_{k-1}(x_i) \mid x_i; \hat\theta)}{\partial \theta}.$$

The estimator for the variances in the shrinkage factor is therefore

$$\hat{V}[P_k(\hat\theta)] = \frac{1}{N} \cdot \frac{\partial \hat{P}_k}{\partial \theta'}\, \hat\Sigma\, \frac{\partial \hat{P}_k}{\partial \theta}.$$

For the maximum likelihood estimator, note that the scaled deviation of the estimator converges in distribution to a normal:

$$\sqrt{N}\,(\hat\theta_N - \theta) \xrightarrow{d} N\Big(0,\; \big[-\tfrac{1}{N} E(H(\theta))\big]^{-1}\Big),$$

for H denoting the matrix of second derivatives of the log-likelihood with respect to the parameters. Therefore,

$$\Sigma = \big[-\tfrac{1}{N} E(H(\theta))\big]^{-1} = N \cdot \big[-E(H(\theta))\big]^{-1}.$$

Using the sample mean for the expectation of the Hessian, evaluated at the estimated parameter values, yields the estimator

$$\hat\Sigma = N^2 \cdot \big[-H(\hat\theta)\big]^{-1}.$$

The estimated variance of $P_k(\hat\theta)$ is then

$$\hat{V}[P_k(\hat\theta)] = \frac{\partial \hat{P}_k}{\partial \theta'} \cdot N \cdot \big[-H(\hat\theta)\big]^{-1} \cdot \frac{\partial \hat{P}_k}{\partial \theta}.$$

For example, consider the Weibull distribution specified in Table A1. The Weibull CDF is

$$F(y \mid x) = 1 - \exp\left\{-\left(\frac{y}{e^{a_0 + a_1 x}}\right)^{e^{b_0 + b_1 x}}\right\}.$$

The derivatives with respect to the parameters are

$$\begin{aligned}
\frac{\partial F}{\partial a_0} &= -D \cdot e^{b_0 + b_1 x} \\
\frac{\partial F}{\partial a_1} &= -D \cdot e^{b_0 + b_1 x} \cdot x \\
\frac{\partial F}{\partial b_0} &= D \cdot e^{b_0 + b_1 x} \cdot \big(\ln(y) - (a_0 + a_1 x)\big) \\
\frac{\partial F}{\partial b_1} &= D \cdot e^{b_0 + b_1 x} \cdot \big(\ln(y) - (a_0 + a_1 x)\big) \cdot x
\end{aligned}$$

where

$$D = \left(\frac{y}{e^{a_0 + a_1 x}}\right)^{e^{b_0 + b_1 x}} \cdot \exp\left\{-\left(\frac{y}{e^{a_0 + a_1 x}}\right)^{e^{b_0 + b_1 x}}\right\}.$$

Evaluating each of these derivatives for each observation $i \in \{1, \ldots, N\}$ at the estimated parameter values, the data values $x_i$, and the corresponding critical values $y^*_0(x_i)$ and $y^*_k(x_i)$ for each $k \in \{1, \ldots, 10\}$ creates variables whose sample means can be used to determine $\partial \hat{P}_k / \partial \theta$. These estimated derivatives, combined with the estimated parameter covariance matrix $\hat\Sigma$, provide the information needed to calculate the shrinkage factor as shown above.
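The Weibull example can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the scale is exp(a0 + a1·x) and the shape is exp(b0 + b1·x), as in the CDF above, and it obtains the gradient by applying the chain rule to that CDF. The function names, the parameter values in `theta`, and the covariate values in `xs` are hypothetical. A central finite difference is included only as a check on the analytic gradient.

```python
import math

K = 10  # number of equal-length cells on the unit interval

def weibull_cdf(y, x, a0, a1, b0, b1):
    """Conditional Weibull CDF: scale exp(a0 + a1*x), shape exp(b0 + b1*x)."""
    scale = math.exp(a0 + a1 * x)
    shape = math.exp(b0 + b1 * x)
    return 1.0 - math.exp(-((y / scale) ** shape))

def weibull_cdf_grad(y, x, a0, a1, b0, b1):
    """Analytic gradient of the CDF with respect to (a0, a1, b0, b1)."""
    if y <= 0.0 or math.isinf(y):
        return (0.0, 0.0, 0.0, 0.0)  # the CDF is flat in theta at both ends
    log_scale = a0 + a1 * x
    shape = math.exp(b0 + b1 * x)
    z = math.exp(shape * (math.log(y) - log_scale))  # (y / scale) ** shape
    d = z * math.exp(-z)                             # the common factor D
    d_a0 = -d * shape
    d_b0 = d * shape * (math.log(y) - log_scale)
    return (d_a0, d_a0 * x, d_b0, d_b0 * x)

def critical_value(p, x, a0, a1, b0, b1):
    """y*(x) = F^{-1}(p | x) for the conditional Weibull model."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return math.inf
    scale = math.exp(a0 + a1 * x)
    shape = math.exp(b0 + b1 * x)
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def dPk_hat(k, n_cells, xs, theta):
    """Sample-mean estimate of dP_k/dtheta over the observed covariates."""
    total = [0.0] * 4
    for x in xs:
        upper = weibull_cdf_grad(critical_value(k / n_cells, x, *theta), x, *theta)
        lower = weibull_cdf_grad(critical_value((k - 1) / n_cells, x, *theta), x, *theta)
        for j in range(4):
            total[j] += (upper[j] - lower[j]) / len(xs)
    return total

def finite_diff_grad(y, x, theta, h=1e-6):
    """Central finite-difference gradient, used only to check the analytic one."""
    grads = []
    for j in range(4):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        grads.append((weibull_cdf(y, x, *up) - weibull_cdf(y, x, *down)) / (2 * h))
    return grads

theta = (0.5, 0.2, 0.1, -0.1)   # hypothetical (a0, a1, b0, b1)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical covariate values
for k in range(1, K + 1):
    print(k, [round(g, 5) for g in dPk_hat(k, K, xs, theta)])
```

Because the gradient of F vanishes at the lowest and highest critical values, the estimated derivatives telescope and sum to zero across the K cells, which gives a quick consistency check on any implementation.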
Table A0 presents the means of the estimated expected value of U using the above equations and the means of the calculated U values across 100,000 data sets of sample sizes 100, 1000, and 10,000. The mean estimated E(U) was very similar to the mean of the U values, rounding to 7.37 for each. An alternative for estimating the expected value of U (i.e. the degrees of freedom for an approximating Chi-square distribution) is the Monte Carlo method shown in Box 2 of the main text.

Table A0. Mean estimated E(U) and mean U across 100,000 samples.

A3 Additional Tables and Figures

Table A1. Simulation process: conditional distribution of the data for the test of incorrect distributional family.
True Data Generating Process
Normal/Normal (homoscedasticity)
Normal/Normal (heteroscedasticity)
Gumbel/Gumbel: Location, Scale
Gamma/Gamma: Shape, Scale
Weibull/Weibull: Shape, Scale

Table A2. Simulation process: conditional distribution of the data for the test of incorrect parameterization.

Table A3. Distribution of the cost-related outcome variables and patient characteristics.

Figure A1. Histogram of the Standardized Residual from the Model for Annual Total Health Care Expenditure (x-axis: Standardized Residual; y-axis: Density). Model: MLE assuming Normal distribution with heteroscedasticity.

Figure A2. Histogram of the Standardized Residual from the Model for Annual Total Expenditures on Office-Based Visits.

Figure A3. Histogram of the Standardized Residual from the Model for Annual Total Expenditures on Dental Care.
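The kind of comparison summarized in Table A0 can be mimicked on a toy model. The sketch below is an assumption-laden illustration, not the simulation design behind Table A0 and not the procedure of Box 2: it uses a marginal exponential model with a single rate parameter estimated by maximum likelihood, takes H to be the second derivative of the summed log-likelihood, and uses a hypothetical sample size, replication count, rate, and seed. It compares the Monte Carlo mean of U with the mean of the estimated expected value from A2.

```python
import math
import random

K = 10  # number of equal-length cells on the unit interval

def u_statistic(us, n_cells=K):
    """U = K * N * sum_k (P_hat_k - 1/K)^2 for PIT values us in [0, 1]."""
    n = len(us)
    counts = [0] * n_cells
    for u in us:
        counts[min(int(u * n_cells), n_cells - 1)] += 1
    return n_cells * n * sum((c / n - 1.0 / n_cells) ** 2 for c in counts)

def estimated_expected_u(theta_hat, n, n_cells=K):
    """(K - 1) - K * sum_k V_hat[P_k(theta_hat)] for the exponential model."""
    def dF_dtheta(p):
        # dF/dtheta = y * exp(-theta * y) at the critical value y = -ln(1 - p) / theta
        if p <= 0.0 or p >= 1.0:
            return 0.0
        y = -math.log(1.0 - p) / theta_hat
        return y * math.exp(-theta_hat * y)

    hessian = -n / theta_hat ** 2  # second derivative of the summed log-likelihood
    shrinkage = 0.0
    for k in range(1, n_cells + 1):
        dP = dF_dtheta(k / n_cells) - dF_dtheta((k - 1) / n_cells)
        shrinkage += dP * n * (1.0 / -hessian) * dP  # dP * N * [-H]^{-1} * dP
    return (n_cells - 1) - n_cells * shrinkage

def simulate(n=200, reps=2000, rate=2.0, seed=12345):
    """Return (mean of U, mean of estimated E(U)) over simulated data sets."""
    rng = random.Random(seed)
    u_values, e_values = [], []
    for _ in range(reps):
        ys = [rng.expovariate(rate) for _ in range(n)]
        theta_hat = n / sum(ys)  # exponential MLE of the rate
        us = [1.0 - math.exp(-theta_hat * y) for y in ys]  # fitted-CDF (PIT) values
        u_values.append(u_statistic(us))
        e_values.append(estimated_expected_u(theta_hat, n))
    return sum(u_values) / reps, sum(e_values) / reps

mean_u, mean_e = simulate()
print(f"mean U = {mean_u:.3f}, mean estimated E(U) = {mean_e:.3f}")
```

In this one-parameter model the estimated rate cancels out of the shrinkage factor, so the estimated E(U) is the same for every simulated data set and falls between K − 2 and K − 1.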