CUESJ 2020, 4 (2): 9-12
http://journals.cihanuniversity.edu.iq/index.php/cuesj
Research Article

Likelihood Approach for Bayesian Logistic Weighted Model: Missing Completely at Random Case

Dler H. Kadir*
Department of Statistics, College of Administration and Economics, Salahaddin University-Erbil, Kurdistan Region - F.R. Iraq

ABSTRACT
Increasing the response rate and reducing non-response are the primary challenges facing researchers conducting longitudinal and cohort studies, especially in pediatric medicine. When data are missing, complete case analysis can bias the findings. Inverse probability weighting (IPW) is one of many approaches available for reducing the bias of complete case analysis: each complete case is weighted by the inverse of its probability of being a complete case. The data were collected from the neonatal intensive care unit at Erbil maternity hospital from 2012 to 2017. In total, 570 babies (288 males and 282 females) were born very preterm. The aim of this paper is to apply IPW to a Bayesian logistic model of developmental outcome. The mental development index was used to assess the cognitive development of the infants born very preterm. Information was missing for almost half of the babies, meaning that we do not know whether they have cognitive development issues. We obtained more precise results: the standard deviations of the parameter estimates are smaller in the weighted posterior model than in the frequentist analysis. Further research is needed using methods such as bootstrapping, sandwich estimators, resampling, and the jackknife for dealing with missing data.

Keywords: Likelihood, logistic weighting, missing data, preterm infants

INTRODUCTION
Increasing the response rate and reducing non-response are the primary challenges facing researchers conducting longitudinal and cohort studies.
This is especially evident in pediatric medicine, where birth cohorts are often used for epidemiological research and for randomized clinical trials of parental interventions. Attrition of participants reduces the power of a study and can also bias its results.[1] Non-responders usually carry greater medical and socioeconomic risks and may differ systematically with respect to the disorders of interest, which biases estimates of adverse outcomes.[2] Missing data are also common in social research. They introduce a type of uncertainty into statistical analyses that differs from ordinary sampling imprecision, which decreases as the sample size grows; therefore, stronger assumptions are required before inferences can be drawn. Over the past 10 years, there has been much theoretical research on methods for analyzing data sets with missing values.

Missing data can be defined as values that are not recorded for a variable in an observation of interest. Almost all branches of scientific research face this issue and must occasionally deal with missing data. Frequently, the missing data appear as incomplete records for a subject, and the subsequent analysis usually uses only the subjects with complete measurements (complete case analysis). This is costly: it reduces the sample size, yields less reliable variance estimates, lowers statistical power, and may bias parameter estimates unless the complete cases form a random sample of the target population. To assess the potential for bias, it is important to understand the mechanisms and patterns of the missing data relevant to the research.
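The statement that complete case analysis is biased unless the complete cases behave like a random sample can be checked with a small simulation. The sketch below is a hypothetical Python/NumPy illustration on simulated data, not the study data: when every subject responds with the same probability (missing completely at random), the complete-case mean stays close to the full-data mean, but when response depends on a covariate related to the outcome, the complete-case mean is shifted.

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000

# Hypothetical full data: covariate x and an outcome y that depends on x.
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
full_mean = y.mean()

# MCAR: every subject responds with the same probability (0.5).
r_mcar = rng.random(n) < 0.5

# Missingness that depends on x: subjects with larger x respond more often.
p_dep = 1.0 / (1.0 + np.exp(-1.5 * x))
r_dep = rng.random(n) < p_dep

cc_mcar = y[r_mcar].mean()  # complete-case mean under MCAR
cc_dep = y[r_dep].mean()    # complete-case mean under x-dependent missingness

print(f"full-data mean:          {full_mean:.3f}")
print(f"complete case, MCAR:     {cc_mcar:.3f}")
print(f"complete case, x-driven: {cc_dep:.3f}")
```

Under MCAR the complete cases are a random subsample, so only sampling noise separates the two means; under x-dependent response the respondents over-represent large x and therefore large y.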
The missing data mechanisms described by Rubin characterize the relationship between the missing values and the observed data.[3] Bias can arise not only from systematic dropout and non-response but also from unequal sampling probabilities that result from the study design. A non-participation rate of, say, 10% might not produce substantial bias unless non-response is strongly related to the parameters of interest.[4]

Corresponding Author: Dler H. Kadir, Department of Statistics, College of Administration and Economics, Salahaddin University-Erbil, Kurdistan Region - F.R. Iraq. E-mail: dler.kadir@su.edu.krd
DOI: 10.24086/cuesj.v4n2y2020.pp9-12
Copyright © 2020 Dler H. Kadir. This is an open-access article distributed under the Creative Commons Attribution License.
Cihan University-Erbil Scientific Journal (CUESJ)
Received: Jul 7, 2020 Accepted: Jul 25, 2020 Published: Aug 13, 2020

INVERSE PROBABILITY WEIGHTING (IPW)
The IPW method models the missingness mechanism directly instead of modeling the missing observations.[5] IPW is sometimes called "inverse propensity weighting." Once a probability (propensity) score has been estimated for every subject, the observed values of the covariates of interest are weighted by the inverse of the corresponding probability scores. The response probability can be modeled with logistic regression, and the inverse of the fitted probabilities is then used as an adjustment factor. The IPW adjustment allows a larger number of variables to be used for predicting non-response; the most appropriate set of variables should be used to find the model that best predicts non-response.
This yields a “smooth” distribution of adjustment factors, with no need to choose an arbitrary cut-point.[6] However, IPWs can take extreme values, producing adjustment factors with very large weights and, consequently, highly variable weight-adjusted estimates. This issue can be addressed by trimming the adjustment factors or the non-response-adjusted weights, although such remedies may increase the potential for bias.

WEIGHTED POSTERIOR DISTRIBUTIONS
From a Bayesian perspective, the posterior distribution π(θ|x) is obtained by combining two sources of information about the random variable θ. One source is the observed data, summarized by the likelihood function; the other is the prior information about its distribution, π(θ). Weighted posterior distributions are defined by replacing the likelihood function with its IPW counterpart, as discussed in the previous section. The weighted posterior distribution is

$$\pi_{IPW}(\theta \mid x) \propto L_{IPW}(\theta \mid x)\,\pi(\theta) \qquad (1)$$

Next, we propose IPWs, evaluated as in the above equation, of the form $\widehat{IPW}(x_i) = IPW(x_i; \hat{\theta}, \hat{F}_n)$, where $\hat{F}_n$ is the empirical distribution function of the data. Under the model assumptions, $L_{IPW}(\theta \mid x)$ retains the first-order properties of the genuine likelihood function; therefore, it can be used for Bayesian estimation in the standard way. One advantage of weighting over other pseudo-likelihood functions is that it leads to posterior distributions belonging to the same family as those obtained from the genuine likelihood function. The weighted posterior distributions therefore differ from the genuine posterior distributions in the estimated values. There may appear to be a conflict between this method and a strict Bayesian perspective, because the weighted likelihood function is not derived directly from a probabilistic model but from adaptive weights.
The data still tell a story: some values are not consistent with the assumed model, yet we cannot simply delete them as outliers, and they still contribute to the posterior estimate.[7]

PRETERM DATA FOR USING LOGISTIC REGRESSION
The data were collected from the neonatal intensive care unit at Erbil maternity hospital from 2012 to 2017. In total, 570 babies (288 males and 282 females) were born very preterm. We considered infants born before 28 weeks. The mental development index was used to assess the cognitive development of the infants born very preterm. Information was missing for almost half of the babies, meaning that we do not know whether they have cognitive development issues. Let

$$R_i = \begin{cases} 1 & \text{if the data were collected (with probability } P_i) \\ 0 & \text{if there is no response} \end{cases} \qquad (2)$$

The procedure therefore involves:
• Fitting a binary logistic regression with response to the study (1 if observed, 0 if not) as the dependent variable and the most appropriate set of variables as explanatory variables.
• Obtaining the fitted probability P_i for every infant, i ∈ {1, …, N}.
• Calculating the IPW for every infant, IPW_i = 1/P_i, and using the IPWs to fit weighted Bayesian logistic regression models in WinBUGS.

Assuming that the weighting model is correctly specified, we can obtain consistent estimates of the parameters of the outcome model. A major issue in weighted data analysis, however, is that the weights do not represent the actual number of subjects; they represent only an expected number, and even that holds only if the statistical weights capture every detail of the sampling probability. Under simple random sampling, every individual has the same probability of being sampled, so all weights are identical.[4] An additional issue with IPW arises when the distribution of a predictor of missingness differs between complete and incomplete cases.
The IPWs will then vary greatly, because complete cases whose values of the missingness predictor lie near the center of the distribution for the incomplete cases may receive large weights. This, in turn, leads to larger standard errors.[8] When weights are used to account for missing data, the parameters of the weighting model must be estimated. The usual variance estimator for weighted complete case data assumes that the weights are known and ignores the uncertainty in their estimation.[9] Seaman et al. recommend sandwich estimators to account for the uncertainty in the weights. In fact, the true asymptotic variance is often larger when the true weights are used than when the weights are estimated. Thus, treating estimated weights as fixed and ignoring their uncertainty may produce conservative (overestimated) standard errors.[8]

Here, the weight values were updated at every Markov chain Monte Carlo (MCMC) iteration: the weights were calculated from values sampled from the posterior distributions of the response model rather than from a fixed value. Each iteration of the outcome model therefore used different weight values. Variable and fixed weights were compared to examine whether ignoring the uncertainty in the weights caused any problems, and uncertainty in the weight values was consequently incorporated by using variable weights. In addition, the weights were standardized by multiplying them by the ratio of the number of observations to the total size of the complete population, so that the weights sum to the sample size. Without standardization, the weights sum to the size of the entire population rather than to the number of observations, and the uncertainty (i.e., the standard deviations and standard errors) of the model is underestimated. Thus, the weights were standardized in this study.
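The weighting steps described above (fitting a logistic response model, taking IPW_i = 1/P_i, standardizing, and recomputing the weights for each draw of the response-model parameters) can be sketched as follows. This is a minimal Python/NumPy illustration on simulated, hypothetical covariates, not the WinBUGS code used in the study. Two assumptions are worth flagging: the "variable weights" are generated from a normal approximation to the posterior of the response-model coefficients rather than from MCMC, and the standardization divides by the realized sum of the weights (rather than by the population size) so that each weight vector sums exactly to the number of responders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, r, n_iter=25):
    """Logistic regression for the response indicator R_i, fitted by Newton-Raphson.
    Returns the MLE and its approximate covariance (inverse Fisher information)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (r - p)
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, grad)
    p = sigmoid(X @ beta)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)

def standardized_ipw(X_resp, beta):
    """IPW_i = 1/P_i for responders, rescaled so the weights sum to the number of responders."""
    w = 1.0 / sigmoid(X_resp @ beta)
    return w * len(w) / w.sum()

# Simulated, hypothetical covariates (not the Erbil data).
rng = np.random.default_rng(0)
N = 570
ga_centered = rng.normal(0.0, 1.5, N)          # hypothetical: gestational age minus 28 weeks
female = rng.integers(0, 2, N).astype(float)   # hypothetical: 1 = female
X = np.column_stack([np.ones(N), ga_centered, female])
true_a = np.array([-0.2, 0.5, 0.3])
r = (rng.random(N) < sigmoid(X @ true_a)).astype(float)  # response indicator R_i

beta_hat, cov = fit_logistic(X, r)
X_resp = X[r == 1]

# Fixed weights: computed once from the point estimate.
w_fixed = standardized_ipw(X_resp, beta_hat)

# Variable weights: one weight vector per draw of the response-model coefficients.
# A normal approximation to their posterior stands in for MCMC draws here.
draws = rng.multivariate_normal(beta_hat, cov, size=1000)
w_variable = np.array([standardized_ipw(X_resp, b) for b in draws])

print("responders:", int(r.sum()))
print("sum of fixed weights:", round(float(w_fixed.sum()), 6))
print("spread of one responder's weight across draws:", w_variable[:, 0].std())
```

Each row of `w_variable` is one set of standardized weights; in a joint MCMC fit such a row would be regenerated at every iteration of the outcome model, which is how uncertainty in the weights propagates into the outcome estimates.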
In these analyses, complete cases are weighted by the inverse of their probability of being complete. Two logistic regression models, one for the outcome and one for response, were fitted simultaneously. Maternal age at birth, sex, gestational age, and birth weight z-score were used as covariates in the response model. Let R_i denote the response indicator: response (infants for whom the developmental questionnaire was completed) or non-response (infants for whom it was not). R_i was modeled with a Bernoulli distribution:

$$R_i \sim \text{Bernoulli}(q_i) \qquad (3)$$

where $\log\left(\frac{q_i}{1-q_i}\right) = a_0 + a X_i$.

Once the statistical model for the weights has been specified, it can be used in the developmental delay model, run alongside it on the same dataset. Weighting is nonetheless critical, and the amount of information needed to account for the weights depends on the particular parameters being considered. Inference is usually adjusted by multiplying each infant's contribution to a statistic by its statistical weight. To model the outcome and the weights together, each infant's outcome terms in the likelihood were multiplied by that infant's weight. The IPW for each infant was calculated as 1/(probability of response) and then standardized. The likelihood function constructed in WinBUGS is therefore

$$L_i = P_i^{\,y_i \cdot IPW_i}\,(1 - P_i)^{(1 - y_i) \cdot IPW_i} \qquad (4)$$

where P is the probability that an infant survives with developmental delay, y is the outcome variable, and IPW is the inverse probability weight. Five hundred seventy infants were included in the response model, but only 235 babies were used in the outcome model analysis (those known to be alive). We used IPWs as adjustment factors for the babies for whom no cognitive developmental delay information was available, placing the weights on the likelihood function in WinBUGS. We then repeated the analysis using frequentist logistic regression with variable weights.
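Equations (1) and (4) can be illustrated with a small sampler. The sketch below is a hypothetical Python/NumPy stand-in for the WinBUGS model: it evaluates the weighted Bernoulli log-likelihood of equation (4), adds an independent normal prior, and summarizes the posterior draws as odds ratios with 95% credible intervals. The prior (normal, standard deviation 10), the random-walk Metropolis sampler, the fixed weights, and the simulated data are all assumptions made for illustration. Whether the SD column of Table 1 is on the odds-ratio or the log-odds scale is not stated, so this sketch reports it on the odds-ratio scale.

```python
import numpy as np

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -np.logaddexp(0.0, -z)

def weighted_log_posterior(beta, X, y, w, prior_sd=10.0):
    """Log of L_IPW(beta | y) * prior(beta), up to an additive constant.
    Each infant contributes w_i * [y_i log P_i + (1 - y_i) log(1 - P_i)],
    i.e. the log of P_i^(y_i w_i) * (1 - P_i)^((1 - y_i) w_i)."""
    eta = X @ beta
    loglik = np.sum(w * (y * log_sigmoid(eta) + (1.0 - y) * log_sigmoid(-eta)))
    logprior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2  # assumed N(0, prior_sd^2) prior
    return loglik + logprior

def metropolis(X, y, w, n_iter=20_000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the weighted posterior."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    lp = weighted_log_posterior(beta, X, y, w)
    samples = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        proposal = beta + step * rng.normal(size=beta.size)
        lp_prop = weighted_log_posterior(proposal, X, y, w)
        if np.log(rng.random()) < lp_prop - lp:  # accept with probability min(1, ratio)
            beta, lp = proposal, lp_prop
        samples[t] = beta
    return samples

# Simulated, hypothetical outcome data for 235 responders.
rng = np.random.default_rng(42)
n = 235
ga_centered = rng.normal(0.0, 1.5, n)
X = np.column_stack([np.ones(n), ga_centered])
true_beta = np.array([-0.6, -0.25])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
w = rng.uniform(1.0, 3.0, n)
w = w * n / w.sum()  # standardized weights sum to n

samples = metropolis(X, y, w)
post = samples[5_000:]        # discard burn-in
odds_ratios = np.exp(post)    # posterior draws of the odds ratios
for name, col in zip(["Constant", "GA (centered)"], odds_ratios.T):
    lo, hi = np.percentile(col, [2.5, 97.5])
    print(f"{name:14s} OR {col.mean():.2f}  SD {col.std():.2f}  95% CrI ({lo:.2f}, {hi:.2f})")
```

To mimic variable weights, `w` would be replaced at each iteration by a freshly drawn, standardized weight vector from the response model instead of being held fixed.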
As shown in Table 1, the weighted posterior model gave more precise results, with smaller standard deviations of the parameter estimates than the frequentist analysis.

DISCUSSION
We have developed a likelihood-based approach for placing weights, calculated from response models, into outcome models using logistic regression. We observed that unweighted models produce biased results and found that the weighted model provided more precise results in terms of odds ratios and their uncertainty. IPW is one of many approaches available for reducing the bias of complete case analysis; each complete case is weighted by the inverse of its probability of being complete. Although weighting is often used in the design and analysis of surveys, its use in analyzing missing data is less well established, partly because IPW parameter estimates can be inefficient relative to likelihood-based analysis.[10] The resulting estimates are often sensitive to the exact form of the model for the response probabilities.[11] In practice, the main concern in data analysis is the efficiency of the IPW approach relative to the likelihood approach. Although some methods have been proposed for obtaining more efficient and robust estimates, they have not yet been developed to handle more general situations effectively. Alternative methods, for example doubly robust IPW and multiple imputation, could have been used in this analysis. In addition, one limitation of this paper is the assumption that the weights are known, which ignores the uncertainty in the weights. To calculate the standard error of the IPW estimator, methods such as robust standard errors from weighted models could have been used. Other options include bootstrap, sandwich, resampling, and jackknife methods, although these require intensive computation.[4,12]

REFERENCES
1. D. Wolke, B. Sohne, B. Ohrt and K. Riegel.
Follow-up of preterm children: Important to document dropouts. Lancet, vol. 345, no. 8947, p. 447, 1995.
2. S. Johnson, S. E. Seaton, B. N. Manktelow, L. K. Smith, D. Field, E. S. Draper, N. Marlow and E. M. Boyle. Telephone interviews and online questionnaires can be used to improve neurodevelopmental follow-up rates. BMC Research Notes, vol. 7, p. 219, 2014.
3. D. B. Rubin. Inference and missing data. Biometrika, vol. 63, no. 3, pp. 581-592, 1976.
4. M. Hofler, H. Pfister, R. Lieb and H. U. Wittchen. The use of weights to account for non-response and drop-out. Social Psychiatry and Psychiatric Epidemiology, vol. 40, no. 4, pp. 291-299, 2005.
5. L. Lazzeroni, N. Schenker and J. Taylor. Robustness of Multiple-imputation Techniques to Model Misspecification. United States: Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 260-265, 1990.
6. B. L. Carlson and S. Williams. A Comparison of Two Methods to Adjust Weights for Non-response: Propensity Modeling and Weighting Class Adjustments. United States: Proceedings of the Annual Meeting of the American Statistical Association, 2001.
7. C. Agostinelli and L. Greco. Weighted Likelihood in Bayesian Inference. New York: Proceedings of the 46th Scientific Meeting of the Italian Statistical Society, 2012.
8. S. R. Seaman and I. R. White. Review of inverse probability weighting for dealing with missing data. Statistical Methods in Medical Research, vol. 22, no. 3, pp. 278-295, 2013.
9. S. R. Seaman, I. R. White, A. J. Copas and L. Li. Combining multiple imputation and inverse-probability weighting. Biometrics, vol. 68, no. 1, pp. 129-137, 2012.
10. D. Clayton, D. Spiegelhalter, G. Dunn and A. Pickles. Analysis of longitudinal binary data from multiphase sampling. Journal of the Royal Statistical Society, vol. 60, no. 1, pp. 71-87, 1998.
11. R. J. Little and D. B. Rubin. Statistical Analysis with Missing Data. Chichester: Wiley, p. 5, 1987.
12. L. H. Curtis, B. G. Hammill, E. L. Eisenstein, J. M. Kramer and K. J. Anstrom. Using inverse probability-weighted estimators in comparative effectiveness analyses with observational databases. Medical Care, vol. 45, no. 10, pp. S103-S107, 2007.

Table 1: Posterior odds ratios and standard deviations obtained from the binary model using variable weights

Parameter                        Odds ratio   SD     95% credible interval
Constant                         0.54         0.11   0.34, 0.79
Gestational age (centered)(a)    0.79         0.06   0.67, 0.94
Mother's age                     0.987        0.02   0.91, 1.02
Birth weight z-score             1.05         0.04   0.97, 1.14
Female                           0.52         0.18   0.27, 0.98
(a) Gestational age centered around 28 weeks.