J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 Journal of the Nigerian Society of Physical Sciences Bayesian Multilevel Models for Count Data Olumide Sunday Adesinaa,∗ aDepartment of Mathematical Sciences, Redeemer’s University, Nigeria Abstract The traditional Poisson regression model for fitting count data is considered inadequate to fit over- or under-dispersed count data and new models have been developed to make up for such inadequacies inherent in the model. In this study, Bayesian Multi-level model was proposed using the No-U-Turn Sampler (NUTS) sampler to sample from the posterior distribution. A simulation was carried out for both over-and under-dispersed data from discrete Weibull distribution. Pareto k diagnostics was implemented, and the result showed that under-dispersed and over-dispersed simulated data has all its k value to be less than 0.5, which indicate that all the observations are good. Also all WAIC were the same as LOO-IC except for Poisson in the over-dispersed simulated data. Real-life data set from National Health Insurance Scheme (NHIS) was used for further analysis. Seven multi-level models were fitted and the Geometric model outperformed other model. DOI:10.46481/jnsps.2021.168 Keywords: Count data, Health, Insurance, Dispersion, Multilevel models Article History : Received: 17 February 2021 Received in revised form: 09 July 2021 Accepted for publication: 30 July 2021 Published: 29 August 2021 c©2021 Journal of the Nigerian Society of Physical Sciences. All rights reserved. Communicated by: T. Latunde 1. Introduction Count data contains non-negative integers and zero obtained within a fixed period. Various studies have been carried out on count data and modelling, [1] applied it to medical data [2] ap- plied it to model microbiome data and many others. Frequentist and Bayesian estimation have equally been used to model count data, and the widely used is the Bayesian estimation technique. There are three major types of count data, the over-dispersed, under-dispersed and over-dispersed. More about types of count data can be found in the study by [3,4]. There are models dedicated to modelling under-dispersion due to their suitabil- ity while a model such as Negative Binomial is suitable for fit- ting over-dispersion. Models such as Dirichlet Process Prior, Negative Weibull can be used to fit both under-dispersion and ∗Corresponding author tel. no: Email address: olumidestats@gmail.com (Olumide Sunday Adesina ) over-dispersion. Models such as zero inflated and hurdle mod- els can effectively handle over and under-dispersed count data with many zeros. The zero-truncated regression models are specifically designed to fit count data with no zero count. The categorized regression model is designed to fit count data that its response variable is categorized. Some of the improved techniques relative to Pois- son regression model can be found in [5], [6], [3], amongst oth- ers. [7] carried out a on hidden markov model in multiple test- ing on dependent count data, [8] showed that the exponentiated- exponential Geometric distribution can be applied to fit under- dispersed or over-dispersed count data, in the same manner [4] demonstrated that Dirichlet Process Mixture Prior of Gener- alized Linear Mixed Models (DPMglmm) can fit either over- dispersed or under-dispersed count data well. [9] sufficiently showed that multi-level zero-inflated Poisson (ZIP) regression model can adequately fit both over-and under- dispersed count data that have zero counts. The authors adopted 224 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 225 EM algorithm along with the penalized likelihood and restricted maximum likelihood (REML). [10] adopted multilevel Zero- inflated Poisson (ZIP) regression and zero-inflated negative bi- nomial (ZINB) and applied the models to fit count data relating to decay, missing and filled teeth of children aged 12 years old. A related and recent study was carried out by [11], the authors proposed multilevel zero-inflated generalized Poisson (ZIGP) that is suitable in fitting both over- and under-dispersed count data and compared with multilevel models of zero-inflated Pois- son, zero-inflated negative binomial. The result showed that the multilevel ZIGP produced more accurate parameter estimates, particularly for under-dispersed data. In this study, Bayesian multilevel modelling was proposed and implemented for some basic distributions used in fitting count data (Poisson, Negative Binomial and Geometric), zero inflated and hurdle models, and identify a most suitable model for fit- ting under-and over-dispersed respectively. The remaining part of this paper is sectionalized as follows; multi-level modelling is described in section 2, parameter estimation and model se- lection can be found in 3. Section 4 is the results of simulation study and that of real-life data. Lastly, summary and conclusion in section 5. 2. Model Description 2.1. Multilevel Modelling The multilevel modelling technique follows a similar process involved when fitting the Generalized Linear Model. In GLM, a link function links the response y variable to the predictor(s), same with multilevel modelling. Let ( f ) be an inverse link func- tion that links the response variable y to the predictor, then Υ is the linear combination of the predictors transformed by the inverse link function ( f ), and d be a parametric model (distri- bution), the model can be simply be written as yi ∼ d ( f (Υi) ,θ) (1) And linear predictor: Υ = Xβ + K� (2) The data is made up of the response variable y, X and K, while β, � and θ are the model parameters to be estimated. while β is the fixed effect coefficient at population level, while � is the coefficient at group-level. The Bayesian estimation technique by Monte Carlo Markov Chain (MCMC) procedure considers � as a parameter relative to maximum likelihood that considers � as error term [12]. 2.2. Zero-Inflated distribution If Y follows zero-inflated Poisson (ZIP) distributions, given by P(Y = y) = { ω + (1 −ω) exp(−λ), y = 0 (1 −ω) exp(−λ)λy/y!, y > 0 } (3) Where is ω in the range0 < ω < 1 , in order to accommodate more zeros than those allowed under the Poisson assumption(ω = 0), and the case of ω < 0imply zero inflated. [9] estimated multi-level parameters of ZIP regression in the generalized lin- ear mixed models (GLMMs) context. The authors generalized the ZIP model so that the model will be able to withstand more complex correlation structure. Zero-inflated negative binomial for counts is formed from ZIP, the mean and variance defined as: E(Y ) = (1 −ω)λ = µ, (4) The use of regression models based on ZIP was established by [13-15 ], [5]. Following [5] we have: log(λ) = Xβ and log (ω/1 −ω) = ZΥ (5) where X and Z are matrices of covariates, β and Υare vector of parameters. Assuming two linear predictors are related in some ways, [16] provided a simplest form of (3) which is refers to the ZI P(τ)model as follows: log(λ) = Xβ, log (ω/1 −ω) = τXβ (6) Where τ is a scalar parameter, which implies thatω = (1 + λ−r )−1. Following equation (3) in multi-level case, [9] identified the ex- tension of ZIP model to include random components wi and ui within logistic and Poisson linear predictors to take care of de- pendence of observations in given clusters. The random effects wi and uiare specific to the ith cluster. In a three-level hierar- chical situation of Yi jk, the kth observation of the jth individual within the ith clusters is measured through random effects asso- ciated with the linear predictors as follows: log [ φi jk (1−φi jk ) ] = ξi jk = aTi jkα + wi + si j log(λi jk) = γi jk = xTi jkβ + ui + vi j (7) The covariates aTi jkand x T i jk are not always the same α and β are the corresponding vectors of regression coefficients. si jand vi jare variations at subject level. 2.3. Hurdle Models If the distribution of Y follows zero-truncated Poisson distribu- tion it follows that: π0y > 0 P(Y = y) = (1 −π0)e−λλy (1 − e−λ)y! y = 0 (8) Reparametizing the zero-inflated Poisson model in equation (3) withπ0 = ω + (1 − ω)e−λ, [16], gave Poisson hurdle regression model is given as; log (λ) = Xβ, log[− log(1 −π+) = τXβ (9) Where π+ = 1 − π0 is the probability of clearing the “hurdle” and generating a non-zero count. 225 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 226 2.4. Prior Distributions Prior distribution is specified at population and group-level. At population-level parameters have an improper prior [17]. At group level it is assumed that parameters ε comes from a multi- variate normal distribution having zero mean and unknown co- variance matrixΣ. epsilon ∼ N (0,Σ) (10) Covariances are between group-level parameters are generally of different groupings factors and assumed to be zero. By impli- cation, Kand ε can be divided to form several matrices Ki and parameter vectors εi , where i indexes grouping factors, thus, the model can be simplified to �i ∼ N (0,Σi) (11) Sometimes, it can be assumed that group-level parameters for different levels of the same grouping factors are not depen- dent. If the other level is indexed by j, (11) leads to: �i j ∼ N ( 0, M j ) (12) The covariance matrices M j will become the model parameters. No-U-Turn Sampler (NUTS) by (2014) [18] is used for M j as instead of the Inverse-Wishart prior distribution used in most studies and packages. Inverse-Wishart distribution is used be- cause it has good conjugacy characteristics for Gibbs-Sampler. The choice of Inverse-Wishart prior distribution was criticized in the studies by [19], and [20]. The parameters of M j is se- lected in terms of correlation matrix Ωj and a vector of standard deviations σ j through, M j = D ( σ j ) ΩjD ( σ j ) (13) and D ( σ j ) imply the diagonal matrix with diagonal elements σ j. Then, prior would be specified for D ( σ j ) ΩjD ( σ j ) . In the case of Ωj, LKJ-Correlation prior by [21] is used, with ζ > 0. That is, Ω j ∼ LK J (ζ) sectionParameter Estimation and Model Selection Sampling from the posterior require appropriate sampling pro- cedure, two basic sampling procedures are discussed here. First is the Hamiltonian Monte-Carlo (HMC) Sampler, also known as Hybrid Monte-Carlo [22-23]. [18] extended HMC to No-U- Turn Sampler (NUTS), because HMC has some drawbacks as discussed by [17]. The NUTS Sampler allows setting param- eters, and eliminates the need for hand-tuning, [18] stated that setting the parameters automatically makes it least efficient as compared to a well-tuned Hamiltonian Monte-Carlo. Software package by R core team (2020) was used to fit the model with brms package by [17] along with Stan processor without which the analysis cannot be run, it can be assessed on http://cran.r- project.org/bin/windows/Rtools/. The Watanabe-Akaike Information Criteria proposed (WAIC) by [24] and Leave-one-out cross validation LOO-CV by [25,26] were used for model selection in this study. The WAIC was used for estimating the out-of-sample expectation and consid- ered an improvement upon the DIC, with WAIC, correction for effective number of parameters to adjust over-fitting is added. According to [27], WAIC can be computed in two possible ways, first is calculated using simulation θs, s = 1, .......S and given as pW AIC1 = 2 n∑ i=1 log  1S S∑ s=1 log p(yi|θ s)  − 1S S∑ s=1 log p(yi|θ s) (14) For the second WAIC computation approach, the variance of individual terms in the log predictive density is added up over the ndata points and express as follows: pW AIC2 = n∑ i=1 varpost ( log p(yi|θ) ) (15) The advantages of WAIC over AIC and DIC was adequately discussed by [27] In the case of Leave-one-out cross-validation (LOO-CV) in Bayesian analysis, the data are repeatedly sub- divided into a training set ytrain and a holdout set yholdout with the objective of fitting ytrain yielding a posterior distribution ptrain (θ) = ptrain (θ|ytrain). The Bayesian LOO-CV estimate of out-of-sample predictive fit is l ppdloo−cv = n∑ i=1 log ppost(−i)(yi) (16) and estimated as n∑ i=1 log  1S S∑ s=1 log p(yi|θ is)  (17) Lower WAICs and LOOs suggest better model fit. 2.5. Pareto-k-diagnostics The shape parameter k of the generalized Pareto distribution can be used to assess the reliability and approximate conver- gence rate of the Pareto smoothed importance sampling (PSIS). It follows that if, k < 0.5(that is, ‘good’) then the central limit theorem holds. Similarly, If0.5 ≤ k < 1, (that is, ‘ok’) then the variance of the raw importance ratios is infinite, but the mean exists. In the same manner, If k > 0.7(that is, ‘bad’), unreason- able convergence rates is observed and unreliable Monte Carlo error estimates, and finally, if k ≥ 1 (that is, ‘very bad’), then neither the variance nor the mean of the raw importance ratios exists. 3. Result 3.1. Simulation Study Simulation of over and under-dispersed count data was carried out and the response count variable was obtained from Discrete Weibull distribution. On simulating count data from Discrete Weibull (DW) distribution, [28] identified that the parameter β of DW should contain the range0 ≤ β ≤ 1; irrespective the value of parameterq. For under-dispersed count data, β should be specified such that β ≥ 2, irrespective of the value of q. Anal- ysis for simulation study was carried using software package by 226 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 227 Figure 1. Marginal plot of relationship between Encounter and type [29], and R package “DWreg” by [30]. Random numbers con- sisting of 500 observations were generated and two predictors were uniformly generated in interval (0, 1) and (0, 2) for over- and under-dispersion respectively. For over dispersed, beta=0.8 and for under-dispersed beta=2.1. The parameter of discrete Weibull follows that θ0 = 0.25, θ1 = 0.35, θ2 = 0.5 and the corresponding equation for logit link is as follows log (q (1 − q)) = 0.25 + 0.35x1 + 0.5x2 (18) All Pareto k estimates are good (k < 0.5) If p waic estimates greater than 0.4. We recommend trying loo Instead. Table 1 shows results for Under-dispersed simulated count data. The best two performed model are Geometric and Hurdle Pois- son distribution indicated with ** and * respectively with low- est WAIC and LOO, following implementation using brms, a package for Bayesian multilevel modelling in R. from Table 1, it shows that WAIC=LOO for all the models. Table 2 shows results for Over-dispersed simulated count data. The best two performed model are negative Binomial and zero inflated negative binomial distribution indicated with ** and * respectively with lowest WAIC and LOO, following implemen- tation using brms, a package for Bayesian multilevel modelling in R. All the model shows that WAIC=LOO expect for Poisson model. 3.2. 4.2 Application to Health Insurance Data 3.2.1. Data Description The data set was obtained from National Health Insurance Scheme (NHIS), and it contains excess zero count. Sample of 116 users of NHIS users was obtained from September 2016 to July 2017. Response variable is number of encounter (En- counter), out of 116 observed, eighty two (82) persons made claims. Encounter is the time a user of National Health In- surance Scheme (NHIS) visits the health facility, and possi- bly makes claims. The predictors include type of Encounter (type), which is either primary or (secondary= 0, primary= 1) Figure 2. Marginal plot of relationship between Encounter and Age Figure 3. Marginal plot of relationship between Encounter and Sex Figure 4. Marginal plot of relationship between Encounter and DrugsAdm primary are users who registered primarily to use the health fa- cility, while secondary are users who were referred from an- other health facility to that of State Hospital Ota, because of availability of specialists. Another predictor Sex (male=1 and female=0), Biological age of patients (Age). Number of drugs administered (DrugsAdm) that is, both oral and injection. Drugs-out-of-stock (DrugsOS) is another predictor. The 227 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 228 Table 1. Simulated Under-Dispersed count data from Discrete Weibull Model Model elpd waic p waic waic elpd loo p loo looic Waic=looic Poisson Est. SE -628.1 10.2 1.9 0.1 1256.2 20.3 -628.1 10.2 1.9 0.1 1256.2 20.3 Yes Negbin Est. SE -629.3 10.1 1.9 0.1 1258.5 20.2 -629.3 10.1 1.9 0.1 1258.5 20.2 Yes Geometric Est. SE -717.3 12.9 0.9 0.1 1434.6** 25.8 -717.3 12.9 0.9 0.1 1434.6** 25.8 Yes hurdle poisson Est. SE -620.5 13.1 3.3 0.2 1241.0* 26.1 -620.5 13.1 3.3 0.2 1241.0* 26.1 Yes hurdle negbin Est. SE 621.2 13.1 3.3 0.2 1242.5 26.2 621.2 13.1 3.3 0.2 1242.5 26.2 Yes zero inflated poisson Est. SE -629.1 10.1 2.0 0.1 1258.3 20.2 -629.1 10.1 2.0 0.1 1258.3 20.2 Yes zero inflated negbin Est. SE 630.2 10.1 1.9 0.1 1260.5 20.2 630.2 10.1 1.9 0.1 1260.5 20.2 Yes Table 2. Simulated Over-Dispersed count data from Discrete Weibull Model Model elpd waic p waic waic elpd loo p loo looic Waic=LOO Poisson Est. SE -1984.6 108.6 21.6 3.1 3969.2 217.2 -1984.7 108.6 21.7 3.1 3969.4 217.2 No Negbin Est. SE -1217.4 27.9 3.9 0.5 2434.8** 55.8 -1217.4 27.9 3.9 0.5 2434.8** 55.8 Yes Geometric Est. SE -1228.7 27.9 4.2 0.6 2457.4 55.8 -1228.7 27.9 4.2 0.6 2457.4 55.8 Yes hurdle poisson Est. SE -1682.3 87.6 19.0 2.9 3364.6 175.1 -1682.3 87.6 19.0 2.9 3364.6 175.1 Yes hurdle negbin Est. SE -1229.7 27.7 4.5 0.4 2459.4 55.5 -1229.7 27.7 4.5 0.4 2459.4 55.5 Yes zero inflated poisson Est. SE -1680.7 87.6 18.7 2.9 3361.3 175.2 -1680.7 87.6 18.7 2.9 3361.4 175.2 Yes zero inflated negbin Est. SE -1218.2 27.9 4.0 0.5 2436.4* 55.9 -1218.2 27.9 4.1 0.5 2436.4* 55.9 Yes 228 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 229 Figure 5. Marginal plot of relationship between Encounter and DrugOS Figure 6. Marginal plot of relationship between Encounter and Type along with Age Figure 7. Marginal plot of relationship between Encounter and Type along Sex idea of National Health Insurance Scheme (NHIS) is that treat- ments and drugs administered attract 10 percent of the total cost. It becomes disadvantage to patients having cases of Drug- sOS. Out of 116 users, 97 are primary, while 19 are secondary users, 70 out of 97 (72.16%) primary users did not make claims within the period observed. 4 out of 19 secondary users did not make claims (21.05%). Out of 116 NHIS users of the health fa- cility, 64 are females and 52 are males. 41 females had zero claims (64.06%), while 33 males had zero claims (63.46%). 11 females and 8 males are secondary users, while 53 females and 44 males are primary users. The data can be obtained https://data.mendeley.com/drafts/6hcf5mf7fy. Mean of the response variable (Encounter) is 0.7068, while variance is 1.7916 which potentially suggests over-dispersion. Further test on the data shows that the dispersion parameter is δ=1.49799, indicating that the data set is over-dispersed with δ>1.The dispersion test was performed using AER package in R by [31]. From Table 3 shows results for the real-life Over-dispersed count data. The best two performed model are Geometric and Nega- tive binomial distribution indicated with ** and * respectively with lowest WAIC and LOO, following implementation using brms, a package for Bayesian multilevel modelling in R, In all the models Waic LOO as shown in Table 6. As earlier stated; for any observation for k > 0.7indicate unreasonable conver- gence rates is observed and unreliable Monte Carlo error esti- mates. Table 4 shows that hurdle negative binomial (4) has the highest of such observations, Poisson, negative binomial, and hurdle Poisson model has 3 of such observations while Geo- metric, zero inflated poisson and zero inflated negbin has two (2) each. Each parameter is presented in Table 5 with the posterior mean as the ‘Estimate’ and the ‘Est.Error’ as the standard deviation of the posterior distribution, the Table also contain a two-sided 95% Credible intervals (l-95% CI and u-95% CI) established on quantiles. From Table 5, the ‘intercept’, ‘type’ and ‘type:Sex”- interaction has a negative posterior mean. For “type”, the model predicts longer periods for encounter for secondary users than primary users; ‘Sex’ (0.19) accounts for more Encounter than ‘Age’ (0.01). “drugsOS (0.41) tells us that there significant cases of drugs-out-stock which suggest ineffectiveness of Nige- ria Health Insurance Scheme. Drawing samples from (NUTS) follows that for each pa- rameter, Efficient Sample is a real measure of effective sample size, while Rhat is the would-be scale reduction factor on split chains (at convergence, Rhat = 1). Figure 1 shows that Encounter has positive relation with type; the figure suggests that the effect of Encounter on type is higher for secondary user of the facility since it is higher on zero end than 1. Figure 2 shows that Encounter has positive relation with Age; the figure suggests that as age increases Encounter increases Figure 3 shows that Encounter has positive relation with Sex; the figure suggests that male account for more Encounter than female. Figure 4 shows that Encounter has positive relation with type; Number of DrugsAdm increases with number of Encounter Figure 5 shows that Encounter has positive relation with type; Number of DrugsOS increases significantly with number of En- counter Figure 6 shows that as primary users of the facility increases, the number of Encounter increases across ages 229 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 230 Figure 8. Density and Trace plots for all covariates Figure 7 shows that as primary users of the facility increases, the number of Encounter increases across Sexes Figure 8 shows that the density plots for the tail area of the distribution, which corresponds to l-95% CI and u-95% CI in Table 5 and trace plot, the Trace plot shows that the chains are stable. 4. Summary and Conclusion In this study Bayesian multi-level model was proposed and im- plemented. The No-U-Turn Sampler (NUTS) sampler was used to sample from the posterior distribution, and implementations were done using the ‘package brms’ in R. Simulation study was carried out for both over-and under-dispersed and response variables were generated through discrete Weibull distribution while the predictors generated from Uniform distribution. Re- sults from under-dispersed revealed that Geometric distribution is the most appropriate model to fit count data using multi-level approach. While for over-dispersed simulated data, negative binomial shows to outperform the Poisson, Geometric, hurdle- Poisson, hurdle-negbin, zero-inflated-Poisson, zero-inflated-negbin. Pareto K diagnostics shows that for under-dispersed and over- dispersed simulated data all k are less than 0.5, which makes all the observations to be good, also all WAIC were the same as LOO-IC except for Poisson in the over-dispersed simulated data. Real-life data set of count of Encounter of patients from Na- tional Health Insurance Scheme was used for further analysis. The model that performs best was the Geometric distribution followed by negative binomial model. Contrary to the sim- ulated data not all WAIC were the same as LOO-IC, except for Poisson model in the over-dispersed simulated data. The need to carry out LOO-IC was informed by having observa- 230 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 231 Table 3. Real Life Data Model elpd waic p waic waic elpd loo p loo looic Waic=loo Poisson Est. SE -127.4 15.3 15.9 6.3 254.8 30.6 -129.7 15.9 18.2 7.0 259.3 31.8 No Negbin Est. SE -122.3 12.8 10.6 3.5 244.6* 25.6 -124.0 13.5 12.3 4.3 248.0* 27.0 No Geometric Est SE. -120.8 12.3 7.4 2.3 241.6** 24.6 -121.5 12.5 8.1 2.7 242.9** 25.0 No hurdle poisson Est. SE -132.6 13.3 11.4 4.2 265.1 26.6 -135.0 14.2 13.8 5.4 269.9 28.5 No hurdle negbin Est. SE -131.6 12.7 5.0 1.3 263.3 25.5 -133.1 13.3 6.4 2.3 266.2 26.5 No zero inflated poisson Est. SE -126.7 14.8 12.8 5.1 253.3 29.6 -127.8 15.1 13.9 5.4 255.6 30.1 No zero inflated negbin Est. SE -122.7 12.9 9.4 3.3 245.3 25.9 -123.4 13.2 10.2 3.7 246.8 26.4 No Table 4. Pareto k diagnostics Model Pareto k diag. remark count Pct Min neff obs. k>0.7 Poisson (-Inf, 0.5] (0.5, 0.7] (0.7, 1] (1, Inf) Good Ok Bad Very bad 112 1 2 1 96.6% 0.9 1.7% 0.9% 1367 236 17 4 3 Negbin (-Inf, 0.5] (0.5, 0.7] (0.7, 1] (1, Inf) Good Ok Bad Very bad 112 1 3 0 96.6% 0.9% 2.6% 0.0% 1068 446 19 - 3 Geometric (-Inf, 0.5] (0.5, 0.7] (0.7, 1] (1, Inf) Good Ok Bad Very bad 114 0 2 0 98.3% 0.0% 1.7% 0.0% 1036 - 88 - 2 hurdle poisson (-Inf, 0.5] (0.5, 0.7] (0.7, 1] (1, Inf) Good Ok Bad Very bad 108 5 1 2 93.1% 4.3% 0.9% 1.7% 859 413 16 7 3 hurdle negbin (-Inf, 0.5] (0.5, 0.7] (0.7, 1] (1, Inf) Good Ok Bad Very bad 109 3 3 1 94.0% 2.6% 2.6% 0.9% 2909 594 135 9 4 zero inflated poisson (-Inf, 0.5] (0.5, 0.7] (0.7, 1] (1, Inf) Good Ok Bad Very bad 111 3 1 1 95.7% 2.6% 0.9% 0.9% 1885 165 26 10 2 zero inflated negbin (-Inf, 0.5] (0.5, 0.7] (0.7, 1] (1, Inf) Good Ok Bad Very bad 113 1 2 0 97.4% 0.9% 1.7% 0.0% 1509 213 50 - 2 231 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 232 Table 5. Population-Level Effects Estimate Est. Er- ror l-95% CI u-95% CI Eff.Sample Intercept -0.36 0.63 -1.63 0.83 2290 type -1.33 0.65 -2.57 -0.06 1972 Age 0.01 0.01 -0.02 0.03 2024 Sex 0.19 0.48 -0.73 1.13 2839 DrugsAdm 0.06 0.03 -0.00 0.13 3444 DrugsOS 0.41 0.17 0.10 0.74 2609 type:Age 0.01 0.02 -0.02 0.05 1818 type:Sex -0.18 0.56 -1.26 0.90 2897 tions for all the cases. Figures 1 to Figure 7 contains marginal plots to identify the relationship between the response variable (Encounter) and covariates; type, Sex, DrugsAdm, Age, and DrugsOS. As identified by [8] that Geometric family have the ability to model count data effectively, the same has been demonstrated in this study using Bayesian multi-level regression approach. Fu- ture direction can consider fitting multi-level regression model to fit count data using distribution such as the Weibull-exponential distribution and Exponentiated Generalized Weibull proposed by [32] and [33] respectively. Acknowledgements My appreciation goes to the Medical Director of the State Hospital, Ota, Ogun state where the data set was obtained. I also appreciate the inputs of reviewers and editors on the manuscript. References [1] M. S. Workie & A. M. Lakew, “Bayesian count regression analy- sis for determinants of antenatal care service visits among pregnant women in Amhara regional state, Ethiopia” Journal of Big Data 5 (2018) https://doi.org/10.1186/s40537-018-0117-8. [2] K. H. Lee, B. A Coull, A. B. Moscicki, B.J. Paster & J. R. Starr, “Bayesian Variable Selection for Multivariate Zero-Inflated Models: Application to Microbiome Count Data”, Biostatistics 21 (2020) 499. 1. [3] F. Famoye & K.P. Singh, “Zero-inflated generalized Poisson regression model with applications to domestic violence data”, Journal of Data Sci- ence 4 (2006) 117. [4] O. S. Adesina, T. O. Olatayo, O. O. Agboola, & P. E. Oguntunde, “Bayesian Dirichlet process mixture prior for count data”, International Journal of Mechanical Engineering and Technology 9 (2018) 630. [5] D. Lambert, “Zero-inflated Poisson regression, with an application to de- fects in manufacturing” Technometrics 34 (1992) . [6] F. Famoye & W. Wang, “Censored generalized Poisson regression model” Journal of Computational Statistics and Data Analysis 46 (2004) 547. [7] W. Su & X. Wang, “Hidden Markov model in multiple testing on de- pendent count data”, Journal of Statistical Computation and Simulation 5 (2020) 889. [8] F. Famoye & L. Carl “Exponentiated-exponential geometric regression model”, Journal of Applied Statistics 44 (2017) 2963. [9] A. H. Lee, K. Wang, J. A Scott, K. W Yau & G. J. McLachlan, “Multi- level zero-inflated Poisson regression modelling of correlated count data with excess zeros”, Statistical Methods in Medical Research 15 (2006) 47. [10] A. Moghimbeigi, M.R. Eshraghian, K. Mohammad & B. Mcardle “Mul- tilevel zero-inflated negative binomial regression modeling for over- dispersed count data with extra zeros”, Journal of Applied Statistics 35 (2008) 1193. [11] A. Almasi, M.R. Eshraghian, A. Moghimbeigic, A. Rahimib, K. Mo- hammad & S. Fallahigilan, “Multilevel zero-inflated generalized Pois- son regression modelling for dispersed correlated count data”, Statistical Methodology 30 (2016) 1. [12] J. Fox & S. Weisberg, An R companion to applied regression, Second Edition, Sage (2010), ISBN-13: 978-1412975148 [13] J. Mullahy “Specification and testing of some modified count data mod- els”, Journal of Econometrics 33 (1986) 341. [14] D. Heilbron, “Generalized linear models for altered zero probabilities and overdispersion in count data” SIMS Technical Report 9, Department of Epidemiology and Biostatistics, University of California, San Francisco 9 (1989). [15] D. Heilbron “Zero-altered and other regression models for count data with added zeros”, Biometrical Journal 36 (1994) 531. [16] M. Ridout, C. G. B. Demetrio & J. Hinde “Models for count data with many zeros”, International Biometric Conference Cape Town (1998) 1. [17] P. C. Burkner. “brms: An R package for Bayesian multilevel models using Sta”, Journal of Statistical Software (2017). [18] M. D. Hoffman & A. Gelman “The No-U-Turn sampler: adaptively set- ting path lengths in Hamiltonian Monte Carlo”, The Journal of Machine Learning Research 15 (2014) 1593. [19] R. Natarajan & R. E. Kass, “Reference Bayesian methods for generalized linear mixed models”, Journal of the American Statistical Association 95 (2000) 227. [20] R. E. Kass, R. Natarajan, “A default conjugate prior for variance compo- nents in generalized linear mixed models (comment on article by Browne and Draper)”, Bayesian Analysis 1 (2006) 535. [21] D. Lewandowski, D. Kurowicka & H. Joe, “Generating random corre- lation matrices based on vines and extended onion method”, Journal of Multivariate Analysis 100 (2009) 1989. [22] S. Duane, A. D. Kennedy, B. J. Pendleton & D. Roweth, “Hybrid Monte Carlo”, Physics Letters B, 195 (1987) 216. [23] R. M. Neal, Handbook of Markov Chain Monte Carlo, volume 2, chapter MCMC Using Hamiltonian Dynamics. CRC Press 2 (2011) 1. [24] S. Watanabe, “Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory”, The Journal of Machine Learning Research 11 (2011) 3571. [25] A. E. Gelfand, D. K. Dey, & C. H. Hang, “Model determination using pre- dictive distributions with implementation via sampling-based methods.” Technical report, DTIC Document (1992) [26] A. Vehtari’, A. Gelman & J. Gabry Efficient Implementation of Leave- One-Out Cross-Validation and WAIC for Evaluating Fitted Bayesian Models”, Unpublished manuscript, (2015). [27] A. Gelman, J. B. Carlin, H. S. Stern & D. B. Rubin, Bayesian Data Anal- ysis, Taylor & Francis 2 (2014). [28] H. S. Klakattawi, V. Vinciotti & K. Yu, “A simple and adaptive dispersion regression model for count data”, Entropy 20 (2016) 1. [29] R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R- project.org (2020). [30] R. Vinciotti, “DWreg: Parametric Regression for Dis- crete Response”, R package version 2.0. https://CRAN.R- project.org/package=DWreg (2016) [31] C. Kleiber & A. Zeileis, Applied Econometrics with R, New 232 Adesina / J. Nig. Soc. Phys. Sci. 3 (2021) 224–233 233 York: Springer-Verlag. ISBN 978-0-387-77316-2. URL https://CRAN.R- project.org/package=AER (2008) [32] P. E. Oguntunde, A. O. Adejumo & E. A. Owoloko, “Exponential Inverse Exponential (EIE) distribution”, Asian Journal of Scientific Research, 10 (2017) 169. [33] P. E. Oguntunde, O. A. Odetunmibi & A. O. Adejumo, “ On the exponen- tiated generalized Weibull distribution: a generalization of the Weibull distribution”, Indian Journal of Science and Technology 8 (2015) 1. 233