A Study of Count Regression Models for Mortality Rate

CAUCHY – Jurnal Matematika Murni dan Aplikasi
Volume 7(1) (2021), Pages 142-151
p-ISSN: 2086-0382; e-ISSN: 2477-3344
Submitted: October 16, 2021
Reviewed: November 04, 2021
Accepted: November 05, 2021
DOI: https://doi.org/10.18860/ca.v7i1.13642

Anwar Fitrianto
Department of Statistics, Faculty of Mathematics and Natural Sciences, IPB University
Email: anwarstat@gmail.com

ABSTRACT
In this study, the Poisson regression model, the Negative Binomial 1 regression model (NEGBIN 1), and the Negative Binomial 2 regression model (NEGBIN 2) are fitted to mortality rate data. The models are compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to find out which one suits the data best. The results show that the data display more variability than the Poisson model allows (overdispersion). Among the three models, NEGBIN 1 is preferred.

Keywords: Mortality; Poisson; Regression; Binomial; Overdispersion

INTRODUCTION
Count data contain variables that count how many times something has happened, such as the number of cases of a particular disease in epidemiology [1]. Linear regression models have often been applied to this kind of data, but the resulting estimates can be inefficient, inconsistent, and biased. Mortality data are count data that contain an offset variable. [2] studied mortality from ischemic heart disease (IHD) among middle-aged men and found that 46 of the 109 deaths over about 11.4 years of follow-up were due to IHD. Among studies on causes of death other than IHD, [3] examined the global impact of HIV/AIDS. Another study on mortality, by [4], concerned diarrheal disease.
It has been found that diarrhea causes 1 in 9 child deaths worldwide, making it the second leading cause of death among children under 5 years of age. In addition, [5] examined the global causes of death due to disease in children under 5 years. In their study, diarrhea remained the second leading cause of death from infection in children over the last 30 years. Malnutrition is another serious global problem, contributing to about 6 million child deaths every year. [6] reported that poor nutrition during fetal development can cause severe physical damage and that malnutrition increases susceptibility to disease. A study conducted by [7] stated that malnutrition (measured as poor anthropometric status) accounted for nearly 50% of childhood deaths. Apart from mortality due to disease, [8] stated that injuries and deaths from road traffic accidents (RTA) are becoming more severe in countries such as India, where not a day goes by without an RTA and many people die or become disabled. In addition, suicide contributes to the death rate. According to [9], suicidal behavior is a major health problem in many countries, both developed and developing.

The Poisson regression model is a generalized linear model for count data, including data with offset variables. It is also the standard model for count data and contingency tables. In this model, the response variable is assumed to follow a Poisson distribution. Negative Binomial regression is likewise a generalized linear model in which the dependent variable is a count of events. The Negative Binomial distribution is a two-parameter distribution that is generally more flexible than the Poisson model [2]. It can also model overdispersed counts, which the Poisson model cannot.
The Negative Binomial model can be derived from the Poisson distribution and the generalized Poisson distribution. [10] has discussed several other specific mortality measures, such as age-specific crude death rates, cause-specific mortality rates, and infant and maternal mortality rates. In the data collection process, measurements may be biased or inaccurate, and such inaccuracy can cause overdispersion. This study aims to identify the most suitable model for mortality data, which are usually overdispersed.

METHODS
Data
The data used in this study are mortality rate data available in [11]. The data consist of 163 observations (countries) with seven independent variables: the number of people dying per 100,000 live births due to IHD ($x_1$), diarrheal disease ($x_2$), HIV/AIDS ($x_3$), malaria ($x_4$), malnutrition ($x_5$), road accidents ($x_6$), and suicides ($x_7$).

Count Regression Models
According to [12], count regression models have been suggested for modeling over-dispersed and zero-inflated count response variables. Poisson regression is the standard model for count data, while the Negative Binomial regression model is often introduced to handle count data with overdispersion. Meanwhile, the zero-inflated Poisson model (ZIP) and the zero-inflated Negative Binomial model (ZINB) are introduced for zero-inflated response variables, in which the data contain many zeros. Moreover, [13] found that ZIP and ZINB can be obtained by mixing a distribution degenerate at zero with a Poisson regression and a Negative Binomial regression, respectively. The probability mass function of the ZIP is

$$P(Y = y_i) = \begin{cases} \omega_i + (1 - \omega_i)\exp(-\lambda_i), & y_i = 0 \\ (1 - \omega_i)\dfrac{\lambda_i^{y_i}}{y_i!}\exp(-\lambda_i), & y_i > 0 \end{cases} \tag{1}$$

Meanwhile, the ZINB's probability mass function can be formulated as:

$$P(Y = y_i) = \begin{cases} \omega_i + (1 - \omega_i)\left(\dfrac{\theta}{\theta + \lambda_i}\right)^{\theta}, & y_i = 0 \\ (1 - \omega_i)\dfrac{\Gamma(y_i + \theta)}{y_i!\,\Gamma(\theta)}\left(\dfrac{\theta}{\theta + \lambda_i}\right)^{\theta}\left(\dfrac{\lambda_i}{\theta + \lambda_i}\right)^{y_i}, & y_i > 0 \end{cases} \tag{2}$$

with $\lambda_i = \exp(x_i\beta)$. The zeros arise with probability $\omega_i$ from a second process. The function $F$ that relates the product $x_i\gamma$ to the probability $\omega_i$ is called the zero-inflated link function, $\omega_i = F(x_i\gamma)$.

Poisson Regression Model
[14] studied the Poisson regression model as the standard model for count data. Let $Y$ be a count of events. The marginal probability under Poisson regression is written as:

$$P(Y = y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{\Gamma(1 + y_i)}, \tag{3}$$

with $\lambda_i = \exp(\alpha + x_i\beta)$ and $y_i = 0, 1, 2, \ldots$ for $i = 1, \ldots, N$. The rate parameter of Poisson regression, $\lambda_i$, also known as the expected count, is formulated as:

$$\lambda_i = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}. \tag{4}$$

Based on Equation (4), the log-linear model for the mean rate is written as:

$$\log(\lambda_i) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \tag{5}$$

where $p$ is the number of predictors or covariates in the model, $\beta_0$ is the intercept of the regression, $\beta_1, \ldots, \beta_p$ are the regression coefficients, and $x_1, \ldots, x_p$ are the independent variables.

[14] formulated Maximum Likelihood Estimation (MLE) for Poisson regression. Let $Y$ be a random variable with a Poisson distribution and an unknown parameter value $\theta$. The probability mass function of $Y$ is written $P_y(y; \theta)$ to emphasize the parameter $\theta$, and $n$ independent trials yield the data $y_1, y_2, y_3, \ldots, y_n$.
The joint probability mass function is as follows:

$$P_{Y_1 \ldots Y_n}(y_1, \ldots, y_n; \theta) = P_y(y_1; \theta) \cdots P_y(y_n; \theta). \tag{6}$$

The likelihood of $\theta$ given the data $y_1, \ldots, y_n$ is obtained from Equation (6) by regarding it as a function of $\theta$:

$$L(\theta; y_1, \ldots, y_n) = P_{Y_1 \ldots Y_n}(y_1, \ldots, y_n; \theta) = P_y(y_1; \theta) \cdots P_y(y_n; \theta). \tag{7}$$

The maximum likelihood estimate $\hat{\theta}$ is the value of $\theta$ that maximizes Equation (7). Here $Y$ follows a Poisson distribution with unknown parameters, and the data collected from the independent trials are of the form $Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n$. The likelihood function of the Poisson regression is then written as:

$$L = \prod_{i=1}^{N} \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}. \tag{8}$$

The log-likelihood function of Poisson regression is obtained by taking the logarithm of Equation (8),

$$l = \sum_{i=1}^{N} \log\left(\frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}\right) = \sum_{i=1}^{N}\left\{y_i\log(\lambda_i) - \lambda_i - \log(y_i!)\right\}.$$

The Standard Negative Binomial Regression Model
According to [1], in most applications the variance of count data is greater than the mean, which is called overdispersion. Based on the study of [15], the Poisson regression model is inefficient when dealing with overdispersed data. According to [16], the Negative Binomial distribution is more flexible than the Poisson distribution for modeling overdispersed data because it has two parameters. In particular, Negative Binomial regression can model overdispersed counts. The Negative Binomial model can be derived as a Gamma mixture of Poisson distributions. Starting from the conditional mean of the Poisson model,

$$E(y_i \mid x_i, \varepsilon_i) = \exp(\alpha + x_i\beta + \varepsilon_i) = h_i\lambda_i, \tag{9}$$

where $h_i = \exp(\varepsilon_i)$.
In the Poisson-Gamma mixture, $y_i$ given $h_i$ follows the Poisson distribution, while $h_i = \exp(\varepsilon_i)$ follows a Gamma distribution. The $h_i$ is assumed to follow a Gamma distribution with shape and rate both equal to $\theta$, so that $E(h_i) = 1$:

$$f(h_i) = \frac{\theta^{\theta}\exp(-\theta h_i)\,h_i^{\theta - 1}}{\Gamma(\theta)}. \tag{10}$$

Once $h_i$ has been integrated out of the joint distribution, the marginal probability of the Negative Binomial distribution is obtained as follows:

$$P(Y = y_i \mid x_i) = \frac{\Gamma(\theta + y_i)}{y_i!\,\Gamma(\theta)}\left(\frac{\theta}{\theta + \lambda_i}\right)^{\theta}\left(\frac{\lambda_i}{\theta + \lambda_i}\right)^{y_i}. \tag{11}$$

The mean of the Negative Binomial model is the same as that of Poisson regression, $E(y_i \mid x_i) = \lambda_i = e^{x_i\beta}$, and its variance is written as:

$$Var(y_i \mid x_i) = \lambda_i\left[1 + \left(\frac{1}{\theta}\right)\lambda_i\right] = \lambda_i(1 + k\lambda_i), \tag{12}$$

where $k = Var(h_i) = 1/\theta$. Moreover, the rate parameter of Negative Binomial regression, $\lambda_i$, also known as the expected count, is written as:

$$\lambda_i = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}. \tag{13}$$

The log-linear model for the mean rate of Negative Binomial regression is obtained by taking the logarithm of Equation (13):

$$\log(\lambda_i) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \tag{14}$$

where $p$ is the number of predictors or covariates in the model, $\beta_0$ is the intercept of the regression, $\beta_1, \ldots, \beta_p$ are the regression coefficients, and the $x$'s are the independent variables.

[17] has discussed the MLE of Negative Binomial regression for a random sample of $n$ subjects. In a standard Negative Binomial model, the dependent variable $y_i$ and the predictor variables $x_{1i}, x_{2i}, \ldots, x_{pi}$ are observed for each subject $i$. The predictor variables are combined to form the following matrix,

$$\mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix}.$$
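The mean and variance stated for the Negative Binomial model can be checked numerically against the probability in Equation (11). The following is only a minimal sketch: the function name `negbin2_pmf` and the values $\lambda = 4$, $\theta = 2$ are illustrative assumptions, not quantities from the study.

```python
import math

def negbin2_pmf(y, lam, theta):
    """P(Y = y) from Equation (11), evaluated on the log scale for stability."""
    log_p = (math.lgamma(theta + y) - math.lgamma(y + 1) - math.lgamma(theta)
             + theta * math.log(theta / (theta + lam))
             + y * math.log(lam / (theta + lam)))
    return math.exp(log_p)

# Illustrative (assumed) parameter values, not estimates from the paper.
lam, theta = 4.0, 2.0

# Truncate the infinite support; the tail beyond 400 is negligible here.
support = range(0, 400)
probs = [negbin2_pmf(y, lam, theta) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

# Theoretical variance from Equation (12): lambda * (1 + lambda / theta)
var_eq12 = lam * (1.0 + lam / theta)

print(f"sum of probabilities: {total:.6f}")
print(f"mean: {mean:.6f} (lambda = {lam})")
print(f"variance: {var:.6f} (Equation (12) gives {var_eq12})")
```

With these assumed values the variance exceeds the mean, which is the extra-Poisson variability that the parameter $\theta$ is meant to capture.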
The $i$th row of $\mathbf{X}$ is denoted $x_i$. Replacing $\lambda_i$ in Equation (11) with $e^{x_i\beta}$, the equation can be rewritten as

$$P(Y = y_i \mid x_i) = \frac{\Gamma(\theta + y_i)}{y_i!\,\Gamma(\theta)}\left(\frac{\theta}{\theta + e^{x_i\beta}}\right)^{\theta}\left(\frac{e^{x_i\beta}}{\theta + e^{x_i\beta}}\right)^{y_i}. \tag{15}$$

The likelihood function of the Negative Binomial model is

$$L = \prod_{i=1}^{N}\frac{\Gamma(\theta + y_i)}{y_i!\,\Gamma(\theta)}\left(\frac{\theta}{\theta + e^{x_i\beta}}\right)^{\theta}\left(\frac{e^{x_i\beta}}{\theta + e^{x_i\beta}}\right)^{y_i}, \tag{16}$$

and the log-likelihood function of Negative Binomial regression is obtained by taking the logarithm of Equation (16):

$$l = \sum_{i=1}^{N}\left\{\ln\Gamma(\theta + y_i) - \ln\Gamma(y_i + 1) - \ln\Gamma(\theta) + \theta\ln\theta + y_i x_i\beta - (\theta + y_i)\ln\left(\theta + e^{x_i\beta}\right)\right\}. \tag{17}$$

Negative Binomial 1 Model and Negative Binomial P Model
[18] have shown that the model derived from Equation (9), with the variance function in Equation (12), is the Negative Binomial 2 (NEGBIN 2) model. They re-parameterized the NEGBIN 2 model to obtain the Negative Binomial 1 (NEGBIN 1) specification, whose variance is written as

$$Var(y_i \mid x_i) = \lambda_i + k\lambda_i = \lambda_i(1 + k). \tag{18}$$

The marginal probability of NEGBIN 1 is obtained by replacing $\theta$ with $\theta\lambda_i$ in Equation (11),

$$P(Y = y_i \mid x_i) = \frac{\Gamma(\theta\lambda_i + y_i)}{y_i!\,\Gamma(\theta\lambda_i)}\left(\frac{\theta\lambda_i}{\theta\lambda_i + \lambda_i}\right)^{\theta\lambda_i}\left(\frac{\lambda_i}{\theta\lambda_i + \lambda_i}\right)^{y_i}. \tag{19}$$

By replacing $\theta$ with $\theta\lambda_i^{2-P}$ in Equation (11), the Negative Binomial P (NEGBIN P) model is written as:

$$P(Y = y_i \mid x_i) = \frac{\Gamma(\theta\lambda_i^{2-P} + y_i)}{y_i!\,\Gamma(\theta\lambda_i^{2-P})}\left(\frac{\theta\lambda_i^{2-P}}{\theta\lambda_i^{2-P} + \lambda_i}\right)^{\theta\lambda_i^{2-P}}\left(\frac{\lambda_i}{\theta\lambda_i^{2-P} + \lambda_i}\right)^{y_i}. \tag{20}$$

Setting $P = 1$ gives the NEGBIN 1 model in Equation (19), and setting $P = 2$ gives the NEGBIN 2 model in Equation (11).

Overdispersion
[19] noted that in almost all statistical studies of count data, the dependent variable is assumed to follow the Poisson distribution, so the mean is assumed to be equal to the variance. However, in practice, the variance is usually larger than the mean. [19] also stated that overdispersion indicates greater variability around a model's fitted values than the Poisson formulation allows. The Negative Binomial model has been proposed to correct this problem. When the data are over-dispersed, the variance is not the same as the mean: $Var(y_i) = \varphi\lambda$, where $\lambda$ is the mean. If $\varphi = 1$, the ordinary Poisson model holds; if $\varphi > 1$, the data are overdispersed. Accordingly, [20] stated that a distinctive property of the Poisson distribution, as a member of the exponential family, is that the conditional variance equals the conditional mean. In the Poisson model, the dispersion parameter $\varphi$ is therefore set to the constant value $\varphi = 1$.

Count Data
According to [16], count data indicate how many times or how frequently something happens. Furthermore, [18] stated that an event count is the number of times an event occurs and is a nonnegative random variable. Examples of count data include the number of patients hospitalized, the number of thieves arrested, and the number of natural disasters. Some count data have offset variables. [21] said that offset variables are commonly analyzed within the generalized linear model (GLM) and count regression models.
This analysis is typically used whenever the data are recorded over an observed period. In a GLM, the offset denotes the period observed. More generally, the offset is defined as a measure of exposure. For example, the exposure can be the number of house years, with the response being the number of claims incurred. The log-linear mean rate for the Poisson regression and Negative Binomial models is

$$\log(\lambda_i) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p. \tag{21}$$

When Poisson regression or Negative Binomial regression is applied to such data, the offset variable $\log(t)$ is added:

$$\log(\lambda_i) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \log(t), \tag{22}$$

which can be rearranged as

$$\log(\lambda_i) - \log(t) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,$$

$$\log\left(\frac{\lambda_i}{t}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,$$

where $p$ is the number of predictors or covariates in the model, $\beta_0$ is the intercept of the regression, $\beta_1, \ldots, \beta_p$ are the coefficients of the regression, the $x$'s are the independent variables, $t$ is the period observed (exposure), $\log(t)$ is the offset variable, and $\lambda_i / t$ is the rate.

In this study, the interest is in modeling mortality data, which are count data. Poisson regression and Negative Binomial regression are generally appropriate for count data, and the aim is to find out which regression best fits the mortality data.

Modelling the Mortality Rate Data
Poisson regression and Negative Binomial regression are the main models used in this research.
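To illustrate how the offset in Equation (22) turns a model for counts into a model for rates, the following minimal sketch fits a Poisson regression with one binary covariate by Newton-Raphson. It is not the SAS procedure used in this study. The function name `fit_poisson_offset` and all data values are hypothetical and chosen only for illustration.

```python
import math

def fit_poisson_offset(y, x, t, iters=25):
    """Newton-Raphson for log(lambda_i) = b0 + b1 * x_i + log(t_i)."""
    b0 = math.log(sum(y) / sum(t))  # start from the pooled log-rate
    b1 = 0.0
    for _ in range(iters):
        # Expected counts under the current coefficients; t enters as exposure.
        lam = [ti * math.exp(b0 + b1 * xi) for xi, ti in zip(x, t)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum(xi * (yi - li) for xi, yi, li in zip(x, y, lam))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(lam)
        h01 = sum(xi * li for xi, li in zip(x, lam))
        h11 = sum(xi * xi * li for xi, li in zip(x, lam))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system information * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: event counts y, a binary covariate x, and exposure t.
y = [12, 30, 9, 45, 20, 66]
x = [0, 0, 0, 1, 1, 1]
t = [1000, 2500, 800, 1500, 700, 2200]

b0, b1 = fit_poisson_offset(y, x, t)

# With a single binary covariate the estimates have a closed form:
# exp(b0) is the observed rate in group 0 and exp(b1) is the rate ratio.
rate0 = (12 + 30 + 9) / (1000 + 2500 + 800)
rate1 = (45 + 20 + 66) / (1500 + 700 + 2200)
print(f"b0 = {b0:.4f}, log(rate0) = {math.log(rate0):.4f}")
print(f"b1 = {b1:.4f}, log(rate1 / rate0) = {math.log(rate1 / rate0):.4f}")
```

Because $\log(t)$ has a fixed coefficient of one, the fitted coefficients describe $\log(\lambda_i / t)$, so $e^{\beta_0}$ is a rate per unit of exposure rather than a raw count.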
Both the Poisson model and the Negative Binomial model are written as in Equation (22), where $p$ is the number of predictors or covariates in the model, $\beta_0$ is the intercept of the regression, $\beta_1, \ldots, \beta_p$ are the covariate coefficients, and the $x$'s are the independent variables. The quantity $\log(\lambda_i / t)$ represents the log of the number of people dying per time unit, and the linear predictor $\beta x$ describes how the death rate changes as a function of the covariates. For each coefficient, the null hypothesis states that the slope is equal to zero, whereas the alternative hypothesis states that the slope is not equal to zero.

Goodness-of-fit Test
Deviance and Pearson's Chi-Square are used to check whether the data show over-dispersion or under-dispersion. Each statistic divided by its degrees of freedom (df) should be approximately equal to one. Values greater than one indicate that the data are overdispersed. Goodness-of-fit is assessed using PROC GENMOD in SAS. The deviance for the fitted Poisson regression and Negative Binomial regression is written as:

$$D = 2\sum_{i=1}^{n}\left\{y_i\log\left(\frac{y_i}{\lambda_i}\right) - (y_i - \lambda_i)\right\}. \tag{23}$$

Pearson's Chi-Square is defined as

$$\chi^2 = \sum_{i=1}^{n}\frac{(y_i - \lambda_i)^2}{Var(y_i)}, \tag{24}$$

where $\lambda_i = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}$.

RESULTS AND DISCUSSION
Mortality Rate Data Models
PROC GENMOD in SAS version 9.4 was used to run the Poisson regression analysis. At the 5% level of significance, all independent variables contributed significantly to the mortality rate, with the following estimated Poisson regression model (Table 1):

$$\widehat{\log\left(\frac{\lambda_i}{t}\right)} = 6.5834 + 0.0008x_1 + 0.0039x_2 + 0.0010x_3 + 0.0046x_4 - 0.0032x_5 - 0.0123x_6 + 0.0081x_7$$

Table 1. Analysis of Maximum Likelihood Parameter Estimates for Poisson Regression

| Parameter | Degree of Freedom | Estimate | Standard Error | Chi-Square | Pr > Chi-Square |
|---|---|---|---|---|---|
| Intercept | 1 | 6.5834 | 0.0083 | 6255069.00 | <.0001 |
| $x_1$ | 1 | 0.0008 | 0.0000 | 507.71 | <.0001 |
| $x_2$ | 1 | 0.0039 | 0.0002 | 312.25 | <.0001 |
| $x_3$ | 1 | 0.0010 | 0.0000 | 1787.60 | <.0001 |
| $x_4$ | 1 | 0.0046 | 0.0002 | 529.74 | <.0001 |
| $x_5$ | 1 | -0.0032 | 0.0003 | 95.57 | <.0001 |
| $x_6$ | 1 | -0.0123 | 0.0004 | 1092.88 | <.0001 |
| $x_7$ | 1 | 0.0081 | 0.0004 | 463.89 | <.0001 |

The estimated Poisson model, along with the standard error and p-value of each estimated coefficient, indicated that IHD, diarrheal disease, AIDS/HIV, malaria, malnutrition, road accidents, and suicides were significant predictors of the mortality rate. As an alternative to the Poisson regression model, the data were also analyzed using the Negative Binomial model. Table 2 displays the results of the analysis based on maximum likelihood estimation for the Negative Binomial regression.

Table 2. Analysis of Maximum Likelihood Parameter Estimates for Negative Binomial Regression

| Parameter | Degree of Freedom | Estimate | Standard Error | Chi-Square | Pr > Chi-Square |
|---|---|---|---|---|---|
| Intercept | 1 | 6.5602 | 0.00875 | 5616.57 | <.0001 |
| $x_1$ | 1 | 0.0008 | 0.0004 | 4.02 | 0.0451 |
| $x_2$ | 1 | 0.0046 | 0.0026 | 3.16 | 0.0755 |
| $x_3$ | 1 | 0.0011 | 0.0003 | 13.13 | 0.0003 |
| $x_4$ | 1 | 0.0054 | 0.0022 | 5.95 | 0.0147 |
| $x_5$ | 1 | -0.0041 | 0.0038 | 1.19 | 0.2750 |
| $x_6$ | 1 | -0.0138 | 0.0037 | 13.88 | 0.0002 |
| $x_7$ | 1 | 0.0112 | 0.0045 | 6.13 | 0.0133 |

Fitting the data with the Negative Binomial regression model showed that all independent variables except $x_2$ (diarrheal disease) and $x_5$ (malnutrition) contribute significantly to the mortality rate. Both of these variables have p-values above 0.05 (0.0755 for diarrheal disease and 0.2750 for malnutrition). Hence, diarrheal disease and malnutrition were not significant predictors, while the other variables (IHD, AIDS/HIV, malaria, road accidents, and suicides) were significant predictors.
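The p-values in Table 2 can be reproduced from the reported Chi-Square statistics, assuming each statistic is referred to a chi-square distribution with 1 degree of freedom, as the table's degree-of-freedom column suggests. The sketch below uses only the Python standard library. The helper name `chi2_sf_1df` is an illustrative choice, and the numbers are copied from Table 2.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)) with Z standard normal.
    """
    return math.erfc(math.sqrt(stat / 2.0))

# (Chi-Square, reported Pr > Chi-Square) for each slope in Table 2.
table2 = {
    "x1": (4.02, 0.0451),
    "x2": (3.16, 0.0755),
    "x3": (13.13, 0.0003),
    "x4": (5.95, 0.0147),
    "x5": (1.19, 0.2750),
    "x6": (13.88, 0.0002),
    "x7": (6.13, 0.0133),
}

computed = {name: chi2_sf_1df(stat) for name, (stat, _) in table2.items()}

for name, (stat, reported) in table2.items():
    flag = "significant" if computed[name] < 0.05 else "not significant"
    print(f"{name}: chi-square = {stat:5.2f}, computed p = {computed[name]:.4f}, "
          f"reported p = {reported:.4f} ({flag} at the 5% level)")
```

Small differences in the last digit are expected because the Chi-Square statistics in the table are rounded to two decimals.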
The fitted Negative Binomial regression model for the mortality rate data is written as

$$\widehat{\log\left(\frac{\lambda_i}{t}\right)} = 6.5602 + 0.0008x_1 + 0.0046x_2 + 0.0011x_3 + 0.0054x_4 - 0.0041x_5 - 0.0138x_6 + 0.0112x_7$$

Descriptive Statistics of the Variables for Checking Overdispersion
When the variance of a count variable is higher than its mean, the data are overdispersed. In this study, the dependent variable's mean and variance were 824.0061 and 105125.22, respectively, indicating overdispersion. Table 3 displays the means and variances of all the variables in the study. All the variables were over-dispersed, with greater variability around the fitted values than Poisson regression allows, $Var(y_i) = \varphi\lambda$, $\varphi > 1$. Consequently, the Negative Binomial regression was the better approach for modeling these over-dispersed count data.

Table 3. The Mean and the Variance for Each Variable

| Variable | Mean | Variance |
|---|---|---|
| Mortality | 824.0061 | 105125.22 |
| IHD | 114.5747 | 6031.84 |
| Diarrheal Disease | 22.0791 | 1118.02 |
| AIDS/HIV | 46.6652 | 11754.18 |
| Malaria | 11.5688 | 481.2379 |
| Malnutrition | 12.7489 | 477.6099 |
| Road Accidents | 17.2767 | 91.6093 |
| Suicides | 10.0120 | 52.6231 |

Goodness-of-fit Test for Poisson Regression and Negative Binomial Regression
The main purpose of the goodness-of-fit test is to determine the more appropriate model. Table 4 presents the deviance and Pearson's Chi-Square, showing whether their ratios to the degrees of freedom are close to one.

Table 4. Goodness-of-fit Test for Poisson Regression and Negative Binomial Regression

| Model | Criterion | df | Value | Value/df |
|---|---|---|---|---|
| Poisson Regression | Deviance | 155 | 15081.6228 | 97.3008 |
| Poisson Regression | Pearson's Chi-Square | 155 | 14196.5148 | 91.5904 |
| Negative Binomial Regression | Deviance | 155 | 167.1663 | 1.0785 |
| Negative Binomial Regression | Pearson's Chi-Square | 155 | 131.1002 | 0.8458 |

The Value/df entries of the Deviance and Pearson's Chi-Square for the Poisson model were 97.3008 and 91.5904, respectively, which were far higher than one. The Poisson model therefore did not describe the data correctly: there was greater variability among the counts than would be expected for a Poisson distribution. This situation may arise because repeated subjects are not independent. Another possible reason for the overdispersion is that experimental conditions are not under control, hence $\lambda_i$ varies with uncontrolled factors. The table shows that the Negative Binomial regression was the better alternative for modeling the mortality rate. Its Value/df entries for the Deviance and Pearson's Chi-Square were 1.0785 and 0.8458, respectively. Both values were much closer to one than the corresponding values for the Poisson regression model.

Comparison between Poisson Regression, Negative Binomial 1 and Negative Binomial 2
Comparisons between the three proposed models for the mortality data are given in Table 5. The AIC for Poisson regression was much larger than those of the other two models. The AIC value for NEGBIN 1 was slightly smaller than the one for NEGBIN 2, indicating that NEGBIN 1 was a better fit than both Poisson regression and NEGBIN 2. The BIC values were 16501, 2345, and 2347 for Poisson, NEGBIN 1, and NEGBIN 2, respectively. The BIC value for Poisson regression was much higher than those of the Negative Binomial regressions.
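As a consistency check on the reported criteria, AIC and BIC share the same $-2\log L$ term, so $BIC = AIC + k(\ln n - 2)$ for a model with $k$ parameters fitted to $n$ observations. The sketch below applies this identity to the AIC values of Table 5 and compares the result with the BIC values quoted in the text. The parameter counts $k$ are an assumption (intercept plus seven slopes for Poisson, plus one dispersion parameter for each Negative Binomial model); the paper does not state them.

```python
import math

n = 163  # number of observations (countries) reported in the Data section

# AIC from Table 5 and BIC as quoted in the text.
# k is an ASSUMED parameter count, not a value reported in the paper.
models = {
    "Poisson":  {"aic": 16476, "bic": 16501, "k": 8},
    "NEGBIN 1": {"aic": 2317,  "bic": 2345,  "k": 9},
    "NEGBIN 2": {"aic": 2319,  "bic": 2347,  "k": 9},
}

def bic_from_aic(aic, k, n_obs):
    """BIC implied by AIC: both criteria share the same -2 * log-likelihood."""
    return aic + k * (math.log(n_obs) - 2.0)

implied = {name: bic_from_aic(m["aic"], m["k"], n) for name, m in models.items()}

for name, m in models.items():
    print(f"{name}: reported BIC = {m['bic']}, implied BIC = {implied[name]:.1f}")
```

Under these assumed parameter counts, the implied values agree with the quoted BIC values to within rounding, which supports a BIC of 2345 for NEGBIN 1.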
Thus, with lower AIC and BIC values, NEGBIN 1 was the better approach for the mortality rate data, since it explains more variation with the same number of independent variables.

Table 5. AIC and BIC Values for the Different Regression Models

| Regression | AIC | BIC |
|---|---|---|
| Poisson | 16476 | 16501 |
| NEGBIN 1 | 2317 | 2345 |
| NEGBIN 2 | 2319 | 2347 |

CONCLUSIONS
The analysis was conducted to compare the performance of three models: Poisson regression, NEGBIN 1, and NEGBIN 2. NEGBIN 1 was found to be the most appropriate model for these overdispersed data. The mean and the variance were calculated to confirm that the data were overdispersed. Since the data were overdispersed, the deviance and Pearson's Chi-Square showed that the Negative Binomial model was better for the data than the Poisson model. The AIC and BIC values then showed that NEGBIN 1 is the best model, followed by NEGBIN 2 and Poisson regression.

REFERENCES
[1] Tutz, G., Regression for categorical data, Cambridge University Press, New York, 2012.
[2] Kumpusalo, E., Lakka, H.N., Laaksomen, D.E., Lakka, T.A., Niskanen, L.K., Salonen, J.T. and Tuomilehto, J., "The metabolic syndrome and total and cardiovascular disease mortality in middle-age men", Journal of American Medical Association, vol. 288, no. 3, pp. 2708-2716, 2002.
[3] Gayle, H.D. and Hill, G.L., "Global impact of human immunodeficiency virus and AIDS", Clinical Microbiology Reviews, vol. 14, no. 2, pp. 327–335, 2001.
[4] Breman, J.G., Jamison, D.T., and Measam, A.R., Disease control priorities in developing countries, 2nd Edition, Worldbank, Washington (DC), 2006.
[5] Claudio F. Lanata, Christa L. Fischer-Walker, Ana C. Olascoaga, Carla X. Torres, Martin J. Aryee, and Robert E. Black, "Global causes of diarrheal disease mortality in children <5 years of age: a systematic review", PLoS ONE, vol. 8, no. 9, pp. 1-11, 2013.
[6] Bassett, L. and Levinson, F.J., Malnutrition is still a major contributor to child death, Population Reference Bureau, Boston, 2007.
[7] Black, R.E., Hyder, A., Sacco, L. and Rice, A.L., Malnutrition as an underlying cause of childhood deaths associated with infectious diseases in developing countries, World Health Organization, Bulletin of the World Health Organization, 2000.
[8] Gopalakrishnan, S., "A public health perspective of road traffic accidents", Journal of Family Medicine and Primary Care, vol. 1, no. 2, pp. 144–150, 2012.
[9] Wasserman, D., Cheng, Q., and Jiang, G.X., "Global suicide rates among young people age 15–19", World Psychiatry, vol. 4, no. 2, pp. 114–120, 2005.
[10] Sheil, D., Alder, D., and Burshem, D., "The interpretation and misinterpretation of mortality rate measures", Journal of Ecology, vol. 83, no. 2, pp. 331–333, 1995.
[11] Koontz, D., Life Expectancy, http://www.worldlifeexpectancy.com/life-expectancy-research (accessed 12 August 2021), 2021.
[12] Özmen, I. and Fayome, F., "Count regressions model with an application to zoological data containing structural zero", Journal of Data Science, vol. 5, no. 4, pp. 491-502, 2007.
[13] Ismail, N. and Zamani, H., "Estimation of claim count data using Negative Binomial, Generalized Poisson, Zero-Inflated Negative Binomial and Zero-Inflated Generalized Poisson model", Casualty Actuarial Society E-Forum, vol. 41, no. 20, pp. 1-28, 2013.
[14] Greene, W., Functional form and heterogeneity in models for count data, Now Publisher Inc, 2008.
[15] Gourieroux, C., Monfort, A., and Trognon, A., "Pseudo maximum likelihood methods", Econometrica, vol. 52, no. 3, pp. 681-700, 1984.
[16] Long, J.S. and Freese, J., Regression models for categorical dependent variables using Stata, Second Edition, College Station, TX: Stata Press, 2006.
[17] Zwilling, M.L., "Negative Binomial regression", The Mathematical Journal, vol. 15, no. 1, pp. 1-18, 2013.
[18] Cameron, C.A. and Trivedi, P.K., Regression analysis of count data, Cambridge University Press, Cambridge, 2013.
[19] Berk, R. and MacDonald, J., "Overdispersion and Poisson regression", Journal of Quantitative Criminology, vol. 24, no. 3, pp. 269-284.
[20] Turner, H., Introduction to generalized linear model, ESRC National Centre for Research Method, 2008.
[21] Yan, J., Guszcza, J., Flynn, M., and Wu, C.S., "Applications of the offset in property-casualty predictive modeling", Casualty Actuarial Society E-Forum, vol. 1, no. 1, pp. 366-385.