www.biomathforum.org/biomath/index.php/biomath

ORIGINAL ARTICLE

Modeling outbreak data: Analysis of a 2012
Ebola virus disease epidemic in DRC

Boseung Choi∗, Sydney Busch†, Dieudonné Kazadi††‡‡, Benoit Ilunga‡‡, Emile Okitolonda††,
Yi Dai‡, Robert Lumpkin§, Omar Saucedo¶, Wasiur R. KhudaBukhsh‡¶, Joseph Tien§,

Marcel Yotebieng‖, Eben Kenah‡, Grzegorz A. Rempala‡§¶∗∗
∗Department of National Statistics, Korea University Sejoung Campus

Sejoung, Republic of Korea
†Department of Mathematics, Augsburg College

Minneapolis, MN, USA
††Ministry of Health, Democratic Republic of the Congo
‡‡School of Public Health, University of Kinshasa

Kinshasa, Democratic Republic of the Congo
‡Division of Biostatistics, College of Public Health

§Department of Mathematics
¶Mathematical Biosciences Institute

‖Division of Epidemiology, College of Public Health
The Ohio State University, Columbus, OH, USA
∗∗Corresponding author: rempala.3@osu.edu

Received: 8 September 2019, accepted: 3 October 2019, published: 15 October 2019

Abstract—We describe two approaches to model-
ing data from a small to moderate-sized epidemic
outbreak. The first approach is based on a branching
process approximation and direct analysis of the
transmission network, whereas the second one is
based on a survival model derived from the clas-
sical SIR equations with no explicit transmission
information. We compare these approaches using
data from a 2012 outbreak of Ebola virus disease
caused by Bundibugyo ebolavirus in city of Isiro,
Democratic Republic of the Congo. The branching
process model allows for a direct comparison of

disease transmission across different environments,
such as the general community or the Ebola treat-
ment unit. However, the survival model appears to
yield parameter estimates with more accuracy and
better precision in some circumstances.

Keywords-parameter estimation; branching pro-
cess; Markov Chain Monte-Carlo methods; survival
dynamical system;

I. INTRODUCTION
On August 1, 2018, the Ministry of Health of

the Democratic Republic of the Congo (DRC)

Copyright: c©2019 Choi et al. This article is distributed under the terms of the Creative Commons Attribution License (CC
BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source
are credited.
Citation: Boseung Choi, Sydney Busch, Dieudonné Kazadi, Benoit Ilunga, Emile Okitolonda, Yi Dai,
Robert Lumpkin, Omar Saucedo, Wasiur R. KhudaBukhsh, Joseph Tien, Marcel Yotebieng, Eben
Kenah, Grzegorz A. Rempala, Modeling outbreak data: Analysis of a 2012 Ebola virus disease
epidemic in DRC, Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037

Page 1 of 12

http://www.biomathforum.org/biomath/index.php/biomath
https://creativecommons.org/licenses/by/4.0/
https://creativecommons.org/licenses/by/4.0/
http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

reported an outbreak of Ebola virus disease (EVD)
in North Kivu Province. At the time of writing
about one year later, confirmed and probable cases
have been reported in nine health zones of North
Kivu and Ituri provinces (including the provincial
capital city of Goma), threatening further spread
of the epidemic into neighboring provinces and
the countries of Uganda and Rwanda. The current
outbreak area is roughly 780 miles away from
Equateur province, where an earlier Ebola out-
break was reported in May 2018. This persistent
reoccurrence of Ebola in the DRC as well as
elsewhere in Africa is a reminder that another
large pandemic like the 2013-2016 West African
Ebola epidemic remains possible. To control fu-
ture outbreaks and to better understand patterns
of transmission in households and at health care
facilities, it is essential to carefully analyze well-
documented historic data from past outbreaks.

The current paper is concerned with model-
ing data from a small 2012 Ebola virus disease
(EVD) outbreak caused by Bundibugyo ebolavirus
(BDBV) in the Isiro municipality in DRC. The
interesting feature of this dataset is that it includes
partial contact information on Ebola cases treated
either in the community or in healthcare facilities,
which allows for network-based inference. Despite
the fact that such inference has been an extremely
active area of research in the past 20 years [1]–
[5], there have been relatively few well docu-
mented historical datasets from real epidemics.
For our purpose of analyzing a relatively small
network, we here apply the edge-based approach
of Miller and Volz [15]–[17] and compare it with
the recently proposed simple non-network survival
dynamical system model [18].

The paper is organized as follows. In the re-
mainder of this section we give some basic back-
ground information on the Ebola virus and de-
scribe the Isiro outbreak dataset to be analyzed. In
Sections 2 and 3, we outline the proposed statisti-
cal models, use them to analyze the Isiro outbreak
data, and describe the results. In Section 4, we
offer some brief concluding remarks.

A. Ebola virus disease

EVD in humans is a severe hemorrhagic fever
caused by four of the five viruses of the genus
Ebolavirus in the viral family Filoviridae [19]. The
virus was initially characterized in 1976 during
an outbreak in DRC (known then as Zaire) and
has subsequently caused at least 26 additional
outbreaks in human populations [20], [21]. Since
2000, there have been 8 recorded outbreaks in the
DRC. The ongoing outbreak in the North Kivu
province is the most serious to date, and Maganga
et al. [22] describe an earlier serious outbreak
that occurred in the Boende region of Équateur
Province in 2014. Data is available in the literature
for at least 4 out of these 8 outbreaks, although
these data are not considered fully reliable because
the majority of the outbreaks occurred in remote
areas. Although EVD outbreaks in DRC have
been limited to fewer than 500 cases to date, the
potential for dangerous spread across the African
continent was demonstrated by the 2013-2016
West African Ebola epidemic, which resulted in
more than 28,600 cases and 11,313 deaths across
ten countries [23] .

The introduction of the virus into human com-
munities is likely the result of sporadic zoonotic
events. Several species of fruit bats native to areas
endemic to Ebola have been implicated as the nat-
ural reservoir for the disease [19]. Upon infection,
the virus will typically incubate for a period of
two to 21 days [23]. After this period, the typical
clinical presentation is a mix of severe symptoms
that may include fever, nausea, diarrhea, vomit-
ing, chest pain, dyspnea, cough, ocular edema,
hypotension, conjunctivitis, headaches, coma, and
hemorrhaging [24].

Human-to-human Ebola transmission is be-
lieved to occur via close contact with infected bod-
ily fluids and, therefore, individuals such as family
members of cases and healthcare workers have
significantly increased risk of infection [25]–[28].
In addition, viral load (which changes throughout
the course of illness and is at the highest level
immediately after death) is known to impact the
probability of transmission [29], [30]. Transmis-

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 2 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

sion dynamics are further complicated by the
fact that an infectious individual’s contacts may
vary throughout disease progression. Accounting
for these dynamics through contact tracing and
network analysis is vital to understanding the
disease, as studies of previous outbreaks have
noted disproportionately high rates of infection
among women, healthcare workers, and family
members [23], [25], [27], [31].

Practical strategies have been implemented to
reduce the risk of transmission, including bar-
rier nursing methods, safe burial practices, and
the creation of isolation units within treatment
centers [31], [32]. Nevertheless, the scope of the
2013-2016 West African Ebola epidemic suggests
that a deeper quantitative understanding of these
dynamics might be needed to control transmission
more effectively. While an effective vaccine is
now available [33], the question of how and when
to most effectively use available resources during
an ongoing epidemic remains in need of further
quantitative study.

Prior to the West African Ebola epidemic, only
a handful of mathematical models for Ebola had
been studied [34]–[37]. Chowell et al. [34] and
Lekone and Finkenstadt [35] used an SEIR frame-
work to determine the effect of interventions on the
1995 DRC and 2000 Uganda outbreaks. Legrand
et al. [36] formulated a stochastic compartmental
model that accounted for transmission in several
epidemiological settings by introducing compart-
ments for hospitalized individuals and dead Ebola
cases, who can transmit the disease during funer-
als.

The West African epidemic highlighted the
critical need for a better understanding of the
Ebola transmission dynamics and potential control
measures. Consequently, there has been an out-
pouring of Ebola models including deterministic
compartmental models [38]–[40], stochastic mod-
els [41]–[44] and multi-type branching process
models [45]. Many of these adapted the SEIHRF
framework, which includes hospital and funeral
compartments [26], [46]–[48].

Recent work has considered spatial aspects of

Ebola virus transmission and the effect of cluster-
ing in the population. The spatiotemporal spread of
Ebola has been studied with a county-level multi-
patch model employing mobile data [41], with
spatial individual-based models for international
spread [43], and transmission between house-
holds [44]. Scarpino et al. [26] used a phylody-
namic model to reconstruct chains of transmission
for cases occurring in Sierra Leone in June 2014.
Their fit of an SEIR model with the Rand-style [1]
pair approximation gave evidence for the presence
of clustering within the population. The same pair
approximation was employed by Wells et al. [46]
in their investigation of the effectiveness of case
isolation and ring vaccination. Their model had
10 compartments and required 65 equations after
closure at the level of pairs.

B. Isiro EVD outbreak data

Isiro is a municipality in the north-east part
of DRC that is the capital of Haut-Uele dis-
trict. It is situated between equatorial forest and
savannah. The Isiro dataset contains information
about the 2012 EVD outbreak caused by Bundibu-
gyo ebolavirus (BDBV). In total, there were 62
cases of infection listed as either probable or
confirmed with 52 of them having proper clinical
information allowing for a more detailed study.
As shown in Figure 1(a), these were divided
into community cases and Ebola treatment center
(ETC) cases based on the source of the infecting
contact. Among community cases, there was over-
representation of females (85.3%) and of individu-
als aged 15-54 years (82.4%). The mean duration
of EVD was 18 days, and the mean incubation
period was 11.3 days [49].

We were able to obtain additional information
concerning contacts for most individual cases from
the DRC Ministry of Health. However, unlike
traditional contact tracing, these additional records
contained a list of potentially infecting individuals
for each case. Using this information, we were
able to track the likely number of people each case
infected as well as the overall contact network (for
most of the EVD cases) in both the community
and the ETC. To refine the transmission network,

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 3 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

(a) (b)

Fig. 1: 2012 Isiro EVD data and model. Panel (a): Summary of available Isiro cases used in current
analysis of the transmission dynamics in the community and ETC. This data is a subset of 52 cases
described in [49]. Panel (b): Example of transmission data reconstructed from the Isiro outbreak files.
Dark figures represent primary cases and secondary cases who infected others. All others represent
infected who did not transmit. All cases of transmission ambiguity (multiple in-arrows) were resolved
uniformly at random.

we used occupation and other socioeconomic in-
formation as available. Of the 52 documented
infections in the Isiro epidemic, clinical documen-
tation of contacts among 48 cases (17 probable,
31 confirmed) could be retrieved. Out of these,
there were 37 community cases1 (13 suspected or
probable and 24 confirmed) who never reached the
ETC. The 11 cases that reached the ETC were all
confirmed as EVD [49].

II. STATISTICAL MODELS

In this paper, we consider two different statisti-
cal models for the Isiro EVD outbreak. The first
one is based on a branching process approximation
of the virus transmission network, which appears
especially appropriate for directly comparing out-
break parameters in different environments such as
the village community and the ETC. The second
model is based on a survival analysis approach
that assumes homogenous contact patterns among
susceptibles and infectives. This assumption is
more likely to be appropriate among community
cases than ETC cases.

1We note here a slight discrepancy with [49] where only
34 cases were classified as community.

A. Branching model

1) Primary cases: For a primary (or index)
case i represented by a node of degree di, the
distribution of the number of secondary infections
created by i (say, Xi) conditionally on the degree
and the infection period (say, ti) is given by

P(Xi = xi|ti,di) =
(
di
xi

)
pxiti (1 −pti)

di−xi. (1)

We see therefore that the distribution function of
Xi is binomial with di trials and the probability of
success pti. Under the assumption that infectious
contacts follow a Poisson process with rate β, the
probability of a successful infection of a given
neighbor by case i before time ti is

pti = 1 −e
−βti.

The infectious period (i.e., time from symptom
onset to removal) for each case is assumed to
follow an exponential distribution with rate γ, so
its density is

f(ti) = γe
−γti. (2)

For the i-th index case the joint conditional proba-
bility distribution of (xi, ti) given di is the product

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 4 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

of (1) and (2):

πdi(xi, ti) = P(Xi = xi|ti,di)f(ti)

=

(
di
xi

)
γ(1 −e−βti)xie−(β(di−xi)+γ)ti.

Without any prior information on the degree
di of the i-th index case, the unconditional joint
probability distribution of the number of infections
and the infection period is then given by

f(xi, ti) =
∑
di≥xi

qdiπdi(xi, ti) (3)

where qd is the probability of degree d. Here,
we consider the Poisson distribution with mean
λ,which appears to fit well the transmission net-
work for the Isiro epidemic. With this degree dis-
tribution, the final form of (3) admits the following
closed-form expression parametrized by the triple
θ = (β,γ,λ) ∈ R3>0:

fθ(xi, ti)

=
∑
di≥xi

λdie−λ

di!

(
di
xi

)
γ(1−e−βti)xie−(β(di−xi)+γ)ti

=
[λ(1 −e−βti)]xi

xi!
e−λ(1−e

−βti)γe−γti. (4)

2) Secondary cases: Consider now the joint
distribution of the number of infections and the
infection period for a secondary (i.e., non-index)
case. For tractability, we assume the random net-
work follows the configuration model (CM), so
edges are formed uniformly at random (excluding
multiple edges and self-loops) given the degrees
of all nodes. For such networks, we define a
secondary case as an individual (node) to whom
the infection was successfully transmitted (see also
[50]). Note that since by definition the secondary
case has one infecting neighbor, its degree avail-
able for further infection decreases by one. This
decreased degree is often referred to as the excess
degree in the literature.

For secondary cases, we can modify the primary
case model by replacing the degree with the excess
degree. Let q′dj denote the excess degree probabil-
ity for a secondary case with degree dj > 0, and let

µ ∈ (0,∞) be the mean of the degree distribution.
Then it is known ( [51]) that for the CM network

q′dj =
djqdj
µ

.

In the case of the Poisson degree distribution, λ =
µ and it is easy to check that

q′dj = qdj−1. (5)

Let x′j and t
′
j be the number of infections and

the infection period for the secondary case j.
Equations (3) and (5) give us

f(x′j, t
′
j) =

∑
dj>x

′
j

q′djπdj−1(x
′
j, t
′
j)

=
∑
dj>x

′
j

qdj−1πdj−1(x
′
j, t
′
j)

= fθ(x
′
j, t
′
j),

(6)

where fθ is given by (4).
For known θ = (β,γ,λ), we can calculate

the key characteristic of an epidemic known as
the basic reproduction number (R0). It may be
interpreted as the average number of secondary
infections caused by a primary case. For an ar-
bitrary degree distribution the definition of R0
for CM graph is given, for instance, in [52]. For
the Poisson degree distribution with exponential
infectious periods, we have simply

R0 =
βλ

β + γ
. (7)

3) Likelihood and estimation: Suppose the ob-
served numbers of infections and infectious peri-
ods for m primary cases are x = {x1,x2, . . . ,xm}
and t = {t1, t2, . . . , tm}, and those for n sec-
ondary cases are x′ = {x′1,x

′
2, . . . ,x

′
n} and t′ =

{t′1, t
′
2, . . . , t

′
n}. Recall that θ = (β,γ,λ) is the

vector of parameters that need to be estimated.
Here β is the transmission rate of infection, and
γ−1 is the mean infectious period, and λ is the
mean degree. For the observed data, the likelihood
function for θ can be constructed using (4) and (6):

L(θ|x,t,x′,t′) ∝
m∏
i=1

fθ(xi, ti)

n∏
j=1

fθ(x
′
j, t
′
j).

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 5 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

We could calculate the maximum likelihood esti-
mates (MLEs) of θ by maximizing the expression
above using numerical optimization. However,
for better stability and reproducibility of results,
we will use a Bayesian Markov Chain Monte
Carlo (MCMC) procedure. We assign independent
gamma prior distributions for the components of
θ. Specifically, let

π(β) ∼ Γ(aβ,bβ)
π(γ) ∼ Γ(aγ,bγ)
π(λ) ∼ Γ(aλ,bλ). (8)

Given the data x, t, x′, and t′, the Bayesian
estimate of θ is the mean of the joint posterior
distribution

Lp(θ|x,t,x′,t′) ∝L(θ|x,t,x′,t′)π(β)π(γ)π(λ).
(9)

This mean may be approximated with an empirical
average from the following converged MCMC
sampler:

Algorithm 1 MCMC posterior sampler for the
branching process model

1: Initialize θ with θ0 = (β0,γ0,λ0).
2: Sample β via a Metropolis-Hastings step [53]

with truncated normal proposal for the
target conditional distribution of β given
x,t,x′,t′,λ.

3: Sample γ directly from its conditional distri-
bution

γ|x,t,x′,t′ ∝ γm+n+aγ−1e−γ(
∑
ti+

∑
t′j+bγ).

4: Sample λ via another Metropolis-Hastings
step (similar to Step 2) for the target condi-
tional distribution of λ given x,t,x′,t′,β.

5: Return to Step 2 and repeat until convergence.

B. Survival model

Under the survival model, it is assumed that
the population at risk (which is possibly much
smaller than the set of all initially susceptible indi-
viduals) interacts homogeneously according to the
Kermack-McKendrick model (see below). Hence

it is problematic to apply this model in the case of
an ETC where the homogenous mixing is likely
not satisfied. Consequently, we apply this model
only to the community outbreak data. In what fol-
lows it is convenient to write the usual Kermack-
McKendrick model in the following integral form:

st = exp

(
−β
∫ t
0
ιudu

)
= exp (−Rort)

ιt = ρe
−γt −

∫ t
0
sue
−γ(t−u)du

rt = γ

∫ t
0
ιudu

where ρ = ι0, and R0 = β/γ. By interpreting the
strictly decreasing function st as an improper sur-
vival function, it follows [18] that the conditional
density of infection time is given by

fτ (t) = −ṡt/τ (10)
where τ = limt→∞rt < 1 is the final size of the
epidemic [18], which is the unique solution of

1 − τ = e−R0(τ+ρ). (11)
Thus, for a collection of n individuals initially
at risk, out of which k are infected at respective
times t1 < ... < tk < T (where T < ∞ is the
maximum follow-up time), we have the following
log-likelihood function for infection times

lI (t1, . . . tk|θ,n) = (n−k) log sT +
k∑
i=1

log fτ (ti) .

Note that this likelihood is conditional on the num-
ber of individuals at risk, which is often unknown.
However, given the number of infections (k) by
time T and under the assumption of independence
of infection times (see [18] for discussion), we
may consider n as the realization of the nega-
tive binomial distribution: n ∼ NegBinom(k,τ).
Denote by wi the ith individual’s removal time,
defined as minimum of the times of individual’s
recovery, hospitalization, or death. Assuming r
recoveries given k infections, the log-likelihood is

lR (w1, . . . ,wr|θ,k)

= (k −r) log Hγ(T) +
r∑
i=1

log hγ(wi),

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 6 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

where Hγ(·) is the survival function and hγ(·) is
the PDF of an exponential distribution with rate
γ. The complete log-likelihood is then the sum of
lI and lR [18]:

l0 (t1, . . . , tk,w1, . . . ,wr|θ,n)
= ll (t1, . . . , tk|θ,n) + lR (w1, . . . ,wr|θ,k) .

Under this survival model, we may estimate the
model parameters and the size of the population
at risk n using another Bayesian MCMC. The
model parameters are now θ′ = (β,γ,ρ), because
τ is fully determined by θ′ via (11). The prior
distributions for β and γ are of the form (8) and the
prior distribution for ρ is taken as Uniform(0, 1).
All parameters are assumed independent a priori.
The MCMC algorithm used to obtain the posterior
sample of the parameters is similar to that in
previous section.

Algorithm 2 MCMC posterior sampler for the
survival model

1: Initialize θ′ = (β,γ,ρ) from the prior distri-
bution and set n = k.

2: Perform a Metropolis-Hastings step (using the
truncated normal proposal) for the target con-
ditional distribution of θ′ given n using the
complete log-likelihood `0 = `I + `R.

3: Calculate τ based on the current value of θ′

as the solution to final size equation (11).
4: Sample the conditional distribution of n given
θ′ by drawing n ∼ NegBinom(k,τ).

5: Return to Step 2 and repeat until convergence.

III. DATA ANALYSIS

The analysis was conducted separately for the
two models (branching process and survival) based
on the 48 available cases of EVD described in Sec-
tion 1. Parameter estimates were obtained using
the MCMC algorithms for each model described
in Section 2. For the branching process model, we
separately analyzed the community and ETC out-
breaks. For the survival model, we only analyzed
the community outbreak.

A. Branching process model

To perform the separate analyses of the commu-
nity and ETC outbreaks, the contact and infection
data were partitioned into two subsets depending
on the location of the infective contact (community
or ETC). Although the same uninfected individuals
were allowed to be in both outbreak networks, all
the EVD cases were assigned either to the com-
munity or to the ETC. When reconstructing the
transmission network, all ambiguous contact trac-
ing was resolved uniformly at random as shown
in Figure 1(b). For some individuals, the complete
infection period was unknown and needed to be
imputed. All such imputations were based on the
density (2) and performed between steps 3 and 4
in MCMC algorithm.

For estimating β and γ, we assigned non-
informative gamma prior distributions with lo-
cation and rate parameters of 0.001. However,
due to limited contact information, we assigned
a relatively informative prior to λ with location
parameter of 6 and a rate parameter of 1. This
assignment was based on the empirical mean of
the primary and secondary cases in the dataset.
Finally, for the Metropolis-Hastings steps in the
MCMC sampler algorithm we used the truncated
normal distribution as proposal distribution and
tuned its standard deviation to achieve an accep-
tance ratio of 44% as recommended in [54]. The
final results of the MCMC were based on 55,000
iterations of the sampler with first 5,000 iterations
removed as “burn-in”. The posterior samples were
thinned by keeping only the results from every
10th iteration, resulting in a final set of 5,000
posterior samples that were used to estimate the
parameters and calculate approximate posterior
credible intervals. The convergence of the MCMC
algorithm was diagnosed based on the R statistic
and trace and autocorrelation (ACF) plots. To
conserve space, these plots are not shown here.

Table I summarizes the results of branching
process analyses of community and ETC epi-
demics. For each parameter, the posterior mean,
standard deviation, and 95% credible interval (CI)
are provided. We note that the comparisons be-

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 7 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

TABLE I: Results under the EVD branching process model for the community and ETC outbreaks

Community infections ETC infections

Mean Std Dev 95% CI Mean Std Dev 95% CI
β 0.0741 0.0389 (0.0331, 0.1806) 0.0387 0.0182 (0.0152, 0.0851)
γ 0.1936 0.0397 (0.1254, 0.2811) 0.2205 0.0634 (0.1170, 0.3605)
λ 5.4460 1.4460 (2.9690, 8.6310) 5.9030 1.4200 (3.4200, 8.9820)
R0 1.3730 0.2951 (0.8510, 2.0230) 0.8592 0.3200 (0.3700, 1.6040)

TABLE II: Results under the EVD survival model for the community outbreak only

Mean Std Dev 95% CI
β 0.1964 0.0324 (0.1403, 0.2626)
γ 0.1774 0.0296 (0.1258, 0.2381)
ρ 0.0039 0.0017 (0.0017, 0.0079)
n 163.20 35.35 (113.00, 252.00)
R0 1.1080 0.0316 (1.0570,1.1650)

tween parameters in the two epidemics may be
conducted informally by comparing their respec-
tive CIs bounds. If a particular parameter’s CI
bounds for the community are contained within
the respective CI bounds for the ETC, or vice-
versa, one would consider the corresponding pos-
terior distributions as statistically (i.e., for given
data) equal. For β, which represents the transmis-
sion rate of Ebola virus, the posterior mean for
household infection was approximately 0.0741—
about twice as large as ETC infection rate—so
the two posterior distributions may be considered
statistically different. This is in contrast with the
parameter γ, which represents the reciprocal of the
mean infectious period, for which the estimated
respective posterior means of 0.1936 and 0.2205
for the community and ETC were not found to
be statistically different based on their respective
95% credible bounds. The parameter λ represents
the average degree of the degree distribution and
its posterior mean in the ETC is slightly (but not
significantly) larger that its posterior mean in the
community. This may reflect additional contacts of
the individuals at the ETC with patients, visitors,
and ETC staff. Finally, the posterior means of the
basic reproduction numbers R0 for the community
and ETC outbreaks were found to be significantly
different at 1.373 and 0.8592, respectively. As

expected, the posterior mean of R0 for the com-
munity outbreak is higher than one for the ETC.
In both settings, R0 was calculated according
to equation (7). However, we also note that the
95% CI bounds for both posterior distributions are
quite wide, indicating a lack of precision in the
branching process model.

B. Survival model

Under the survival model outlined in Sec-
tion II-B, the analysis simplifies in that we are
no longer concerned with estimating the network
average degree λ. Hence our model parameters are
θ′ = (β,γ,ρ) as well as the size of the population
at risk (or effective population size) denoted by
n. In this model, R0 = β/γ. The results of the
analysis for the 37 community cases are presented
in Table II. We note that the estimates of the
parameters (β,γ,R0) under the survival model
can be compared with the estimates of (βλ,γ,R0)
under branching process model. Despite consider-
able differences in their respective posterior mean
values, the posterior distributions of all three pa-
rameters are statistically equivalent based on the
respective 95% CI bounds. This underscores the
lack of precision of the estimates based on the
branching process model.

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 8 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

C. Model validation

To perform our model validation and fit as-
sessment, we compared the distributions of 5,000
samples from the posterior distributions of the final
outbreak size obtained from our branching process
analysis to the values of 37 and 11 observed in
the community and ETC outbreaks respectively.
Figure 2 presents the histograms of the posterior
size distributions for the respective final sizes.
The vertical lines are plotted for reference at
the observed outbreak values of 37 and 11. The
histogram plots show that the observed values
are close to the modes of the respective posterior
distributions for both models, indicating adequate
model fit [50]. The last panel of Figure 2 presents
a comparison of observed depletion of susceptibles
over the course of the community EVD outbreak
with that predicted by the survival model. Both the
observed and predicted curves are initiated on day
one at the estimated mean of total population at
risk. The two curves are very close to each other,
indicating good fit of the survival model.

IV. SUMMARY AND CONCLUSIONS

We presented two statistical models for analyz-
ing patterns of EVD transmission in a small com-
munity. The branching process model was based
on partial contact network tracing data, whereas
survival model used an aggregate (network-free)
approach. The two models allowed for a more
detailed analysis of the 2012 Isiro EVD epidemic
data that was summarized in [49].

The branching process model was based on a
configuration model random graph with a Poisson
degree distribution, and it explicitly described the
direct and indirect contacts of the EVD cases
in the community and at the ETC. In particu-
lar, the model made it possible to derive and
directly compare the characteristics of the EVD
Isiro outbreaks at these two different locations.
Although the comparison provided some evidence
of the usefulness of the ETC in controlling EVD
outbreaks, the analysis also suggests that the type
of basic contact tracing performed in Isiro may not
be sufficient to provide precise estimates of the

epidemic force via the basic reproduction number
(R0).

The survival model was derived from the stan-
dard SIR equations. Since this model did not re-
quire contact tracing or estimation of mean degree,
it did not suffer from the same problem of low pre-
cision as the branching process model. In fact, the
estimate of R0 provided by the survival model was
more precise and likely also more accurate (based
on a resampling analysis not shown here) than the
values obtained from the branching process model.
However, the drawback of the survival model was
that it could not be used for comparison of trans-
mission in the community and the ETC because
the ETC was unlikely to satisfy the homogeneous
mixing assumption.

It appears that an effective approach to modeling
the type of outbreak dynamics described by the
Isiro data might be to combine the two models
presented here, so as to retain the precision of the
survival one but also incorporate the transmission
network information. This will be likely the focus
of our future research.

ACKNOWLEDGMENT

This research was partially funded by the US
National Science Foundation and the US National
Institutes of Health under grants DMS1853587,
U54 GM111274 and R01 AI116770. The funds
were also received from the Mershon Center for
International Security Studies at OSU, the Korea
University and the Ohio Five-OSU Summer Un-
dergraduate Research Experience (SURE) Fund.
The authors would like to especially thank the
Mathematical Biosciences Institute (MBI) at OSU
for providing additional resources and logistical
help. MBI receives funds from the National Sci-
ence Foundation under grants DMS1440386 and
DMS1757423.

REFERENCES

[1] H. Andersson and T. Britton, Stochastic epidemic mod-
els and their statistical analysis. Springer Science &
Business Media, 2012, vol. 151.

[2] D. Brockmann and D. Helbing, “The hidden geometry
of complex, network-driven contagion phenomena,” Sci-
ence, vol. 342, no. 6164, pp. 1337–1342, 2013.

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 9 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

Community (Branching)

 
D
e

n
si

ty
/U

n
it

20 40 60 80

0
.0

0
0

.0
1

0
.0

2
0

.0
3

0
.0

4

(a)

ETC (Branching)

Outbreak Size

0 10 20 30 40

0
.0

0
0

.0
2

0
.0

4
0

.0
6

(b)

Community (Survival)

Outbreak Size

25 30 35 40 45 50 55

0
.0

0
0

.0
2

0
.0

4
0

.0
6

0
.0

8

(c)

Susceptibles Depletion (Survival)

0 20 40 60 80 100

1
0

0
1

2
0

1
4

0
1

6
0

1
8

0

Time

N
u

m
b

e
r 

o
f 
S

u
sc

e
p

tib
le

s

(d)

Fig. 2: Model validation. Top panels: branching process model predicted final size distributions and
observed values (marked with vertical lines) for the (a) community and (b) ETC outbreaks. Bottom
panels: (c) survival model predicted final size distribution and the observed value and (d) survival
model predicted depletion of the susceptible population and the observed depletion. The curves in (d)
are drawn conditionally on the estimated mean initial population size n = 163 with the lower and
upper dotted lines representing model’s 95% CI bounds.

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 10 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

[3] P. S. Dodds and D. J. Watts, “Universal behavior in
a generalized model of contagion,” Physical Review
Letters, vol. 92, no. 21, pp. 218 701(1) – 218 201(4),
2004.

[4] T. House and M. J. Keeling, “Insights from unify-
ing modern approximations to infections on networks,”
Journal of The Royal Society Interface, vol. 8, no. 54,
pp. 67–73, 2011.

[5] L. A. Meyers, B. Pourbohloul, M. E. Newman, D. M.
Skowronski, and R. C. Brunham, “Network theory and
SARS: predicting outbreak diversity,” Journal of Theo-
retical Biology, vol. 232, pp. 71–81, 2005.

[6] M. E. Newman, “Spread of epidemic disease on net-
works,” Physical Review E, 2002.

[7] M. Altmann, “The deterministic limit of infectious
disease models with dynamic partners,” Mathematical
Biosciences, vol. 150, no. 2, pp. 153–175, 1998.

[8] K. Y. Leung, M. Kretzschmar, and O. Diekmann, “Dy-
namic concurrent partnership networks incorporating
demography,” Theoretical Population Biology, vol. 82,
pp. 229–239, 2012.

[9] ——, “SI infection on a dynamic partnership network:
characterization of R0,” Journal of Mathematical Biol-
ogy, vol. 71, pp. 1–56, 2015.

[10] M. Barthélemy, A. Barrat, R. Pastor-Satorras, and
A. Vespignani, “Velocity and hierarchical spread of
epidemic outbreaks in scale-free networks,” Physical
Review Letters, vol. 92, no. 178701, 2004.

[11] ——, “Dynamical patterns of epidemic outbreaks in
complex heterogeneous networks,” Journal of Theoreti-
cal Biology, vol. 235, pp. 275–288, 2005.

[12] R. Pastor-Satorras and A. Vespignani, “Epidemic
spreading in scale-free networks,” Physical Review Let-
ters, vol. 86, no. 14, pp. 3200–3203, 2001.

[13] J. Lindquist, J. Ma, P. van den Driessche, and F. H.
Willeboordse, “Effective degree network disease mod-
els,” Journal of Mathematical Biology, vol. 62, no. 2,
pp. 143–164, 2011.

[14] J. Ma, P. van den Driessche, and F. H. Willeboordse,
“Effective degree household network disease model,”
Journal of Mathematical Biology, vol. 66, pp. 75–94,
2013.

[15] E. Volz, “SIR dynamics in random networks with
heterogeneous connectivity,” Journal of Mathematical
Biology, vol. 56, no. 3, pp. 293–310, 2008.

[16] J. C. Miller and E. M. Volz, “Incorporating disease
and population structure into models of SIR disease in
contact networks,” PLoS One, vol. 8, no. 8, p. e69162,
2013.

[17] I. Z. Kiss, J. C. Miller, and P. L. Simon, Mathematics
of epidemics on networks: from exact to approximate
models. Springer, 2017, vol. 46.

[18] W. R. KhudaBukhsh, B. Choi, E. Kenah, and
G. A. Rempala, “Survival dynamical systems for the
population-level analysis of epidemics,” arXiv preprint
arXiv:1901.00405, 2019.

[19] K. J. Olival and D. T. Hayman, “Filoviruses in bats:

current knowledge and future directions,” Viruses, vol. 6,
no. 4, pp. 1759–1788, 2014.

[20] K. M. Johnson, J. V. Lange, P. A. Webb, and F. A.
Murphy, “Isolation and partial characterisation of a
new virus causing acute haemorrhagic fever in Zaire,”
Lancet, vol. 309, no. 8011, pp. 569–571, 1977.

[21] C. for Disease Control, P. (CDC) et al., “Outbreaks
chronology: Ebola virus disease,” CDC, http://www. cdc.
gov/vhf/ebola/outbreaks/history/chronology. html, 2014.

[22] G. D. Maganga, J. Kapetshi, N. Berthet, B. Ke-
bela Ilunga, F. Kabange, P. Mbala Kingebeni, V. Mon-
donge, J.-J. T. Muyembe, E. Bertherat, S. Briand et al.,
“Ebola virus disease in the Democratic Republic of
Congo,” New England Journal of Medicine, vol. 371,
no. 22, pp. 2083–2091, 2014.

[23] C. E. M. Coltart, A. M. Johnson, and C. J. M. Whitty,
“Role of healthcare workers in early epidemic spread of
Ebola: policy implications of prophylactic compared to
reactive vaccination polict in outbreak prevention and
control,” BMC Medicine, vol. 13, p. 271, 2015.

[24] H. Feldmann and T. W. Geisbert, “Ebola haemorrhagic
fever,” Lancet, vol. 377, no. 9768, pp. 849–862, 2011.

[25] P. Francesconi, Z. Yoti, S. Declich, P. A. Onek, M. Fabi-
ani, J. Olango, R. Andraghetti, P. E. Rollin, C. Opira,
D. Greco, and S. Salmaso, “Ebola hemorrhagic fever
transmission and risk factors of contacts, Uganda,”
Emerging Infectious Diseases, vol. 9, no. 11, pp. 1430–
1437, 2003.

[26] S. V. Scarpino, A. Iamarino, C. Wells, D. Yamin,
M. Ndeffo-Mbah, N. S. Wenzel, S. J. Fox, T. Nyenswah,
F. L. Altice, A. P. Galvani et al., “Epidemiological and
viral genomic sequence analysis of the 2014 Ebola out-
break reveals clustered transmission,” Clinical Infectious
Diseases, p. ciu1131, 2014.

[27] S. F. Dowell, R. Mukunu, T. G. Ksiazek, A. S. Khan,
P. E. Rollin, and C. Peters, “Transmission of Ebola
hemorrhagic fever: a study of risk factors in family
members, Kikwit, Democratic Republic of the Congo,
1995,” Journal of Infectious Diseases, vol. 179, no.
Supplement 1, pp. S87–S91, 1999.

[28] A. Matanock, M. A. Arwady, P. Ayscue, J. D. Forrester,
B. Gaddis, J. C. Hunter, B. Monroe, S. K. Pillai,
C. Reed, I. J. Schafer, M. Massaquoi, B. Dahn, and
K. M. De Cock, “Ebola Virus Disease cases among
health care workers not working in Ebola treatment units
- Liberia, June-August, 2014,” Morbidity and Mortality
Weekly Report, vol. 63, no. 46, pp. 1077–1081, 2014.

[29] D. Yamin, S. Gertler, M. L. Ndeffo-Mbah, L. A. Skrip,
M. Fallah, T. G. Nyenswah, F. L. Altice, and A. P.
Galvani, “Effect of Ebola progression on transmission
and control in Liberia,” Annals of Internal Medicine,
vol. 162, no. 1, pp. 11–17, 2015.

[30] J. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen,
R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba,
J. Wallinga et al., “Social contacts and mixing patterns
relevant to the spread of infectious diseases,” PLoS Med,
vol. 5, no. 3, p. e74, 2008.

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 11 of 12

http://dx.doi.org/10.11145/j.biomath.2019.10.037


B. Choi et al., Modeling outbreak data: Analysis of a 2012 Ebola virus disease epidemic in DRC

[31] A. S. Khan, F. K. Tshioko, D. L. Heymann, B. Le
Guenno, P. Nabeth, B. Kerstiens, Y. Fleerackers, P. H.
Kilmarx, G. R. Rodier, O. Nkuku, P. E. Rollin,
A. Sanchez, R. Swanepoel, O. Tomori, S. T. Nichol,
C. J. Peters, J.-J. Muyembe-Tamfum, and T. G. Ksiazek,
“The reemergence of Ebola hemorrhagic fever, Demo-
cratic Republic of Congo, 1995,” Journal of Infectious
Diseases, vol. 179, no. S1, pp. S76–S86, 1999.

[32] D. S. Chertow, C. Kleine, J. K. Edwards, R. Scaini,
R. Giuliani, and A. Sprecher, “Ebola virus disease in
West Africa—clinical manifestations and management,”
New England Journal of Medicine, vol. 371, no. 22, pp.
2054–2057, 2014.

[33] A. M. Henao-Restrepo, I. M. Longini, M. Egger, N. E.
Dean, W. J. Edmunds, A. Camacho, M. W. Carroll,
M. Doumbia, B. Draguez, S. Duraffour et al., “Efficacy
and effectiveness of an rVSV-vectored vaccine express-
ing Ebola surface glycoprotein: interim results from the
Guinea ring vaccination cluster-randomised trial,” The
Lancet, vol. 386, no. 9996, pp. 857–866, 2015.

[34] G. Chowell, N. W. Hengartner, C. Castillo-Chavez, P. W.
Fenimore, and J. Hyman, “The basic reproductive num-
ber of Ebola and the effects of public health measures:
the cases of Congo and Uganda,” Journal of Theoretical
Biology, vol. 229, no. 1, pp. 119–126, 2004.

[35] P. E. Lekone and B. F. Finkenstädt, “Statistical infer-
ence in a stochastic epidemic SEIR model with control
intervention: Ebola as a case study,” Biometrics, vol. 62,
no. 4, pp. 1170–1177, 2006.

[36] J. Legrand, R. Grais, P. Boelle, A. Valleron, and
A. Flahault, “Understanding the dynamics of Ebola epi-
demics,” Epidemiology and Infection, vol. 135, no. 04,
pp. 610–621, 2007.

[37] J. M. Drake, I. Bakach, M. R. Just, S. M. O’Regan,
M. Gambhir, and I. C.-H. Fung, “Transmission mod-
els of historical Ebola outbreaks,” Emerging Infectious
Diseases, vol. 21, no. 8, p. 1447, 2015.

[38] J. S. Weitz and J. Dushoff, “Modeling post-death trans-
mission of Ebola: Challenges for inference and oppor-
tunities for control,” Scientific Reports, vol. 5, 2015.

[39] C. M. Rivers, E. T. Lofgren, M. Marathe, S. Eubank,
and B. L. Lewis, “Modeling the impact of interventions
on an epidemic of Ebola in Sierra Leone and Liberia,”
PLoS Currents, vol. 6, 2014.

[40] B. Tsanou, G. M. Moremedi, R. Kaondera-Shava, J. M.
Lubuma, and N. Morris, “A simple mathematical model
for Ebola in Africa,” Biomath Communications, vol. 2,
no. 1, 2015.

[41] L. Valdez, H. H. A. Rêgo, H. Stanley, and L. Braun-
stein, “Predicting the extinction of Ebola spreading in
Liberia due to mitigation strategies,” arXiv preprint
arXiv:1502.01326, 2015.

[42] A. Pandey, K. E. Atkins, J. Medlock, N. Wenzel, J. P.
Townsend, J. E. Childs, T. G. Nyenswah, M. L. Ndeffo-
Mbah, and A. P. Galvani, “Strategies for containing

Ebola in West Africa,” Science, vol. 346, no. 6212, pp.
991–995, 2014.

[43] M. F. Gomes, A. P. y Piontti, L. Rossi, D. Chao,
I. Longini, M. E. Halloran, and A. Vespignani, “As-
sessing the international spreading risk associated with
the 2014 West African Ebola outbreak,” PLoS Currents,
vol. 6, 2014.

[44] S. Merler, M. Ajelli, L. Fumanelli, M. F. Gomes, A. P.
y Piontti, L. Rossi, D. L. Chao, I. M. Longini, M. E.
Halloran, and A. Vespignani, “Spatiotemporal spread of
the 2014 outbreak of Ebola virus disease in Liberia and
the effectiveness of non-pharmaceutical interventions: a
computational modelling analysis,” The Lancet Infec-
tious Diseases, vol. 15, no. 2, pp. 204–211, 2015.

[45] J. M. Drake, R. Kaul, L. W. Alexander, S. M. O’Regan,
A. M. Kramer, J. T. Pulliam, M. J. Ferrari, and
A. W. Park, “Ebola cases and health system demand in
Liberia,” PLoS Biol, vol. 13, no. 1, p. e1002056, 2015.

[46] C. Wells, D. Yamin, M. L. Ndeffo-Mbah, N. Wenzel,
S. G. Gaffney, J. P. Townsend, L. A. Meyers, M. Fallah,
T. G. Nyenswah, F. L. Altice, K. E. Atkins, and A. P.
Galvani, “Harnessing case isolation and ring vaccination
to control Ebola,” PLoS Neglected Tropical Diseases,
2015.

[47] G. Webb, C. Browne, X. Huo, O. Seydi, M. Seydi,
and P. Magal, “A model of the 2014 Ebola epidemic
in West Africa with contact tracing,” PLoS Currents,
vol. 7, 2014.

[48] D. Fisman, E. Khoo, and A. Tuite, “Early epidemic
dynamics of the West African 2014 Ebola outbreak:
estimates derived with a simple two-parameter model,”
PLoS Currents, vol. 6, 2014.

[49] T. Kratz, P. Roddy, A. T. Oloma, B. Jeffs, D. P. Ciruelo,
O. de la Rosa, and M. Borchert, “Ebola Virus Disease
outbreak in Isiro, Democratic Republic of the Congo,
2012: signs and symptoms, management and outcomes,”
PLoS ONE, vol. 10, no. 6, p. e0129333, 2015.

[50] M. G. Burch, K. A. Jacobsen, J. H. Tien, and G. A.
Rempala, “Network-based analysis of a small ebola
outbreak,” Math Biosci Eng., vol. 1, no. 14, pp. 67–77,
2017.

[51] M. Newman, Networks: an introduction. Oxford Uni-
versity Press, 2010.

[52] J. C. Miller, A. C. Slim, and E. M. Volz, “Edge-
based compartmental modelling for infectious disease
spread,” Journal of The Royal Society Interface,
vol. 9, no. 70, pp. 890–906, 2012. [Online]. Available:
http://rsif.royalsocietypublishing.org/content/9/70/890

[53] S. Chib and E. Greenberg, “Understanding the
metroplis-hastings algorithm,” The American Statisti-
cian, vol. 49, no. 4, pp. 327–335, 1995.

[54] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson,
A. Vehtari, and D. B. Rubin, Bayesian data analysis.
CRC press Boca Raton, FL, 2014, vol. 2.

Biomath 8 (2019), 1910037, http://dx.doi.org/10.11145/j.biomath.2019.10.037 Page 12 of 12

http://rsif.royalsocietypublishing.org/content/9/70/890
http://dx.doi.org/10.11145/j.biomath.2019.10.037

	Introduction
	Ebola virus disease
	Isiro EVD outbreak data

	Statistical Models
	Branching model
	Primary cases
	Secondary cases
	Likelihood and estimation

	Survival model 

	Data Analysis
	Branching process model
	Survival model
	Model validation

	Summary and Conclusions
	References