ISDS Annual Conference Proceedings 2012. This is an Open Access article distributed under the terms of the Creative Commons Attribution-
Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and
reproduction in any medium, provided the original work is properly cited.

ISDS 2012 Conference Abstracts

Determinants of Outbreak Detection Performance
Nastaran Jafarpour*1, Doina Precup2 and David Buckeridge2

1Department of Computer Engineering, Ecole Polytechnique de Montreal, Montreal, QC, Canada; 2McGill University, Montreal, QC,
Canada

Objective
To predict the performance of outbreak detection algorithms under
different circumstances, in order to guide method selection and
algorithm configuration in surveillance systems; to characterize the
dependence of detection performance on the type and severity of the
outbreak; and to develop quantitative evidence about the determinants
of detection performance.

Introduction
The choice of outbreak detection algorithm and its configuration
can result in important variations in the performance of public health
surveillance systems. Our work aims to characterize the performance
of detectors based on outbreak types. We use Bayesian networks
(BN) to model the relationships between the determinants of outbreak
detection and detection performance, based on a large study
of simulated data.

Methods
The simulated surveillance data that we used were generated by the
Surveillance Lab of McGill University using the Simulation Analysis
Platform [1], considering surveillance in an urban area to detect
waterborne outbreaks due to the failure of a water treatment plant. We
focus on predicting the performance of the C-family of algorithms
because they are widely used, state-of-the-art outbreak detection
algorithms [2]. We investigate the influence of algorithm characteristics
and outbreak characteristics in determining outbreak detection
performance. The C1, C2, and C3 algorithms are distinguished by the
configuration of 2 parameters, the guardband and the memory. Generally,
a gradually increasing outbreak can bias the baseline upward, which
lowers the test statistic, so the detection algorithm may fail to flag
the outbreak. To avoid this situation, C2 and C3 use a 2-day gap, the
guardband, between the baseline interval and the test interval. C3 also
includes 2 recent observations, called memory, in the computation of
the test statistic. The W2 algorithm is a modified version of C2 that
takes weekly patterns of surveillance time series into account [3]. In
W2, the baseline data are stratified into 2 distinct baselines: one for
weekdays, the other for weekends. W3 includes 2 recent observations
from each baseline when calculating the test statistic for the
corresponding baseline.
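
The roles of the guardband, the memory, and the weekday/weekend
stratification can be illustrated with a minimal Python sketch. It is
not the implementation evaluated in this study. The 7-day baseline
length, the min_sd floor, the convention that day 0 is a Monday, and
the function names are all assumptions made for illustration. The
memory is approximated here by summing the positive part of the
statistic over the current day and the preceding calendar days, which
simplifies how C3 and W3 combine recent observations.

```python
from statistics import mean, stdev


def _is_weekend(day):
    # Assumption: day 0 of the series is a Monday, so days 5 and 6 are the weekend.
    return day % 7 in (5, 6)


def z_score(series, t, baseline=7, guardband=0, weekly=False, min_sd=0.5):
    """Standardized deviation of series[t] from a trailing baseline window."""
    end = t - guardband  # the baseline only uses days strictly before `end`
    if weekly:
        # W-style baseline: the most recent days of the same type as day t.
        days = [d for d in range(end - 1, -1, -1)
                if _is_weekend(d) == _is_weekend(t)][:baseline]
    else:
        # C-style baseline: the `baseline` consecutive days before `end`.
        days = list(range(max(end - baseline, 0), max(end, 0)))
    if len(days) < baseline:
        raise ValueError("not enough history for the baseline")
    values = [series[d] for d in days]
    # min_sd is a hypothetical floor that avoids division by zero on flat baselines.
    return (series[t] - mean(values)) / max(stdev(values), min_sd)


def c_statistic(series, t, guardband=0, memory=0, weekly=False):
    """Sum the positive part of the z-score over day t and `memory` earlier days."""
    return sum(
        max(0.0, z_score(series, t - k, guardband=guardband, weekly=weekly))
        for k in range(memory + 1)
    )


# Parameter settings as described in the text (baseline length is assumed).
CONFIGS = {
    "C1": dict(guardband=0, memory=0, weekly=False),
    "C2": dict(guardband=2, memory=0, weekly=False),
    "C3": dict(guardband=2, memory=2, weekly=False),
    "W2": dict(guardband=2, memory=0, weekly=True),
    "W3": dict(guardband=2, memory=2, weekly=True),
}
```

On a slowly rising series, the C1 baseline in this sketch absorbs the
first outbreak days, while the guardband keeps them out of the C2
baseline, so the C2 statistic comes out larger for the same day.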

We ran C1, C2, C3, W2, and W3 on 18,000 simulated time series
and measured the sensitivity and specificity of detection. We then
created a training data set of 5,400,000 instances. Each instance was
the result of the performance evaluation of an outbreak detection
algorithm with a specific setting of parameters. In order to investigate
the determinants of detection performance and reveal their effects
quantitatively, we used a BN to predict the performance based on
algorithm characteristics and outbreak characteristics.
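
As an illustration of how one such instance might be scored, the
following Python sketch computes sensitivity and specificity for a
single time series. It assumes a per-day definition (an outbreak day
counts as detected if it is flagged); the abstract does not state
whether performance was measured per day or per outbreak, and the
function names are hypothetical.

```python
def flag_alerts(statistics, threshold):
    """Raise an alert on every day whose test statistic exceeds the threshold."""
    return [s > threshold for s in statistics]


def sensitivity_specificity(alerts, outbreak):
    """Per-day sensitivity and specificity (an assumed definition).

    alerts   -- list of booleans, True where the algorithm raised an alert
    outbreak -- list of booleans, True on simulated outbreak days
    """
    if len(alerts) != len(outbreak):
        raise ValueError("alerts and outbreak must have the same length")
    tp = sum(1 for a, o in zip(alerts, outbreak) if a and o)
    fn = sum(1 for a, o in zip(alerts, outbreak) if not a and o)
    fp = sum(1 for a, o in zip(alerts, outbreak) if a and not o)
    tn = sum(1 for a, o in zip(alerts, outbreak) if not a and not o)
    # Undefined rates (no outbreak days, or no outbreak-free days) become NaN.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Repeating this for each algorithm, alerting threshold, and simulated
outbreak scenario would yield one row of training data per evaluation.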

Results
We developed 2 BN models in the Weka machine learning software [4]
using 5-fold cross-validation. The first BN models the effect of the
guardband, memory, alerting threshold, and weekly pattern indicator
(0 for C-algorithms, 1 for W-algorithms), as well as outbreak
characteristics (contamination level and duration), on the
sensitivity of detection. The value of sensitivity was mapped to 4 classes:
(0, 0.3], (0.3, 0.6], (0.6, 0.9], (0.9, 1]. The developed BN correctly
classified 67.74% of instances. The misclassification error was
0.9407. The second BN, for predicting the specificity of detection,
correctly classified 95.895% of instances in 10 classes, and the
misclassification error was 0.2975.
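
The mapping of sensitivity to the 4 classes listed above can be
sketched in Python as follows. The function name is hypothetical, and
the handling of a sensitivity of exactly 0 is an assumption: the
intervals as written exclude 0, so this sketch places it in the lowest
class.

```python
from bisect import bisect_left

# Upper bounds of the first three classes; the last class ends at 1.
UPPER_BOUNDS = [0.3, 0.6, 0.9]
CLASS_LABELS = ["(0, 0.3]", "(0.3, 0.6]", "(0.6, 0.9]", "(0.9, 1]"]


def sensitivity_class(value):
    """Return the label of the right-closed interval that contains `value`."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    # bisect_left keeps a value equal to an upper bound in the lower class,
    # which matches right-closed intervals. Assumption: 0 goes to the lowest class.
    return CLASS_LABELS[bisect_left(UPPER_BOUNDS, value)]
```

For example, sensitivity_class(0.6) falls in "(0.3, 0.6]", whereas
sensitivity_class(0.61) falls in "(0.6, 0.9]".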

Conclusions
The contamination level and duration of outbreaks, the alerting
threshold, the memory, the guardband, and whether the weekly pattern
was considered all influence the sensitivity and specificity of outbreak
detection. Given the C-algorithm parameter settings, we can predict
outbreak detection performance quantitatively. In future work,
we plan to investigate other predictors of performance and study how
these predictions can be used in algorithm and policy choices.

Keywords
outbreak detection; public health surveillance; machine learning;
Bayesian networks; detection performance

References

1. Buckeridge, D.L., et al. Simulation Analysis Platform (SnAP): a Tool
for Evaluation of Public Health Surveillance and Disease Control
Strategies. 2011. American Medical Informatics Association.

2. Hutwagner, L., et al., The bioterrorism preparedness and response early
aberration reporting system (EARS). Journal of Urban Health:
Bulletin of the New York Academy of Medicine, 2003. 80(Supplement 1):
p. i89-i96.

3. Tokars, J.I., et al., Enhancing time-series detection algorithms for
automated biosurveillance. Emerging Infectious Diseases, 2009. 15(4):
p. 533.

4. Hall, M., et al., The WEKA data mining software: an update. ACM
SIGKDD Explorations Newsletter, 2009. 11(1): p. 10-18.

*Nastaran Jafarpour
E-mail: nastaran.jafarpour@polymtl.ca

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 5(1):e90, 2013