ISDS Annual Conference Proceedings 2012. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ISDS 2012 Conference Abstracts

Determinants of Outbreak Detection Performance

Nastaran Jafarpour¹*, Doina Precup² and David Buckeridge²

¹Department of Computer Engineering, Ecole Polytechnique de Montreal, Montreal, QC, Canada; ²McGill University, Montreal, QC, Canada

Objective

To predict the performance of outbreak detection algorithms under different circumstances, so as to guide method selection and algorithm configuration in surveillance systems; to characterize how the performance of detection algorithms depends on the type and severity of an outbreak; and to develop quantitative evidence about the determinants of detection performance.

Introduction

The choice of outbreak detection algorithm and its configuration can cause substantial variation in the performance of public health surveillance systems. Our work aims to characterize the performance of detectors by outbreak type. We use Bayesian networks (BNs) to model the relationships between the determinants of outbreak detection and detection performance, based on an extensive study of simulated data.

Methods

The simulated surveillance data were generated by the Surveillance Lab of McGill University using the Simulation Analysis Platform [1], modeling surveillance in an urban area for waterborne outbreaks caused by the failure of a water treatment plant. We focus on predicting the performance of the C-family of algorithms because they are widely used, state-of-the-art outbreak detection algorithms [2].
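The C-family detectors cited above compare each day's count with the mean and standard deviation of a short moving baseline. The Python sketch below assumes the commonly published EARS definitions [2] rather than the exact implementation used in this study: a 7-day baseline, a guardband that ends the baseline 2 days before the test day (C2, C3), and a C3 memory that sums the positive excesses over 1 of the current and 2 previous days' statistics. `c_statistic`, `alarm`, `VARIANTS`, and the `min_sd` floor are hypothetical names introduced for illustration.

```python
from statistics import mean, stdev

# Hypothetical parameterization of the EARS C-family, assuming the usual
# published definitions: 7-day baseline, 2-day guardband for C2 and C3,
# and a 2-day memory for C3.
VARIANTS = {
    "C1": {"guardband": 0, "memory": 0},
    "C2": {"guardband": 2, "memory": 0},
    "C3": {"guardband": 2, "memory": 2},
}


def _z_score(counts, day, guardband, baseline_len, min_sd):
    """Standardized count for `day` against a moving baseline window."""
    end = day - guardband  # baseline stops `guardband` days before `day`
    start = end - baseline_len
    if start < 0:
        raise ValueError(f"not enough history before day {day}")
    baseline = counts[start:end]
    # min_sd is a hypothetical floor that avoids division by zero
    # when the baseline is flat.
    sd = max(stdev(baseline), min_sd)
    return (counts[day] - mean(baseline)) / sd


def c_statistic(counts, t, guardband=0, memory=0, baseline_len=7, min_sd=0.5):
    """Test statistic for day t; memory > 0 gives the C3-style sum."""
    if memory == 0:
        return _z_score(counts, t, guardband, baseline_len, min_sd)
    # C3-style statistic: sum of positive excesses over 1 for day t
    # and the `memory` preceding days.
    return sum(
        max(0.0, _z_score(counts, t - i, guardband, baseline_len, min_sd) - 1.0)
        for i in range(memory + 1)
    )


def alarm(counts, t, variant, threshold):
    """Flag day t when the variant's statistic exceeds the alerting threshold."""
    return c_statistic(counts, t, **VARIANTS[variant]) > threshold


# Synthetic daily counts with a spike on the last day (day 13).
counts = [10, 12, 11, 9, 10, 11, 12, 10, 11, 10, 12, 11, 10, 30]
for name in VARIANTS:
    print(name, alarm(counts, 13, name, threshold=3.0))  # each reports True
```

Varying `guardband`, `memory`, and `threshold` independently, rather than hard-coding the three named variants, mirrors how the study treats these settings as separate predictors of performance.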
We investigate the influence of algorithm characteristics and outbreak characteristics on outbreak detection performance. C1, C2, and C3 differ in the configuration of 2 parameters, the guardband and the memory. A gradually increasing outbreak can contaminate the baseline and bias it upward, which lowers the test statistic, so the detection algorithm may fail to flag the outbreak. To avoid this, C2 and C3 leave a 2-day gap, the guardband, between the baseline interval and the test interval. C3 also includes the 2 most recent observations, called the memory, in the computation of the test statistic. The W2 algorithm is a modified version of C2 that takes the weekly pattern of surveillance time series into account [3]: the baseline data are stratified into 2 distinct baselines, one for weekdays and the other for weekends. W3 includes the 2 most recent observations of each baseline when calculating the test statistic for the corresponding baseline.

We ran C1, C2, C3, W2, and W3 on 18,000 simulated time series and measured the sensitivity and specificity of detection. From these runs we created a training data set of 5,400,000 instances, each the result of evaluating the performance of one outbreak detection algorithm with a specific parameter setting. To investigate the determinants of detection performance and quantify their effects, we used BNs to predict performance from algorithm characteristics and outbreak characteristics.

Results

We developed 2 BN models in the Weka machine learning software [4] and evaluated them with 5-fold cross-validation. The first BN models the effect of the guardband, memory, alerting threshold, and weekly pattern indicator (0 for C-algorithms, 1 for W-algorithms), together with the outbreak characteristics (contamination level and duration), on the sensitivity of detection. Sensitivity values were mapped to 4 classes: (0, 0.3], (0.3, 0.6], (0.6, 0.9], and (0.9, 1]. This BN correctly classified 67.74% of instances.
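The evaluation and class mapping described above can be sketched as follows. The abstract does not state whether sensitivity was computed per outbreak day or per outbreak, so this Python sketch assumes day-level definitions: sensitivity is the fraction of outbreak days with an alarm, and specificity is the fraction of non-outbreak days without one. `sensitivity_specificity` and `sensitivity_class` are hypothetical helpers; only the four class boundaries come from the text.

```python
from bisect import bisect_left

# Upper (closed) edges of the four sensitivity classes stated in the abstract.
SENSITIVITY_EDGES = [0.3, 0.6, 0.9, 1.0]
SENSITIVITY_LABELS = ["(0, 0.3]", "(0.3, 0.6]", "(0.6, 0.9]", "(0.9, 1]"]


def sensitivity_specificity(alarms, outbreak_days):
    """Day-level sensitivity and specificity (an assumed definition)."""
    if len(alarms) != len(outbreak_days):
        raise ValueError("alarm and outbreak series must have the same length")
    tp = fn = tn = fp = 0
    for alarm, in_outbreak in zip(alarms, outbreak_days):
        if in_outbreak:
            if alarm:
                tp += 1
            else:
                fn += 1
        elif alarm:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both outbreak and non-outbreak days")
    return tp / (tp + fn), tn / (tn + fp)


def sensitivity_class(sensitivity):
    """Map a sensitivity value to one of the four right-closed classes."""
    if not 0 < sensitivity <= 1:
        # The stated classes start at an open 0, so a sensitivity of exactly 0
        # has no class; this sketch rejects it instead of guessing.
        raise ValueError(f"sensitivity {sensitivity} is outside (0, 1]")
    # bisect_left returns the index of the first edge >= sensitivity,
    # which is the right-closed interval that contains it.
    return SENSITIVITY_LABELS[bisect_left(SENSITIVITY_EDGES, sensitivity)]


alarms = [True, False, True, True, False, False]
outbreak = [False, False, True, True, True, False]
sens, spec = sensitivity_specificity(alarms, outbreak)
print(round(sens, 3), round(spec, 3), sensitivity_class(sens))  # 0.667 0.667 (0.6, 0.9]
```

Using `bisect_left` keeps the boundary values (0.3, 0.6, 0.9) in the lower class, matching the right-closed intervals listed above.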
The misclassification error of the sensitivity BN was 0.9407. The second BN, which predicts the specificity of detection, correctly classified 95.895% of instances in 10 classes, with a misclassification error of 0.2975.

Conclusions

The contamination level and duration of outbreaks, the alerting threshold, the memory, the guardband, and whether the weekly pattern is taken into account all influence the sensitivity and specificity of outbreak detection. Given the C-algorithm parameter settings, we can predict outbreak detection performance quantitatively. In future work, we plan to investigate other predictors of performance and to study how these predictions can be used in algorithm and policy choices.

Keywords

outbreak detection; public health surveillance; machine learning; Bayesian networks; detection performance

References

1. Buckeridge, D.L., et al. Simulation Analysis Platform (SnAP): a Tool for Evaluation of Public Health Surveillance and Disease Control Strategies. American Medical Informatics Association, 2011.
2. Hutwagner, L., et al. The bioterrorism preparedness and response early aberration reporting system (EARS). Journal of Urban Health: Bulletin of the New York Academy of Medicine, 2003. 80(Supplement 1): p. i89-i96.
3. Tokars, J.I., et al. Enhancing time-series detection algorithms for automated biosurveillance. Emerging Infectious Diseases, 2009. 15(4): p. 533.
4. Hall, M., et al. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 2009. 11(1): p. 10-18.

*Nastaran Jafarpour
E-mail: nastaran.jafarpour@polymtl.ca

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 5(1):e90, 2013