The design and evaluation of a Bayesian system for detecting and characterizing outbreaks of Influenza Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 11(2):e6, 2019 OJPHI

The design and evaluation of a Bayesian system for detecting and characterizing outbreaks of influenza

Nicholas E. Millett¹, John M. Aronis¹*, Michael M. Wagner¹, Fuchiang Tsui¹, Ye Ye¹, Jeffrey P. Ferraro², Peter J. Haug², Per H. Gesteland², Gregory F. Cooper¹

1. Real-time Outbreak and Disease Surveillance (RODS) Laboratory, Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
2. Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah

Abstract

The prediction and characterization of outbreaks of infectious diseases such as influenza remains an open and important problem. This paper describes a framework for detecting and characterizing outbreaks of influenza and the results of testing it on data from ten outbreaks collected from two locations over five years. We model outbreaks with compartment models and explicitly model non-influenza influenza-like illnesses.

*Corresponding Author. Email: jma18@pitt.edu. Current Address: Real-time Outbreak and Disease Surveillance Laboratory, Department of Biomedical Informatics, 5607 Baum Boulevard, University of Pittsburgh, Pittsburgh, Pennsylvania 15206-3701

DOI: 10.5210/ojphi.v11i2.9952

Copyright ©2019 the author(s)

This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.

1 Introduction

The prediction and characterization of outbreaks of infectious diseases remains an open and important problem [1].
Influenza, with nearly annual outbreaks in temperate regions of the world, provides an ideal test domain [2]. This paper describes a framework for detecting and characterizing outbreaks of influenza and the results of testing it on data from ten real outbreaks collected from two locations over five years. Like several other systems, we model outbreaks with compartment models [3-5]. We differ from this past work in that we use the full text of patient care reports, rather than only chief complaints [6], counts of syndromes from sentinel physicians [5], counts of internet queries [7], etc. Doing so provides a rich source of evidence that may provide an early signal of an outbreak. We use the evidence to reason about likelihoods, such as P(findings|influenza) or P(findings|RSV), rather than simple counts. The approach is quite general, since the findings can include any evidence about a patient's disease status, including history, symptoms, signs, labs, and other information.

This paper extends our previous work [8] by using a more sophisticated model of non-influenza influenza-like illness (NIILI), modeling a probability distribution over influenza outbreak start dates, and testing on a set of real outbreak data collected over five years at two locations widely separated in the United States.

2 System Architecture

We have developed an end-to-end framework for outbreak detection and characterization [9]. It starts with patient care reports, extracts findings with natural language processing (NLP), assigns likelihoods to each patient case with a case-detection system (CDS), and constructs a model with an outbreak-detection system (ODS) that can be used for prediction and characterization.
A patient's care report contains the most detailed and complete record available of their present illness. Much of the information in it (including chief complaint, history of present illness, a detailed patient assessment, treatment, and response to treatment) is in free text. Other information, such as laboratory findings, is codified. In our system, such data, including symptoms and signs, are extracted using natural language processing software [10]. Some patient care reports include a laboratory test for influenza, which can provide a definitive diagnosis of influenza.

The findings (free-text derived and coded) for each patient are passed to CDS, which derives the probability of those findings given each of influenza, NIILI, and other. NIILI implicitly includes several diseases, such as respiratory syncytial virus (RSV) and parainfluenza, and "other" includes everything else, such as trauma and appendicitis. CDS uses a Bayesian network that represents the joint probability distribution of each patient's findings (including laboratory results) and the three disease categories just mentioned [11]. It provides the likelihoods P(E(p,d)|influenza), P(E(p,d)|NIILI), and P(E(p,d)|other), where E(p,d) is the set of findings for patient p on day d. That is, it provides the probability of the patient's findings given that the patient has influenza, NIILI, or other, respectively.

ODS takes all of the evidence from the first day of the monitored period through the present, evaluates thousands of possible outbreak models against the data, and makes projections about the future. Let M1,…,Mn be a representative set of models and E(1:c) be all of the data available through the current day c.
ODS computes the expected number of influenza cases on each day d with:

$$\text{Expected number of influenza cases on day } d = \sum_{i=1}^{n} M_i(d)\, P(M_i \mid E(1:c)) \tag{1}$$

where P(Mi|E(1:c)) is the probability of model Mi given the data up to the current day, and Mi(d) is the number of influenza cases predicted by model Mi on day d. Typically, d > c since we want to predict the future, but we can assess the past with d < c.

[...]

We run ODS for day c and derive P(Mi|E(1:c)) for each model Mi using the NIILI priors P'1(NIILI),…,P'c(NIILI). Let Pc(outbreak|E(1:c)) be the probability that an outbreak is occurring on day c given E(1:c), and Pc(outbreak|Mi) be the probability that an outbreak is occurring on day c given model Mi (as defined by Equation 8). We compute P'c+1(NIILI) with:

$$P'_{c+1}(\mathit{NIILI}) = \bigl(1 - P_c(\mathit{outbreak} \mid E(1:c))\bigr)\, P_{c+1}(\mathit{NIILI}) + \sum_{i} P_c(\mathit{outbreak} \mid M_i)\, P(M_i \mid E(1:c))\, P_{\mathit{start}(M_i)}(\mathit{NIILI}) \tag{20}$$

This equation says that we use the original NIILI prior on day c+1, Pc+1(NIILI), weighted by the probability of no outbreak on day c, plus the sum over all models of the original NIILI prior on the start day of the model, Pstart(Mi)(NIILI), weighted by the probability that an outbreak is occurring on day c given model Mi, Pc(outbreak|Mi), and by the posterior of model Mi given the evidence through day c, P(Mi|E(1:c)). As a result of using Equation 20, when an influenza outbreak is occurring, the NIILI priors are weighted toward their values at the start of the most likely ongoing outbreak models and do not increase due to the influence of the influenza outbreak.
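Equations 1 and 20 are both posterior-weighted sums over the candidate models, so they can be illustrated with a few lines of Python. The following is a minimal sketch, not the ODS implementation: the function names, argument names, and numeric values are hypothetical, and the toy example assumes that Pc(outbreak|E(1:c)) equals the sum over models of Pc(outbreak|Mi)P(Mi|E(1:c)), which is not stated in the text above.

```python
from typing import Sequence


def expected_influenza_cases(cases_on_day: Sequence[float],
                             posteriors: Sequence[float]) -> float:
    """Equation 1: sum over models of M_i(d) * P(M_i | E(1:c)).

    cases_on_day[i] is M_i(d), the number of cases model i predicts on day d.
    posteriors[i] is P(M_i | E(1:c)).
    """
    if len(cases_on_day) != len(posteriors):
        raise ValueError("one posterior is required per model")
    return sum(m * p for m, p in zip(cases_on_day, posteriors))


def adjusted_niili_prior(p_outbreak: float,
                         original_prior_next_day: float,
                         p_outbreak_given_model: Sequence[float],
                         posteriors: Sequence[float],
                         prior_at_model_start: Sequence[float]) -> float:
    """Equation 20: the adjusted NIILI prior P'_{c+1}(NIILI).

    p_outbreak is P_c(outbreak | E(1:c)).
    original_prior_next_day is the original prior P_{c+1}(NIILI).
    p_outbreak_given_model[i] is P_c(outbreak | M_i).
    prior_at_model_start[i] is P_{start(M_i)}(NIILI).
    """
    if not (len(p_outbreak_given_model) == len(posteriors)
            == len(prior_at_model_start)):
        raise ValueError("per-model inputs must have equal length")
    no_outbreak_term = (1.0 - p_outbreak) * original_prior_next_day
    outbreak_term = sum(
        o * p * s
        for o, p, s in zip(p_outbreak_given_model, posteriors,
                           prior_at_model_start)
    )
    return no_outbreak_term + outbreak_term


# Toy example with two hypothetical models (all numbers are made up).
posteriors = [0.75, 0.25]            # P(M_i | E(1:c))
outbreak_given_model = [1.0, 0.0]    # P_c(outbreak | M_i)
# Assumption: the outbreak probability is the posterior-weighted average.
p_outbreak = sum(o * p for o, p in zip(outbreak_given_model, posteriors))

cases = expected_influenza_cases([40.0, 0.0], posteriors)
prior = adjusted_niili_prior(p_outbreak, 0.2, outbreak_given_model,
                             posteriors, [0.1, 0.15])
print(cases, prior)
```

In this toy setting the model that places the outbreak in progress carries most of the posterior mass, so the adjusted prior (about 0.125) is pulled from the original day c+1 value of 0.2 toward that model's start-day value of 0.1, which is the behavior described for Equation 20.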
Appendix B Counts of Positive Influenza Lab Tests

Figure 1: Allegheny County 2010-2011
Figure 2: Salt Lake County 2010-201
Figure 3: Allegheny County 2011-2012
Figure 4: Salt Lake County 2011-2012
Figure 5: Allegheny County 2012-2013
Figure 6: Salt Lake County 2012-2013
Figure 7: Allegheny County 2013-2014
Figure 8: Salt Lake County 2013-2014
Figure 9: Allegheny County 2014-2015
Figure 10: Salt Lake County 2014-2015