ISDS Annual Conference Proceedings 2012. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ISDS 2012 Conference Abstracts

Computerized Text Analysis to Enhance Automated Pneumonia Detection

Sylvain DeLisle*1,2, Tariq Siddiqui1,2, Adi Gundlapalli3,4, Matthew Samore3,4 and Leonard D’Avolio5,6

1 VA Maryland Health Care System, Baltimore, MD, USA; 2 Medicine, University of Maryland, Baltimore, MD, USA; 3 VA Salt Lake City Health Care System, Salt Lake City, UT, USA; 4 University of Utah, Salt Lake City, UT, USA; 5 VA Boston Health Care System, Boston, MA, USA; 6 Harvard Medical School, Boston, MA, USA

Objective
To improve surveillance for pneumonia using the free text of electronic medical records (EMR).

Introduction
Information about disease severity could help with both detection and situational awareness during outbreaks of acute respiratory infections (ARI). In this work, we use EMR data to identify patients with pneumonia, a key marker of ARI severity. We asked whether computerized analysis of the free text of clinical notes or imaging reports could complement structured EMR data to uncover pneumonia cases.

Methods
A previously validated ARI case-detection algorithm (CDA) (sensitivity, 99%; PPV, 14%) [1] flagged outpatient visits to the VA Maryland Health Care System (VAMHCS) that had associated chest imaging (n = 2737). Imaging reports were manually categorized as Non-Negative if they could support a diagnosis of pneumonia and Negative otherwise (inter-rater kappa = 0.88). These manual labels served as the reference for developing an automated, machine-learning report classifier [2].
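The agreement and performance statistics named in these Methods (kappa, sensitivity, PPV) and the F-measure reported in the Results all derive from simple counts. The following is a minimal sketch, not the authors' code, of how such figures are conventionally computed. The Confusion class, the function names, and every count and label in the usage lines are hypothetical illustrations, not study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Hypothetical 2x2 counts for a case-detection algorithm (CDA) vs. a manual reference."""
    tp: int  # flagged by the CDA and confirmed as a reference case
    fp: int  # flagged by the CDA but not a reference case
    fn: int  # reference case that the CDA failed to flag
    tn: int  # neither flagged nor a reference case (not used by the metrics below)


def sensitivity(c: Confusion) -> float:
    """Share of reference cases that the CDA flags (recall)."""
    return c.tp / (c.tp + c.fn)


def ppv(c: Confusion) -> float:
    """Positive predictive value: share of flagged visits that are true cases (precision)."""
    return c.tp / (c.tp + c.fp)


def f_measure(c: Confusion) -> float:
    """Balanced F-measure (F1): harmonic mean of PPV and sensitivity."""
    p, r = ppv(c), sensitivity(c)
    return 2 * p * r / (p + r)


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters on the same items."""
    a_labels, b_labels = list(rater_a), list(rater_b)
    if len(a_labels) != len(b_labels) or not a_labels:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(a_labels)
    observed = sum(a == b for a, b in zip(a_labels, b_labels)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(
        (a_labels.count(k) / n) * (b_labels.count(k) / n)
        for k in set(a_labels) | set(b_labels)
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    c = Confusion(tp=90, fp=10, fn=30, tn=870)
    print(sensitivity(c), ppv(c), round(f_measure(c), 3))  # 0.75 0.9 0.818
    # Two hypothetical raters labeling four imaging reports.
    print(cohen_kappa(
        ["Non-Negative", "Negative", "Negative", "Non-Negative"],
        ["Non-Negative", "Negative", "Non-Negative", "Non-Negative"],
    ))  # 0.5
```

PPV and the F-measure ignore true negatives. This suits surveillance settings, where non-cases vastly outnumber cases. It also explains why an added classifier can raise the F-measure even while lowering sensitivity: the PPV gain outweighs the recall loss in the harmonic mean.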
EMR entries related to visits with Non-Negative chest imaging were manually reviewed to identify cases with Possible Pneumonia (at least one new symptom among cough, sputum, fever/chills/night sweats, dyspnea, or pleuritic chest pain) or with Pneumonia-in-Plan (pneumonia listed as one of the two most likely diagnoses in a physician’s note). These cases served as the reference for developing the EMR-based CDAs. CDA components included ICD-9 codes for either the full spectrum of ARI [1] or the pneumonia subset, text analysis aimed at non-negated ARI symptoms in the clinical note [1], and the above-mentioned imaging report text classifier.

Results
The manual review identified 370 reference cases with Possible Pneumonia and 250 with Pneumonia-in-Plan. The Table shows the statistical performance of illustrative CDAs that combined structured EMR parameters with or without text analyses. Adding the “Text of Imaging Report” analysis increased PPV by 38 to 70 percentage points. Despite attendant losses in sensitivity, this classifier increased the F-measure of all CDAs based on a broad ARI ICD-9 code set. With the possible exception of CDA 6, whose F-measure was the highest achieved in this study, the text analysis seeking ARI symptoms in the clinical note added no further value to CDAs that also included analysis of the chest imaging reports.

Conclusions
Automated text analysis of chest imaging reports can improve our ability to separate outpatients with pneumonia from those with a milder form of ARI.

Keywords
situational awareness; influenza; surveillance; electronic medical record; pneumonia

References
[1] DeLisle S, South B, Anthony JA, Kalp E, Gundlapalli A, et al. Combining Free Text and Structured Electronic Medical Record Entries to Detect Acute Respiratory Infections. PLoS ONE. 2010;5(10):e13377.
[2] D’Avolio L, Nguyen T, Goryachev S, Fiore L. Automated Concept-Level Information Extraction to Reduce the Need for Custom Software and Rules Development.
Journal of the American Medical Informatics Association. 2011;18(5):607.

*Sylvain DeLisle, E-mail: sdelisle@umaryland.edu

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 5(1):e74, 2013