ISDS Annual Conference Proceedings 2012. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ISDS 2012 Conference Abstracts

Extraction of Disease Occurrence Patterns Using MiSTIC: Salmonellosis in Florida

Vipul Raheja* and K. S. Rajan

International Institute of Information Technology Hyderabad (IIIT-H), Hyderabad, India

Objective

This work applies a spatio-temporal data mining (ST-DM) method, MiSTIC (Mining Spatio-Temporally Invariant Cores) [1,6], to infectious disease surveillance by identifying:
a) the extent of spatial spread of disease core regions across populations, i.e. the scale of disease prevalence;
b) possible causes of the observed patterns, for better prediction, detection and management of infectious disease and its outbreaks.

Introduction

Infectious diseases tend initially to be limited geographically to a reservoir; subsequent spatial variation in disease prevalence (including spread and intensity) arises from underlying differences in the physical and biological conditions that support the pathogen, its vectors and its reservoirs. Factors such as spatial proximity, physical and social connectivity, and local environmental conditions that add to susceptibility influence disease occurrence [2].

In disease management, the analysis of historical data needs to account for various aspects of geography, epidemiology, social structures and network dynamics. Large amounts of data raise issues of data processing, storage, pattern identification, etc. In addition, identifying the source of disease occurrence and its pattern can be of immense value.
ST-DM of disease data can be an effective tool for endemic preparedness [3], as it extracts implicit knowledge, spatial and temporal relationships, and other patterns inherent in such databases.

Here, a Core Region is defined as a set of spatial entities (e.g. counties), aggregated over time, that occur frequently at places having high values in a defined region (considering the areas of influence around them) [1].

Methods

Here, the MiSTIC algorithm detects spatio-temporally invariant cores with respect to disease occurrence. It involves a spatial analysis step to detect focal points and a spatio-temporal analysis over the study period to identify core regions, which are then classified as CHD, CLD or CND: Cores with High, Low and No (mostly random) dominating points, respectively, based on the frequency of disease occurrence. The predominantly occurring focal points capture the localized behavior of the disease, whereas the neighborhood constraints capture the nature (dynamic or non-dynamic) of the event.

Results

County-level annual data on Salmonellosis incidence from the Florida Department of Health [3], covering a period of 50 years (1961-2010), is used. Two types of cores were identified based on the type of neighborhood: Contiguous (CC) and within a defined Radius (CR). Table 1 shows the analysis of counties according to valid frequency criteria for both CC and CR (r=2) and their sub-classification.

The etiology of Salmonellosis shows that it is caused by tainted food, poor hygiene, local environment, etc., which are largely sanitation-related [4]. Taking the level of urbanization [5] as a proxy for sanitation, Fig. 1 shows that 12 of 19 cores occur in rural counties.

Conclusions

CC is observed to be a better indicator of cores than CR, implying that Salmonellosis manifests itself in a highly localized manner. Thus, the use of MiSTIC is promising and provides a way of identifying disease “hot-spots”.
It also provides valuable insight into disease prevalence in different regions based on their history over space and time.

Table 1. Classification of Core Polygons

Fig. 1. Map showing overlay of Metropolitan Areas and Cores

Keywords

Disease Cores; Salmonellosis; Spatio-Temporal Data Mining; Patterns

References

1. K Sravanthi, K S Rajan: Spatio-Temporal Mining of Core Regions - Study of rainfall patterns in Monsoonal India. 11th IEEE International Conference on Data Mining Workshops (ICDMW) 2011, pp.30-37
2. Chris Bailey-Kellogg et al.: Spatial Data Mining to Support Pandemic Preparedness. ACM SIGKDD Explorations Newsletter. 2006; 8(1):80-82
3. http://www.floridacharts.com [20/5/2012]
4. http://www.cdc.gov/healthypets/diseases/salmonellosis.htm [2/7/2012]
5. http://www.census.gov [25/5/2012]
6. K Sravanthi: MiSTIC. MS Thesis, IIIT Hyderabad

*Vipul Raheja
E-mail: vipul.raheja@research.iiit.ac.in

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 5(1):e19, 2013
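
As a rough illustration of the two steps described under Methods (detecting focal points per year, then aggregating them over time and classifying a core), a minimal Python sketch is given here. It is not the authors' implementation. The focal-point rule (a county whose value is not exceeded by any neighbor), the function names, the toy counties A, B and C, and the `high`/`low` dominance thresholds are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter


def focal_points(values, neighbors):
    """Counties with a positive value that no neighbor exceeds (assumed rule)."""
    return {
        county
        for county, value in values.items()
        if value > 0
        and all(value >= values.get(n, 0) for n in neighbors.get(county, ()))
    }


def focal_frequency(yearly_values, neighbors):
    """Count, per county, the number of years in which it is a focal point."""
    counts = Counter()
    for values in yearly_values.values():
        counts.update(focal_points(values, neighbors))
    return counts


def classify_core(members, counts, high=0.6, low=0.3):
    """Label a core CHD, CLD or CND from the share of focal occurrences
    held by its most frequent county. Thresholds are hypothetical."""
    total = sum(counts[m] for m in members)
    if total == 0:
        return "CND"
    share = max(counts[m] for m in members) / total
    if share >= high:
        return "CHD"
    if share >= low:
        return "CLD"
    return "CND"


# Toy contiguity (CC-style) neighborhood and three years of made-up incidence.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
yearly_values = {
    2008: {"A": 5, "B": 2, "C": 1},
    2009: {"A": 4, "B": 3, "C": 1},
    2010: {"A": 1, "B": 2, "C": 6},
}

counts = focal_frequency(yearly_values, neighbors)
label = classify_core(["A", "B", "C"], counts)
print(dict(counts), label)
```

On this toy input, county A is a focal point in two of the three years and county C in one, so the single core formed by A, B and C is labeled CHD under the assumed thresholds. A radius-based (CR) neighborhood could be sketched by building `neighbors` from a distance cutoff instead of adjacency.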