ISDS Annual Conference Proceedings 2014. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ISDS 2014 Conference Abstracts

Analytics, Machine Learning & NLP – use in BioSurveillance and Public Health practice

Mujitha B. K B*1, Ajil Jalal2, Vishnuprasad V1 and Nishad K A1
1Informatics, LongRiver Infotech, Bangalore, India; 2IIT Madras, Chennai, India

Objective
To summarize ways in which Analytics, Machine Learning (ML) and Natural Language Processing (NLP) can improve accuracy and efficiency in biosurveillance and public health practice. We also discuss the use of a generic ML platform built on these techniques in typical surveillance applications (integration with devices/sensors, Web/mobile, clinical records, Internet queries, and social/news media).

Introduction
Most surveillance environments and applications now produce an abundance of data. Today’s applications face two real challenges: identifying and filtering responsive messages from this ocean of big data, and then processing the resulting informative datasets to gain knowledge. Use of Analytics has revolutionized many areas. At LongRiver Infotech, we have used various Machine Learning techniques (Regression, Classification, Text Analytics, Decision Trees, Clustering, etc.) in different types of applications. These methodologies are abstracted into a generic platform that can be put to use in many public health and surveillance applications, which are enumerated here.

Methods
In this generic ML platform, we brought together modules covering each of the ML and NLP areas.
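
As an illustration only, the classification and text analytics step of filtering responsive messages could be sketched as below. This is not the platform's implementation (the abstract reports that R was used and gives no code). It is a minimal Python sketch, assuming a multinomial Naive Bayes classifier with add-one smoothing, and it shows how precision and recall might be computed for such a filter. The class name `NaiveBayesFilter`, the helper functions, the label names and the toy messages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # messages per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence and are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy training data; real surveillance feeds would be far larger.
TRAIN = [
    ("fever and cough for three days", "responsive"),
    ("many children with rash and fever at school", "responsive"),
    ("vomiting and diarrhea after eating at the market", "responsive"),
    ("great football match last night", "other"),
    ("new phone on sale this weekend", "other"),
    ("traffic jam near the market today", "other"),
]

model = NaiveBayesFilter().fit([m for m, _ in TRAIN], [l for _, l in TRAIN])
```

Calling `model.predict("my son has fever and cough")` labels the message as responsive on this toy data. In practice such a filter would be trained on labeled surveillance messages and scored on held-out data with `precision_recall`.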
This platform was then evaluated in a simulated environment, interfacing with a Web/mobile surveillance data capture application, medical devices/sensors over RFID, and social feeds. Surveillance data in both streaming and batch modes were used in this test environment. R was used for the ML algorithms, and infrastructure tools such as a NoSQL database (Apache Spark), MapReduce (Spark/Hadoop) and visualization tools (R/Tableau) were integrated in this pilot study.

Results
Each of the independent modules included in this environment had been evaluated in separate projects (precision and recall rates, R-squared values, AUC, etc. for the respective algorithms). The scaling capabilities of the platform (input data, ML processing) were evaluated on an Apache Spark cluster.

Conclusions
This framework can be plugged into any surveillance application that has the required IT infrastructure in place for efficient and scalable distributed processing and big data handling. From our evaluation so far, there is increased interest from various stakeholders in using these Machine Learning algorithms and NLP technology on surveillance data. Further enhancements in NLP include:
1) Speech recognition, which lets users describe their problems verbally (the speech is first converted to text, which NLP can then act upon)
2) Support for multiple languages (which lets the public report in their own local language)
3) Question-Answering (which enables machine processing of user stories and responding with the findings/solutions)
A primary motivation for presenting at this conference is to solicit feedback from public health practitioners on this idea and its potential and challenges for use in existing surveillance systems.

[Figure: Analytics use in Bio Surveillance]
[Figure: Analytics use in Public Health practice]

Keywords
responsive messages; clustering; classification; machine learning algorithm; text analytics

*Mujitha B. K B
E-mail: mujitha@longriverinfotech.com

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * (1):e194, 201