Layout 1

ISDS Annual Conference Proceedings 2012. This is an Open Access article distributed under the terms of the Creative Commons Attribution-
Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and
reproduction in any medium, provided the original work is properly cited.

ISDS 2012 Conference Abstracts

Collaborative Automation Reliably Remediating
Erroneous Conclusion Threats (CARRECT)
Jonathan C. Lansey*1, Paul Picciano1, Ian Yohai1, Fred Grant2 and Robert Gern2

1Aptima inc., Woburn, MA, USA; 2Northrop Grumman Corporation, Falls Church, VA, USA

Objective
The objective of the CARRECT software is to make cutting edge

statistical methods for reducing bias in epidemiological studies easy
to use and useful for both novice and expert users.

Introduction
Analyses produced by epidemiologists and public health practi-

tioners are susceptible to bias from a number of sources including
missing data, confounding variables, and statistical model selection.
It often requires a great deal of expertise to understand and apply the
multitude of tests, corrections, and selection rules, and these tasks
can be time-consuming and burdensome. To address this challenge,
Aptima began development of CARRECT, the Collaborative Au-
tomation Reliably Remediating Erroneous Conclusion Threats sys-
tem. When complete, CARRECT will provide an expert system that
can be embedded in an analyst’s workflow. CARRECT will support
statistical bias reduction and improved analyses and decision mak-
ing by engaging the user in a collaborative process in which the tech-
nology is transparent to the analyst.

Methods
Older approaches to imputing missing data, including mean im-

putation and single imputation regression methods, have steadily
given way to a class of methods known as “multiple imputation”
(hereafter “MI”; Rubin 1987). Rather than making the restrictive as-
sumption that the data are missing completely at random (MCAR),
MI typically assumes the data are missing at random (MAR).

There are two key innovations behind MI. First, the observed val-
ues can be useful in predicting the missing cells, and thus specifying
a joint distribution of the data is the first step in implementing the
models. Second, single imputation methods will likely fail not only
because of the inherent uncertainty in the missing values but also be-
cause of the estimation uncertainty associated with generating the pa-
rameters in the imputation procedure itself. By contrast, drawing the
missing values multiple times, thereby generating m complete
datasets along with the estimated parameters of the model properly
accounts for both types of uncertainty (Rubin 1987; King et al. 2001).
As a result, MI will lead to valid standard errors and confidence in-
tervals along with unbiased point estimates.

In order to compute the joint distribution, CARRECT uses a boot-
strapping-based algorithm that gives essentially the same answers as
the standard Bayesian Markov Chain Monte Carlo (MCMC) or Ex-
pectation Maximization (EM) approaches, is usually considerably
faster than existing approaches and can handle many more variables.

Results
Tests were conducted on one of the proposed methods with an epi-

demiological dataset from the Integrated Health Interview Series
(IHIS) producing verifiably unbiased results despite high missing-
ness rates. In addition, mockups (Figure 1) were created of an intu-
itive data wizard that guides the user through the analysis processes
by analyzing key features of a given dataset. The mockups also show

prompts for the user to provide additional substantive knowledge to
improve the handling of imperfect datasets, as well as the selection of
the most appropriate algorithms and models.

Conclusions
Our approach and program were designed to make bias mitigation

much more accessible to much more than only the statistical elite.
We hope that it will have a wide impact on reducing bias in epi-
demiological studies and provide more accurate information to poli-
cymakers.

Figure 1 - Screenshot of user selecting imputation parameters.

Keywords
Bias reduction; Missing data; Statistical model selection

Acknowledgments

This material is based upon work supported by the Walter Reed Army In-
stitute of Research (WRAIR) under Contract No. W81XWH-11-C-0505.
Any opinions, findings and conclusions or recommendations expressed in
this material are those of the authors and do not necessarily reflect the
views of the WRAIR.

References

James Honaker and Gary King, “What to do About Missing Values in
Time Series Cross-Section Data” American Journal of Political Sci-
ence Vol. 54, No. 2 (April, 2010): Pp. 561-581.

Gary King, James Honaker, Anne Joseph, and Kenneth Scheve. “Ana-
lyzing Incomplete Political Science Data: An Alternative Algorithm
for Multiple Imputation”, American Political Science Review, Vol.
95, No. 1 (March, 2001): Pp. 49-69.

*Jonathan C. Lansey
E-mail: jlansey@aptima.com

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 5(1):e189, 2013