

ISDS Annual Conference Proceedings 2012. This is an Open Access article distributed under the terms of the Creative Commons Attribution-
Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and
reproduction in any medium, provided the original work is properly cited.

ISDS 2012 Conference Abstracts

An Improved Algorithm for Outbreak Detection in
Multiple Surveillance Systems
Angela Noufaily*1, Doyo Enki1, Paddy Farrington1, Paul Garthwaite1, Nick Andrews2 and
Andre Charlett2

1The Open University, Milton Keynes, United Kingdom; 2Health Protection Agency, London, United Kingdom

Objective
To improve the performance of the large-scale multiple statistical surveillance system for infectious disease outbreaks in England and Wales, with a view to reducing the number of false reports while retaining good power to detect genuine outbreaks.

Introduction
There has been much interest in the use of statistical surveillance systems over the last decade, prompted by concerns over bioterrorism, the emergence of new pathogens such as SARS and swine flu, and the persistent public health problems of infectious disease outbreaks. In the United Kingdom (UK), statistical surveillance methods have been in routine use at the Health Protection Agency (HPA) since the early 1990s and at Health Protection Scotland (HPS) since the early 2000s (1,2). These are based on a simple yet robust quasi-Poisson regression method (1). We revisit the algorithm with a view to improving its performance.

Methods
We fit a quasi-Poisson regression model to baseline data. One limitation of the current algorithm is the small number of baseline weeks it uses. To make use of more baseline data, we propose a simple seasonal adjustment using factors: the model is extended to include a 10-level seasonal factor.
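The abstract does not specify how the 10 levels are defined. Purely as an illustration, the Python sketch below assumes one hypothetical scheme: weeks within 3 weeks of the current week form level 0, and the remaining 45 weeks of a 52-week year are split into nine consecutive 5-week blocks (levels 1 to 9). The function name `seasonal_level` and its parameters are our own, not part of the HPA system.

```python
def seasonal_level(week, current_week, weeks_per_year=52, half_window=3, n_levels=10):
    """Map a week-of-year to one of n_levels seasonal factor levels.

    Hypothetical scheme: level 0 is the window of 2*half_window + 1 weeks
    centred on current_week; the remaining weeks, counted forward (circularly)
    from the end of that window, are split into n_levels - 1 equal blocks.
    """
    window = 2 * half_window + 1
    rest = weeks_per_year - window
    if rest % (n_levels - 1) != 0:
        raise ValueError("remaining weeks must split evenly into blocks")
    block = rest // (n_levels - 1)
    # Circular offset of this week from the start of the current-week window.
    offset = (week - (current_week - half_window)) % weeks_per_year
    if offset < window:
        return 0
    return 1 + (offset - window) // block
```

For a current week of 20, weeks 17 to 23 fall in level 0 and the rest of the year cycles through levels 1 to 9, so every baseline week can enter the model while the seasonal pattern is absorbed by the factor.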

We always fit the trend component, irrespective of its statistical significance.
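A minimal sketch of fitting a log-linear trend by iteratively reweighted least squares (IRLS), assuming an intercept-plus-trend model with the seasonal factor omitted for brevity. The trend coefficient is kept whatever its significance. The quasi-Poisson dispersion is taken as the Pearson statistic divided by the residual degrees of freedom and floored at 1; the floor is an assumption here. `fit_quasi_poisson_trend` is a hypothetical helper, not the HPA code.

```python
import math

def fit_quasi_poisson_trend(y, t, n_iter=50, tol=1e-10):
    """Fit log(mu_i) = a + b * t_i by IRLS; the trend b is always retained.

    Returns (a, b, phi), where phi is the Pearson dispersion estimate
    floored at 1 (an assumption of this sketch).
    """
    n = len(y)
    a, b = math.log(max(sum(y) / n, 1e-8)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(a + b * ti) for ti in t]
        # Working response and weights for the Poisson log link.
        z = [a + b * ti + (yi - mi) / mi for yi, ti, mi in zip(y, t, mu)]
        w = mu
        # Closed-form weighted least squares for two parameters.
        sw = sum(w)
        swt = sum(wi * ti for wi, ti in zip(w, t))
        swtt = sum(wi * ti * ti for wi, ti in zip(w, t))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swtz = sum(wi * ti * zi for wi, ti, zi in zip(w, t, z))
        det = sw * swtt - swt * swt
        new_a = (swtt * swz - swt * swtz) / det
        new_b = (sw * swtz - swt * swz) / det
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    mu = [math.exp(a + b * ti) for ti in t]
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    phi = max(1.0, pearson / (n - 2))
    return a, b, phi
```

The regression coefficients are the same as for an ordinary Poisson fit; only the dispersion estimate distinguishes the quasi-Poisson model, and it later widens the prediction threshold.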

We are concerned that the existing weighting procedure is too drastic. A baseline week is down-weighted if its standardized Anscombe residual is greater than 1. This condition was chosen empirically to avoid reducing the sensitivity of the system in the presence of large outbreaks in the baselines, but it may be increasing the false positive rate (FPR) unduly when there are no outbreaks, or only small ones, in the baselines. We investigate several other options, including restricting the down-weighting to weeks whose Anscombe residuals are greater than 2 or 3.
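To illustrate how the threshold changes which weeks are down-weighted, the sketch below computes standardized Anscombe residuals for a Poisson-type fit, s = (3/2)(y^(2/3) - mu^(2/3)) / (mu^(1/6) sqrt(phi (1 - h))), and gives weeks with s above the threshold a weight proportional to s^(-2), rescaled so that the weights sum to the number of baseline weeks. This follows our reading of the scheme in (1). Leverages h default to zero here, which is a simplification, and the function names are hypothetical.

```python
import math

def anscombe_residuals(y, mu, phi=1.0, leverage=None):
    """Standardized Anscombe residuals for a quasi-Poisson fit."""
    if leverage is None:
        leverage = [0.0] * len(y)  # simplification: ignore leverages
    return [
        1.5 * (yi ** (2 / 3) - mi ** (2 / 3)) / (mi ** (1 / 6) * math.sqrt(phi * (1 - hi)))
        for yi, mi, hi in zip(y, mu, leverage)
    ]

def downweight(residuals, threshold=1.0):
    """Weight s**-2 for residuals above threshold, 1 otherwise; rescaled to sum to n."""
    raw = [1 / s ** 2 if s > threshold else 1.0 for s in residuals]
    gamma = len(raw) / sum(raw)
    return [gamma * r for r in raw]
```

With residuals [0.5, -1.2, 1.5, 2.5, 3.5], a threshold of 1 down-weights the last three weeks, whereas a threshold of 3 down-weights only the last one, leaving more of the baseline at full weight.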

We also evaluate a new reweighting scheme informed by past decisions. Under this adaptive scheme, baseline weeks for which an alarm was flagged are down-weighted to reduce their effect on current predictions. The criterion used for reweighting is the value of the exceedance score.
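The abstract does not give the functional form of the adaptive weights. As a purely hypothetical example, the sketch below gives each baseline week a weight of 1 unless that week raised an alarm when it was itself the current week (exceedance score X > 1), in which case its weight is 1/X. The weights are then rescaled to sum to the number of weeks. `adaptive_weights` is an illustrative name only.

```python
def adaptive_weights(past_scores, alarm_level=1.0):
    # Hypothetical rule: a baseline week that raised an alarm when it was the
    # current week (exceedance score x > alarm_level) gets weight alarm_level / x;
    # every other week gets weight 1.
    raw = [alarm_level / x if x > alarm_level else 1.0 for x in past_scores]
    # Rescale so that the weights sum to the number of baseline weeks.
    scale = len(raw) / sum(raw)
    return [scale * r for r in raw]
```

Unlike the residual-based scheme, this rule uses only decisions already recorded by the system, so a week's weight does not change when the model is refitted.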

Finally, we investigate the validity of the upper threshold values
based on the quasi-Poisson model when the data are generated using
known negative binomial distributions.
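One way to check a threshold against a known negative binomial is to compare it with the exact quantile. The sketch below assumes the 2/3-power-transform threshold U = mu0 (1 + (2/3) z tau^(1/2) / mu0)^(3/2), as we understand it from (1), with tau the variance of the prediction. It also assumes a negative binomial with mean mu and variance phi mu, which corresponds to size r = mu / (phi - 1). The names and the default alpha = 0.005, matching the 0.995 quantile mentioned in the Results, are illustrative assumptions.

```python
import math
from statistics import NormalDist

def quasi_poisson_threshold(mu0, tau, alpha=0.005):
    """Upper threshold U = mu0 * (1 + (2/3) * z * sqrt(tau) / mu0) ** 1.5."""
    z = NormalDist().inv_cdf(1 - alpha)
    return mu0 * (1 + (2 / 3) * z * math.sqrt(tau) / mu0) ** 1.5

def nb_quantile(mu, phi, q):
    """Smallest k with P(Y <= k) >= q for a negative binomial with
    mean mu and variance phi * mu (requires phi > 1)."""
    r = mu / (phi - 1)
    ratio = mu / (r + mu)
    pk = (r / (r + mu)) ** r  # P(Y = 0)
    cdf, k = pk, 0
    while cdf < q:
        # Recursion P(k + 1) = P(k) * (k + r) / (k + 1) * mu / (r + mu).
        pk *= (k + r) / (k + 1) * ratio
        k += 1
        cdf += pk
    return k
```

For example, comparing `quasi_poisson_threshold(10, 20)` with `nb_quantile(10, 2, 0.995)` shows how far the quasi-Poisson approximation sits from the exact negative binomial quantile when tau is taken as phi * mu0, ignoring uncertainty in the fitted mean.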

Results
Our evaluation of the existing algorithm showed that the false positive rate (FPR) is too high.

A novel feature of our new models is that they make use of much more baseline data. This resulted in better estimation of the trend and variance, and decreased the FPR. In addition, we found that the trend should always be fitted, even when it is non-significant (or extreme). This reduces discrepancies in the results from one week to the next.

The adaptive reweighting scheme was found to give broadly equivalent results to the reweighting method based on scaled Anscombe residuals. Using the latter, as in the original HPA method, but with a much higher threshold for reweighting decreased the FPR further.

Our investigations also suggest that the negative binomial model is a reasonable one, though not ideal in all circumstances. Thus, there is a good case for replacing the quasi-Poisson model with the negative binomial.

One of the unusual features of the HPA system is that it is run every week on a database of more than 3,300 distinct organisms, so it is likely to produce a large number of aberrations. We found that retaining the exceedance score approach based on the 0.995 quantile is perfectly reasonable. This involves ranking aberrant organisms in order of exceedance.
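A minimal sketch of exceedance-score ranking, assuming the score X = (y0 - mu0) / (U - mu0) as we understand it from (1), where y0 is the current count, mu0 its predicted value and U the upper threshold. X > 1 corresponds to an alarm. The organism labels and the input layout are hypothetical.

```python
def exceedance_score(y0, mu0, upper):
    """X = (y0 - mu0) / (U - mu0); X > 1 means y0 exceeds the threshold U."""
    return (y0 - mu0) / (upper - mu0)

def rank_aberrations(results):
    """results maps organism -> (y0, mu0, U). Returns (organism, X) pairs
    for flagged organisms, sorted by decreasing exceedance score."""
    scores = {org: exceedance_score(*vals) for org, vals in results.items()}
    flagged = [(org, x) for org, x in scores.items() if x > 1]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Because X is scaled by the width of each organism's threshold, organisms with very different baseline counts can be placed on a single ranked list for review.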

Conclusions
We have undertaken a thorough evaluation of the HPA’s outbreak detection system based on simulated and real data. The main conclusion from this evaluation is that the FPR is too high, owing to a combination of factors, notably excessive down-weighting of high baselines and reliance on too few baseline weeks.

Keywords
outbreak; negative binomial regression; quasi-Poisson

Acknowledgments

This research was supported by a project grant from the Medical Research
Council, and by a Royal Society Wolfson Research Merit Award.

References

1. Farrington CP, Andrews NJ, Beale AJ, Catchpole MA. A statistical algorithm for the early detection of outbreaks of infectious disease. Journal of the Royal Statistical Society Series A. 1996;159:547-563.

2. McCabe GJ, Greenhalgh D, Gettinby G, Holmes E, Cowden J. Prediction of infectious diseases: an exception reporting system. Journal of Medical Informatics and Technologies. 2003;5:67-74.

*Angela Noufaily
E-mail: a.noufaily@open.ac.uk

Online Journal of Public Health Informatics * ISSN 1947-2579 * http://ojphi.org * 5(1):e148, 2013