Statistical analyses in disease surveillance systems
© Lescano et al; licensee BioMed Central Ltd. 2008
Published: 14 November 2008
The performance of disease surveillance systems is evaluated and monitored using a diverse set of statistical analyses throughout each stage of surveillance implementation. An overview of their main elements is presented, with a specific emphasis on syndromic surveillance directed to outbreak detection in resource-limited settings. Statistical analyses are proposed for three implementation stages: planning, early implementation, and consolidation. Data sources and collection procedures are described for each analysis.
During the planning and pilot stages, we propose to estimate the average data collection, data entry and data distribution time. This information can be collected by surveillance systems themselves or through specially designed surveys. During the initial implementation stage, epidemiologists should study the completeness and timeliness of the reporting, and describe thoroughly the population surveyed and the epidemiology of the health events recorded. Additional data collection processes or external data streams are often necessary to assess reporting completeness and other indicators. Once data collection processes are operating in a timely and stable manner, analyses of surveillance data should expand to establish baseline rates and detect aberrations. External investigations can be used to evaluate whether abnormally increased case frequency corresponds to a true outbreak, and thereby establish the sensitivity and specificity of aberration detection algorithms.
Statistical methods for disease surveillance have focused mainly on the performance of outbreak detection algorithms without sufficient attention to data quality and representativeness, two factors that are especially important in developing countries. It is important to assess data quality at each stage of implementation using a diverse mix of data sources and analytical methods. Careful, close monitoring of selected indicators is needed to evaluate whether systems are reaching their proposed goals at each stage.
Most analyses performed with data from disease surveillance systems focus on establishing baseline disease rates and testing outbreak detection algorithms [1, 2]. Another commonly evaluated outcome is the timeliness of data reporting. However, the performance of disease surveillance systems also needs to be evaluated and monitored during other stages of surveillance implementation. For example, outbreak detection algorithms often need to adapt to systematic variations in the frequency of conditions under surveillance. Therefore, it is meaningful to understand and incorporate the seasonal or day-of-week variability in the caseload in order to implement outbreak detection algorithms that can better respond to these variations. Thus, a diverse set of data collection processes and statistical analyses should be implemented and continuously used to monitor and evaluate surveillance systems.
In developing countries, surveillance is often conducted with more limited resources than in developed settings, frequently lacking appropriate computer systems, laboratory diagnostic capabilities, and sufficient numbers of well-trained physicians. Syndromic surveillance has thus emerged as an alternative to the lack of physicians and laboratory diagnostics, and in some cases is based on monitoring the frequency of patients' signs and symptoms instead of relying on clinical or laboratory-confirmed diagnoses. The introduction of syndromic surveillance in developing countries has also brought an increased need for more comprehensive statistical analysis of the information generated, as less is known about the characteristics and behavior of the data streams used in these novel surveillance approaches. We present a brief overview of selected key statistical procedures proposed for different stages of the implementation of epidemiological surveillance systems. The proposed analyses are classified according to three implementation stages: planning, early implementation and consolidation. Specific emphasis is placed on the statistical analyses needed for syndromic surveillance systems implemented in resource-limited settings that aim at early warning and outbreak detection.
The statistical procedures proposed in this paper were developed and applied between 2000 and 2006 during the planning and implementation of Alerta and the Early Warning Outbreak Recognition System (EWORS), two electronic early warning surveillance systems currently in place in resource-limited settings of Asia and the Americas. Alerta has monitored clinical diagnoses of mandatory-reporting conditions twice per week within the Peruvian Navy and Army since 2003, and is currently being expanded to other countries in the Americas. Alerta is implemented by the U.S. Naval Medical Research Center Detachment (NMRCD), Peru and uses technology of Voxiva S.R.L. EWORS monitors daily the signs and symptoms of patients with conditions of potential infectious origin in sentinel hospitals, and is implemented in both Southeast Asia and Peru. EWORS was developed in 1999 by the U.S. Naval Medical Research Unit #2 (NAMRU-2) in Indonesia, and implemented in collaboration with the Ministries of Health in each country. Numeric examples are drawn from the experiences of Alerta in Peru and EWORS in Indonesia, Lao PDR and Peru.
Proposed statistical analyses by stage
Planning and pilot stage
To demonstrate the feasibility of conducting surveillance, it is often important to show that surveillance will not place a burden on healthcare personnel, and that timely data can be generated with existing resources. This is particularly the case for syndromic surveillance systems that use data not otherwise collected routinely. Pilot tests of the data-gathering forms of proposed new surveillance systems should be conducted in advance (ideally with >30 patients) in order to measure the mean data collection and entry time (i.e., the time needed to obtain information from a patient and to input the data in a computer). Averages and standard deviations are commonly estimated; for Alerta and EWORS, data collection takes <1 minute per case and does not distract personnel or patients. Data distribution time, on the other hand, estimates the time required to send data from the surveillance site to the central hub, usually via the Internet or by phone. It can be estimated beforehand or recorded automatically by the system itself, and usually takes <15 minutes for EWORS (daily) or Alerta (twice per week), showing that little connectivity time is needed and that dial-up connections are sufficient in most surveillance sites.
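As a hypothetical illustration, the pilot-stage summary amounts to a mean and standard deviation over per-patient timings; the function and timing data below are invented for this sketch and are not drawn from Alerta or EWORS.

```python
import statistics

def summarize_times(seconds):
    """Mean and sample standard deviation of per-patient times, in seconds."""
    mean = statistics.mean(seconds)
    sd = statistics.stdev(seconds) if len(seconds) > 1 else 0.0
    return mean, sd

# Invented pilot timings for 10 patients, in seconds (all under 1 minute)
pilot = [42, 55, 38, 59, 47, 50, 44, 58, 39, 53]
mean, sd = summarize_times(pilot)
print(f"mean = {mean:.1f} s, sd = {sd:.1f} s")
```

In a real pilot, timings from at least 30 patients would be preferable, as suggested above.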
Early implementation stage
Average and range of reporting rate (percent days with data reported) across sites, EWORS 2000 – 2006.
Overall percent reporting rates (range): 96 (91 – 100), 93 (88 – 94), 89 (69 – 100).
Completeness of reporting (the percent of all eligible subjects whose data were actually recorded in the surveillance system) assesses the representativeness of the data reported. Estimating this indicator often requires a labor-intensive process involving sporadic site visits to manually evaluate the fraction of patients who visited a surveillance site seeking medical attention and were actually included in the surveillance system. EWORS and Alerta sites evaluated in Peru showed completeness rates ranging from 66% to 90% (Araujo-Castillo R, personal communication 2007).
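A minimal sketch of how such a site-visit estimate might be summarized, assuming a simple normal-approximation confidence interval; the function name and the counts below are invented for illustration, not taken from the evaluations cited above.

```python
import math

def completeness(recorded, eligible, z=1.96):
    """Point estimate and approximate 95% CI for reporting completeness."""
    p = recorded / eligible
    se = math.sqrt(p * (1 - p) / eligible)  # normal approximation to the binomial
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Invented site visit: 180 of 200 eligible patients were found in the system
p, lo, hi = completeness(180, 200)
print(f"completeness = {p:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```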
Main sociodemographic characteristics of cases surveyed, EWORS 2000 – 2006: median age (years), percent who traveled recently, and patients per day.
Most frequent symptoms and syndromes and percent of patients affected, EWORS, 2000 – 2006: most frequent symptoms (e.g., sore throat, 25%), most frequent syndromes, and patients with any of these three.
As this article places specific emphasis on early warning outbreak detection surveillance systems, one of the expected outcomes of such systems will often be the actual detection of an aberration or potential outbreak. The performance of outbreak detection algorithms is measured by their sensitivity (percent of all outbreaks detected), specificity (often expressed as the average time between false alarms) and detection timeliness (delay between outbreak onset and detection). Numerous algorithms exist and an abundant literature describes their performance [6, 10], although with known limitations: 1) performance assessments still rely on simulations instead of 'real' outbreaks, 2) measuring 'true' performance requires outbreak investigations and parallel detection systems as a gold standard, and 3) evidence from developing countries remains remarkably limited.
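To make these three measures concrete, the sketch below scores a naive fixed-threshold alarm against invented daily counts with a known outbreak window. It illustrates the metrics only; it is not one of the algorithms cited above, and the counts and window are hypothetical.

```python
def evaluate_detector(counts, outbreak_windows, threshold):
    """Score a threshold alarm against known outbreak windows.

    counts: daily case counts; outbreak_windows: (start, end) day indices,
    inclusive, of 'true' outbreaks. Returns (sensitivity, false alarms per
    day, mean detection delay in days, or None if nothing was detected).
    """
    alarms = [i for i, c in enumerate(counts) if c > threshold]
    detected, delays, outbreak_days = 0, [], set()
    for start, end in outbreak_windows:
        outbreak_days.update(range(start, end + 1))
        hits = [a for a in alarms if start <= a <= end]
        if hits:
            detected += 1
            delays.append(hits[0] - start)  # delay from outbreak onset
    sensitivity = detected / len(outbreak_windows)
    false_alarm_rate = sum(a not in outbreak_days for a in alarms) / len(counts)
    delay = sum(delays) / len(delays) if delays else None
    return sensitivity, false_alarm_rate, delay

# Invented series: one true outbreak on days 4-6, one spurious spike on day 8
counts = [3, 2, 4, 3, 9, 11, 3, 2, 8, 2]
print(evaluate_detector(counts, [(4, 6)], threshold=6))
```

With real data, the outbreak windows would come from outbreak investigations or a gold-standard parallel system, as noted above.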
Researchers conducting statistical analyses of disease surveillance systems often focus on outbreak detection algorithms [2, 4, 6], describing in substantially less detail the systems' performance, data quality and the epidemiological profile of the population under surveillance. Surveillance conducted in resource-limited settings, however, often suffers from low reporting coverage and data completeness, which in some cases may be insufficient to support accurate, timely outbreak detection. These operational issues must be addressed before the performance of the system as a whole can be assessed. Although our analyses and evidence are limited to developing country settings, surveillance systems in more developed countries would probably also benefit from increased analysis of data quality and system performance beyond aberration detection.
Outbreak detection algorithms should match the characteristics of the available surveillance data, and only a careful analysis of existing surveillance data may reveal unique features that need to be addressed. A few examples include the presence and magnitude of day-of-week effects, their magnitude relative to seasonal variation, the most frequent disease outcomes in the population, and the socio-demographic units within the population.
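Day-of-week effects can be screened for with nothing more than per-weekday averages. The sketch below, using invented counts with deliberately lower weekend reporting, illustrates the idea; the function is hypothetical, not part of Alerta or EWORS.

```python
from collections import defaultdict

def day_of_week_means(daily_counts, first_weekday=0):
    """Average caseload per weekday (0 = Monday ... 6 = Sunday)."""
    totals, days = defaultdict(float), defaultdict(int)
    for i, count in enumerate(daily_counts):
        dow = (first_weekday + i) % 7
        totals[dow] += count
        days[dow] += 1
    return {d: totals[d] / days[d] for d in sorted(totals)}

# Two invented weeks, Monday-start, with lower weekend reporting
counts = [10, 12, 11, 9, 10, 4, 3,
          11, 13, 10, 10, 9, 5, 2]
print(day_of_week_means(counts))
```

A marked drop in the weekend averages, as in this example, would suggest adjusting baselines by weekday before applying aberration detection.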
Surveillance in the context of developing countries is unique in many respects. Few surveillance systems currently exist, and entire population sub-groups such as the military are often excluded. The implementation of new systems such as Alerta and EWORS provides an opportunity to look critically at the surveillance system evaluation framework and expand the current arsenal of evaluation procedures. As current pandemic threats and international regulations demand more extensive surveillance, the evaluation of new systems should begin earlier in their implementation in order to enhance their overall chances of success.
Statistical methods for disease surveillance have focused mainly on the performance of outbreak detection algorithms and have not paid sufficient attention to data quality and representativeness, two factors that are especially important in developing countries. Whether the final endpoint of surveillance is outbreak detection, situational awareness, or estimation of trends, these aims cannot be accomplished without adequate intermediate outcomes such as reporting coverage, data quality and completeness. We advocate a more holistic approach to statistical analyses in which indicators relate to the entire surveillance process. Assessment of data quality using a diverse mix of data sources and analytical methods is key during each stage of implementation. Careful, close monitoring of selected indicators is also crucial to evaluate whether systems are reaching their proposed goals at each stage. A more balanced, diverse analysis of surveillance systems data is essential in the current context, as new surveillance systems are implemented in response to pandemic threats and the recently updated international health regulations.
The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Ministries of Health or Governments of Peru, Indonesia or Lao PDR, the U.S. Department of the Navy, the U.S. Department of Defense, nor the U.S. Government.
Several authors of this manuscript are employees of the U.S. Government. This work was prepared as part of their duties. Title 17 U.S.C. §105 provides that 'Copyright protection under this title is not available for any work of the United States Government'. Title 17 U.S.C. §101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of that person's official duties.
The authors would like to acknowledge the efforts of the Ministries of Health of Peru, Indonesia and Lao PDR to develop and enhance their disease surveillance capabilities. This work was partially supported by DoD-GEIS Work Unit Number 847705 82000 25 GB B0016 and the NIH/FIC training grant D43 TW007393-01.
This article has been published as part of BMC Proceedings Volume 2 Supplement 3, 2008: Proceedings of the 2007 Disease Surveillance Workshop. Disease Surveillance: Role of Public Health Informatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/2?issue=S3.
- Buckeridge DL, Burkom H, Moore A, Pavlin J, Cutchis P, Hogan W: Evaluation of syndromic surveillance systems – design of an epidemic simulation model. MMWR Morb Mortal Wkly Rep. 2004, 53 (Suppl): 137-43.
- Hurt-Mullen KJ, Coberly J: Syndromic surveillance on the epidemiologist's desktop: making sense of much data. MMWR Morb Mortal Wkly Rep. 2005, 54 (Suppl): 141-6.
- Jajosky RA, Groseclose SL: Evaluation of reporting timeliness of public health surveillance systems for infectious diseases. BMC Public Health. 2004, 4: 29. 10.1186/1471-2458-4-29.
- Hutwagner LC, Thompson WW, Seeman GM, Treadwell T: A simulation model for assessing aberration detection methods used in public health surveillance for systems with limited baselines. Stat Med. 2005, 24 (4): 543-50. 10.1002/sim.2034.
- Chretien JP, Blazes DL, Mundaca CC, Glass J, Happel Lewis S, Lombardo J, Erickson RL: Surveillance for Emerging Infection Epidemics in Developing Countries: EWORS and Alerta DISAMAR. Disease Surveillance. Edited by: Lombardo JS, Buckeridge DL. 2007, Hoboken (NJ): John Wiley & Sons, Inc, 369-96.
- Buckeridge DL: Outbreak detection through automated surveillance: a review of the determinants of detection. J Biomed Inform. 2007, 40 (4): 370-9. 10.1016/j.jbi.2006.09.003.
- Soto G, Araujo-Castillo RV, Neyra J, Mundaca CC, Blazes DL: Challenges in the implementation of an electronic surveillance system in a resource-limited setting: Alerta, in Peru. BMC Proceedings. 2008, 2 (Suppl 3): S4.
- Corwin A: Developing regional outbreak response capabilities: Early Warning Outbreak Recognition System (EWORS). Navy Med. 2000, 1-5.
- Buehler JW, Hopkins RS, Overhage JM, Sosin DM, Tong V: Framework for evaluating public health surveillance systems for early detection of outbreaks: recommendations from the CDC Working Group. MMWR Recomm Rep. 2004, 53 (RR-5): 1-11.
- Buckeridge DL, Burkom H, Campbell M, Hogan WR, Moore AW: Algorithms for rapid outbreak detection: a research synthesis. J Biomed Inform. 2005, 38 (2): 99-113. 10.1016/j.jbi.2004.11.007.
- Fauci AS: Pandemic influenza threat and preparedness. Emerg Infect Dis. 2006, 12 (1): 73-7.
- World Health Organization: The World Health Report 2007: A Safer Future. Global Public Health Security in the 21st Century. Geneva, Switzerland. 2007.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.