Characterizing bioterrorist attacks from a short time series of diagnosed patient data - A Bayesian approach
Jaideep Ray, (Sandia National Laboratories, Livermore, CA), firstname.lastname@example.org,
Youssef M. Marzouk, (Sandia National Laboratories, Livermore, CA), email@example.com,
Mark Krauss, (NORAD-NORTHCOM, Colorado Springs, CO), Mark.Kraus@northcom.mil, and
Petri Fast, (Lawrence Livermore National Laboratory), firstname.lastname@example.org
We present a Bayesian approach for inferring the number of infected people, the time of infection and the dosage received from an atmospheric release of an aerosolized pathogen during a bioattack. The input to the inference process is the number of new symptomatic patients observed over a short (2-4 day) period during the early epoch of the outbreak.
The release of a pathogen during a bioattack may not always be caught on environmental sensors - it may be too small, may consist of a low-quality formulation (coarse and heavy) which quickly precipitates, or may occur in an uninstrumented location. In such a case, the first intimation of an attack will be the first confirmed diagnosis of a patient. Being able to infer the size of the problem from scarce data has important ramifications for the logistics of mounting a response. Further, since the estimates will be based on incomplete or incorrect observations, quantifying the uncertainty in those estimates, or establishing confidence intervals, becomes a concern. These estimates, once drawn, can be used in epidemic models to predict the evolution of the disease in the near future under various levels of medical intervention. Current response plans do not contain any provisions for incorporating the uncertainty in the characterization of the outbreak at hand; they err on the side of caution by being broad and rapid. Sustainability, especially under multiple outbreaks, has not been considered an issue.
In this paper, we outline the development of the inference model and apply it to a number of simulated attacks (using smallpox and anthrax) as well as to the Sverdlovsk anthrax outbreak of 1979. A Bayesian approach is used to develop estimates of N, the number of people infected, t, the time of infection, and D, the dosage received, as probability density functions, thus capturing the uncertainty in the inference. A dose-dependent incubation period model is used for anthrax. Simple tests, involving people infected by an identical dosage, progress to more realistic ones where infected people receive a spectrum of dosages. This distribution is obtained by "imprinting" a spatially distributed population with a dosage distribution obtained from an atmospheric dispersion model. We also explore the effect of model errors, i.e., cases where there is a systematic difference between the model used for simulating the outbreak and that used for inference. This is done by using Wilkening's Models A2 and D, the two models that show the closest fit to results from anthrax challenge experiments on non-human primates. Preliminary investigations show that 3-5 days of data are often sufficient to arrive within a factor of two of the "correct" answer; if data is collected over 6-hour intervals rather than on a daily basis, the inferences are significantly sharper. Thus one may not require *more* data, over longer observation periods, to arrive at an accurate estimate; a better capturing of its structure, for instance through nimble reporting protocols, may be of greater assistance. Further, for diseases with long incubation periods, e.g. smallpox, the 3-5 day observation period usually corresponds to < 1% of the total infected exhibiting symptoms; however, this is usually sufficient to infer the outbreak characteristics to within tight tolerances.
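The inference of N and t from a short series of new-case counts can be illustrated with a minimal sketch. It is not the authors' implementation; it rests on several assumptions made here for illustration: a fixed lognormal incubation period with hypothetical parameters (the dose D is held fixed, so it does not appear), a multinomial likelihood for the counts per reporting interval, uniform priors, and a brute-force grid in place of any sampling scheme. All function and variable names are hypothetical.

```python
import math

# Hypothetical incubation-period parameters (days); illustrative only,
# not fitted to any pathogen.
MEDIAN_DAYS = 10.0
LOG_SIGMA = 0.4


def incubation_cdf(t, median=MEDIAN_DAYS, sigma=LOG_SIGMA):
    """Probability of showing symptoms within t days of infection (lognormal)."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - math.log(median)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(n_infected, t_release, report_times, counts):
    """Multinomial log-likelihood of the new-case counts.

    counts[i] is the number of new symptomatic patients reported up to
    report_times[i] since the previous report; the first bin is assumed
    to start at the release itself.
    """
    observed = sum(counts)
    if n_infected < observed:
        return -math.inf
    cdf = [0.0] + [incubation_cdf(t - t_release) for t in report_times]
    ll = math.lgamma(n_infected + 1) - math.lgamma(n_infected - observed + 1)
    for n, lo, hi in zip(counts, cdf[:-1], cdf[1:]):
        p = hi - lo
        if n > 0:
            if p <= 0.0:
                return -math.inf
            ll += n * math.log(p)
        ll -= math.lgamma(n + 1)
    still_incubating = n_infected - observed
    if still_incubating > 0:
        p_rest = 1.0 - cdf[-1]
        if p_rest <= 0.0:
            return -math.inf
        ll += still_incubating * math.log(p_rest)
    return ll


def grid_posterior(report_times, counts, n_grid, t_grid):
    """Joint posterior over (N, release time) on a grid, with uniform priors."""
    logp = {(n, t): log_likelihood(n, t, report_times, counts)
            for n in n_grid for t in t_grid}
    peak = max(logp.values())
    weights = {key: math.exp(value - peak) for key, value in logp.items()}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def marginal(posterior, axis):
    """Sum the joint posterior over the other variable (axis 0: N, axis 1: time)."""
    out = {}
    for key, p in posterior.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


# Noise-free synthetic outbreak: 1000 infected, release 5 days before clock zero.
true_n, true_t = 1000, -5.0
report_times = [1.0, 2.0, 3.0, 4.0, 5.0]  # daily reports on the observer's clock
true_cdf = [0.0] + [incubation_cdf(t - true_t) for t in report_times]
counts = [round(true_n * (hi - lo)) for lo, hi in zip(true_cdf[:-1], true_cdf[1:])]

n_grid = list(range(100, 3001, 50))
t_grid = [-10.0 + 0.25 * k for k in range(41)]  # candidate release times (days)
posterior = grid_posterior(report_times, counts, n_grid, t_grid)
n_map, t_map = max(posterior, key=posterior.get)
print("MAP estimate of (N, t):", n_map, t_map)
```

The marginals returned by `marginal(posterior, 0)` and `marginal(posterior, 1)` play the role of the probability density functions for N and t. A dose-dependent incubation model would make the incubation parameters functions of D and add a third dimension to the grid.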
We finally apply this inference process to the Sverdlovsk anthrax outbreak of 1979, which, it is suspected, was caused by an accidental release of anthrax spores from a military facility. 70 people died and 80 were infected. The estimated date of release is April 2nd, 1979, with the first symptoms being observed on April 4th. The symptomatic patients' time series was reconstructed from grave markers and interviews, since much of the data had been "scrubbed". Further, it is believed that the dosages were very low - estimates range from 10 to 300 spores. In addition, the progression of the outbreak had been severely modified (it lasted 42 days) by public health measures. The small size of the outbreak, the low dosages, the antibiotic-modified progression and the reconstructed data pose a stiff challenge to any inference process. Our automated method correctly identified the time of release with barely 4 days of observed data, though it took about 9 days of observations to arrive at the correct estimate for the size. Dosages were difficult to infer, though it was clear that they were below 100 spores.
The motivation for developing this inference technique was to be able to characterize an outbreak with as little data as possible. Since this data contains noise, erroneous characterizations in the early epoch of the observation period are a constant threat. These take the form of support for hypotheses which are significantly different from the true characterization of the outbreak. We show examples of such failures as well as empirical evidence that the procedure corrects itself as more data becomes available (7-8 days). While this is a measure of the robustness of the procedure, an observation period of this length is simply too long to be of any relevance for response planning. However, we conjecture that this particular shortcoming could be largely eliminated if prior distributions for some or all of the variables were available, since all our tests are performed with broad uniform priors. These priors are best obtained from syndromic surveillance data.
We have developed a prototypical approach for estimating the characteristics of an outbreak resulting from inhalational infection. The results discussed above encourage us to believe that such an inference process could profitably complement medical surveillance networks, by using their raw data to draw inferences regarding the size of the outbreak, including infected people still in incubation. It could also serve as a "fusion" mechanism for syndromic surveillance and medical reporting by exploiting priors drawn from syndromic surveillance to increase the efficiency of the inference process.
Meselson et al., Science, 266:1202-1208, 1994.
Ray et al., Sandia National Laboratories Technical Report SAND2006-1491. Unclassified, unlimited release.
D. Wilkening, PNAS, 103(20):7589-7594, 2006.