Using Scan Statistics for Anomaly Detection in Genetic Regulatory Networks
Christopher C. Overall (George Mason University),
Jeffrey L. Solka (Dahlgren Division of the Naval Surface Warfare Center), Jeffrey.Solka@navy.mil,
Jennifer W. Weller (George Mason University), and
Carey E. Priebe (Johns Hopkins University)
Biological systems contain many levels of complex interactions between heterogeneous components, and the dynamics of these interactions are usually non-linear. Each functional layer forms an interacting network, and these layers also interact with one another, but the components are not completely connected. It has become increasingly popular to represent biological interactions as a network (graph) in which a node (vertex) represents a biological molecule or functional complex and an edge represents a relationship between two such molecules. This graph representation provides a powerful and intuitive framework because the full power of graph theory can be harnessed to analyze the global behavior of the system, but it comes at a price: the dynamics of the system are lost when the interactions are represented as a static graph. Biological researchers often want to determine whether and when the relationship between biological entities changes significantly over time, or in response to the environment, in the system under study. In other words, biologists are interested in when and where an anomaly occurs, and answering that question requires incorporating the dynamics of the system into the analysis.
The problem of detecting anomalies in biological networks is analogous to anomaly-based network intrusion detection in computer network security. In both domains, the volume of data calls for automated techniques that first learn normal network behavior and then use this prior history to determine when an anomalous change has occurred at one or more nodes in the network. Such techniques can flag anomalous behavior that a human might not have noticed, allowing the analyst or researcher to home in on the anomaly and determine whether it is significant. Although many anomaly-based intrusion detection techniques have been developed for computer networks, to our knowledge no comparable anomaly detection techniques exist for biological networks.
We have developed a technique for anomaly detection in genetic networks generated from time-series transcriptional profiling experiments, such as those measured on microarray platforms or RT-PCR devices, and have successfully applied it to a time-series Drosophila microarray dataset. The technique is a hybrid solution for the study of biological interaction networks: it incorporates some of the dynamics of the system while retaining the simplifying network representation as the analysis framework. First, a genetic regulatory network is constructed for the genes under investigation. Then univariate and/or multivariate model-based clustering is applied to the time-series gene expression dataset and the genetic regulatory network to create a time sequence of graphs. Finally, the series of graphs is analyzed for anomalies in gene activity over time using the graph-based scan statistics of Priebe et al. (2006).
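To make the final scan-statistic step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the time sequence of graphs has already been built (the clustering step is not reproduced here) and that each graph is an undirected adjacency map from gene to a set of neighboring genes. For each vertex it computes a locality statistic (the number of edges in the subgraph induced by the vertex's k-hop neighborhood), standardizes it against that vertex's own values over the previous `tau` time steps, and takes the maximum over vertices as the scan statistic at each time step. The function names, the parameters `k` and `tau`, the floor of 1 on the standard deviation, and the toy genes `a`-`e` are all illustrative assumptions; the further temporal normalization of the scan statistic itself is omitted.

```python
from collections import deque
from statistics import mean, stdev


def graph_from_edges(vertices, edges):
    """Build an undirected adjacency map (vertex -> set of neighbors)."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def k_neighborhood(adj, v, k):
    """Closed k-hop neighborhood of v (v plus every vertex within k hops), via BFS."""
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == k:
            continue
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return seen


def locality_statistic(adj, v, k):
    """Number of edges in the subgraph induced by the k-hop neighborhood of v."""
    nbhd = k_neighborhood(adj, v, k)
    # Each undirected edge is seen once from each endpoint, so halve the count.
    return sum(1 for u in nbhd for w in adj.get(u, ()) if w in nbhd) // 2


def scan_statistics(graphs, k=1, tau=3):
    """For each t >= tau, return (t, max standardized locality statistic, argmax vertex)."""
    vertices = sorted(set().union(*(g.keys() for g in graphs)))
    psi = [{v: locality_statistic(g, v, k) for v in vertices} for g in graphs]
    results = []
    for t in range(tau, len(graphs)):
        best_v, best = None, float("-inf")
        for v in vertices:
            # Vertex-dependent normalization against the previous tau time steps.
            history = [psi[s][v] for s in range(t - tau, t)]
            mu = mean(history)
            sd = stdev(history) if tau > 1 else 0.0
            z = (psi[t][v] - mu) / max(sd, 1.0)  # floor avoids dividing by ~0
            if z > best:
                best_v, best = v, z
        results.append((t, best, best_v))
    return results


# Toy example: three quiet time steps, then gene "a" suddenly links to c, d, e.
genes = ["a", "b", "c", "d", "e"]
quiet = [("a", "b"), ("c", "d")]
burst = quiet + [("a", "c"), ("a", "d"), ("a", "e")]
graphs = [graph_from_edges(genes, quiet) for _ in range(3)]
graphs.append(graph_from_edges(genes, burst))

print(scan_statistics(graphs, k=1, tau=3))  # [(3, 4.0, 'a')]
```

In this toy run, gene `a` has one edge in its 1-hop neighborhood during the quiet steps and five at the last step, so it dominates the scan statistic. A larger `k` widens each vertex's neighborhood, and a larger `tau` gives a more stable baseline at the cost of reacting more slowly to gradual change.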
Although the characterization of static biological interaction networks and the interplay between them is far from complete, even for model organisms, we feel that it is nevertheless important to take the next logical step and begin to explore techniques for automated anomaly detection in these networks. We hope that our anomaly detection methodology will spur interest in, and appreciation for, this type of analysis in the biological sciences.