Quantitative Methods in Defense and National Security 2007

Discovery Facilitation Via Latent Semantic Indexing
Jeff Solka (Naval Surface Warfare Center), jeffrey.solka@navy.mil,
Nicholas Tucey (Naval Surface Warfare Center), nicholas.tucey@navy.mil, and
Avory Bryant (Naval Surface Warfare Center), avory.bryant@navy.mil


Previous work by Swanson and others (Swanson, 1986; Smalheiser and Swanson, 1998; Gordon and Lindsay, 1996) has developed methodologies for the semi-automated detection of new discoveries. These discoveries have consisted of new candidate approaches to standing problems. For example, Swanson applied his methodologies to the discovery of new techniques for the treatment of Raynaud's syndrome (Swanson, 1986) and migraines (Swanson, 1988), and even to the prevention of technological surprise (Swanson, Smalheiser, and Bookstein, 2001).

Swanson's original approach sought out transitive links between related literatures. Gordon and Lindsay (1996) sought to automate Swanson's procedures through the application of statistical down-selection procedures. Our work follows the lead of Gordon and Dumais (1998), who used latent semantic indexing (LSI) to facilitate the discovery process. Their work focused primarily on the identification of related terms.

The reader is reminded that LSI is based on a singular value decomposition of the term-document matrix, and that this decomposition provides projections that allow one to render the terms and documents within a common space. Gordon and Dumais used the projection of the terms (single- and double-word terms) to look for interesting associations between the terms originally associated with the problem of interest and those that might offer potential solutions. They noted in their paper that one could take a similar approach to look for associations among documents rather than terms.
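As a rough illustration of the projection described above, the following minimal sketch applies a truncated singular value decomposition to a small term-document matrix using NumPy. The vocabulary, the counts, and the choice of two retained dimensions are all hypothetical and chosen only for illustration; they are not drawn from our data or from Gordon and Dumais. In the toy matrix, "raynaud" and "fish_oil" never appear in the same document but both co-occur with "blood_viscosity".

```python
import numpy as np

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# These counts are illustrative assumptions, not data from the paper.
terms = ["raynaud", "blood_viscosity", "fish_oil", "membrane", "filtration"]
A = np.array([
    [1, 1, 0, 0],  # raynaud
    [1, 1, 1, 0],  # blood_viscosity
    [0, 0, 1, 0],  # fish_oil
    [0, 0, 0, 1],  # membrane
    [0, 0, 0, 1],  # filtration
], dtype=float)

# Singular value decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of latent dimensions retained (an arbitrary choice here)
term_coords = U[:, :k] * s[:k]    # one row per term in the k-dim space
doc_coords = Vt[:k, :].T * s[:k]  # one row per document in the same space


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


i_ray = terms.index("raynaud")
i_oil = terms.index("fish_oil")
i_mem = terms.index("membrane")

# Similarity in the original space (the raw rows of A).
raw_sim = cosine(A[i_ray], A[i_oil])

# Similarity after projection into the latent space.
lsi_sim = cosine(term_coords[i_ray], term_coords[i_oil])
lsi_unrelated = cosine(term_coords[i_ray], term_coords[i_mem])

# Document-to-document association in the same latent space.
doc_sim = cosine(doc_coords[0], doc_coords[2])

print("raw raynaud/fish_oil:", raw_sim)
print("LSI raynaud/fish_oil:", lsi_sim)
print("LSI raynaud/membrane:", lsi_unrelated)
print("LSI doc0/doc2:", doc_sim)
```

In this toy case the two terms have zero similarity in the raw matrix but become close neighbors in the reduced space because of their shared intermediate term, which is the kind of indirect association that the discovery process looks for. The same coordinates support document-to-document comparisons.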

This talk discusses some of our recent work to revisit and extend the methodology of Gordon and Dumais. Our work has focused on the development of visualization frameworks and GUI-based software systems that facilitate the identification of potential discoveries based on term-to-term and document-to-document associations. We will illustrate these approaches on a current problem of interest: the discovery of new methods of water purification.


Gordon, M. D., and Dumais, S., 1998. Using latent semantic indexing for literature-based discovery. Journal of the American Society for Information Science. 49(8): 674-685.

Gordon, M. D., and Lindsay, R. K., 1996. Toward discovery support systems: A replication, re-examination, and extension of Swanson's work on literature-based discovery of a connection between Raynaud's and fish oil. Journal of the American Society for Information Science, 47, 116-128.

Smalheiser, N. R., and Swanson, D. R., 1998. Using Arrowsmith: a computer-assisted approach to formulating and assessing scientific hypotheses. Computer Methods and Programs in Biomedicine, 57, 149-153.

Swanson, D. R., 1986. Fish Oil, Raynaud's syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1), 7-18.

Swanson, D. R., 1988. Migraine and magnesium: eleven neglected connections. Perspectives in Biology and Medicine, 31, 526-557.

Swanson, D. R., Smalheiser, N. R., and Bookstein, A., 2001. Information discovery from complementary literatures: categorizing viruses as potential weapons. Journal of the American Society for Information Science and Technology, 52(10), 797-812.
