Quantitative Methods in Defense and National Security 2007

Exploratory Data Analysis on Document Collections
Nicholas Tucey (Naval Surface Warfare Center Dahlgren Division), nicholas.tucey@navy.mil


Several text data mining software tools will be demonstrated. The software indexes a collection of documents using the Java Lucene search engine API and Latent Semantic Indexing (LSI). Both indexing techniques allow the user to explore the document collection through queries. The collection is visualized using spectral graph projections and hierarchical clustering of the documents. Finally, a social network visualization tool will be demonstrated using the collaborations among authors in the document collection.
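
To make the LSI querying step concrete, the following is a minimal sketch and not the demonstrated software. It assumes a raw term-count matrix, a truncated SVD computed with NumPy, and cosine similarity in the latent space; the toy documents and the function names (build_term_document_matrix, lsi_fit, lsi_query) are hypothetical and chosen only for illustration.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return a (terms x documents) raw-count matrix and a term-to-row index."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Keep the k largest singular triplets of A (rank-k latent space)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def lsi_query(query, index, U_k, s_k, Vt_k):
    """Fold a query into the latent space and score every document by cosine similarity."""
    q = np.zeros(U_k.shape[0])
    for w in query.lower().split():
        if w in index:          # terms outside the vocabulary are ignored
            q[index[w]] += 1.0
    q_hat = (q @ U_k) / s_k     # fold-in: q^T U_k S_k^{-1}
    doc_vecs = Vt_k.T           # one row per document in the latent space
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    norms[norms == 0] = 1.0     # avoid division by zero for empty vectors
    return (doc_vecs @ q_hat) / norms

# Hypothetical toy collection with two unrelated topics.
docs = [
    "ship radar sensor",
    "radar sensor tracking",
    "network graph cluster",
    "graph cluster author",
]
A, index = build_term_document_matrix(docs)
U_k, s_k, Vt_k = lsi_fit(A, k=2)
scores = lsi_query("radar", index, U_k, s_k, Vt_k)
ranking = np.argsort(-scores)   # document indices, most similar first
```

In this sketch a query is scored against every document in the reduced space, so documents can rank highly even when they share the query's topic without matching it word for word. A practical index would typically apply term weighting (for example TF-IDF) before the SVD; that choice is left out here for brevity.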

