Quantitative Methods in Defense and National Security 2007

Achieving Human Performance for Multi-Lingual Multi-Document Summarization
John M. Conroy (IDA Center for Computing Sciences), conroy@super.org,
Dianne P. O'Leary (Dept. of Computer Science, University of Maryland), oleary@cs.umd.edu, and
Judith D. Schlesinger (IDA Center for Computing Sciences), judith@super.org


Given a group of approximately 10 topically related documents in English and Arabic, compose a 100-word summary of that topic, capturing the important people, places, and details surrounding the topic event. This was the task of the 2005 and 2006 Multi-Lingual Summarization Evaluation. In this talk, I will describe a computational approach to this problem that performs at human levels as measured by both automatic and human evaluation.

The approach consists of three stages: a linguistic step that identifies and shortens the original sentences, a statistical step that identifies the sentences with the largest expected number of terms that would appear in a human abstract, and a linear algebraic step that selects a non-redundant subset of those sentences with good coverage of the important terms.
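
The three stages can be illustrated with a minimal sketch. It is not the system described in the talk: the abstract does not name the specific methods, so every function below is a hypothetical stand-in. Stage 1 is reduced to dropping parenthetical asides, stage 2 assumes a given table `term_prob` of probabilities that a term appears in a human abstract, and stage 3 uses a greedy, pivoted Gram-Schmidt style selection as one plausible linear algebraic way to avoid redundancy.

```python
import re
import numpy as np

def trim(sentence):
    # Stage 1 stand-in (assumption): drop parenthetical asides, collapse spaces.
    return " ".join(re.sub(r"\([^)]*\)", " ", sentence).split())

def terms(sentence):
    # Distinct lowercase word tokens, in sorted order for determinism.
    return sorted(set(re.findall(r"[a-z']+", sentence.lower())))

def expected_term_count(sentence, term_prob):
    # Stage 2: expected number of the sentence's terms found in a human
    # abstract, given per-term probabilities (term_prob is assumed input).
    return sum(term_prob.get(t, 0.0) for t in terms(sentence))

def select_nonredundant(sentences, term_prob, word_budget=100):
    # Stage 3 stand-in (assumption): greedy pivoted Gram-Schmidt on a
    # weighted term-by-sentence matrix. Each pick removes the chosen
    # sentence's direction from the rest, so repeated content scores low.
    if not sentences:
        return []
    vocab = sorted(term_prob)
    row = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for t in terms(s):
            if t in row:
                A[row[t], j] = term_prob[t]
    chosen, used = [], 0
    while True:
        norms = np.linalg.norm(A, axis=0)
        j = int(np.argmax(norms))
        if norms[j] < 1e-12:          # nothing informative is left
            break
        length = len(sentences[j].split())
        if used + length <= word_budget:
            chosen.append(j)
            used += length
            q = A[:, j] / norms[j]
            A -= np.outer(q, q @ A)   # project out the chosen direction
        A[:, j] = 0.0                 # never reconsider this sentence
    return chosen

def summarize(sentences, term_prob, word_budget=100, n_candidates=20):
    # Chain the stages: trim, keep the top-scoring candidates, then select.
    trimmed = [trim(s) for s in sentences]
    ranked = sorted(trimmed,
                    key=lambda s: expected_term_count(s, term_prob),
                    reverse=True)
    candidates = ranked[:n_candidates]
    picks = select_nonredundant(candidates, term_prob, word_budget)
    return [candidates[j] for j in picks]
```

Calling `summarize(sentences, term_prob)` with a list of sentence strings and a dictionary of term probabilities returns the selected sentences in the order they were picked. The projection step is what keeps two near-identical sentences from both being chosen: once one is selected, the other has almost nothing left to contribute.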

See http://research.microsoft.com/~lucyv/MSE2006.htm.
