Achieving Human Performance for Multi-Lingual Multi-Document Summarization
John M. Conroy, (IDA Center for Computing Sciences), firstname.lastname@example.org,
Dianne P. O'Leary, (Dept. Computer Science, University of Maryland), email@example.com, and
Judith D. Schlesinger, (IDA Center for Computing Sciences), firstname.lastname@example.org
Given a group of approximately 10 topically related documents in English and Arabic, compose a 100-word summary of that topic, capturing the important
people, places, and details surrounding the topic event. This was the task of the 2005 and 2006 Multi-Lingual
Summarization Evaluation. In this talk, I will describe a
computational approach to this problem
that performs at human levels, as measured by both automatic and human evaluation.
The approach consists of three stages: a linguistic step that identifies and shortens the original sentences, a statistical step that identifies the sentences with the largest expected number of terms that would appear in a human abstract, and a linear algebraic step that selects a non-redundant subset of those sentences with good coverage of the important terms.
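
The statistical and linear algebraic stages can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the per-term probabilities in `term_prob` (the chance that a term appears in a human abstract) are supplied by hand rather than estimated, the function names are hypothetical, and the redundancy removal is done with a greedy pivoted Gram-Schmidt pass, which is one plausible linear algebraic choice among several. The linguistic shortening stage is omitted, so sentences are assumed to arrive already tokenized into terms.

```python
import math


def expected_term_count(sentence_terms, term_prob):
    """Sum of P(term appears in a human abstract) over the sentence's distinct terms."""
    return sum(term_prob.get(t, 0.0) for t in set(sentence_terms))


def select_nonredundant(sentences, term_prob, k):
    """Pick up to k sentence indices by greedy pivoted Gram-Schmidt.

    Each sentence is a column of a term-by-sentence matrix whose entries
    are the (assumed) term probabilities. At every step the column with
    the largest remaining norm is chosen, and its direction is projected
    out of all columns so that repeated content no longer contributes.
    """
    vocab = sorted({t for s in sentences for t in s})
    term_sets = [set(s) for s in sentences]
    cols = [[term_prob.get(t, 0.0) if t in ts else 0.0 for t in vocab]
            for ts in term_sets]
    chosen = []
    for _ in range(min(k, len(sentences))):
        norms = [0.0 if j in chosen else math.sqrt(sum(x * x for x in c))
                 for j, c in enumerate(cols)]
        best = max(range(len(cols)), key=lambda j: norms[j])
        if norms[best] < 1e-12:
            break  # nothing but already-covered content remains
        chosen.append(best)
        q = [x / norms[best] for x in cols[best]]
        for j, c in enumerate(cols):
            dot = sum(a * b for a, b in zip(q, c))
            cols[j] = [a - dot * b for a, b in zip(c, q)]
    return chosen


# Toy input: sentence 1 repeats sentence 0, so it should not be picked twice.
sentences = [
    ["earthquake", "turkey", "killed"],
    ["earthquake", "turkey", "killed"],
    ["rescue", "teams", "arrived"],
    ["weather", "sunny"],
]
term_prob = {
    "earthquake": 0.9, "turkey": 0.8, "killed": 0.7,
    "rescue": 0.6, "teams": 0.3, "arrived": 0.2,
    "weather": 0.05, "sunny": 0.05,
}
print(select_nonredundant(sentences, term_prob, 2))  # [0, 2]
```

In this toy input the duplicate sentence has the same initial score as the first one, but once the first is selected its projection removes the duplicate's entire contribution, so the second pick goes to the sentence that covers new terms. A real system would also need to respect the 100-word budget, which this sketch does not model.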