Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
01 Jan 2000
TL;DR: A text mining tool that performs two tasks, namely document clustering and text summarization, based on computing the value of a TF-ISF measure for each word, which is an adaptation of the conventional TF-IDF measure of information retrieval.
Abstract: This paper describes a text mining tool that performs two tasks, namely document clustering and text summarization. These tasks have, of course, their corresponding counterpart in “conventional” data mining. However, the textual, unstructured nature of documents makes these two text mining tasks considerably more difficult than their data mining counterparts. In our system document clustering is performed by using the Autoclass data mining algorithm. Our text summarization algorithm is based on computing the value of a TF-ISF (term frequency – inverse sentence frequency) measure for each word, which is an adaptation of the conventional TF-IDF (term frequency – inverse document frequency) measure of information retrieval. Sentences with high values of TF-ISF are selected to produce a summary of the source text. The system has been evaluated on real-world documents, and the results are satisfactory.
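
The TF-ISF idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sentence splitter, the tokenizer, the logarithmic form of the inverse sentence frequency, and the choice to average word scores per sentence are all assumptions, and the function names (`split_sentences`, `tokenize`, `tf_isf_scores`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., ! and ? (assumption, for illustration only).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens (assumption; no stemming or stop words).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_isf_scores(sentences):
    """Return one score per sentence: the mean TF-ISF of its distinct words.

    TF(w, s)  = count of word w in sentence s
    ISF(w)    = log(N / SF(w)), with N sentences and SF(w) the number
                of sentences containing w (the sentence-level analogue of IDF)
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    sentence_freq = Counter()
    for words in tokenized:
        sentence_freq.update(set(words))  # count each word once per sentence

    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        total = sum(tf[w] * math.log(n / sentence_freq[w]) for w in tf)
        scores.append(total / len(tf))  # averaging is an assumption
    return scores


def summarize(text, k=1):
    # Keep the k highest-scoring sentences, in their original order.
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


summary = summarize("The cat sat. The dog sat. The zebra gallops wildly.", k=1)
```

A word that occurs in every sentence gets an ISF of zero, so it contributes nothing to any sentence score, which mirrors how TF-IDF discounts terms that occur in every document.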

171 citations

Proceedings Article
04 Sep 2005
TL;DR: It is shown that a summarization system that uses a combination of lexical, prosodic, structural and discourse features produces the most accurate summaries, and that a combination of acoustic/prosodic and structural features is enough to build a ‘good’ summarizer when speech transcription is not available.
Abstract: We present results of an empirical study of the usefulness of different types of features in selecting extractive summaries of news broadcasts for our Broadcast News Summarization System. We evaluate lexical, prosodic, structural and discourse features as predictors of those news segments which should be included in a summary. We show that a summarization system that uses a combination of these feature sets produces the most accurate summaries, and that a combination of acoustic/prosodic and structural features is enough to build a ‘good’ summarizer when speech transcription is not available.

169 citations

Journal Article
TL;DR: This review examines work on automated summarization of electronic health record (EHR) data, in particular individual patient record summarization, with a focus on methods for detecting and removing redundancy, describing temporality, determining salience, accounting for missing data, and taking advantage of encoded clinical knowledge.

167 citations

01 Jan 2001
TL;DR: This paper describes four experiments in text summarization, including one on the identification by human subjects of cross-document structural relationships such as identity, paraphrase, elaboration, and fulfillment, and presents numerical evaluations of all four experiments.
Abstract: In this paper, we describe four experiments in text summarization. The first experiment involves the automatic creation of 120 multi-document summaries and 308 single-document summaries from a set of 30 clusters of related documents. We present official results from a multi-site manual evaluation of the quality of the summaries. The second experiment is about the identification by human subjects of cross-document structural relationships such as identity, paraphrase, elaboration, and fulfillment. The third experiment focuses on a particular cross-document structural relationship, namely subsumption. The last experiment asks human judges to determine which of the input articles in a given cluster were used to produce individual sentences of a manual summary. We present numerical evaluations of all four experiments. All automatic summaries have been produced by MEAD, a flexible summarization system under development at the University of Michigan.

167 citations

Proceedings Article
Ani Nenkova
09 Jul 2005
TL;DR: An overview of the achieved results in the different types of summarization tasks, comparing both the broader classes of baselines, systems and humans, as well as individual pairs of summarizers (both human and automatic).
Abstract: Since 2001, the Document Understanding Conferences have been the forum for researchers in automatic text summarization to compare methods and results on common test sets. Over the years, several types of summarization tasks have been addressed--single document summarization, multi-document summarization, summarization focused by question, and headline generation. This paper is an overview of the achieved results in the different types of summarization tasks. We compare both the broader classes of baselines, systems and humans, as well as individual pairs of summarizers (both human and automatic). An analysis of variance model is fitted, with summarizer and input set as independent variables, and the coverage score as the dependent variable, and simulation-based multiple comparisons were performed. The results document the progress in the field as a whole, rather than focusing on a single system, and thus can serve as a future reference on the work done up to date, as well as a starting point in the formulation of future tasks. Results also indicate that most progress in the field has been achieved in generic multi-document summarization and that the most challenging task is that of producing a focused summary in answer to a question/topic.
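
The analysis of variance mentioned in the abstract above, with summarizer and input set as factors and coverage score as the response, can be illustrated with a small sketch. It assumes a balanced design with exactly one score per summarizer and input set pair, which the paper does not state; the scores are made-up toy values, the function name `two_way_anova` is hypothetical, and the simulation-based multiple comparisons are not reproduced here.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication.

    table[i][j] is the coverage score of summarizer i on input set j.
    Returns the F statistics for both factors and the degrees of freedom.
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    # Partition the total sum of squares into factor and residual parts.
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_err = df_rows * df_cols
    ms_err = ss_err / df_err  # undefined if the data are perfectly additive
    return {
        "F_summarizer": (ss_rows / df_rows) / ms_err,
        "F_input_set": (ss_cols / df_cols) / ms_err,
        "df": (df_rows, df_cols, df_err),
    }


# Made-up toy data: 3 summarizers (rows) scored on 3 input sets (columns).
toy_scores = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [4.0, 4.0, 6.0],
]
result = two_way_anova(toy_scores)
```

A large F statistic for the summarizer factor suggests that summarizers differ in mean coverage after accounting for how hard each input set is; deciding which pairs differ requires a separate multiple-comparison step.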

167 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52