Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2270 publications have been published within this topic, receiving 71850 citations.


Papers
Proceedings Article
23 Aug 2010
TL;DR: It is shown that four well-known summarization tasks including generic, query-focused, update, and comparative summarization can be modeled as different variations derived from the proposed framework.
Abstract: Multi-document summarization has been an important problem in information retrieval. It aims to distill the most important information from a set of documents to generate a compressed summary. Given a sentence graph generated from a set of documents where vertices represent sentences and edges indicate that the corresponding vertices are similar, the extracted summary can be described using the idea of graph domination. In this paper, we propose a new principled and versatile framework for multi-document summarization using the minimum dominating set. We show that four well-known summarization tasks including generic, query-focused, update, and comparative summarization can be modeled as different variations derived from the proposed framework. Approximation algorithms for performing summarization are also proposed and empirical experiments are conducted to demonstrate the effectiveness of our proposed framework.

150 citations
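The minimum-dominating-set framing in the paper above lends itself to a simple greedy approximation: repeatedly pick the sentence that covers the most not-yet-dominated vertices of the sentence graph. The sketch below only illustrates that idea for generic summarization and is not the authors' implementation; the TF-IDF sentence vectors and the 0.3 cosine-similarity threshold used to draw edges are assumptions chosen for brevity.

```python
# Greedy approximation of a minimum dominating set on a sentence similarity graph.
# Illustrative only: TF-IDF vectors and the 0.3 edge threshold are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def greedy_dominating_set_summary(sentences, threshold=0.3):
    # Build the sentence graph: vertices are sentences, edges join similar pairs.
    vectors = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(vectors)
    n = len(sentences)
    covers = [{j for j in range(n) if j != i and sim[i, j] >= threshold} | {i}
              for i in range(n)]

    uncovered = set(range(n))
    summary_ids = []
    # Greedily add the sentence that dominates the most uncovered vertices.
    while uncovered:
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        summary_ids.append(best)
        uncovered -= covers[best]
    return [sentences[i] for i in sorted(summary_ids)]
```

The paper's query-focused, update, and comparative variants are described as modifications of this same graph-domination framework; the sketch covers only the generic case.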

Journal Article
TL;DR: A new evaluation measure for assessing the quality of a summary is proposed; it can compare a summary directly with its full text, and if abstracts are not available for a given corpus, the LSA-based measure is an appropriate choice.
Abstract: We explain the ideas of automatic text summarization approaches and the taxonomy of summary evaluation methods. Moreover, we propose a new evaluation measure for assessing the quality of a summary. The core of the measure is covered by Latent Semantic Analysis (LSA) which can capture the main topics of a document. The summarization systems are ranked according to the similarity of the main topics of their summaries and their reference documents. Results show a high correlation between human rankings and the LSA-based evaluation measure. The measure is designed to compare a summary with its full text. It can compare a summary with a human written abstract as well; however, in this case using a standard ROUGE measure gives more precise results. Nevertheless, if abstracts are not available for a given corpus, using the LSA-based measure is an appropriate choice.

149 citations
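The core idea of the LSA-based measure above, comparing the main topics of a summary against those of its full text, can be sketched briefly. This is not the paper's exact formulation: the shared vocabulary, the use of TruncatedSVD over raw term counts, and the cosine over the leading topic direction are simplifying assumptions.

```python
# LSA-style topic comparison between a summary and its source text (illustrative).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD


def lsa_topic_similarity(full_text_sentences, summary_sentences):
    vectorizer = CountVectorizer()
    vectorizer.fit(full_text_sentences + summary_sentences)  # shared term space

    def main_topic(sentences):
        X = vectorizer.transform(sentences)                   # sentence-by-term counts
        svd = TruncatedSVD(n_components=1, random_state=0)
        svd.fit(X)
        return svd.components_[0]                             # dominant topic direction

    t_full = main_topic(full_text_sentences)
    t_summ = main_topic(summary_sentences)
    # Singular vectors are defined only up to sign, so compare with an absolute cosine.
    cos = np.dot(t_full, t_summ) / (np.linalg.norm(t_full) * np.linalg.norm(t_summ) + 1e-12)
    return abs(float(cos))
```

A higher score means the summary preserves more of the document's dominant topic; ranking systems by such a score is the spirit of the evaluation described above.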

Journal ArticleDOI
TL;DR: SumUM is a text summarization system that takes a raw technical text as input and produces an indicative informative summary that motivates the topics, describes entities, and defines concepts.
Abstract: We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step for exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.

149 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: The authors employed a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features for salience estimation.
Abstract: We propose a neural multi-document summarization system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences that avoid redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon other traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.

148 citations
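The layer-wise propagation that drives salience estimation in this model can be sketched in plain NumPy. This is only an outline of the standard GCN propagation rule, not the paper's trained system: the weights are random stand-ins for learned parameters, and the input matrix stands in for the RNN sentence embeddings.

```python
# Two-layer GCN propagation over a sentence relation graph (illustrative, untrained).
import numpy as np


def gcn_salience(adjacency, sentence_embeddings, hidden_dim=64, n_layers=2, seed=0):
    rng = np.random.default_rng(seed)
    A = adjacency + np.eye(adjacency.shape[0])                # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]     # D^-1/2 (A + I) D^-1/2

    H = sentence_embeddings                                    # stand-in for RNN features
    for _ in range(n_layers):
        W = rng.standard_normal((H.shape[1], hidden_dim)) * 0.1   # would be learned
        H = np.maximum(A_hat @ H @ W, 0.0)                     # ReLU(A_hat H W)

    w_out = rng.standard_normal((hidden_dim, 1)) * 0.1         # would be learned
    return (H @ w_out).ravel()                                  # one salience score per sentence
```

In the paper, salient sentences are then selected greedily while avoiding redundancy; here the scores are meaningless until the weights are trained.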

Journal ArticleDOI
TL;DR: Analysis of feedback forms filled in after each decision indicated that the intelligibility of present-day machine-generated summaries is high, and the evaluation methods used in the SUMMAC evaluation are of interest both to summarization evaluation and to the evaluation of other ‘output-related’ NLP technologies, where there may be many potentially acceptable outputs.
Abstract: The TIPSTER Text Summarization Evaluation (SUMMAC) has developed several new extrinsic and intrinsic methods for evaluating summaries. It has established definitively that automatic text summarization is very effective in relevance assessment tasks on news articles. Summaries as short as 17% of full text length sped up decision-making by almost a factor of 2 with no statistically significant degradation in accuracy. Analysis of feedback forms filled in after each decision indicated that the intelligibility of present-day machine-generated summaries is high. Systems that performed most accurately in the production of indicative and informative topic-related summaries used term frequency and co-occurrence statistics, and vocabulary overlap comparisons between text passages. However, in the absence of a topic, these statistical methods do not appear to provide any additional leverage: in the case of generic summaries, the systems were indistinguishable in accuracy. The paper discusses some of the tradeoffs and challenges faced by the evaluation, and also lists some of the lessons learned, impacts, and possible future directions. The evaluation methods used in the SUMMAC evaluation are of interest both to summarization evaluation and to the evaluation of other ‘output-related’ NLP technologies, where there may be many potentially acceptable outputs, with no automatic way to compare them.

145 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52