Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Journal Article (DOI)
TL;DR: Experiments on benchmark datasets show that the proposed summarization approach significantly outperforms relevant state-of-the-art baselines and the Semantic Link Network plays an important role in representing and understanding documents.
Abstract: The key to realizing advanced document summarization is the semantic representation of documents. This paper investigates the role of the Semantic Link Network in representing and understanding documents for multi-document summarization. It proposes a novel abstractive multi-document summarization framework that first transforms documents into a Semantic Link Network of concepts and events and then transforms the Semantic Link Network into the summary of the documents, based on the selection of important concepts and events while keeping semantic coherence. Experiments on benchmark datasets show that the proposed summarization approach significantly outperforms relevant state-of-the-art baselines and that the Semantic Link Network plays an important role in representing and understanding documents.

19 citations
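To make the pipeline above more concrete, here is a minimal sketch of its graph-based selection step: documents become a network of linked units, node importance is computed over that network, and the most important content is kept. Everything below is an assumption for illustration only: plain word co-occurrence stands in for the paper's concept/event extraction and typed semantic links, PageRank stands in for its importance measure, and the output is extractive rather than abstractive.

```python
# Sketch only: co-occurrence graph + PageRank as a stand-in for the paper's
# Semantic Link Network construction and abstractive generation.
import itertools
import networkx as nx

def _terms(sentence):
    return {w.strip(".,!?").lower() for w in sentence.split() if len(w) > 3}

def build_link_graph(sentences):
    """Link terms that co-occur in a sentence (a crude proxy for semantic links)."""
    g = nx.Graph()
    for sent in sentences:
        for a, b in itertools.combinations(sorted(_terms(sent)), 2):
            prev = g.get_edge_data(a, b, {"weight": 0})["weight"]
            g.add_edge(a, b, weight=prev + 1)
    return g

def select_important(sentences, k=2):
    """Rank sentences by the PageRank mass of the terms they cover."""
    rank = nx.pagerank(build_link_graph(sentences), weight="weight")
    score = lambda s: sum(rank.get(t, 0.0) for t in _terms(s))
    return sorted(sentences, key=score, reverse=True)[:k]

docs = [
    "The flood damaged several bridges in the northern region.",
    "Rescue teams evacuated residents after the flood hit the region.",
    "Officials announced new funding to rebuild the damaged bridges.",
]
print(select_important(docs))
```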

Proceedings Article
01 May 2006
TL;DR: A novel method for automatic text summarization through text extraction, using computational semantics, to construct a summarizer that can be quickly assembled, with the use of only a very few basic language tools, for languages that lack large amounts of structured or annotated data or advanced tools for linguistic processing.
Abstract: In this paper we present a novel method for automatic text summarization through text extraction, using computational semantics. The new idea is to view all the extracted text as a whole and compute a score for the total impact of the summary, instead of ranking, for instance, individual sentences. A greedy search strategy is used to search through the space of possible summaries and select the summary with the highest score among those found. The aim has been to construct a summarizer that can be quickly assembled, with the use of only a very few basic language tools, for languages that lack large amounts of structured or annotated data or advanced tools for linguistic processing. The proposed method is largely language independent, though we only evaluate it on English in this paper, using ROUGE scores on texts from, among others, the DUC 2004 task 2. On this task our method performs better than several of the systems evaluated there, but worse than the best systems.

19 citations
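The core idea of this method, scoring the summary as a whole and searching greedily over candidate summaries, can be sketched in a few lines. The impact function below (weighted term coverage minus a redundancy penalty) and the word budget are placeholders chosen for illustration, not the paper's actual scoring based on computational semantics.

```python
# Sketch of whole-summary scoring with greedy search; the score function is an
# assumed stand-in for the paper's semantics-based impact measure.
from collections import Counter

def summary_score(summary_sents, term_weights):
    """Score the summary as a whole: weighted coverage of document terms,
    lightly penalized for redundant (repeated) words."""
    covered, total_words = set(), 0
    for s in summary_sents:
        words = [w.lower() for w in s.split()]
        covered.update(words)
        total_words += len(words)
    coverage = sum(term_weights.get(t, 0.0) for t in covered)
    redundancy = max(total_words - len(covered), 0)
    return coverage - 0.1 * redundancy

def greedy_summarize(sentences, term_weights, max_words=25):
    """Greedily add the sentence that most improves the whole-summary score."""
    summary = []
    while True:
        base = summary_score(summary, term_weights)
        best, best_gain = None, 0.0
        for s in sentences:
            if s in summary:
                continue
            if sum(len(x.split()) for x in summary + [s]) > max_words:
                continue
            gain = summary_score(summary + [s], term_weights) - base
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            return summary
        summary.append(best)

sentences = [
    "The storm closed the regional airport for two days.",
    "Two days of closure stranded thousands of passengers.",
    "Airlines offered refunds and hotel vouchers to stranded passengers.",
]
term_weights = Counter(w.lower() for s in sentences for w in s.split())
print(greedy_summarize(sentences, term_weights))
```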

Journal Article (DOI)
TL;DR: This work integrates automatically acquired information from external search engines with a semi-supervised probabilistic graphical model, and shows that this helps to achieve significantly better classification performance without the need for much training data.

19 citations

Posted Content
TL;DR: This work proposes to extract a set of candidate summaries for each document set based on an ILP framework and then leverage Ranking SVM for summary reranking; evaluation results on the benchmark DUC datasets validate the efficacy and robustness of the proposed approach.
Abstract: Existing multi-document summarization systems usually rely on a specific summarization model (i.e., a summarization method with a specific parameter setting) to extract summaries for different document sets with different topics. However, according to our quantitative analysis, none of the existing summarization models can always produce high-quality summaries for different document sets, and even a summarization model with good overall performance may produce low-quality summaries for some document sets. Conversely, a baseline summarization model may produce high-quality summaries for some document sets. Based on the above observations, we treat the summaries produced by different summarization models as candidate summaries and then explore discriminative reranking techniques to identify high-quality summaries from the candidates for different document sets. We propose to extract a set of candidate summaries for each document set based on an ILP framework, and then leverage Ranking SVM for summary reranking. Various useful features have been developed for the reranking process, including word-level features, sentence-level features and summary-level features. Evaluation results on the benchmark DUC datasets validate the efficacy and robustness of our proposed approach.

19 citations
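A rough sketch of the reranking step only, under several assumptions: the ILP-based candidate extraction is not shown (candidates are represented by made-up feature vectors and quality scores), and Ranking SVM is approximated with the standard pairwise-difference transformation trained with scikit-learn's LinearSVC rather than whatever SVM-rank implementation the authors actually used.

```python
# Pairwise Ranking-SVM-style reranking of candidate summaries (sketch).
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, groups):
    """Turn within-group ranking into binary classification over feature differences."""
    Xp, yp = [], []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        for i, j in combinations(idx, 2):
            if y[i] == y[j]:
                continue
            diff = X[i] - X[j]
            sign = 1 if y[i] > y[j] else -1
            Xp.append(diff * sign)    # oriented so the better candidate is positive
            yp.append(1)
            Xp.append(-diff * sign)
            yp.append(-1)
    return np.array(Xp), np.array(yp)

# Toy data: 2 document sets x 3 candidate summaries, 4 features each;
# y plays the role of ROUGE-like quality labels (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = np.array([0.30, 0.42, 0.25, 0.38, 0.31, 0.45])
groups = [0, 0, 0, 1, 1, 1]

Xp, yp = pairwise_transform(X, y, groups)
ranker = LinearSVC(C=1.0).fit(Xp, yp)
scores = X @ ranker.coef_.ravel()         # higher score = preferred candidate

for g in sorted(set(groups)):
    idx = [i for i, gi in enumerate(groups) if gi == g]
    best = max(idx, key=lambda i: scores[i])
    print(f"document set {g}: best candidate index {best} (score {scores[best]:.3f})")
```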

Proceedings Article (DOI)
27 Aug 2011
TL;DR: This work presents techniques for automatically summarizing the topical content of an audio corpus; an example summarization of conversational data from the Fisher corpus is presented and evaluated to demonstrate the effectiveness of the approach.
Abstract: This work presents techniques for automatically summarizing the topical content of an audio corpus. Probabilistic latent semantic analysis (PLSA) is used to learn a set of latent topics in an unsupervised fashion. These latent topics are ranked by their relative importance in the corpus and a summary of each topic is generated from signature words that aptly describe the content of that topic. This paper presents techniques for producing a high quality summarization. An example summarization of conversational data from the Fisher corpus that demonstrates the effectiveness of our approach is presented and evaluated. Index Terms: latent topic modeling, speech summarization

19 citations
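The recipe in the abstract (learn latent topics, rank them by corpus-wide importance, describe each with signature words) can be sketched as follows. scikit-learn ships no PLSA estimator, so LatentDirichletAllocation is deliberately substituted here, and the toy strings stand in for ASR transcripts of the audio corpus; none of this reproduces the paper's actual setup.

```python
# Topic summarization sketch: LDA substituted for PLSA, placeholder transcripts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

transcripts = [
    "the game went into overtime and the crowd cheered",
    "the team scored late in the game to win the title",
    "the recipe calls for butter flour and a pinch of salt",
    "bake the dough until golden and let it cool",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(transcripts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)            # per-document topic weights

importance = doc_topic.sum(axis=0)          # rank topics by corpus-wide mass
terms = vec.get_feature_names_out()
for t in importance.argsort()[::-1]:
    top = lda.components_[t].argsort()[::-1][:5]   # signature words for this topic
    print(f"topic {t} (weight {importance[t]:.2f}):",
          ", ".join(terms[i] for i in top))
```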


Network Information
Related Topics (5)
- Natural language: 31.1K papers, 806.8K citations, 85% related
- Ontology (information science): 57K papers, 869.1K citations, 84% related
- Web page: 50.3K papers, 975.1K citations, 83% related
- Recurrent neural network: 29.2K papers, 890K citations, 83% related
- Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52