SciSpace (formerly Typeset)
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Proceedings Article (DOI)
31 Oct 2008
TL;DR: ROUGE-C applies the ROUGE method in an alternative way, replacing the reference summaries with the source documents as well as query-focused information (if any), and therefore enables a fully manual-independent way of evaluating multi-document summarization.
Abstract: This paper presents how to use ROUGE to evaluate summaries without human reference summaries. ROUGE is a widely used evaluation tool for multi-document summarization and has great advantages in summarization evaluation. However, manual reference summaries written beforehand by assessors are indispensable for a ROUGE test, and there has been no research on ROUGE's ability to evaluate summaries without them. By treating a summary as a consensus speaker for the original input information, we developed ROUGE-C. ROUGE-C applies the ROUGE method in an alternative way, replacing the reference summaries with the source documents as well as query-focused information (if any), and therefore enables a fully manual-independent way of evaluating multi-document summarization. Experiments conducted on the DUC 2001 to 2005 data showed that, under appropriate conditions and with an acceptable decrease in efficiency, ROUGE-C correlated well with methods that depend on reference summaries, including human judgments.

24 citations
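To make the ROUGE-C idea concrete, here is a minimal Python sketch, assuming plain ROUGE-N recall with a simple lowercase regex tokenizer. The function names (`ngrams`, `rouge_n_recall`, `rouge_c`) are illustrative and not part of the ROUGE toolkit, and the exact ROUGE variant and conditions used in the paper are not specified here. The only change from standard ROUGE is that the source documents, plus the query when one exists, stand in for the human reference summaries.

```python
from __future__ import annotations

import re
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    """Count lowercased word n-grams (simple regex tokenizer, an assumption)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list[str], n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram matches divided by total reference n-grams."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # Each reference n-gram is matched at most as often as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


def rouge_c(summary: str, source_docs: list[str], query: str | None = None, n: int = 1) -> float:
    """ROUGE-C idea: source documents (and the query, if any) replace human references."""
    pseudo_refs = list(source_docs) + ([query] if query else [])
    return rouge_n_recall(summary, pseudo_refs, n)
```

For example, `rouge_c("the cat sat", ["the cat sat on the mat", "the dog sat on the log"])` matches 5 of the 12 source unigrams.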

Patent
Benyu Zhang, Dou Shen, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen
10 Aug 2005
TL;DR: A method and system are provided for calculating the significance of a sentence within a document; significant sentences are identified by the important words they contain and selected as a summary of the document.
Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the "important" words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.

24 citations
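The patent abstract describes scoring words with several techniques, combining the scores, classifying each word as important or not, and then ranking sentences by the important words they contain. The sketch below assumes two hypothetical scoring techniques (relative term frequency and the share of sentences containing the word), averages them, and applies an assumed threshold of 0.5; the patent's actual scoring techniques, combination rule and stopword handling are not stated in the abstract.

```python
from __future__ import annotations

import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the patent).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for", "it"}


def split_sentences(doc: str) -> list[str]:
    """Naive sentence splitter on terminal punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def important_words(sentences: list[str], threshold: float = 0.5) -> set[str]:
    """Combine two hypothetical word scores and classify words as important."""
    tokenized = [tokenize(s) for s in sentences]
    tf = Counter(w for toks in tokenized for w in toks)
    spread = Counter(w for toks in tokenized for w in set(toks))
    max_tf = max(tf.values(), default=1)
    important = set()
    for w in tf:
        tf_score = tf[w] / max_tf                  # technique 1: relative frequency
        spread_score = spread[w] / len(sentences)  # technique 2: sentence coverage
        if (tf_score + spread_score) / 2 >= threshold:  # combine, then classify
            important.add(w)
    return important


def summarize(doc: str, k: int = 2, threshold: float = 0.5) -> list[str]:
    """Pick the k sentences containing the most important words, in document order."""
    sentences = split_sentences(doc)
    important = important_words(sentences, threshold)
    scores = [sum(1 for w in tokenize(s) if w in important) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

Ties in sentence significance are broken by document position, and the selected sentences are returned in their original order.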

Proceedings Article
01 Dec 2012
TL;DR: Empirical evaluation on the DUC benchmark datasets demonstrates that the overall summary quality can be improved significantly using global optimization with semantically motivated models.
Abstract: This paper applies sentence compression models for the task of query-focused multi-document summarization in order to investigate if sentence compression improves the overall summarization performance. Both compression and summarization are considered as global optimization problems and solved using integer linear programming (ILP). Three different models are built depending on the order in which compression and summarization are performed: 1) ComFirst (where compression is performed first), 2) SumFirst (where important sentence extraction is performed first), and 3) Combined (where compression and extraction are performed jointly via optimizing a combined objective function). Sentence compression models include lexical, syntactic and semantic constraints while summarization models include relevance, redundancy and length constraints. A comprehensive set of query-related and importance-oriented measures are used to define the relevance constraint whereas four alternative redundancy constraints are employed based on different sentence similarity measures using a) cosine similarity, b) syntactic similarity, c) semantic similarity, and d) extended string subsequence kernel (ESSK). Empirical evaluation on the DUC benchmark datasets demonstrates that the overall summary quality can be improved significantly using global optimization with semantically motivated models.

24 citations
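The summarization side of such an ILP typically maximizes total relevance minus pairwise redundancy subject to a length limit. The sketch below solves a tiny instance of that kind of objective by exhaustive search, standing in for an ILP solver; it assumes precomputed relevance scores and a pairwise similarity matrix, omits the compression variables entirely, and uses a hypothetical `redundancy_weight` parameter. It is not the paper's exact formulation. Practical systems linearize the pairwise terms and hand the problem to an ILP solver, since enumeration grows as 2^n.

```python
from itertools import combinations


def best_summary(relevance, lengths, similarity, max_len, redundancy_weight=1.0):
    """Exhaustively maximize sum(rel_i) - w * sum_{i<j} sim_ij over subsets within max_len.

    A brute-force stand-in for an ILP solver; only practical for a handful of sentences.
    """
    n = len(relevance)
    best_score, best_set = 0.0, ()
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if sum(lengths[i] for i in subset) > max_len:
                continue  # length constraint violated
            score = sum(relevance[i] for i in subset)
            # Redundancy penalty over every selected pair (i < j).
            score -= redundancy_weight * sum(similarity[i][j] for i, j in combinations(subset, 2))
            if score > best_score:
                best_score, best_set = score, subset
    return best_set, best_score
```

When the two most relevant sentences overlap heavily, the global optimum can pair the top sentence with a less relevant but non-redundant one, a combination that picking sentences greedily by relevance alone would miss.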

Proceedings Article (DOI)
01 Aug 2006
TL;DR: To improve the accuracy of term frequency, MSBGA employs a novel method, TFS, which takes word sense into account when calculating term frequency; experiments show that the strategy is effective, with a ROUGE-1 score only 0.55% lower than that of the best participant in DUC04.
Abstract: The multi-document summarizer using genetic algorithm-based sentence extraction (MSBGA) treats summarization as an optimization problem in which the optimal summary is chosen from the set of summaries formed by combining the original articles' sentences. To solve this NP-hard optimization problem, MSBGA adopts a genetic algorithm, which can search for the optimal summary globally. The evaluation function employs four features derived from the criteria of a good summary: satisfactory length, high coverage, high informativeness and low redundancy. To improve the accuracy of term frequency, MSBGA employs a novel method, TFS, which takes word sense into account when calculating term frequency. Experiments on DUC04 data show that our strategy is effective and that its ROUGE-1 score is only 0.55% lower than that of the best participant in DUC04.

24 citations
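A minimal genetic-algorithm sketch in the spirit of MSBGA: each chromosome is a bit vector marking which sentences are in the summary, and sentences are represented as sets of terms. The fitness function is an assumption that covers only three of the four listed criteria (length as a hard budget, term coverage, and a redundancy penalty with an arbitrary weight of 0.5); informativeness and TFS term weighting are omitted. Tournament selection, one-point crossover, bit-flip mutation and elitism are generic GA choices, not details taken from the paper.

```python
import random


def fitness(chrom, sent_terms, lengths, max_len, all_terms):
    """Assumed fitness: term coverage minus a weighted redundancy penalty."""
    chosen = [i for i, bit in enumerate(chrom) if bit]
    if not chosen or not all_terms or sum(lengths[i] for i in chosen) > max_len:
        return 0.0  # infeasible: empty, no terms, or over the length budget
    covered = set().union(*(sent_terms[i] for i in chosen))
    coverage = len(covered) / len(all_terms)
    total = sum(len(sent_terms[i]) for i in chosen)
    redundancy = 1 - len(covered) / total if total else 0.0  # share of repeated terms
    return coverage - 0.5 * redundancy


def genetic_summary(sent_terms, lengths, max_len, pop_size=30, generations=60,
                    mutation_rate=0.05, seed=0):
    """Evolve sentence-selection bit vectors; returns (selected indices, fitness)."""
    rng = random.Random(seed)
    n = len(sent_terms)
    all_terms = set().union(*sent_terms)

    def score(chrom):
        return fitness(chrom, sent_terms, lengths, max_len, all_terms)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = population[:2]  # elitism: carry the two best forward unchanged
        while len(next_gen) < pop_size:
            p1 = max(rng.sample(population, 3), key=score)  # tournament selection
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - b if rng.random() < mutation_rate else b for b in child]  # mutation
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return [i for i, b in enumerate(best) if b], score(best)
```

Because of elitism, the best fitness never decreases from one generation to the next; the fixed `seed` makes runs reproducible.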

Proceedings Article (DOI)
01 Dec 2013
TL;DR: Experimental results show the quality of the summary generated by the proposed random forest classifier based multi-document summarization system is good in terms of relevance and novelty.
Abstract: In recent times, the generation of multi-document summaries has gained a lot of attention among researchers due to the information explosion in web media. Most text summarization techniques use sentence extraction, in which the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a random forest classifier based multi-document summarization system that classifies each sentence in the multiple documents as either belonging or not belonging to the summary. For this, each sentence in the documents is represented by a set of feature scores. The classifier is trained using the feature scores and summary information of each sentence in the document set. The feature scores of the sentences of the multiple documents to be summarized are then given to the classifier as test input. From the sentences the classifier assigns to the summary class, a summary of the required size is generated using Maximal Marginal Relevance. The experiments are conducted using the DUC 2002 dataset and its corresponding summaries. Experimental results show that the summaries generated by this method are of good quality in terms of relevance and novelty.

24 citations
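The final Maximal Marginal Relevance step can be sketched as follows, assuming the random forest has already labeled the candidate sentences as belonging to the summary class (the classifier itself is not shown). Relevance is taken as cosine similarity to the bag-of-words centroid of the candidates, and redundancy as the maximum similarity to already selected sentences; both choices, the word budget `max_words`, and the trade-off parameter `lam` are assumptions rather than details from the paper.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector with a simple lowercase regex tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(candidates: list[str], max_words: int, lam: float = 0.5) -> list[str]:
    """Greedy MMR: lam * relevance - (1 - lam) * max similarity to the selected set."""
    vecs = [bow(s) for s in candidates]
    centroid = sum(vecs, Counter())  # assumed relevance target: centroid of all candidates
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    words = 0
    while remaining:
        def mmr(i: int) -> float:
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cosine(vecs[i], centroid) - (1 - lam) * redundancy

        best = max(remaining, key=mmr)
        remaining.remove(best)
        n_words = len(candidates[best].split())
        if words + n_words > max_words:
            continue  # skip sentences that would exceed the required summary size
        selected.append(best)
        words += n_words
    return [candidates[i] for i in selected]
```

Lowering `lam` weights novelty more heavily; with `lam=1.0` the procedure reduces to ranking by relevance alone, so near-duplicate sentences can both be selected.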


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
Number of papers in the topic in previous years:
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52