Topic
Multi-document summarization
About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.
Papers
31 Oct 2008
TL;DR: ROUGE-C applies the ROUGE method alternatively by replacing the reference summaries with source document as well as query-focused information (if any), and therefore it enables a fully manual-independent way of evaluating multi-document summarization.
Abstract: This paper shows how to use ROUGE to evaluate summaries without human reference summaries. ROUGE is a widely used evaluation tool for multi-document summarization and has great advantages in the areas of summarization evaluation. However, manual reference summaries written beforehand by assessors are indispensable for a ROUGE test, and there had been no research on ROUGE's ability to evaluate summaries without them. By treating a summary as a consensus speaker for the original input information, we discovered and developed ROUGE-C. ROUGE-C applies the ROUGE method alternatively by replacing the reference summaries with the source documents as well as query-focused information (if any), and therefore it enables a fully manual-independent way of evaluating multi-document summarization. Experiments conducted on the 2001 to 2005 DUC data showed that, under appropriate conditions and with an acceptable decrease in efficiency, ROUGE-C correlated well with methods that depend on reference summaries, including human judgments.
24 citations
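The ROUGE-C idea above, scoring a summary against the source documents (plus the query, if any) instead of against human-written references, can be sketched with plain ROUGE-N n-gram overlap. This is a minimal illustration, not the authors' implementation: the regex tokenizer and the `rouge_n`/`rouge_c` helper names are assumptions, and the abstract does not say whether recall, precision, or F1 is the score that correlates with human judgments, so all three are returned.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    """Lowercased word n-gram counts (the simple regex tokenizer is an assumption)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """ROUGE-N recall, precision, and F1 between a candidate and a single reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches (min of counts)
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


def rouge_c(summary: str, source_docs: list, query: str = "", n: int = 1) -> dict:
    """Hypothetical ROUGE-C sketch: the source text (and query, if any) stands in
    for the human reference summaries."""
    pseudo_reference = " ".join(source_docs + ([query] if query else []))
    return rouge_n(summary, pseudo_reference, n)
```

For example, `rouge_c("the cat sat", ["the cat sat on the mat", "a dog barked"])` gives precision 1.0 and recall 3/9: every summary unigram occurs in the source, but the source contains nine unigrams.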
10 Aug 2005
TL;DR: A method and system for calculating the significance of a sentence within a document is provided; it identifies the significant sentences of a document based on the important words each sentence contains and selects them as a summary of the document.
Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the "important" words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
24 citations
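The important-word approach described above can be sketched as follows. This is a hypothetical illustration: the abstract does not name its word-scoring techniques, so the two scores used here (normalized term frequency and the number of sentences containing a word), their equal-weight average, the 0.5 threshold, the stopword list, and all helper names are assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on", "is", "are", "was", "for", "it"}


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def important_words(sentences, threshold=0.5):
    """Average two illustrative word scores; words at or above threshold count as important."""
    tf = Counter(w for s in sentences for w in tokenize(s))
    if not tf:
        return set()
    spread = Counter(w for s in sentences for w in set(tokenize(s)))  # sentences containing w
    max_tf, max_spread = max(tf.values()), max(spread.values())
    combined = {w: 0.5 * tf[w] / max_tf + 0.5 * spread[w] / max_spread for w in tf}
    return {w for w, score in combined.items() if score >= threshold}


def summarize(document, k=2, threshold=0.5):
    """Return the k most significant sentences, kept in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    important = important_words(sentences, threshold)
    # Significance = number of important-word tokens the sentence contains.
    significance = [sum(1 for w in tokenize(s) if w in important) for s in sentences]
    # Rank by significance (ties broken by position), then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: (-significance[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

On "Solar power is growing. Solar panels convert sunlight. Cats sleep a lot. Solar power reduces emissions.", only "solar" and "power" clear the threshold, so the two sentences containing both are returned.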
01 Dec 2012
TL;DR: Empirical evaluation on the DUC benchmark datasets demonstrates that the overall summary quality can be improved significantly using global optimization with semantically motivated models.
Abstract: This paper applies sentence compression models for the task of query-focused multi-document summarization in order to investigate if sentence compression improves the overall summarization performance. Both compression and summarization are considered as global optimization problems and solved using integer linear programming (ILP). Three different models are built depending on the order in which compression and summarization are performed: 1) ComFirst (where compression is performed first), 2) SumFirst (where important sentence extraction is performed first), and 3) Combined (where compression and extraction are performed jointly via optimizing a combined objective function). Sentence compression models include lexical, syntactic and semantic constraints while summarization models include relevance, redundancy and length constraints. A comprehensive set of query-related and importance-oriented measures are used to define the relevance constraint whereas four alternative redundancy constraints are employed based on different sentence similarity measures using a) cosine similarity, b) syntactic similarity, c) semantic similarity, and d) extended string subsequence kernel (ESSK). Empirical evaluation on the DUC benchmark datasets demonstrates that the overall summary quality can be improved significantly using global optimization with semantically motivated models.
24 citations
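The extraction side of the optimization above (relevance maximized under length and cosine-similarity redundancy constraints) can be illustrated at toy scale. The paper solves these problems with integer linear programming; this sketch swaps in plain exhaustive search, which finds the same optimum for the stated objective but is only feasible for a handful of sentences, and it omits sentence compression entirely. The query-cosine relevance score, the `max_sim` cap, the word budget, and the function names are assumptions, not the paper's formulation.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select(sentences, query, budget, max_sim=0.5):
    """Exhaustively maximize total query relevance subject to a word-length budget and a
    pairwise cosine-similarity (redundancy) cap. Runtime is exponential in len(sentences)."""
    bags = [bag(s) for s in sentences]
    q = bag(query)
    relevance = [cosine(b, q) for b in bags]
    lengths = [sum(b.values()) for b in bags]
    best, best_score = (), 0.0
    for r in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), r):
            if sum(lengths[i] for i in subset) > budget:  # length constraint
                continue
            if any(cosine(bags[i], bags[j]) > max_sim for i, j in combinations(subset, 2)):
                continue  # redundancy constraint
            score = sum(relevance[i] for i in subset)
            if score > best_score:
                best, best_score = subset, score
    return [sentences[i] for i in best]
```

In an ILP formulation, each sentence instead becomes a binary variable, and a pairwise cap like this one becomes the linear constraint x_i + x_j ≤ 1 for every pair above the threshold, so a solver can avoid enumerating every subset.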
01 Aug 2006
TL;DR: To improve the accuracy of term frequency, MSBGA employs a novel method, TFS, which takes word sense into account while calculating term frequency; the experiments show that the strategy is effective and that its ROUGE-1 score is only 0.55% lower than that of the best participant in DUC04.
Abstract: The multi-document summarizer using genetic algorithm-based sentence extraction (MSBGA) regards the summarization process as an optimization problem in which the optimal summary is chosen from a set of summaries formed by combining sentences of the original articles. To solve this NP-hard optimization problem, MSBGA adopts a genetic algorithm, which can choose the optimal summary globally. The evaluation function employs four features according to the criteria of a good summary: satisfied length, high coverage, high informativeness and low redundancy. To improve the accuracy of term frequency, MSBGA employs a novel method, TFS, which takes word sense into account while calculating term frequency. The experiments on DUC04 data show that our strategy is effective and that its ROUGE-1 score is only 0.55% lower than that of the best participant in DUC04.
24 citations
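The genetic-algorithm search described above can be sketched as below. This is not MSBGA itself: the fitness function is an assumption that folds coverage and informativeness into one term-frequency-weighted coverage score, treats the length criterion as a hard word budget, penalizes redundancy as the share of repeated tokens, and omits the word-sense-aware TFS term frequency. Population size, mutation rate, tournament selection, and one-point crossover are likewise illustrative choices, and `ga_summarize` is a hypothetical name.

```python
import random
import re
from collections import Counter


def tokens(s):
    return re.findall(r"[a-z]+", s.lower())


def ga_summarize(sentences, budget, pop_size=30, generations=50, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    toks = [tokens(s) for s in sentences]
    tf = Counter(w for t in toks for w in t)
    total_tf = sum(tf.values())
    n = len(sentences)

    def fitness(mask):
        words = [w for i in range(n) if mask[i] for w in toks[i]]
        if len(words) > budget:
            return -1.0  # violates the length criterion
        if not words:
            return 0.0
        covered = set(words)
        coverage = sum(tf[w] for w in covered) / total_tf  # TF-weighted coverage
        redundancy = 1 - len(covered) / len(words)  # share of repeated tokens
        return coverage - 0.5 * redundancy

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # A chromosome is a 0/1 mask over sentences; the empty mask guarantees a feasible individual.
    pop = [[0] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the best individual
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return [sentences[i] for i in range(n) if best[i]]
```

Seeding the population with the empty selection and carrying the best individual forward each generation ensures the returned summary always fits the budget, because any over-length selection scores below the empty one.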
01 Dec 2013
TL;DR: Experimental results show the quality of the summary generated by the proposed random forest classifier based multi-document summarization system is good in terms of relevance and novelty.
Abstract: In recent times, the generation of multi-document summaries has gained a lot of attention among researchers due to the information explosion in web media. Most text summarization techniques use sentence extraction, in which the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a random forest classifier based multi-document summarization system that classifies each sentence in the multiple documents as either belonging or not belonging to the summary. For this, each sentence in the documents is represented by a set of feature scores. The classifier is trained using the feature scores and summary information of each sentence in the document set. The feature scores of the sentences of the multiple documents to be summarized are then given to the classifier as test data. From the sentences the classifier assigns to the summary class, a summary of the required size is generated using Maximal Marginal Relevance. The experiments are conducted using the DUC 2002 dataset and its corresponding summaries. Experimental results show the quality of the summary generated by this method is good in terms of relevance and novelty.
24 citations
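The final Maximal Marginal Relevance step can be sketched as a greedy loop over the sentences the classifier placed in the summary class. The random forest itself is not shown; its positive predictions are assumed to arrive as the `candidates` list. Bag-of-words cosine similarity, relevance measured against the centroid of all sentences, a word budget as the "required size", the trade-off weight `lam`, and the name `mmr_summary` are assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def bag(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_summary(candidates, all_sentences, max_words, lam=0.7):
    """Greedy MMR over classifier-selected sentences, up to max_words words."""
    centroid = sum((bag(s) for s in all_sentences), Counter())
    bags = {s: bag(s) for s in candidates}
    selected, words = [], 0
    remaining = list(candidates)

    def mmr_score(s):
        # Relevance to the document set minus similarity to the closest already-picked sentence.
        redundancy = max((cosine(bags[s], bags[t]) for t in selected), default=0.0)
        return lam * cosine(bags[s], centroid) - (1 - lam) * redundancy

    while remaining:
        best = max(remaining, key=mmr_score)
        remaining.remove(best)
        length = sum(bags[best].values())
        if words + length <= max_words:  # enforce the requested summary size
            selected.append(best)
            words += length
    return selected
```

With `lam=0.5`, a candidate that nearly duplicates the first pick loses to a less relevant but novel sentence; as `lam` approaches 1 the loop reduces to plain relevance ranking.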