Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.

Papers

Proceedings Article

ROUGE-C: A fully automated evaluation method for multi-document summarization

Tingting He¹, Jinguang Chen¹, Liang Ma¹, Zhuoming Gui¹, Fang Li¹, Wei Shao¹, Qian Wang¹

Central China Normal University¹

31 Oct 2008

TL;DR: ROUGE-C applies the ROUGE method alternatively by replacing the reference summaries with source document as well as query-focused information (if any), and therefore it enables a fully manual-independent way of evaluating multi-document summarization.

Abstract: This paper presents how to use ROUGE to evaluate summaries without human reference summaries. ROUGE is a widely used evaluation tool for multi-document summarization and has great advantages in the areas of summarization evaluation. However, manual reference summaries written beforehand by assessors are indispensable for a ROUGE test. There was still no research on ROUGE's abilities of evaluating summaries without manual reference summaries. By considering summary as consensus speaker for the original input information, we discovered and developed ROUGE-C. ROUGE-C applies the ROUGE method alternatively by replacing the reference summaries with source document as well as query-focused information (if any), and therefore it enables a fully manual-independent way of evaluating multi-document summarization. Experiments conducted on the 2001 to 2005 DUC data showed that, with restraint of appropriate condition and some acceptable decreased efficiency, ROUGE-C correlated well with methods that depend on reference summaries, including human judgments.
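The core idea of the abstract above, scoring a summary against its own source documents rather than against human references, can be illustrated with a minimal sketch. It assumes a simple clipped n-gram overlap; the function names, the tokenizer, and the choice to report both a precision-style and a recall-style ratio are illustrative assumptions, not the paper's exact formulation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_scores(summary, source_docs, n=1):
    """Compare a summary with its source documents (no human references).

    Returns (precision, recall) of the clipped n-gram overlap.
    """
    summ = ngrams(tokenize(summary), n)
    src = Counter()
    for doc in source_docs:
        src.update(ngrams(tokenize(doc), n))
    # Clip each summary n-gram count by its count in the sources.
    matched = sum(min(count, src[gram]) for gram, count in summ.items())
    precision = matched / sum(summ.values()) if summ else 0.0
    recall = matched / sum(src.values()) if src else 0.0
    return precision, recall
```

For example, `overlap_scores("the cat sat", ["the cat sat on the mat", "a dog barked"])` compares three summary unigrams with the nine unigrams of the two source documents.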

24 citations

Patent

Method and system for summarizing a document

Benyu Zhang¹, Dou Shen¹, Hua-Jun Zeng¹, Wei-Ying Ma¹, Zheng Chen¹

Microsoft¹

10 Aug 2005

TL;DR: A method and system for calculating the significance of a sentence within a document is provided; it identifies significant sentences based on the important words they contain and selects them as a summary of the document.

Abstract: A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the "important" words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
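The pipeline described in the abstract above (score words, classify them as important or not, score sentences by the important words they contain, keep the top sentences) can be sketched as follows. The two word scores used here (relative term frequency and the share of sentences containing the word), the equal weights, the threshold, and the small stopword list are all assumptions for illustration; the patent abstract does not name its scoring techniques.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on"}

def split_sentences(document):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def important_words(sentences, threshold=0.5):
    """Combine two word scores and classify each word as important or not."""
    tf = Counter(w for s in sentences for w in words(s))       # term frequency
    sf = Counter(w for s in sentences for w in set(words(s)))  # sentence frequency
    max_tf = max(tf.values(), default=1)
    combined = {w: 0.5 * tf[w] / max_tf + 0.5 * sf[w] / len(sentences) for w in tf}
    return {w for w, score in combined.items() if score >= threshold}

def summarize(document, k=1):
    """Return the k most significant sentences in their original order."""
    sentences = split_sentences(document)
    if not sentences:
        return []
    important = important_words(sentences)
    # Significance of a sentence = number of important words it contains.
    significance = [sum(w in important for w in words(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: -significance[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in document order is a design choice of this sketch so that the summary reads naturally.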

24 citations

Proceedings Article

On the Effectiveness of using Sentence Compression Models for Query-Focused Multi-Document Summarization

Yllias Chali¹, Sadid A. Hasan¹

University of Lethbridge¹

01 Dec 2012

TL;DR: Empirical evaluation on the DUC benchmark datasets demonstrates that the overall summary quality can be improved significantly using global optimization with semantically motivated models.

Abstract: This paper applies sentence compression models for the task of query-focused multi-document summarization in order to investigate if sentence compression improves the overall summarization performance. Both compression and summarization are considered as global optimization problems and solved using integer linear programming (ILP). Three different models are built depending on the order in which compression and summarization are performed: 1) ComFirst (where compression is performed first), 2) SumFirst (where important sentence extraction is performed first), and 3) Combined (where compression and extraction are performed jointly via optimizing a combined objective function). Sentence compression models include lexical, syntactic and semantic constraints while summarization models include relevance, redundancy and length constraints. A comprehensive set of query-related and importance-oriented measures are used to define the relevance constraint whereas four alternative redundancy constraints are employed based on different sentence similarity measures using a) cosine similarity, b) syntactic similarity, c) semantic similarity, and d) extended string subsequence kernel (ESSK). Empirical evaluation on the DUC benchmark datasets demonstrates that the overall summary quality can be improved significantly using global optimization with semantically motivated models.

24 citations

Proceedings Article

MSBGA: A Multi-Document Summarization System Based on Genetic Algorithm

Yanxiang He¹, Dexi Liu¹, Donghong Ji², Hua Yang¹, Chong Teng¹

Wuhan University¹, Institute for Infocomm Research Singapore²

01 Aug 2006

TL;DR: To improve the accuracy of term frequency, MSBGA employs a novel method TFS, which takes word sense into account while calculating term frequency and the experiments show that the strategy is effective and the ROUGE-1 score is only 0.55% lower than the best participant in DUC04.

Abstract: The multi-document summarizer using genetic algorithm-based sentence extraction (MSBGA) regards the summarization process as an optimization problem where the optimal summary is chosen among a set of summaries formed by the conjunction of the original articles' sentences. To solve the NP-hard optimization problem, MSBGA adopts a genetic algorithm, which can choose the optimal summary from a global perspective. The evaluation function employs four features according to the criteria of a good summary: satisfied length, high coverage, high informativeness and low redundancy. To improve the accuracy of term frequency, MSBGA employs a novel method TFS, which takes word sense into account while calculating term frequency. The experiments on DUC04 data show that our strategy is effective and the ROUGE-1 score is only 0.55% lower than the best participant in DUC04.
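A minimal sketch of summarization as genetic search over sentence subsets, in the spirit of the abstract above. Each candidate is a bit mask over the sentences. The fitness function here (a length limit, vocabulary coverage, and a redundancy penalty), the truncation selection, the one-point crossover, and all parameter values are assumptions for illustration; the sketch omits the paper's informativeness feature and its TFS word-sense weighting.

```python
import random

def fitness(mask, sentences, max_words):
    """Score one candidate summary; mask[i] == 1 means sentence i is included."""
    chosen = [s.lower().split() for s, bit in zip(sentences, mask) if bit]
    total = sum(len(ws) for ws in chosen)
    if total == 0 or total > max_words:
        return -1.0  # length criterion: empty or over-long candidates rank last
    distinct = len({w for ws in chosen for w in ws})
    vocabulary = {w for s in sentences for w in s.lower().split()}
    coverage = distinct / len(vocabulary)  # share of the input vocabulary covered
    redundancy = 1 - distinct / total      # share of repeated words in the summary
    return coverage - 0.5 * redundancy

def evolve(sentences, max_words, pop_size=20, generations=30, seed=0):
    """Search for a good sentence subset; expects at least one sentence."""
    rng = random.Random(seed)
    n = len(sentences)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, sentences, max_words)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = a[:cut] + b[cut:]   # one-point crossover
            if rng.random() < 0.2:      # occasional single-bit mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return [s for s, bit in zip(sentences, best) if bit]
```

A fixed `seed` makes runs repeatable, which helps when comparing fitness definitions.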

24 citations

Proceedings Article

Random forest classifier based multi-document summarization system

Ansamma John¹, M. Wilscy¹

University of Kerala¹

01 Dec 2013

TL;DR: Experimental results show the quality of the summary generated by the proposed random forest classifier based multi-document summarization system is good in terms of relevance and novelty.

Abstract: In recent times, the requirement for generation of multi-document summaries has gained a lot of attention among researchers due to the information explosion in web media. Mostly, text summarization uses the sentence extraction technique, where the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a random forest classifier based multi-document summarization system that classifies each sentence in the multiple documents as belonging or not belonging to the summary. For this, each sentence in the documents is represented by a set of feature scores. The classifier is trained using the feature scores and summary information of each sentence in the document set. Feature scores of the sentences of the multiple documents to be summarized are given as the test input for the classifier. From the sentences that the classifier assigns to the summary class, a summary of the required size is generated using Maximal Marginal Relevance. The experiments are conducted using the DUC 2002 dataset and its corresponding summary. Experimental results show the quality of the summary generated by this method is good in terms of relevance and novelty.
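The final selection step named in the abstract above, Maximal Marginal Relevance (MMR), can be sketched as a greedy loop that trades relevance against similarity to sentences already selected. In this sketch the relevance scores are simply passed in (for example, classifier confidence for the summary class, which is an assumption), similarity is bag-of-words cosine, and the function names and the default `lam` are illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def mmr_select(sentences, relevance, k, lam=0.7):
    """Greedily pick up to k sentences, maximizing
    lam * relevance - (1 - lam) * (max similarity to already selected)."""
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(
                (cosine(sentences[i], sentences[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam=1.0` the loop reduces to ranking by relevance alone; lower values penalize near-duplicate sentences more strongly.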

24 citations

Performance Metrics

Papers: 2,507
Citations: 81,726

No. of papers in the topic in previous years
Year	Papers
2023	74
2022	160
2021	52
2020	61
2019	47
2018	52
