
Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal Article (DOI)
TL;DR: A method for automatic summarization based on a Markov model of the source text is presented; using a simple greedy word selection strategy, it generates summaries with high ROUGE scores that human readers would nonetheless not consider good.
Abstract: We show some limitations of the ROUGE evaluation method for automatic summarization. We present a method for automatic summarization based on a Markov model of the source text. By a simple greedy word selection strategy, summaries with high ROUGE-scores are generated. These summaries would however not be considered good by human readers. The method can be adapted to trick different settings of the ROUGEeval package.

45 citations
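The greedy ROUGE-gaming strategy this paper describes can be illustrated with a minimal sketch (my own simplified version, not the authors' implementation; the toy source and reference texts are invented for the example):

```python
from collections import Counter

def rouge1_recall(summary_words, reference_words):
    """ROUGE-1 recall: fraction of reference unigrams covered (clipped counts)."""
    summary_counts = Counter(summary_words)
    reference_counts = Counter(reference_words)
    overlap = sum(min(summary_counts[w], c) for w, c in reference_counts.items())
    return overlap / sum(reference_counts.values())

def greedy_rouge_summary(source_words, reference_words, length):
    """At each step, add the source word that raises ROUGE-1 recall the most.
    The result scores well but is an unordered bag of words, not readable prose."""
    summary = []
    for _ in range(length):
        best = max(set(source_words),
                   key=lambda w: rouge1_recall(summary + [w], reference_words))
        summary.append(best)
    return summary

source = "the cat sat on the mat while the dog slept near the door".split()
reference = "a cat sat on a mat and a dog slept".split()
bag = greedy_rouge_summary(source, reference, 5)
# High ROUGE-1 recall, yet no human would accept this bag of words as a summary.
```

This is the core of the paper's point: an n-gram-overlap metric can be optimized directly without producing anything a reader would call a good summary.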

Proceedings Article
12 Feb 2017
TL;DR: This paper proposed a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization, and also utilizes the classification results to produce summaries of different styles.
Abstract: Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSum projects documents onto distributed representations which act as a bridge between text classification and summarization. It also utilizes the classification results to produce summaries of different styles. Extensive experiments on DUC generic multi-document summarization datasets show that, TCSum can achieve the state-of-the-art performance without using any hand-crafted features and has the capability to catch the variations of summary styles with respect to different text categories.

45 citations

Proceedings Article (DOI)
10 Aug 1998
TL;DR: The main features are a domain/style-free algorithm and personalization of summaries to reflect readers' interests and preferences; the proposed method is flexible enough to dynamically generate summaries of various sizes.
Abstract: The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectives are to promote development and spread of NLP/AI applications to render GDA-tagged documents versatile and intelligent contents, which should motivate WWW (World Wide Web) users to tag their documents as part of content authoring. This paper discusses automatic text summarization based on GDA. Its main features are a domain/style-free algorithm and personalization on summarization which reflects readers' interests and preferences. In order to calculate the importance score of a text element, the algorithm uses spreading activation on an intradocument network which connects text elements via thematic, rhetorical, and coreferential relations. The proposed method is flexible enough to dynamically generate summaries of various sizes. A summary browser supporting personalization is reported as well.

44 citations
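The spreading-activation ranking this abstract describes can be sketched as follows (an illustrative stand-in, not the GDA project's code; the edge weights, decay constant, and round count are assumptions):

```python
def spread_activation(edges, num_nodes, seeds, rounds=20, decay=0.8):
    """Propagate activation from seed nodes along weighted, directed edges.

    edges: dict mapping node -> list of (neighbor, weight) pairs, standing in
    for thematic/rhetorical/coreferential links between text elements."""
    activation = [1.0 if i in seeds else 0.0 for i in range(num_nodes)]
    for _ in range(rounds):
        incoming = [0.0] * num_nodes
        for node, neighbors in edges.items():
            for neighbor, weight in neighbors:
                incoming[neighbor] += decay * weight * activation[node]
        activation = [a + inc for a, inc in zip(activation, incoming)]
        total = sum(activation)
        activation = [a / total for a in activation]  # keep scores bounded
    return activation

def summarize(sentences, activation, size):
    """Take the `size` highest-activation sentences, in document order --
    varying `size` yields summaries of different lengths from one ranking."""
    top = sorted(range(len(sentences)), key=lambda i: -activation[i])[:size]
    return [sentences[i] for i in sorted(top)]
```

Because the ranking is computed once and the summary is just a top-k cut, the same activation scores support summaries of any requested size, which is the flexibility the abstract highlights.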

Proceedings Article
18 Jul 1999
TL;DR: An analysis of news-article summaries generated by sentence extraction, using a large corpus of extraction-based summaries to characterize the underlying degree of difficulty of summarization at different compression levels on articles in this corpus.
Abstract: Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents an analysis of news-article summaries generated by sentence extraction. Sentences are ranked for potential inclusion in the summary using a weighted combination of linguistic features - derived from an analysis of news-wire summaries. This paper evaluates the relative effectiveness of these features. In order to do so, we discuss the construction of a large corpus of extraction-based summaries, and characterize the underlying degree of difficulty of summarization at different compression levels on articles in this corpus. Results on our feature set are presented after normalization by this degree of difficulty.

44 citations
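The weighted-feature sentence ranking this paper evaluates can be sketched in miniature (the three features, their weights, and the example sentences are my own assumptions, not the paper's feature set):

```python
def sentence_features(sentence, index, total, cue_words):
    """Toy features: document position, capped length, and cue-word overlap."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    return [
        1.0 - index / max(total - 1, 1),        # earlier is better (news lead bias)
        min(len(words) / 20.0, 1.0),            # capped length feature
        sum(w in cue_words for w in words) / max(len(words), 1),  # cue overlap
    ]

def rank_sentences(sentences, weights, cue_words):
    """Score each sentence by a weighted linear combination of its features."""
    scored = []
    for i, sent in enumerate(sentences):
        feats = sentence_features(sent, i, len(sentences), cue_words)
        scored.append((sum(w * f for w, f in zip(weights, feats)), i, sent))
    return sorted(scored, reverse=True)

doc = [
    "The summit produced a climate agreement.",
    "Weather was pleasant.",
    "Delegates discussed the climate agreement terms at length.",
]
ranking = rank_sentences(doc, weights=[0.5, 0.2, 0.3],
                         cue_words={"climate", "agreement"})
```

A summary is then the top-ranked sentences up to the target compression level; the paper's contribution is measuring how much each such feature actually helps, normalized by the corpus's inherent difficulty.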

Proceedings Article
01 May 2002
TL;DR: This work describes the development of Language and Evaluation Resources for the evaluation of summaries in English and Chinese and focuses on the resources developed that are made available for the research community.
Abstract: We describe our work on the development of Language and Evaluation Resources for the evaluation of summaries in English and Chinese. The language resources include a parallel corpus of English and Chinese texts which are translations of each other, a set of queries in both languages, clusters of documents relevant to each query, sentence relevance measures for each sentence in the document clusters, and manual multi-document summaries at different compression rates. The evaluation resources consist of metrics for measuring the content of automatic summaries against reference summaries. The framework can be used in the evaluation of extractive, non-extractive, single and multi-document summarization. We focus on the resources developed that are made available for the research community.

44 citations


Network Information
Related Topics (5)
- Natural language: 31.1K papers, 806.8K citations (85% related)
- Ontology (information science): 57K papers, 869.1K citations (84% related)
- Web page: 50.3K papers, 975.1K citations (83% related)
- Recurrent neural network: 29.2K papers, 890K citations (83% related)
- Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
Number of papers in this topic in previous years:

Year    Papers
2023        74
2022       160
2021        52
2020        61
2019        47
2018        52