
Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal Article (DOI)
TL;DR: A method for automatic summarization based on a Markov model of the source text is presented; using a simple greedy word selection strategy, it generates summaries with high ROUGE scores that human readers would nonetheless not consider good.
Abstract: We show some limitations of the ROUGE evaluation method for automatic summarization. We present a method for automatic summarization based on a Markov model of the source text. By a simple greedy word selection strategy, summaries with high ROUGE-scores are generated. These summaries would however not be considered good by human readers. The method can be adapted to trick different settings of the ROUGEeval package.

45 citations
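The greedy ROUGE-gaming strategy this paper describes can be illustrated with a minimal sketch (my own simplified version, not the authors' implementation; the toy source and reference texts are invented for the example):

```python
from collections import Counter

def rouge1_recall(summary_words, reference_words):
    """ROUGE-1 recall: fraction of reference unigrams covered (clipped counts)."""
    summary_counts = Counter(summary_words)
    reference_counts = Counter(reference_words)
    overlap = sum(min(summary_counts[w], c) for w, c in reference_counts.items())
    return overlap / sum(reference_counts.values())

def greedy_rouge_summary(source_words, reference_words, length):
    """At each step, add the source word that raises ROUGE-1 recall the most.
    The result scores well but is an unordered bag of words, not readable prose."""
    summary = []
    for _ in range(length):
        best = max(set(source_words),
                   key=lambda w: rouge1_recall(summary + [w], reference_words))
        summary.append(best)
    return summary

source = "the cat sat on the mat while the dog slept near the door".split()
reference = "a cat sat on a mat and a dog slept".split()
bag = greedy_rouge_summary(source, reference, 5)
# High ROUGE-1 recall, yet no human would accept this bag of words as a summary.
```

This is the core of the paper's point: an n-gram-overlap metric can be optimized directly without producing anything a reader would call a good summary.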

Proceedings Article
12 Feb 2017
TL;DR: This paper proposed a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization, and also utilizes the classification results to produce summaries of different styles.
Abstract: Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSum projects documents onto distributed representations which act as a bridge between text classification and summarization. It also utilizes the classification results to produce summaries of different styles. Extensive experiments on DUC generic multi-document summarization datasets show that, TCSum can achieve the state-of-the-art performance without using any hand-crafted features and has the capability to catch the variations of summary styles with respect to different text categories.

45 citations

Proceedings Article (DOI)
10 Aug 1998
TL;DR: The main features are a domain/style-free algorithm and personalization of summaries to reflect readers' interests and preferences; the proposed method is flexible enough to dynamically generate summaries of various sizes.
Abstract: The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectives are to promote development and spread of NLP/AI applications to render GDA-tagged documents versatile and intelligent contents, which should motivate WWW (World Wide Web) users to tag their documents as part of content authoring. This paper discusses automatic text summarization based on GDA. Its main features are a domain/style-free algorithm and personalization on summarization which reflects readers' interests and preferences. In order to calculate the importance score of a text element, the algorithm uses spreading activation on an intradocument network which connects text elements via thematic, rhetorical, and coreferential relations. The proposed method is flexible enough to dynamically generate summaries of various sizes. A summary browser supporting personalization is reported as well.

44 citations
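The spreading-activation ranking this abstract describes can be sketched as follows (an illustrative stand-in, not the GDA project's code; the edge weights, decay constant, and round count are assumptions):

```python
def spread_activation(edges, num_nodes, seeds, rounds=20, decay=0.8):
    """Propagate activation from seed nodes along weighted, directed edges.

    edges: dict mapping node -> list of (neighbor, weight) pairs, standing in
    for thematic/rhetorical/coreferential links between text elements."""
    activation = [1.0 if i in seeds else 0.0 for i in range(num_nodes)]
    for _ in range(rounds):
        incoming = [0.0] * num_nodes
        for node, neighbors in edges.items():
            for neighbor, weight in neighbors:
                incoming[neighbor] += decay * weight * activation[node]
        activation = [a + inc for a, inc in zip(activation, incoming)]
        total = sum(activation)
        activation = [a / total for a in activation]  # keep scores bounded
    return activation

def summarize(sentences, activation, size):
    """Take the `size` highest-activation sentences, in document order --
    varying `size` yields summaries of different lengths from one ranking."""
    top = sorted(range(len(sentences)), key=lambda i: -activation[i])[:size]
    return [sentences[i] for i in sorted(top)]
```

Because the ranking is computed once and the summary is just a top-k cut, the same activation scores support summaries of any requested size, which is the flexibility the abstract highlights.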

Proceedings Article
18 Jul 1999
TL;DR: An analysis of news-article summaries generated by sentence extraction, using a large corpus of extraction-based summaries to characterize the underlying degree of difficulty of summarization at different compression levels on articles in this corpus.
Abstract: Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents an analysis of news-article summaries generated by sentence extraction. Sentences are ranked for potential inclusion in the summary using a weighted combination of linguistic features - derived from an analysis of news-wire summaries. This paper evaluates the relative effectiveness of these features. In order to do so, we discuss the construction of a large corpus of extraction-based summaries, and characterize the underlying degree of difficulty of summarization at different compression levels on articles in this corpus. Results on our feature set are presented after normalization by this degree of difficulty.

44 citations
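The weighted-feature sentence ranking this paper evaluates can be sketched in miniature (the three features, their weights, and the example sentences are my own assumptions, not the paper's feature set):

```python
def sentence_features(sentence, index, total, cue_words):
    """Toy features: document position, capped length, and cue-word overlap."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    return [
        1.0 - index / max(total - 1, 1),        # earlier is better (news lead bias)
        min(len(words) / 20.0, 1.0),            # capped length feature
        sum(w in cue_words for w in words) / max(len(words), 1),  # cue overlap
    ]

def rank_sentences(sentences, weights, cue_words):
    """Score each sentence by a weighted linear combination of its features."""
    scored = []
    for i, sent in enumerate(sentences):
        feats = sentence_features(sent, i, len(sentences), cue_words)
        scored.append((sum(w * f for w, f in zip(weights, feats)), i, sent))
    return sorted(scored, reverse=True)

doc = [
    "The summit produced a climate agreement.",
    "Weather was pleasant.",
    "Delegates discussed the climate agreement terms at length.",
]
ranking = rank_sentences(doc, weights=[0.5, 0.2, 0.3],
                         cue_words={"climate", "agreement"})
```

A summary is then the top-ranked sentences up to the target compression level; the paper's contribution is measuring how much each such feature actually helps, normalized by the corpus's inherent difficulty.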

Proceedings Article
01 May 2002
TL;DR: This work describes the development of Language and Evaluation Resources for the evaluation of summaries in English and Chinese and focuses on the resources developed that are made available for the research community.
Abstract: We describe our work on the development of Language and Evaluation Resources for the evaluation of summaries in English and Chinese. The language resources include a parallel corpus of English and Chinese texts which are translations of each other, a set of queries in both languages, clusters of documents relevant to each query, sentence relevance measures for each sentence in the document clusters, and manual multi-document summaries at different compression rates. The evaluation resources consist of metrics for measuring the content of automatic summaries against reference summaries. The framework can be used in the evaluation of extractive, non-extractive, single and multi-document summarization. We focus on the resources developed that are made available for the research community.

44 citations


Network Information
Related Topics (5)
- Natural language: 31.1K papers, 806.8K citations (85% related)
- Ontology (information science): 57K papers, 869.1K citations (84% related)
- Web page: 50.3K papers, 975.1K citations (83% related)
- Recurrent neural network: 29.2K papers, 890K citations (83% related)
- Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
Number of papers in this topic in previous years:

Year    Papers
2023        74
2022       160
2021        52
2020        61
2019        47
2018        52