scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: It is concluded from the literature studies that most abstractive summarization methods produce highly coherent, cohesive, information-rich and less redundant summaries.
Abstract: Text summarization is the process of extracting salient information from the source text and presenting that information to the user in the form of a summary. It is very difficult for human beings to manually summarize large documents of text. Automatic abstractive summarization provides the required solution, but it is a challenging task because it requires deeper analysis of text. In this paper, a survey on abstractive text summarization methods has been presented. Abstractive summarization methods are classified into two categories, i.e. structure-based approaches and semantic-based approaches. The main idea behind these methods has been discussed. Besides the main idea, the strengths and weaknesses of each method have also been highlighted. Some open research issues in abstractive summarization have been identified and will be addressed in future research. Finally, it is concluded from the literature studies that most abstractive summarization methods produce highly coherent, cohesive, information-rich and less redundant summaries.

67 citations

Proceedings Article
01 Dec 2012
TL;DR: A new method to generate extractive multi-document summaries that uses Integer Linear Programming to jointly maximize the importance of the sentences it includes in the summary and their diversity, without exceeding a maximum allowed summary length.
Abstract: We present a new method to generate extractive multi-document summaries. The method uses Integer Linear Programming to jointly maximize the importance of the sentences it includes in the summary and their diversity, without exceeding a maximum allowed summary length. To obtain an importance score for each sentence, it uses a Support Vector Regression model trained on human-authored summaries, whereas the diversity of the selected sentences is measured as the number of distinct word bigrams in the resulting summary. Experimental results on widely used benchmarks show that our method achieves state of the art results, when compared to competitive extractive summarizers, while being computationally efficient as well.

65 citations
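
The objective described in the abstract above (importance plus bigram diversity under a length budget) can be illustrated on a toy input. The following is only a sketch: it replaces the Integer Linear Programming solver with exhaustive search, which is feasible only for a handful of sentences, and the importance scores are hypothetical hand-set numbers rather than outputs of a Support Vector Regression model. The function names and the weight `lam` are assumptions made for illustration, not the paper's notation.

```python
from itertools import combinations


def bigrams(sentence):
    """Return the set of distinct word bigrams in a sentence."""
    words = sentence.lower().split()
    return set(zip(words, words[1:]))


def best_summary(sentences, importance, max_words, lam=0.5):
    """Exhaustively pick the subset that maximizes
    lam * (sum of importance) + (1 - lam) * (number of distinct bigrams),
    subject to a total word budget. Brute-force stand-in for an ILP solver."""
    best_score, best_subset = float("-inf"), ()
    n = len(sentences)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            length = sum(len(sentences[i].split()) for i in subset)
            if length > max_words:
                continue  # violates the length constraint
            imp = sum(importance[i] for i in subset)
            div = len(set().union(*(bigrams(sentences[i]) for i in subset)))
            score = lam * imp + (1 - lam) * div
            if score > best_score:
                best_score, best_subset = score, subset
    return list(best_subset), best_score


# Hypothetical toy data: two duplicate sentences and one distinct sentence.
sentences = ["the cat sat", "the cat sat", "dogs bark loudly"]
importance = [1.0, 0.9, 0.5]
chosen, score = best_summary(sentences, importance, max_words=6)
print(chosen, score)  # → [0, 2] 2.75
```

In this toy case the diversity term steers the selection away from the duplicate sentence even though it has the second-highest importance score. Importance and bigram counts are on different scales here; a real system would need to normalize them before weighting.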

Proceedings ArticleDOI
01 Sep 2001
TL;DR: A global system evaluation shows that for the two more informal genres, the summarization system using dialogue specific components significantly outperforms a baseline using TFIDF term weighting with maximum marginal relevance ranking (MMR).
Abstract: Automatic summarization of open domain spoken dialogues is a new research area. This paper introduces the task, the challenges involved, and presents an approach to obtain automatic extract summaries for multi-party dialogues of four different genres, without any restriction on domain. We address the following issues which are intrinsic to spoken dialogue summarization and typically can be ignored when summarizing written text such as newswire data: (i) detection and removal of speech disfluencies; (ii) detection and insertion of sentence boundaries; (iii) detection and linking of cross-speaker information units (question-answer pairs). A global system evaluation using a corpus of 23 relevance annotated dialogues containing 80 topical segments shows that for the two more informal genres, our summarization system using dialogue specific components significantly outperforms a baseline using TFIDF term weighting with maximum marginal relevance ranking (MMR).

65 citations
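
The baseline named in the abstract above, TFIDF term weighting with maximum marginal relevance (MMR) ranking, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the smoothed IDF formula, the use of the centroid of all utterances as the relevance query, the trade-off weight `lam`, and all function names are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF (an assumption)."""
    docs = [Counter(s.lower().split()) for s in sentences]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in d.items()}
        for d in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of all vectors; cosine is scale-invariant, so no division is needed."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return dict(total)


def mmr(vectors, query, k, lam=0.7):
    """Greedy MMR: trade relevance to the query against redundancy
    with the items already selected."""
    selected, candidates = [], list(range(len(vectors)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy utterances: the first two are duplicates.
utterances = [
    "budget meeting moved to friday",
    "budget meeting moved to friday",
    "lunch order arrives at noon",
]
vectors = tfidf_vectors(utterances)
picked = mmr(vectors, centroid(vectors), k=2)
print(picked)
```

After the first utterance is selected, its duplicate is penalized by the redundancy term, so the second pick is the unrelated utterance.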

Proceedings Article
11 Jul 2009
TL;DR: This paper proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs).
Abstract: Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs). Three different fusion schemes, namely linear form, sequential form and score combination form, are exploited in the algorithm. Experimental results on the DUC benchmark datasets demonstrate the effectiveness of the proposed multi-modality learning algorithms with all the three fusion schemes.

65 citations
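
To make the idea of ranking over two sentence graphs more concrete, here is a rough sketch of manifold ranking with a linear combination of two affinity matrices. It is an assumption-laden illustration, not the algorithm from the paper: the exact form of the paper's linear fusion scheme is not given in the abstract, so the weighted sum `mu * S_within + (1 - mu) * S_cross`, the parameters `mu` and `alpha`, the fixed iteration count, and the convention that node 0 is the topic description are all hypothetical choices.

```python
def normalize(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2} of an affinity matrix."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [
            W[i][j] / (deg[i] * deg[j]) ** 0.5 if deg[i] > 0 and deg[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def manifold_rank(W_within, W_cross, y, mu=0.5, alpha=0.6, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y, where S is an assumed
    linear fusion of the two normalized graphs."""
    S_w, S_c = normalize(W_within), normalize(W_cross)
    n = len(y)
    S = [
        [mu * S_w[i][j] + (1 - mu) * S_c[i][j] for j in range(n)]
        for i in range(n)
    ]
    f = list(y)
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Hypothetical toy graphs: node 0 is the topic, nodes 1 and 2 are sentences.
# Sentence 1 is linked to the topic in the within-document graph only;
# sentence 2 is isolated in both graphs.
W_within = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
W_cross = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
y = [1.0, 0.0, 0.0]  # prior: only the topic node starts with a nonzero score
scores = manifold_rank(W_within, W_cross, y)
print(scores)
```

Sentences that are connected to the topic node, directly or through other sentences, accumulate score over the iterations, while an isolated sentence stays at zero.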

Journal ArticleDOI
31 Dec 2010
TL;DR: This paper studies a new content-based method for evaluating text summarization systems without human models, which is used to produce system rankings, and computes a variety of divergences among probability distributions.
Abstract: We study a new content-based method for the evaluation of text summarization systems without human models which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measures in text summarization such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations in various text summarization tasks including generic multi-document summarization in English and French, focus-based multi-document summarization in English and generic single-document summarization in French and Spanish.

65 citations
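
As an illustration of scoring a summary against its source without human reference summaries, the sketch below computes the Jensen-Shannon divergence between unigram word distributions. This is one example of a divergence among probability distributions, chosen here for illustration; the abstract does not specify which divergences or text units the framework uses, and the function names and toy texts are hypothetical.

```python
import math
from collections import Counter


def distribution(text):
    """Unigram probability distribution over whitespace-separated words."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), between 0 and 1."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # KL(a || m); terms with zero probability contribute nothing.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical toy texts.
source = "the storm closed the airport and delayed flights across the region"
summary_a = "storm closed the airport and delayed flights"
summary_b = "local team wins the championship game"

src = distribution(source)
div_a = js_divergence(src, distribution(summary_a))
div_b = js_divergence(src, distribution(summary_b))
print(div_a, div_b)
```

A lower divergence means the summary's word distribution is closer to the source's, so systems can be ranked by average divergence without any human-written model summaries.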


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52