Topic
Multi-document summarization
About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.
Papers
•
TL;DR: A survey of the literature concludes that most abstractive summarization methods produce highly coherent, cohesive, information-rich, and less redundant summaries.
Abstract: Text summarization is the process of extracting salient information from a source text and presenting that information to the user in the form of a summary. It is very difficult for human beings to manually summarize large documents. Automatic abstractive summarization provides the required solution, but it is a challenging task because it requires deeper analysis of the text. This paper presents a survey of abstractive text summarization methods. Abstractive summarization methods are classified into two categories: the structured-based approach and the semantic-based approach. The main idea behind these methods is discussed, and the strengths and weaknesses of each method are highlighted. Some open research issues in abstractive summarization are identified and will be addressed in future research. Finally, it is concluded from the literature studies that most abstractive summarization methods produce highly coherent, cohesive, information-rich, and less redundant summaries.
67 citations
•
01 Dec 2012
TL;DR: A new method to generate extractive multi-document summaries that uses Integer Linear Programming to jointly maximize the importance of the sentences it includes in the summary and their diversity, without exceeding a maximum allowed summary length.
Abstract: We present a new method to generate extractive multi-document summaries. The method uses Integer Linear Programming to jointly maximize the importance of the sentences it includes in the summary and their diversity, without exceeding a maximum allowed summary length. To obtain an importance score for each sentence, it uses a Support Vector Regression model trained on human-authored summaries, whereas the diversity of the selected sentences is measured as the number of distinct word bigrams in the resulting summary. Experimental results on widely used benchmarks show that our method achieves state of the art results, when compared to competitive extractive summarizers, while being computationally efficient as well.
65 citations
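The selection objective described in the entry above can be illustrated with a minimal sketch. It is not the authors' implementation: the importance scores are made-up numbers standing in for the output of the Support Vector Regression model, the trade-off weight `lam` and all function names are hypothetical, the two terms are not rescaled to a common range, and exhaustive search over subsets replaces a real Integer Linear Programming solver, so it is only practical for a handful of sentences.

```python
from itertools import combinations


def bigrams(sentence):
    """Return the set of word bigrams in a sentence (lowercased, whitespace tokens)."""
    words = sentence.lower().split()
    return set(zip(words, words[1:]))


def select_sentences(sentences, importance, max_words, lam=0.5):
    """Pick the subset maximizing lam * importance + (1 - lam) * distinct bigrams.

    Exhaustive search stands in for an ILP solver. `lam` is a hypothetical
    trade-off weight, and `importance` would come from a trained regressor.
    """
    best_score, best_subset = float("-inf"), ()
    n = len(sentences)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            length = sum(len(sentences[i].split()) for i in subset)
            if length > max_words:
                continue  # violates the summary length budget
            total_importance = sum(importance[i] for i in subset)
            # Diversity: number of distinct bigrams across the chosen sentences.
            diversity = len(set().union(*(bigrams(sentences[i]) for i in subset)))
            score = lam * total_importance + (1 - lam) * diversity
            if score > best_score:
                best_score, best_subset = score, subset
    return list(best_subset)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the mat today",
        "dogs bark loudly at night",
    ]
    print(select_sentences(docs, [0.9, 0.8, 0.5], max_words=11))
```

With an 11-word budget, the two near-duplicate sentences cannot both fit, and the bigram term rewards pairing the first sentence with the unrelated third one.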
•
01 Sep 2001
TL;DR: A global system evaluation shows that for the two more informal genres, the summarization system using dialogue specific components significantly outperforms a baseline using TFIDF term weighting with maximum marginal relevance ranking (MMR).
Abstract: Automatic summarization of open domain spoken dialogues is a new research area. This paper introduces the task, the challenges involved, and presents an approach to obtain automatic extract summaries for multi-party dialogues of four different genres, without any restriction on domain. We address the following issues which are intrinsic to spoken dialogue summarization and typically can be ignored when summarizing written text such as newswire data: (i) detection and removal of speech disfluencies; (ii) detection and insertion of sentence boundaries; (iii) detection and linking of cross-speaker information units (question-answer pairs). A global system evaluation using a corpus of 23 relevance annotated dialogues containing 80 topical segments shows that for the two more informal genres, our summarization system using dialogue specific components significantly outperforms a baseline using TFIDF term weighting with maximum marginal relevance ranking (MMR).
65 citations
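The baseline named in the entry above, TFIDF term weighting with maximum marginal relevance (MMR) ranking, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's baseline: relevance is measured against the centroid of all sentence vectors (MMR is usually defined against a query), the TF-IDF variant, the weight `lam`, and the function names are all hypothetical choices.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse TF-IDF vectors (dicts), treating each sentence as a document."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    return [
        {w: tf * (math.log(n / df[w]) + 1.0) for w, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def mmr_rank(sentences, k, lam=0.7):
    """Greedily pick k sentence indices, trading relevance against redundancy."""
    vecs = tfidf_vectors(sentences)
    # Assumption: the centroid of all sentences stands in for a query.
    centroid = {}
    for vec in vecs:
        for w, x in vec.items():
            centroid[w] = centroid.get(w, 0.0) + x
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], centroid) - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    turns = ["budget meeting monday", "budget meeting monday", "pizza for lunch"]
    print(mmr_rank(turns, k=2))
```

The redundancy penalty is what keeps the exact duplicate out of the two-sentence selection, even though it is more central than the unrelated sentence.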
•
11 Jul 2009
TL;DR: This paper proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs).
Abstract: Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs). Three different fusion schemes, namely linear form, sequential form and score combination form, are exploited in the algorithm. Experimental results on the DUC benchmark datasets demonstrate the effectiveness of the proposed multi-modality learning algorithms with all the three fusion schemes.
65 citations
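To make the idea in the entry above concrete, here is a minimal sketch of manifold ranking over two sentence graphs. It rests on several assumptions that the abstract does not confirm: the "linear form" fusion is read as a weighted sum of the two normalized affinity matrices, node 0 is treated as the topic description, and the parameters `mu`, `alpha`, and `iters` as well as the function names are hypothetical. It uses the standard manifold-ranking iteration f ← αSf + (1 − α)y.

```python
import math


def normalize(W):
    """Symmetric normalization S = D^(-1/2) W D^(-1/2); isolated nodes stay zero."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [
        [
            W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] and deg[j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def manifold_rank(W_within, W_cross, y, mu=0.5, alpha=0.6, iters=100):
    """Rank nodes by iterating f <- alpha * S f + (1 - alpha) * y.

    S is an assumed linear fusion of the two normalized graphs:
    S = mu * S_within + (1 - mu) * S_cross.
    """
    S_a, S_b = normalize(W_within), normalize(W_cross)
    n = len(y)
    S = [
        [mu * S_a[i][j] + (1 - mu) * S_b[i][j] for j in range(n)]
        for i in range(n)
    ]
    f = list(y)
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


if __name__ == "__main__":
    # Node 0 is the topic; nodes 1-3 are sentences. Node 3 is isolated.
    within = [[0, 0.8, 0, 0], [0.8, 0, 0.5, 0], [0, 0.5, 0, 0], [0, 0, 0, 0]]
    cross = [[0, 0.6, 0, 0], [0.6, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(manifold_rank(within, cross, y=[1.0, 0.0, 0.0, 0.0]))
```

In this toy graph, the sentence linked directly to the topic outranks the one reachable only through it, and the isolated sentence receives a score of zero.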
•
31 Dec 2010
TL;DR: This paper studies a new content-based method for evaluating text summarization systems without human models, which is used to produce system rankings, and computes a variety of divergences among probability distributions.
Abstract: We study a new content-based method for the evaluation of text summarization systems without human models, which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measures in text summarization, such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations in various text summarization tasks, including generic multi-document summarization in English and French, focus-based multi-document summarization in English, and generic single-document summarization in French and Spanish.
65 citations
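As an illustration of scoring a summary without human models, the sketch below compares the unigram distribution of a summary with that of its source using the Jensen-Shannon divergence. This is one example of a divergence among probability distributions; the abstract above does not say which divergences are computed or how, so the choice of divergence, the unigram model, and the function names are all assumptions.

```python
import math
from collections import Counter


def distribution(text, vocab):
    """Relative unigram frequencies of `text` over a fixed vocabulary order."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab)
    if total == 0:
        raise ValueError("text has no tokens")
    return [counts[w] / total for w in vocab]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint support."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def score_summary(source, summary):
    """Lower values mean the summary's word distribution is closer to the source."""
    vocab = sorted(set(source.lower().split()) | set(summary.lower().split()))
    return js_divergence(distribution(source, vocab), distribution(summary, vocab))


if __name__ == "__main__":
    source = "the storm closed roads and the storm cut power"
    print(score_summary(source, "the storm cut power"))
    print(score_summary(source, "stocks rose sharply today"))
```

A summary that reuses the source vocabulary gets a lower divergence than an unrelated text, which is the property a model-free ranking of systems would rely on.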