Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
Proceedings Article
25 Oct 2008
TL;DR: The hypothesis that encyclopedic knowledge is a useful addition to a summarization system is confirmed by the system implemented, which ranks high compared to the participating systems in the DUC competitions.
Abstract: Information of interest to users is often distributed over a set of documents. Users can specify their request for information as a query/topic -- a set of one or more sentences or questions. Producing a good summary of the relevant information relies on understanding the query and linking it with the associated set of documents. To "understand" the query we expand it using encyclopedic knowledge in Wikipedia. The expanded query is linked with its associated documents through spreading activation in a graph that represents words and their grammatical connections in these documents. The topic expanded words and activated nodes in the graph are used to produce an extractive summary. The method proposed is tested on the DUC summarization data. The system implemented ranks high compared to the participating systems in the DUC competitions, confirming our hypothesis that encyclopedic knowledge is a useful addition to a summarization system.

99 citations
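
The spreading-activation step described in the abstract above can be illustrated with a minimal sketch. It assumes a plain adjacency-list word graph, a fixed decay factor, and an even split of energy among neighbours; none of these details come from the paper, and the names `spread_activation` and `score_sentence` are hypothetical.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed words through a word graph.

    graph: dict mapping a word to a list of neighbouring words
           (assumed here to stand in for grammatical connections).
    seeds: dict mapping query or expansion words to initial activation.
    decay and steps are illustrative assumptions, not published values.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Each neighbour receives an equal share of the decayed energy.
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


def score_sentence(tokens, activation):
    """Score a sentence as the sum of the activation of its words."""
    return sum(activation.get(token, 0.0) for token in tokens)
```

Sentences could then be ranked by `score_sentence` to pick candidates for an extractive summary; how the original system turns activated nodes into a summary is not specified in the abstract.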

Proceedings Article
18 Mar 2016
TL;DR: This survey finds that most abstractive summarization methods produce highly cohesive, coherent, information-rich summaries with less redundancy.
Abstract: Text summarization is the task of extracting salient information from the original text document. In this process, the extracted information is generated as a condensed report and presented as a concise summary to the user. It is very difficult for humans to understand and interpret the content of the text. In this paper, an exhaustive survey of abstractive text summarization methods is presented. The two broad families of abstractive summarization methods are the structure-based approach and the semantic-based approach. This paper collectively summarizes and deciphers the various methodologies, challenges and issues of abstractive summarization. State-of-the-art benchmark datasets and their properties are explored. The survey finds that most abstractive summarization methods produce highly cohesive, coherent, information-rich summaries with less redundancy.

99 citations

Journal Article
TL;DR: Experimental results provide strong evidence that the proposed optimization-based approach is a viable method for document summarization and an improved differential evolution algorithm is created to solve the optimization problem.
Abstract: This paper proposes an optimization-based model for generic document summarization. The model generates a summary by extracting salient sentences from documents. This approach uses the sentence-to-document collection, the summary-to-document collection and the sentence-to-sentence relations to select salient sentences from the given document collection and reduce redundancy in the summary. An improved differential evolution algorithm has been created to solve the optimization problem. The algorithm can adjust the crossover rate adaptively according to the fitness of individuals. We implemented the proposed model on a multi-document summarization task. Experiments have been performed on the DUC2002 and DUC2004 data sets. The experimental results provide strong evidence that the proposed optimization-based approach is a viable method for document summarization.

98 citations

Proceedings Article
11 Jul 2010
TL;DR: This paper proposes to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process, and suggests that the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary.
Abstract: Cross-language document summarization is the task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, so the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.

98 citations
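
One way to picture how a predicted translation-quality score might be folded into sentence selection, as the abstract above describes, is a simple linear interpolation. This is only a sketch under assumptions: the abstract does not state how the two scores are combined, and the function name `select_sentences`, the `weight` parameter, and the score ranges are all hypothetical.

```python
def select_sentences(sentences, informativeness, quality, weight=0.5, k=2):
    """Pick the k sentences with the best combined score.

    informativeness and quality are per-sentence scores, assumed to be
    on comparable scales (for example, both in [0, 1]).
    weight is an assumed interpolation factor: 0 ignores translation
    quality, 1 ignores informativeness.
    """
    combined = [
        (1 - weight) * info + weight * qual
        for info, qual in zip(informativeness, quality)
    ]
    # Rank sentence indices by combined score, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:k])]
```

In a pipeline like the one described, the `quality` values would come from a regression model trained to predict translation quality, and the selected English sentences would then be passed to the translation step.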

Journal Article
TL;DR: A novel word-sentence co-ranking model named CoRank is proposed, which combines the word-sentence relationship with the graph-based unsupervised ranking model and can serve as an important building block of intelligent summarization systems.
Abstract: A principled word-sentence co-ranking model called CoRank is proposed. The convergence of CoRank with matrix notation is proved. A redundancy elimination technique is presented to further improve the performance of CoRank. Extractive summarization aims to automatically produce a short summary of a document by concatenating several sentences taken exactly from the original material. Due to their simplicity and ease of use, extractive summarization methods have become the dominant paradigm in the realm of text summarization. In this paper, we address the sentence scoring technique, a key step of extractive summarization. Specifically, we propose a novel word-sentence co-ranking model named CoRank, which combines the word-sentence relationship with the graph-based unsupervised ranking model. CoRank is quite concise in terms of matrix operations, and its convergence can be theoretically guaranteed. Moreover, a redundancy elimination technique is presented as a supplement to CoRank, so that the quality of automatic summarization can be further enhanced. As a result, CoRank can serve as an important building block of intelligent summarization systems. Experimental results on two real-life datasets including nearly 600 documents demonstrate the effectiveness of the proposed methods.

96 citations
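
The general idea of ranking words and sentences jointly, mentioned in the abstract above, can be illustrated with a generic mutual-reinforcement iteration. This is not the CoRank formulation, which the abstract does not spell out; it is a HITS-style sketch in which a sentence scores highly if it contains high-scoring words and a word scores highly if it appears in high-scoring sentences. The function name `co_rank` and the fixed iteration count are assumptions.

```python
def co_rank(sentences, iterations=20):
    """Jointly score sentences and words by mutual reinforcement.

    sentences: list of token lists.
    Returns (sentence_scores, word_scores), each normalised to sum to 1.
    """
    token_sets = [set(tokens) for tokens in sentences]
    vocab = sorted(set().union(*token_sets)) if token_sets else []
    word_score = {word: 1.0 for word in vocab}
    sent_score = [1.0] * len(sentences)

    for _ in range(iterations):
        # Sentence update: sum of the scores of the distinct words it contains.
        raw_sent = [sum(word_score[w] for w in tokens) for tokens in token_sets]
        total = sum(raw_sent) or 1.0
        sent_score = [value / total for value in raw_sent]

        # Word update: sum of the scores of the sentences containing it.
        raw_word = {
            w: sum(sent_score[i] for i, tokens in enumerate(token_sets) if w in tokens)
            for w in vocab
        }
        total = sum(raw_word.values()) or 1.0
        word_score = {w: value / total for w, value in raw_word.items()}

    return sent_score, word_score
```

A redundancy-elimination pass, such as skipping sentences that overlap heavily with ones already chosen, would typically follow the ranking step; the specific technique used in the paper is not described in the abstract.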


Network Information
Related Topics (5)
Topic                            Papers   Citations   Related
Natural language                 31.1K    806.8K      85%
Ontology (information science)   57K      869.1K      84%
Web page                         50.3K    975.1K      83%
Recurrent neural network         29.2K    890K        83%
Graph (abstract data type)       69.9K    1.2M        83%
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52