scispace - formally typeset
Search or ask a question
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues.
Abstract: Document summarization and keyphrase extraction are two related tasks in the IR and NLP fields, and both of them aim at extracting condensed representations from a single text document. Existing methods for single document summarization and keyphrase extraction usually make use of only the information contained in the specified document. This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues. The specified document is expanded to a small document set by adding a few neighbor documents close to the document, and the graph-based ranking algorithm is then applied on the expanded document set to make use of both the local information in the specified document and the global information in the neighbor documents. Experimental results on the Document Understanding Conference (DUC) benchmark datasets demonstrate the effectiveness and robustness of our proposed approaches. The cross-document sentence relationships in the expanded document set are validated to be beneficial to single document summarization, and the word cooccurrence relationships in the neighbor documents are validated to be very helpful to single document keyphrase extraction.

91 citations

Patent
04 Mar 2005
TL;DR: In this paper, a system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality related documents stored in electronic form.
Abstract: A system for generating a summary of a plurality of documents and presenting the summary information to a user is provided which includes a computer readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group documents into document clusters. The document clusters can also be assigned to predetermined document categories for presentation to a user. A number of multiple document summarization engines are provided which generate summaries for specific classes of multiple documents clusters. A summarizer router is employed to determining a relationship of the documents in a cluster and select one of the document summarization engines for use in generating a summary of the cluster. A single event engine is provided to generate summaries of documents which are closely related temporally and to a specific event. A dissimilarity engine for multiple document summary generation is provided which generates summaries of document clusters having documents with varying degrees of relatedness. A user interface is provided to display categories, cluster titles, summaries, related images.

91 citations

Journal ArticleDOI
TL;DR: It is found that CMS represents a better approach to email thread summarization, and that current sentence compression techniques do not improve summarization performance in this genre.
Abstract: We present two approaches to email thread summarization: collective message summarization (CMS) applies a multi-document summarization approach, while individual message summarization (IMS) treats the problem as a sequence of single-document summarization tasks. Both approaches are implemented in our general framework driven by sentence compression. Instead of a purely extractive approach, we employ linguistic and statistical methods to generate multiple compressions, and then select from those candidates to produce a final summary. We demonstrate these ideas on the Enron email collection - a very challenging corpus because of the highly technical language. Experimental results point to two findings: that CMS represents a better approach to email thread summarization, and that current sentence compression techniques do not improve summarization performance in this genre.

91 citations

Proceedings ArticleDOI
Xiaojun Wan1
25 Oct 2008
TL;DR: A document-based graph model is proposed to incorporate the document-level information and the sentence-to-document relationship into the graph-based ranking process and the results show the robustness of the proposed model.
Abstract: The graph-based ranking algorithm has been recently exploited for multi-document summarization by making only use of the sentence-to-sentence relationships in the documents, under the assumption that all the sentences are indistinguishable. However, given a document set to be summarized, different documents are usually not equally important, and moreover, different sentences in a specific document are usually differently important. This paper aims to explore document impact on summarization performance. We propose a document-based graph model to incorporate the document-level information and the sentence-to-document relationship into the graph-based ranking process. Various methods are employed to evaluate the two factors. Experimental results on the DUC2001 and DUC2002 datasets demonstrate that the good effectiveness of the proposed model. Moreover, the results show the robustness of the proposed model.

91 citations

01 Jan 2011
TL;DR: CSTNews, a discourse-annotated corpus for fostering research on single and multi-document summarization, is introduced within the context of the SUCINTO Project, which aims at investigating summarization strategies and developing tools and resources for that purpose.
Abstract: Summary. This paper introduces CSTNews, a discourse-annotated corpus for fostering research on single and multi-document summarization. The corpus comprises 50 clusters of news texts in Brazilian Portuguese and some related material, which includes a set of single-document manual summaries and a set of multi-document manual and automatic summaries. The texts are annotated in different ways for discourse organization, following both the Rhetorical Structure Theory and Cross-document Structure Theory. The corpus is a result delivered within the context of the SUCINTO Project, which aims at investigating summarization strategies and developing tools and resources for that purpose. The design of the discourse annotation tasks and the decisions that have been taken during the annotation process are detailed in this paper.

90 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202374
2022160
202152
202061
201947
201852