Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared within this topic, receiving 71,850 citations.


Papers
Proceedings Article
Liu Na, Lu Ying, Tang Xiao-Jun, Wang Hai-wen, Xiao Peng, Li Ming-Xia
28 May 2016
TL;DR: This work proposes a generic multi-document summarization algorithm based on significance sentences, which achieved better performance than other state-of-the-art algorithms on the DUC2002 corpus.
Abstract: Latent Dirichlet Allocation (LDA) has recently been used to generate topics for text corpora. The basic idea of most LDA models is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. However, the main task of multi-document summarization is sentence selection. For generic multi-document summarization, we propose a multi-document summarization algorithm based on significance sentences. First, our method proposes a sentence_LDA model, which represents topics as mixtures of sentences rather than words. Second, it introduces three different criteria to determine the significance of sentences. The multi-document summary consists of the significance sentences. Experiments showed that the proposed algorithm achieved better performance than the other state-of-the-art algorithms on the DUC2002 corpus.

6 citations

Proceedings Article
17 Nov 2013
TL;DR: This paper develops a new method for automatic summarization of online forums using topic models and content/metadata-sensitive clustering. Summarization is especially useful for online user forums, which contain a large number of posts spread across several threads.
Abstract: The advent of the Internet and improvements in data sharing and storage have resulted in an explosion of textual data. However, fully assimilating such massive amounts of data in raw form is a daunting task. Automated text mining methods such as text summarization present the user with a condensed version of the data containing only key information. This is especially useful for online user forums, which contain a large number of posts spread across several threads. Document summarization has been studied extensively, and several methods have been developed in the recent past. This paper aims to develop a new method for automatic summarization of online forums using topic models and content/metadata-sensitive clustering.

6 citations

01 Sep 2015
TL;DR: Results show that RST may contribute to producing more informative summaries in both rule-based and statistical methods.
Abstract: Rhetorical Structure Theory (RST) has been applied in different areas, such as single-document summarization, with promising results. In this paper, we discuss how multi-document summarization may benefit from RST in both rule-based and statistical methods. Results show that RST may contribute to producing more informative summaries.

6 citations

Journal Article
TL;DR: The variable-based framework and the summarization process are described, and the construction of the taxonomy for supporting the summarizing process is reported, providing an example to show how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.
Abstract: This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize extracted information across dissertation abstracts. This framework focuses more on research concepts and their research relationships found in sociology dissertation abstracts and has a hierarchical structure. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy for supporting the summarization process. An example is provided to show how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts.

6 citations

Journal Article
TL;DR: This work proposes an approach that extends the benefit of rhetorical relations to address the redundancy problem in cluster-based summarization of multiple documents, exploiting the rhetorical relations that exist between sentences to group similar sentences into clusters that identify themes of common information.
Abstract: Much previous research has shown that rhetorical relations can enhance many applications, such as text summarization, question answering, and natural language generation. This work proposes an approach that extends the benefit of rhetorical relations to address the redundancy problem in cluster-based text summarization of multiple documents. We exploit the rhetorical relations that exist between sentences to group similar sentences into multiple clusters that identify themes of common information. The candidate summary sentences are extracted from these clusters. Then, cluster-based text summarization is performed using the Conditional Markov Random Walk Model to measure the saliency scores of the candidate summary sentences. We evaluated our method by measuring the cohesion and separation of the clusters constructed by exploiting rhetorical relations, and the ROUGE scores of the generated summaries. The experimental results show that our method performed well, indicating the promising potential of applying rhetorical relations in text clustering to benefit text summarization of multiple documents.

6 citations
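
Several of the abstracts above report results as ROUGE scores. As a rough illustration of what such a score measures, the sketch below computes ROUGE-1 recall: the fraction of reference-summary unigrams that also appear in the generated summary, with counts clipped. The function name `rouge1_recall` and the lowercase whitespace tokenization are assumptions made for this example; the papers listed here do not specify their evaluation settings, and the standard ROUGE toolkit offers further options (stemming, stopword removal, other n-gram orders) that are not modeled.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall of a candidate summary against one reference.

    Hypothetical helper for illustration only. Tokenization is an
    assumption: lowercase, split on whitespace, no stemming.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        # An empty reference has nothing to recall.
        return 0.0
    # Clipped overlap: each reference token is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[token])
                  for token, count in ref_counts.items())
    return overlap / total_ref


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat is on the mat"
    print(f"ROUGE-1 recall: {rouge1_recall(generated, gold):.3f}")
```

Recall is only one side of the metric: a very long candidate can score high recall while being a poor summary, which is why ROUGE results are usually reported with a summary length limit or alongside precision and F-measure.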


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52