Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
01 Aug 2006
TL;DR: An automatic text summarization method based on natural language understanding by using RST (Rhetorical Structure Theory) and CIT (Comprehensive Information Theory), an integrated concept of syntactic, semantic and pragmatic information.
Abstract: In this paper, we present an automatic text summarization method based on natural language understanding by using RST (Rhetorical Structure Theory) and CIT (Comprehensive Information Theory). RST is an analytic framework designed to account for text structure at the clause level. CIT is an integrated concept of syntactic, semantic and pragmatic information. The system extracts the rhetorical structure of the text and the compound of the rhetorical relations between sentences, and then prunes the less important parts from the extracted structure. Finally, it analyses the sentences remaining in the extracted structure to generate the summary using CIT.

5 citations

Proceedings Article
11 Dec 2011
TL;DR: This paper presents a novel extractive approach based on supervised lazy random walk (Super Lazy) that naturally combines the rich features of sentences with the intrinsic sentence graph structure in a principled way, and thus enjoys the advantages of both the existing supervised and unsupervised approaches.
Abstract: Topic-focused multi-document summarization aims to produce a summary given a specific topic description and a set of related documents. It has become a crucial text processing task in many real applications, where it helps users digest massive amounts of information. This paper presents a novel extractive approach based on supervised lazy random walk (Super Lazy). This approach naturally combines the rich features of sentences with the intrinsic sentence graph structure in a principled way, and thus enjoys the advantages of both the existing supervised and unsupervised approaches. Moreover, our approach can achieve the three major goals of topic-focused multi-document summarization (i.e. relevance, salience and diversity) simultaneously with a unified ranking process. Experiments on the benchmark datasets TAC 2008 and TAC 2009 are performed, and the ROUGE evaluation results demonstrate that our approach can significantly outperform both the state-of-the-art supervised and unsupervised methods.

5 citations
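
To make the "lazy random walk" in the entry above concrete, here is a minimal sketch of a plain (unsupervised) lazy random walk over a sentence similarity graph. It is not the Super Lazy method itself: the supervised feature weighting and the topic relevance and diversity terms are omitted, and the function name, the `laziness` parameter, and the toy similarity matrix are all assumptions made for illustration.

```python
def lazy_random_walk(weights, laziness=0.5, iters=200):
    """Score graph nodes with a lazy random walk (illustrative sketch only).

    weights  -- square matrix of non-negative sentence similarities (hypothetical input)
    laziness -- probability of staying on the current sentence at each step
    """
    n = len(weights)
    # Row-normalize similarities into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    scores = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # Probability mass that moves along edges in one step.
        moved = [sum(scores[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Lazy update: stay put with probability `laziness`, otherwise move.
        scores = [laziness * s + (1.0 - laziness) * m for s, m in zip(scores, moved)]
    return scores


# Toy symmetric similarity matrix for three sentences (made-up values).
similarity = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
]
scores = lazy_random_walk(similarity)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

On a symmetric graph like this one, the scores converge to each sentence's share of the total edge weight, so the best-connected sentence ranks first. A supervised variant would presumably replace the raw similarities or the starting distribution with learned quantities; the abstract does not specify how.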

Proceedings Article
12 Feb 2014
TL;DR: A novel margin-based discriminative training (MBDT) algorithm that aims to penalize non-summary sentences in an inverse proportion to their summarization evaluation scores, leading to better discrimination from the desired summary sentences is proposed.
Abstract: The task of extractive speech summarization is to select a set of salient sentences from an original spoken document and concatenate them to form a summary, helping users to better browse through and understand the content of the document. In this paper we present an empirical study of leveraging various supervised discriminative methods for effectively ranking important sentences of a spoken document to be summarized. In addition, we propose a novel margin-based discriminative training (MBDT) algorithm that aims to penalize non-summary sentences in an inverse proportion to their summarization evaluation scores, leading to better discrimination from the desired summary sentences. By doing so, the summarization model can be trained with an objective function that is closely coupled with the ultimate evaluation metric of extractive speech summarization. Furthermore, sentences of spoken documents are represented by a wide range of prosodic, lexical and relevance features, whose utilities are extensively compared and analyzed. Experiments conducted on a Mandarin broadcast news summarization task demonstrate the performance merits of our summarization method when compared to several well-studied state-of-the-art supervised and unsupervised methods.

5 citations
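
One possible reading of the margin idea in the entry above can be sketched as a pairwise hinge loss in which a non-summary sentence with a low evaluation score must be separated from summary sentences by a larger margin. This is an assumption-laden illustration, not the MBDT objective from the paper: the margin form `1 - eval_score`, the function name, and the toy numbers are all hypothetical.

```python
def margin_ranking_loss(model_scores, eval_scores, is_summary):
    """Pairwise hinge loss with a per-sentence margin (illustrative sketch only).

    model_scores -- scores the ranking model assigns to each sentence
    eval_scores  -- evaluation scores in [0, 1] for each sentence (e.g. a ROUGE-like value)
    is_summary   -- True for sentences in the reference summary
    """
    loss = 0.0
    for s, s_in_summary in enumerate(is_summary):
        if not s_in_summary:
            continue
        for n, n_in_summary in enumerate(is_summary):
            if n_in_summary:
                continue
            # Assumed margin: lower evaluation score -> larger required gap.
            margin = 1.0 - eval_scores[n]
            gap = model_scores[s] - model_scores[n]
            loss += max(0.0, margin - gap)
    return loss


# Toy example: sentence 0 is the summary sentence (made-up values).
model_scores = [2.0, 0.5, 1.8]
eval_scores = [0.9, 0.1, 0.6]
is_summary = [True, False, False]
loss = margin_ranking_loss(model_scores, eval_scores, is_summary)
```

Here sentence 1 is already far enough below the summary sentence to incur no loss, while sentence 2 sits only 0.2 below it against a required margin of 0.4, so it contributes the remaining 0.2.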

Journal Article
TL;DR: The UMD summaries for the opinion task were especially effective in providing non-redundant information and more coherent summaries resulted when using the antonymy feature as compared to when not using it.
Abstract: The University of Maryland participated in three tasks organized by the Text Analysis Conference 2008 (TAC 2008): (1) the update task of text summarization; (2) the opinion task of text summarization; and (3) recognizing textual entailment (RTE). At the heart of our summarization system is Trimmer, which generates multiple alternative compressed versions of the source sentences that act as candidate sentences for inclusion in the summary. For the first time, we investigated the use of automatically generated antonym pairs for both text summarization and recognizing textual entailment. The UMD summaries for the opinion task were especially effective in providing non-redundant information (rank 3 out of a total 19 submissions). More coherent summaries resulted when using the antonymy feature as compared to when not using it. On the RTE task, even when using only automatically generated antonyms the system performed as well as when using a manually compiled list of antonyms.

5 citations

Proceedings Article
Daeyong Kim, Daehoon Kim, Siwan Kim, Minho Jo, Eenjun Hwang
09 Jan 2014
TL;DR: This study developed an SNS-based issue detection and related news summarization scheme, implemented a prototype system, performed various experiments, and presented some of the results.
Abstract: The unprecedented popularity of social network services (SNSs), such as Twitter and Facebook, means that a huge number of user documents are created and shared constantly via SNSs. Given the volume of user documents, browsing documents in a selective manner based on personal interests is a time-consuming and laborious task. Therefore, in the case of Twitter, trend keyword lists are provided for the user's convenience. However, it is still not easy to determine the details based on a few simple keywords. The keywords usually relate to the hot issues of the moment, so many documents, such as news on the Internet, will contain pertinent details. Thus, to provide detailed information about an issue, it is necessary to identify relationships among them. In this study, we developed an SNS-based issue detection and related news summarization scheme. To evaluate the effectiveness of our scheme, we implemented a prototype system and performed various experiments. We present some of the results.

5 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52