
Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
Journal ArticleDOI
TL;DR: The design of a multi-document summarizer that uses Katz's K-mixture model for term distribution is discussed, which outperforms other auto-summarizers at different extraction levels of summarization with respect to the ideal summary.
Abstract: Given the widespread use of the Internet, data availability is no longer a major issue; the real issues are information and knowledge availability. Due to data overload and the time-critical nature of information needs, automatic summarization of documents plays a significant role in information retrieval and text data mining. This paper discusses the design of a multi-document summarizer that uses Katz's K-mixture model for term distribution. The model helps in ranking the sentences by a modified term weight assignment. Highly ranked sentences are selected for the final summary. Sentences that are repetitive in nature are eliminated, and a tiled summary is produced. Our method avoids redundancy and produces a readable (even browsable) summary, which we refer to as an event-specific tiled summary. The system has been evaluated against the frequently occurring sentences in summaries generated by a set of human subjects. Our system outperforms other auto-summarizers at different extraction levels of summarization with respect to the ideal summary, and is close to the ideal summary at the 40% extraction level.

14 citations
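The K-mixture model referenced in this abstract estimates, for each term, the probability of observing it exactly k times in a document from its collection frequency (cf) and document frequency (df). A minimal sketch of the standard K-mixture formulation follows; the paper's modified term-weight assignment built on top of it is not reproduced here.

```python
def k_mixture(cf, df, n_docs):
    """Estimate Katz K-mixture parameters (alpha, beta) for one term.

    cf:     collection frequency (total occurrences across the collection)
    df:     document frequency (number of documents containing the term)
    n_docs: total number of documents in the collection
    """
    # beta: extra occurrences per document that contains the term
    beta = (cf - df) / df
    # alpha chosen so the model's expected count equals cf / n_docs;
    # degenerates when cf == df (the term never repeats in a document)
    alpha = cf / (n_docs * beta) if beta > 0 else 0.0
    return alpha, beta


def p_k(k, alpha, beta):
    """P(term occurs exactly k times in a document) under the K-mixture."""
    p = (alpha / (beta + 1)) * (beta / (beta + 1)) ** k
    if k == 0:
        p += 1.0 - alpha  # extra mass on documents that omit the term
    return p
```

With cf = 10, df = 5, n_docs = 100, the model places probability df/n_docs = 0.05 on the term appearing at all, matching the observed document frequency.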

Proceedings ArticleDOI
14 Jul 2014
TL;DR: A generative model for multi-document summarization, namely Titled-LDA, that simultaneously models the content of documents and their titles is proposed; it achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.
Abstract: Based on the LDA (Latent Dirichlet Allocation) topic model, a generative model for multi-document summarization, namely Titled-LDA, that simultaneously models the content of documents and their titles is proposed. This generative model represents each document as a mixture of topics and extends this approach to title modeling by allowing the mixture weights for topics to be determined by the titles of the document. In the mixing stage, the algorithm learns the weights adaptively and asymmetrically based on two kinds of information entropies. In this way, the final model incorporates the title information and the content information appropriately, which helps summarization performance. Experiments showed that the proposed algorithm achieved better performance compared to other state-of-the-art algorithms on the DUC2002 corpus.

14 citations
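The abstract's "adaptive asymmetric learning based on two kinds of information entropies" can be illustrated with a simple inverse-entropy mixing rule: the lower-entropy (more peaked, hence more informative) of the title and content topic distributions receives the larger weight. This is a hypothetical sketch; the paper's exact weighting scheme is not given in the abstract, and the function names here are illustrative.

```python
import math

def entropy(dist):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def mix_by_entropy(title_topics, content_topics):
    """Mix two topic distributions over the same topics, weighting each
    by the inverse of its entropy, so the more peaked distribution
    contributes more to the final document representation."""
    eps = 1e-12  # guard against a zero-entropy (one-hot) distribution
    w_t = 1.0 / (entropy(title_topics) + eps)
    w_c = 1.0 / (entropy(content_topics) + eps)
    z = w_t + w_c
    w_t, w_c = w_t / z, w_c / z
    mixed = [w_t * t + w_c * c for t, c in zip(title_topics, content_topics)]
    s = sum(mixed)
    return [m / s for m in mixed]
```

A sharply peaked title distribution (e.g. [0.9, 0.05, 0.05]) dominates a uniform content distribution under this rule, pulling the mixture toward the title's topic.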

Book ChapterDOI
21 Jun 2010
TL;DR: This study identifies the key features of natural literature reviews through a macro-level and clause-level discourse analysis and identifies human information selection strategies by mapping referenced information to source documents.
Abstract: This paper gives an overview of a project to generate literature reviews from a set of research papers, based on techniques drawn from human summarization behavior. For this study, we identify the key features of natural literature reviews through a macro-level and clause-level discourse analysis; we also identify human information selection strategies by mapping referenced information to source documents. Our preliminary results of discourse analysis have helped us characterize literature review writing styles based on their document structure and rhetorical structure. These findings will be exploited to design templates for automatic content generation.

14 citations

Proceedings ArticleDOI
15 Jul 2013
TL;DR: This work explores a novel sentence modeling approach built on top of the notion of relevance, where the relationship between a candidate summary sentence and the spoken document to be summarized is discovered through various granularities of context for relevance modeling.
Abstract: Extractive speech summarization, aiming to select an indicative set of sentences from a spoken document so as to concisely represent the most important aspects of the document, has emerged as an attractive area of research and experimentation. A recent school of thought is to employ the language modeling (LM) framework along with the Kullback-Leibler (KL) divergence measure for important sentence selection, which has shown preliminary promise for extractive speech summarization. Our work in this paper continues this general line of research in two significant aspects. First, we explore a novel sentence modeling approach built on top of the notion of relevance, where the relationship between a candidate summary sentence and the spoken document to be summarized is discovered through various granularities of context for relevance modeling. Second, not only lexical but also topical cues inherent in the spoken document are exploited for sentence modeling. Experiments on broadcast news summarization seem to demonstrate the performance merits of our methods when compared to several existing methods.

14 citations
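The LM-plus-KL-divergence selection framework this paper builds on can be sketched as follows: estimate a smoothed unigram language model for the whole document and for each candidate sentence, then rank sentences by KL(document || sentence), selecting those with the smallest divergence. This is a minimal baseline illustration with additive smoothing; it does not include the paper's relevance-modeling or topical extensions.

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, mu=0.5):
    """Unigram language model with additive (Lidstone) smoothing over vocab."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] + mu) / (total + mu * len(vocab)) for w in vocab}

def kl_divergence(p, q, vocab):
    """KL(p || q) over a shared vocabulary (both models smoothed, so q > 0)."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

def rank_sentences(doc_sentences):
    """Rank tokenized sentences by KL(document LM || sentence LM).

    Lower divergence means the sentence's word distribution better
    matches the document's, i.e. it is more representative.
    """
    all_tokens = [t for sent in doc_sentences for t in sent]
    vocab = set(all_tokens)
    doc_lm = unigram_lm(all_tokens, vocab)
    scored = []
    for i, sent in enumerate(doc_sentences):
        sent_lm = unigram_lm(sent, vocab)
        scored.append((kl_divergence(doc_lm, sent_lm, vocab), i))
    return sorted(scored)  # best (lowest-divergence) sentence first
```

A sentence covering the document's frequent terms scores a lower divergence than one built from rare terms, so it is extracted first.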

Journal ArticleDOI
TL;DR: This paper develops and experiments with two graph-based approaches that combine four weighting schemes and two ranking methods in one graph framework, and proposes taking the average of their results using the arithmetic mean and the harmonic mean to improve the results of generic, extractive, and multi-document summarization.
Abstract: Automatic text summarization aims to reduce the document text size by building a brief and valuable summary that contains the most important ideas in that document. Over the years, many approaches have been proposed to improve automatic text summarization results; the graph-based method for sentence ranking is considered one of the most important approaches in this field. However, most of these approaches rely on only one weighting scheme and one ranking method, which may limit their systems. In this paper, we focus on combining multiple graph-based approaches to improve the results of generic, extractive, multi-document summarization. This improvement yields more accurate summaries, which could serve as a significant component of some natural language applications. We develop and experiment with two graph-based approaches that combine four weighting schemes and two ranking methods in one graph framework. To combine these methods, we propose taking the average of their results using the arithmetic mean and the harmonic mean. We evaluate our proposed approaches on the DUC 2003 and DUC 2004 datasets and measure performance with the ROUGE evaluation toolkit. Our experiments demonstrate that using the harmonic mean to combine weighting schemes outperforms the arithmetic mean and shows a good improvement over the baselines and many state-of-the-art systems.

14 citations
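The two combination rules the abstract compares are standard means, sketched below. The harmonic mean is dragged down sharply by any single low score, so it rewards sentences that all weighting schemes agree are important, which is one plausible reason it outperformed the arithmetic mean here. The `combine` helper and its score layout are illustrative, not the paper's implementation.

```python
def arithmetic_mean(scores):
    return sum(scores) / len(scores)

def harmonic_mean(scores):
    """Harmonic mean; a single near-zero score pulls the result toward zero,
    favoring sentences ranked highly by *every* scheme."""
    if any(s <= 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)

def combine(per_sentence_scores, mean_fn):
    """per_sentence_scores: {sentence_id: [score from each weighting scheme]}
    Returns one combined score per sentence under the chosen mean."""
    return {sid: mean_fn(scores) for sid, scores in per_sentence_scores.items()}
```

For scores [1, 1, 1, 4], the arithmetic mean is 1.75 while the harmonic mean is about 1.23: the one high score cannot compensate for the low agreement across schemes.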


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52