Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2270 publications have been published on this topic, receiving 71850 citations.


Papers
Proceedings Article
01 Apr 2006
TL;DR: A framework for summarizing a corpus of evaluative documents about a single entity by a natural language summary is presented, and the results indicate that for evaluative text, abstraction tends to be more effective than extraction, particularly when the corpus is controversial.
Abstract: In many decision-making scenarios, people can benefit from knowing what other people's opinions are. As more and more evaluative documents are posted on the Web, summarizing these useful resources becomes a critical task for many organizations and individuals. This paper presents a framework for summarizing a corpus of evaluative documents about a single entity by a natural language summary. We propose two summarizers: an extractive summarizer and an abstractive one. As an additional contribution, we show how our abstractive summarizer can be modified to generate summaries tailored to a model of the user preferences that is solidly grounded in decision theory and can be effectively elicited from users. We have tested our framework in three user studies. In the first one, we compared the two summarizers. They performed equally well relative to each other quantitatively, while significantly outperforming a standard baseline approach to multi-document summarization. Trends in the results as well as qualitative comments from participants suggest that the summarizers have different strengths and weaknesses. After this initial user study, we realized that the diversity of opinions expressed in the corpus (i.e., its controversiality) might play a critical role in comparing abstraction versus extraction. To clearly pinpoint the role of controversiality, we ran a second user study in which we controlled for the degree of controversiality of the corpora that were summarized for the participants. The outcome of this study indicates that for evaluative text, abstraction tends to be more effective than extraction, particularly when the corpus is controversial. In the third user study we assessed the effectiveness of our user tailoring strategy. The results of this experiment confirm that user-tailored summaries are more informative than untailored ones.
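To make the extractive side of this comparison concrete, a minimal frequency-based extractive summarizer can be sketched as follows. This is an illustration only, not the paper's summarizer; the function name, scoring rule, and review data are invented for the example.

```python
# Minimal extractive summarizer sketch: score each sentence by how well
# its words match corpus-wide term frequencies, then keep the top-k.
from collections import Counter

def extract_summary(sentences, k=2):
    """Return the k sentences whose words best match corpus frequencies."""
    # Corpus-wide term frequencies act as a crude "centroid" of the corpus.
    corpus_tf = Counter(w.lower() for s in sentences for w in s.split())

    def score(sentence):
        words = [w.lower() for w in sentence.split()]
        if not words:
            return 0.0
        # Average corpus frequency of the sentence's words.
        return sum(corpus_tf[w] for w in words) / len(words)

    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:k]

reviews = [
    "The battery life is great",
    "Great battery and a great screen",
    "Shipping was slow",
]
summary = extract_summary(reviews, k=1)
```

A real extractive system would add redundancy control and better term weighting; an abstractive one, as in the paper, would instead generate new sentences from an aggregated representation of the opinions.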

176 citations

Proceedings ArticleDOI
04 Aug 2009
TL;DR: A new Bayesian sentence-based topic model for summarization by making use of both the term-document and term-sentence associations is proposed and an efficient variational Bayesian algorithm is derived for model parameter estimation.
Abstract: Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian sentence-based topic model for summarization by making use of both the term-document and term-sentence associations. An efficient variational Bayesian algorithm is derived for model parameter estimation. Experimental results on benchmark data sets show the effectiveness of the proposed model for the multi-document summarization task.
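The interplay between term-document and term-sentence associations can be illustrated with a much cruder stand-in for the Bayesian model: estimate one dominant "topic" direction from the term-document matrix (here via an LSA-style SVD rather than variational Bayes) and rank sentences by their weight on it. The toy corpus and the SVD shortcut are assumptions made for this sketch.

```python
# Crude stand-in for a sentence-based topic model: the term-document
# matrix supplies document-side topic knowledge; the term-sentence
# matrix lets that knowledge guide sentence selection.
import numpy as np

docs = [["cat", "sits", "mat"], ["cat", "mat", "soft"]]
sentences = [["cat", "mat"], ["soft", "dog"], ["dog", "runs"]]
vocab = sorted({w for d in docs for w in d} | {w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document matrix A (terms x docs) and term-sentence matrix B.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        A[idx[w], j] += 1
B = np.zeros((len(vocab), len(sentences)))
for j, s in enumerate(sentences):
    for w in s:
        B[idx[w], j] += 1

# Leading left singular vector of A = dominant term "topic" direction.
topic = np.abs(np.linalg.svd(A, full_matrices=False)[0][:, 0])
scores = topic @ B          # each sentence's weight on the topic
best = int(np.argmax(scores))
```

Here the first sentence wins because "cat" and "mat" dominate the document-side topic, while the off-topic third sentence scores zero; the paper replaces this linear shortcut with a proper Bayesian model estimated by variational inference.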

175 citations

Proceedings ArticleDOI
15 Aug 2005
TL;DR: This paper presents eight different methods of generating MDS and evaluates each of these methods on a large set of topics used in past DUC workshops, showing a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.
Abstract: The problem of using topic representations for multi-document summarization (MDS) has received considerable attention recently. In this paper, we describe five different topic representations and introduce a novel representation of topics based on topic themes. We present eight different methods of generating MDS and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.
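A simple relative of such topic representations is a topic-signature-style term filter: keep terms whose frequency in the topic's documents clearly exceeds their background frequency. The ratio test, the add-one smoothing, and the threshold below are illustrative choices for this sketch, not the paper's method.

```python
# Toy topic-signature extraction: terms much more frequent in the
# topic's documents than in a background corpus form the signature.
from collections import Counter

topic_docs = ["quake hits city", "quake damage severe", "city rebuilds after quake"]
background = ["market opens higher", "city council meets", "rain expected today"]

tf_topic = Counter(w for d in topic_docs for w in d.split())
tf_bg = Counter(w for d in background for w in d.split())

# Smoothed frequency ratio; the threshold of 3 is an arbitrary choice.
signature = sorted(
    w for w, c in tf_topic.items() if (c + 1) / (tf_bg[w] + 1) >= 3
)
```

Note how "city" is filtered out despite appearing in the topic documents, because it is equally common in the background; richer theme representations group such signature terms into coherent sub-events.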

174 citations

Proceedings ArticleDOI
31 May 2014
TL;DR: An eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods is presented and the findings are applied to build a novel summarization tool.
Abstract: Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the process of selecting the subset: a high-quality selection would contain the same statements and keywords that a programmer would choose. Unfortunately, little evidence exists about the statements and keywords that programmers view as important when they summarize source code. In this paper, we present an eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods. We apply the findings to build a novel summarization tool. Then, we evaluate this tool and provide evidence to support the development of source code summarization systems.
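The selection step such techniques rely on can be sketched as naive keyword extraction: split camelCase identifiers and rank the resulting terms by frequency, ignoring language keywords. This is purely illustrative and is not the paper's eye-tracking-informed tool; the keyword list and the sample method are invented.

```python
# Naive keyword selection from source text: tokenize identifiers,
# split camelCase, and keep the most frequent non-keyword terms.
import re
from collections import Counter

JAVA_KEYWORDS = {"public", "int", "return", "for", "void", "if"}

def keywords(method_src, k=3):
    """Top-k identifier terms in a method, ignoring language keywords."""
    tokens = re.findall(r"[A-Za-z]+", method_src)
    terms = []
    for tok in tokens:
        # Split camelCase: "totalPrice" -> ["total", "Price"]
        terms += [p.lower() for p in re.findall(r"[A-Z]?[a-z]+", tok)]
    counts = Counter(t for t in terms if t not in JAVA_KEYWORDS)
    return [w for w, _ in counts.most_common(k)]

src = ("public int totalPrice(int[] prices) { int total = 0; "
       "for (int p : prices) total += p; return total; }")
top = keywords(src, k=2)
```

The study's point is precisely that such frequency heuristics may not match the statements and keywords programmers actually attend to, which is what the eye-tracking data is used to measure.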

173 citations

Journal ArticleDOI
TL;DR: This paper presents a different kind of learning model, namely regression models, for query-focused multi-document summarization, using Support Vector Regression to estimate the importance of a sentence in a document set to be summarized through a set of pre-defined features.
Abstract: Most existing research on applying machine learning techniques to document summarization explores either classification models or learning-to-rank models. This paper presents our recent study on how to apply a different kind of learning models, namely regression models, to query-focused multi-document summarization. We choose to use Support Vector Regression (SVR) to estimate the importance of a sentence in a document set to be summarized through a set of pre-defined features. In order to learn the regression models, we propose several methods to construct the "pseudo" training data by assigning each sentence with a "nearly true" importance score calculated with the human summaries that have been provided for the corresponding document set. A series of evaluations on the DUC data sets are conducted to examine the efficiency and the robustness of the proposed approaches. When compared with classification models and ranking models, regression models are consistently preferable.
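The regression idea can be sketched with ordinary least squares standing in for SVR; the features, the pseudo-labels, and the linear model below are all invented for the example.

```python
# Sketch of regression-based sentence scoring: learn a map from
# sentence features to a "nearly true" importance score, then rank.
import numpy as np

# Features per sentence: [position_from_top, query_term_overlap, length_ratio]
X = np.array([
    [1.0, 0.8, 0.9],   # lead sentence with high query overlap
    [0.5, 0.6, 1.0],
    [0.2, 0.1, 0.7],   # late, off-topic sentence
    [0.9, 0.7, 0.8],
])
# Pseudo-labels, e.g. similarity of each sentence to human summaries.
y = np.array([0.9, 0.6, 0.1, 0.8])

# Fit w minimizing ||Xw - y||^2 (a linear stand-in for SVR).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def importance(features):
    """Predicted importance of a sentence's feature vector."""
    return float(np.asarray(features) @ w)

# Rank sentence indices by predicted importance, best first.
ranked = sorted(range(len(X)), key=lambda i: importance(X[i]), reverse=True)
```

An actual system would use SVR with a kernel, richer query-dependent features, and held-out DUC document sets for evaluation; the ranking step, however, looks the same.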

172 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52