Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2270 publications have been published on this topic, receiving 71850 citations.


Papers
Proceedings Article
01 Apr 2006
TL;DR: A framework for summarizing a corpus of evaluative documents about a single entity by a natural language summary is presented, and the results indicate that for evaluative text, abstraction tends to be more effective than extraction, particularly when the corpus is controversial.
Abstract: In many decision-making scenarios, people can benefit from knowing what other people's opinions are. As more and more evaluative documents are posted on the Web, summarizing these useful resources becomes a critical task for many organizations and individuals. This paper presents a framework for summarizing a corpus of evaluative documents about a single entity by a natural language summary. We propose two summarizers: an extractive summarizer and an abstractive one. As an additional contribution, we show how our abstractive summarizer can be modified to generate summaries tailored to a model of the user preferences that is solidly grounded in decision theory and can be effectively elicited from users. We have tested our framework in three user studies. In the first one, we compared the two summarizers. They performed equally well relative to each other quantitatively, while significantly outperforming a standard baseline approach to multi-document summarization. Trends in the results as well as qualitative comments from participants suggest that the summarizers have different strengths and weaknesses. After this initial user study, we realized that the diversity of opinions expressed in the corpus (i.e., its controversiality) might play a critical role in comparing abstraction versus extraction. To clearly pinpoint the role of controversiality, we ran a second user study in which we controlled for the degree of controversiality of the corpora that were summarized for the participants. The outcome of this study indicates that for evaluative text, abstraction tends to be more effective than extraction, particularly when the corpus is controversial. In the third user study we assessed the effectiveness of our user tailoring strategy. The results of this experiment confirm that user-tailored summaries are more informative than untailored ones.
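To make the extractive side of this comparison concrete, a minimal frequency-based extractive summarizer can be sketched as follows. This is an illustration only, not the paper's summarizer; the function name, scoring rule, and review data are invented for the example.

```python
# Minimal extractive summarizer sketch: score each sentence by how well
# its words match corpus-wide term frequencies, then keep the top-k.
from collections import Counter

def extract_summary(sentences, k=2):
    """Return the k sentences whose words best match corpus frequencies."""
    # Corpus-wide term frequencies act as a crude "centroid" of the corpus.
    corpus_tf = Counter(w.lower() for s in sentences for w in s.split())

    def score(sentence):
        words = [w.lower() for w in sentence.split()]
        if not words:
            return 0.0
        # Average corpus frequency of the sentence's words.
        return sum(corpus_tf[w] for w in words) / len(words)

    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:k]

reviews = [
    "The battery life is great",
    "Great battery and a great screen",
    "Shipping was slow",
]
summary = extract_summary(reviews, k=1)
```

A real extractive system would add redundancy control and better term weighting; an abstractive one, as in the paper, would instead generate new sentences from an aggregated representation of the opinions.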

176 citations

Proceedings ArticleDOI
04 Aug 2009
TL;DR: A new Bayesian sentence-based topic model for summarization by making use of both the term-document and term-sentence associations is proposed and an efficient variational Bayesian algorithm is derived for model parameter estimation.
Abstract: Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian sentence-based topic model for summarization by making use of both the term-document and term-sentence associations. An efficient variational Bayesian algorithm is derived for model parameter estimation. Experimental results on benchmark data sets show the effectiveness of the proposed model for the multi-document summarization task.
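The interplay between term-document and term-sentence associations can be illustrated with a much cruder stand-in for the Bayesian model: estimate one dominant "topic" direction from the term-document matrix (here via an LSA-style SVD rather than variational Bayes) and rank sentences by their weight on it. The toy corpus and the SVD shortcut are assumptions made for this sketch.

```python
# Crude stand-in for a sentence-based topic model: the term-document
# matrix supplies document-side topic knowledge; the term-sentence
# matrix lets that knowledge guide sentence selection.
import numpy as np

docs = [["cat", "sits", "mat"], ["cat", "mat", "soft"]]
sentences = [["cat", "mat"], ["soft", "dog"], ["dog", "runs"]]
vocab = sorted({w for d in docs for w in d} | {w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document matrix A (terms x docs) and term-sentence matrix B.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        A[idx[w], j] += 1
B = np.zeros((len(vocab), len(sentences)))
for j, s in enumerate(sentences):
    for w in s:
        B[idx[w], j] += 1

# Leading left singular vector of A = dominant term "topic" direction.
topic = np.abs(np.linalg.svd(A, full_matrices=False)[0][:, 0])
scores = topic @ B          # each sentence's weight on the topic
best = int(np.argmax(scores))
```

Here the first sentence wins because "cat" and "mat" dominate the document-side topic, while the off-topic third sentence scores zero; the paper replaces this linear shortcut with a proper Bayesian model estimated by variational inference.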

175 citations

Proceedings ArticleDOI
15 Aug 2005
TL;DR: This paper presents eight different methods of generating MDS and evaluates each of these methods on a large set of topics used in past DUC workshops, showing a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.
Abstract: The problem of using topic representations for multi-document summarization (MDS) has received considerable attention recently. In this paper, we describe five different topic representations and introduce a novel representation of topics based on topic themes. We present eight different methods of generating MDS and evaluate each of these methods on a large set of topics used in past DUC workshops. Our evaluation results show a significant improvement in the quality of summaries based on topic themes over MDS methods that use other alternative topic representations.
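A simple relative of such topic representations is a topic-signature-style term filter: keep terms whose frequency in the topic's documents clearly exceeds their background frequency. The ratio test, the add-one smoothing, and the threshold below are illustrative choices for this sketch, not the paper's method.

```python
# Toy topic-signature extraction: terms much more frequent in the
# topic's documents than in a background corpus form the signature.
from collections import Counter

topic_docs = ["quake hits city", "quake damage severe", "city rebuilds after quake"]
background = ["market opens higher", "city council meets", "rain expected today"]

tf_topic = Counter(w for d in topic_docs for w in d.split())
tf_bg = Counter(w for d in background for w in d.split())

# Smoothed frequency ratio; the threshold of 3 is an arbitrary choice.
signature = sorted(
    w for w, c in tf_topic.items() if (c + 1) / (tf_bg[w] + 1) >= 3
)
```

Note how "city" is filtered out despite appearing in the topic documents, because it is equally common in the background; richer theme representations group such signature terms into coherent sub-events.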

174 citations

Proceedings ArticleDOI
31 May 2014
TL;DR: An eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods is presented and the findings are applied to build a novel summarization tool.
Abstract: Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the process of selecting the subset: a high-quality selection would contain the same statements and keywords that a programmer would choose. Unfortunately, little evidence exists about the statements and keywords that programmers view as important when they summarize source code. In this paper, we present an eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods. We apply the findings to build a novel summarization tool. Then, we evaluate this tool and provide evidence to support the development of source code summarization systems.
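The selection step such techniques rely on can be sketched as naive keyword extraction: split camelCase identifiers and rank the resulting terms by frequency, ignoring language keywords. This is purely illustrative and is not the paper's eye-tracking-informed tool; the keyword list and the sample method are invented.

```python
# Naive keyword selection from source text: tokenize identifiers,
# split camelCase, and keep the most frequent non-keyword terms.
import re
from collections import Counter

JAVA_KEYWORDS = {"public", "int", "return", "for", "void", "if"}

def keywords(method_src, k=3):
    """Top-k identifier terms in a method, ignoring language keywords."""
    tokens = re.findall(r"[A-Za-z]+", method_src)
    terms = []
    for tok in tokens:
        # Split camelCase: "totalPrice" -> ["total", "Price"]
        terms += [p.lower() for p in re.findall(r"[A-Z]?[a-z]+", tok)]
    counts = Counter(t for t in terms if t not in JAVA_KEYWORDS)
    return [w for w, _ in counts.most_common(k)]

src = ("public int totalPrice(int[] prices) { int total = 0; "
       "for (int p : prices) total += p; return total; }")
top = keywords(src, k=2)
```

The study's point is precisely that such frequency heuristics may not match the statements and keywords programmers actually attend to, which is what the eye-tracking data is used to measure.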

173 citations

Journal ArticleDOI
TL;DR: This paper presents a different kind of learning model, namely regression models, for query-focused multi-document summarization, using Support Vector Regression to estimate the importance of a sentence in a document set to be summarized through a set of pre-defined features.
Abstract: Most existing research on applying machine learning techniques to document summarization explores either classification models or learning-to-rank models. This paper presents our recent study on how to apply a different kind of learning models, namely regression models, to query-focused multi-document summarization. We choose to use Support Vector Regression (SVR) to estimate the importance of a sentence in a document set to be summarized through a set of pre-defined features. In order to learn the regression models, we propose several methods to construct the "pseudo" training data by assigning each sentence with a "nearly true" importance score calculated with the human summaries that have been provided for the corresponding document set. A series of evaluations on the DUC data sets are conducted to examine the efficiency and the robustness of the proposed approaches. When compared with classification models and ranking models, regression models are consistently preferable.
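The regression idea can be sketched with ordinary least squares standing in for SVR; the features, the pseudo-labels, and the linear model below are all invented for the example.

```python
# Sketch of regression-based sentence scoring: learn a map from
# sentence features to a "nearly true" importance score, then rank.
import numpy as np

# Features per sentence: [position_from_top, query_term_overlap, length_ratio]
X = np.array([
    [1.0, 0.8, 0.9],   # lead sentence with high query overlap
    [0.5, 0.6, 1.0],
    [0.2, 0.1, 0.7],   # late, off-topic sentence
    [0.9, 0.7, 0.8],
])
# Pseudo-labels, e.g. similarity of each sentence to human summaries.
y = np.array([0.9, 0.6, 0.1, 0.8])

# Fit w minimizing ||Xw - y||^2 (a linear stand-in for SVR).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def importance(features):
    """Predicted importance of a sentence's feature vector."""
    return float(np.asarray(features) @ w)

# Rank sentence indices by predicted importance, best first.
ranked = sorted(range(len(X)), key=lambda i: importance(X[i]), reverse=True)
```

An actual system would use SVR with a kernel, richer query-dependent features, and held-out DUC document sets for evaluation; the ranking step, however, looks the same.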

172 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52