Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
Proceedings ArticleDOI
30 Apr 2000
TL;DR: This paper discusses a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents.
Abstract: This paper discusses a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical in the formation of useful summaries. Our approach addresses these issues by using domain-independent techniques based mainly on fast, statistical processing, a metric for reducing redundancy and maximizing diversity in the selected passages, and a modular framework to allow easy parameterization for different genres, corpora characteristics and user requirements.

408 citations
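
The abstract above mentions a metric for reducing redundancy and maximizing diversity in the selected passages but gives no formula. As a rough illustration only, the sketch below greedily picks passages by trading relevance to a query against similarity to passages already chosen. The bag-of-words representation, the cosine similarity, the weight `lam`, and the function names are all assumptions made for this example, not the paper's method.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in for any real passage representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_passages(passages, query, k=2, lam=0.5):
    """Greedily pick k passages (hypothetical scoring, not from the paper).

    Each candidate is scored as lam * relevance - (1 - lam) * redundancy,
    where redundancy is its highest similarity to any passage already chosen.
    """
    vecs = [bag_of_words(p) for p in passages]
    qvec = bag_of_words(query)
    chosen = []
    while len(chosen) < min(k, len(passages)):
        best, best_score = None, -math.inf
        for i, v in enumerate(vecs):
            if i in chosen:
                continue
            redundancy = max((cosine(v, vecs[j]) for j in chosen), default=0.0)
            score = lam * cosine(v, qvec) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return [passages[i] for i in chosen]
```

With `lam=1.0` the redundancy term vanishes and the selection reduces to plain relevance ranking, so near-duplicate passages from different documents can both be picked. Lower values of `lam` penalize a candidate that repeats what is already in the summary.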

Book ChapterDOI
Ryan McDonald
02 Apr 2007
TL;DR: This work defines a general framework for inference in summarization and presents three algorithms: a greedy approximate method, a dynamic programming approach based on solutions to the knapsack problem, and an exact algorithm that uses an Integer Linear Programming formulation of the problem.
Abstract: In this work we study the theoretical and empirical properties of various global inference algorithms for multi-document summarization. We start by defining a general framework for inference in summarization. We then present three algorithms: The first is a greedy approximate method, the second a dynamic programming approach based on solutions to the knapsack problem, and the third is an exact algorithm that uses an Integer Linear Programming formulation of the problem. We empirically evaluate all three algorithms and show that, relative to the exact solution, the dynamic programming algorithm provides near optimal results with preferable scaling properties.

382 citations
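
The abstract above names a dynamic programming approach based on solutions to the knapsack problem. A minimal sketch of that general idea, under simplifying assumptions, is a plain 0/1 knapsack: each sentence has a relevance score and a length, and the summary must fit a length budget. This version ignores redundancy between sentences, which the paper's inference framework may treat differently; the function name, scores, and lengths are hypothetical.

```python
def knapsack_select(scores, lengths, budget):
    """Return sorted indices of sentences maximizing total score
    subject to total length <= budget (lengths are positive integers)."""
    n = len(scores)
    # best[i][b]: best total score using only the first i sentences
    # with at most b units of length.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if lengths[i - 1] <= b:
                # option 2: take sentence i-1 if it fits
                cand = best[i - 1][b - lengths[i - 1]] + scores[i - 1]
                if cand > best[i][b]:
                    best[i][b] = cand
    # Walk the table backwards to recover which sentences were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    return sorted(chosen)
```

For example, with scores `[3.0, 4.0, 5.0, 6.0]`, lengths `[2, 3, 4, 5]`, and a budget of 5, picking the single highest-scoring sentence yields a total of 6.0, while the table finds the pair of the first two sentences with a total of 7.0.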

Proceedings ArticleDOI
20 Apr 2009
TL;DR: The proposed methods are quite general and can be used to generate a rated aspect summary automatically given any collection of short comments, each associated with an overall rating.
Abstract: Web 2.0 technologies have enabled more and more people to freely comment on different kinds of entities (e.g. sellers, products, services). The large scale of information poses the need and challenge of automatic summarization. In many cases, each of the user-generated short comments comes with an overall rating. In this paper, we study the problem of generating a "rated aspect summary" of short comments, which is a decomposed view of the overall ratings for the major aspects so that a user could gain different perspectives towards the target entity. We formally define the problem and decompose the solution into three steps. We demonstrate the effectiveness of our methods by using eBay sellers' feedback comments. We also quantitatively evaluate each step of our methods and study how well humans agree on such a summarization task. The proposed methods are quite general and can be used to generate a rated aspect summary automatically given any collection of short comments, each associated with an overall rating.

381 citations

Patent
Yoshio Nakao
16 Jan 1998
TL;DR: In this patent, a focused information relevant portion extraction unit extracts a portion related to two types of focused information in a document to be summarized, i.e., user-focused information as information focused on by a user who uses a summary, and author-focused information as information emphasized by an author of the document.
Abstract: A document summarization apparatus or method summarizes an electronic document written in a natural language, and generates an appropriate summary depending on the user's focus and the user's knowledge. The document summarization apparatus according to the present invention includes, for example, a focused information relevant portion extraction unit, a summary readability improvement unit, and a summary generation unit. The focused information relevant portion extraction unit extracts a portion related to two types of focused information in a document to be summarized based on the two types of focused information, that is, user-focused information as information focused on by a user who uses a summary, and author-focused information as information emphasized by an author of the document to be summarized. In the document to be summarized, the summary readability improvement unit distinguishes user-known information already known to a user, and information known through an access log regarded as already known to a user based on a document previously presented to the user when a summary is generated, from information other than these two types, and selects an important portion in the document to be summarized. The summary generation unit generates the summary of the document to be summarized based on the selection result of the summary readability improvement unit. Thus, a summary that includes both user-focused information and author-focused information can be generated depending on the knowledge level of a user.

378 citations

Proceedings ArticleDOI
01 May 2004
TL;DR: The functionality of MEAD is described, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations.
Abstract: This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a search engine and to novelty detection.

378 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year | Papers
2023 | 74
2022 | 160
2021 | 52
2020 | 61
2019 | 47
2018 | 52