
Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: Demonstrates how four summarization task variants (general, query-focused, update, and comparative summarization) can be modeled as different versions derived from the proposed framework.

24 citations

Proceedings Article
23 Aug 2004
TL;DR: Introduces a large-scale test collection for multi-document summarization, the Text Summarization Challenge 3 (TSC3) corpus, which annotates not only the important sentences in a document set but also those among them that share the same content.
Abstract: In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. The significant feature of the corpus is that it annotates not only the important sentences in a document set, but also those among them that have the same content. Moreover, we define new evaluation metrics taking redundancy into account and discuss the effectiveness of redundancy minimization.

24 citations

Journal Article
TL;DR: This paper adapts the notion of risk minimization to extractive speech summarization by formulating the selection of summary sentences as a decision-making problem, and develops several selection strategies and modeling paradigms that leverage supervised and unsupervised summarization models to inherit their individual merits while overcoming their inherent limitations.
Abstract: Extractive speech summarization attempts to select a representative set of sentences from a spoken document so as to succinctly describe the main theme of the original document. In this paper, we adapt the notion of risk minimization for extractive speech summarization by formulating the selection of summary sentences as a decision-making problem. To this end, we develop several selection strategies and modeling paradigms that can leverage supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations. On top of that, various component models are introduced, providing a principled way to render the redundancy and coherence relationships among sentences and between sentences and the whole document, respectively. A series of experiments on speech summarization seems to demonstrate that the methods deduced from our summarization framework are very competitive with existing summarization methods.

24 citations

Proceedings Article
12 Aug 2012
TL;DR: This work proposes a new summarization approach based on query-specific facet selection: it uses machine learning to discover the important facets hidden behind a query, then summarizes retrieved documents based on those facets.
Abstract: As highly structured documents with rich metadata (such as products, movies, etc.) become increasingly prevalent, searching those documents has become an important IR problem. Unfortunately, existing work on document summarization, especially in the context of search, has mainly focused on unstructured documents, and little attention has been paid to highly structured documents. Due to the different characteristics of structured and unstructured documents, the ideal approaches for document summarization might be different. In this paper, we study the problem of summarizing highly structured documents in a search context. We propose a new summarization approach based on query-specific facet selection. Our approach aims to discover the important facets hidden behind a query using a machine learning approach, and summarizes retrieved documents based on those important facets. In addition, we propose to evaluate summarization approaches based on a utility function that measures how well the summaries assist users in interacting with the search results. Furthermore, we develop a game on Mechanical Turk to evaluate different summarization approaches. The experimental results show that the new summarization approach significantly outperforms two existing ones.

24 citations

Proceedings Article
01 Aug 2013
TL;DR: Develops a literature review framework based on a deconstruction of human-written literature review sections in information science research papers, then discusses insights from this analysis and how the framework can be adapted to produce automatic summaries resembling human-written literature reviews.
Abstract: This study is conducted in the area of multidocument summarization, and develops a literature review framework based on a deconstruction of human-written literature review sections in information science research papers. The first part of the study presents the results of a multi-level discourse analysis to investigate their discourse and content characteristics. These findings were incorporated into a framework for literature reviews, focusing on their macro-level document structure and sentence-level templates, as well as their information summarization strategies. The second part of this study discusses insights from this analysis, and how the framework can be adapted to automatic summaries resembling human-written literature reviews. Summaries generated from a partial implementation are evaluated against human-written summaries, and assessors’ comments are discussed to formulate recommendations for future work.

24 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
Number of papers on this topic in previous years:
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52