Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal ArticleDOI
TL;DR: It is found that, when summarizing gene-related literature, using GO, SNOMED-CT and HUGO to extract domain concepts results in significantly better summaries than using all available vocabularies in the UMLS.

19 citations

Proceedings ArticleDOI
19 Jul 2010
TL;DR: A novel entity-labeled corpus with temporal information is constructed from the TREC 2004 Novelty collection, and it is shown that an article's history can be exploited to improve its summarization.
Abstract: In this paper we study the problem of entity retrieval for news applications and the importance of the news trail history (i.e. past related articles) to determine the relevant entities in current articles. We construct a novel entity-labeled corpus with temporal information out of the TREC 2004 Novelty collection. We develop and evaluate several features, and show that an article's history can be exploited to improve its summarization.

19 citations
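As a rough illustration of the history idea in the entry above, entities in the current article can be scored higher when they also recur in past related articles. A minimal Python sketch, assuming entities have already been extracted; the function name and the linear weighting are illustrative and are not the paper's actual features:

```python
from collections import Counter

def entity_salience(current_entities, history_entities, alpha=0.7):
    """Score entities in the current article, boosting those that also
    appear in past related articles (the news-trail history).

    current_entities: entity strings from the current article
    history_entities: entity strings pooled from past related articles
    alpha: weight on the current article vs. its history (illustrative)
    """
    cur = Counter(current_entities)
    hist = Counter(history_entities)
    cur_total = sum(cur.values()) or 1
    hist_total = sum(hist.values()) or 1

    scores = {}
    for entity, count in cur.items():
        current_score = count / cur_total
        history_score = hist.get(entity, 0) / hist_total
        scores[entity] = alpha * current_score + (1 - alpha) * history_score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# "merger" is boosted because it recurs throughout the news trail.
print(entity_salience(
    ["Acme Corp", "merger", "CEO", "merger"],
    ["Acme Corp", "merger", "merger", "regulator"]))
```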

01 Jan 2005
TL;DR: It is demonstrated that Trimmer, a sentence-trimming approach, could be ported easily to multi-document summarization in the MSE2005 and DUC2005 tasks, although the direct impact of sentence trimming was minimal compared to other features used in the system.
Abstract: We implemented an initial application of a sentence-trimming approach (Trimmer) to the problem of multi-document summarization in the MSE2005 and DUC2005 tasks. Sentence trimming was incorporated into a feature-based summarization system, called Multi-Document Trimmer (MDT), by using sentence trimming as both a preprocessing stage and a feature for sentence ranking. We demonstrate that we were able to port Trimmer easily to this new problem. Although the direct impact of sentence trimming was minimal compared to other features used in the system, the interaction of the other features resulted in trimmed sentences accounting for nearly half of the selected summary sentences.

19 citations
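To illustrate the kind of feature-based sentence ranking the MDT entry above describes, the sketch below scores candidate sentences with a weighted sum of simple features, one of which records whether trimming changed the sentence. The features and weights are made up for illustration and are not the paper's:

```python
def rank_sentences(sentences, weights=None):
    """Rank candidate sentences by a weighted sum of simple features.
    Each candidate is a dict with the original and a trimmed variant.
    Features and weights are illustrative stand-ins, not MDT's."""
    weights = weights or {"position": 0.4, "length": 0.2, "trimmed": 0.4}

    def features(cand):
        trimmed = cand["trimmed"]
        return {
            "position": 1.0 / (1 + cand["position"]),          # earlier sentences score higher
            "length": min(len(trimmed.split()) / 25.0, 1.0),   # prefer reasonably full sentences
            "trimmed": 1.0 if trimmed != cand["original"] else 0.0,  # trimming was applied
        }

    scored = [(sum(weights[f] * v for f, v in features(c).items()), c["trimmed"])
              for c in sentences]
    return sorted(scored, reverse=True)

candidates = [
    {"position": 0,
     "original": "The committee, which met on Tuesday, approved the budget.",
     "trimmed": "The committee approved the budget."},
    {"position": 3,
     "original": "Officials said the vote was close.",
     "trimmed": "Officials said the vote was close."},
]
for score, sent in rank_sentences(candidates):
    print(f"{score:.2f}  {sent}")
```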

Proceedings ArticleDOI
01 Jun 2014
TL;DR: A model based on Bayesian surprise is presented which provides an intuitive way to identify surprising information from a summarization input with respect to a background corpus.
Abstract: In order to summarize a document, it is often useful to have a background set of documents from the domain to serve as a reference for determining new and important information in the input document. We present a model based on Bayesian surprise which provides an intuitive way to identify surprising information from a summarization input with respect to a background corpus. Specifically, the method quantifies the degree to which pieces of information in the input change one's beliefs about the world represented in the background. We develop systems for generic and update summarization based on this idea. Our method provides competitive content selection performance, with particular advantages in the update task, where systems are given a small and topical background corpus.

19 citations
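Bayesian surprise is conventionally computed as the KL divergence between a posterior and a prior belief. The sketch below does this for a word-multinomial model with a Dirichlet prior built from background word counts; it assumes numpy and scipy are available, and the vocabulary handling and smoothing are greatly simplified relative to the paper:

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(beta, alpha):
    """KL( Dir(beta) || Dir(alpha) ): how far the posterior beta has moved
    from the prior alpha."""
    beta0, alpha0 = beta.sum(), alpha.sum()
    return (gammaln(beta0) - gammaln(beta).sum()
            - gammaln(alpha0) + gammaln(alpha).sum()
            + ((beta - alpha) * (digamma(beta) - digamma(beta0))).sum())

def surprise(sentence_counts, background_counts, smoothing=0.01):
    """Bayesian surprise of a sentence: how much its word counts shift a
    Dirichlet belief estimated from background word counts."""
    alpha = background_counts + smoothing   # prior from the background corpus
    beta = alpha + sentence_counts          # posterior after observing the sentence
    return dirichlet_kl(beta, alpha)

# Toy vocabulary of 4 word types: the first sentence concentrates on a word
# type that is rare in the background, so it should register higher surprise.
background = np.array([50.0, 40.0, 30.0, 1.0])
novel_sentence = np.array([0.0, 1.0, 0.0, 3.0])
familiar_sentence = np.array([2.0, 1.0, 1.0, 0.0])
print("novel:", surprise(novel_sentence, background))
print("familiar:", surprise(familiar_sentence, background))
```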

Proceedings ArticleDOI
27 Oct 2013
TL;DR: A new approach to text modeling via network analysis is proposed, and a simple method based on influence analysis is shown to be effective compared with existing generative topic modeling and random-walk-based ranking.
Abstract: This paper studies text summarization by extracting hierarchical topics from a given collection of documents. We propose a new approach to text modeling via network analysis. We convert documents into a word influence network, and find the words summarizing the major topics with an efficient influence maximization algorithm. In addition, the influence of the topic words on other words in the network reveals the relations among the topic words. Then we cluster the words and build hierarchies for the topics. Experiments on large collections of Web documents show that a simple method based on influence analysis is effective compared with existing generative topic modeling and random-walk-based ranking.

19 citations
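A much-simplified sketch of the idea in the entry above: build a word network from sentence co-occurrence and greedily pick words whose neighborhoods cover the most of the network. Greedy maximum coverage here is a crude stand-in for the paper's influence-maximization algorithm, and the co-occurrence graph is an assumption rather than the paper's construction:

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(sentences):
    """Adjacency sets over words that co-occur in a sentence: a crude
    stand-in for the paper's word influence network."""
    graph = defaultdict(set)
    for sent in sentences:
        words = set(sent.lower().split())
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def greedy_topic_words(graph, k=3):
    """Greedily pick k words whose neighborhoods cover the most uncovered
    words: a maximum-coverage approximation of influence maximization."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(graph, key=lambda w: len((graph[w] | {w}) - covered))
        chosen.append(best)
        covered |= graph[best] | {best}
    return chosen

docs = [
    "solar panels convert sunlight into electricity",
    "wind turbines also generate electricity",
    "electricity storage needs better batteries",
    "batteries degrade over many charge cycles",
]
graph = build_cooccurrence_graph(docs)
print(greedy_topic_words(graph, k=2))
```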


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52