Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
13 Dec 2014
TL;DR: This paper proposes a new method to evaluate a sentence subset based on its capacity to reproduce term projections on the right singular vectors, and demonstrates the method's effectiveness on the DUC2002 and DUC2004 datasets.
Abstract: Multi-document summarization plays an increasingly important role given the exponential growth of documents on the web. Among traditional multi-document summarization techniques, latent semantic analysis (LSA) is unique due to its use of latent semantic information instead of the original features, which results in better performance. However, because LSA-based approaches evaluate and select sentences individually, none of them can remove redundant sentences. In this paper, we propose a new method to evaluate a sentence subset based on its capacity to reproduce term projections on the right singular vectors. Experiments on the DUC2002 and DUC2004 datasets validate the effectiveness of the proposed method.

9 citations
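To make the LSA machinery concrete, here is a minimal sketch of the classic LSA selection scheme this line of work builds on: build a term-sentence matrix, take its SVD, and for each leading right singular vector pick the sentence that loads most strongly on it. This illustrates the general LSA approach, not the paper's subset-evaluation criterion; the example sentences are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def lsa_summarize(sentences, num_sentences=2):
    # Term-sentence matrix A: rows are terms, columns are sentences.
    A = TfidfVectorizer().fit_transform(sentences).T.toarray()
    # Rows of Vt (right singular vectors) give each sentence's projection
    # onto the latent topics.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for k in range(min(num_sentences, Vt.shape[0])):
        # For the k-th latent topic, take the highest-loading unchosen sentence.
        for i in np.argsort(-np.abs(Vt[k])):
            if i not in chosen:
                chosen.append(i)
                break
    return [sentences[i] for i in sorted(chosen)]

sentences = [
    "The storm hit the coast on Monday.",
    "Coastal towns were evacuated before the storm arrived.",
    "Officials praised the early evacuation effort.",
]
print(lsa_summarize(sentences))
```

Because this baseline scores sentences one latent topic at a time, two near-duplicate sentences can both be selected, which is exactly the redundancy problem the paper's subset-level evaluation targets.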

Posted Content
TL;DR: In this article, the authors present a new dataset for multi-document summarization that is large both in the total number of document clusters and in the size of individual clusters, and they build this dataset by leveraging the Wikipedia Current Events Portal (WCEP), which provides concise and neutral human-written summaries of news events with links to external source articles.
Abstract: Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters. We build this dataset by leveraging the Wikipedia Current Events Portal (WCEP), which provides concise and neutral human-written summaries of news events, with links to external source articles. We also automatically extend these source articles by looking for related articles in the Common Crawl archive. We provide a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.

9 citations
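As a hedged illustration of the Common Crawl step, the sketch below queries the public Common Crawl CDX index for crawled captures of a source-article URL. The snapshot name is illustrative, and the paper's actual related-article search is more elaborate than a per-URL lookup; none of this is taken from the authors' pipeline.

```python
import json
import requests

# Illustrative snapshot; a real pipeline would search several crawls and then
# match candidate articles against the news event, which this sketch omits.
CDX_ENDPOINT = "https://index.commoncrawl.org/CC-MAIN-2020-16-index"

def find_captures(url, limit=5):
    # The CDX index returns one JSON record per crawled capture of the URL.
    resp = requests.get(
        CDX_ENDPOINT,
        params={"url": url, "output": "json", "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return [json.loads(line) for line in resp.text.splitlines() if line]

for record in find_captures("example.com/news/some-article"):
    print(record.get("timestamp"), record.get("url"))
```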

Journal ArticleDOI
TL;DR: This paper provides researchers with a comprehensive survey of DL-based abstractive summarization, highlights some open challenges in the abstractive summarization task, and outlines future research trends.
Abstract: With the rapid development of the Internet, web textual data has grown exponentially, bringing considerable challenges to downstream tasks such as document management, text classification, and information retrieval. Automatic text summarization (ATS) is becoming an extremely important means of addressing this problem. The core of ATS is to mine the gist of the original text and automatically generate a concise and readable summary. Recently, to better balance these two aspects, conciseness and readability, deep learning (DL)-based abstractive summarization models have been developed. At present, almost all state-of-the-art (SOTA) models for ATS tasks are based on DL architectures. However, a comprehensive literature survey is still lacking in the field of DL-based abstractive text summarization. To fill this gap, this paper provides researchers with a comprehensive survey of DL-based abstractive summarization. We first give an overview of abstractive summarization and DL. Then, we summarize several typical frameworks of abstractive summarization. After that, we compare several popular datasets commonly used for training, validation, and testing. We further analyze the performance of several typical abstractive summarization systems on common datasets. Finally, we highlight some open challenges in the abstractive summarization task and outline future research trends. We hope that these explorations will provide researchers with new insights into DL-based abstractive summarization.

9 citations
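For readers who want a concrete starting point, a typical DL-based abstractive summarizer can be driven in a few lines with Hugging Face Transformers. The BART checkpoint below is one common choice for illustration, not a model endorsed or evaluated by the survey.

```python
from transformers import pipeline

# Load a pretrained seq2seq summarizer (downloads the model on first use).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Automatic text summarization condenses a source document into a short, "
    "readable summary. Abstractive systems generate new sentences instead of "
    "copying them, and most state-of-the-art models are neural seq2seq networks."
)
result = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```

Unlike the extractive systems elsewhere on this page, the decoder here generates new sentences token by token, which is what makes the approach abstractive.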

01 Jan 2009
TL;DR: This paper introduces Parsumist, a text summarization system for Persian documents that exploits a combination of statistical, semantic, and heuristic-improved methods to generate generic or topic/query-driven extract summaries for single or multiple Persian documents.
Abstract: The rapid growth of online information services has created the problem of information explosion. Automatic text summarization techniques are essential for dealing with this problem. Text summarization is the process of compacting a source document to reduce its complexity and length while retaining its most important content. This paper introduces Parsumist, a text summarization system for Persian documents. It exploits a combination of statistical, semantic, and heuristic-improved methods, and can generate generic or topic/query-driven extract summaries for single or multiple Persian documents. We first review related work in this field, especially for Persian text summarization. We then present the architecture of Parsumist, its components, and its features. The last section evaluates the system and compares it with existing systems.

9 citations
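As a rough illustration of the statistical side of such a system, the sketch below scores sentences by average word frequency (a Luhn-style heuristic). Parsumist itself combines this kind of signal with semantic and heuristic features that are not reproduced here, and a real Persian system would need language-aware tokenization.

```python
import re
from collections import Counter

def frequency_extract(text, num_sentences=1):
    # Naive sentence split; Persian punctuation would need its own rules.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sent):
        # Average corpus frequency of the sentence's tokens.
        tokens = re.findall(r"\w+", sent.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Emit the selected sentences in their original document order.
    return [s for s in sentences if s in top]

text = ("The flood damaged the old bridge. The bridge had stood for a century. "
        "Repairs to the bridge will start next month.")
print(frequency_extract(text, num_sentences=1))
```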

Posted Content
TL;DR: A new method for summarizing similarities and differences in a pair of related documents using a graph representation of text, with a spreading activation technique to discover nodes semantically related to the topic.
Abstract: We describe a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text. Concepts denoted by words, phrases, and proper names in the document are represented positionally as nodes in the graph along with edges corresponding to semantic relations between items. Given a perspective in terms of which the pair of documents is to be summarized, the algorithm first uses a spreading activation technique to discover, in each document, nodes semantically related to the topic. The activated graphs of each document are then matched to yield a graph corresponding to similarities and differences between the pair, which is rendered in natural language. An evaluation of these techniques has been carried out.

9 citations
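The spreading activation step can be pictured with a toy graph: activation starts at topic nodes and decays as it flows along edges, so concepts reachable through well-connected paths end up highly activated. The adjacency map below is invented for illustration; the paper's graphs encode words, phrases, and proper names positionally, with typed semantic relations on the edges.

```python
def spread_activation(graph, seeds, decay=0.5, iterations=3):
    # graph: node -> list of neighboring nodes; seeds: topic nodes.
    activation = {node: 0.0 for node in graph}
    for s in seeds:
        activation[s] = 1.0
    for _ in range(iterations):
        incoming = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if activation[node] > 0 and neighbors:
                # Each node passes a decayed share of its activation
                # equally to its neighbors.
                share = decay * activation[node] / len(neighbors)
                for nb in neighbors:
                    incoming[nb] += share
        for node in graph:
            activation[node] += incoming[node]
    return activation

graph = {
    "storm": ["coast", "evacuation"],
    "coast": ["storm", "towns"],
    "evacuation": ["storm", "officials"],
    "towns": ["coast"],
    "officials": ["evacuation"],
}
print(spread_activation(graph, seeds=["storm"]))
```

Running the activated graphs of two documents through a matching step, as the abstract describes, would then surface which highly activated concepts are shared and which are unique to each document.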


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 74
2022: 160
2021: 52
2020: 61
2019: 47
2018: 52