Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
Journal ArticleDOI
TL;DR: A new image summarization algorithm is proposed for automatically summarizing a user's snapshot photos taken in a virtual environment, based on the user's context information and educational content, and for presenting the summarized photos shortly after the user's virtual reality experience.
Abstract: In this paper, we propose a new image summarization algorithm designed to automatically summarize a user's snapshot photos taken in a virtual environment, based on the user's context information and educational content, and to present the summarized photos shortly after the user's virtual reality experience. While other image summarization algorithms use date, location, and keywords to summarize large collections of photos effectively, this algorithm is intended to improve users' memory retention by recalling their interests and important educational content. The paper first describes criteria for extracting meaningful images to improve learning effects and the identification rate calculations, followed by the system architecture that integrates the virtual environment and the viewer interface. It also discusses a user study to model the algorithm's optimal identification rate, and outlines future research directions. Key Words: Image Summarization Algorithm, Memory Improvement, Educational Virtual Environment
Journal ArticleDOI
TL;DR: This survey focuses on some of the existing techniques of statistical document summarization, as well as summarization using semantic approaches, and discusses improvements that can be made to extractive text summarization.
Abstract: Generating a summary from text has been a key research area in recent years. Automatic text summarization reduces the human effort required to produce a summary from text document(s) with the help of computer programs. Various approaches, methods, and systems have been suggested and developed to date. This survey focuses on some of the existing techniques of statistical document summarization, as well as summarization using semantic approaches, and discusses improvements that can be made to extractive text summarization.
Posted Content
TL;DR: The authors adopt the Longformer architecture with appropriate input transformation and global attention to fit multi-document inputs, and use a Gap Sentence Generation objective with a new salient-sentence selection strategy for the whole cluster, called Entity Pyramid, to teach the model to select and aggregate information across a cluster of related documents.
Abstract: Recently proposed pre-trained generation models achieve strong performance on single-document summarization benchmarks. However, most of them are pre-trained with general-purpose objectives and mainly aim to process single-document inputs. In this paper, we propose PRIMER, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data. Specifically, we adopt the Longformer architecture with appropriate input transformation and global attention to fit multi-document inputs, and we use a Gap Sentence Generation objective with a new strategy, called Entity Pyramid, to select salient sentences for the whole cluster, teaching the model to select and aggregate information across a cluster of related documents. With extensive experiments on 6 multi-document summarization datasets from 3 different domains in the zero-shot, few-shot, and fully-supervised settings, our model, PRIMER, outperforms current state-of-the-art models in most of these settings by large margins. Code and pre-trained models are released at this https URL
Proceedings ArticleDOI
01 Jan 2022
TL;DR: Automatic evaluation indicates that removing straplines and noise from the training data of a news summarizer results in higher-quality summaries, with improvements as high as 7 ROUGE points.
Abstract: Recent improvements in automatic news summarization fundamentally rely on large corpora of news articles and their summaries. These corpora are often constructed by scraping news websites, which results in including not only summaries but also other kinds of texts. Apart from more generic noise, we identify straplines as a form of text scraped from news websites that commonly turns out not to be a summary. The presence of these non-summaries threatens the validity of scraped corpora as benchmarks for news summarization. We have annotated extracts from two news sources that form part of the Newsroom corpus (Grusky et al., 2018), labeling those which were straplines, those which were summaries, and those which were both. We present a rule-based strapline detection method that achieves good performance on a manually annotated test set. Automatic evaluation indicates that removing straplines and noise from the training data of a news summarizer results in higher-quality summaries, with improvements as high as 7 ROUGE points.
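The ROUGE improvement reported above is typically measured with the official ROUGE toolkit; as a hedged illustration of what the metric captures, here is a minimal ROUGE-1 F1 sketch (a simplification: real evaluations add stemming, stopword handling, and multiple references):

```python
# Simplified ROUGE-1 F1: unigram overlap between a candidate summary and a
# reference summary. This is an illustrative sketch, not the official scorer.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram overlap: each word counts at most as often as it
    # appears in the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge1_f1("the cat sat", "the cat sat on the mat")` gives precision 1.0 and recall 0.5, so an F1 of about 0.67; a 7-point improvement on this 0–100-scaled metric is a substantial gain.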
Journal Article
TL;DR: This article proposed a hybrid model for sentence ordering in extractive multi-document summarization that combines four relations between sentences, treating each sentence as a vertex and the combined relation as an edge of a directed graph on which an approximately optimal ordering can be generated.
Abstract: Ordering information is a critical task for multi-document summarization because it heavily influences the coherence of the generated summary. In this paper, we propose a hybrid model for sentence ordering in extractive multi-document summarization that combines four relations between sentences. This model treats each sentence as a vertex and the combined relation as an edge of a directed graph, on which an approximately optimal ordering can be generated with PageRank analysis. Evaluation of our hybrid model shows a significant improvement in ordering over strategies that omit some of the relations, and the results also indicate that the hybrid model is robust for articles of different genres.
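The graph-based ordering idea above can be sketched as follows. This is a hypothetical toy implementation, not the paper's code: the combined-relation matrix and its weights are invented for illustration, and a plain power-iteration PageRank ranks the sentence vertices.

```python
# Hypothetical sketch: sentences are vertices, the combined pairwise relation
# score is the weighted edge, and power-iteration PageRank yields a ranking
# from which an ordering is read off. Illustrative only.

def pagerank(weights, damping=0.85, iters=50):
    """weights[i][j] = edge weight from sentence i to sentence j."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) or 1.0 for row in weights]  # avoid division by zero
    for _ in range(iters):
        new = []
        for j in range(n):
            # Mass flowing into vertex j from every other vertex.
            rank = sum(scores[i] * weights[i][j] / out_sums[i] for i in range(n))
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores

# Toy combined-relation matrix for three extracted sentences (made-up values).
w = [
    [0.0, 0.8, 0.2],
    [0.1, 0.0, 0.9],
    [0.5, 0.5, 0.0],
]
scores = pagerank(w)
ordering = sorted(range(len(w)), key=lambda i: scores[i], reverse=True)
```

In the paper the edge weights combine four inter-sentence relations; here a single made-up matrix stands in for that combination.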

Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year	Papers
2023	74
2022	160
2021	52
2020	61
2019	47
2018	52