Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Posted Content · DOI
24 Oct 2022
TL;DR: The LANS dataset, as presented in this paper, is a large-scale and diverse dataset for Arabic text summarization; it contains 8.4 million articles and their summaries extracted from newspaper website metadata between 1999 and 2019.
Abstract: Text summarization has been intensively studied in many languages, and some languages have reached advanced stages. Yet, Arabic Text Summarization (ATS) is still in its developing stages. Existing ATS datasets are either small or lack diversity. We build LANS, a large-scale and diverse dataset for the Arabic Text Summarization task. LANS offers 8.4 million articles and their summaries, extracted from newspaper websites' metadata between 1999 and 2019. The high-quality and diverse summaries are written by journalists from 22 major Arab newspapers and cover more than seven topics from each source. We conduct an intrinsic evaluation on LANS through both automatic and human evaluations. Human evaluation of 1000 random samples reports 95.4% accuracy for our collected summaries, and automatic evaluation quantifies the diversity and abstractness of the summaries. The dataset is publicly available upon request.
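The abstract says automatic evaluation quantifies the diversity and abstractness of the collected summaries but does not spell out the metric here. A common proxy for abstractness is the fraction of summary n-grams that never appear in the source article; the sketch below illustrates that idea, with function names and example strings that are purely illustrative and not taken from LANS.

```python
from typing import List, Set

def ngrams(tokens: List[str], n: int) -> Set[tuple]:
    """Return the set of n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_ratio(article: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams that never appear in the article.
    Higher values suggest a more abstractive (less copy-based) summary."""
    art = ngrams(article.split(), n)
    summ = ngrams(summary.split(), n)
    if not summ:
        return 0.0
    return len(summ - art) / len(summ)

# Hypothetical example strings, not drawn from the LANS dataset:
article = "the council approved the new budget after a long debate on spending"
summary = "lawmakers passed the budget following lengthy spending talks"
print(round(novel_ngram_ratio(article, summary, n=2), 2))
```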
Proceedings Article · DOI
18 Jul 2022
TL;DR: This article proposed a multi-view extractive text summarization approach for long scientific texts, which treated extractive summarization as a binary classification problem, and evaluated a view fusion method that is independent of the classification model.
Abstract: Automatic text summarization aims to generate a simple and descriptive summary covering the main content of a document. Most of the literature in automatic text summarization focuses on proposing and improving Deep Learning methods in order to make these models applicable in the context of long text summarization. Unfortunately, these models still have limitations on the input sequence length. Such a limitation may lead to a loss of information that impairs the quality of the generated summaries. For this reason, we propose a Multi-View Extractive Text Summarization approach for long scientific texts. Our hypothesis is that obtaining a better representation of the text is key to this task. We treat extractive summarization as a binary classification problem. We propose and evaluate a view fusion method that is independent of the classification model. We show that using this strategy we generate more consistent models with a small score variance. Our experiments with real-world articles show that our approach outperforms recent state-of-the-art models in extractive, abstractive, and hybrid summarization.
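As a rough illustration of treating extractive summarization as binary classification with a fusion step that is independent of the classifier, the sketch below concatenates two hypothetical per-sentence "views" and trains a logistic-regression selector. The view names, dimensions, and random data are assumptions for demonstration; the paper's actual encoders and fusion method may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: each sentence is described by two "views"
# (e.g., a lexical embedding and a contextual embedding).
rng = np.random.default_rng(0)
n_sentences, d_lex, d_ctx = 200, 64, 32
view_lexical = rng.normal(size=(n_sentences, d_lex))
view_context = rng.normal(size=(n_sentences, d_ctx))
labels = rng.integers(0, 2, size=n_sentences)  # 1 = sentence belongs in the summary

def fuse_views(*views: np.ndarray) -> np.ndarray:
    """Model-agnostic fusion: concatenate per-sentence view vectors."""
    return np.concatenate(views, axis=1)

X = fuse_views(view_lexical, view_context)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Rank sentences by the predicted probability of being summary-worthy
# and keep the top k as the extractive summary.
scores = clf.predict_proba(X)[:, 1]
top_k = np.argsort(scores)[::-1][:5]
print("selected sentence indices:", top_k)
```

Because the fusion step only produces a fixed-length vector per sentence, any classifier could be swapped in for the logistic regression without changing the fusion code.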
Posted Content · DOI
19 May 2023
TL;DR: The authors propose a unified topic encoder that jointly discovers latent topics from the document and various kinds of side information; the learned topics guide the information flow in a graph encoder through a topic-aware interaction, and a triplet contrastive learning mechanism aligns the single- and multi-modal information into a unified semantic space.
Abstract: Automatic summarization plays an important role in the exponential document growth on the Web. On content websites such as CNN.com and WikiHow.com, there often exist various kinds of side information along with the main document for attention attraction and easier understanding, such as videos, images, and queries. Such information can be used for better summarization, as it often explicitly or implicitly mentions the essence of the article. However, most existing side-aware summarization methods are designed to incorporate either single-modal or multi-modal side information, and cannot effectively adapt from one setting to the other. In this paper, we propose a general summarization framework that can flexibly incorporate various modalities of side information. The main challenges in designing a flexible summarization model with side information are: (1) the side information can be in textual or visual format, and the model needs to align and unify it with the document in the same semantic space; (2) the side inputs can contain information from various aspects, and the model should recognize the aspects useful for summarization. To address these two challenges, we first propose a unified topic encoder, which jointly discovers latent topics from the document and various kinds of side information. The learned topics flexibly bridge and guide the information flow between multiple inputs in a graph encoder through a topic-aware interaction. Second, we propose a triplet contrastive learning mechanism to align the single-modal or multi-modal information into a unified semantic space, where summary quality is enhanced by a better understanding of the document and side information. Results show that our model significantly surpasses strong baselines on three public single-modal or multi-modal benchmark summarization datasets.
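A minimal sketch of the triplet contrastive idea, assuming PyTorch and hypothetical projection heads: the projected document embedding serves as the anchor, the matching side information as the positive, and side information from another document as the negative. This illustrates the general mechanism only, not the authors' implementation, dimensions, or training setup.

```python
import torch
import torch.nn as nn

# Hypothetical projection heads mapping document text and visual side
# information into a shared semantic space of dimension d_shared.
d_text, d_visual, d_shared = 768, 512, 256
text_proj = nn.Linear(d_text, d_shared)
visual_proj = nn.Linear(d_visual, d_shared)
triplet = nn.TripletMarginLoss(margin=1.0)

batch = 8
doc_emb = torch.randn(batch, d_text)       # anchor: document representation
pos_side = torch.randn(batch, d_visual)    # positive: matching side information
neg_side = torch.randn(batch, d_visual)    # negative: side info from another document

anchor = text_proj(doc_emb)
positive = visual_proj(pos_side)
negative = visual_proj(neg_side)

# Pull matching document/side pairs together, push mismatched pairs apart.
loss = triplet(anchor, positive, negative)
loss.backward()
print(float(loss))
```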
Proceedings Article · DOI
01 Aug 2007
TL;DR: This paper proposes a subtopic-focused model that scores sentences for extractive summarization via a hierarchical Bayesian model; the highest-scoring sentences are extracted as the summary.
Abstract: In previous work on multi-document summarization, subtopics are seldom considered; summaries are typically extracted with a focus on a single topic. In this paper, we propose a subtopic-focused model to score sentences in the extractive summarization task. Unlike supervised methods, it does not require costly manual work to build a training set. Multiple documents are represented as mixtures over subtopics, each denoted by a term distribution learned without supervision. Our method learns the subtopic distribution over sentences via a hierarchical Bayesian model, through which sentences are scored and extracted as the summary. Experiments on DUC 2006 data are performed, and the ROUGE evaluation results show that the proposed method reaches state-of-the-art performance.
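As an accessible stand-in for the hierarchical Bayesian subtopic model, which the abstract does not fully specify, the sketch below uses plain LDA over sentences to obtain per-sentence subtopic mixtures and then scores sentences by the prominence of the subtopics they carry. The sentences, the number of subtopics, and the scoring rule are assumptions for illustration, not the authors' method.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

sentences = [
    "the storm caused widespread flooding in coastal towns",
    "rescue teams evacuated residents from flooded neighborhoods",
    "the city council debated the annual transportation budget",
    "officials estimated storm damage at several million dollars",
    "the budget vote was postponed until next month",
]

# Treat each sentence as a short "document" and learn k subtopics.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(sentences)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)              # per-sentence subtopic mixture

# Weight each subtopic by its overall prominence across the sentence set,
# then score a sentence by how much prominent-subtopic mass it carries.
topic_weight = theta.sum(axis=0) / theta.sum()
scores = theta @ topic_weight

ranked = np.argsort(scores)[::-1]
print("extractive summary:", [sentences[i] for i in ranked[:2]])
```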
Proceedings Article · DOI
01 Jan 2009
TL;DR: An interactive summarization framework called iWISE is proposed to facilitate this process by providing a summary of the information on a Web site, making use of graphical visualization, tag clouds, and text summarization.
Abstract: The World Wide Web (WWW) has become a huge repository of information and knowledge, and an essential channel for information exchange. Many sites and thousands of pages of information on distributed power generation and alternate energy development are being added or modified constantly, and the task of finding the most appropriate information is getting difficult. While search engines are capable of returning a collection of links according to key terms and some form of ranking mechanism, it is still necessary to access the Web page and navigate through the site in order to find the information. This paper proposes an interactive summarization framework called iWISE to facilitate the process by providing a summary of the information on the Web site. The proposed approach makes use of graphical visualization, tag clouds, and text summarization. A number of cases are presented and compared in this paper, with a discussion of future work.
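A toy sketch of two of the ingredients the framework combines, tag-cloud term frequencies and a simple frequency-based summary sentence, assuming plain Python. The page text, stop-word list, and scoring rule below are illustrative assumptions and are not taken from iWISE.

```python
import re
from collections import Counter

page_text = (
    "Distributed generation places small power sources close to the point of use. "
    "Alternate energy sources such as solar and wind are common in distributed generation. "
    "Web pages on distributed generation describe technology, policy, and economics."
)

STOP = {"the", "of", "to", "and", "such", "as", "are", "in", "on", "a", "is"}
words = [w for w in re.findall(r"[a-z]+", page_text.lower()) if w not in STOP]
freq = Counter(words)

# Tag cloud: the most frequent terms, with counts standing in for font size.
print("tag cloud:", freq.most_common(5))

# One-line "summary": the sentence whose words have the highest total frequency.
sentences = [s.strip() for s in page_text.split(".") if s.strip()]
best = max(sentences, key=lambda s: sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())))
print("summary sentence:", best)
```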

Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 74
2022: 160
2021: 52
2020: 61
2019: 47
2018: 52