scispace - formally typeset
Topic

Multi-document summarization

About: Multi-document summarization is a research topic concerned with automatically producing a single summary from a group of related documents. Over the lifetime of the topic, 2,270 publications have been published, receiving 71,850 citations.


Papers
Posted Content
TL;DR: This paper proposes a topic-centric unsupervised multi-document summarization framework to generate extractive and abstractive summaries for groups of scientific articles across 20 Fields of Study in Microsoft Academic Graph (MAG) and news articles from DUC-2004 Task 2.
Abstract: Recent advances in natural language processing have enabled automation of a wide range of tasks, including machine translation, named entity recognition, and sentiment analysis. Automated summarization of documents, or groups of documents, however, has remained elusive, with many efforts limited to extraction of keywords, key phrases, or key sentences. Accurate abstractive summarization has yet to be achieved due to the inherent difficulty of the problem, and limited availability of training data. In this paper, we propose a topic-centric unsupervised multi-document summarization framework to generate extractive and abstractive summaries for groups of scientific articles across 20 Fields of Study (FoS) in Microsoft Academic Graph (MAG) and news articles from DUC-2004 Task 2. The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques. Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics (entailment, coherence, conciseness, readability, and grammar). We achieve a kappa score of 0.68 between two co-author linguists who evaluated our results. We plan to publicly share MAG-20, a human-validated gold standard dataset of topic-clustered research articles and their summaries to promote research in abstractive summarization.
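The salient-language-unit selection step described above can be approximated with a simple frequency-based extractive baseline. This is only a sketch: the paper's actual algorithm is topic-centric and unsupervised, and the function below is a naive stand-in.

```python
import re
from collections import Counter

def extractive_summary(documents, num_sentences=3):
    """Score sentences by the collection-wide frequency of their words,
    then keep the top scorers. A naive stand-in for the paper's
    salient-language-unit selection."""
    sentences = []
    for doc in documents:
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip())
    counts = Counter(w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))
    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(counts[t] for t in tokens) / (len(tokens) or 1)
    ranked = sorted(sentences, key=score, reverse=True)
    chosen = set(ranked[:num_sentences])
    # Emit selected sentences in their original order for readability
    return [s for s in sentences if s in chosen]
```

A real system would add topic clustering and a text-generation stage for the abstractive summary; this only illustrates the extractive selection idea.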

3 citations

Proceedings ArticleDOI
01 Jan 2022
TL;DR: The authors propose a simple approach to reorder the documents according to their relative importance before concatenating and summarizing them, which makes the salient content easier to learn by the summarization model.
Abstract: A common method for extractive multi-document news summarization is to re-formulate it as a single-document summarization problem by concatenating all documents as a single meta-document. However, this method neglects the relative importance of documents. We propose a simple approach to reorder the documents according to their relative importance before concatenating and summarizing them. The reordering makes the salient content easier to learn by the summarization model. Experiments show that our approach outperforms previous state-of-the-art methods with more complex architectures.
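The reordering idea can be sketched as follows. The importance heuristic here (vocabulary overlap with the rest of the collection) is an assumption for illustration; the paper's actual importance estimate may differ.

```python
from collections import Counter

def overlap_importance(documents):
    """Toy importance score: a document sharing more vocabulary with the
    rest of the collection is treated as more important (assumed heuristic)."""
    vocab = [Counter(d.lower().split()) for d in documents]
    total = Counter()
    for c in vocab:
        total.update(c)
    scores = []
    for c in vocab:
        rest = total - c  # word counts from all the other documents
        scores.append(sum(min(c[w], rest[w]) for w in c))
    return scores

def reorder_and_concatenate(documents):
    """Reorder documents by importance (highest first) before concatenating
    them into a single meta-document, so salient content appears early."""
    scores = overlap_importance(documents)
    ranked = sorted(zip(scores, range(len(documents))), key=lambda p: p[0], reverse=True)
    return "\n\n".join(documents[i] for _, i in ranked)
```

The concatenated meta-document can then be fed to any single-document summarizer, as the abstract describes.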

3 citations

Proceedings ArticleDOI
23 Aug 2017
TL;DR: This paper presents a graph-based algorithm that is capable of producing extractive summaries that are both diversified from a sentiment point of view and topically well-covered and shows improvements in ROUGE metrics.
Abstract: With the abundance of reviews published on the Web about a given product, consumers are looking for ways to view major opinions in a quick and succinct way. Reviews contain many different opinions, making a diversified review summary that covers both breadth and diversity of opinion a major goal. Most review summarization work focuses on showing salient reviews as a summary, which can ignore diversity. In this paper, we present a graph-based algorithm that is capable of producing extractive summaries that are both diversified from a sentiment point of view and topically well-covered. First, we use statistical measures to find topical words. Then we split the dataset based on the sentiment class of the reviews and perform the ranking on each sentiment graph. When compared with different baselines, our approach scores best in most ROUGE metrics. Specifically, our approach shows improvements of 3.9% in ROUGE-1 and 1.8% in ROUGE-L in comparison with the best competing baseline.
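The split-then-rank step can be sketched as below. Degree centrality over a Jaccard-similarity graph stands in for the paper's graph ranking, which is an assumption; the sentiment labels are taken as given.

```python
from collections import defaultdict

def similarity(a, b):
    """Jaccard word overlap between two reviews."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (len(wa | wb) or 1)

def diversified_summary(reviews, k_per_class=1):
    """Split reviews by sentiment class, rank each class's reviews by
    centrality in its own similarity graph, and take the top of each class,
    so the summary covers both positive and negative opinions.
    (Degree centrality is a stand-in for the paper's graph ranking.)"""
    by_class = defaultdict(list)
    for text, sentiment in reviews:
        by_class[sentiment].append(text)
    summary = []
    for sentiment, texts in by_class.items():
        centrality = [sum(similarity(t, u) for u in texts if u is not t) for t in texts]
        ranked = sorted(zip(centrality, texts), key=lambda p: p[0], reverse=True)
        summary.extend(t for _, t in ranked[:k_per_class])
    return summary
```

Taking the top reviews from each sentiment graph, rather than one global ranking, is what enforces the sentiment diversity the abstract describes.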

3 citations

Dissertation
01 Jan 2014
TL;DR: This work extends a few existing unsupervised topic models such as Latent Dirichlet Allocation (LDA) to model documents which are annotated from two different perspectives and shows that using topic models it is possible to outperform keyword summaries generated by annotating videos through state-of-the-art object recognition techniques from computer vision.
Abstract: Probabilistic topic models have recently become the cornerstone of unsupervised exploratory analysis of text documents using Bayesian statistics. The strength of the models lies in their modularity—random variables can be introduced or modified to suit the requirements of different applications. Many of these models, however, consider only one particular view of the observations, such as treating documents as a flat collection of words, ignoring the nuances of the different classes of annotations which may be present in an implicit and/or explicit form. We extend a few existing unsupervised topic models such as Latent Dirichlet Allocation (LDA) to model documents which are annotated from two different perspectives. The perspectives consist of both word-level tag annotation (e.g., part-of-speech, affect, position) and document-level highlighting (e.g., crowd-sourced document labels, captions of embedded multimedia). The new models are dubbed the Tag2LDA class of models, whose primary goal is to combine the best aspects of supervised and unsupervised learning under one framework. Additionally, the correspondence class of Tag2LDA models explored in this context is state-of-the-art among the family of parametric tag-topic models in terms of predictive log likelihoods. These models are presented in Chapter 4. The field of automatic summary generation is increasingly gaining traction, and there is a steady rise in demand for summarization algorithms that are applicable to a wide variety of genres of text and other kinds of data as well (e.g., video). Producing short summaries in a human-readable form is very attractive, particularly for very large datasets. However, the problem is NP-hard even for smaller domains such as summarizing small sets of newswire documents. We use the Tag2LDA class of models in conjunction with local models (e.g., extracting syntactic and semantic roles of words, Rhetorical Structure trees, etc.) to do multi-document summarization of text documents based on information needs that are guided by a common information model. The guided summarization task, as laid out in recent text summarization competitions, aims to cover information needs by asking questions like "who did what, when, and where?" We have also successfully applied multi-modal topic models to summarize domain-specific videos into natural language text directly from low-level features extracted from the videos. The experiments performed for this task are described in detail in Chapter 5. Finally, in Chapter 6, we show that using topic models it is possible to outperform keyword summaries generated by annotating videos through state-of-the-art object recognition techniques from computer vision. Summarizing a video in terms of natural language generated from such keywords in context removes the laborious frame-by-frame drawing of bounding boxes around objects of interest—a scheme which is required for annotating videos to train a large number of object detectors. The topic models that we develop for this purpose instead use easily available short lingual descriptions of entire videos to predict text for a given domain-specific test video. The models are also novel in handling both text and video features, particularly with regard to multimedia topic discovery from captioned videos whose features can belong to both discrete and real-valued domains.
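The base model the thesis extends is vanilla LDA, which can be inferred with a minimal collapsed Gibbs sampler like the one below. This is a sketch of plain LDA only; Tag2LDA adds tag and label variables on top of this structure.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for vanilla LDA.
    docs: list of token lists. Returns per-document topic counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})            # vocabulary size
    z = [[rng.randrange(num_topics) for _ in d] for d in docs]  # token topics
    ndk = [[0] * num_topics for _ in docs]           # doc-topic counts
    nkw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                            # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove this token's current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Full conditional p(z_i = t | rest)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk
```

The modularity the abstract mentions shows up here concretely: adding a tag or label view means adding further count tables and factors to the full conditional.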

3 citations

Journal ArticleDOI
TL;DR: This paper presents how the emotional dimensions issued from real viewers can be used as an important input for computing which part is the most interesting in the total time of a film.
Abstract: Many viewers would rather watch a summary of a film than spend time on the whole film. Traditionally, a video film was analyzed manually to produce a summary, but this requires a significant amount of work. It has therefore become important to provide a tool for automatic video summarization, which aims to extract all of the important moments in which viewers might be interested. The summarization criteria can differ from one video to another. This paper presents how emotional dimensions obtained from real viewers can be used as an important input for computing which part of a film is the most interesting. Our results, based on lab experiments, are significant and promising.
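Given per-second emotion intensities collected from viewers, selecting the "most interesting" part can be sketched as a sliding-window maximum. The fixed window length and the simple sum are assumptions for illustration; the paper derives its criteria from real-viewer emotional dimensions.

```python
def most_interesting_segment(emotion_scores, window=5):
    """Return the start index of the window with the highest total
    emotional response, given one intensity score per time unit."""
    best_start = 0
    best_sum = sum(emotion_scores[:window])
    current = best_sum
    for start in range(1, len(emotion_scores) - window + 1):
        # Slide the window: add the entering score, drop the leaving one
        current += emotion_scores[start + window - 1] - emotion_scores[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start
```

A full summarizer would select several non-overlapping segments; this shows only the core scoring step.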

3 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52