Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
01 Oct 2013
TL;DR: A Bayesian nonparametric model for multi-document summarization is proposed in order to automatically determine the proper lengths of summaries; assuming that an original document can be reconstructed from its summary, the "reconstruction" is described by a Bayesian framework which selects sentences to form a good summary.
Abstract: Document summarization is an important task in the area of natural language processing, which aims to extract the most important information from a single document or a cluster of documents. In various summarization tasks, the summary length is manually defined. However, finding the proper summary length is quite a problem, and keeping all summaries restricted to the same length is not always a good choice. It is clearly improper to generate summaries of the same length for two clusters of documents that contain quite different quantities of information. In this paper, we propose a Bayesian nonparametric model for multi-document summarization in order to automatically determine the proper lengths of summaries. Assuming that an original document can be reconstructed from its summary, we describe the "reconstruction" by a Bayesian framework which selects sentences to form a good summary. Experimental results on the DUC 2004 data sets and some expanded data demonstrate the good quality of our summaries and the rationality of the length determination.
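The reconstruction idea can be made concrete with a short sketch: greedily add sentences while they still improve how well the summary's term vector reconstructs the document's term vector, and stop when the marginal gain is small, so the summary length falls out of the data rather than being fixed in advance. This is a minimal illustration in plain term-vector space, not the paper's Bayesian nonparametric model; the whitespace tokenizer, the min_gain threshold, and the greedy search are our assumptions.

```python
import numpy as np
from collections import Counter

def term_vector(text, vocab):
    """Raw term-count vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def reconstruction_summary(sentences, min_gain=0.05):
    """Greedy length-adaptive extraction: keep adding the sentence that most
    improves cosine(summary vector, document vector); stop when no candidate
    adds at least `min_gain`, which fixes the summary length automatically."""
    doc = " ".join(sentences)
    vocab = sorted(set(doc.lower().split()))
    doc_vec = term_vector(doc, vocab)
    doc_vec /= np.linalg.norm(doc_vec)

    summary, covered = [], np.zeros(len(vocab))
    current = 0.0  # cosine similarity achieved by the summary so far
    while True:
        best, best_sim = None, current
        for s in sentences:
            if s in summary:
                continue
            cand = covered + term_vector(s, vocab)
            sim = cand @ doc_vec / np.linalg.norm(cand)
            if sim > best_sim:
                best, best_sim = s, sim
        if best is None or best_sim - current < min_gain:
            break  # marginal reconstruction gain too small: stop here
        summary.append(best)
        covered += term_vector(best, vocab)
        current = best_sim
    return summary
```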

5 citations

Book Chapter (DOI)
01 Jan 2022
TL;DR: This chapter presents a comprehensive comparison of several transformer architecture-based pretrained models for text summarization, using the BBC news dataset, which contains articles for summarization together with human-generated summaries for evaluating and comparing the summaries generated by the machine learning models.
Abstract: The amount of text data available online is increasing at a very fast pace; hence, text summarization has become essential. Most modern recommender and text classification systems require going through a huge amount of data. Manually generating precise and fluent summaries of lengthy articles is a tiresome and time-consuming task. Hence, generating automated summaries for the data and using them to train machine learning models will make these models space and time efficient. Extractive summarization and abstractive summarization are two separate methods of generating summaries. The extractive technique identifies the relevant sentences in the original document and extracts only those from the text, whereas abstractive summarization techniques generate the summary after interpreting the original text, which makes them more complicated. In this paper, we present a comprehensive comparison of a few transformer architecture-based pretrained models for text summarization. For analysis and comparison, we use the BBC news dataset, which contains text data for summarization and human-generated summaries for evaluating and comparing the summaries generated by the machine learning models.
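As a concrete illustration of this kind of comparison, the sketch below summarizes an article with a few pretrained transformer models and scores each output against a human reference summary with ROUGE. It assumes the Hugging Face transformers and rouge_score packages; the specific model names and metric choices are illustrative assumptions, not necessarily the chapter's exact setup.

```python
from transformers import pipeline
from rouge_score import rouge_scorer

# Illustrative pretrained summarizers; the chapter's exact model list may differ.
MODELS = ["facebook/bart-large-cnn", "google/pegasus-xsum", "t5-small"]
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def compare(article: str, reference: str) -> None:
    """Summarize one article with each model and score it against a human summary."""
    for name in MODELS:
        summarizer = pipeline("summarization", model=name)
        generated = summarizer(article, max_length=128, min_length=32)[0]["summary_text"]
        scores = scorer.score(reference, generated)
        print(f"{name}: ROUGE-1 F1={scores['rouge1'].fmeasure:.3f}, "
              f"ROUGE-L F1={scores['rougeL'].fmeasure:.3f}")
```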

5 citations

Posted Content
TL;DR: This article uses word embeddings for extractive text and speech summarization: documents and sentences are represented by averaged word embeddings, the cosine similarity measure is employed to determine the degree of relevance between a pair of representations, and novel ranking models based on general word embedding methods are proposed.
Abstract: Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express its most important theme, has been an active area of research and experimentation. Meanwhile, word embedding has emerged as a popular research subject because of its excellent performance in many natural language processing (NLP) tasks. However, as far as we are aware, there are relatively few studies investigating its use in extractive text or speech summarization. A common thread when leveraging word embeddings in the summarization process is to represent the document (or sentence) by averaging the word embeddings of the words occurring in the document (or sentence). Intuitively, the cosine similarity measure can then be employed to determine the degree of relevance between a pair of representations. Beyond the continued efforts to improve the representation of words, this paper focuses on building novel and efficient ranking models based on general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of our proposed methods compared to existing state-of-the-art methods.
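The averaging-plus-cosine baseline this abstract describes is easy to state in code. The sketch below assumes pretrained embeddings are already loaded into a plain dict mapping each word to a NumPy vector (e.g. from word2vec or GloVe); the loading step and the 300-dimension default are our assumptions.

```python
import numpy as np

def avg_embedding(text, embeddings, dim):
    """Average the embeddings of the words in `text`; zero vector if none match."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def rank_sentences(sentences, embeddings, dim=300):
    """Rank sentences by cosine similarity of their averaged-embedding
    representation to the whole document's representation."""
    doc_vec = avg_embedding(" ".join(sentences), embeddings, dim)
    scored = [(cosine(avg_embedding(s, embeddings, dim), doc_vec), s)
              for s in sentences]
    return sorted(scored, reverse=True)  # most relevant sentences first
```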

5 citations

Proceedings Article (DOI)
18 Jul 2007
TL;DR: A video content summarization for recommendation (VCSR) system is presented to auto-recommend suitable multimedia learning materials to learners, and the paper indicates how the VCSR system effectively plays an intermediary role in a modern digital library.
Abstract: In this paper, the authors present a video content summarization for recommendation (called VCSR) system to auto-recommend suitable multimedia learning materials to learners. The VCSR system first extracts important content from the input raw video data as a summary, and the generated summary is then auto-routed to learners according to their profiles. Video captions are initially recognized using optical character recognition (OCR); then a set of key passages with corresponding frame images is extracted to form a video summary. The recommendation is achieved by calculating the relevance of the video summary for each learner. The paper also shows how the VCSR system effectively plays an intermediary role in a modern digital library.
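A hypothetical sketch of the recommendation step: score each video's textual summary against a learner's profile keywords and route the top matches to that learner. The Jaccard overlap measure and the data shapes below are our illustrative assumptions, not the VCSR system's actual relevance computation.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(video_summaries: dict, profile_keywords: set, top_k: int = 3):
    """`video_summaries` maps video_id -> summary text (e.g. OCR'd key passages);
    return the ids of the `top_k` videos most relevant to this learner."""
    scored = sorted(
        ((jaccard(set(text.lower().split()), profile_keywords), vid)
         for vid, text in video_summaries.items()),
        reverse=True,
    )
    return [vid for score, vid in scored[:top_k] if score > 0]
```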

5 citations

Proceedings Article
01 Jan 2007
TL;DR: This paper proposes a summarization approach based on the topical structure of multiple customer reviews: it extracts topics from a collection of reviews and ranks them by frequency, and it shows that the approach outperforms baseline summarization systems, i.e., the Copernic summarizer and clustering-summarization, in terms of users' responsiveness.
Abstract: Online customer reviews offer valuable information for merchants and potential shoppers in e-Commerce and e-Business. However, even for a single product, the number of reviews often amounts to hundreds or thousands. Summarization of multiple reviews is therefore helpful for extracting the important issues that merchants and customers are concerned about. Existing methods of multi-document summarization first divide documents into non-overlapping clusters and then summarize each cluster individually, under the assumption that each cluster discusses a single topic. When these methods are applied to customer reviews, however, it is difficult to determine the number of clusters without prior domain knowledge, and topics often overlap with each other in a collection of customer reviews. In this paper, we propose a summarization approach based on the topical structure of multiple customer reviews. Instead of clustering and summarizing, our approach extracts topics from a collection of reviews and further ranks the topics by their frequency. The summary is then generated according to the ranked topics. The evaluation results showed that our approach outperformed the baseline summarization systems, i.e., the Copernic summarizer and clustering-summarization, in terms of users' responsiveness.
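The frequency-ranked-topics idea can be illustrated with a deliberately naive sketch: count candidate topic words across all reviews, rank them by frequency, and pick one representative sentence per top topic. The word-counting "topic extraction" and the small stopword list below are our simplifications; the paper's topical-structure model is richer.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "it", "and", "or", "to", "of", "i", "this"}

def summarize_reviews(reviews, n_topics=5):
    """Rank topic words by frequency across reviews, then build the summary
    from one representative sentence per ranked topic."""
    words = [w for r in reviews for w in r.lower().split() if w not in STOPWORDS]
    topics = [w for w, _ in Counter(words).most_common(n_topics)]
    sentences = [s.strip() for r in reviews for s in r.split(".") if s.strip()]
    summary = []
    for topic in topics:  # one representative sentence per ranked topic
        match = next((s for s in sentences
                      if topic in s.lower() and s not in summary), None)
        if match:
            summary.append(match)
    return topics, summary
```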

5 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52