Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
01 Oct 2013
TL;DR: A Bayesian nonparametric model for multi-document summarization is proposed in order to automatically determine the proper lengths of summaries; assuming that an original document can be reconstructed from its summary, the "reconstruction" is described by a Bayesian framework which selects sentences to form a good summary.
Abstract: Document summarization is an important task in the area of natural language processing, which aims to extract the most important information from a single document or a cluster of documents. In various summarization tasks, the summary length is manually defined. However, finding the proper summary length is quite a problem, and keeping all summaries restricted to the same length is not always a good choice. It is clearly improper to generate summaries of the same length for two clusters of documents that contain quite different quantities of information. In this paper, we propose a Bayesian nonparametric model for multi-document summarization in order to automatically determine the proper lengths of summaries. Assuming that an original document can be reconstructed from its summary, we describe the "reconstruction" by a Bayesian framework which selects sentences to form a good summary. Experimental results on the DUC 2004 data sets and some expanded data demonstrate the good quality of our summaries and the rationality of the length determination.
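The reconstruction idea can be made concrete with a short sketch: greedily add sentences while they still improve how well the summary's term vector reconstructs the document's term vector, and stop when the marginal gain is small, so the summary length falls out of the data rather than being fixed in advance. This is a minimal illustration in plain term-vector space, not the paper's Bayesian nonparametric model; the whitespace tokenizer, the min_gain threshold, and the greedy search are our assumptions.

```python
import numpy as np
from collections import Counter

def term_vector(text, vocab):
    """Raw term-count vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def reconstruction_summary(sentences, min_gain=0.05):
    """Greedy length-adaptive extraction: keep adding the sentence that most
    improves cosine(summary vector, document vector); stop when no candidate
    adds at least `min_gain`, which fixes the summary length automatically."""
    doc = " ".join(sentences)
    vocab = sorted(set(doc.lower().split()))
    doc_vec = term_vector(doc, vocab)
    doc_vec /= np.linalg.norm(doc_vec)

    summary, covered = [], np.zeros(len(vocab))
    current = 0.0  # cosine similarity achieved by the summary so far
    while True:
        best, best_sim = None, current
        for s in sentences:
            if s in summary:
                continue
            cand = covered + term_vector(s, vocab)
            sim = cand @ doc_vec / np.linalg.norm(cand)
            if sim > best_sim:
                best, best_sim = s, sim
        if best is None or best_sim - current < min_gain:
            break  # marginal reconstruction gain too small: stop here
        summary.append(best)
        covered += term_vector(best, vocab)
        current = best_sim
    return summary
```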

5 citations

Book Chapter (DOI)
01 Jan 2022
TL;DR: This chapter presents a comprehensive comparison of several transformer architecture-based pretrained models for text summarization, using the BBC news dataset, which contains articles for summarization together with human-generated summaries for evaluating and comparing the summaries generated by the machine learning models.
Abstract: The amount of text data available online is increasing at a very fast pace; hence, text summarization has become essential. Most modern recommender and text classification systems require going through a huge amount of data. Manually generating precise and fluent summaries of lengthy articles is a tiresome and time-consuming task. Hence, generating automated summaries for the data and using them to train machine learning models will make these models space and time efficient. Extractive summarization and abstractive summarization are two separate methods of generating summaries. The extractive technique identifies the relevant sentences in the original document and extracts only those from the text, whereas abstractive summarization techniques generate the summary after interpreting the original text, which makes them more complicated. In this paper, we present a comprehensive comparison of a few transformer architecture-based pretrained models for text summarization. For analysis and comparison, we use the BBC news dataset, which contains text data for summarization and human-generated summaries for evaluating and comparing the summaries generated by the machine learning models.
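As a concrete illustration of this kind of comparison, the sketch below summarizes an article with a few pretrained transformer models and scores each output against a human reference summary with ROUGE. It assumes the Hugging Face transformers and rouge_score packages; the specific model names and metric choices are illustrative assumptions, not necessarily the chapter's exact setup.

```python
from transformers import pipeline
from rouge_score import rouge_scorer

# Illustrative pretrained summarizers; the chapter's exact model list may differ.
MODELS = ["facebook/bart-large-cnn", "google/pegasus-xsum", "t5-small"]
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def compare(article: str, reference: str) -> None:
    """Summarize one article with each model and score it against a human summary."""
    for name in MODELS:
        summarizer = pipeline("summarization", model=name)
        generated = summarizer(article, max_length=128, min_length=32)[0]["summary_text"]
        scores = scorer.score(reference, generated)
        print(f"{name}: ROUGE-1 F1={scores['rouge1'].fmeasure:.3f}, "
              f"ROUGE-L F1={scores['rougeL'].fmeasure:.3f}")
```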

5 citations

Posted Content
TL;DR: This article uses word embeddings for extractive text and speech summarization: documents and sentences are represented by averaged word embeddings, the cosine similarity measure is employed to determine the degree of relevance between a pair of representations, and novel ranking models based on general word embedding methods are proposed.
Abstract: Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express its most important theme, has been an active area of research and experimentation. Meanwhile, word embedding has emerged as a popular research subject because of its excellent performance in many natural language processing (NLP) tasks. However, as far as we are aware, there are relatively few studies investigating its use in extractive text or speech summarization. A common thread when leveraging word embeddings in the summarization process is to represent the document (or sentence) by averaging the word embeddings of the words occurring in the document (or sentence). Intuitively, the cosine similarity measure can then be employed to determine the degree of relevance between a pair of representations. Beyond the continued efforts to improve the representation of words, this paper focuses on building novel and efficient ranking models based on general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of our proposed methods compared to existing state-of-the-art methods.
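The averaging-plus-cosine baseline this abstract describes is easy to state in code. The sketch below assumes pretrained embeddings are already loaded into a plain dict mapping each word to a NumPy vector (e.g. from word2vec or GloVe); the loading step and the 300-dimension default are our assumptions.

```python
import numpy as np

def avg_embedding(text, embeddings, dim):
    """Average the embeddings of the words in `text`; zero vector if none match."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def rank_sentences(sentences, embeddings, dim=300):
    """Rank sentences by cosine similarity of their averaged-embedding
    representation to the whole document's representation."""
    doc_vec = avg_embedding(" ".join(sentences), embeddings, dim)
    scored = [(cosine(avg_embedding(s, embeddings, dim), doc_vec), s)
              for s in sentences]
    return sorted(scored, reverse=True)  # most relevant sentences first
```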

5 citations

Proceedings Article (DOI)
18 Jul 2007
TL;DR: A video content summarization for recommendation (VCSR) system is presented to auto-recommend suitable multimedia learning materials to learners, and the paper indicates how the VCSR system effectively plays an intermediary role in a modern digital library.
Abstract: In this paper, the authors present a video content summarization for recommendation (called VCSR) system to auto-recommend suitable multimedia learning materials to learners. The VCSR system first extracts important content from the input raw video data as a summary, and the generated summary is then auto-routed to learners according to their profiles. Video captions are initially recognized using optical character recognition (OCR); then a set of key passages with corresponding frame images is extracted to form a video summary. The recommendation is achieved by calculating the relevance of the video summary for each learner. The paper also shows how the VCSR system effectively plays an intermediary role in a modern digital library.
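A hypothetical sketch of the recommendation step: score each video's textual summary against a learner's profile keywords and route the top matches to that learner. The Jaccard overlap measure and the data shapes below are our illustrative assumptions, not the VCSR system's actual relevance computation.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(video_summaries: dict, profile_keywords: set, top_k: int = 3):
    """`video_summaries` maps video_id -> summary text (e.g. OCR'd key passages);
    return the ids of the `top_k` videos most relevant to this learner."""
    scored = sorted(
        ((jaccard(set(text.lower().split()), profile_keywords), vid)
         for vid, text in video_summaries.items()),
        reverse=True,
    )
    return [vid for score, vid in scored[:top_k] if score > 0]
```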

5 citations

Proceedings Article
01 Jan 2007
TL;DR: This paper proposes a summarization approach based on the topical structure of multiple customer reviews: it extracts topics from a collection of reviews and ranks them by frequency, and it shows that the approach outperforms baseline summarization systems, i.e., the Copernic summarizer and clustering-summarization, in terms of users' responsiveness.
Abstract: Online customer reviews offer valuable information for merchants and potential shoppers in e-Commerce and e-Business. However, even for a single product, the number of reviews often amounts to hundreds or thousands. Summarization of multiple reviews is therefore helpful for extracting the important issues that merchants and customers are concerned about. Existing methods of multi-document summarization first divide documents into non-overlapping clusters and then summarize each cluster individually, under the assumption that each cluster discusses a single topic. When these methods are applied to customer reviews, however, it is difficult to determine the number of clusters without prior domain knowledge, and topics often overlap with each other in a collection of customer reviews. In this paper, we propose a summarization approach based on the topical structure of multiple customer reviews. Instead of clustering and summarizing, our approach extracts topics from a collection of reviews and further ranks the topics by their frequency. The summary is then generated according to the ranked topics. The evaluation results showed that our approach outperformed the baseline summarization systems, i.e., the Copernic summarizer and clustering-summarization, in terms of users' responsiveness.
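The frequency-ranked-topics idea can be illustrated with a deliberately naive sketch: count candidate topic words across all reviews, rank them by frequency, and pick one representative sentence per top topic. The word-counting "topic extraction" and the small stopword list below are our simplifications; the paper's topical-structure model is richer.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "it", "and", "or", "to", "of", "i", "this"}

def summarize_reviews(reviews, n_topics=5):
    """Rank topic words by frequency across reviews, then build the summary
    from one representative sentence per ranked topic."""
    words = [w for r in reviews for w in r.lower().split() if w not in STOPWORDS]
    topics = [w for w, _ in Counter(words).most_common(n_topics)]
    sentences = [s.strip() for r in reviews for s in r.split(".") if s.strip()]
    summary = []
    for topic in topics:  # one representative sentence per ranked topic
        match = next((s for s in sentences
                      if topic in s.lower() and s not in summary), None)
        if match:
            summary.append(match)
    return topics, summary
```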

5 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52