
Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published on this topic, receiving 71,850 citations.


Papers
Book Chapter
01 Jan 2022
TL;DR: OpenNMT, an encoder-decoder-based neural machine translation model, is fine-tuned for abstractive text summarization, tested on the freely available CNNDM and MSMO datasets, and shown to be proficient in terms of ROUGE and BLEU scores.
Abstract: With the massive growth of blogs, news stories, and reports, extracting useful information from such a large quantity of textual documents has become a difficult task. Automatic text summarization is an excellent approach for summarising these documents. Text summarization aims to condense large documents into concise summaries while preserving essential information and meaning. A variety of fascinating summarising models have been developed to achieve state-of-the-art performance in terms of fluency, human readability, and semantically meaningful summaries. In this paper, we have investigated the OpenNMT tool for the task of text summarization. OpenNMT is an encoder-decoder-based neural machine translation model which has been fine-tuned for the task of abstractive text summarization. The proposed OpenNMT-based text summarization approach has been tested on freely available datasets such as the CNNDM and MSMO datasets and demonstrates its proficiency in terms of ROUGE and BLEU scores. Keywords: Text summarization, Abstractive summary, News articles, OpenNMT
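As a rough illustration of the kind of evaluation described above, the sketch below scores a candidate summary against a reference with ROUGE (via the rouge-score package) and BLEU (via nltk); the example strings and metric settings are placeholders, not the paper's actual pipeline or data.

```python
# Minimal sketch: scoring a generated summary against a reference with
# ROUGE (rouge-score package) and BLEU (nltk). The strings below are
# placeholders, not data from the CNNDM or MSMO datasets.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the government announced new flood relief funding on monday"
candidate = "new flood relief funding was announced by the government"

# ROUGE-1 and ROUGE-L F-scores, with stemming enabled.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

# Sentence-level BLEU with smoothing, since short summaries often have
# zero higher-order n-gram overlaps.
bleu = sentence_bleu(
    [reference.split()],
    candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

print({k: round(v.fmeasure, 3) for k, v in rouge.items()}, round(bleu, 3))
```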
Journal Article
TL;DR: The authors propose a set of new Opinion-Topic-Sentence requirements, which are essential for performing opinion summarization, along with four submodular functions and two optimization algorithms with proven performance bounds.
Abstract: This paper focuses on opinion summarization for constructing subjective and concise summaries representing essential opinions of online text reviews. As previous works rarely focus on the relationship between opinions, topics, and sentences, we propose a set of new requirements for Opinion-Topic-Sentence, which are essential for performing opinion summarization. We prove that Opinion-Topic-Sentence can be theoretically analyzed by submodular information measures. Thus, our proposed method can reduce redundant information, strengthen the relevance to given topics, and informatively represent the underlying emotional variations. While conventional methods require human-labeled topics for extractive summarization, we use unsupervised topic modeling methods to generate topic features. We propose four submodular functions and two optimization algorithms with proven performance bounds that can maximize opinion summarization's utility. An automatic evaluation metric, Topic-based Opinion Variance, is also derived to compensate for ROUGE-based metrics of opinion summarization evaluation. Four large, diversified, and representative corpora, OPOSUM, Opinosis, Yelp, and Amazon reviews, are used in our study. The results on these online review texts corroborate the efficacy of our proposed metric and framework.
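The authors' specific submodular functions and bounds are not reproduced here, but the general pattern they build on, greedy maximization of a monotone submodular objective under a sentence budget, can be sketched as follows; the word-coverage objective and review snippets below are generic stand-ins, not the paper's formulation.

```python
# Minimal sketch of greedy extractive summarization with a monotone
# submodular objective (distinct-word coverage) under a sentence budget.
# This is a generic illustration, not the paper's proposed functions.
from typing import List, Set

def coverage(word_sets: List[Set[str]]) -> int:
    """Submodular objective: number of distinct words covered."""
    covered: Set[str] = set()
    for words in word_sets:
        covered |= words
    return len(covered)

def greedy_summary(sentences: List[str], budget: int) -> List[str]:
    """Repeatedly add the sentence with the largest marginal coverage
    gain; this classic greedy rule carries a (1 - 1/e) approximation
    guarantee for monotone submodular objectives."""
    word_sets = [set(s.lower().split()) for s in sentences]
    chosen: List[int] = []
    for _ in range(budget):
        base = coverage([word_sets[i] for i in chosen])
        gains = [
            (coverage([word_sets[i] for i in chosen] + [word_sets[j]]) - base, j)
            for j in range(len(sentences))
            if j not in chosen
        ]
        if not gains:
            break
        best_gain, best_j = max(gains)
        if best_gain <= 0:
            break
        chosen.append(best_j)
    return [sentences[i] for i in sorted(chosen)]

reviews = [
    "The battery life is excellent and lasts two days.",
    "Battery life is great, easily two days of use.",
    "The screen is dim outdoors and hard to read.",
    "Customer support was slow to respond to my ticket.",
]
print(greedy_summary(reviews, budget=2))
```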
Journal Article
TL;DR: Two functional approaches to assessing the efficacy of summarization are introduced: running a query set against both the original documents and their summaries, and using document classification on a 12-class set to compare different summarization approaches.
Abstract: Extractive summarization is an important natural language processing approach used for document compression, improved reading comprehension, key phrase extraction, indexing, query set generation, and other analytics approaches. Extractive summarization has specific advantages over abstractive summarization in that it preserves style, specific text elements, and compound phrases that might be more directly associated with the text. In this article, the relative effectiveness of extractive summarization is considered on two widely different corpora: (1) a set of works of fiction (100 total, mainly novels) available from Project Gutenberg, and (2) a large set of news articles (3000) for which a ground-truthed summarization (gold standard) is provided by the authors of the news articles. Both sets were evaluated using 5 different Python Sumy algorithms and compared quantitatively to randomly-generated summarizations. Two functional approaches to assessing the efficacy of summarization are introduced: running a query set against both the original documents and their summaries, and using document classification on a 12-class set to compare different summarization approaches. The results, unsurprisingly, show considerable differences consistent with the different nature of these two data sets. The LSA and Luhn summarization approaches were most effective on the database of fiction, while all five summarization approaches were similarly effective on the database of articles. Overall, the Luhn approach was deemed the most generally relevant among those tested.
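For readers unfamiliar with the Sumy toolkit used in the study above, a minimal usage sketch of its LSA and Luhn summarizers follows; the input text and two-sentence summary length are placeholders, and this is not the paper's evaluation pipeline.

```python
# Minimal sketch: extractive summaries with Sumy's LSA and Luhn
# summarizers. The input text and sentence count are placeholders.
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer
from sumy.summarizers.luhn import LuhnSummarizer

text = (
    "The city council met on Tuesday to debate the new transit plan. "
    "Supporters argued it would cut commute times across the region. "
    "Opponents raised concerns about the projected construction costs. "
    "A final vote on the plan is expected early next month."
)

parser = PlaintextParser.from_string(text, Tokenizer("english"))

for name, summarizer in [("LSA", LsaSummarizer()), ("Luhn", LuhnSummarizer())]:
    # Ask each summarizer for a two-sentence extractive summary.
    summary = summarizer(parser.document, sentences_count=2)
    print(name, "->", " ".join(str(sentence) for sentence in summary))
```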
Journal Article
TL;DR: This article reviews commonly used sentence scoring methods for text summarization, covering their effectiveness, shortcomings, and application in various fields such as search engines, business analysis, market review, and academia.
Abstract: With the growing amount of text data being generated from multiple sources over the past few years, it has become essential to effectively summarize this information. At present, it is estimated that there are over sixty trillion pages on the web. This is definitely more than any individual could hope to read or understand. Each year, there are approximately 600,000–1,000,000 books published in the US alone. These numbers demonstrate that it is next to impossible for any human to be able to consume all the information. Through text summarization, proper insights into the data can be gained. We need tools and techniques that can sift through this large amount of data and provide us with a brief summary of the information we are trying to learn and understand. Tools that can provide a detailed yet concise summary of the source text have become the need of the hour. Hence, text summarization has found its application in multiple fields such as search engines, business analysis, market review, and academia. This paper is an attempt to summarize some of the commonly used sentence scoring methods for text summarization. It presents a review of their algorithms, their effectiveness, and their shortcomings. Keywords: Text summarization, Extractive text summarization techniques, Word frequency summarization, TF-IDF summarization, TextRank algorithm, KL Sum algorithm
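To make one of the surveyed methods concrete, the sketch below implements simple word-frequency sentence scoring: each sentence is scored by the average normalized frequency of its content words, and the top-scoring sentences are kept. The text, stop-word list, and summary length are placeholders; this is a generic illustration rather than any specific algorithm from the review.

```python
# Minimal sketch of word-frequency sentence scoring: score each sentence
# by the average normalized frequency of its content words, keep top-k.
# The text, stop-word list, and k are placeholders.
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "on", "for", "by"}

def frequency_summary(text: str, k: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    max_freq = max(freq.values()) if freq else 1

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]
        if not tokens:
            return 0.0
        # Average normalized frequency of the sentence's content words.
        return sum(freq[t] / max_freq for t in tokens) / len(tokens)

    top = set(sorted(sentences, key=score, reverse=True)[:k])
    # Re-emit the chosen sentences in their original document order.
    return " ".join(s for s in sentences if s in top)

article = (
    "Flood waters rose across the valley after days of heavy rain. "
    "Rescue crews evacuated residents from low-lying neighborhoods. "
    "Officials said the rain is expected to ease by the weekend. "
    "Rescue crews will remain in the valley until the waters recede."
)
print(frequency_summary(article, k=2))
```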

Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52