Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Book Chapter
01 Jan 2010
TL;DR: This entry reviews the main developments in the field of automated text summarization and provides a guiding map to those interested in understanding the strengths and weaknesses of an increasingly ubiquitous technology.
Abstract: After lying dormant for a few decades, the field of automated text summarization has experienced a tremendous resurgence of interest. Recently, many new algorithms and techniques have been proposed for identifying important information in single documents and document collections, and for mapping this information into grammatical, cohesive, and coherent abstracts. Since 1997, annual workshops, conferences, and large-scale comparative evaluations have provided a rich environment for exchanging ideas between researchers in Asia, Europe, and North America. This entry reviews the main developments in the field and provides a guiding map to those interested in understanding the strengths and weaknesses of an increasingly ubiquitous technology.
Journal Article
TL;DR: In this article, a text summarization approach combining TF-IDF-TR (Term Frequency, Inverse Document Frequency, TextRank) and a seq2seq (Sequence to Sequence) model is proposed.
Abstract: In the age of technology, data is critical. The data on the internet is unstructured and poorly organized. Text summarization is introduced to convert this data into summaries: it is the process of extracting useful information from raw data without diluting the main theme of the data. Today's readers must contend with the task of reading comments, reviews, news articles, blogs and other forms of informal and noisy communication, and it is difficult to retrieve the correct gist that all readers require. To achieve the benefits of both extractive and abstractive summarization, the proposed approach combines TF-IDF-TR (Term Frequency, Inverse Document Frequency, TextRank) as an unsupervised learning algorithm and the seq2seq (Sequence to Sequence) model as a supervised learning algorithm. In terms of ROUGE score, the proposed TFRSP approach outperforms existing text summarization methods, resulting in high summary accuracy.
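The extractive half of the approach described above can be illustrated with a minimal sketch, assuming each sentence is treated as its own document for TF-IDF and that sentences are ranked with a TextRank-style iteration over cosine similarities. The function names, the smoothed IDF formula, and the damping value are assumptions made for illustration and are not taken from the paper; the supervised seq2seq stage is not shown.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """One sparse TF-IDF vector (dict) per sentence; each sentence is a 'document'."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """TextRank-style scores over a TF-IDF cosine-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In a hybrid pipeline of the kind the abstract describes, the sentences returned by `summarize` would presumably be passed on to the abstractive model; how the two stages are actually connected is not specified in the abstract.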
Journal Article
TL;DR: This article proposes an unsupervised graph-based multi-document summarization method that divides a large number of Tibetan news documents into topics and extracts a summary of each topic. Sentence clusters are modeled as graphs, and sentence nodes are re-scored based on topic semantic information and the impact of topic features on sentences, so that summaries with higher topic relevance are extracted.
Abstract: Text summarization creates a subset that represents the most important or relevant information in the original content, which effectively reduces information redundancy. Recently, neural network methods have achieved good results in text summarization for both Chinese and English, but research on text summarization in low-resource languages, especially Tibetan, is still in the exploratory stage. Moreover, there is no large-scale annotated corpus for text summarization, and the lack of datasets severely limits the development of low-resource text summarization. In this setting, unsupervised learning approaches are more appealing for low-resource languages because they do not require labeled data. In this paper, we propose an unsupervised graph-based Tibetan multi-document summarization method, which divides a large number of Tibetan news documents into topics and extracts a summary of each topic. Summaries obtained with traditional graph-based methods have high redundancy, and their division of documents into topics is not detailed enough. For topic division, we adopt a two-level clustering method that converts the original documents into document-level and sentence-level graphs; we then take both linguistic and deep representations into account and integrate an external corpus into the graph to obtain the sentence semantic clustering. This addresses the shortcomings of the traditional K-Means clustering method and performs a more detailed clustering of documents. We then model sentence clusters as graphs and finally re-score sentence nodes based on the topic semantic information and the impact of topic features on sentences, so that a summary with higher topic relevance is extracted. To promote the development of Tibetan text summarization, and to meet the needs of researchers for high-quality Tibetan text summarization datasets, this paper manually constructs a Tibetan summarization dataset and carries out relevant experiments. The experimental results show that our method can effectively improve the quality of summarization and is competitive with previous unsupervised methods.
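As a point of reference for the clustering step, the following is a minimal sketch of the traditional K-Means baseline that the abstract says it improves on, not the paper's two-level method. It clusters sentences by bag-of-words vectors and picks the sentence closest to each centroid. The regex tokenizer, the deterministic seeding with the first k sentences, and all function names are assumptions for illustration; real Tibetan text would need a proper word segmenter in place of the regex.

```python
import math
import re
from collections import Counter


def vectorize(sentences):
    """Bag-of-words count vectors over a shared vocabulary."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0.0] * len(vocab)
        for term, count in Counter(toks).items():
            v[index[term]] = float(count)
        vectors.append(v)
    return vectors


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Plain K-Means seeded with the first k vectors (an assumed, deterministic choice)."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize_by_cluster(sentences, k):
    """Pick the sentence nearest to each cluster centroid, in original order."""
    vectors = vectorize(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(len(centroids)):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            chosen.append(min(members, key=lambda i: distance(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Selecting one representative per cluster is a simple way to limit redundancy across topics; the graph construction and topic-aware re-scoring of sentence nodes described in the abstract are not reproduced here.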
Journal Article
09 May 2023 - PLOS ONE
TL;DR: This paper proposed a graph-based extractive single-document summarization method for Hausa text by modifying the existing PageRank algorithm using the normalized common bigrams count between adjacent sentences as the initial vertex score.
Abstract: Automatic text summarization is one of the most promising solutions to the ever-growing challenges of textual data, as it produces a shorter version of the original document with fewer bytes but the same information as the original document. Despite the advancements in automatic text summarization research, the development of automatic text summarization methods for documents written in Hausa, a Chadic language widely spoken in West Africa by approximately 150,000,000 people as either their first or second language, is still in its early stages. This study proposes a novel graph-based extractive single-document summarization method for Hausa text by modifying the existing PageRank algorithm using the normalized common bigrams count between adjacent sentences as the initial vertex score. The proposed method is evaluated with the ROUGE evaluation toolkit on a primarily collected Hausa summarization evaluation dataset comprising 113 Hausa news articles. The proposed approach outperformed the standard methods on the same dataset: it outperformed the TextRank method by 2.1%, LexRank by 12.3%, the centroid-based method by 19.5%, and the BM25 method by 17.4%.
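The idea of seeding PageRank with bigram-based vertex scores can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact normalization, so the sketch divides the number of bigrams a sentence shares with its immediate neighbours by the sentence's own bigram count, and it uses Jaccard word overlap for edge weights. All function names are hypothetical.

```python
import re


def words(sentence):
    # \w+ is Unicode-aware in Python 3, so letters such as ɗ and ƙ are kept.
    return re.findall(r"\w+", sentence.lower())


def bigrams(sentence):
    toks = words(sentence)
    return set(zip(toks, toks[1:]))


def initial_scores(sentences):
    """Assumed normalization: bigrams shared with the previous and next
    sentence, divided by the sentence's own number of distinct bigrams."""
    grams = [bigrams(s) for s in sentences]
    scores = []
    for i, g in enumerate(grams):
        shared = 0
        if i > 0:
            shared += len(g & grams[i - 1])
        if i + 1 < len(grams):
            shared += len(g & grams[i + 1])
        scores.append(shared / len(g) if g else 0.0)
    return scores


def word_overlap(a, b):
    """Jaccard overlap of word sets, used here as an assumed edge weight."""
    wa, wb = set(words(a)), set(words(b))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def rank_sentences(sentences, damping=0.85, iterations=3):
    """PageRank-style update started from the bigram-based scores."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [[word_overlap(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    totals = [sum(row) for row in weights]
    scores = initial_scores(sentences)
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / totals[j] * scores[j]
                            for j in range(n) if totals[j] > 0)
            for i in range(n)
        ]
    return scores
```

One design point worth noting: with a damping factor below 1, this update converges to the same scores regardless of the starting values, so the sketch uses a small fixed number of iterations to let the initial scores influence the ranking. Whether the paper handles this in the same way is not stated in the abstract.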
Proceedings Article
19 Oct 2022
TL;DR: In this article, a new approach to automated text summarization is proposed. It is based on topic modeling techniques and takes the user's profile into account, which helps to semantically extract relevant topics from textual documents, summarize information according to the user's topic interests, and finally visualize the result through a hypergraph.
Abstract: Due to the enormous volume of data on the web, it is hard for users to retrieve effective and useful information in a timely manner. It has therefore become necessary to generate a brief summary from a large amount of textual data according to the user profile. In this context, text summarization is used to identify important information within text documents. It aims to generate shorter versions of the source text by including only the relevant and salient information. In recent years, summarization based on topic modeling techniques has become a hot research topic thanks to the ability of these techniques to classify and understand large text corpora and to extract important topics from the text. However, existing studies do not support personalization when generating summaries, because doing so requires knowing not only which documents are most helpful to the users, but also which topics and keywords are more or less related to the user's interests. Existing studies thus lack support for adaptive user modeling for user applications in the emerging areas of automatic summarization, topic modeling and visualization. In this context, we propose a new approach to automated text summarization based on topic modeling techniques and taking into account the user's profile, which helps to semantically extract relevant topics from textual documents, summarize information according to the user's topic interests, and finally visualize the result through a hypergraph. Experiments have been conducted to measure the effectiveness of our solution compared to existing summarization approaches based on text content. The results show the superiority of our approach.

Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52