Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
Posted Content
TL;DR: The authors analyzed the paragraph-level attention weights of GraphSum's multi-heads and decoding layers in order to improve the explainability of a transformer-based multi-document summarization (MDS) model.
Abstract: Modern multi-document summarization (MDS) methods are based on transformer architectures. They generate state-of-the-art summaries but lack explainability. We focus on graph-based transformer models for MDS, as they have gained recent popularity. We aim to improve the explainability of graph-based MDS by analyzing their attention weights. In a graph-based MDS model such as GraphSum, vertices represent the textual units, while the edges form a similarity graph over the units. We compare GraphSum's performance utilizing different textual units, i.e., sentences versus paragraphs, on two news benchmark datasets, namely WikiSum and MultiNews. Our experiments show that paragraph-level representations provide the best summarization performance. Thus, we subsequently focus on analyzing the paragraph-level attention weights of GraphSum's multi-heads and decoding layers in order to improve the explainability of a transformer-based MDS model. As a reference metric, we calculate the ROUGE scores between the input paragraphs and each sentence in the generated summary, which indicate source origin information via text similarity. We observe a high correlation between the attention weights and this reference metric, especially in the later decoding layers of the transformer architecture. Finally, we investigate whether the generated summaries follow a pattern of positional bias by extracting which paragraph provided the most information for each generated summary. Our results show a high correlation between the position in the summary and the source origin.

1 citation
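The reference metric described above can be reproduced in a few lines. Below is a minimal sketch, assuming a simple ROUGE-1 recall as the text-similarity measure and a random placeholder for the attention matrix (in the paper, these weights would be pooled from GraphSum's decoding layers); the function names and toy inputs are illustrative only.

```python
# Sketch of the reference metric: ROUGE-1 recall between each input paragraph
# and each generated summary sentence, correlated with (placeholder) attention.
from collections import Counter

import numpy as np
from scipy.stats import spearmanr

def rouge1_recall(reference: str, candidate: str) -> float:
    """Unigram recall: fraction of reference tokens covered by the candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(c, cand[t]) for t, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

def source_origin_matrix(paragraphs, summary_sentences):
    """ROUGE-1 recall for every (paragraph, summary sentence) pair."""
    return np.array([[rouge1_recall(p, s) for s in summary_sentences]
                     for p in paragraphs])

# Toy inputs standing in for WikiSum/MultiNews paragraphs and a model summary.
paragraphs = ["the storm hit the coast on friday", "officials ordered evacuations"]
summary = ["a storm hit the coast", "evacuations were ordered by officials"]

ref_metric = source_origin_matrix(paragraphs, summary)

# Assumed (n_paragraphs x n_summary_sentences) attention weights pooled from a
# decoding layer; random here purely as a stand-in for real model output.
attention = np.random.rand(len(paragraphs), len(summary))
rho, _ = spearmanr(ref_metric.ravel(), attention.ravel())
print(f"Spearman correlation (attention vs. ROUGE reference): {rho:.3f}")
```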

Journal ArticleDOI
TL;DR: The evaluation results show that the combination of PageRank along with rhetorical relations does help to improve the quality of extractive summarization.
Abstract: This paper presents link analysis based on rhetorical relations with the aim of performing extractive summarization for multiple documents. We first extracted sentences with salient terms from each individual document using a statistical model. We then ranked the extracted sentences by measuring their relative importance according to their connectivity among the sentences in the document set, using PageRank based on the rhetorical relations. The rhetorical relations were examined beforehand to determine which relations are crucial to this task, and the relations among sentences from the documents were automatically identified by SVMs. We used the relations to emphasize important sentences during sentence ranking by PageRank and to eliminate redundancy from the summary candidates. Our framework requires no sentences fully annotated by humans, and the evaluation results show that the combination of PageRank with rhetorical relations does help to improve the quality of extractive summarization.
Keywords: probability model, N-grams, link-based analysis, support vector machine, rhetorical relation

1 citation
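The ranking step described above reduces to PageRank over a relation-weighted sentence graph. The sketch below assumes precomputed rhetorical-relation weights (in the paper these would come from the SVM classifiers); the sentences and edge weights are hand-set placeholders.

```python
# Minimal sketch: PageRank over a sentence graph whose edges carry assumed
# rhetorical-relation weights, so strongly related sentences rank higher.
import networkx as nx

sentences = [
    "The company reported record profits.",
    "Profits rose because of strong overseas sales.",
    "The CEO announced a new product line.",
]

# Hypothetical relation edges: (i, j, weight), where weight reflects how
# crucial the identified rhetorical relation is for summarization.
relation_edges = [(0, 1, 0.9), (0, 2, 0.4), (1, 2, 0.3)]

graph = nx.Graph()
graph.add_nodes_from(range(len(sentences)))
graph.add_weighted_edges_from(relation_edges)

# Weighted PageRank lets strong relations reinforce connected sentences.
scores = nx.pagerank(graph, weight="weight")
for i in sorted(scores, key=scores.get, reverse=True):
    print(f"{scores[i]:.3f}  {sentences[i]}")
```

With stronger relation edges, connected sentences reinforce each other's scores, which is how the rhetorical relations emphasize important sentences during ranking.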

Book ChapterDOI
26 Apr 2017
TL;DR: This paper modeled the input documents as a sentence dissimilarity graph and a given query as a query-to-sentence similarity vector in order to formalize sentence selection/extraction as a multi-facility location problem (mFLP).
Abstract: In this paper we propose a query-focused multi-document summarization method based on the facility location problem. To formalize sentence selection/extraction as a multi-facility location problem (mFLP), we modeled the input documents as a sentence dissimilarity graph and a given query as a query-to-sentence similarity vector. In mFLP terminology, the former is known as the cost-to-serve matrix and the latter as the cost-to-establish vector. By formulating the mFLP as a mixed integer linear programming problem, we were able to optimally select sentences (facilities) that minimize the weighted sum of distances from each demand point to its nearest facility, plus the sum of opening costs of the facilities (query-to-sentence similarity). The performance of this new method has been tested on the DUC2005 and DUC2006 corpora. The effectiveness of the technique is measured using the ROUGE score. The results indicate that the presented methodology is a promising research direction.

1 citation
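The MILP formulation can be sketched concretely. The snippet below uses PuLP with binary opening and assignment variables; the cost matrices are toy placeholders standing in for the paper's sentence dissimilarity graph (cost-to-serve) and query dissimilarity (cost-to-establish), so this is a sketch of the formulation, not the authors' implementation.

```python
# Sketch of sentence selection as a facility location MILP: sentences act as
# both demand points and candidate facilities; selected facilities form the
# summary. All cost values below are illustrative placeholders.
import pulp

n = 4  # number of sentences
serve = [  # cost-to-serve matrix: pairwise sentence dissimilarity
    [0.0, 0.2, 0.7, 0.9],
    [0.2, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
open_cost = [0.5, 0.4, 0.6, 0.9]  # cost-to-establish vector (query dissimilarity)

prob = pulp.LpProblem("sentence_mFLP", pulp.LpMinimize)
y = [pulp.LpVariable(f"open_{j}", cat="Binary") for j in range(n)]
x = [[pulp.LpVariable(f"assign_{i}_{j}", cat="Binary") for j in range(n)]
     for i in range(n)]

# Objective: assignment (dissimilarity) costs plus opening (query) costs.
prob += (pulp.lpSum(serve[i][j] * x[i][j] for i in range(n) for j in range(n))
         + pulp.lpSum(open_cost[j] * y[j] for j in range(n)))

for i in range(n):
    prob += pulp.lpSum(x[i][j] for j in range(n)) == 1  # every sentence covered
    for j in range(n):
        prob += x[i][j] <= y[j]  # only selected sentences can serve others

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected sentences:", [j for j in range(n) if y[j].value() == 1])
```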

Journal ArticleDOI
TL;DR: A new Bayesian theory-based Hybrid Learning Model (BHLM) is proposed in this paper, which performs multi-document summarization and assigns class labels with the aid of mean, variance and probability measures.
Abstract: To understand and organize documents efficiently, multi-document summarization has become a prominent technique in the Internet world. Because the amount of available information is so large, documents must be summarized to obtain condensed information. To perform multi-document summarization, a new Bayesian theory-based Hybrid Learning Model (BHLM) is proposed in this paper. Initially, the input documents are preprocessed, removing the stop words. Then, sentence features are extracted to determine the sentence score for summarizing the document. The extracted features are fed into the hybrid learning model for learning. Subsequently, the learned features, training error and correlation coefficient are integrated with the Bayesian model to develop the BHLM. The proposed method also assigns the class label with the aid of mean, variance and probability measures. Finally, based on the class label, the sentences are sorted ou...

1 citation
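The Bayesian labeling step can be illustrated under heavy assumptions. The sketch below uses a simple two-feature sentence representation and hand-set per-class means and variances; the actual BHLM integrates learned features, training error and a correlation coefficient, none of which are modeled here.

```python
# Rough sketch: stop-word removal, simple sentence features, and a Gaussian
# Bayes class assignment from assumed per-class mean/variance parameters.
import math

STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}

def sentence_features(sentence: str, position: int, doc_len: int):
    tokens = [t for t in sentence.lower().split() if t not in STOP_WORDS]
    return [len(tokens),                        # content-word count
            1.0 - position / max(doc_len, 1)]   # earlier sentences score higher

def gaussian_log_prob(x, mean, var):
    var = max(var, 1e-6)  # guard against zero variance
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Assumed per-class (mean, variance) per feature, e.g. estimated from training.
class_params = {
    "summary":     [(6.0, 4.0), (0.8, 0.05)],
    "non_summary": [(3.0, 4.0), (0.4, 0.05)],
}
class_prior = {"summary": 0.3, "non_summary": 0.7}

def assign_label(features):
    best, best_score = None, -math.inf
    for label, params in class_params.items():
        score = math.log(class_prior[label]) + sum(
            gaussian_log_prob(x, m, v) for x, (m, v) in zip(features, params))
        if score > best_score:
            best, best_score = label, score
    return best

doc = ["Multi-document summarization condenses large document sets.",
       "The weather was pleasant that day."]
for i, s in enumerate(doc):
    print(assign_label(sentence_features(s, i, len(doc))), "|", s)
```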

Journal ArticleDOI
TL;DR: This paper presents an effective way to summarize Hindi text documents using the TextRank algorithm, focusing on summarizing a single Hindi text document at a time based on natural language processing (NLP).
Abstract: The amount of information accessible in digital form has grown rapidly, and retrieving useful documents from such a large pool of information is difficult, so summarizing these text documents is crucial. Text summarization is the process of reducing an original source document to the essential information it contains. It eliminates redundant, less important content and provides the vital information in a shorter version, usually about half the length of the original text. Creating a manual summary is a very time-consuming task; automatic summarization helps in getting the gist of the information in a particular document in a very short time. Compared with other Indian regional languages, very little work has been done on summarization of Hindi documents. This paper presents an effective way to summarize using the TextRank algorithm. It focuses on summarizing a single Hindi text document at a time based on natural language processing (NLP).

1 citation
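The TextRank procedure the paper relies on is straightforward to sketch. The version below uses plain word-overlap similarity in place of Hindi-specific preprocessing, so it only illustrates the graph-ranking core, not the paper's full pipeline; all inputs are toy examples.

```python
# Minimal TextRank-style extractive summarizer: build a sentence similarity
# graph, run PageRank, and return the top-ranked sentences in document order.
import networkx as nx

def overlap_similarity(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / (len(sa) + len(sb))

def textrank_summary(sentences, k=1):
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            w = overlap_similarity(sentences[i], sentences[j])
            if w > 0:
                g.add_edge(i, j, weight=w)
    scores = nx.pagerank(g, weight="weight")
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # preserve original order

doc = [
    "Text summarization condenses a document to its essential information.",
    "Manual summarization is time-consuming.",
    "Automatic summarization extracts the gist of a document quickly.",
]
print(textrank_summary(doc, k=1))
```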


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52