scispace - formally typeset
Search or ask a question
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: This paper proposed various features of Summary Extraction and also analyzed features that are to be applied depending upon the size of the Document and proposed that use of the concept based features helps in improving the quality of the summary in case of large documents.
Abstract: As the amount of textual Information increases, we experience a need for Automatic Text Summarizers. In Automatic summarization a text document or a larger corpus of multiple documents are reduced to a short set of words or paragraph that conveys the main meaning of the text Summarization can be classified into two approaches: extraction and abstraction. This paper focuses on extraction approach.The goal of text summarization based on extraction approach is sentences selection. The first step in summarization by extraction is the identification of important features. In our approach short stories and biographies are used as test documents. Each document is prepared by pre-processing process: sentence segmentation, tokenization, stop word removal, case folding, lemmatization, and stemming. Then, using important features, sentence filtering, data compression and finally calculating score for each sentence is done. In this paper we proposed various features of Summary Extraction and also analyzed features that are to be applied depending upon the size of the Document. The experimentation is performed with the DUC 2002 dataset. The comparative results of the proposed approach and that of MS-Word are also presented here. The concept based features are given more weightage. From these results we propose that use of the concept based features helps in improving the quality of the summary in case of large documents.

1 citations

Proceedings Article
Chen Moye1, Wei Li, Jiachen Liu1, Xinyan Xiao1, Hua Wu1, Haifeng Wang1 
25 Oct 2021
TL;DR: Li et al. as discussed by the authors proposed a sub-graph selection method for multi-document summarization, in which source documents are regarded as relation graphs of sentences (e.g., similarity graph or discourse graph) and the candidate summaries are its subgraphs, and instead of selecting salient sentences, SgSum selects a salient subgraph from the relation graph as the summary.
Abstract: Most of existing extractive multi-document summarization (MDS) methods score each sentence individually and extract salient sentences one by one to compose a summary, which have two main drawbacks: (1) neglecting both the intra and cross-document relations between sentences; (2) neglecting the coherence and conciseness of the whole summary. In this paper, we propose a novel MDS framework (SgSum) to formulate the MDS task as a sub-graph selection problem, in which source documents are regarded as a relation graph of sentences (e.g., similarity graph or discourse graph) and the candidate summaries are its sub-graphs. Instead of selecting salient sentences, SgSum selects a salient sub-graph from the relation graph as the summary. Comparing with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) directly outputs an integrate summary in the form of sub-graph which is more informative and coherent. Extensive experiments on MultiNews and DUC datasets show that our proposed method brings substantial improvements over several strong baselines. Human evaluation results also demonstrate that our model can produce significantly more coherent and informative summaries compared with traditional MDS methods. Moreover, the proposed architecture has strong transfer ability from single to multi-document input, which can reduce the resource bottleneck in MDS tasks.

1 citations

Journal ArticleDOI
TL;DR: In this paper , the authors proposed a news summarization system for the news data streams from RSS feeds and Google news, where news streams are summarized for the users with a novel summarization algorithm.
Abstract: News feed is one of the potential information providing sources which give updates on various topics of different domains. These updates on various topics need to be collected since the domain specific interested users are in need of important updates in their domains with organized data from various sources. In this paper, the news summarization system is proposed for the news data streams from RSS feeds and Google news. Since news stream analysis requires live content, the news data are continuously collected for our experimentation. The major contributions of this work involve domain corpus based news collection, news content extraction, hierarchical clustering of the news and summarization of news. Many of the existing news summarization systems lack in providing dynamic content with domain wise representation. This is alleviated in our proposed system by tagging the news feed with domain corpuses and organizing the news streams with the hierarchical structure with topic wise representation. Further, the news streams are summarized for the users with a novel summarization algorithm. The proposed summarization system generates topic wise summaries effectively for the user and no system in the literature has handled the news summarization by collecting the data dynamically and organizing the content hierarchically. The proposed system is compared with existing systems and achieves better results in generating news summaries. The Online news content editors are highly benefitted by this system for instantly getting the news summaries of their domain interest.

1 citations

Proceedings Article
02 Jun 2010
TL;DR: This demo describes the summarization of textual material about locations in the context of a geo-spatial information display system using a hierarchical summarization framework, conditioned on the small space available for display.
Abstract: This demo describes the summarization of textual material about locations in the context of a geo-spatial information display system. When the amount of associated textual data is large, it is organized and summarized before display. A hierarchical summarization framework, conditioned on the small space available for display, has been fully implemented. Snapshots of the system, with narrative descriptions, demonstrate our results.

1 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202374
2022160
202152
202061
201947
201852