SciSpace (formerly Typeset)
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal Article
TL;DR: A multi-document summarization method based on spectral clustering that takes the importance of each cluster into consideration, along with sentence position, length, and other factors, to score the importance of each sentence.
Abstract: This paper proposes a multi-document summarization method based on spectral clustering. After clustering topic-relevant sentences from the documents together, the method takes the importance of each cluster into consideration, along with sentence position, length, and other factors, to score the importance of each sentence. Sentences are sorted by score and extracted until the required word count is met, forming the summary. Experimental results show that this method outperforms traditional methods and effectively improves summary quality.
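To make the described pipeline concrete, here is a minimal Python sketch of the idea, assuming scikit-learn for the spectral clustering step; the cluster-importance, position, and length weights are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch of spectral-clustering-based extractive summarization.
# Weights and scoring details are assumptions for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

def summarize(sentences, n_clusters=5, max_words=100):
    # Represent sentences and cluster topic-relevant ones together.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    affinity = cosine_similarity(tfidf)
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)

    # Cluster importance: here, the fraction of sentences in the cluster (assumption).
    cluster_weight = {c: (labels == c).sum() / len(sentences) for c in set(labels)}

    scored = []
    for i, sent in enumerate(sentences):
        position_score = 1.0 / (i + 1)               # earlier sentences score higher
        length_score = min(len(sent.split()) / 20.0, 1.0)
        score = cluster_weight[labels[i]] + 0.5 * position_score + 0.3 * length_score
        scored.append((score, sent))

    # Greedily take the highest-scoring sentences until the word budget is met.
    summary, words = [], 0
    for score, sent in sorted(scored, reverse=True):
        if words + len(sent.split()) > max_words:
            break
        summary.append(sent)
        words += len(sent.split())
    return summary
```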

1 citation

Book ChapterDOI
01 Jan 2011
TL;DR: This paper identifies significant sentences in the subtitles of a video using text summarization techniques and then composes a video summary from the video parts corresponding to these summary sentences.
Abstract: Video summarization algorithms present condensed versions of a full-length video by identifying its most significant parts. In this paper, we propose an automatic video summarization method using the subtitles of videos and text summarization techniques. We identify significant sentences in the subtitles of a video using text summarization techniques and then compose a video summary from the video parts corresponding to these summary sentences.
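A rough sketch of this subtitle-driven pipeline, assuming subtitle cues with start/end timestamps and a simple frequency-based sentence scorer standing in for whatever text summarizer the paper actually uses:

```python
# Illustrative sketch, not the paper's implementation. Subtitle cues carry
# timestamps, so selected summary sentences map directly to video segments.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Cue:
    start: float   # seconds
    end: float
    text: str

def frequency_score(text, term_freq):
    words = text.lower().split()
    return sum(term_freq[w] for w in words) / max(len(words), 1)

def summarize_video(cues, n_segments=5):
    # Simple frequency-based text summarization over the subtitle sentences.
    term_freq = Counter(w for cue in cues for w in cue.text.lower().split())
    ranked = sorted(cues, key=lambda c: frequency_score(c.text, term_freq),
                    reverse=True)[:n_segments]
    # Return the corresponding video parts in playback order.
    return sorted((c.start, c.end) for c in ranked)
```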

1 citation

Book
01 Jan 1999
TL;DR: This dissertation addresses the problem of automatic summarization of multiple structured texts with an algorithm for creating discourse trees implemented with TEI encoded XML and develops a summarization architecture using standardized TEI tags for a specific type of text.
Abstract: My dissertation addresses the problem of automatic summarization of multiple structured texts. I present an algorithm for creating discourse trees implemented with TEI-encoded XML. These discourse trees are efficiently combined to form a summarization tree using hierarchical representations of text structure. Using this architecture, intelligent text summarization is possible. My summarization method is completely domain independent, and it allows users to compare and contrast related text. My text summarization approach can be embedded into XML-capable browsers, into information retrieval systems, and into information extraction systems to manage different classes of documents, different parts of documents, and different types of information contained in a document. My summarization architecture makes use of three theories of discourse structure to perform multiple document summarization. RST provides the notions of a nucleus and a satellite to represent important parts of a structured text. Much of the research in the area of RST makes the assumption that the text contained in the nucleus is more important than the text contained in the satellite. I utilize this inherent characteristic to make decisions about summarization for argumentative text. Specifically, I map argumentative "objects" (proposition, claim, evidence, etc.) to the nucleus articles of RST, and I map argumentative "actions" (elaborates, supports, negates, etc.) to the satellite articles of RST. As such, I know that argumentative objects are more important to text summarization than argumentative actions. The theories presented in DMS provide the concept of discourse segment hierarchies. DMS outlines how a specific segment contributes to the overall purpose of the discourse. In my architecture, the purpose of an argument is straightforward (i.e., the author intends to persuade the reader to believe a given proposition). DMS allows for a hierarchy of segment types within objects and actions to represent the given argument, and it outlines the theory for the relationships between different segment types. I use the hierarchical concepts and the structural relationships to support my decision to use a tree as the correct representation of written argumentative discourse. Text types are also fundamental to my summarization approach. My algorithm makes the basic assumption that all writers use a particular schema when producing an argument; text-type theory provides the supporting research for this assumption. In my architecture, I utilize a standard schema for written argumentative discourse to combine documents written by more than one author. I utilize text-type theory to define the overall structure of the schema, and I use argumentation theory to define the components of that schema. In my research, I investigated several architectural approaches and developed the following: (1) I created a summarization architecture using standardized TEI tags for a specific type of text. I relied on the structure of the underlying text instead of the grammar; thus, my technique is applicable to other text types described by the TEI. (2) I combined knowledge of argumentative text types to create XML text trees with embedded TEI tags. I determined how to create, manipulate, and analyze the XML trees and demonstrated the flexibility of output available from these XML trees. (3) I combined argumentative text types, XML, and TEI tags to create an architectural model that utilizes industry standards.
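The nucleus/satellite mapping can be sketched roughly as follows; the tag names and the XML layout here are assumptions for illustration, not the dissertation's actual TEI schema.

```python
# Minimal sketch of keeping nucleus ("object") text and dropping satellite
# ("action") text when summarizing TEI-style XML. Tag names are hypothetical.
import xml.etree.ElementTree as ET

NUCLEUS_TAGS = {"proposition", "claim", "evidence"}     # argumentative "objects"
SATELLITE_TAGS = {"elaborates", "supports", "negates"}  # argumentative "actions"

def extract_nuclei(element, out):
    """Walk the discourse tree and keep text from nucleus nodes."""
    if element.tag in NUCLEUS_TAGS and element.text:
        out.append(element.text.strip())
    for child in element:
        extract_nuclei(child, out)
    return out

def summarize_documents(xml_strings):
    """Combine several XML-encoded documents into one list of nucleus texts."""
    summary = []
    for xml in xml_strings:
        extract_nuclei(ET.fromstring(xml), summary)
    return summary
```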

1 citation

Book ChapterDOI
Weikang Li, Xingxing Zhang, Yunfang Wu, Furu Wei, Ming Zhou
09 Oct 2019
TL;DR: A novel adaptation method that improves QMDS using the relatively large datasets from DQA; the shared model, consisting of a sentence encoder, a query filter, and a document encoder, models sentence salience and query relevance well.
Abstract: Due to the lack of large-scale datasets, it remains difficult to train neural Query-focused Multi-Document Summarization (QMDS) models. Several large datasets for Document-based Question Answering (DQA) have been released, and numerous neural network models achieve good performance on them. The two tasks are similar in that both select sentences from a document to answer a given query/question. We therefore propose a novel adaptation method to improve QMDS by using the relatively large datasets from DQA. Specifically, we first design a neural network model to handle both tasks. The model, which consists of a sentence encoder, a query filter, and a document encoder, can model sentence salience and query relevance well. We then train this model on both the QMDS and DQA datasets with several different strategies. Experimental results on three benchmark DUC datasets demonstrate that our approach outperforms a variety of baselines by a wide margin and achieves results comparable with state-of-the-art methods.
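A rough PyTorch sketch of the three-component architecture named in the abstract (sentence encoder, query filter, document encoder); the dimensions, GRU encoders, and gating formulation are assumptions, not the authors' exact model.

```python
# Hedged sketch: score each sentence's salience given a query.
import torch
import torch.nn as nn

class QMDSScorer(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.sentence_encoder = nn.GRU(dim, dim, batch_first=True)
        self.query_filter = nn.Linear(2 * dim, dim)        # gates sentences by query relevance
        self.document_encoder = nn.GRU(dim, dim, batch_first=True)
        self.salience = nn.Linear(dim, 1)

    def forward(self, sentences, query):
        # sentences: (num_sents, sent_len) token ids; query: (query_len,) token ids
        _, sent_vecs = self.sentence_encoder(self.embed(sentences))       # (1, num_sents, dim)
        sent_vecs = sent_vecs.squeeze(0)                                  # (num_sents, dim)
        _, query_vec = self.sentence_encoder(self.embed(query).unsqueeze(0))
        query_vec = query_vec.squeeze(0).expand_as(sent_vecs)             # (num_sents, dim)

        # Query filter: gate each sentence representation by query relevance.
        gate = torch.sigmoid(self.query_filter(torch.cat([sent_vecs, query_vec], dim=-1)))
        filtered = gate * sent_vecs

        # Document encoder: model salience in the context of the whole document.
        doc_ctx, _ = self.document_encoder(filtered.unsqueeze(0))
        return self.salience(doc_ctx.squeeze(0)).squeeze(-1)              # one score per sentence
```

In the adaptation setup the abstract describes, the same scorer could then be trained on DQA batches (answer-sentence selection) and QMDS batches (salient-sentence selection) under different mixing strategies.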

1 citation

Book ChapterDOI
10 Oct 2017
TL;DR: The paper shows that taking the described factors into account positively affects the quality of the generated annotations, whose sentences together contain the main details of the news story.
Abstract: The number of news articles published daily is larger than any person can afford to study. Correct summarization of the information allows for an easy search for the event of interest. This research addresses the issue of constructing annotations of a news story. Standard multi-document summarization approaches are not able to extract all information relevant to the event, because they do not take into account how the event context varies over time. We have implemented a system that automatically builds a timeline summary. We investigated the impact of three factors: query extension, accounting for the temporal nature of the data, and the inverted-pyramid structure of news articles. The annotations we generate are composed of sentences sorted in chronological order, which together contain the main details of the news story. The paper shows that taking the described factors into account positively affects the quality of the annotations created.
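A minimal sketch of the three factors described above (query extension, temporal grouping, inverted-pyramid weighting); the co-occurrence-based query extension and the scoring weights are assumptions, not the paper's system.

```python
# Hedged sketch: build a chronological timeline summary from dated articles.
from collections import defaultdict

def extend_query(query_terms, articles, top_k=5):
    # Naive query extension: add the words that most often co-occur with the query.
    counts = defaultdict(int)
    for _, sentences in articles:
        for sent in sentences:
            words = set(sent.lower().split())
            if words & query_terms:
                for w in words - query_terms:
                    counts[w] += 1
    extra = sorted(counts, key=counts.get, reverse=True)[:top_k]
    return query_terms | set(extra)

def timeline_summary(articles, query_terms, per_day=2):
    """articles: list of (date, [sentence, ...]); returns (date, sentence) pairs in order."""
    query = extend_query(set(query_terms), articles)
    by_day = defaultdict(list)
    for date, sentences in articles:
        for pos, sent in enumerate(sentences):
            overlap = len(set(sent.lower().split()) & query)
            pyramid = 1.0 / (pos + 1)        # inverted pyramid: lead sentences weigh more
            by_day[date].append((overlap * pyramid, sent))
    summary = []
    for date in sorted(by_day):              # chronological order
        best = sorted(by_day[date], reverse=True)[:per_day]
        summary.extend((date, sent) for _, sent in best)
    return summary
```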

1 citation


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52