scispace - formally typeset
Search or ask a question
Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over the lifetime, 2270 publications have been published within this topic receiving 71850 citations.


Papers
More filters
01 Jan 1998
TL;DR: The research described here focuses on multi-lingual summarization (MLS), where summaries of documents are produced in their original language; corresponding summaries in English will eventually be generated.
Abstract: The research described here focuses on multi-lingual summarization (MLS). Summaries of documents are produced in their original language; corresponding summaries in English will eventually be generated. The source languages supported are Spanish, Japanese, English and Russian. Background

11 citations

Proceedings ArticleDOI
09 Aug 2015
TL;DR: This paper proposes StarSum a star bipartite graph which models sentences and their topic signature phrases and extracts sentences in an approach that guarantees diversity and coverage which are crucial for multi-document summarization.
Abstract: Graph-based approaches for multi-document summarization have been widely used to extract top sentences for a summary. Traditionally, the documents' cluster is modeled as a graph of the cluster's sentences only which might limit the ability of recognizing topically discriminative sentences in regard to other clusters. In this paper, we propose StarSum a star bipartite graph which models sentences and their topic signature phrases. The approach ensures sentence similarity and content importance from the graph structure. We extract sentences in an approach that guarantees diversity and coverage which are crucial for multi-document summarization. Regardless of the simplicity of the approach in ranking, a DUC experiment shows the effectiveness of StarSum compared to different baselines.

11 citations

DOI
01 Jan 2007
TL;DR: A formal model of video summarization specialized for the generation of video previews is introduced and the results have shown that previews generated using this optimization-based approach are not as good as manually made previews, but have higher quality than previews created using subsample.
Abstract: The amount of digital video content available to users is rapidly increasing. Developments in computer, digital network, and storage technologies all contribute to broaden the offer of digital video. Only users’ attention and time remain scarce resources. Users face the problem of choosing the right content to watch among hundreds of potentially interesting offers. Video and audio have a dynamic nature: they cannot be properly perceived without considering their temporal dimension. This property makes it difficult to get a good idea of what a video item is about without watching it. Video previews aim at solving this issue by providing compact representations of video items that can help users making choices in massive content collections. This thesis is concerned with solving the problem of automatic creation of video previews. To allow fast and convenient content selection, a video preview should take into consideration more than thirty requirements that we have collected by analyzing related literature on video summarization and film production. The list has been completed with additional requirements elicited by interviewing end-users, experts and practitioners in the field of video editing and multimedia. This list represents our collection of user needs with respect to video previews. The requirements, presented from the point of view of the end-users, can be divided into seven categories: duration, continuity, priority, uniqueness, exclusion, structural, and temporal order. Duration requirements deal with the durations of the preview and its subparts. Continuity requirements request video previews to be as continuous as possible. Priority requirements indicate which content should be included in the preview to convey as much information as possible in the shortest time. Uniqueness requirements aim at maximizing the efficiency of the preview by minimizing redundancy. Exclusion requirements indicate which content should not be included in the preview. Structural requirements are concerned with the structural properties of video, while temporal order requirements set the order of the sequences included in the preview. Based on these requirements, we have introduced a formal model of video summarization specialized for the generation of video previews. The basic idea is to translate the requirements into score functions. Each score function is defined to have a non-positive value if a requirement is not met, and to increase depending on the degree of fulfillment of the requirement. A global objective function is then defined that combines all the score functions and the problem of generating a preview is translated into the problem of finding the parts of the initial content that maximize the objective function. Our solution approach is based on two main steps: preparation and selection. In the preparation step, the raw audiovisual data is analyzed and segmented into basic elements that are suitable for being included in a preview. The segmentation of the raw data is based on a shot-cut detection algorithm. In the selection step various content analysis algorithms are used to perform scene segmentation, advertisements detection and to extract numerical descriptors of the content that, introduced in the objective function, allow to estimate the quality of a video preview. The core part of the selection step is the optimization step that consists in searching the set of segments that maximizes the objective function in the space of all possible previews. Instead of solving the optimization problem exactly, an approximate solution is found by means of a local search algorithm using simulated annealing. We have performed a numerical evaluation of the quality of the solutions generated by our algorithm with respect to previews generated randomly or by selecting segments uniformly in time. The results on thirty content items have shown that the local search approach outperforms the other methods. However, based on this evaluation, we cannot conclude that the degree of fulfillment of the requirements achieved by our method satisfies the end-user needs completely. To validate our approach and assess end-user satisfaction, we conducted a user evaluation study in which we compared six aspects of previews generated using our algorithm to human-made previews and to previews generated by subsampling. The results have shown that previews generated using our optimization-based approach are not as good as manually made previews, but have higher quality than previews created using subsample. The differences between the previews are statistically significant.

11 citations

Book ChapterDOI
08 Apr 2002
TL;DR: It is argued that for summaries to be truly useful within data mining, they must include concepts abstracted from the text in addition to sentences extracted from theText summarization.
Abstract: Text summarizers automatically construct summaries of a natural-language document. This paper examines the use of text summarization within data mining, identifying the potential summarizers have for uncovering interesting and unexpected information. It describes the current state of the art in commercial summarization and current approaches to the evaluation of summarizers. The paper then proposes a new model for text summarization and suggests a new form of evaluation. It argues that for summaries to be truly useful within data mining, they must include concepts abstracted from the text in addition to sentences extracted from the text. The paper uses two news articles to illustrate its points.

11 citations


Network Information
Related Topics (5)
Natural language
31.1K papers, 806.8K citations
85% related
Ontology (information science)
57K papers, 869.1K citations
84% related
Web page
50.3K papers, 975.1K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
83% related
Graph (abstract data type)
69.9K papers, 1.2M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202374
2022160
202152
202061
201947
201852