Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings Article
09 Nov 2016
TL;DR: It is shown that, in order to extract meaningful summaries, what matters is not what is being said but how it is being said; this allows summary candidature scores to be predicted regardless of the topics of unseen documents.
Abstract: We investigate a novel framework for Automatic Text Summarization. In this framework, underlying language-use features are learned from a minimal sample corpus. We argue that the low complexity of this kind of feature allows relying on the generalization ability of a learning machine, rather than on diverse human-abstracted summaries. In this way, our method reliably estimates a relevance measure for predicting summary candidature scores, regardless of the topics of unseen documents. Our output summaries are comparable to the state of the art. Thus we show that, in order to extract meaningful summaries, it is not crucial what is being said, but rather how it is being said.

4 citations

Book Chapter
01 Jan 2013
TL;DR: This paper reports an initial study that aims to assess the viability of multi-document summarization techniques for automatic captioning of geo-referenced images and shows that query-based summaries perform better than generic ones and thus are more appropriate for the task of image captioning or generation of short descriptions related to the location/place captured in the image.
Abstract: This paper reports an initial study that aims to assess the viability of multi-document summarization techniques for automatic captioning of geo-referenced images. The automatic captioning procedure requires summarizing multiple Web documents that contain information related to the images' location. We use different state-of-the-art summarization systems to generate generic and query-based multi-document summaries and evaluate them using ROUGE metrics [24] relative to human-generated summaries. Results show that query-based summaries perform better than generic ones and thus are more appropriate for the task of image captioning or generation of short descriptions related to the location/place captured in the image. For our future work in automatic image captioning, this result suggests that developing the query-based summarizer further and biasing it to account for user-specific requirements will prove worthwhile.

4 citations
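
The ROUGE evaluation mentioned in the entry above can be illustrated with a minimal sketch of ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate summary, with counts clipped. This is not the official ROUGE toolkit and not the evaluation code used in that study; the function name `rouge_n_recall` and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Illustrative ROUGE-N recall (hypothetical helper, not the official toolkit).

    Assumes lowercase whitespace tokenization; real evaluations usually
    add stemming and may handle stopwords.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most as often
    # as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For example, comparing the candidate "the tower is in paris" with the reference "the tower stands in paris" matches four of the five reference unigrams and two of the four reference bigrams.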

01 Jan 2014
TL;DR: Context-Based similarity analysis for document summarization extracts a condensed version of the original document in the information retrieval task, using the similarity between sentences in the document to extract the most salient sentences.
Abstract: Context-based similarity analysis for document summarization extracts a condensed version of the original document in the information retrieval task. Document summarization mainly uses the similarity between sentences in the document to extract the most salient sentences. A document summary is useful for giving an overview of the original document in a shorter period of time. In traditional approaches, the sentence similarity values remain independent of the context: the context of the document is not taken into consideration, and the sentences are indexed using traditional term indexing measures. Here, a context-sensitive document indexing model based on the Bernoulli model of randomness is used for the document summarization process. The lexical association between terms is used to produce a context-sensitive weight for the document terms. The context-sensitive indexing weights are used to compute the sentence similarity matrix and, as a result, the informative sentences are presented at the top of the summary. The aim is to have a positive impact on the quality of the summary.

4 citations
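
The similarity-matrix idea in the entry above can be sketched as follows: build a vector per sentence, compute pairwise similarities, and rank sentences by how similar they are to the rest of the document. This sketch uses plain term-frequency vectors and cosine similarity as a stand-in; it does not implement the Bernoulli-based context-sensitive weights the abstract describes, and the names `cosine` and `rank_sentences` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k sentences by total similarity to all other sentences.

    Assumption: raw term frequencies replace the context-sensitive
    weights of the original paper.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Sentence similarity matrix with a zero diagonal.
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = [sum(row) for row in sim]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order[:top_k]]
```

Swapping in a different term-weighting scheme only requires changing how `vectors` is built; the similarity matrix and ranking steps stay the same.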

Book Chapter
07 Jan 2013
TL;DR: This work proposes a novel multi-document summarization technique which employs tag clusters on Flickr, a kind of folksonomy system, for detecting key sentences from multiple documents, and creates a word frequency table for analyzing the semantics and contribution of words by using the HITS algorithm.
Abstract: Multi-document summarization techniques aim to reduce the documents into a small set of words or paragraphs that convey the main meaning of the original documents. Many approaches for multi-document summarization have used probability-based methods and machine learning techniques to summarize multiple documents sharing a common topic at the same time. However, these techniques fail to semantically analyze proper nouns and newly coined words because most of them depend on an old-fashioned dictionary or thesaurus. To overcome these drawbacks, we propose a novel multi-document summarization technique which employs tag clusters on Flickr, a kind of folksonomy system, for detecting key sentences from multiple documents. We first create a word frequency table for analyzing the semantics and contribution of words by using the HITS algorithm. Then, by exploiting tag clusters, we analyze the semantic relationship between words in the word frequency table. The experimental results on the TAC 2008 and 2009 data sets demonstrate the improvement of our proposed framework over existing summarization systems.

4 citations

Journal Article
TL;DR: A macro-micro importance discriminative model based on cascaded regression analysis is presented for the content selection of TMDS; it mines the temporal characteristics at different levels of topical detail in order to provide cues for extracting the important content.
Abstract: Temporal multi-document summarization (TMDS) aims to capture evolving information of a single topic over time and produce a summary delivering the main information content. This paper presents a macro-micro importance discriminative model based on cascaded regression analysis for the content selection of TMDS, which mines the temporal characteristics at different levels of topical detail in order to provide cues for extracting the important content. Temporally evolving data can be treated as dynamic objects whose content changes over time. First, we extract important time points with a macro importance discriminative model; then we extract important sentences at these time points with a micro importance discriminative model. The macro and micro importance discriminative models are combined to form a cascaded regression analysis approach. The summary is made up of the important sentences evolving over time. Experiments on five Chinese datasets demonstrate the encouraging performance of the proposed approach, but the problem is far from solved.

4 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52