Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
01 Jun 2014
TL;DR: TIMEMMR is proposed, a modification to Maximal Marginal Relevance that promotes temporal diversity by way of computing time span similarity, and its utility in summarizing certain document sets is shown.
Abstract: We study the use of temporal information in the form of timelines to enhance multi-document summarization. We employ a fully automated temporal processing system to generate a timeline for each input document. We derive three features from these timelines, and show that their use in supervised summarization leads to a significant 4.1% improvement in ROUGE performance over a state-of-the-art baseline. In addition, we propose TIMEMMR, a modification to Maximal Marginal Relevance that promotes temporal diversity by way of computing time span similarity, and show its utility in summarizing certain document sets. We also propose a filtering metric to discard noisy timelines generated by our automatic processes, to purify the timeline input for summarization. By selectively using timelines guided by filtering, overall summarization performance is increased by a significant 5.9%.

29 citations
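
The MMR modification is easy to picture in code. Below is a minimal sketch of an MMR-style sentence selector with an added temporal-diversity penalty based on time-span overlap; the exact TIMEMMR scoring used in the paper may differ, and the relevance, text_sim, and time_span_sim functions and the lambda/mu weights here are illustrative assumptions.

```python
# Hypothetical sketch of an MMR-style selector with a temporal-diversity term.
# relevance(), text_sim(), time_span_sim() and the weights are assumptions,
# not the paper's exact TIMEMMR formulation.

def time_span_sim(span_a, span_b):
    """Jaccard-style overlap between two (start, end) time spans (an assumption)."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    overlap = max(0, end - start)
    union = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - overlap
    return overlap / union if union > 0 else 0.0

def timemmr_select(candidates, relevance, text_sim, lam=0.7, mu=0.15, length=5):
    """Greedy MMR selection penalizing both textual and temporal redundancy.

    candidates: list of (sentence, (start, end)) pairs
    relevance:  dict mapping sentence -> query/centroid relevance score
    text_sim:   function(sentence_a, sentence_b) -> similarity in [0, 1]
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < length:
        def score(item):
            sent, span = item
            if not selected:
                return lam * relevance[sent]
            red_text = max(text_sim(sent, s) for s, _ in selected)
            red_time = max(time_span_sim(span, sp) for _, sp in selected)
            return lam * relevance[sent] - (1 - lam) * red_text - mu * red_time
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return [s for s, _ in selected]
```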

Proceedings ArticleDOI
14 Sep 2004
TL;DR: Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under the evaluation scheme when dealing with diverse genres of Chinese documents with free writing style and flexible topic distribution.
Abstract: Automatic summarization is an important research issue in natural language processing. This paper presents a special summarization method to generate a single-document summary with maximum topic completeness and minimum redundancy. It first builds semantic-class-based vector representations of various kinds of linguistic units in a document by means of HowNet (an existing ontology), which can improve the representation quality of the traditional term-based vector space model to a certain degree. Then, by adopting the K-means clustering algorithm together with a clustering analysis algorithm, we adaptively determine the number of latent topic regions in a document. Finally, topic-representative sentences are selected from each topic region to form the final summary. In order to evaluate the effectiveness of the proposed summarization method, a novel metric known as representation entropy is used for summarization redundancy evaluation. Preliminary experimental results show that the proposed method outperforms the conventional basic summarization method under this evaluation scheme when dealing with diverse genres of Chinese documents with free writing styles and flexible topic distributions.

29 citations
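
As an illustration of the clustering-based selection step, here is a minimal sketch that clusters TF-IDF sentence vectors with K-means and picks the sentence closest to each centroid; the paper instead uses HowNet-based semantic-class vectors and determines the number of clusters adaptively, and the simple term-distribution entropy below is only a stand-in for its representation entropy metric.

```python
# Minimal sketch (not the paper's HowNet-based pipeline): cluster sentence
# vectors with K-means and take the sentence nearest each centroid as the
# topic representative. Vectorizer choice and the entropy definition are
# assumptions made for illustration only.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_summarize(sentences, k=3):
    vec = TfidfVectorizer()
    X = vec.fit_transform(sentences).toarray()
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    summary = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        summary.append(sentences[members[np.argmin(dists)]])
    return summary

def term_distribution_entropy(summary_sentences):
    """Entropy of the summary's normalized term distribution (a simplified
    stand-in for the paper's representation entropy; higher suggests less
    redundancy under this assumption)."""
    vec = TfidfVectorizer()
    weights = vec.fit_transform(summary_sentences).toarray().sum(axis=0)
    p = weights / weights.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```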

01 Jan 2002
TL;DR: This thesis presents a cut-and-paste approach to addressing the text generation problem in domain-independent, single-document summarization, and built a large-scale, reusable lexicon by combining multiple, heterogeneous resources.
Abstract: Automatic text summarization provides a concise summary for a document. In this thesis, we present a cut-and-paste approach to addressing the text generation problem in domain-independent, single-document summarization. We found that professional abstractors often reuse the text in an original document for producing the text in a summary. But rather than simply extracting the original text, as in most existing automatic summarizers, humans often edit the extracted sentences. We call such editing operations “revision operations”. Our summarizer simulates two revision operations that are frequently used by humans: sentence reduction and sentence combination. Sentence reduction removes inessential phrases from sentences and sentence combination merges sentences and phrases together. The sentence reduction algorithm we propose relies on multiple sources of knowledge to decide when it is appropriate to delete a phrase from a sentence, including linguistic knowledge, probabilities trained from corpus examples, and context information. The sentence combination module relies on a set of rules to decide how to combine sentences and phrases and when to combine them. Sentence reduction aims to improve the conciseness of generated summaries and sentence combination aims to improve the coherence of generated summaries. We call this approach “cut-and-paste” since it produces summaries by excerpting and combining sentences and phrases from original documents, unlike the extraction technique which produces summaries by simply extracting sentences or passages. Our work also includes a Hidden Markov Model based sentence decomposition program which analyzes human-written summaries. The decomposition program identifies where the phrases of a summary originate in the original document, producing an aligned corpus of summaries and articles that we use to train and evaluate the summarizer. We also built a large-scale, reusable lexicon by combining multiple, heterogeneous resources. The lexicon contains lexical, syntactic, and semantic knowledge. It can be used in many applications.

29 citations
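
The sentence-reduction idea, combining several knowledge sources to decide whether a phrase can be dropped, can be sketched as follows; the phrase features and thresholds here are illustrative assumptions, not the thesis's actual algorithm.

```python
# Illustrative sketch of the "sentence reduction" idea only: drop a phrase
# when multiple knowledge sources agree it is inessential. The feature set
# and thresholds are assumptions.

from dataclasses import dataclass

@dataclass
class Phrase:
    text: str
    grammatically_optional: bool   # e.g. adjunct vs. required argument
    corpus_removal_prob: float     # how often similar phrases are dropped in training pairs
    context_salience: float        # overlap with the document's focus terms

def reduce_sentence(phrases, removal_threshold=0.6, salience_threshold=0.3):
    kept = []
    for p in phrases:
        removable = (p.grammatically_optional
                     and p.corpus_removal_prob >= removal_threshold
                     and p.context_salience < salience_threshold)
        if not removable:
            kept.append(p.text)
    return " ".join(kept)

# Example: the temporal adjunct is optional, frequently dropped, and not salient.
print(reduce_sentence([
    Phrase("The company reported record profits", False, 0.1, 0.9),
    Phrase("on a rainy Tuesday afternoon", True, 0.8, 0.05),
]))
```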

Journal ArticleDOI
TL;DR: This paper proposes a method to calculate sentence importance using scores generated by a Question-Answering engine for responses to multiple questions, and describes the integration of this method with a generic multi-document summarization system.
Abstract: In recent years, answer-focused summarization has gained attention as a technology complementary to information retrieval and question answering. In order to realize multi-document summarization focused by multiple questions, we propose a method to calculate sentence importance using scores generated by a Question-Answering engine for responses to multiple questions. Further, we describe the integration of this method with a generic multi-document summarization system. The evaluation results demonstrate that the proposed method performs better than not only several baselines but also the other participants' systems at the NTCIR-4 TSC3 Formal Run evaluation workshop. However, it should be noted that some of the other systems do not use the information in the questions.

29 citations
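
The scoring idea can be sketched as follows: a QA engine returns scored answer candidates per question, a sentence inherits the scores of the candidates it contains, and the result is interpolated with a generic summarizer's score; the aggregation and the interpolation weight below are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of QA-guided sentence importance. qa_results maps each
# question to a list of (answer_string, score) pairs from a QA engine; the
# max-per-question aggregation and the linear interpolation are assumptions.

def qa_sentence_importance(sentence, qa_results):
    """Sum, over questions, of the best-scoring answer candidate found in the sentence."""
    total = 0.0
    for candidates in qa_results.values():
        total += max((score for ans, score in candidates if ans in sentence),
                     default=0.0)
    return total

def combined_score(sentence, qa_results, generic_score, alpha=0.5):
    # Interpolate with the generic multi-document summarizer's score.
    return alpha * qa_sentence_importance(sentence, qa_results) + (1 - alpha) * generic_score
```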

Proceedings ArticleDOI
26 Nov 2003
TL;DR: The goal of the paper is to provide basic definitions of widely used terms such as skimming, summarization, and highlighting, to distinguish among the dimensions of task, content, and method, and to provide an extensive classification model for them.
Abstract: The ability to summarize and abstract information will be an essential part of intelligent behavior in consumer devices. Various summarization methods have been the topic of intensive research in the content-based video analysis community. Summarization in traditional information retrieval is a well-understood problem. While there has been a lot of research in the multimedia community, there is no agreed-upon terminology and classification of the problems in this domain. Although the problem has been researched from different angles, there is usually no distinction between the various dimensions of summarization. The goal of the paper is to provide basic definitions of widely used terms such as skimming, summarization, and highlighting. The different levels of summarization (local, global, and meta-level) are made explicit. We distinguish among the dimensions of task, content, and method, and provide an extensive classification model for them. We map the existing summary extraction approaches in the literature onto this model and classify the aspects of systems proposed in the literature. In addition, we outline the evaluation methods and provide a brief survey of them. Finally, we propose future research directions based on the gaps we identified by analyzing existing systems in the literature.

29 citations
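
A tiny data structure makes the kind of classification model the paper describes more concrete: each approach is tagged along the task, content, and method dimensions plus a summarization level; the example values below are illustrative assumptions, not the paper's actual taxonomy entries.

```python
# Hypothetical sketch of a task/content/method classification record with a
# local/global/meta summarization level. Field values are assumptions.

from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    LOCAL = "local"
    GLOBAL = "global"
    META = "meta"

@dataclass
class ApproachClassification:
    task: str      # e.g. "skimming", "highlighting", "abstracting"
    content: str   # e.g. "sports video", "news", "home video"
    method: str    # e.g. "shot clustering", "audio analysis", "closed captions"
    level: Level

# Example: classify a hypothetical highlight-detection system.
example = ApproachClassification(
    task="highlighting", content="sports video",
    method="audio energy peaks", level=Level.LOCAL)
```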


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations, 85% related
Ontology (information science): 57K papers, 869.1K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 83% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52