Topic
Multi-document summarization
About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications on this topic have received 71,850 citations.
Papers published on a yearly basis
Papers
30 Jan 2015
TL;DR: This article introduces Thai Elementary Discourse Units (TEDUs) and their derivatives, called Combined TEDUs (CTEDUs), and presents a three-stage method for Thai multi-document summarization: unit segmentation, unit-graph formulation, and unit selection with summary generation.
Abstract: Due to the lack of word, phrase, and sentence boundaries, summarizing multiple Thai documents poses challenges in unit segmentation, unit selection, duplication elimination, and evaluation-dataset construction. In this article, we introduce Thai Elementary Discourse Units (TEDUs) and their derivatives, called Combined TEDUs (CTEDUs), and then present our three-stage method for Thai multi-document summarization: unit segmentation, unit-graph formulation, and unit selection with summary generation. To examine the performance of the proposed method, we conducted experiments on 50 sets of Thai news articles with manually constructed reference summaries. Measured by ROUGE-1, ROUGE-2, and ROUGE-SU4, the experimental results show that (1) TEDU-based summarization outperforms paragraph-based summarization; (2) the proposed graph-based TEDU weighting with importance-based selection achieves the best performance; and (3) duplication handling and weight recalculation improve summary quality.
8 citations
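The abstract combines three ideas: weighting units on a similarity graph, selecting units by importance, and skipping duplicates. As a rough, generic sketch (not the authors' actual TEDU pipeline; the bag-of-words similarity and TextRank-style iteration here are illustrative stand-ins), graph-based weighting with duplicate-aware selection might look like:

```python
import math
from collections import Counter

def cosine(a, b):
    # cosine similarity between two bag-of-words Counters
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def rank_units(units, damping=0.85, iters=30):
    # TextRank-style weighting over a unit-similarity graph
    bags = [Counter(u.lower().split()) for u in units]
    n = len(units)
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    weights = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                out = sum(sim[j])
                if sim[j][i] > 0 and out > 0:
                    incoming += weights[j] * sim[j][i] / out
            new.append((1 - damping) / n + damping * incoming)
        weights = new
    return weights

def select_units(units, k=2, dup_threshold=0.4):
    # importance-based selection that skips near-duplicate units
    bags = [Counter(u.lower().split()) for u in units]
    weights = rank_units(units)
    order = sorted(range(len(units)), key=lambda i: weights[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) == k:
            break
        if all(cosine(bags[i], bags[j]) < dup_threshold for j in chosen):
            chosen.append(i)
    return [units[i] for i in sorted(chosen)]  # keep document order
```

Given two near-duplicate units and two unrelated ones, `select_units` picks the highest-weighted unit, skips its duplicate, and then fills the summary from the remaining units.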
01 Jan 2003
TL;DR: A novel method is proposed to extract a set of comprehensible sentences centered on several key points; it generates a similarity network from documents with a lexical dictionary and applies spreading activation to rank sentences.
Abstract: Although there has been a great deal of research on automatic summarization, most methods are statistical and disregard relationships between extracted textual segments. To ensure sentence connectivity, we propose a novel method that extracts a set of comprehensible sentences centered on several key points. The method generates a similarity network from the documents using a lexical dictionary and applies spreading activation to rank sentences. We also report evaluation results of a multi-document summarization system based on this method, which participated in the Text Summarization Challenge (TSC) task organized by the third NTCIR (NII-NACSIS Test Collection for IR Systems) project.
8 citations
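Spreading activation, as the abstract uses it, means seeding activation on key-point sentences and letting it flow along similarity edges so that well-connected sentences accumulate high scores. A minimal sketch, assuming a precomputed symmetric similarity matrix (the decay factor and step count are illustrative choices, not values from the paper):

```python
def spread_activation(adj, seeds, decay=0.5, steps=4):
    # adj[i][j]: similarity edge weight between sentences i and j (symmetric)
    # seeds: initial activation, e.g. 1.0 on key-point sentences, 0.0 elsewhere
    act = list(seeds)
    for _ in range(steps):
        # each node pulls activation from its neighbors, scaled by edge weight
        pulled = [sum(adj[j][i] * act[j] for j in range(len(adj)))
                  for i in range(len(adj))]
        act = [act[i] + decay * pulled[i] for i in range(len(act))]
    top = max(act)
    return [a / top for a in act] if top else act  # normalize for ranking
```

Sentences strongly linked to a seed end up ranked above weakly linked ones, which is what lets the method favor connected, comprehensible extracts over isolated high-scoring sentences.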
25 Aug 2016
TL;DR: This work uses a degrading-extraction approach to build a document-summarization framework wherein, if one extraction strategy fails, the model gracefully degrades to another.
Abstract: With the advent of the information revolution, electronic documents have become the powerhouse of business and academic information. Modern organizations handle terabytes of data in text format alone. To fully understand and utilize these documents, it is necessary to extract their essence; a system that summarizes text is therefore immensely useful. Generating a summary requires identifying the most important and relevant pieces of information in a document, omitting irrelevant parts, and assembling the rest into a compact format. Much research has addressed finding important sentences in a document. This work focuses on identifying and extracting important parts of a document and forming a coherent summary using sentiment analysis. It uses a degrading-extraction approach to create a summarization framework wherein, if one extraction strategy fails, the model gracefully degrades to another.
8 citations
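The graceful-degradation idea is a fallback chain: try the preferred extraction strategy, and on failure fall through to progressively simpler ones. A minimal sketch of the pattern (the strategy names and the always-failing `sentiment_extract` are hypothetical placeholders, not the paper's implementation):

```python
def summarize_with_fallback(text, strategies):
    # try each (name, strategy) in order; degrade gracefully on failure
    for name, strategy in strategies:
        try:
            summary = strategy(text)
            if summary:  # an empty result also counts as failure
                return name, summary
        except Exception:
            continue  # degrade to the next strategy
    return "identity", text  # last resort: return the text unchanged

def sentiment_extract(text):
    # hypothetical primary strategy; fails here to show the degradation path
    raise RuntimeError("sentiment model unavailable")

def lead_sentence(text):
    # simple fallback: take the first sentence
    return text.split(". ")[0] + "."
```

For example, `summarize_with_fallback("First point. Second point.", [("sentiment", sentiment_extract), ("lead", lead_sentence)])` falls through to the lead-sentence strategy when the primary one raises.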
02 Jun 2010
TL;DR: This work explores detecting novelty at various stages of summarization and proposes new scoring features, re-ranking criteria, and filtering strategies to identify "relevant novel" information.
Abstract: A progressive summary helps a user monitor changes in an evolving news topic over a period of time. Detecting novel information is the essential part of progressive summarization and differentiates it from standard multi-document summarization. In this work, we explore detecting novelty at various stages of summarization. New scoring features, re-ranking criteria, and filtering strategies are proposed to identify "relevant novel" information. We compare these techniques using the automated evaluation framework ROUGE and determine the best. Overall, our summarizer performs on par with existing leading methods in progressive summarization.
8 citations
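One common filtering strategy for "relevant novel" content, sketched generically here (the paper's actual features and thresholds are not given in the abstract, so the bag-of-words similarity and the 0.6 cutoff are illustrative assumptions), is to drop candidate sentences that are too similar to anything already delivered in earlier summaries:

```python
import math
from collections import Counter

def cosine(a, b):
    # cosine similarity between two bag-of-words Counters
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def relevant_novel(candidates, history, threshold=0.6):
    # keep candidates that are not too similar to sentences already delivered
    hist = [Counter(h.lower().split()) for h in history]
    kept = []
    for c in candidates:
        bag = Counter(c.lower().split())
        if all(cosine(bag, h) < threshold for h in hist):
            kept.append(c)
    return kept
```

Run after relevance scoring, this filter is what makes the summary progressive: each update reports only what the user has not effectively seen before.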
26 Nov 2003
TL;DR: By analyzing semantically important low-level and mid-level audiovisual features, this method summarizes MPEG-1/-2 content in the form of a digest or a highlight; news highlights and sports highlights in TV baseball games can be successfully extracted using simple shot-transition models.
Abstract: This paper addresses automatic summarization of MPEG audiovisual content in the compressed domain. By analyzing semantically important low-level and mid-level audiovisual features, our method summarizes MPEG-1/-2 content in the form of a digest or a highlight. The former is a shortened version of the original, while the latter is an aggregation of important or interesting events. First, the incoming MPEG stream is segmented into shots, and the above features are derived from each shot. The features are then adaptively evaluated in an integrated manner, and the qualified shots are aggregated into a summary. Since all processing is performed in the compressed domain, summarization is achieved at very low computational cost. Experimental results show that news highlights and sports highlights in TV baseball games can be successfully extracted using simple shot-transition models. For digest extraction, subjective evaluation shows that meaningful shots are extracted without a priori knowledge, even when the content contains multiple genres of programs. Our method also generates an MPEG-7-based description, such as a summary and audiovisual segments, in the course of summarization.
8 citations
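The segment-score-aggregate pipeline the abstract describes can be sketched abstractly: score each shot as a weighted combination of its features, then keep the top-scoring shots in temporal order. This is a generic illustration only; the feature names and weights below are hypothetical, and the paper's actual evaluation is adaptive rather than a fixed weighted sum:

```python
def extract_highlight(shots, weights, k=2):
    # shots: one feature dict per shot (e.g. audio energy, motion intensity)
    # weights: importance of each feature; score = weighted sum per shot
    scores = [sum(weights[f] * shot.get(f, 0.0) for f in weights)
              for shot in shots]
    ranked = sorted(range(len(shots)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # qualified shot indices, in temporal order
```

Keeping the selected shots in temporal order matters for highlights: an aggregation of events only reads as a coherent summary if the events play back in broadcast order.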