Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Journal ArticleDOI
30 Jan 2015
TL;DR: This article introduces Thai Elementary Discourse Units (TEDUs) and their derivatives, called Combined TEDUs (CTEDUs), and presents the three-stage method of Thai multi-document summarization, that is, unit segmentation, unit-graph formulation, and unit selection and summary generation.
Abstract: Due to the lack of explicit word/phrase/sentence boundaries, summarization of multiple Thai documents presents several challenges in unit segmentation, unit selection, duplication elimination, and evaluation dataset construction. In this article, we introduce Thai Elementary Discourse Units (TEDUs) and their derivatives, called Combined TEDUs (CTEDUs), and then present our three-stage method of Thai multi-document summarization: unit segmentation, unit-graph formulation, and unit selection and summary generation. To examine the performance of our proposed method, a number of experiments are conducted using 50 sets of Thai news articles with their manually constructed reference summaries. Based on ROUGE-1, ROUGE-2, and ROUGE-SU4 measures, the experimental results show that: (1) TEDU-based summarization outperforms paragraph-based summarization; (2) our proposed graph-based TEDU weighting with importance-based selection achieves the best performance; and (3) unit duplication consideration and weight recalculation help improve summary quality.
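The graph-based unit weighting mentioned above can be pictured with a small sketch: build a similarity graph over text units and score the units with a PageRank-style walk. This is only an illustrative approximation in Python; the whitespace segmentation, bag-of-words cosine similarity, and damping factor are assumptions made for the example, not the paper's TEDU segmentation or weighting scheme.

```python
# Illustrative graph-based unit weighting (PageRank-style), not the
# authors' TEDU method: units are toy "sentences" split on whitespace.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def rank_units(units: list[str], damping: float = 0.85, iters: int = 30) -> list[float]:
    """Score units by a random-walk over their similarity graph."""
    vecs = [Counter(u.lower().split()) for u in units]
    n = len(units)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    row_sums = [sum(row) or 1.0 for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / row_sums[j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores

units = [
    "stocks rose after the earnings report",
    "the earnings report lifted stock prices",
    "rain is expected tomorrow",
]
for unit, score in sorted(zip(units, rank_units(units)), key=lambda x: -x[1]):
    print(f"{score:.3f}  {unit}")
```

The two mutually similar units end up with higher scores than the outlier, which is the intuition behind importance-based selection over a unit graph.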

8 citations

Proceedings Article
01 Jan 2003
TL;DR: A novel method to extract a set of comprehensible sentences that centers on several key points is proposed, which generates a similarity network from documents with a lexical dictionary and applies spreading activation to rank sentences.
Abstract: Although there has been a great deal of research on automatic summarization, most methods are based on a statistical approach that disregards relationships between extracted textual segments. To ensure sentence connectivity, we propose a novel method to extract a set of comprehensible sentences that centers on several key points. This method generates a similarity network from documents with a lexical dictionary and applies spreading activation to rank sentences. We also present evaluation results of a multi-document summarization system based on this method, which participated in the TSC (Text Summarization Challenge) task organized by the third NTCIR (NII-NACSIS Test Collection for IR Systems) project.
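As a rough illustration of ranking sentences by spreading activation over a similarity network, the sketch below propagates activation from seed nodes through weighted edges. The toy network, edge weights, decay factor, and iteration count are assumptions made for the example, not details of the system described above.

```python
# Illustrative spreading activation over a sentence similarity network.
def spread_activation(adj: dict[str, dict[str, float]],
                      seeds: dict[str, float],
                      decay: float = 0.5,
                      iters: int = 5) -> dict[str, float]:
    """Propagate activation from seed sentences along weighted edges."""
    activation = {node: seeds.get(node, 0.0) for node in adj}
    for _ in range(iters):
        incoming = {node: 0.0 for node in adj}
        for src, neighbours in adj.items():
            for dst, weight in neighbours.items():
                incoming[dst] += decay * weight * activation[src]
        activation = {node: activation[node] + incoming[node] for node in adj}
    return activation

# Toy similarity network: edges weighted by lexical similarity.
network = {
    "s1": {"s2": 0.8, "s3": 0.1},
    "s2": {"s1": 0.8, "s3": 0.2},
    "s3": {"s1": 0.1, "s2": 0.2},
}
print(spread_activation(network, seeds={"s1": 1.0}))
```

Sentences strongly connected to the activated key points accumulate more activation and are ranked higher, which is the connectivity-preserving effect the abstract aims for.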

8 citations

Proceedings ArticleDOI
25 Aug 2016
TL;DR: This research work uses a degrading extraction approach to create a document summarization framework wherein, if one extraction strategy fails, the model gracefully degrades to another.
Abstract: With the advent of the information revolution, electronic documents have become the powerhouse of business and academic information. Modern organizations handle terabytes of data in text format alone. To fully understand and utilize these documents, it is necessary to extract their essence, so a system that summarizes text would be immensely useful. For generating a summary, we have to identify the most important and relevant pieces of information in the document, omit irrelevant parts, and assemble them into a compact format. A great deal of research has been performed on finding important sentences in a document. This work focuses on identifying and extracting important parts of the document and forming a coherent summary using sentiment analysis. It uses a degrading extraction approach to create a document summarization framework wherein, if one extraction strategy fails, the model gracefully degrades to another.
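The "graceful degradation" idea, falling back to another extraction strategy when one fails, might look like the sketch below. The individual strategies are simplified stand-ins (the sentiment extractor is deliberately stubbed to fail) and do not reproduce the paper's actual sentiment-analysis pipeline.

```python
# Illustrative fallback chain of extraction strategies.
from typing import Callable

def sentiment_extract(text: str) -> list[str]:
    """Placeholder: would select sentences with strong sentiment scores."""
    raise RuntimeError("sentiment model unavailable")  # simulate a failure

def keyword_extract(text: str) -> list[str]:
    """Fallback: keep the first few sentences as a crude extract."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return sentences[:2]

def lead_extract(text: str) -> list[str]:
    """Last resort: take only the leading sentence."""
    return [text.split(".")[0].strip()]

def summarize(text: str, strategies: list[Callable[[str], list[str]]]) -> list[str]:
    for strategy in strategies:
        try:
            result = strategy(text)
            if result:
                return result
        except Exception:
            continue  # degrade gracefully to the next strategy
    return []

doc = "Markets rallied today. Analysts cited strong earnings. Weather was mild."
print(summarize(doc, [sentiment_extract, keyword_extract, lead_extract]))
```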

8 citations

Proceedings Article
02 Jun 2010
TL;DR: This work explores the possibility of detecting novelty at various stages of summarization, and proposes new scoring features, re-ranking criteria, and filtering strategies to identify "relevant novel" information.
Abstract: A progressive summary helps a user monitor changes in evolving news topics over a period of time. Detecting novel information is the essential part of progressive summarization that differentiates it from normal multi-document summarization. In this work, we explore the possibility of detecting novelty at various stages of summarization. New scoring features, re-ranking criteria, and filtering strategies are proposed to identify "relevant novel" information. We compare these techniques using the automated evaluation framework ROUGE and determine the best. Overall, our summarizer performs on par with existing leading methods in progressive summarization.
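One plausible reading of filtering for "relevant novel" information is sketched below: a candidate sentence is kept if it overlaps enough with the topic query but not too much with sentences already covered by earlier summaries. The Jaccard word-overlap similarity and both thresholds are illustrative assumptions, not the paper's scoring features.

```python
# Illustrative "relevant novel" filter for progressive summarization.
def overlap(a: str, b: str) -> float:
    """Jaccard word overlap as a cheap similarity proxy."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def relevant_novel(candidates: list[str], query: str, history: list[str],
                   rel_min: float = 0.15, nov_max: float = 0.5) -> list[str]:
    """Keep sentences relevant to the query but not redundant with history."""
    selected = []
    for sent in candidates:
        relevance = overlap(sent, query)
        redundancy = max((overlap(sent, h) for h in history), default=0.0)
        if relevance >= rel_min and redundancy <= nov_max:
            selected.append(sent)
    return selected

history = ["the storm made landfall on friday causing flooding"]
new_sents = [
    "the storm made landfall on friday with heavy flooding",     # redundant
    "rescue teams evacuated two thousand residents from the storm zone",
]
print(relevant_novel(new_sents, query="storm flooding rescue", history=history))
```

Only the second sentence survives: it relates to the topic but adds information not already present in the earlier summary.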

8 citations

Proceedings ArticleDOI
26 Nov 2003
TL;DR: By analyzing semantically important low-level and mid-level audiovisual features, this method universally summarizes MPEG-1/-2 content in the form of a digest or highlight, and shows that news highlights and sports highlights in TV baseball games can be successfully extracted according to simple shot transition models.
Abstract: This paper addresses automatic summarization of MPEG audiovisual content in the compressed domain. By analyzing semantically important low-level and mid-level audiovisual features, our method universally summarizes MPEG-1/-2 content in the form of a digest or highlight. The former is a shortened version of the original, while the latter is an aggregation of important or interesting events. In our proposal, the incoming MPEG stream is first segmented into shots and the above features are derived from each shot. The features are then adaptively evaluated in an integrated manner, and finally the qualified shots are aggregated into a summary. Since all processing is performed entirely in the compressed domain, summarization is achieved at very low computational cost. The experimental results show that news highlights and sports highlights in TV baseball games can be successfully extracted according to simple shot transition models. As for digest extraction, subjective evaluation proves that meaningful shots are extracted from content without a priori knowledge, even if it contains multiple genres of programs. Our method also has the advantage of generating an MPEG-7 based description, such as summary and audiovisual segments, in the course of summarization.
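The final aggregation step, collecting qualified shots into a summary of bounded length, can be sketched as below. The per-shot scores are assumed to come from the compressed-domain feature analysis described above; the greedy budget-filling selection and the data layout are simplifications for illustration, not the paper's exact procedure.

```python
# Illustrative shot aggregation: pick high-scoring shots within a time budget.
from dataclasses import dataclass

@dataclass
class Shot:
    start: float      # seconds into the program
    duration: float   # shot length in seconds
    score: float      # integrated importance from low/mid-level features

def build_summary(shots: list[Shot], budget: float) -> list[Shot]:
    """Greedily keep the best-scoring shots that fit the summary budget."""
    chosen, used = [], 0.0
    for shot in sorted(shots, key=lambda s: s.score, reverse=True):
        if used + shot.duration <= budget:
            chosen.append(shot)
            used += shot.duration
    return sorted(chosen, key=lambda s: s.start)  # restore chronological order

shots = [Shot(0, 12, 0.2), Shot(12, 8, 0.9), Shot(20, 15, 0.6), Shot(35, 10, 0.8)]
print(build_summary(shots, budget=20))
```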

8 citations


Network Information
Related Topics (5)
Natural language: 31.1K papers, 806.8K citations (85% related)
Ontology (information science): 57K papers, 869.1K citations (84% related)
Web page: 50.3K papers, 975.1K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (83% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    74
2022    160
2021    52
2020    61
2019    47
2018    52