Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have been published within this topic, receiving 71,850 citations.


Papers
Proceedings ArticleDOI
12 Oct 2005
TL;DR: The approach obtains word concepts from HowNet and uses concepts, rather than words, as features; a conceptual vector space model produces a rough summary, and the semantic similarity between sentences is then computed to reduce redundancy.
Abstract: In this paper, we propose a practical approach for extracting the most relevant sentences from the original document to form a summary. The idea of our approach is to obtain concepts of words from HowNet and to use concepts, rather than words, as features. We use a conceptual vector space model to form a rough summary, and then calculate the degree of semantic similarity between sentences to reduce redundancy. Experimental results show that our approach is effective and efficient, and that the system's performance is reliable.
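The redundancy-reduction step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tiny `CONCEPTS` dictionary is a hypothetical stand-in for a real HowNet lookup, and the similarity threshold is arbitrary.

```python
from collections import Counter
from math import sqrt

# Toy stand-in for a HowNet lookup, mapping words to concept IDs.
# (Hypothetical data; the actual system queries HowNet.)
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "fast": "speed", "quick": "speed",
    "road": "way", "highway": "way",
}

def concept_vector(sentence):
    """Map each word to its concept (falling back to the word itself)."""
    return Counter(CONCEPTS.get(w, w) for w in sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def deduplicate(sentences, threshold=0.7):
    """Keep a sentence only if it is not too similar to any kept sentence."""
    kept, vectors = [], []
    for s in sentences:
        v = concept_vector(s)
        if all(cosine(v, u) < threshold for u in vectors):
            kept.append(s)
            vectors.append(v)
    return kept
```

With concept features, "the car is fast" and "the automobile is quick" map to identical concept vectors, so the second is dropped as redundant even though the two sentences share few surface words.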

8 citations

Dissertation
17 Apr 2014
TL;DR: This thesis proposes a novel AA-based summarization method, weighted Hierarchical Archetypal Analysis, and investigates the impact of using the content-graph and multi-element graph models for language- and domain-independent extractive multi-document generic and query-focused summarization.
Abstract: This thesis is about automatic document summarization, with experimental results on generic, query, update and comparative multi-document summarization (MDS). We describe prior work and our own improvements on some important aspects of a summarization system, including text modeling by means of a graph and sentence selection via archetypal analysis. The centerpiece of this work is a novel method for summarization that we call "Archetypal Analysis Summarization". Archetypal Analysis (AA) is a promising unsupervised learning tool that combines the advantages of clustering with the flexibility of matrix factorization. We propose a novel AA-based summarization method based on the following observations. In generic document summarization, given a graph representation of a set of documents, positively and/or negatively salient sentences are values on the data set boundary. To compute these extreme values, general or weighted archetypes, we use archetypal analysis and weighted archetypal analysis, respectively. While each sentence in a data set is estimated as a mixture of archetypal sentences, the archetypes themselves are restricted to being sparse mixtures, i.e. convex combinations of the original sentences. Since AA in this way readily offers soft clustering and probabilistic ranking, we suggest considering it as a method for simultaneous sentence clustering and ranking. Another important argument in favour of using AA in MDS is that, in contrast to other factorization methods which extract prototypical, characteristic, even basic sentences, AA selects distinct (archetypal) sentences and thus induces variability and diversity in the produced summaries. Our research also contributes some new graph-based modeling approaches that facilitate the text summarization task.
We investigate the impact of using the content-graph and multi-element graph models for language- and domain-independent extractive multi-document generic and query-focused summarization. We also propose a novel version of AA, the weighted Hierarchical Archetypal Analysis, and apply it to four well-known summarization tasks: generic, query-focused, update, and comparative summarization. Experiments on summarization data sets (DUC04-07, TAC08) demonstrate the efficiency and effectiveness of our framework on all four kinds of multi-document summarization task.
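The key AA idea above is that archetypes sit on the boundary of the data and every point is a convex combination of them. The toy below illustrates just that mixture property in 2-D, where the convex weights (barycentric coordinates) have a closed form; the function name and data are mine, and real AA works in high-dimensional sentence space with the archetypes learned, not given.

```python
def mixture_weights(p, a, b, c):
    """Express point p as a convex combination of archetypes a, b, c
    (2-D barycentric coordinates, solved in closed form)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    w1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    w3 = 1.0 - w1 - w2
    return w1, w2, w3

# Archetypes are extreme points of the data; interior points are mixtures.
w = mixture_weights((1.0, 1.0), (0.0, 0.0), (3.0, 0.0), (0.0, 3.0))
```

The weights are non-negative and sum to one, which is what lets AA read them as soft cluster memberships and probabilistic ranks for sentences.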

8 citations

Journal ArticleDOI
28 Feb 2015
TL;DR: The idea is to propose a system that will accept single document as input in English and processes the input by building a rich semantic graph and then reducing this graph for generating the final summary.
Abstract: As the volume of information available on the Internet increases, there is a growing need for tools that help users find, filter and manage these resources. While more and more textual information is available online, effective retrieval is difficult without proper indexing and summarization of the content. One possible solution to this problem is abstractive text summarization. The idea is to propose a system that accepts a single English document as input, processes it by building a rich semantic graph, and then reduces this graph to generate the final summary. Text summarization is one of the most popular research areas today because information overload on the Web has increased the need for stronger and more powerful text summarizers. The required condensation of information can be achieved by summarization, which reduces the length of the original text. Text summarization is commonly classified into two types: extractive and abstractive. Extractive summarization extracts a few sentences from the original document based on statistical features rather than on semantic relations between sentences (2) and adds them to the summary. It is easier to implement, but it tends toward sentence extraction rather than true summarization, so the generated summary tends to be inconsistent. Abstractive summarization is more powerful because it requires understanding the original text and generates sentences based on their semantic meaning, which leads to a more meaningful and accurate summary than extractive methods produce.
Abstractive summaries are, however, difficult to compute, because they require complex natural language processing. Extractive summarization also has several issues. Extracted sentences tend to be longer than average, so parts of segments that are not essential to the summary get included and consume space. Important or relevant information is usually spread across sentences, and extractive summaries cannot capture it unless the summary is long enough to hold all those sentences. Conflicting information may not be presented accurately, and pure extraction often harms the overall coherence of the summary. These problems become more severe in the multi-document case, since extracts are drawn from different sources. Therefore, abstractive approaches are preferable.
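The "statistical features" flavor of extractive summarization criticized above can be sketched in a few lines. This is a deliberately naive illustration (score each sentence by the document-wide frequency of its words), not the paper's semantic-graph method; the splitting on "." is a simplification.

```python
from collections import Counter

def extract_summary(text, n=1):
    """Toy extractive summarizer: score each sentence by the summed
    document-wide frequency of its words, then keep the top-n sentences
    in their original document order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w for s in sentences for w in s.lower().split())
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in sentences[i].lower().split()),
        reverse=True,
    )
    top = sorted(ranked[:n])  # restore document order
    return ". ".join(sentences[i] for i in top) + "."
```

Note how the score is purely statistical: the method never looks at meaning, which is exactly why such extracts can be incoherent and redundant in the ways the abstract describes.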

8 citations

Journal ArticleDOI
TL;DR: A novel methodology is proposed that identifies events in text corpora and creates a summary for each event, using a probabilistic topic model to learn potential topics from the massive documents and to discover events from the documents' topic distributions.
Abstract: We have witnessed the proliferation of the Internet over the past few decades, and a large amount of textual information is generated on the Web. It is impossible for individuals to locate and digest all the latest updates available on the Web. Text summarization provides an efficient way to generate short, concise abstracts from massive documents. These documents involve many events that are hard for the summarization procedure to identify directly. We propose a novel methodology that identifies events in these text corpora and creates a summary for each event. We employ a probabilistic topic model to learn potential topics from the documents and further discover events in terms of the documents' topic distributions. For the summarization itself, we define the word set coverage problem (WSCP) to capture the most representative sentences summarizing an event, and we propose an approximate algorithm to solve this optimization problem. We conduct a set of experiments to evaluate our approach on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, our proposed method outperforms competitive baselines on the harmonic mean of coverage and conciseness.
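The abstract does not spell out the WSCP formulation or the approximation, but coverage problems of this kind are standardly approximated greedily: repeatedly pick the sentence that covers the most not-yet-covered words. The sketch below shows that standard heuristic under that assumption; the paper's exact objective may differ.

```python
def greedy_cover(sentences, k):
    """Greedy max-coverage heuristic: up to k times, pick the sentence
    that covers the most not-yet-covered words. (A standard approximation
    for coverage objectives; the paper's exact WSCP may differ.)"""
    word_sets = [set(s.lower().split()) for s in sentences]
    covered, chosen = set(), []
    for _ in range(k):
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: len(word_sets[i] - covered),
            default=None,
        )
        if best is None or not (word_sets[best] - covered):
            break  # nothing new can be covered
        chosen.append(best)
        covered |= word_sets[best]
    return [sentences[i] for i in chosen]
```

The greedy choice trades off coverage against conciseness implicitly: it stops early once no remaining sentence adds new words, so redundant sentences are never selected.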

7 citations


Network Information
Related Topics (5)
- Natural language: 31.1K papers, 806.8K citations, 85% related
- Ontology (information science): 57K papers, 869.1K citations, 84% related
- Web page: 50.3K papers, 975.1K citations, 83% related
- Recurrent neural network: 29.2K papers, 890K citations, 83% related
- Graph (abstract data type): 69.9K papers, 1.2M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  74
2022  160
2021  52
2020  61
2019  47
2018  52