Topic

Multi-document summarization

About: Multi-document summarization is a research topic. Over its lifetime, 2,270 publications have appeared on this topic, receiving 71,850 citations.

Papers

Internet as Corpus : Automatic Construction of a Swedish News Corpus

Martin Hassel¹•Institutions (1)

Royal Institute of Technology¹

01 Jan 2001

TL;DR: This thesis contributes a novel approach to highly portable automatic text summarization, coupled with methods for building the needed corpora, both for training and evaluation on the new language.

Abstract: Today, with digitally stored information available in abundance, even for many minor languages, this information must by some means be filtered and extracted in order to avoid drowning in it. Automatic summarization is one such technique, where a computer summarizes a longer text to a shorter non-redundant form. Apart from the major languages of the world, there are many languages for which large bodies of data aimed at language technology research are largely lacking. There might also not be resources available to develop such bodies of data, since doing so is usually time consuming and requires substantial manual labor, and is hence expensive. Nevertheless, there will still be a need for automatic text summarization for these languages in order to subdue this constantly increasing amount of electronically produced text. This thesis thus sets the focus on automatic summarization of text and the evaluation of summaries using as few human resources as possible. The resources used should, to as great an extent as possible, already exist, not be specifically aimed at summarization or evaluation of summaries and, preferably, be created as part of natural literary processes. Moreover, the summarization systems should be easy to assemble using only a small set of basic language processing tools, again, not specifically aimed at summarization/evaluation. The summarization system should thus be nearly language independent so that it can be quickly ported between different natural languages. The research put forth in this thesis mainly concerns three computerized systems, one for nearly language-independent summarization – The HolSum summarizer; one for the collection of large-scale corpora – The KTH News Corpus; and one for summarization evaluation – The KTH eXtract Corpus. These three systems represent three different aspects of transferring the proposed summarization method to a new language. 
One aspect is the actual summarization method and how it relates to the highly irregular nature of human language and to the difference in traits among language groups. This aspect is discussed in detail in Chapter 3. This chapter also presents the notion of “holistic summarization”, an approach to self-evaluative summarization that weighs the fitness of the summary as a whole, by semantically comparing it to the text being summarized, before presenting it to the user. This approach is embodied as the text summarizer HolSum, which is presented in this chapter and evaluated in Paper 5. A second aspect is the collection of large-scale corpora for languages where few or none exist. Such corpora are needed, on the one hand, for building the language model used by HolSum when comparing summaries on semantic grounds and, on the other hand, because a large enough sample of (written) language use is needed to guarantee that the randomly selected subcorpus used for evaluation is representative. This topic is briefly touched upon in Chapter 4, and detailed in Paper 1. The third aspect is, of course, the evaluation of the proposed summarization method on a new language. This aspect is investigated in Chapter 4. Evaluations of HolSum have been run on English as well as on Swedish, using both well-established data and evaluation schemes (English) and corpora gathered “in the wild” (Swedish). During the development of the latter corpora, which is discussed in Paper 4, evaluations of a traditional sentence ranking text summarizer, SweSum, have also been run. These can be found in Papers 2 and 3. This thesis thus contributes a novel approach to highly portable automatic text summarization, coupled with methods for building the needed corpora, both for training and evaluation on the new language.
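Illustration: the idea of scoring a candidate summary as a whole against the source text can be sketched roughly as follows. This is not the HolSum implementation. The bag-of-words cosine similarity stands in for the thesis's language model, the exhaustive search over sentence subsets is chosen only for brevity, and the names `bag_of_words`, `cosine`, and `best_summary` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with counts (a crude stand-in for a semantic model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_summary(sentences: List[str], k: int) -> Tuple[int, ...]:
    """Return indices of the k-sentence subset most similar to the full text.

    Every candidate summary is scored as a whole, not sentence by sentence.
    Exhaustive search is exponential, so this only suits tiny inputs.
    """
    doc_vec = bag_of_words(" ".join(sentences))

    def score(indices: Tuple[int, ...]) -> float:
        candidate = " ".join(sentences[i] for i in indices)
        return cosine(bag_of_words(candidate), doc_vec)

    return max(combinations(range(len(sentences)), k), key=score)


if __name__ == "__main__":
    doc = ["cats chase mice", "cats chase mice daily", "stocks fell"]
    print(best_summary(doc, 1))
```

A practical system would presumably replace the exhaustive search with a heuristic one and the word counts with a richer semantic representation.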

15 citations

Patent

Scalable Summarization of Data Graphs

Songyun Duan¹, Achille B. Fokoue-Nkoutche¹, Anastasios Kementsietsidis¹, Wangchao Le¹, Feifei Li¹, Kavitha Srinivas¹•Institutions (1)

IBM¹

20 Nov 2012

TL;DR: A succinct and effective summarization is built from the underlying Resource Description Framework data. Given a keyword query, the summarization lends significant pruning power to exploratory keyword searches and leads to much better efficiency than previous work.

Abstract: Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.

15 citations

Proceedings Article

A Supervised Aggregation Framework for Multi-Document Summarization

Yulong Pei¹, Wenpeng Yin¹, Qifeng Fan, Lian'en Huang¹•Institutions (1)

Peking University¹

01 Dec 2012

TL;DR: A novel supervised aggregation approach for summarization is proposed that combines different summarization methods, including LexPageRank, LexHITS, the manifold-ranking method, and DivRank. The results demonstrate the effectiveness of the supervised aggregation method compared with typical ensemble approaches.

Abstract: In most summarization approaches, sentence ranking plays a vital role. Most previous work explored different features and combined them into unified ranking methods. However, it would be imprecise to rank sentences from a single point of view because contributions from the features are onefold in these methods. In this paper, a novel supervised aggregation approach for summarization is proposed which combines different summarization methods including LexPageRank, LexHITS, manifold-ranking method and DivRank. Human labeled data are used to train an optimization model which combines these multiple summarizers and then the weights assigned to each individual summarizer are learned. Experiments are conducted on DUC2004 data set and the results demonstrate the effectiveness of the supervised aggregation method compared with typical ensemble approaches. In addition, we also investigate the influence of training data construction and component diversity on the summarization results.
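Illustration: combining several sentence rankers through per-system weights might look roughly like the sketch below. It is not the paper's optimization model. The weights here are hand-set placeholders (in the paper they are learned from human-labeled data), min-max normalization is an assumption made so that scores from different systems are comparable, and the names `min_max_normalize` and `aggregate_rankings` are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def aggregate_rankings(
    system_scores: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[int]:
    """Rank sentence indices by the weighted sum of normalized per-system scores."""
    n = len(next(iter(system_scores.values())))
    combined = [0.0] * n
    for name, scores in system_scores.items():
        for i, s in enumerate(min_max_normalize(scores)):
            combined[i] += weights[name] * s
    # Highest combined score first.
    return sorted(range(n), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    # Placeholder scores for three sentences from two hypothetical rankers.
    scores = {"ranker_a": [0.1, 0.5, 0.3], "ranker_b": [10.0, 2.0, 6.0]}
    weights = {"ranker_a": 0.7, "ranker_b": 0.3}
    print(aggregate_rankings(scores, weights))
```

With these placeholder values the combined scores are 0.3, 0.7, and 0.5, so the second sentence ranks first.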

15 citations

Proceedings Article

Automated extractive single-document summarization: beating the baselines with a new approach

Araly Barrera¹, Rakesh M. Verma¹•Institutions (1)

University of Houston¹

21 Mar 2011

TL;DR: This study introduces a new approach to single document summarization and its implementation, SynSem, which fuses syntactic, semantic, and statistical methodologies and reflects the importance of text headings in articles along with the presence of thematic keywords in sentences.

Abstract: Single document summarization, which is as important as multiple document summarization for a variety of reasons, has been attracting declining interest recently. The goal of this study is to introduce a new approach to single document summarization and its implementation, SynSem. Our approach fuses syntactic, semantic, and statistical methodologies and reflects the importance of text headings in articles along with the presence of thematic keywords in sentences. Successful summary evaluation results are demonstrated when SynSem is tested on the Document Understanding Conference (DUC) 2002 data set using ROUGE, which compares single document summaries to baselines.
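Illustration: ROUGE scores a candidate summary by its n-gram overlap with a reference summary. The sketch below computes a simplified ROUGE-1 recall with clipped unigram counts. It assumes plain whitespace tokenization and omits the stemming, stopword handling, and multi-reference support of the official toolkit, so its numbers are not comparable to published DUC results. The function name `rouge_1_recall` is hypothetical.

```python
from collections import Counter


def rouge_1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams that also occur in the candidate.

    Counts are clipped, so a word repeated in the candidate cannot be
    credited more times than it appears in the reference.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[tok]) for tok, count in ref_counts.items())
    return overlap / total


if __name__ == "__main__":
    print(rouge_1_recall("the cat sat", "the cat sat on the mat"))
```

In the usage example, three of the six reference tokens are matched, giving a recall of 0.5.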

15 citations

Proceedings Article

Multiple Text Document Summarization System using hybrid Summarization technique

Harsha Dave¹, Shree Jaswal¹•Institutions (1)

St. Francis Institute of Technology¹

01 Sep 2015

TL;DR: This paper presents a novel approach to generating an abstractive summary from an extractive summary using the WordNet ontology. Experimental results show that the generated summary is well compressed, grammatically correct, and human readable.

Abstract: Text summarization plays an important role in text mining and natural language processing. As information resources grow rapidly, readers are overloaded with information. Finding the relevant data and manually summarizing it in a short time is a difficult, challenging, and tedious task for a human being. Text summarization aims to compress the source text into a shorter, more concise form while preserving its information content and overall meaning. Summarization can be classified into two main categories, i.e. extractive summarization and abstractive summarization. This paper presents a novel approach to generating an abstractive summary from an extractive summary using the WordNet ontology. Experimental results show that the generated summary is well compressed, grammatically correct, and human readable.

15 citations

Performance

Metrics

Papers: 2,507

Citations: 81,726

No. of papers in the topic in previous years
Year	Papers
2023	74
2022	160
2021	52
2020	61
2019	47
2018	52
