Towards Abstractive Multi-Document Summarization Using Submodular Function-Based Framework, Sentence Compression and Merging

Open Access Proceedings Article

Vol. 2, pp. 418-424

TL;DR

The paper proposes a submodular function-based summarization system that integrates three measures (importance, coverage, and non-redundancy) to select the sentences for a summary.

Abstract:

We propose a submodular function-based summarization system which integrates three important measures, namely importance, coverage, and non-redundancy, to detect the important sentences for the summary. We design monotone and submodular functions, which allow us to apply an efficient and scalable greedy algorithm to obtain informative and well-covered summaries. In addition, we integrate two abstraction-based methods, namely sentence compression and merging, to generate an abstractive sentence set. We design our summarization models for both generic and query-focused summarization. Experimental results on the DUC-2004 and DUC-2007 datasets show that our generic and query-focused summarizers outperform the state-of-the-art summarization systems in terms of ROUGE-1 and ROUGE-2 recall and F-measure.
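The abstract does not spell out the objective functions, so the following is only a minimal sketch of how a greedy algorithm can exploit a monotone submodular objective under a length budget. The objective here (total weight of distinct words covered, with weights taken as sentence frequencies) is an assumption chosen for illustration, not the paper's importance, coverage, or non-redundancy functions; the names `coverage` and `greedy_summary` are likewise hypothetical.

```python
from collections import Counter


def coverage(selected, sentences, weights):
    """Illustrative monotone submodular objective: weight of distinct words covered."""
    covered = set()
    for i in selected:
        covered.update(sentences[i])
    return sum(weights[w] for w in covered)


def greedy_summary(sentences, budget):
    """Greedily add the sentence with the best marginal gain per word.

    `sentences` is a list of token lists; `budget` is the maximum total
    number of tokens allowed in the summary. Returns selected indices in
    the order they were picked.
    """
    # Assumed importance proxy: number of sentences that contain the word.
    weights = Counter(w for s in sentences for w in set(s))
    selected, used, current = [], 0, 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            cost = len(sentences[i])
            if cost == 0 or used + cost > budget:
                continue  # empty sentence or would exceed the budget
            gain = coverage(selected + [i], sentences, weights) - current
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits or nothing adds new coverage
        selected.append(best)
        used += len(sentences[best])
        current = coverage(selected, sentences, weights)
        remaining.remove(best)
    return selected


docs = [
    ["storm", "hits", "coast"],
    ["storm", "hits", "coast"],
    ["thousands", "evacuated", "inland"],
    ["coast", "storm"],
]
print(greedy_summary(docs, budget=6))
```

Because a word already covered contributes no further gain, a verbatim duplicate of a chosen sentence is never picked, which is the diminishing-returns behaviour that makes submodular objectives a natural fit for penalising redundancy. For monotone submodular objectives under a simple cardinality constraint, the greedy strategy is known to achieve a (1 - 1/e) approximation; whether and how that guarantee carries over depends on the constraint and functions actually used in the paper.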

Citations

Proceedings Article

A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

Demian Gholipour Ghalandari, +3 more

TL;DR: This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters, and provides a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.

Proceedings Article

Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion

Mir Tafseer Nayeem, +2 more

TL;DR: A paraphrastic sentence fusion model is designed that jointly performs sentence fusion and paraphrasing at the sentence level using a skip-gram word embedding model, improving both the information coverage and the abstractiveness of the generated sentences.

Posted Content

DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion

Mor Geva, +3 more

27 Feb 2019

arXiv: Computation and Language

TL;DR: A method for automatically generating fusion examples from raw text and a sequence-to-sequence model developed on DiscoFuse, a large-scale dataset for discourse-based sentence fusion, are proposed; the approach is shown to improve performance on WebSplit when viewed as a sentence fusion task.

Proceedings Article

Improving Neural Abstractive Document Summarization with Structural Regularization

Wei Li, +3 more

TL;DR: This paper proposes to leverage the structural information of both documents and multi-sentence summaries to improve the document summarization performance and imports both structural-compression and structural-coverage regularization into the summarization process in order to capture the information compression and information coverage properties.

Proceedings Article

Extract with Order for Coherent Multi-Document Summarization

Mir Tafseer Nayeem, +1 more

TL;DR: A rank-based sentence selection method using continuous vector representations along with key-phrases is implemented, and a model that tackles summary coherence to increase readability is proposed.

References

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes negative sampling, a simple alternative to the hierarchical softmax.

Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.
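Since the main paper reports ROUGE-1 and ROUGE-2 recall and F-measure, a simplified sketch of ROUGE-N may help make those numbers concrete. It assumes a single reference summary and pre-tokenized input, and it omits the stemming, stopword handling, and multi-reference support of the official package; the function names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """Return (recall, precision, f_measure) for clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    if recall + precision == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


reference = "the storm hit the coast".split()
candidate = "the storm hit inland".split()
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
```

Recall divides the overlap by the number of n-grams in the reference, so it rewards a summary for covering the reference content, while the F-measure also penalises candidates that are padded with unmatched n-grams.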

Journal Article

An algorithm for suffix stripping

M. F. Porter

01 Dec 1997

Program: Electronic Library and Informat...

TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.

Proceedings Article

The Stanford CoreNLP Natural Language Processing Toolkit

Christopher D. Manning, +5 more

TL;DR: The design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis, are described. The toolkit's wide use is suggested to follow from a simple, approachable design, straightforward interfaces, the inclusion of robust and good-quality analysis components, and not requiring a large amount of associated baggage.

Related Papers (5)

ROUGE: A Package for Automatic Evaluation of Summaries

A Class of Submodular Functions for Document Summarization

Improving Query-Based Summarization Using Document Graphs

A new sentence similarity measure and sentence based extractive technique for automatic text summarization

Evolutionary Algorithm for Extractive Text Summarization