Open Access Proceedings Article

Towards Abstractive Multi-Document Summarization Using Submodular Function-Based Framework, Sentence Compression and Merging

Abstract
We propose a submodular function-based summarization system that integrates three measures, namely importance, coverage, and non-redundancy, to detect the important sentences for the summary. We design monotone submodular functions, which allow us to apply an efficient and scalable greedy algorithm to obtain informative and well-covered summaries. In addition, we integrate two abstraction-based methods, namely sentence compression and merging, to generate an abstractive sentence set. We design our summarization models for both generic and query-focused summarization. Experimental results on the DUC-2004 and DUC-2007 datasets show that our generic and query-focused summarizers outperform the state-of-the-art summarization systems in terms of ROUGE-1 and ROUGE-2 recall and F-measure.
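
The greedy selection mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' system: the abstract does not give the objective, so a plain word-coverage function (which is monotone and submodular) stands in for it. The names `coverage` and `greedy_summary`, the word-count budget, and the gain-per-length selection rule are all assumptions made for this example.

```python
from typing import List, Set


def coverage(selected: List[int], sentences: List[Set[str]]) -> int:
    """Count the distinct words covered by the selected sentences.

    Set coverage is monotone and submodular. It is a stand-in objective
    (an assumption), not the function used in the paper.
    """
    covered: Set[str] = set()
    for i in selected:
        covered |= sentences[i]
    return len(covered)


def greedy_summary(
    sentences: List[Set[str]], lengths: List[int], budget: int
) -> List[int]:
    """Pick sentence indices greedily by marginal gain per unit length.

    Assumes every length is positive. Stops when no remaining sentence
    fits the budget or adds any new coverage.
    """
    selected: List[int] = []
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        base = coverage(selected, sentences)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if used + lengths[i] > budget:
                continue  # sentence would exceed the length budget
            gain = coverage(selected + [i], sentences) - base
            ratio = gain / lengths[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break
        selected.append(best)
        used += lengths[best]
        remaining.remove(best)
    return selected


# Toy input (hypothetical sentences, whitespace tokenization).
docs = [
    "the storm hit the coast",
    "the storm hit the coast on monday",
    "thousands lost power",
]
sentences = [set(d.split()) for d in docs]
lengths = [len(d.split()) for d in docs]
summary = greedy_summary(sentences, lengths, budget=8)
print([docs[i] for i in summary])
```

With a budget of 8 words, the sketch first takes the short sentence with the highest coverage per word and then the one remaining sentence that still fits. The near-duplicate longer sentence is skipped because it no longer fits the budget. For monotone submodular objectives, greedy selection of this kind is the usual choice because it comes with approximation guarantees, although the exact guarantee depends on the constraint and on the greedy variant used.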


Citations
Proceedings Article

A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

TL;DR: This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters, and provides a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.
Proceedings Article

Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion

TL;DR: A paraphrastic sentence fusion model is designed that jointly performs sentence fusion and paraphrasing at the sentence level using a skip-gram word embedding model, improving both the information coverage and the abstractiveness of the generated sentences.
Posted Content

DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion

TL;DR: A method for automatically generating fusion examples from raw text and a sequence-to-sequence model trained on DiscoFuse, a large-scale dataset for discourse-based sentence fusion, are proposed and shown to improve performance on WebSplit when viewed as a sentence fusion task.
Proceedings Article

Improving Neural Abstractive Document Summarization with Structural Regularization

TL;DR: This paper proposes to leverage the structural information of both documents and multi-sentence summaries to improve the document summarization performance and imports both structural-compression and structural-coverage regularization into the summarization process in order to capture the information compression and information coverage properties.
Proceedings Article

Extract with Order for Coherent Multi-Document Summarization

TL;DR: A rank-based sentence selection method using continuous vector representations along with key-phrases is implemented, and a model that addresses summary coherence to increase readability is proposed.