Open AccessProceedings Article
Towards Abstractive Multi-Document Summarization Using Submodular Function-Based Framework, Sentence Compression and Merging
Yllias Chali,Moin Mahmud Tanvee,Mir Tafseer Nayeem +2 more
- Vol. 2, pp 418-424
TLDR
A submodular function-based summarization system which integrates three important measures namely importance, coverage, and non-redundancy to detect the important sentences for the summary is proposed.Abstract:
We propose a submodular function-based summarization system which integrates three important measures namely importance, coverage, and non-redundancy to detect the important sentences for the summary. We design monotone and submodular functions which allow us to apply an efficient and scalable greedy algorithm to obtain informative and well-covered summaries. In addition, we integrate two abstraction-based methods namely sentence compression and merging for generating an abstractive sentence set. We design our summarization models for both generic and query-focused summarization. Experimental results on DUC-2004 and DUC-2007 datasets show that our generic and query-focused summarizers have outperformed the state-of-the-art summarization systems in terms of ROUGE-1 and ROUGE-2 recall and F-measure.read more
Citations
More filters
Proceedings ArticleDOI
A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal
TL;DR: This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters, and provides a quantitative analysis of the dataset and empirical results for several state-of-the-art MDS techniques.
Proceedings Article
Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion
TL;DR: A paraphrastic sentence fusion model which jointly performs sentence fusion and paraphrasing using skip-gram word embedding model at the sentence level is designed which improves the information coverage and at the same time abstractiveness of the generated sentences.
Posted Content
DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion
TL;DR: A method for automatically-generating fusion examples from raw text and a sequence-to-sequence model on DiscoFuse, a large scale dataset for discourse-based sentence fusion, are proposed and shown to improve performance on WebSplit when viewed as a sentence fusion task.
Proceedings ArticleDOI
Improving Neural Abstractive Document Summarization with Structural Regularization
TL;DR: This paper proposes to leverage the structural information of both documents and multi-sentence summaries to improve the document summarization performance and imports both structural-compression and structural-coverage regularization into the summarization process in order to capture the information compression and information coverage properties.
Proceedings ArticleDOI
Extract with Order for Coherent Multi-Document Summarization
Mir Tafseer Nayeem,Yllias Chali +1 more
TL;DR: A rank based sentence selection using continuous vector representations along with key-phrases is implemented and a model to tackle summary coherence for increasing readability is proposed.
References
More filters
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Proceedings Article
ROUGE: A Package for Automatic Evaluation of Summaries
TL;DR: Four different RouGE measures are introduced: ROUGE-N, ROUge-L, R OUGE-W, and ROUAGE-S included in the Rouge summarization evaluation package and their evaluations.
Journal ArticleDOI
An algorithm for suffix stripping
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Proceedings ArticleDOI
The Stanford CoreNLP Natural Language Processing Toolkit
Christopher D. Manning,Mihai Surdeanu,John Bauer,Jenny Rose Finkel,Steven Bethard,David McClosky +5 more
TL;DR: The design and use of the Stanford CoreNLP toolkit is described, an extensible pipeline that provides core natural language analysis, and it is suggested that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.