Open Access · Proceedings ArticleDOI

SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression

TLDR
This article proposes SummPip, an unsupervised method for multi-document summarization: the original documents are converted into a sentence graph that takes both linguistic and deep representations into account, spectral clustering is applied to obtain multiple clusters of sentences, and each cluster is finally compressed to generate the summary.
Abstract
Obtaining training data for multi-document summarization (MDS) is time-consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic and deep representation into account, then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary. Experiments on the Multi-News and DUC-2004 datasets show that our method is competitive with previous unsupervised methods and is even comparable to neural supervised approaches. In addition, human evaluation shows our system produces consistent and complete summaries compared to human-written ones.
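
The pipeline summarized above can be sketched in a few lines. The following is a simplified, hypothetical rendering, not the authors' implementation: TF-IDF vectors stand in for the paper's combined linguistic and deep sentence representations, and choosing each cluster's most central sentence stands in for the paper's multi-sentence compression step.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import SpectralClustering

def summarize(sentences, n_clusters=3):
    # Represent each sentence as a vector (stand-in for deep + linguistic features).
    vectors = TfidfVectorizer().fit_transform(sentences)
    # Build the sentence graph as a pairwise similarity (affinity) matrix.
    affinity = cosine_similarity(vectors)
    # Cluster the sentence graph with spectral clustering.
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    summary = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # Take the sentence most similar to the rest of its cluster
        # (a simplification of the paper's cluster compression step).
        centrality = affinity[np.ix_(idx, idx)].sum(axis=1)
        summary.append(sentences[idx[centrality.argmax()]])
    return " ".join(summary)
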


Citations
Proceedings ArticleDOI

Overview and Insights from the Shared Tasks at Scholarly Document Processing 2020: CL-SciSumm, LaySumm and LongSumm.

TL;DR: The quality and quantity of the submissions show that there is ample interest in scholarly document summarization, and that the task sits at a midway point between being impossible and being fully resolved.
Proceedings ArticleDOI

SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline

TL;DR: SciSummPip is an unsupervised text summarization system for scientific documents, inspired by SummPip, an unsupervised multi-document summarizer for the news domain; it includes the transformer-based language model SciBERT for contextual sentence representation and content selection, and applies a summary length constraint to adapt to the scientific domain.
Proceedings ArticleDOI

Unsupervised document summarization using pre-trained sentence embeddings and graph centrality

TL;DR: A method for incorporating sentence embeddings produced by deep language models into extractive summarization techniques based on graph centrality, in an unsupervised manner; it can summarize documents of any kind and any size and can satisfy any length constraint on the summaries produced.
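
As a rough illustration of this approach, the sketch below scores sentences by PageRank centrality over a cosine-similarity graph of pre-trained sentence embeddings and then selects sentences under a word budget; the embedding model name and the budget are illustrative assumptions, not details from the paper.

import networkx as nx
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def centrality_summary(sentences, max_words=100):
    # Encode sentences with a pre-trained sentence embedding model (assumed model name).
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    # Build a fully connected similarity graph and score nodes with PageRank.
    graph = nx.from_numpy_array(cosine_similarity(embeddings))
    scores = nx.pagerank(graph, weight="weight")
    # Greedily take the highest-scoring sentences that fit the length constraint.
    chosen, used = [], 0
    for i in sorted(scores, key=scores.get, reverse=True):
        words = len(sentences[i].split())
        if used + words <= max_words:
            chosen.append(i)
            used += words
    return " ".join(sentences[i] for i in sorted(chosen))
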
Posted Content

SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline

TL;DR: SciSummPip is an unsupervised text summarization system for scientific documents, adapted from SummPip's multi-document pipeline for the news domain; it includes the transformer-based language model SciBERT for contextual sentence representation, content selection with PageRank, sentence graph construction with both deep and linguistic information, and within-graph summary generation.
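
The sketch below illustrates only the contextual sentence representation step mentioned here, as mean-pooled SciBERT token embeddings; the mean-pooling choice is an assumption, not necessarily the cited system's exact procedure.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

def embed_sentences(sentences):
    # Tokenize a batch of sentences with padding and truncation.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding tokens
    # Mean-pool token embeddings into one vector per sentence.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
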
References
Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, together with evaluations of each.
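
For concreteness, the self-contained sketch below computes ROUGE-N as clipped n-gram overlap (recall, precision, F1); the official package additionally offers stemming options and the ROUGE-L/W/S variants, which are omitted here.

from collections import Counter

def ngrams(text, n):
    tokens = text.lower().split()
    return Counter(zip(*[tokens[i:] for i in range(n)]))

def rouge_n(reference, candidate, n=2):
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    overlap = sum((ref & cand).values())                 # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * recall * precision / max(recall + precision, 1e-9)
    return {"recall": recall, "precision": precision, "f1": f1}
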
Proceedings Article

TextRank: Bringing Order into Text

Rada Mihalcea et al.
TL;DR: TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.
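
A compact sketch of the sentence-extraction variant of TextRank follows: the paper's word-overlap similarity feeds a PageRank run over the sentence graph (the +1 inside the logarithms is a small guard against zero denominators, not part of the original formula).

import math
import networkx as nx

def textrank_sentences(sentences, top_k=3):
    def similarity(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        denom = math.log(len(wa) + 1) + math.log(len(wb) + 1)
        return len(wa & wb) / denom if denom else 0.0
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            weight = similarity(sentences[i], sentences[j])
            if weight > 0:
                graph.add_edge(i, j, weight=weight)
    # Rank sentences by their PageRank score in the similarity graph.
    scores = nx.pagerank(graph, weight="weight")
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]
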
Proceedings ArticleDOI

Get To The Point: Summarization with Pointer-Generator Networks

TL;DR: A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways: a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information while retaining the ability to produce novel words through the generator, and a coverage mechanism that keeps track of what has been summarized, which discourages repetition.
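
As an illustration (not the authors' code), the snippet below shows the pointer-generator mixing step: the final word distribution blends the generator's vocabulary distribution with the copy distribution induced by attention over source tokens; handling of out-of-vocabulary words via an extended vocabulary is omitted.

import torch

def final_distribution(vocab_dist, attn_dist, src_ids, p_gen):
    # vocab_dist: (batch, vocab_size) softmax over the output vocabulary
    # attn_dist:  (batch, src_len)    attention weights over source positions
    # src_ids:    (batch, src_len)    vocabulary ids of the source tokens
    # p_gen:      (batch, 1)          probability of generating vs. copying
    generated = p_gen * vocab_dist
    copied = torch.zeros_like(vocab_dist)
    # Scatter attention mass onto the vocabulary ids of the source tokens.
    copied = copied.scatter_add(1, src_ids, (1.0 - p_gen) * attn_dist)
    return generated + copied
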
Journal ArticleDOI

LexRank: graph-based lexical centrality as salience in text summarization

TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units in natural language processing (NLP), based on the concept of eigenvector centrality in a graph representation of sentences.
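
The sketch below captures LexRank's core idea, eigenvector centrality of a thresholded TF-IDF cosine-similarity graph, computed here by power iteration; the threshold and damping values are illustrative.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank_scores(sentences, threshold=0.1, damping=0.85, iterations=100):
    similarity = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    adjacency = (similarity >= threshold).astype(float)    # thresholded sentence graph
    row_sums = adjacency.sum(axis=1, keepdims=True)
    transition = adjacency / np.where(row_sums == 0, 1.0, row_sums)
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):                            # power iteration
        scores = (1 - damping) / n + damping * transition.T @ scores
    return scores
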
Journal ArticleDOI

The use of MMR, diversity-based reranking for reordering documents and producing summaries

TL;DR: A method for combining query relevance with information novelty in the context of text retrieval and summarization; preliminary results indicate some benefits of MMR diversity-based reranking in document retrieval and in single-document summarization.
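
A minimal sketch of MMR selection follows: each step picks the candidate sentence that best trades off relevance to the query against redundancy with the sentences already chosen; the lambda value is an illustrative choice.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_select(query, candidates, k=3, lam=0.7):
    vectorizer = TfidfVectorizer().fit(candidates + [query])
    cand_vecs = vectorizer.transform(candidates)
    relevance = cosine_similarity(cand_vecs, vectorizer.transform([query])).ravel()
    redundancy = cosine_similarity(cand_vecs)
    selected = []
    while len(selected) < min(k, len(candidates)):
        remaining = [i for i in range(len(candidates)) if i not in selected]
        def mmr(i):
            overlap = max(redundancy[i][j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1 - lam) * overlap
        selected.append(max(remaining, key=mmr))
    return [candidates[i] for i in selected]
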