Open Access · Proceedings ArticleDOI

SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression

TLDR
This article proposes SummPip, an unsupervised method for multi-document summarization: the original documents are converted into a sentence graph that takes both linguistic and deep representations into account, spectral clustering is applied to obtain multiple clusters of sentences, and each cluster is finally compressed to generate the summary.
Abstract
Obtaining training data for multi-document summarization (MDS) is time-consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic and deep representation into account, then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary. Experiments on the Multi-News and DUC-2004 datasets show that our method is competitive with previous unsupervised methods and is even comparable to neural supervised approaches. In addition, human evaluation shows our system produces consistent and complete summaries compared to human-written ones.
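
The pipeline summarized above can be sketched in a few lines. The following is a simplified, hypothetical rendering, not the authors' implementation: TF-IDF vectors stand in for the paper's combined linguistic and deep sentence representations, and choosing each cluster's most central sentence stands in for the paper's multi-sentence compression step.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import SpectralClustering

def summarize(sentences, n_clusters=3):
    # Represent each sentence as a vector (stand-in for deep + linguistic features).
    vectors = TfidfVectorizer().fit_transform(sentences)
    # Build the sentence graph as a pairwise similarity (affinity) matrix.
    affinity = cosine_similarity(vectors)
    # Cluster the sentence graph with spectral clustering.
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    summary = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # Take the sentence most similar to the rest of its cluster
        # (a simplification of the paper's cluster compression step).
        centrality = affinity[np.ix_(idx, idx)].sum(axis=1)
        summary.append(sentences[idx[centrality.argmax()]])
    return " ".join(summary)
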


Citations
Proceedings ArticleDOI

Overview and Insights from the Shared Tasks at Scholarly Document Processing 2020: CL-SciSumm, LaySumm and LongSumm.

TL;DR: The quality and quantity of the submissions show that there is ample interest in scholarly document summarization, and that the task sits at a midway point between being impossible and being fully resolved.
Proceedings ArticleDOI

SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline

TL;DR: SciSummPip is an unsupervised text summarization system for scientific documents, inspired by SummPip, an unsupervised multi-document summarizer for the news domain; it includes the transformer-based language model SciBERT for contextual sentence representation and content selection, and applies a summary length constraint to adapt to the scientific domain.
Proceedings ArticleDOI

Unsupervised document summarization using pre-trained sentence embeddings and graph centrality

TL;DR: A method for incorporating sentence embeddings produced by deep language models into extractive summarization techniques based on graph centrality, in an unsupervised manner; it can summarize documents of any kind and any size and can satisfy any length constraint on the summaries produced.
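
As a rough illustration of this approach, the sketch below scores sentences by PageRank centrality over a cosine-similarity graph of pre-trained sentence embeddings and then selects sentences under a word budget; the embedding model name and the budget are illustrative assumptions, not details from the paper.

import networkx as nx
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def centrality_summary(sentences, max_words=100):
    # Encode sentences with a pre-trained sentence embedding model (assumed model name).
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    # Build a fully connected similarity graph and score nodes with PageRank.
    graph = nx.from_numpy_array(cosine_similarity(embeddings))
    scores = nx.pagerank(graph, weight="weight")
    # Greedily take the highest-scoring sentences that fit the length constraint.
    chosen, used = [], 0
    for i in sorted(scores, key=scores.get, reverse=True):
        words = len(sentences[i].split())
        if used + words <= max_words:
            chosen.append(i)
            used += words
    return " ".join(sentences[i] for i in sorted(chosen))
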
Posted Content

SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline

TL;DR: SciSummPip is an unsupervised text summarization system for scientific documents, adapted from SummPip's multi-document pipeline for the news domain; it includes the transformer-based language model SciBERT for contextual sentence representation, content selection with PageRank, sentence graph construction with both deep and linguistic information, and within-graph summary generation.
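
The sketch below illustrates only the contextual sentence representation step mentioned here, as mean-pooled SciBERT token embeddings; the mean-pooling choice is an assumption, not necessarily the cited system's exact procedure.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

def embed_sentences(sentences):
    # Tokenize a batch of sentences with padding and truncation.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding tokens
    # Mean-pool token embeddings into one vector per sentence.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
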
References
Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, together with evaluations of each.
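
For concreteness, the self-contained sketch below computes ROUGE-N as clipped n-gram overlap (recall, precision, F1); the official package additionally offers stemming options and the ROUGE-L/W/S variants, which are omitted here.

from collections import Counter

def ngrams(text, n):
    tokens = text.lower().split()
    return Counter(zip(*[tokens[i:] for i in range(n)]))

def rouge_n(reference, candidate, n=2):
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    overlap = sum((ref & cand).values())                 # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * recall * precision / max(recall + precision, 1e-9)
    return {"recall": recall, "precision": precision, "f1": f1}
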
Proceedings Article

TextRank: Bringing Order into Text

Rada Mihalcea et al.
TL;DR: TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.
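
A compact sketch of the sentence-extraction variant of TextRank follows: the paper's word-overlap similarity feeds a PageRank run over the sentence graph (the +1 inside the logarithms is a small guard against zero denominators, not part of the original formula).

import math
import networkx as nx

def textrank_sentences(sentences, top_k=3):
    def similarity(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        denom = math.log(len(wa) + 1) + math.log(len(wb) + 1)
        return len(wa & wb) / denom if denom else 0.0
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            weight = similarity(sentences[i], sentences[j])
            if weight > 0:
                graph.add_edge(i, j, weight=weight)
    # Rank sentences by their PageRank score in the similarity graph.
    scores = nx.pagerank(graph, weight="weight")
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]
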
Proceedings ArticleDOI

Get To The Point: Summarization with Pointer-Generator Networks

TL;DR: A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways: a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information while retaining the ability to produce novel words through the generator, and a coverage mechanism that keeps track of what has been summarized, which discourages repetition.
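
As an illustration (not the authors' code), the snippet below shows the pointer-generator mixing step: the final word distribution blends the generator's vocabulary distribution with the copy distribution induced by attention over source tokens; handling of out-of-vocabulary words via an extended vocabulary is omitted.

import torch

def final_distribution(vocab_dist, attn_dist, src_ids, p_gen):
    # vocab_dist: (batch, vocab_size) softmax over the output vocabulary
    # attn_dist:  (batch, src_len)    attention weights over source positions
    # src_ids:    (batch, src_len)    vocabulary ids of the source tokens
    # p_gen:      (batch, 1)          probability of generating vs. copying
    generated = p_gen * vocab_dist
    copied = torch.zeros_like(vocab_dist)
    # Scatter attention mass onto the vocabulary ids of the source tokens.
    copied = copied.scatter_add(1, src_ids, (1.0 - p_gen) * attn_dist)
    return generated + copied
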
Journal ArticleDOI

LexRank: graph-based lexical centrality as salience in text summarization

TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units in natural language processing (NLP), based on the concept of eigenvector centrality in a graph representation of sentences.
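
The sketch below captures LexRank's core idea, eigenvector centrality of a thresholded TF-IDF cosine-similarity graph, computed here by power iteration; the threshold and damping values are illustrative.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank_scores(sentences, threshold=0.1, damping=0.85, iterations=100):
    similarity = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    adjacency = (similarity >= threshold).astype(float)    # thresholded sentence graph
    row_sums = adjacency.sum(axis=1, keepdims=True)
    transition = adjacency / np.where(row_sums == 0, 1.0, row_sums)
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):                            # power iteration
        scores = (1 - damping) / n + damping * transition.T @ scores
    return scores
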
Journal ArticleDOI

The use of MMR, diversity-based reranking for reordering documents and producing summaries

TL;DR: A method for combining query relevance with information novelty in the context of text retrieval and summarization; preliminary results indicate some benefits of MMR diversity-based reranking in document retrieval and in single-document summarization.
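
A minimal sketch of MMR selection follows: each step picks the candidate sentence that best trades off relevance to the query against redundancy with the sentences already chosen; the lambda value is an illustrative choice.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_select(query, candidates, k=3, lam=0.7):
    vectorizer = TfidfVectorizer().fit(candidates + [query])
    cand_vecs = vectorizer.transform(candidates)
    relevance = cosine_similarity(cand_vecs, vectorizer.transform([query])).ravel()
    redundancy = cosine_similarity(cand_vecs)
    selected = []
    while len(selected) < min(k, len(candidates)):
        remaining = [i for i in range(len(candidates)) if i not in selected]
        def mmr(i):
            overlap = max(redundancy[i][j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1 - lam) * overlap
        selected.append(max(remaining, key=mmr))
    return [candidates[i] for i in selected]
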