SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression
TLDR
This article proposes SummPip, an unsupervised method for multi-document summarization that converts the original documents to a sentence graph, taking both linguistic and deep representation into account, then applies spectral clustering to obtain multiple clusters of sentences, and finally compresses each cluster to generate the final summary.
Abstract:
Obtaining training data for multi-document summarization (MDS) is time-consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic and deep representation into account, then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary. Experiments on Multi-News and DUC-2004 datasets show that our method is competitive with previous unsupervised methods and is even comparable to the neural supervised approaches. In addition, human evaluation shows our system produces consistent and complete summaries compared to human-written ones.
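The three stages named in the abstract (sentence graph, spectral clustering, per-cluster compression) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several simplifying assumptions: bag-of-words cosine similarity stands in for the combined linguistic and deep representation, a two-way split on the sign of the second eigenvector stands in for general spectral clustering, and picking the most central sentence of each cluster stands in for cluster compression. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences, threshold=0.1):
    """Symmetric weighted adjacency matrix over sentences.

    Assumption: an edge exists when word-overlap cosine exceeds `threshold`.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(bags[i], bags[j])
            if sim > threshold:
                w[i][j] = w[j][i] = sim
    return w


def spectral_bipartition(w, iters=500):
    """Two-way spectral split of the graph.

    Uses deflated power iteration to approximate the second eigenvector of
    the normalized adjacency D^-1/2 W D^-1/2, then labels nodes by its sign.
    """
    n = len(w)
    deg = [sum(row) + 1e-9 for row in w]  # epsilon guards isolated nodes
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Shift by the identity so every eigenvalue is non-negative.
    m = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: D^1/2 * 1.
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]  # remove leading component
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v)) or 1.0
        v = [a / norm for a in v]
    return [0 if x >= 0 else 1 for x in v]


def summarize(sentences):
    """Return one representative sentence per cluster, in document order."""
    w = build_graph(sentences)
    labels = spectral_bipartition(w)
    chosen = []
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        # Stand-in for compression: keep the most central cluster member.
        chosen.append(max(members, key=lambda i: sum(w[i][j] for j in members)))
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "storm hits coast monday",
    "storm damages coast homes",
    "team wins championship game",
    "fans celebrate championship game",
]
summary = summarize(docs)
```

With the toy input above, the two storm sentences and the two championship sentences fall into separate clusters, and the summary contains one sentence from each. A fuller pipeline would replace each stand-in with the component the abstract describes.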
Citations
Proceedings Article
Overview and Insights from the Shared Tasks at Scholarly Document Processing 2020: CL-SciSumm, LaySumm and LongSumm.
Muthu Kumar Chandrasekaran, Guy Feigenblat, Eduard Hovy, Abhilasha Ravichander, Michal Shmueli-Scheuer, Anita de Waard
TL;DR: The quality and quantity of the submissions show that there is ample interest in scholarly document summarization, and the state of the art in this domain is at a midway point between being an impossible task and one that is fully resolved.
Proceedings Article
SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
TL;DR: SciSummPip, an unsupervised scientific paper summarization system inspired by SummPip (an unsupervised multi-document summarization system for the news domain), includes a transformer-based language model, SciBERT, for contextual sentence representation and content selection, and applies a summary length constraint to adapt to the scientific domain.
Proceedings Article
Unsupervised document summarization using pre-trained sentence embeddings and graph centrality
TL;DR: A method for incorporating sentence embeddings produced by deep language models into extractive summarization techniques based on graph centrality in an unsupervised manner that can summarize any kind of document of any size and can satisfy any length constraints for the summaries produced.
Posted Content
SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
TL;DR: SciSummPip is an unsupervised scientific paper summarization system, inspired by the news-domain multi-document system SummPip, that includes a transformer-based language model, SciBERT, for contextual sentence representation, content selection with PageRank, sentence graph construction with both deep and linguistic information, and within-graph summary generation.
References
Proceedings Article
ROUGE: A Package for Automatic Evaluation of Summaries
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.
Proceedings Article
TextRank: Bringing Order into Text
Rada Mihalcea, Paul Tarau
TL;DR: TextRank, a graph-based ranking model for text processing, is introduced and it is shown how this model can be successfully used in natural language applications.
Proceedings Article
Get To The Point: Summarization with Pointer-Generator Networks
TL;DR: A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways, using a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator.
Journal Article
LexRank: graph-based lexical centrality as salience in text summarization
Gunes Erkan, Dragomir R. Radev
TL;DR: LexRank is a stochastic graph-based method for computing the relative importance of textual units in natural language processing (NLP), based on the concept of eigenvector centrality.
Journal Article
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Jaime Carbonell, Jade Goldstein
TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization and preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.