Open Access · Proceedings ArticleDOI

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training.

TLDR
A new sequence-to-sequence pre-training model called ProphetNet is presented, which introduces a novel self-supervised objective named future n-gram prediction and an n-stream self-attention mechanism that together predict the next n tokens simultaneously based on previous context tokens at each time step.
Abstract
This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of optimizing one-step-ahead prediction as in the traditional sequence-to-sequence model, ProphetNet is optimized by n-step-ahead prediction, predicting the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for future tokens and prevents overfitting on strong local correlations. We pre-train ProphetNet using a base-scale dataset (16 GB) and a large-scale dataset (160 GB), respectively. We then conduct experiments on the CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to models using the same scale of pre-training corpus.
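As a minimal sketch of the future n-gram prediction objective described above (notation chosen here for illustration; the exact weighting scheme is a detail of the paper), the decoder at each step t is trained to predict the next n target tokens rather than only the next one:

L = - \sum_{j=0}^{n-1} \alpha_j \sum_{t=1}^{T-j} \log p_\theta ( y_{t+j} \mid y_{<t}, x )

where x is the source sequence, y_1..y_T the target sequence, and \alpha_j weights the j-step-ahead term; the n-stream self-attention gives each future offset its own predicting stream while all streams attend to the shared main-stream hidden states of the preceding context.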


Citations
Posted Content

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

TL;DR: ProphetNet, as discussed by the authors, introduces a self-supervised objective named future n-gram prediction and an n-stream self-attention mechanism, which encourage the model to plan for future tokens and prevent overfitting on strong local correlations.
Proceedings ArticleDOI

Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network

TL;DR: This paper proposes HAHSum (shorthand for Hierarchical Attentive Heterogeneous Graph for Text Summarization), which models different levels of information well, including words and sentences, and spotlights redundancy dependencies between sentences.
Posted Content

GLGE: A New General Language Generation Evaluation Benchmark

TL;DR: The General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks, is presented and a leaderboard with strong baselines including MASS, BART, and ProphetNet is built.
Journal ArticleDOI

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

TL;DR: In this article, a comprehensive review of existing generative AI tasks is provided, including text, images, videos, 3D content, etc., depicting the full potential of ChatGPT's future.
Proceedings ArticleDOI

SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization

TL;DR: SimCLS, as discussed by the authors, formulates text generation as a reference-free evaluation problem assisted by contrastive learning, which can bridge the gap between the learning objective and evaluation metrics that results from the currently dominant sequence-to-sequence learning framework.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
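As an illustration of the adaptive estimates of lower-order moments mentioned above, here is a minimal sketch of one Adam update step in Python (hyperparameter defaults and the toy objective are our own choices, not tied to any particular library):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Biased moment estimates, bias correction, then the parameter step.
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta                            # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)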
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT, as mentioned in this paper, pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.
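To make the recall-oriented n-gram overlap behind ROUGE-N concrete, here is a minimal sketch in Python (the function name and whitespace tokenization are our own simplifications; the official package adds stemming, multi-reference handling, and the ROUGE-L/W/S variants):

from collections import Counter

def rouge_n_recall(candidate_tokens, reference_tokens, n=2):
    # ROUGE-N recall: overlapping n-grams divided by n-grams in the reference.
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate_tokens, n), ngrams(reference_tokens, n)
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    return overlap / max(sum(ref.values()), 1)

# Toy usage: 3 of the 5 reference bigrams also appear in the candidate -> 0.6.
print(rouge_n_recall("the cat sat on the mat".split(),
                     "the cat was on the mat".split(), n=2))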
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduces a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., to model polysemy).
Posted Content

Representation Learning with Contrastive Predictive Coding

TL;DR: This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
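As a rough illustration of the contrastive objective behind Contrastive Predictive Coding, here is a minimal InfoNCE-style loss sketch in Python with NumPy (the bilinear critic and array shapes are illustrative assumptions, not the paper's exact encoder/autoregressive architecture):

import numpy as np

def info_nce_loss(context, future_pos, futures_neg, W):
    # Score the true future against negatives with a bilinear critic, then
    # take the negative log-probability of the positive under a softmax.
    candidates = np.vstack([future_pos, futures_neg])     # (k+1, d), positive first
    scores = candidates @ W @ context                     # (k+1,)
    log_probs = scores - np.log(np.sum(np.exp(scores)))   # log-softmax over candidates
    return -log_probs[0]

# Toy usage with random vectors: one positive future, 15 negatives.
rng = np.random.default_rng(0)
d, k = 8, 15
loss = info_nce_loss(rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=(k, d)), np.eye(d))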