Open Access · Proceedings ArticleDOI

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training.

TLDR
A new sequence-to-sequence pre-training model called ProphetNet is presented, which introduces a novel self-supervised objective named future n-gram prediction and an n-stream self-attention mechanism that together predict the next n tokens simultaneously based on previous context tokens at each time step.
Abstract
This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of optimizing one-step-ahead prediction as in the traditional sequence-to-sequence model, ProphetNet is optimized by n-step-ahead prediction, predicting the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for future tokens and prevents overfitting on strong local correlations. We pre-train ProphetNet using a base-scale dataset (16 GB) and a large-scale dataset (160 GB), respectively. We then conduct experiments on the CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to models using the same scale of pre-training corpus.
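As a minimal sketch of the future n-gram prediction objective described above (notation chosen here for illustration; the exact weighting scheme is a detail of the paper), the decoder at each step t is trained to predict the next n target tokens rather than only the next one:

L = - \sum_{j=0}^{n-1} \alpha_j \sum_{t=1}^{T-j} \log p_\theta ( y_{t+j} \mid y_{<t}, x )

where x is the source sequence, y_1..y_T the target sequence, and \alpha_j weights the j-step-ahead term; the n-stream self-attention gives each future offset its own predicting stream while all streams attend to the shared main-stream hidden states of the preceding context.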


Citations
Posted Content

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

TL;DR: ProphetNet, as discussed by the authors, introduces a self-supervised objective named future n-gram prediction and an n-stream self-attention mechanism, which encourage the model to plan for future tokens and prevent overfitting on strong local correlations.
Proceedings ArticleDOI

Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network

TL;DR: This paper proposes HAHSum (shorthand for Hierarchical Attentive Heterogeneous Graph for Text Summarization), which models different levels of information well, including words and sentences, and spotlights redundancy dependencies between sentences.
Posted Content

GLGE: A New General Language Generation Evaluation Benchmark

TL;DR: The General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks, is presented and a leaderboard with strong baselines including MASS, BART, and ProphetNet is built.
Journal ArticleDOI

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

TL;DR: In this article, a comprehensive review of existing generative AI tasks is provided, including text, images, videos, 3D content, etc., depicting the full potential of ChatGPT's future.
Proceedings ArticleDOI

SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization

TL;DR: SimCLS, as discussed by the authors, formulates text generation as a reference-free evaluation problem assisted by contrastive learning, which can bridge the gap between the learning objective and evaluation metrics that results from the currently dominant sequence-to-sequence learning framework.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
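As an illustration of the adaptive estimates of lower-order moments mentioned above, here is a minimal sketch of one Adam update step in Python (hyperparameter defaults and the toy objective are our own choices, not tied to any particular library):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Biased moment estimates, bias correction, then the parameter step.
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta                            # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)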
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT, as mentioned in this paper, pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.
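To make the recall-oriented n-gram overlap behind ROUGE-N concrete, here is a minimal sketch in Python (the function name and whitespace tokenization are our own simplifications; the official package adds stemming, multi-reference handling, and the ROUGE-L/W/S variants):

from collections import Counter

def rouge_n_recall(candidate_tokens, reference_tokens, n=2):
    # ROUGE-N recall: overlapping n-grams divided by n-grams in the reference.
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate_tokens, n), ngrams(reference_tokens, n)
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    return overlap / max(sum(ref.values()), 1)

# Toy usage: 3 of the 5 reference bigrams also appear in the candidate -> 0.6.
print(rouge_n_recall("the cat sat on the mat".split(),
                     "the cat was on the mat".split(), n=2))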
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduces a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., to model polysemy).
Posted Content

Representation Learning with Contrastive Predictive Coding

TL;DR: This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
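As a rough illustration of the contrastive objective behind Contrastive Predictive Coding, here is a minimal InfoNCE-style loss sketch in Python with NumPy (the bilinear critic and array shapes are illustrative assumptions, not the paper's exact encoder/autoregressive architecture):

import numpy as np

def info_nce_loss(context, future_pos, futures_neg, W):
    # Score the true future against negatives with a bilinear critic, then
    # take the negative log-probability of the positive under a softmax.
    candidates = np.vstack([future_pos, futures_neg])     # (k+1, d), positive first
    scores = candidates @ W @ context                     # (k+1,)
    log_probs = scores - np.log(np.sum(np.exp(scores)))   # log-softmax over candidates
    return -log_probs[0]

# Toy usage with random vectors: one positive future, 15 negatives.
rng = np.random.default_rng(0)
d, k = 8, 15
loss = info_nce_loss(rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=(k, d)), np.eye(d))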