Jacob Devlin

Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

- 11 Oct 2018 -

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

...read moreread less

Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

...read moreread less

Journal Article

PaLM: Scaling Language Modeling with Pathways

Aakanksha Chowdhery, +66 more

- 05 Apr 2022 -

arXiv.org

TL;DR: A 540-billion parameter, densely activated, Transformer language model, which is called PaLM achieves breakthrough performance, outperforming the state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark.

...read moreread less

Journal ArticleDOI

Scaling Instruction-Finetuned Language Models

Hyung Won Chung, +31 more

- 20 Oct 2022 -

arXiv.org

TL;DR: This result shows that instruction and UL2 continued pre-training are complementary compute-eﬃcient methods to improve the performance of language models without increasing model scale.

...read moreread less

Proceedings ArticleDOI

Generating Natural Questions About an Image

Nasrin Mostafazadeh, +5 more

TL;DR: This paper introduces the novel task of Visual Question Generation, where the system is tasked with asking a natural and engaging question when shown an image, and provides three datasets which cover a variety of images from object-centric to event-centric.

...read moreread less

Papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

PaLM: Scaling Language Modeling with Pathways

Scaling Instruction-Finetuned Language Models

Generating Natural Questions About an Image