Thomas Henighan
Researcher at SLAC National Accelerator Laboratory
Publications: 35
Citations: 14,458
Thomas Henighan is an academic researcher at SLAC National Accelerator Laboratory. He has contributed to research in the topics Phonon and Scattering, has an h-index of 19, and has co-authored 33 publications receiving 4,339 citations. His previous affiliations include Ohio State University and Stanford University.
Papers
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
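To make the few-shot setup concrete, here is a minimal sketch (not from the paper) of how in-context demonstrations for the 3-digit arithmetic task mentioned above could be assembled into a single prompt. The Q/A format, the build_few_shot_prompt helper, and the example problems are all illustrative assumptions.

```python
# Hypothetical sketch of few-shot prompting: worked examples are placed
# in-context before a new query, with no gradient updates to the model.
demonstrations = [
    ("What is 123 + 456?", "579"),
    ("What is 701 - 250?", "451"),
    ("What is 338 + 294?", "632"),
]

def build_few_shot_prompt(demos, query):
    """Concatenate worked Q/A examples followed by the unanswered query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in demos]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

# The resulting string would be fed to the language model as-is.
print(build_few_shot_prompt(demonstrations, "What is 512 + 389?"))
```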
Posted Content
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
TL;DR: This article showed that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.
Posted Content
Scaling Laws for Neural Language Models
Jared Kaplan, Samuel McCandlish, Thomas Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
TL;DR: Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
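As a rough illustration of the power-law fits behind this claim, the sketch below fits the form L(N) = (Nc / N)^alpha by linear regression in log-log space. The data points are invented for demonstration, and the recovered constants carry no meaning.

```python
import numpy as np

# Hypothetical loss-vs-model-size points (made up for illustration only).
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])   # non-embedding parameter counts
L = np.array([5.1, 4.3, 3.6, 3.0, 2.5])    # corresponding test losses

# log L = alpha * log Nc - alpha * log N, i.e. a line in log-log space.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha = -slope                  # power-law exponent
Nc = np.exp(intercept / alpha)  # scale constant
print(f"alpha ≈ {alpha:.3f}, Nc ≈ {Nc:.3e}")
```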
Journal Article
Ultrafast disordering of vanadium dimers in photoexcited VO2
Simon Wall, Shan Yang, Luciana Vidas, Matthieu Chollet, James M. Glownia, Michael Kozina, Tetsuo Katayama, Thomas Henighan, Mason Jiang, Timothy A. Miller, David A. Reis, Lynn A. Boatner, Olivier Delaire, Mariano Trigo
TL;DR: It is shown that atomic disordering in photoexcited vanadium dioxide (VO2) is central to the transition mechanism and that, after photoexcitation, the system explores a large volume of phase space on a time scale comparable to that of a single phonon oscillation.
Posted Content
Scaling Laws for Autoregressive Generative Modeling
Thomas Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Samuel McCandlish
TL;DR: Empirical scaling laws for the cross-entropy loss are identified across generative modalities, strengthening the case that scaling laws have important implications for neural network performance, including on downstream tasks.
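A minimal sketch of fitting an irreducible-loss-plus-power-law form, L(x) = L_inf + (x0 / x)^alpha, of the kind reported in this line of work. The data points and initial guesses are made up for demonstration only.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(x, L_inf, x0, alpha):
    """Irreducible loss plus a power-law term that decays with scale x."""
    return L_inf + (x0 / x) ** alpha

# Hypothetical loss-vs-compute points (made up for illustration only).
x = np.array([1e15, 1e16, 1e17, 1e18, 1e19])  # e.g. training compute
L = np.array([4.0, 3.2, 2.7, 2.4, 2.25])      # corresponding losses

# Initial guesses matter when x spans many orders of magnitude.
params, _ = curve_fit(scaling_law, x, L, p0=(2.0, 1e16, 0.2), maxfev=10000)
L_inf, x0, alpha = params
print(f"L_inf ≈ {L_inf:.2f}, alpha ≈ {alpha:.3f}")
```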