William Saunders

Researcher at University of Waterloo

Publications - 13

Citations - 779

William Saunders is an academic researcher from the University of Waterloo. The author has contributed to research in the topics Desk & GeoTIFF. The author has an h-index of 7 and has co-authored 11 publications receiving 283 citations. Previous affiliations of William Saunders include the University of Oxford.

Papers

Journal Article

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava, +439 more

- 09 Jun 2022 -

arXiv.org

TL;DR: Evaluation of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters finds that model performance and calibration both improve with scale, but are poor in absolute terms.

Posted Content

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

William Saunders, +3 more

- 17 Jul 2017 -

arXiv: Artificial Intelligence

TL;DR: This work formalizes human intervention for RL and shows how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions, and outlines extensions of the scheme that are necessary if the authors are to train model-free agents without a single catastrophe.

Proceedings Article

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

William Saunders, +3 more

TL;DR: In this article, the authors explore how human oversight can be combined with a supervised learning system to prevent catastrophic events during training and demonstrate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours.

Posted Content

Evaluating Large Language Models Trained on Code

Mark Chen, +57 more

- 07 Jul 2021 -

arXiv: Learning

TL;DR: The authors introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities, showing that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts.

Journal Article

Self-critiquing models for assisting human evaluators

William Saunders, +6 more

- 12 Jun 2022 -

arXiv.org

TL;DR: This work fine-tunes large language models to write natural language critiques (natural language critical comments) using behavioral cloning. Experiments on both topic-based summarization and synthetic tasks suggest that even large models may still have relevant knowledge they cannot or do not articulate as critiques.

...read moreread less