William Saunders

Researcher at University of Waterloo

Publications -  13
Citations -  779

William Saunders is an academic researcher from University of Waterloo. The author has contributed to research in topics: Desk & GeoTIFF. The author has an h-index of 7 and has co-authored 11 publications receiving 283 citations. Previous affiliations of William Saunders include University of Oxford.

Papers
Journal Article

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava, +439 more
09 Jun 2022
TL;DR: Evaluation of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters finds that model performance and calibration both improve with scale, but are poor in absolute terms.
Posted Content

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

TL;DR: This work formalizes human intervention for RL and shows how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions, and outlines extensions of the scheme that are necessary if the authors are to train model-free agents without a single catastrophe.
Proceedings Article

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

TL;DR: In this article, the authors explore how human oversight can be combined with a supervised learning system to prevent catastrophic events during training and demonstrate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours.
Journal Article

Self-critiquing models for assisting human evaluators

TL;DR: This work fine-tunes large language models to write natural language critiques using behavioral cloning, studies critiquing on both topic-based summarization and synthetic tasks, and suggests that even large models may still have relevant knowledge they cannot or do not articulate as critiques.