
Nicholas Joseph

Publications: 12
Citations: 786

Nicholas Joseph is an academic researcher who has contributed to research in the topics Computer science and Counterintuitive. The author has an h-index of 1 and has co-authored 1 publication receiving 39 citations.

Papers
Journal Article

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

TL;DR: The authors explore an iterated online mode of training, in which preference models and RL policies are updated on a weekly cadence with fresh human feedback data, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization.
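To make the reward-versus-sqrt(KL) relation concrete, here is a minimal sketch: it computes the KL divergence between toy discrete policies and their initialization, then fits a straight line of reward against sqrt(KL) by ordinary least squares. The 4-action policies, the checkpoint schedule, and the rewards (constructed to satisfy reward = 2 * sqrt(KL) exactly) are all hypothetical illustrations, not the paper's setup or data; in practice KL would be estimated from samples of a language-model policy.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical toy setup: a uniform 4-action initial policy and RL checkpoints
# that shift probability mass onto action 0 as training proceeds.
initial_policy = [0.25, 0.25, 0.25, 0.25]
checkpoints = []
for t in [0.0, 0.1, 0.2, 0.3, 0.4]:
    rest = (1 - t) * 0.25
    checkpoints.append([rest + t, rest, rest, rest])

sqrt_kl = [math.sqrt(kl_divergence(p, initial_policy)) for p in checkpoints]
# Synthetic rewards built to follow reward = 2 * sqrt(KL) exactly;
# these are NOT measurements from the paper.
rewards = [2.0 * x for x in sqrt_kl]

a, b = fit_line(sqrt_kl, rewards)
print(f"reward ≈ {a:.3f} + {b:.3f} * sqrt(KL)")
```

Fitting against sqrt(KL) rather than KL itself is the point: if the relation holds, the fitted line should be close to straight on this axis, which makes reward gains easy to extrapolate from how far the policy has drifted.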
Journal Article

Language Models (Mostly) Know What They Know

TL;DR: This article shows that large models are well calibrated on diverse multiple-choice and true/false questions when the questions are provided in the right format, and that models can be trained to predict the probability that "I know" the answer to a question, without reference to any particular proposed answer.
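"Well calibrated" means that, among answers given with confidence p, roughly a fraction p turn out to be correct. One common way to quantify this is expected calibration error (ECE). The sketch below is a generic ECE implementation, assuming equal-width confidence bins; the bin count and the toy data are illustrative assumptions, and this is not necessarily the exact metric or binning used in the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probabilities in [0, 1]
    correct: booleans, True where the prediction was right
    """
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(1 for i in idx if correct[i]) / len(idx)
        # Weight each bin's |confidence - accuracy| gap by its share of samples.
        ece += len(idx) / total * abs(avg_conf - accuracy)
    return ece

# Hypothetical toy data, not results from the paper.
calibrated = expected_calibration_error([0.8] * 5, [True, True, True, True, False])
overconfident = expected_calibration_error([0.9] * 4, [True, True, False, False])
print(calibrated, overconfident)
```

The first toy set states 80% confidence and is right 4 times out of 5, so its ECE is essentially zero; the second states 90% confidence but is right only half the time, giving an ECE of about 0.4.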
Proceedings Article

Predictability and Surprise in Large Generative Models

TL;DR: This paper highlights a counterintuitive property of large-scale generative models: a paradoxical combination of predictable loss on a broad training distribution and unpredictable specific capabilities, inputs, and outputs. It analyzes how these conflicting properties combine to give model developers various motivations for deploying these models, as well as challenges that can hinder deployment.
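The "predictable" half of this contrast is usually expressed as a scaling law, in which loss falls roughly as a power law in training compute. A minimal sketch, assuming the form loss ≈ a * C^(-b), fits that curve by least squares in log-log space and extrapolates it. The compute values and losses below are synthetic and generated from a = 10, b = 0.05; they are not the paper's measurements. The paper's point is that specific capabilities do not necessarily follow such smooth curves, so this illustrates only the predictable part.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) via least squares on log(loss) vs log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # log(loss) = log(a) - b * log(C), so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope

# Hypothetical synthetic points generated from loss = 10 * C**-0.05.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [10 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"fitted loss ≈ {a:.2f} * C^-{b:.3f}")
print("extrapolated loss at C = 1e22:", a * 1e22 ** -b)
```

Working in log-log space turns the power law into a straight line, which is why a simple linear fit suffices and why extrapolating to larger compute is straightforward when the trend holds.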