Daniel M. Ziegler
Researcher at Massachusetts Institute of Technology
Publications: 15
Citations: 13,290
Daniel M. Ziegler is an academic researcher from the Massachusetts Institute of Technology. He has contributed to research on topics including language models and file systems, and has an h-index of 9, having co-authored 13 publications that received 3,582 citations.
Papers
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown,Benjamin Mann,Nick Ryder,Melanie Subbiah,Jared Kaplan,Prafulla Dhariwal,Arvind Neelakantan,Pranav Shyam,Girish Sastry,Amanda Askell,Sandhini Agarwal,Ariel Herbert-Voss,Gretchen Krueger,Thomas Henighan,Rewon Child,Aditya Ramesh,Daniel M. Ziegler,Jeffrey Wu,Clemens Winter,Christopher Hesse,Mark Chen,Eric Sigler,Mateusz Litwin,Scott Gray,Benjamin Chess,Jack Clark,Christopher Berner,Samuel McCandlish,Alec Radford,Ilya Sutskever,Dario Amodei +30 more
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
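The few-shot setting studied in the paper conditions the model on a handful of solved task examples in its context window, with no gradient updates. A minimal sketch of how such a prompt is assembled (the function name and formatting here are illustrative, not from the paper):

```python
def build_few_shot_prompt(examples, query, instruction="Unscramble the word."):
    """Assemble a few-shot prompt: a task instruction, K solved examples,
    then the final query for the model to complete.

    The "learning" is purely in-context: the model's weights are unchanged.
    """
    lines = [instruction]
    for scrambled, answer in examples:
        lines.append(f"Input: {scrambled}\nOutput: {answer}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(
    examples=[("lpeap", "apple"), ("nrgea", "anger")],
    query="tca",
)
```

The model would then be asked to continue `prompt`; in the zero-shot variant the examples list is simply empty.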
Posted Content
Language Models are Few-Shot Learners
Tom B. Brown,Benjamin Mann,Nick Ryder,Melanie Subbiah,Jared Kaplan,Prafulla Dhariwal,Arvind Neelakantan,Pranav Shyam,Girish Sastry,Amanda Askell,Sandhini Agarwal,Ariel Herbert-Voss,Gretchen Krueger,Thomas Henighan,Rewon Child,Aditya Ramesh,Daniel M. Ziegler,Jeffrey Wu,Clemens Winter,Christopher Hesse,Mark Chen,Eric Sigler,Mateusz Litwin,Scott Gray,Benjamin Chess,Jack Clark,Christopher Berner,Samuel McCandlish,Alec Radford,Ilya Sutskever,Dario Amodei +30 more
TL;DR: This article showed that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.
Proceedings Article
Learning to summarize from human feedback
Nisan Stiennon,Long Ouyang,Jeffrey Wu,Daniel M. Ziegler,Ryan Lowe,Chelsea Voss,Alec Radford,Dario Amodei,Paul F. Christiano +8 more
TL;DR: The authors use reinforcement learning to fine-tune a summarization policy according to human feedback; human evaluators prefer the resulting summaries to those obtained by optimizing ROUGE, and the policy transfers to CNN/DM news articles, producing summaries nearly as good as the human reference.
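At the core of this approach is a reward model trained on human comparisons between pairs of summaries; the policy is then fine-tuned with RL against that reward. A minimal sketch of the pairwise comparison loss, assuming scalar rewards are already computed (the paper's actual architecture and training procedure differ):

```python
import math

def pairwise_preference_loss(r_chosen, r_rejected):
    """Cross-entropy loss on a human comparison: the reward model should
    assign a higher scalar reward to the summary the labeler preferred.

    loss = -log(sigmoid(r_chosen - r_rejected))
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Illustrative scalar rewards for a preferred / dispreferred summary pair:
loss_agree = pairwise_preference_loss(2.0, -1.0)     # model agrees with labeler
loss_disagree = pairwise_preference_loss(-1.0, 2.0)  # model disagrees
```

Minimizing this loss over many labeled comparisons trains the reward model that the RL fine-tuning step then optimizes against.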
Posted Content
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler,Nisan Stiennon,Jeffrey Wu,Tom B. Brown,Alec Radford,Dario Amodei,Paul F. Christiano,Geoffrey Irving +7 more
TL;DR: This paper builds on advances in generative pretraining of language models to apply reward learning to four natural language tasks: continuing text with positive sentiment or physically descriptive language, and summarization tasks on the TL;DR and CNN/Daily Mail datasets.
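A detail of this line of work is that the RL objective penalizes the fine-tuned policy for drifting too far from the pretrained language model, via a KL term. A sketch of the per-sample reward under that scheme, assuming log-probabilities are available (variable names are illustrative):

```python
def kl_penalized_reward(reward, logp_policy, logp_pretrained, beta=0.1):
    """Reward used during RL fine-tuning: the learned reward minus a KL
    penalty keeping the policy pi close to the pretrained model rho.

    R(x, y) = r(x, y) - beta * (log pi(y|x) - log rho(y|x))
    """
    return reward - beta * (logp_policy - logp_pretrained)

# If the policy assigns the sample higher log-probability than the
# pretrained model did, the reward is reduced:
r_drift = kl_penalized_reward(1.0, logp_policy=-1.0, logp_pretrained=-2.0)
r_same = kl_penalized_reward(1.0, logp_policy=-2.0, logp_pretrained=-2.0)
```

The coefficient `beta` trades off reward maximization against staying close to the pretrained distribution, which helps avoid degenerate outputs that exploit the reward model.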
Proceedings ArticleDOI
Using Crash Hoare logic for certifying the FSCQ file system
Haogang Chen,Daniel M. Ziegler,Tej Chajed,Adam Chlipala,M. Frans Kaashoek,Nickolai Zeldovich +5 more
TL;DR: The paper introduces Crash Hoare logic (CHL), which extends traditional Hoare logic with a crash condition, a recovery procedure, and logical address spaces for specifying disk states at different abstraction levels; CHL reduces the proof effort for developers through proof automation.
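The key idea in CHL is that every procedure carries a crash condition describing the disk states a crash may leave behind, and a recovery procedure restores an invariant from any such state. A toy Python model of that idea, using a one-entry write-ahead log whose crash condition is "the value is either the old one or the new one, never torn" (illustrative only; FSCQ's actual specifications and proofs are written in Coq over a formal disk model):

```python
class AtomicCell:
    """Toy crash-safe cell: updates go through a write-ahead log so that a
    crash at any step leaves a state from which recovery yields either the
    old or the new value -- the informal "crash condition" of the update.
    """
    def __init__(self, value):
        self.value = value
        self.log = None  # pending update, if any

    def atomic_update(self, new_value, crash_at=None):
        # crash_at simulates a crash before the numbered step completes.
        if crash_at == 1:
            return            # crash before logging: old value survives
        self.log = new_value  # step 1: record intent in the log
        if crash_at == 2:
            return            # crash before applying: log holds new value
        self.value = self.log # step 2: apply the logged value
        self.log = None       #         then clear the log

    def recover(self):
        # Recovery procedure: replay a pending log entry, if any.
        if self.log is not None:
            self.value = self.log
            self.log = None
```

After any simulated crash followed by `recover()`, the cell holds either the old or the new value, mirroring the kind of postcondition-after-recovery statement that CHL lets FSCQ prove mechanically.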