Arvind Neelakantan
Researcher at University of Massachusetts Amherst
Publications: 41
Citations: 15,055
Arvind Neelakantan is an academic researcher at the University of Massachusetts Amherst. He has contributed to research on topics including artificial neural networks and knowledge bases, has an h-index of 23, and has co-authored 37 publications receiving 5,594 citations. His previous affiliations include BBN Technologies and Google.
Papers
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
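The few-shot setting described here supplies the model with a handful of worked demonstrations directly in the prompt and asks it to complete a final, unanswered example, with no gradient updates. Below is a minimal Python sketch of how such a prompt can be assembled; the helper name, the Q/A template, and the arithmetic demonstrations are illustrative, not taken from the paper.

```python
def build_few_shot_prompt(demonstrations, query, task_description=""):
    """Assemble a few-shot prompt from (input, output) demonstration pairs."""
    parts = [task_description] if task_description else []
    for x, y in demonstrations:
        parts.append(f"Q: {x}\nA: {y}")
    parts.append(f"Q: {query}\nA:")  # the model completes this final line
    return "\n\n".join(parts)

demos = [("23 + 54", "77"), ("615 + 281", "896")]  # 3-digit-arithmetic style
print(build_few_shot_prompt(demos, "412 + 307", "Add the two numbers."))
```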
Posted Content
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
TL;DR: This article showed that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.
Proceedings Article
Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
TL;DR: An extension to the Skip-gram model that efficiently learns multiple embeddings per word type is presented, and its scalability is demonstrated by training with one machine on a corpus of nearly 1 billion tokens in less than 6 hours.
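The non-parametric idea is that each occurrence of a word is represented by the average of its context vectors and assigned to that word's nearest sense cluster, with a new sense created when no existing cluster is similar enough. Below is a hedged Python sketch of just that assignment step; the similarity threshold, dimensions, and random data are illustrative, and a real implementation would interleave this with skip-gram training of the per-sense embeddings.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def assign_sense(word, context_vec, sense_clusters, lam=0.0):
    """Pick a sense for this occurrence, creating a new one if needed."""
    clusters = sense_clusters.setdefault(word, [])
    if clusters:
        sims = [cosine(context_vec, c["center"]) for c in clusters]
        best = int(np.argmax(sims))
        if sims[best] > lam:  # close enough to an existing sense: merge in
            c = clusters[best]
            c["center"] = (c["center"] * c["n"] + context_vec) / (c["n"] + 1)
            c["n"] += 1
            return best
    clusters.append({"center": context_vec.copy(), "n": 1})  # open a new sense
    return len(clusters) - 1

rng = np.random.default_rng(0)
sense_clusters = {}
for _ in range(5):
    ctx = rng.normal(size=50)  # stands in for the mean of context word vectors
    print(assign_sense("bank", ctx, sense_clusters))
```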
Posted Content
Adding Gradient Noise Improves Learning for Very Deep Networks
Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens
TL;DR: This paper explores the low-overhead, easy-to-implement optimization technique of adding annealed Gaussian noise to the gradient, which is found to be surprisingly effective when training very deep architectures.
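The technique amounts to perturbing each gradient with zero-mean Gaussian noise whose variance is annealed over training steps, on the order of sigma_t^2 = eta / (1 + t)^gamma with gamma = 0.55 in the paper's experiments. The Python sketch below applies it to plain SGD on a toy quadratic; the learning rate, eta, and objective are illustrative choices, not the paper's settings.

```python
import numpy as np

def noisy_sgd_step(params, grad, t, lr=0.1, eta=0.3, gamma=0.55, rng=None):
    """One SGD step with annealed Gaussian gradient noise."""
    rng = rng if rng is not None else np.random.default_rng()
    sigma = np.sqrt(eta / (1.0 + t) ** gamma)  # noise std decays with step t
    noisy_grad = grad + rng.normal(scale=sigma, size=np.shape(grad))
    return params - lr * noisy_grad

rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])
for t in range(200):
    grad = 2.0 * w  # gradient of the toy objective ||w||^2
    w = noisy_sgd_step(w, grad, t, rng=rng)
print(w)  # close to the optimum at the origin despite the injected noise
```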
Proceedings Article
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks
TL;DR: The authors combine the rich multi-step inference of symbolic logical reasoning with the generalization capabilities of neural networks for complex reasoning about entities and relations in text and large-scale knowledge bases (KBs).
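The core mechanism is to compose a multi-step path of KB relations with a recurrent network and score the resulting path representation against the target relation's vector. Below is a minimal Python sketch of that idea; the toy relation vocabulary, random parameters, and dot-product scorer are illustrative, and the paper additionally pools over many paths and incorporates entity and textual information.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
relations = ["born_in", "city_of", "located_in", "nationality"]
R = {r: rng.normal(scale=0.1, size=d) for r in relations}  # relation vectors
W_h = rng.normal(scale=0.1, size=(d, d))  # recurrent (hidden-to-hidden) weights
W_x = rng.normal(scale=0.1, size=(d, d))  # input (relation-to-hidden) weights

def encode_path(path):
    """Run a simple RNN over the sequence of relations forming a KB path."""
    h = np.zeros(d)
    for rel in path:
        h = np.tanh(W_h @ h + W_x @ R[rel])
    return h

def score(path, target_relation):
    """Higher score = the path is stronger evidence for the target relation."""
    return float(encode_path(path) @ R[target_relation])

print(score(["born_in", "city_of", "located_in"], "nationality"))
```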