Noam Shazeer

Researcher at Google

Publications - 96
Citations - 83000

Noam Shazeer is a researcher at Google. The author has contributed to research on topics including artificial neural networks and the Transformer (machine learning model). The author has an h-index of 41 and has co-authored 90 publications receiving 41094 citations. Previous affiliations of Noam Shazeer include Duke University.

Papers
Proceedings Article

Attention Is All You Need

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Posted Content

Attention Is All You Need

TL;DR: This paper proposes the Transformer, a new simple network architecture based solely on attention mechanisms that dispenses with recurrence and convolutions entirely. It generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.
Posted Content

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Proceedings Article

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

TL;DR: This work proposes a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead.