Showing papers by "Google published in 2020"

PDF

Open Access

Posted Content•

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

[...]

Alexey Dosovitskiy¹, Lucas Beyer¹, Alexander Kolesnikov¹, Dirk Weissenborn², Xiaohua Zhai¹, Thomas Unterthiner¹, Mostafa Dehghani¹, Matthias Minderer¹, Georg Heigold², Sylvain Gelly¹, Jakob Uszkoreit¹, Neil Houlsby¹ - Show less +8 more•Institutions (2)

Google¹, German Research Centre for Artificial Intelligence²

22 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

...read moreread less

Abstract: While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

...read moreread less

12,690 citations

Proceedings Article•

Language Models are Few-Shot Learners

[...]

Tom B. Brown¹, Benjamin Mann, Nick Ryder², Melanie Subbiah, Jared Kaplan³, Prafulla Dhariwal¹, Arvind Neelakantan⁴, Pranav Shyam, Girish Sastry¹, Amanda Askell¹, Sandhini Agarwal¹, Ariel Herbert-Voss¹, Gretchen Krueger¹, Thomas Henighan¹, Rewon Child¹, Aditya Ramesh¹, Daniel M. Ziegler⁵, Jeffrey Wu¹, Clemens Winter, Christopher Hesse¹, Mark Chen¹, Eric Sigler, Mateusz Litwin, Scott Gray¹, Benjamin Chess¹, Jack Clark¹, Christopher Berner, Samuel McCandlish¹, Alec Radford¹, Ilya Sutskever¹, Dario Amodei¹ - Show less +27 more•Institutions (5)

OpenAI¹, University of California, Berkeley², Johns Hopkins University³, Google⁴, Massachusetts Institute of Technology⁵

28 May 2020

TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

...read moreread less

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

...read moreread less

10,132 citations

Posted Content•

A Simple Framework for Contrastive Learning of Visual Representations

[...]

Ting Chen¹, Simon Kornblith¹, Mohammad Norouzi¹, Geoffrey E. Hinton¹•Institutions (1)

Google¹

13 Feb 2020-arXiv: Learning

TL;DR: It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.

...read moreread less

Abstract: This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.

...read moreread less

7,951 citations

Journal Article•DOI•

Array programming with NumPy

[...]

Charles R. Harris, K. Jarrod Millman¹, Stefan van der Walt¹, Stefan van der Walt², Ralf Gommers, Pauli Virtanen³, David Cournapeau, Eric Wieser⁴, Julian Taylor, Sebastian Berg¹, Nathaniel J. Smith, Robert Kern, Matti Picus¹, Stephan Hoyer⁵, Marten H. van Kerkwijk⁶, Matthew Brett¹, Matthew Brett⁷, Allan Haldane⁸, Jaime Fernández del Río⁵, Mark Wiebe⁹, Mark Wiebe¹⁰, Pearu Peterson, Pierre Gérard-Marchant¹¹, Kevin Sheppard¹², Tyler Reddy¹³, Warren Weckesser¹, Hameer Abbasi, Christoph Gohlke¹⁴, Travis E. Oliphant - Show less +25 more•Institutions (14)

University of California, Berkeley¹, Stellenbosch University², University of Jyväskylä³, University of Cambridge⁴, Google⁵, University of Toronto⁶, University of Birmingham⁷, Temple University⁸, University of British Columbia⁹, Amazon.com¹⁰, University of Georgia¹¹, University of Oxford¹², Los Alamos National Laboratory¹³, University of California, Irvine¹⁴

16 Sep 2020-Nature

TL;DR: In this paper, the authors review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data, and their evolution into a flexible interoperability layer between increasingly specialized computational libraries is discussed.

...read moreread less

Abstract: Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves1 and in the first imaging of a black hole2. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis. NumPy is the primary array programming library for Python; here its fundamental concepts are reviewed and its evolution into a flexible interoperability layer between increasingly specialized computational libraries is discussed.

...read moreread less

7,624 citations

Journal Article•DOI•

SciPy 1.0: fundamental algorithms for scientific computing in Python.

[...]

Pauli Virtanen¹, Ralf Gommers, Travis E. Oliphant, Matt Haberland², Matt Haberland³, Tyler Reddy⁴, David Cournapeau, Evgeni Burovski⁵, Pearu Peterson, Warren Weckesser⁶, Jonathan Bright, Stefan van der Walt⁶, Matthew Brett⁷, Joshua Wilson, K. Jarrod Millman⁶, Nikolay Mayorov, Andrew Nelson⁸, Eric Jones, Robert Kern, Eric B. Larson⁹, CJ Carey¹⁰, Ilhan Polat, Yu Feng⁶, Eric Moore, Jake Vanderplas⁹, Denis Laxalde, Josef Perktold, Robert Cimrman¹¹, Ian Henriksen¹², Ian Henriksen¹³, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro¹⁴, Fabian Pedregosa¹⁵, Paul van Mulbregt¹⁵, SciPy . Contributors - Show less +33 more•Institutions (15)

University of Jyväskylä¹, University of California, Los Angeles², California Polytechnic State University³, Los Alamos National Laboratory⁴, National Research University – Higher School of Economics⁵, University of California, Berkeley⁶, University of Birmingham⁷, Australian Nuclear Science and Technology Organisation⁸, University of Washington⁹, University of Massachusetts Amherst¹⁰, University of West Bohemia¹¹, Brigham Young University¹², University of Texas at Austin¹³, Universidade Federal de Minas Gerais¹⁴, Google¹⁵

03 Feb 2020-Nature Methods

TL;DR: SciPy as discussed by the authors is an open-source scientific computing library for the Python programming language, which has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year.

...read moreread less

Abstract: SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.

...read moreread less

6,244 citations

Journal Article•DOI•

Array Programming with NumPy

[...]

Charles R. Harris, K. Jarrod Millman¹, Stefan van der Walt², Stefan van der Walt¹, Ralf Gommers, Pauli Virtanen³, David Cournapeau, Eric Wieser⁴, Julian Taylor, Sebastian Berg¹, Nathaniel J. Smith, Robert Kern, Matti Picus¹, Stephan Hoyer⁵, Marten H. van Kerkwijk⁶, Matthew Brett¹, Matthew Brett⁷, Allan Haldane⁸, Jaime Fernández del Río⁵, Mark Wiebe⁹, Mark Wiebe¹⁰, Pearu Peterson, Pierre Gérard-Marchant¹¹, Kevin Sheppard¹², Tyler Reddy¹³, Warren Weckesser¹, Hameer Abbasi, Christoph Gohlke¹⁴, Travis E. Oliphant - Show less +25 more•Institutions (14)

University of California, Berkeley¹, Stellenbosch University², University of Jyväskylä³, University of Cambridge⁴, Google⁵, University of Toronto⁶, University of Birmingham⁷, Temple University⁸, Amazon.com⁹, University of British Columbia¹⁰, University of Georgia¹¹, University of Oxford¹², Los Alamos National Laboratory¹³, University of California, Irvine¹⁴

18 Jun 2020-arXiv: Mathematical Software

TL;DR: How a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data is reviewed.

...read moreread less

Abstract: Array programming provides a powerful, compact, expressive syntax for accessing, manipulating, and operating on data in vectors, matrices, and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It plays an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, material science, engineering, finance, and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves and the first imaging of a black hole. Here we show how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring, and analyzing scientific data. NumPy is the foundation upon which the entire scientific Python universe is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Because of its central position in the ecosystem, NumPy increasingly plays the role of an interoperability layer between these new array computation libraries.

...read moreread less

4,342 citations

Proceedings Article•DOI•

EfficientDet: Scalable and Efficient Object Detection

[...]

Mingxing Tan¹, Ruoming Pang¹, Quoc V. Le¹•Institutions (1)

Google¹

14 Jun 2020

TL;DR: EfficientDetD7 as discussed by the authors proposes a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion, and a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time.

...read moreread less

Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDetD7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4x – 9x smaller and using 13x – 42x fewer FLOPs than previous detector.

...read moreread less

3,423 citations

Journal Article•DOI•

Generative adversarial networks

[...]

Ian Goodfellow¹, Jean Pouget-Abadie², Mehdi Mirza², Bing Xu², David Warde-Farley², Sherjil Ozair², Aaron Courville², Yoshua Bengio² - Show less +4 more•Institutions (2)

Google¹, Université de Montréal²

22 Oct 2020-Communications of The ACM

TL;DR: A generative adversarial networks algorithm designed to solve the generative modeling problem and its applications in medicine, education and robotics are studied.

...read moreread less

Abstract: Generative adversarial networks are a kind of artificial intelligence algorithm designed to solve the generative modeling problem. The goal of a generative model is to study a collection of training examples and learn the probability distribution that generated them. Generative Adversarial Networks (GANs) are then able to generate more examples from the estimated probability distribution. Generative models based on deep learning are common, but GANs are among the most successful generative models (especially in terms of their ability to generate realistic high-resolution images). GANs have been successfully applied to a wide variety of tasks (mostly in research settings) but continue to present unique challenges and research opportunities because they are based on game theory while most other approaches to generative modeling are based on optimization.

...read moreread less

2,447 citations

Posted Content•

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

[...]

Ben Mildenhall¹, Pratul P. Srinivasan¹, Matthew Tancik¹, Jonathan T. Barron², Ravi Ramamoorthi³, Ren Ng¹ - Show less +2 more•Institutions (3)

University of California, Berkeley¹, Google², University of California, San Diego³

19 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work describes how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrates results that outperform prior work on neural rendering and view synthesis.

...read moreread less

Abstract: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(\theta, \phi)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.

...read moreread less

2,435 citations

Proceedings Article•

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

[...]

Zhenzhong Lan¹, Mingda Chen², Sebastian Goodman¹, Kevin Gimpel³, Piyush Sharma¹, Radu Soricut¹ - Show less +2 more•Institutions (3)

Google¹, Toyota Technological Institute at Chicago², New York University³

30 Apr 2020

TL;DR: This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.

...read moreread less

Abstract: Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.

...read moreread less

2,367 citations

Posted Content•

Language Models are Few-Shot Learners

[...]

OpenAI¹, University of California, Berkeley², Johns Hopkins University³, Google⁴, Massachusetts Institute of Technology⁵

28 May 2020-arXiv: Computation and Language

TL;DR: This article showed that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.

...read moreread less

Proceedings Article•DOI•

Self-Training With Noisy Student Improves ImageNet Classification

[...]

Qizhe Xie¹, Minh-Thang Luong¹, Eduard Hovy², Quoc V. Le¹•Institutions (2)

Google¹, Carnegie Mellon University²

14 Jun 2020

TL;DR: A simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images.

...read moreread less

Abstract: We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the generation of the pseudo labels, the teacher is not noised so that the pseudo labels are as accurate as possible. However, during the learning of the student, we inject noise such as dropout, stochastic depth and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.

...read moreread less

Proceedings Article•

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

[...]

Kevin Clark¹, Minh-Thang Luong², Quoc V. Le², Christopher D. Manning¹•Institutions (2)

Stanford University¹, Google²

30 Apr 2020

TL;DR: This paper proposed a more sample-efficient pre-training task called replaced token detection, which corrupts the input by replacing some input tokens with plausible alternatives sampled from a small generator network and then predicts whether each token in the corrupted input was replaced by a generator sample or not.

...read moreread less

Abstract: While masked language modeling (MLM) pre-training methods such as BERT produce excellent results on downstream NLP tasks, they require large amounts of compute to be effective. These approaches corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some input tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate this new pre-training task is more efficient than MLM because the model learns from all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where we match the performance of RoBERTa, the current state-of-the-art pre-trained transformer, while using less than 1/4 of the compute.

...read moreread less

Posted Content•

Supervised Contrastive Learning.

[...]

Prannay Khosla, Piotr Teterwak¹, Chen Wang², Aaron Sarna¹, Yonglong Tian³, Phillip Isola³, Aaron Maschinot¹, Ce Liu¹, Dilip Krishnan¹ - Show less +5 more•Institutions (3)

Google¹, Fudan University², Massachusetts Institute of Technology³

23 Apr 2020-arXiv: Learning

TL;DR: In this paper, the authors extend the self-supervised batch contrastive approach to the fully supervised setting, allowing them to effectively leverage label information and achieve state-of-the-art performance in unsupervised training of deep image models.

...read moreread less

Abstract: Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss. On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions and is more stable to hyperparameter settings such as optimizers and data augmentations. Our loss function is simple to implement, and reference TensorFlow code is released at this https URL.

...read moreread less

Proceedings Article•

RandAugment: Practical Automated Data Augmentation with a Reduced Search Space

[...]

Ekin D. Cubuk¹, Barret Zoph¹, Jonathon Shlens¹, Quoc V. Le¹•Institutions (1)

Google¹

01 Jan 2020

TL;DR: This work proposes a simplified search space that vastly reduces the computational expense of automated augmentation, and permits the removal of a separate proxy task.

...read moreread less

Abstract: Recent work has shown that data augmentation has the potential to significantly improve the generalization of deep learning models. Recently, automated augmentation strategies have led to state-of-the-art results in image classification and object detection. While these strategies were optimized for improving validation accuracy, they also led to state-of-the-art results in semi-supervised learning and improved robustness to common corruptions of images. An obstacle to a large-scale adoption of these methods is a separate search phase which increases the training complexity and may substantially increase the computational cost. Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size. Automated augmentation policies are often found by training small models on small datasets and subsequently applied to train larger models. In this work, we remove both of these obstacles. RandAugment has a significantly reduced search space which allows it to be trained on the target task with no need for a separate proxy task. Furthermore, due to the parameterization, the regularization strength may be tailored to different model and dataset sizes. RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous automated augmentation approaches on CIFAR-10/100, SVHN, and ImageNet. On the ImageNet dataset we achieve 85.0% accuracy, a 0.6% increase over the previous state-of-the-art and 1.0% increase over baseline augmentation. On object detection, RandAugment leads to 1.0-1.3% improvement over baseline augmentation, and is within 0.3% mAP of AutoAugment on COCO. Finally, due to its interpretable hyperparameter, RandAugment may be used to investigate the role of data augmentation with varying model and dataset size. Code is available online.

...read moreread less

Journal Article•DOI•

International evaluation of an AI system for breast cancer screening.

[...]

Scott Mayer McKinney¹, Marcin Sieniek¹, Varun Godbole¹, Jonathan Godwin, Natasha Antropova, Hutan Ashrafian², Trevor Back, Mary Chesus, Greg C. Corrado¹, Ara Darzi², Mozziyar Etemadi, Florencia Garcia-Vicente, Fiona J. Gilbert³, Mark D. Halling-Brown⁴, Demis Hassabis, Sunny Jansen, Alan Karthikesalingam¹, Christopher Kelly¹, Dominic King¹, Joseph R. Ledsam, David S. Melnick, Hormuz Mostofi¹, Lily Peng¹, Joshua J. Reicher⁵, Bernardino Romera-Paredes, Richard Sidebottom⁶, Mustafa Suleyman, Daniel Tse¹, Kenneth C. Young⁴, Jeffrey De Fauw, Shravya Shetty¹ - Show less +27 more•Institutions (6)

Google¹, Imperial College London², University of Cambridge³, Royal Surrey County Hospital⁴, Veterans Health Administration⁵, The Royal Marsden NHS Foundation Trust⁶

01 Jan 2020-Nature

TL;DR: A robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening and using a combination of AI and human inputs could help to improve screening efficiency.

...read moreread less

Abstract: Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful1. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives2. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening. An artificial intelligence (AI) system performs as well as or better than radiologists at detecting breast cancer from mammograms, and using a combination of AI and human inputs could help to improve screening efficiency.

...read moreread less

Proceedings Article•

A Simple Framework for Contrastive Learning of Visual Representations

[...]

Ting Chen¹, Simon Kornblith¹, Mohammad Norouzi¹, Geoffrey E. Hinton¹•Institutions (1)

Google¹

12 Jul 2020

TL;DR: SimCLR as mentioned in this paper is a simple framework for contrastive learning of visual representations and achieves state-of-the-art performance on ImageNet. But it requires large batch sizes and more training steps compared to supervised learning.

...read moreread less

Proceedings Article•

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

[...]

Kihyuk Sohn¹, David Berthelot², Chun-Liang Li³, Zizhao Zhang⁴, Nicholas Carlini², Ekin D. Cubuk², Alex Kurakin², Han Zhang², Colin Raffel² - Show less +5 more•Institutions (4)

NEC¹, Google², Carnegie Mellon University³, University of Florida⁴

21 Jan 2020

TL;DR: This paper demonstrates the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling, and shows that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks.

...read moreread less

Abstract: Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 -- just 4 labels per class. Since FixMatch bears many similarities to existing SSL methods that achieve worse performance, we carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success. We make our code available at this https URL.

...read moreread less

Posted Content•

Conformer: Convolution-augmented Transformer for Speech Recognition

[...]

Anmol Gulati¹, James Qin¹, Chung-Cheng Chiu¹, Niki Parmar¹, Yu Zhang¹, Jiahui Yu², Wei Han¹, Shibo Wang, Zhengdong Zhang¹, Yonghui Wu¹, Ruoming Pang¹ - Show less +7 more•Institutions (2)

Google¹, Adobe Systems²

16 May 2020-arXiv: Audio and Speech Processing

TL;DR: This work proposes the convolution-augmented transformer for speech recognition, named Conformer, which significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies.

...read moreread less

Abstract: Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolution neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this regard, we propose the convolution-augmented transformer for speech recognition, named Conformer. Conformer significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves WER of 2.1%/4.3% without using a language model and 1.9%/3.9% with an external language model on test/testother. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.

...read moreread less

Posted Content•

Score-Based Generative Modeling through Stochastic Differential Equations

[...]

Yang Song¹, Jascha Sohl-Dickstein², Diederik P. Kingma², Abhishek Kumar², Stefano Ermon¹, Ben Poole² - Show less +2 more•Institutions (2)

Stanford University¹, Google²

26 Nov 2020-arXiv: Learning

TL;DR: This work presents a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by Slowly removing the noise.

...read moreread less

Abstract: Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.

...read moreread less

Posted Content•

Big Self-Supervised Models are Strong Semi-Supervised Learners

[...]

Ting Chen¹, Simon Kornblith¹, Kevin Swersky¹, Mohammad Norouzi¹, Geoffrey E. Hinton¹ - Show less +1 more•Institutions (1)

Google¹

17 Jun 2020-arXiv: Learning

TL;DR: The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLRs), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.

...read moreread less

Abstract: One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge This procedure achieves 739% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art With 10% of labels, ResNet-50 trained with our method achieves 775% top-1 accuracy, outperforming standard supervised training with all of the labels

...read moreread less

Book Chapter•DOI•

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

[...]

Ben Mildenhall¹, Pratul P. Srinivasan¹, Matthew Tancik¹, Jonathan T. Barron², Ravi Ramamoorthi³, Ren Ng¹ - Show less +2 more•Institutions (3)

University of California, Berkeley¹, Google², University of California, San Diego³

23 Aug 2020

TL;DR: In this article, a fully-connected (non-convolutional) deep network is used to synthesize novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views.

...read moreread less

Abstract: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction $(\theta ,\phi )$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.

...read moreread less

Posted Content•

Big Bird: Transformers for Longer Sequences

[...]

Manzil Zaheer, Guru Guruganesh, Avinava Dubey¹, Joshua Ainslie, Chris Alberti, Santiago Ontañón, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed - Show less +7 more•Institutions (1)

Google¹

28 Jul 2020-arXiv: Learning

TL;DR: It is shown that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model.

...read moreread less

Abstract: Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS), that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.

...read moreread less

Proceedings Article•

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

[...]

Sai Praneeth Karimireddy¹, Satyen Kale², Mehryar Mohri³, Sashank J. Reddi², Sebastian U. Stich¹, Ananda Theertha Suresh² - Show less +2 more•Institutions (3)

École Polytechnique Fédérale de Lausanne¹, Google², Courant Institute of Mathematical Sciences³

12 Jul 2020

TL;DR: This work obtains tight convergence rates for FedAvg and proves that it suffers from `client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence, and proposes a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the ` client-drifts' in its local updates.

...read moreread less

Abstract: Federated Averaging (FedAvg) has emerged as the algorithm of choice for federated learning due to its simplicity and low communication cost. However, in spite of recent research efforts, its performance is not fully understood. We obtain tight convergence rates for FedAvg and prove that it suffers from `client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence. As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the `client-drift' in its local updates. We prove that SCAFFOLD requires significantly fewer communication rounds and is not affected by data heterogeneity or client sampling. Further, we show that (for quadratics) SCAFFOLD can take advantage of similarity in the client's data yielding even faster convergence. The latter is the first result to quantify the usefulness of local-steps in distributed optimization.

...read moreread less

Posted Content•

Reformer: The Efficient Transformer

[...]

Nikita Kitaev¹, Łukasz Kaiser², Anselm Levskaya²•Institutions (2)

University of California, Berkeley¹, Google²

13 Jan 2020-arXiv: Learning

TL;DR: The Reformer as discussed by the authors uses locality-sensitive hashing to improve the efficiency of Transformers and achieves state-of-the-art results on a number of tasks, but training these models can be prohibitively costly.

...read moreread less

Abstract: Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O($L^2$) to O($L\log L$), where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.

...read moreread less

Journal Article•DOI•

Variational Quantum Algorithms

[...]

Marco Cerezo¹, Marco Cerezo², Andrew Arrasmith¹, Andrew Arrasmith², Ryan Babbush³, Simon C. Benjamin⁴, Suguru Endo⁵, Keisuke Fujii⁶, Jarrod R. McClean³, Kosuke Mitarai⁷, Kosuke Mitarai⁶, Xiao Yuan⁸, Xiao Yuan⁹, Lukasz Cincio¹, Lukasz Cincio², Patrick J. Coles¹, Patrick J. Coles² - Show less +13 more•Institutions (9)

Los Alamos National Laboratory¹, Oak Ridge National Laboratory², Google³, University of Oxford⁴, Nippon Telegraph and Telephone⁵, Osaka University⁶, National Presto Industries⁷, Peking University⁸, Stanford University⁹

16 Dec 2020-arXiv: Quantum Physics

TL;DR: An overview of the field of Variational Quantum Algorithms is presented and strategies to overcome their challenges as well as the exciting prospects for using them as a means to obtain quantum advantage are discussed.

...read moreread less

Abstract: Applications such as simulating complicated quantum systems or solving large-scale linear algebra problems are very challenging for classical computers due to the extremely high computational cost. Quantum computers promise a solution, although fault-tolerant quantum computers will likely not be available in the near future. Current quantum devices have serious constraints, including limited numbers of qubits and noise processes that limit circuit depth. Variational Quantum Algorithms (VQAs), which use a classical optimizer to train a parametrized quantum circuit, have emerged as a leading strategy to address these constraints. VQAs have now been proposed for essentially all applications that researchers have envisioned for quantum computers, and they appear to the best hope for obtaining quantum advantage. Nevertheless, challenges remain including the trainability, accuracy, and efficiency of VQAs. Here we overview the field of VQAs, discuss strategies to overcome their challenges, and highlight the exciting prospects for using them to obtain quantum advantage.

...read moreread less

Journal Article•DOI•

A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges

[...]

Moloud Abdar¹, Farhad Pourpanah², Sadiq Hussain³, Dana Rezazadegan⁴, Li Liu⁵, Mohammad Ghavamzadeh⁶, Paul Fieguth⁷, Xiaochun Cao⁸, Abbas Khosravi¹, U. Rajendra Acharya⁹, U. Rajendra Acharya¹⁰, U. Rajendra Acharya¹¹, Vladimir Makarenkov¹², Saeid Nahavandi¹ - Show less +10 more•Institutions (12)

Deakin University¹, Shenzhen University², Dibrugarh University³, Swinburne University of Technology⁴, University of Oulu⁵, Google⁶, University of Waterloo⁷, Chinese Academy of Sciences⁸, Asia University (Taiwan)⁹, Ngee Ann Polytechnic¹⁰, National University of Singapore¹¹, Université du Québec¹²

12 Nov 2020-arXiv: Learning

TL;DR: This study reviews recent advances in UQ methods used in deep learning and investigates the application of these methods in reinforcement learning (RL), and outlines a few important applications of UZ methods.

...read moreread less

Abstract: Uncertainty quantification (UQ) plays a pivotal role in reduction of uncertainties during both optimization and decision making processes. It can be applied to solve a variety of real-world applications in science and engineering. Bayesian approximation and ensemble learning techniques are two most widely-used UQ methods in the literature. In this regard, researchers have proposed different UQ methods and examined their performance in a variety of applications such as computer vision (e.g., self-driving cars and object detection), image processing (e.g., image restoration), medical image analysis (e.g., medical image classification and segmentation), natural language processing (e.g., text classification, social media texts and recidivism risk-scoring), bioinformatics, etc. This study reviews recent advances in UQ methods used in deep learning. Moreover, we also investigate the application of these methods in reinforcement learning (RL). Then, we outline a few important applications of UQ methods. Finally, we briefly highlight the fundamental research challenges faced by UQ methods and discuss the future research directions in this field.

...read moreread less

Proceedings Article•DOI•

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

[...]

Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine¹, Vijay K. Vasudevan¹, Wei Han¹, Jiquan Ngiam¹, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens¹, Zhifeng Chen¹, Dragomir Anguelov - Show less +19 more•Institutions (1)

Google¹

14 Jun 2020

TL;DR: In this paper, a large scale, high quality, and diverse dataset for self-driving data is presented, consisting of LiDAR and camera data captured across a range of urban and suburban geographies.

...read moreread less

Abstract: The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the over-all viability of the technology. In an effort to help align the research community’s contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at http://www.waymo.com/open.

...read moreread less

Posted Content•

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

[...]

Matthew Tancik¹, Pratul P. Srinivasan¹, Ben Mildenhall¹, Sara Fridovich-Keil¹, Nithin Raghavan, Utkarsh Singhal¹, Ravi Ramamoorthi², Jonathan T. Barron³, Ren Ng¹ - Show less +5 more•Institutions (3)

University of California, Berkeley¹, University of California, San Diego², Google³

18 Jun 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: An approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities is suggested.

...read moreread less

Abstract: We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities

...read moreread less

Proceedings Article•DOI•

Randaugment: Practical automated data augmentation with a reduced search space

[...]

Ekin D. Cubuk¹, Barret Zoph¹, Jonathon Shlens¹, Quoc V. Le¹•Institutions (1)

Google¹

14 Jun 2020

TL;DR: EfficientNet-B7 as mentioned in this paper proposes a simplified search space that greatly reduces the computational expense of automated augmentation, and permits the removal of a separate proxy task, and achieves state-of-the-art results in image classification and object detection.

...read moreread less

Abstract: Recent work on automated augmentation strategies has led to state-of-the-art results in image classification and object detection. An obstacle to a large-scale adoption of these methods is that they require a separate and expensive search phase. A common way to overcome the expense of the search phase was to use a smaller proxy task. However, it was not clear if the optimized hyperparameters found on the proxy task are also optimal for the actual task. In this work, we rethink the process of designing automated augmentation strategies. We find that while previous work required a search for both magnitude and probability of each operation independently, it is sufficient to only search for a single distortion magnitude that jointly controls all operations. We hence propose a simplified search space that vastly reduces the computational expense of automated augmentation, and permits the removal of a separate proxy task. Despite the simplifications, our method achieves equal or better performance over previous automated augmentation strategies on on CIFAR-10/100, SVHN, ImageNet and COCO datasets. EfficientNet-B7, we achieve 85.0% accuracy, a 1.0% increase over baseline augmentation, a 0.6% improvement over AutoAugment on the ImageNet dataset. With EfficientNet-B8, we achieve 85.4% accuracy on ImageNet, which matches a previous result that used 3.5B extra images. On object detection, the same method as classification leads to 1.0-1.3% improvement over baseline augmentation. Code will be made available online.

...read moreread less

Collapse