Top 14 papers published by Kaiming He from Facebook in 2020

Journal Article•DOI•

[...]

Tsung-Yi Lin¹, Priya Goyal¹, Ross Girshick¹, Kaiming He¹, Piotr Dollár¹ - Show less +1 more•Institutions (1)

01 Feb 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Focal loss as discussed by the authors focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training, which improves the accuracy of one-stage detectors.

...read moreread less

Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron .

...read moreread less

5,734 citations

Proceedings Article•DOI•

Momentum Contrast for Unsupervised Visual Representation Learning

[...]

Kaiming He¹, Haoqi Fan¹, Yuxin Wu¹, Saining Xie¹, Ross Girshick¹ - Show less +1 more•Institutions (1)

Facebook¹

14 Jun 2020

TL;DR: This article proposed Momentum Contrast (MoCo) for unsupervised visual representation learning, which enables building a large and consistent dictionary on-the-fly that facilitates contrastive learning.

...read moreread less

Abstract: We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.

...read moreread less

4,128 citations

Posted Content•

Improved Baselines with Momentum Contrastive Learning

[...]

Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He

09 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR and do not require large training batches, and hopes this will make state-of-the-art unsupervised learning research more accessible.

...read moreread less

Abstract: Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR's design improvements by implementing them in the MoCo framework. With simple modifications to MoCo---namely, using an MLP projection head and more data augmentation---we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. Code will be made public.

...read moreread less

1,947 citations

Posted Content•

Exploring Simple Siamese Representation Learning

[...]

Xinlei Chen¹, Kaiming He¹•Institutions (1)

Facebook¹

20 Nov 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: Surprising empirical results are reported that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.

...read moreread less

Abstract: Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning. Code will be made available.

...read moreread less

1,733 citations

Journal Article•DOI•

Mask R-CNN

[...]

Kaiming He¹, Georgia Gkioxari¹, Piotr Dollár¹, Ross Girshick¹•Institutions (1)

Facebook¹

01 Feb 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Mask R-CNN as discussed by the authors extends Faster-RCNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition, which achieves state-of-the-art performance in instance segmentation.

...read moreread less

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron .

...read moreread less

1,506 citations

Proceedings Article•DOI•

Designing Network Design Spaces

[...]

Ilija Radosavovic¹, Raj Prateek Kosaraju¹, Ross Girshick¹, Kaiming He¹, Piotr Dollár¹ - Show less +1 more•Institutions (1)

Facebook¹

14 Jun 2020

TL;DR: The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes, and outperform the popular EfficientNet models while being up to 5x faster on GPUs.

...read moreread less

Abstract: In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elevated to the design space level. Using our methodology we explore the structure aspect of network design and arrive at a low-dimensional design space consisting of simple, regular networks that we call RegNet. The core insight of the RegNet parametrization is surprisingly simple: widths and depths of good networks can be explained by a quantized linear function. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. Under comparable training settings and flops, the RegNet models outperform the popular EfficientNet models while being up to 5x faster on GPUs.

...read moreread less

1,041 citations

Proceedings Article•DOI•

PointRend: Image Segmentation As Rendering

[...]

Alexander Kirillov¹, Yuxin Wu¹, Kaiming He¹, Ross Girshick¹•Institutions (1)

Facebook¹

14 Jun 2020

TL;DR: PointRend as discussed by the authors proposes a point-based rendering module that performs segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm, which produces crisp object boundaries in regions that are over-smoothed by previous methods.

...read moreread less

Abstract: We present a new method for efficient high-quality image segmentation of objects and scenes. By analogizing classical computer graphics methods for efficient rendering with over- and undersampling challenges faced in pixel labeling tasks, we develop a unique perspective of image segmentation as a rendering problem. From this vantage, we present the PointRend (Point-based Rendering) neural network module: a module that performs point-based segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm. PointRend can be flexibly applied to both instance and semantic segmentation tasks by building on top of existing state-of-the-art models. While many concrete implementations of the general idea are possible, we show that a simple design already achieves excellent results. Qualitatively, PointRend outputs crisp object boundaries in regions that are over-smoothed by previous methods. Quantitatively, PointRend yields significant gains on COCO and Cityscapes, for both instance and semantic segmentation. PointRend's efficiency enables output resolutions that are otherwise impractical in terms of memory or computation compared to existing approaches. Code has been made available at https://github.com/facebookresearch/detectron2/tree/master/projects/PointRend.

...read moreread less

393 citations

Posted Content•

Designing Network Design Spaces

[...]

Ilija Radosavovic¹, Raj Prateek Kosaraju¹, Ross Girshick¹, Kaiming He¹, Piotr Dollár¹ - Show less +1 more•Institutions (1)

Facebook¹

30 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors propose a new network design paradigm called RegNet, where instead of focusing on designing individual network instances, they design network design spaces that parametrize populations of networks.

...read moreread less

Abstract: In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elevated to the design space level. Using our methodology we explore the structure aspect of network design and arrive at a low-dimensional design space consisting of simple, regular networks that we call RegNet. The core insight of the RegNet parametrization is surprisingly simple: widths and depths of good networks can be explained by a quantized linear function. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. Under comparable training settings and flops, the RegNet models outperform the popular EfficientNet models while being up to 5x faster on GPUs.

...read moreread less

99 citations

Book Chapter•DOI•

Are Labels Necessary for Neural Architecture Search

[...]

Chenxi Liu¹, Piotr Dollár², Kaiming He², Ross Girshick², Alan L. Yuille¹, Saining Xie² - Show less +2 more•Institutions (2)

Johns Hopkins University¹, Facebook²

23 Aug 2020

TL;DR: The potentially surprising finding that labels are not necessary, and the image statistics alone may be sufficient to identify good neural architectures is revealed.

...read moreread less

Abstract: Existing neural network architectures in computer vision—whether designed by humans or by machines—were typically found using both images and their associated labels. In this paper, we ask the question: can we find high-quality neural architectures using only images, but no human-annotated labels? To answer this question, we first define a new setup called Unsupervised Neural Architecture Search (UnNAS). We then conduct two sets of experiments. In sample-based experiments, we train a large number (500) of diverse architectures with either supervised or unsupervised objectives, and find that the architecture rankings produced with and without labels are highly correlated. In search-based experiments, we run a well-established NAS algorithm (DARTS) using various unsupervised objectives, and report that the architectures searched without labels can be competitive to their counterparts searched with labels. Together, these results reveal the potentially surprising finding that labels are not necessary, and the image statistics alone may be sufficient to identify good neural architectures.

...read moreread less

62 citations

Proceedings Article•DOI•

A Multigrid Method for Efficiently Training Video Models

[...]

Chao-Yuan Wu¹, Ross Girshick², Kaiming He², Christoph Feichtenhofer², Philipp Krähenbühl¹ - Show less +1 more•Institutions (2)

University of Texas at Austin¹, Facebook²

14 Jun 2020

TL;DR: Inspired by multigrid methods in numerical optimization, this work proposes to use variable mini-batch shapes with different spatial-temporal resolutions that are varied according to a schedule to speed up competitive deep video models training.

...read moreread less

Abstract: Training competitive deep video models is an order of magnitude slower than training their counterpart image models. Slow training causes long research cycles, which hinders progress in video understanding research. Following standard practice for training image models, video model training has used a fixed mini-batch shape: a specific number of clips, frames, and spatial size. However, what is the optimal shape? High resolution models perform well, but train slowly. Low resolution models train faster, but are less accurate. Inspired by multigrid methods in numerical optimization, we propose to use variable mini-batch shapes with different spatial-temporal resolutions that are varied according to a schedule. The different shapes arise from resampling the training data on multiple sampling grids. Training is accelerated by scaling up the mini-batch size and learning rate when shrinking the other dimensions. We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU). As an illustrative example, the proposed multigrid method trains a ResNet-50 SlowFast network 4.5x faster (wall-clock time, same hardware) while also improving accuracy (+0.8% absolute) on Kinetics-400 compared to baseline training. Code is available online.

...read moreread less

59 citations

Posted Content•

Graph Structure of Neural Networks

[...]

Jiaxuan You, Jure Leskovec, Kaiming He, Saining Xie

13 Jul 2020-arXiv: Learning

TL;DR: A novel graph-based representation of neural networks called relational graph is developed, where layers of neural network computation correspond to rounds of message exchange along the graph structure, which shows that a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance.

...read moreread less

Abstract: Neural networks are often represented as graphs of connections between neurons. However, despite their wide use, there is currently little understanding of the relationship between the graph structure of the neural network and its predictive performance. Here we systematically investigate how does the graph structure of neural networks affect their predictive performance. To this end, we develop a novel graph-based representation of neural networks called relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. Using this representation we show that: (1) a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance; (2) neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph; (3) our findings are consistent across many different tasks and datasets; (4) the sweet spot can be identified efficiently; (5) top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks. Our work opens new directions for the design of neural architectures and the understanding on neural networks in general.

...read moreread less

Proceedings Article•

Graph Structure of Neural Networks

[...]

Jiaxuan You¹, Kaiming He², Jure Leskovec¹, Saining Xie²•Institutions (2)

Stanford University¹, Facebook²

12 Jul 2020

TL;DR: In this paper, the authors developed a graph-based representation of neural networks called relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. And they showed that a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance.

...read moreread less

Abstract: Neural networks are often represented as graphs of connections between neurons. However, despite their wide use, there is currently little understanding of the relationship between the graph structure of the neural network and its predictive performance. Here we systematically investigate how does the graph structure of neural networks affect their predictive performance. To this end, we develop a novel graph-based representation of neural networks called relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. Using this representation we show that: (1) a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance; (2) neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph; (3) our findings are consistent across many different tasks and datasets; (4) the sweet spot can be identified efficiently; (5) top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks. Our work opens new directions for the design of neural architectures and the understanding on neural networks in general.

...read moreread less

Patent•

Method and system for using machine-learning for object instance segmentation

[...]

Kaiming He¹, Georgia Gkioxari, Piotr Dollár, Ross Girshick•Institutions (1)

Facebook¹

14 Jul 2020

TL;DR: In this article, an instance segmentation mask associated with the region of interest is generated by processing the regional feature map using a second neural network. But the second network is configured to generate instance segmentations for object instances depicted in images.

...read moreread less

Abstract: In one embodiment, a method includes a computing system accessing a training image. The system may generate a feature map for the training image using a first neural network. The system may identify a region of interest in the feature map and generate a regional feature map for the region of interest based on sampling locations defined by a sampling region. The sampling region and the region of interest may correspond to the same region in the feature map. The system may generate an instance segmentation mask associated with the region of interest by processing the regional feature map using a second neural network. The second neural network may be trained using the instance segmentation mask. Once trained, the second neural network is configured to generate instance segmentation masks for object instances depicted in images.

...read moreread less

Posted Content•

Are Labels Necessary for Neural Architecture Search

[...]

Chenxi Liu¹, Piotr Dollár², Kaiming He², Ross Girshick², Alan L. Yuille¹, Saining Xie² - Show less +2 more•Institutions (2)

Johns Hopkins University¹, Facebook²

26 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors define a new setup called Unsupervised Neural Architecture Search (UnNAS) and conduct two sets of experiments to find high-quality neural architectures using only images, but no human-annotated labels.

...read moreread less

Abstract: Existing neural network architectures in computer vision -- whether designed by humans or by machines -- were typically found using both images and their associated labels. In this paper, we ask the question: can we find high-quality neural architectures using only images, but no human-annotated labels? To answer this question, we first define a new setup called Unsupervised Neural Architecture Search (UnNAS). We then conduct two sets of experiments. In sample-based experiments, we train a large number (500) of diverse architectures with either supervised or unsupervised objectives, and find that the architecture rankings produced with and without labels are highly correlated. In search-based experiments, we run a well-established NAS algorithm (DARTS) using various unsupervised objectives, and report that the architectures searched without labels can be competitive to their counterparts searched with labels. Together, these results reveal the potentially surprising finding that labels are not necessary, and the image statistics alone may be sufficient to identify good neural architectures.

...read moreread less

Showing papers by "Kaiming He published in 2020"