Author

Wan-Yen Lo

Bio: Wan-Yen Lo is an academic researcher at Facebook. The author has contributed to research topics including network planning and design, and hardware acceleration. The author has an h-index of 3 and has co-authored 5 publications receiving 206 citations.

Papers
Posted Content
TL;DR: PyTorch3D is introduced as a library of modular, efficient, and differentiable operators for 3D deep learning, including a fast differentiable renderer for meshes and point clouds; it outperforms other implementations in speed and memory and improves the state of the art for unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet.
Abstract: Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges involved in 3D deep learning, such as efficiently processing heterogeneous data and reframing graphics operations to be differentiable. We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning. It includes a fast, modular differentiable renderer for meshes and point clouds, enabling analysis-by-synthesis approaches. Compared with other differentiable renderers, PyTorch3D is more modular and efficient, allowing users to more easily extend it while also gracefully scaling to large meshes and images. We compare the PyTorch3D operators and renderer with other implementations and demonstrate significant speed and memory improvements. We also use PyTorch3D to improve the state-of-the-art for unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet. PyTorch3D is open-source and we hope it will help accelerate research in 3D deep learning.

430 citations
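
The abstract above describes PyTorch3D's batched mesh data structures and differentiable operators. The following is a minimal sketch, not code from the paper: it batches two meshes of different sizes, differentiably samples point clouds from their surfaces, and computes a Chamfer loss; the random "target" points are a stand-in for real supervision. It assumes PyTorch3D is installed and that the API names below match the installed version.

```python
import torch
from pytorch3d.utils import ico_sphere
from pytorch3d.structures import Meshes
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two ico-spheres at different subdivision levels form a heterogeneous batch.
sphere_lo = ico_sphere(level=1, device=device)
sphere_hi = ico_sphere(level=3, device=device)
meshes = Meshes(
    verts=[sphere_lo.verts_packed(), sphere_hi.verts_packed()],
    faces=[sphere_lo.faces_packed(), sphere_hi.faces_packed()],
)

# Differentiably sample point clouds from the mesh surfaces.
pred_points = sample_points_from_meshes(meshes, num_samples=2000)   # (2, 2000, 3)
target_points = torch.rand(2, 2000, 3, device=device)               # stand-in targets

# Chamfer distance is differentiable w.r.t. the sampled points; in a real
# analysis-by-synthesis loop the mesh vertices would be learnable parameters
# or predicted by a network, so gradients would flow back to them.
loss, _ = chamfer_distance(pred_points, target_points)
print(loss.item())
```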

Proceedings ArticleDOI
Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár
01 Oct 2019
TL;DR: A new comparison paradigm of distribution estimates is introduced, in which network design spaces are compared by applying statistical techniques to populations of sampled models, while controlling for confounding factors like network complexity.
Abstract: Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. In particular, we introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying statistical techniques to populations of sampled models, while controlling for confounding factors like network complexity. Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape. As a case study, we examine design spaces used in neural architecture search (NAS). We find significant statistical differences between recent NAS design space variants that have been largely overlooked. Furthermore, our analysis reveals that the design spaces for standard model families like ResNeXt can be comparable to the more complex ones used in recent NAS work. We hope these insights into distribution analysis will enable more robust progress toward discovering better networks for visual recognition.

81 citations
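
The comparison paradigm above lends itself to a short illustration. The sketch below is not the authors' code: it fakes the validation errors of models sampled from two hypothetical design spaces and compares the empirical distribution functions (EDFs) of those errors rather than only the single best model from each space. The paper additionally controls for confounders such as model complexity, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_and_evaluate(design_space: str, n_models: int = 100) -> np.ndarray:
    """Placeholder: sample n_models configs from a design space and return their
    validation errors. Real usage would train/evaluate each sampled model."""
    base = {"space_A": 25.0, "space_B": 24.0}[design_space]
    return base + rng.normal(0.0, 2.0, size=n_models)

errors_a = sample_and_evaluate("space_A")
errors_b = sample_and_evaluate("space_B")

# Evaluate both EDFs on the pooled error values and report the largest gap
# between them (the two-sample Kolmogorov-Smirnov statistic).
grid = np.sort(np.concatenate([errors_a, errors_b]))
edf_a = np.searchsorted(np.sort(errors_a), grid, side="right") / len(errors_a)
edf_b = np.searchsorted(np.sort(errors_b), grid, side="right") / len(errors_b)
print("max EDF gap:", np.abs(edf_a - edf_b).max())
```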

Posted Content
Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár
TL;DR: In this paper, the authors introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying statistical techniques to populations of sampled models, while controlling for confounding factors like network complexity.
Abstract: Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. In particular, we introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying statistical techniques to populations of sampled models, while controlling for confounding factors like network complexity. Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape. As a case study, we examine design spaces used in neural architecture search (NAS). We find significant statistical differences between recent NAS design space variants that have been largely overlooked. Furthermore, our analysis reveals that the design spaces for standard model families like ResNeXt can be comparable to the more complex ones used in recent NAS work. We hope these insights into distribution analysis will enable more robust progress toward discovering better networks for visual recognition.

55 citations

Proceedings ArticleDOI
17 Oct 2021
TL;DR: PyTorchVideo as discussed by the authors is an open-source deep learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
Abstract: We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. The library covers a full stack of video understanding tools including multimodal data loading, transformations, and models that reproduce state-of-the-art performance. PyTorchVideo further supports hardware acceleration that enables real-time inference on mobile devices. The library is based on PyTorch and can be used by any training framework; for example, PyTorchLightning, PySlowFast, or Classy Vision. PyTorchVideo is available at https://pytorchvideo.org/.

34 citations
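
As a minimal sketch of the library described above (not code from the paper): load a video classification model from the PyTorchVideo model zoo via torch.hub and run it on a dummy clip. The model name "slow_r50" and the clip shape (batch, channels, frames, height, width) = (1, 3, 8, 256, 256) follow the library's hub tutorial as of this writing; check the current documentation if the zoo configuration has changed.

```python
import torch

# Build the model architecture from the PyTorchVideo hub; set pretrained=True
# to also download Kinetics-400 weights.
model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=False)
model = model.eval()

clip = torch.randn(1, 3, 8, 256, 256)   # batch, channels, frames, height, width
with torch.no_grad():
    logits = model(clip)                # (1, 400) Kinetics-400 class scores
print(logits.shape)
```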

Posted Content
TL;DR: PyTorchVideo as discussed by the authors is an open-source deep learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
Abstract: We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. The library covers a full stack of video understanding tools including multimodal data loading, transformations, and models that reproduce state-of-the-art performance. PyTorchVideo further supports hardware acceleration that enables real-time inference on mobile devices. The library is based on PyTorch and can be used by any training framework; for example, PyTorchLightning, PySlowFast, or Classy Vision. PyTorchVideo is available at https://pytorchvideo.org/

Cited by
Proceedings ArticleDOI
10 Jan 2022
TL;DR: This work gradually "modernizes" a standard ResNet toward the design of a vision Transformer and discovers several key components that contribute to the performance difference along the way, leading to a family of pure ConvNet models dubbed ConvNeXt.
Abstract: The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually “modernize” a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

1,203 citations
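
For illustration, here is a minimal sketch (not the official implementation) of one ConvNeXt-style block as described in the abstract above: a 7x7 depthwise convolution, channels-last LayerNorm, and an inverted-bottleneck MLP with GELU, wrapped in a residual connection. Layer scale and stochastic depth from the released code are omitted for brevity.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)             # applied in channels-last layout
        self.pwconv1 = nn.Linear(dim, expansion * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)                        # (N, C, H, W)
        x = x.permute(0, 2, 3, 1)                 # -> (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                 # -> (N, C, H, W)
        return residual + x

block = ConvNeXtBlock(dim=96)
print(block(torch.randn(2, 96, 56, 56)).shape)    # torch.Size([2, 96, 56, 56])
```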

Proceedings ArticleDOI
Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár
14 Jun 2020
TL;DR: The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes, and outperform the popular EfficientNet models while being up to 5x faster on GPUs.
Abstract: In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elevated to the design space level. Using our methodology we explore the structure aspect of network design and arrive at a low-dimensional design space consisting of simple, regular networks that we call RegNet. The core insight of the RegNet parametrization is surprisingly simple: widths and depths of good networks can be explained by a quantized linear function. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. Under comparable training settings and flops, the RegNet models outperform the popular EfficientNet models while being up to 5x faster on GPUs.

1,041 citations
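
The "quantized linear function" mentioned in the abstract can be made concrete with a short sketch (not the authors' code): per-block widths start from a linear ramp, are quantized to powers of a multiplier, and are snapped to multiples of 8; consecutive equal widths form stages. The parameter values below are illustrative, not a published RegNet configuration.

```python
import numpy as np

def regnet_widths(w_0: int = 24, w_a: float = 36.0, w_m: float = 2.5,
                  depth: int = 13, q: int = 8):
    j = np.arange(depth)
    u = w_0 + w_a * j                                # continuous linear widths
    s = np.round(np.log(u / w_0) / np.log(w_m))      # quantize to powers of w_m
    widths = w_0 * np.power(w_m, s)
    widths = (np.round(widths / q) * q).astype(int)  # snap to multiples of q
    # Consecutive equal widths form a stage; widths are non-decreasing, so the
    # sorted unique values and their counts give stage widths and depths.
    stage_widths, stage_depths = np.unique(widths, return_counts=True)
    return widths, stage_widths, stage_depths

per_block, stage_w, stage_d = regnet_widths()
print(per_block)           # per-block widths
print(stage_w, stage_d)    # stage widths and how many blocks each stage gets
```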

Proceedings ArticleDOI
20 Jun 2021
TL;DR: BoTNet, as discussed in this paper, is a backbone that incorporates self-attention for image classification, object detection, and instance segmentation, surpassing previously published single-model, single-scale results on COCO instance segmentation and reaching 84.7% top-1 accuracy on ImageNet.
Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baselines significantly on instance segmentation and object detection while also reducing the parameters, with minimal overhead in latency. Through the design of BoTNet, we also point out how ResNet bottleneck blocks with self-attention can be viewed as Transformer blocks. Without any bells and whistles, BoTNet achieves 44.4% Mask AP and 49.7% Box AP on the COCO Instance Segmentation benchmark using the Mask R-CNN framework; surpassing the previous best published single model and single scale results of ResNeSt [67] evaluated on the COCO validation set. Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in "compute" time than the popular EfficientNet models on TPU-v3 hardware. We hope our simple and effective approach will serve as a strong baseline for future research in self-attention models for vision.

675 citations
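
The core substitution described above can be sketched in a few lines. This is not the official BoTNet code: it replaces the 3x3 spatial convolution inside a simplified ResNet bottleneck with global multi-head self-attention over the 2D feature map, and it omits the paper's 2D relative position encodings and the normalization layers for brevity.

```python
import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    """Global self-attention over an (N, C, H, W) feature map."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)         # (N, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)    # all-to-all attention
        return out.transpose(1, 2).reshape(n, c, h, w)

class BottleneckWithMHSA(nn.Module):
    """Simplified ResNet bottleneck where the 3x3 conv is swapped for MHSA2d."""
    def __init__(self, in_dim: int, bottleneck_dim: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_dim, bottleneck_dim, 1)
        self.mhsa = MHSA2d(bottleneck_dim)
        self.expand = nn.Conv2d(bottleneck_dim, in_dim, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.reduce(x))
        out = self.relu(self.mhsa(out))
        out = self.expand(out)
        return self.relu(out + x)

block = BottleneckWithMHSA(in_dim=2048, bottleneck_dim=512)
print(block(torch.randn(1, 2048, 14, 14)).shape)      # torch.Size([1, 2048, 14, 14])
```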

Proceedings ArticleDOI
01 Jun 2022
TL;DR: ConvNeXt as discussed by the authors is a family of pure ConvNet models, which compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation.
Abstract: The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually “modernize” a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

502 citations

Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, the authors propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch.
Abstract: Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8-23× computation reduction and 3× measured speedup over MinkowskiNet and KPConv with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.

340 citations
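
A conceptual sketch of the point-voxel idea described above (not the authors' implementation): a point-wise MLP branch preserves fine detail while a voxel branch aggregates neighborhood context, and the two are fused per point. A dense 3D convolution stands in for the sparse convolution used in SPVConv, and the grid resolution and channel sizes below are arbitrary.

```python
import torch
import torch.nn as nn

class TinyPointVoxelBlock(nn.Module):
    def __init__(self, channels: int = 32, resolution: int = 16):
        super().__init__()
        self.resolution = resolution
        self.point_mlp = nn.Sequential(nn.Linear(channels, channels), nn.ReLU())
        self.voxel_conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, coords: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # coords: (N, 3) normalized to [0, 1); feats: (N, C)
        r, c = self.resolution, feats.shape[1]
        idx = (coords * r).long().clamp(0, r - 1)
        flat = (idx[:, 0] * r + idx[:, 1]) * r + idx[:, 2]      # (N,) voxel ids

        # Voxelize: average the features of all points falling in each voxel.
        grid = feats.new_zeros(r * r * r, c).index_add_(0, flat, feats)
        ones = torch.ones_like(flat, dtype=feats.dtype)
        counts = feats.new_zeros(r * r * r).index_add_(0, flat, ones)
        grid = grid / counts.clamp(min=1).unsqueeze(1)

        # Convolve on the (dense stand-in for sparse) voxel grid.
        vox = grid.t().reshape(1, c, r, r, r)
        vox = self.voxel_conv(vox).reshape(c, -1).t()           # (r^3, C)

        # Devoxelize (nearest voxel) and fuse with the high-resolution point branch.
        return self.point_mlp(feats) + vox[flat]

block = TinyPointVoxelBlock()
coords, feats = torch.rand(1000, 3), torch.randn(1000, 32)
print(block(coords, feats).shape)    # torch.Size([1000, 32])
```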