
Showing papers by "Jia Deng published in 2021"



Proceedings ArticleDOI
01 Jun 2021
TL;DR: The LieTorch library exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds, which is numerically more stable, easier to implement, and beneficial to a diverse set of tasks.
Abstract: We address the problem of performing backpropagation for computation graphs involving 3D transformation groups SO(3), SE(3), and Sim(3). 3D transformation groups are widely used in 3D vision and robotics, but they do not form vector spaces and instead lie on smooth manifolds. The standard backpropagation approach, which embeds 3D transformations in Euclidean spaces, suffers from numerical difficulties. We introduce a new library, which exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds. We show that our approach is numerically more stable, easier to implement, and beneficial to a diverse set of tasks. Our plug-and-play PyTorch library is available at https://github.com/princeton-vl/lietorch.

28 citations
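The core idea in the abstract above — keeping optimization iterates on the manifold by taking gradient steps in the tangent space and mapping them back with the exponential map — can be illustrated without the library itself. The following is a minimal numpy sketch of that idea for SO(3) only; the actual LieTorch API differs, and the function names here are illustrative, not the library's.

```python
import numpy as np

def hat(phi):
    """Map a 3-vector to the corresponding so(3) skew-symmetric matrix."""
    x, y, z = phi
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def so3_exp(phi):
    """Exponential map: tangent vector in R^3 -> rotation matrix in SO(3)
    (Rodrigues' formula)."""
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def retract(R, delta):
    """Apply a tangent-space step: R <- exp(delta) @ R.
    The optimizer only ever sees the 3-vector delta, so the iterate stays
    a valid rotation by construction, with no re-orthonormalization."""
    return so3_exp(delta) @ R

# A small update expressed in the tangent space keeps R on the manifold.
R = so3_exp(np.array([0.3, -0.2, 0.5]))
R_next = retract(R, np.array([-0.01, 0.02, 0.0]))
print(np.allclose(R_next @ R_next.T, np.eye(3)))  # True: still orthonormal
```

This is what distinguishes the approach from embedding rotations in a 9-dimensional Euclidean space, where a raw gradient step leaves the manifold and must be projected back.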


Proceedings ArticleDOI
01 Jun 2021
TL;DR: The authors proposed a new deep architecture for scene flow, based on the RAFT model developed for optical flow but iteratively updating a dense field of pixelwise SE3 motion instead of 2D motion.
Abstract: We address the problem of scene flow: given a pair of stereo or RGB-D video frames, estimate pixelwise 3D motion. We introduce RAFT-3D, a new deep architecture for scene flow. RAFT-3D is based on the RAFT model developed for optical flow but iteratively updates a dense field of pixelwise SE3 motion instead of 2D motion. A key innovation of RAFT-3D is rigid-motion embeddings, which represent a soft grouping of pixels into rigid objects. Integral to rigid-motion embeddings is Dense-SE3, a differentiable layer that enforces geometric consistency of the embeddings. Experiments show that RAFT-3D achieves state-of-the-art performance. On FlyingThings3D, under the two-view evaluation, we improve the best published accuracy (δ < 0.05) from 34.3% to 83.7%. On KITTI, we achieve an error of 5.77, outperforming the best published method (6.31), despite using no object instance supervision.

27 citations
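A dense field of pixelwise SE3 motion, as described above, acts on the 3D point back-projected from each pixel; reprojecting the moved point yields the induced 2D flow. The sketch below is a minimal numpy illustration of that geometry under a pinhole camera, not the paper's architecture; the function names and parameters are assumptions for the example.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel to a 3D point using a pinhole camera model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return np.stack([X, Y, depth], axis=-1)          # (h, w, 3)

def induced_flow(depth, R, t, fx, fy, cx, cy):
    """Apply a per-pixel rigid motion (R, t) and reproject to get 2D flow.
    R: (h, w, 3, 3) rotations, t: (h, w, 3) translations."""
    P = backproject(depth, fx, fy, cx, cy)
    P2 = np.einsum('hwij,hwj->hwi', R, P) + t        # per-pixel SE3 action
    u2 = fx * P2[..., 0] / P2[..., 2] + cx
    v2 = fy * P2[..., 1] / P2[..., 2] + cy
    h, w = depth.shape
    v0, u0 = np.mgrid[0:h, 0:w]
    return np.stack([u2 - u0, v2 - v0], axis=-1)     # (h, w, 2)

# Sanity check: pure x-translation at constant depth z induces a uniform
# horizontal flow of fx * tx / z.
depth = np.full((4, 4), 2.0)
R = np.broadcast_to(np.eye(3), (4, 4, 3, 3))
t = np.zeros((4, 4, 3)); t[..., 0] = 0.1
flow = induced_flow(depth, R, t, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(np.allclose(flow[..., 0], 100.0 * 0.1 / 2.0))  # True
```

Parameterizing motion this way is what lets pixels on the same rigid object share one transform, which is the role of the rigid-motion embeddings in the paper.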


Posted Content
TL;DR: In this paper, the authors explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark and demonstrate that face blurring has minimal impact on the accuracy of recognition models.
Abstract: Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however, many incidental people appear in the images, and their privacy is a concern. We first annotate faces in the dataset. Then we demonstrate that face blurring -- a typical obfuscation technique -- has minimal impact on the accuracy of recognition models. Concretely, we benchmark multiple deep neural networks on face-blurred images and observe that the overall recognition accuracy drops only slightly (no more than 0.68%). Further, we experiment with transfer learning to 4 downstream tasks (object recognition, scene recognition, face attribute classification, and object detection) and show that features learned on face-blurred images are equally transferable. Our work demonstrates the feasibility of privacy-aware visual recognition, improves the highly-used ImageNet challenge benchmark, and suggests an important path for future visual datasets. Data and code are available at this https URL.

21 citations
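The obfuscation studied above is simple in principle: given annotated face bounding boxes, blur only those regions and leave the rest of the image untouched. The following is a small self-contained numpy sketch with a mean filter standing in for the blur; the paper's exact blurring procedure and box format are not specified here, so these details are assumptions for illustration.

```python
import numpy as np

def box_blur(patch, k=5):
    """Mean-filter a patch over a k x k window (edge-clamped)."""
    h, w = patch.shape[:2]
    out = np.empty_like(patch, dtype=float)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - k // 2), min(h, i + k // 2 + 1)
            j0, j1 = max(0, j - k // 2), min(w, j + k // 2 + 1)
            out[i, j] = patch[i0:i1, j0:j1].mean(axis=(0, 1))
    return out

def blur_faces(image, boxes, k=5):
    """Obfuscate only the annotated face boxes (x0, y0, x1, y1);
    every pixel outside the boxes is left byte-for-byte unchanged."""
    out = image.astype(float).copy()
    for (x0, y0, x1, y1) in boxes:
        out[y0:y1, x0:x1] = box_blur(out[y0:y1, x0:x1], k)
    return out

img = np.arange(12 * 12 * 3, dtype=float).reshape(12, 12, 3)
out = blur_faces(img, [(2, 3, 8, 9)])
print(np.array_equal(out[:3, :], img[:3, :]))  # True: outside box untouched
```

Because the overwhelming majority of pixels are outside the face boxes, it is plausible that classification features survive, which is what the paper's benchmarks confirm empirically.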


Posted Content
Ankit Goyal1, Hei Law1, Bowei Liu1, Alejandro Newell1, Jia Deng1 
TL;DR: The authors find that auxiliary factors such as evaluation schemes, data augmentation strategies, and loss functions, all independent of model architecture, make a large difference in performance; once these are controlled for, the older PointNet++ performs competitively with recent methods.
Abstract: Processing point cloud data is an important component of many real-world systems. As such, a wide variety of point-based approaches have been proposed, reporting steady benchmark improvements over time. We study the key ingredients of this progress and uncover two critical results. First, we find that auxiliary factors like different evaluation schemes, data augmentation strategies, and loss functions, which are independent of the model architecture, make a large difference in performance. The differences are large enough that they obscure the effect of architecture. When these factors are controlled for, PointNet++, a relatively older network, performs competitively with recent methods. Second, a very simple projection-based method, which we refer to as SimpleView, performs surprisingly well. It achieves on par or better results than sophisticated state-of-the-art methods on ModelNet40 while being half the size of PointNet++. It also outperforms state-of-the-art methods on ScanObjectNN, a real-world point cloud benchmark, and demonstrates better cross-dataset generalization. Code is available at this https URL.

6 citations
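A projection-based method like the SimpleView described above renders the point cloud as a handful of depth images and hands them to an ordinary 2D CNN. The sketch below shows the projection step only, with orthographic axis-aligned views and a z-buffer; resolution, view count, and normalization are assumptions for the example, not the paper's exact settings.

```python
import numpy as np

def depth_image(points, res=32):
    """Orthographically project a point cloud (N, 3) onto the XY plane,
    keeping the largest normalized z per pixel (a simple z-buffer)."""
    lo, hi = points.min(0), points.max(0)
    p = (points - lo) / (hi - lo + 1e-9)      # normalize to [0, 1]
    u = np.minimum((p[:, 0] * res).astype(int), res - 1)
    v = np.minimum((p[:, 1] * res).astype(int), res - 1)
    img = np.zeros((res, res))
    for x, y, z in zip(u, v, p[:, 2]):
        img[y, x] = max(img[y, x], z)
    return img

def simple_views(points, res=32):
    """Render six axis-aligned views by permuting axes and flipping depth;
    a standard 2D CNN would then classify the stacked images."""
    views = []
    for perm in ([0, 1, 2], [0, 2, 1], [1, 2, 0]):
        for sign in (1.0, -1.0):
            q = points[:, perm].copy()
            q[:, 2] *= sign
            views.append(depth_image(q, res))
    return np.stack(views)                    # (6, res, res)

rng = np.random.default_rng(0)
views = simple_views(rng.standard_normal((256, 3)))
print(views.shape)  # (6, 32, 32)
```

The appeal of this design is that all the machinery after the projection is a stock image classifier, which is part of why the model can be half the size of PointNet++.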


Posted Content
TL;DR: In this paper, the authors propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation, which is an interleaving step with gradient-based training to periodically seek the optimal architecture growing strategy for the generator and discriminator.
Abstract: Recent work introduced progressive network growing as a promising way to ease the training of large GANs, but the model design and architecture-growing strategy remain under-explored and need manual design for different image data. In this paper, we propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation. The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator. It enjoys the benefits of both eased training because of progressive growing and improved performance because of broader architecture design space. Experimental results demonstrate a new state of the art in image generation. Observations in the search procedure also provide constructive insights into GAN model design, such as generator-discriminator balance and convolutional layer choices.

4 citations


Proceedings Article
Ankit Goyal1, Hei Law1, Bowei Liu1, Alejandro Newell1, Jia Deng1 
04 May 2021
TL;DR: The authors find that auxiliary factors such as evaluation schemes, data augmentation strategies, and loss functions, all independent of model architecture, make a large difference in performance; once these are controlled for, the older PointNet++ performs competitively with recent methods.
Abstract: Processing point cloud data is an important component of many real-world systems. As such, a wide variety of point-based approaches have been proposed, reporting steady benchmark improvements over time. We study the key ingredients of this progress and uncover two critical results. First, we find that auxiliary factors like different evaluation schemes, data augmentation strategies, and loss functions, which are independent of the model architecture, make a large difference in performance. The differences are large enough that they obscure the effect of architecture. When these factors are controlled for, PointNet++, a relatively older network, performs competitively with recent methods. Second, a very simple projection-based method, which we refer to as SimpleView, performs surprisingly well. It achieves on par or better results than sophisticated state-of-the-art methods on ModelNet40, while being half the size of PointNet++. It also outperforms state-of-the-art methods on ScanObjectNN, a real-world point cloud benchmark, and demonstrates better cross-dataset generalization.

2 citations


Posted Content
TL;DR: The authors use parallel subnetworks instead of stacking one layer after another, effectively reducing depth while maintaining high performance and achieving state-of-the-art results.
Abstract: Depth is the hallmark of deep neural networks. But more depth means more sequential computation and higher latency. This begs the question -- is it possible to build high-performing "non-deep" neural networks? We show that it is. To do so, we use parallel subnetworks instead of stacking one layer after another. This helps effectively reduce depth while maintaining high performance. By utilizing parallel substructures, we show, for the first time, that a network with a depth of just 12 can achieve top-1 accuracy over 80% on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. We also show that a network with a low-depth (12) backbone can achieve an AP of 48% on MS-COCO. We analyze the scaling rules for our design and show how to increase performance without changing the network's depth. Finally, we provide a proof of concept for how non-deep networks could be used to build low-latency recognition systems. Code is available at this https URL

1 citation
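The structural trick in the abstract above is that layers arranged as parallel streams over the same input do not add to the sequential depth (the longest chain of dependent computations), whereas stacked layers do. The toy numpy sketch below makes that concrete with plain fully connected layers; the real network uses convolutional blocks and a more careful fusion, so this is only an illustration of the depth accounting.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W):
    """One fully connected layer with ReLU."""
    return np.maximum(W @ x, 0.0)

x = rng.standard_normal(16)
Ws = [rng.standard_normal((16, 16)) for _ in range(4)]

# Sequential design: 4 stacked layers -> 4 dependent steps before output.
h = x
for W in Ws:
    h = layer(h, W)            # each step must wait for the previous one

# "Non-deep" design: the same 4 layers run as parallel streams on x and
# are fused by summation -> the longest dependent chain is only 2 steps.
streams = [layer(x, W) for W in Ws]   # independent, can run concurrently
fused = np.maximum(sum(streams), 0.0)

print(len(Ws), "sequential steps vs 2 on the parallel path")
```

Lower sequential depth is what translates into lower latency on parallel hardware, which is the motivation for the low-latency recognition systems mentioned at the end of the abstract.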


Posted Content
TL;DR: DROID-SLAM as mentioned in this paper is a deep learning-based SLAM system that consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer.
Abstract: We introduce DROID-SLAM, a new deep learning based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer. DROID-SLAM is accurate, achieving large improvements over prior work, and robust, suffering from substantially fewer catastrophic failures. Despite training on monocular video, it can leverage stereo or RGB-D video to achieve improved performance at test time. The URL to our open source code is this https URL.

1 citation


Proceedings Article
18 May 2021
TL;DR: In this article, the authors propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation, which enjoys the benefits of both eased training because of progressive growing and improved performance because of broader architecture design space.
Abstract: Recent work introduced progressive network growing as a promising way to ease the training of large GANs, but the model design and architecture-growing strategy remain under-explored and need manual design for different image data. In this paper, we propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation. The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator. It enjoys the benefits of both eased training because of progressive growing and improved performance because of broader architecture design space. Experimental results demonstrate a new state of the art in image generation. Observations in the search procedure also provide constructive insights into GAN model design, such as generator-discriminator balance and convolutional layer choices.

1 citation


Posted Content
TL;DR: RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT, uses multi-level convolutional GRUs to propagate information more efficiently across the image; it ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29%.
Abstract: We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT. We introduce multi-level convolutional GRUs, which more efficiently propagate information across the image. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29% and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at this https URL.
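The recurrent refinement at the heart of the RAFT family, including the GRUs mentioned above, follows the standard gated recurrent unit equations: an update gate z, a reset gate r, and a candidate state blended into the hidden state. The sketch below uses plain matrices in place of the 2D convolutions of the real multi-level ConvGRU, so it only illustrates the update rule, not the paper's architecture.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_update(h, x, Wz, Wr, Wh):
    """One GRU step: refine hidden state h from input features x.
    z gates how much of the state is replaced; r gates how much of the
    old state feeds the candidate."""
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)                           # update gate
    r = sigmoid(Wr @ hx)                           # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d = 8
h, x = np.zeros(d), rng.standard_normal(d)
Wz, Wr, Wh = (rng.standard_normal((d, 2 * d)) * 0.1 for _ in range(3))

# Iterative refinement: the same unit is applied repeatedly, as in the
# RAFT family, with each step producing an updated estimate.
for _ in range(5):
    h = gru_update(h, x, Wz, Wr, Wh)
print(h.shape)  # (8,)
```

In RAFT-Stereo the "multi-level" part runs such units at several resolutions and exchanges hidden states between them, which is how information travels across the whole image in few iterations.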

Posted Content
TL;DR: A plug-and-play PyTorch library exploits the group structure of the 3D transformation groups SO(3), SE(3), and Sim(3) and performs backpropagation in the tangent spaces of their manifolds.
Abstract: We address the problem of performing backpropagation for computation graphs involving 3D transformation groups SO(3), SE(3), and Sim(3). 3D transformation groups are widely used in 3D vision and robotics, but they do not form vector spaces and instead lie on smooth manifolds. The standard backpropagation approach, which embeds 3D transformations in Euclidean spaces, suffers from numerical difficulties. We introduce a new library, which exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds. We show that our approach is numerically more stable, easier to implement, and beneficial to a diverse set of tasks. Our plug-and-play PyTorch library is available at https://github.com/princeton-vl/lietorch.

Posted Content
TL;DR: This article proposes MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences, and MetaInduce, a learning algorithm that induces MetaQNL rules from training data consisting of questions and answers, with or without intermediate reasoning steps.
Abstract: Symbolic reasoning, rule-based symbol manipulation, is a hallmark of human intelligence. However, rule-based systems have had limited success competing with learning-based systems outside formalized domains such as automated theorem proving. We hypothesize that this is due to the manual construction of rules in past attempts. In this work, we ask how we can build a rule-based system that can reason with natural language input but without the manual construction of rules. We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences, and MetaInduce, a learning algorithm that induces MetaQNL rules from training data consisting of questions and answers, with or without intermediate reasoning steps. Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks; it learns compact models with much less data and produces not only answers but also checkable proofs. Further, experiments on a real-world morphological analysis benchmark show that it is possible for our method to handle noise and ambiguity. Code will be released at this https URL.
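Rule-based reasoning over quasi-natural sentences, as described above, amounts to pattern matching with variables followed by substitution. The toy sketch below applies a single one-premise rule where each variable binds exactly one token; MetaQNL's actual rules can bind phrases and combine multiple premises, so this only conveys the flavor, and the example rule is hypothetical.

```python
def match(pattern, sentence):
    """Unify a token pattern ('$'-prefixed tokens are variables binding
    one token) against a sentence; return the bindings or None."""
    if len(pattern) != len(sentence):
        return None
    binding = {}
    for p, s in zip(pattern, sentence):
        if p.startswith('$'):
            if binding.setdefault(p, s) != s:
                return None            # same variable bound to two values
        elif p != s:
            return None
    return binding

def apply_rule(premise, conclusion, sentence):
    """Forward-chain one rule: if the premise matches the sentence,
    emit the conclusion with variables substituted."""
    b = match(premise.split(), sentence.split())
    if b is None:
        return None
    return ' '.join(b.get(tok, tok) for tok in conclusion.split())

# A hypothetical learned rule in quasi-natural language:
print(apply_rule('$X is a bird', '$X can fly', 'tweety is a bird'))
# → tweety can fly
```

Chaining such applications produces a proof tree, which is why systems of this kind can return checkable proofs rather than bare answers.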