
Showing papers by "Jia Deng published in 2021"



Proceedings ArticleDOI
01 Jun 2021
TL;DR: The LieTorch library exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds, which is numerically more stable, easier to implement, and beneficial to a diverse set of tasks.
Abstract: We address the problem of performing backpropagation for computation graphs involving 3D transformation groups SO(3), SE(3), and Sim(3). 3D transformation groups are widely used in 3D vision and robotics, but they do not form vector spaces and instead lie on smooth manifolds. The standard backpropagation approach, which embeds 3D transformations in Euclidean spaces, suffers from numerical difficulties. We introduce a new library, which exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds. We show that our approach is numerically more stable, easier to implement, and beneficial to a diverse set of tasks. Our plug-and-play PyTorch library is available at https://github.com/princeton-vl/lietorch.

28 citations
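The core idea in the abstract above — keeping optimization iterates on the manifold by taking gradient steps in the tangent space and mapping them back with the exponential map — can be illustrated without the library itself. The following is a minimal numpy sketch of that idea for SO(3) only; the actual LieTorch API differs, and the function names here are illustrative, not the library's.

```python
import numpy as np

def hat(phi):
    """Map a 3-vector to the corresponding so(3) skew-symmetric matrix."""
    x, y, z = phi
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def so3_exp(phi):
    """Exponential map: tangent vector in R^3 -> rotation matrix in SO(3)
    (Rodrigues' formula)."""
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def retract(R, delta):
    """Apply a tangent-space step: R <- exp(delta) @ R.
    The optimizer only ever sees the 3-vector delta, so the iterate stays
    a valid rotation by construction, with no re-orthonormalization."""
    return so3_exp(delta) @ R

# A small update expressed in the tangent space keeps R on the manifold.
R = so3_exp(np.array([0.3, -0.2, 0.5]))
R_next = retract(R, np.array([-0.01, 0.02, 0.0]))
print(np.allclose(R_next @ R_next.T, np.eye(3)))  # True: still orthonormal
```

This is what distinguishes the approach from embedding rotations in a 9-dimensional Euclidean space, where a raw gradient step leaves the manifold and must be projected back.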


Proceedings ArticleDOI
01 Jun 2021
TL;DR: The authors proposed a new deep architecture for scene flow, based on the RAFT model developed for optical flow but iteratively updating a dense field of pixelwise SE3 motion instead of 2D motion.
Abstract: We address the problem of scene flow: given a pair of stereo or RGB-D video frames, estimate pixelwise 3D motion. We introduce RAFT-3D, a new deep architecture for scene flow. RAFT-3D is based on the RAFT model developed for optical flow but iteratively updates a dense field of pixelwise SE3 motion instead of 2D motion. A key innovation of RAFT-3D is rigid-motion embeddings, which represent a soft grouping of pixels into rigid objects. Integral to rigid-motion embeddings is Dense-SE3, a differentiable layer that enforces geometric consistency of the embeddings. Experiments show that RAFT-3D achieves state-of-the-art performance. On FlyingThings3D, under the two-view evaluation, we improve the best published accuracy (δ < 0.05) from 34.3% to 83.7%. On KITTI, we achieve an error of 5.77, outperforming the best published method (6.31), despite using no object instance supervision.

27 citations
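A dense field of pixelwise SE3 motion, as described above, acts on the 3D point back-projected from each pixel; reprojecting the moved point yields the induced 2D flow. The sketch below is a minimal numpy illustration of that geometry under a pinhole camera, not the paper's architecture; the function names and parameters are assumptions for the example.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel to a 3D point using a pinhole camera model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    return np.stack([X, Y, depth], axis=-1)          # (h, w, 3)

def induced_flow(depth, R, t, fx, fy, cx, cy):
    """Apply a per-pixel rigid motion (R, t) and reproject to get 2D flow.
    R: (h, w, 3, 3) rotations, t: (h, w, 3) translations."""
    P = backproject(depth, fx, fy, cx, cy)
    P2 = np.einsum('hwij,hwj->hwi', R, P) + t        # per-pixel SE3 action
    u2 = fx * P2[..., 0] / P2[..., 2] + cx
    v2 = fy * P2[..., 1] / P2[..., 2] + cy
    h, w = depth.shape
    v0, u0 = np.mgrid[0:h, 0:w]
    return np.stack([u2 - u0, v2 - v0], axis=-1)     # (h, w, 2)

# Sanity check: pure x-translation at constant depth z induces a uniform
# horizontal flow of fx * tx / z.
depth = np.full((4, 4), 2.0)
R = np.broadcast_to(np.eye(3), (4, 4, 3, 3))
t = np.zeros((4, 4, 3)); t[..., 0] = 0.1
flow = induced_flow(depth, R, t, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(np.allclose(flow[..., 0], 100.0 * 0.1 / 2.0))  # True
```

Parameterizing motion this way is what lets pixels on the same rigid object share one transform, which is the role of the rigid-motion embeddings in the paper.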


Posted Content
TL;DR: In this paper, the authors explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark and demonstrate that face blurring has minimal impact on the accuracy of recognition models.
Abstract: Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however, many incidental people appear in the images, and their privacy is a concern. We first annotate faces in the dataset. Then we demonstrate that face blurring -- a typical obfuscation technique -- has minimal impact on the accuracy of recognition models. Concretely, we benchmark multiple deep neural networks on face-blurred images and observe that the overall recognition accuracy drops only slightly (no more than 0.68%). Further, we experiment with transfer learning to 4 downstream tasks (object recognition, scene recognition, face attribute classification, and object detection) and show that features learned on face-blurred images are equally transferable. Our work demonstrates the feasibility of privacy-aware visual recognition, improves the highly-used ImageNet challenge benchmark, and suggests an important path for future visual datasets. Data and code are available at this https URL.

21 citations
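The obfuscation studied above is simple in principle: given annotated face bounding boxes, blur only those regions and leave the rest of the image untouched. The following is a small self-contained numpy sketch with a mean filter standing in for the blur; the paper's exact blurring procedure and box format are not specified here, so these details are assumptions for illustration.

```python
import numpy as np

def box_blur(patch, k=5):
    """Mean-filter a patch over a k x k window (edge-clamped)."""
    h, w = patch.shape[:2]
    out = np.empty_like(patch, dtype=float)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - k // 2), min(h, i + k // 2 + 1)
            j0, j1 = max(0, j - k // 2), min(w, j + k // 2 + 1)
            out[i, j] = patch[i0:i1, j0:j1].mean(axis=(0, 1))
    return out

def blur_faces(image, boxes, k=5):
    """Obfuscate only the annotated face boxes (x0, y0, x1, y1);
    every pixel outside the boxes is left byte-for-byte unchanged."""
    out = image.astype(float).copy()
    for (x0, y0, x1, y1) in boxes:
        out[y0:y1, x0:x1] = box_blur(out[y0:y1, x0:x1], k)
    return out

img = np.arange(12 * 12 * 3, dtype=float).reshape(12, 12, 3)
out = blur_faces(img, [(2, 3, 8, 9)])
print(np.array_equal(out[:3, :], img[:3, :]))  # True: outside box untouched
```

Because the overwhelming majority of pixels are outside the face boxes, it is plausible that classification features survive, which is what the paper's benchmarks confirm empirically.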


Posted Content
Ankit Goyal1, Hei Law1, Bowei Liu1, Alejandro Newell1, Jia Deng1 
TL;DR: The authors find that auxiliary factors such as evaluation schemes, data augmentation strategies, and loss functions, all independent of model architecture, make a large difference in performance; once these are controlled for, the older PointNet++ performs competitively with recent methods.
Abstract: Processing point cloud data is an important component of many real-world systems. As such, a wide variety of point-based approaches have been proposed, reporting steady benchmark improvements over time. We study the key ingredients of this progress and uncover two critical results. First, we find that auxiliary factors like different evaluation schemes, data augmentation strategies, and loss functions, which are independent of the model architecture, make a large difference in performance. The differences are large enough that they obscure the effect of architecture. When these factors are controlled for, PointNet++, a relatively older network, performs competitively with recent methods. Second, a very simple projection-based method, which we refer to as SimpleView, performs surprisingly well. It achieves on par or better results than sophisticated state-of-the-art methods on ModelNet40 while being half the size of PointNet++. It also outperforms state-of-the-art methods on ScanObjectNN, a real-world point cloud benchmark, and demonstrates better cross-dataset generalization. Code is available at this https URL.

6 citations
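A projection-based method like the SimpleView described above renders the point cloud as a handful of depth images and hands them to an ordinary 2D CNN. The sketch below shows the projection step only, with orthographic axis-aligned views and a z-buffer; resolution, view count, and normalization are assumptions for the example, not the paper's exact settings.

```python
import numpy as np

def depth_image(points, res=32):
    """Orthographically project a point cloud (N, 3) onto the XY plane,
    keeping the largest normalized z per pixel (a simple z-buffer)."""
    lo, hi = points.min(0), points.max(0)
    p = (points - lo) / (hi - lo + 1e-9)      # normalize to [0, 1]
    u = np.minimum((p[:, 0] * res).astype(int), res - 1)
    v = np.minimum((p[:, 1] * res).astype(int), res - 1)
    img = np.zeros((res, res))
    for x, y, z in zip(u, v, p[:, 2]):
        img[y, x] = max(img[y, x], z)
    return img

def simple_views(points, res=32):
    """Render six axis-aligned views by permuting axes and flipping depth;
    a standard 2D CNN would then classify the stacked images."""
    views = []
    for perm in ([0, 1, 2], [0, 2, 1], [1, 2, 0]):
        for sign in (1.0, -1.0):
            q = points[:, perm].copy()
            q[:, 2] *= sign
            views.append(depth_image(q, res))
    return np.stack(views)                    # (6, res, res)

rng = np.random.default_rng(0)
views = simple_views(rng.standard_normal((256, 3)))
print(views.shape)  # (6, 32, 32)
```

The appeal of this design is that all the machinery after the projection is a stock image classifier, which is part of why the model can be half the size of PointNet++.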


Posted Content
TL;DR: In this paper, the authors propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation, which is an interleaving step with gradient-based training to periodically seek the optimal architecture growing strategy for the generator and discriminator.
Abstract: Recent work introduced progressive network growing as a promising way to ease the training of large GANs, but the model design and architecture-growing strategy remain under-explored and need manual design for different image data. In this paper, we propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation. The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator. It enjoys the benefits of both eased training because of progressive growing and improved performance because of broader architecture design space. Experimental results demonstrate a new state of the art in image generation. Observations in the search procedure also provide constructive insights into GAN model design, such as generator-discriminator balance and convolutional layer choices.

4 citations


Proceedings Article
Ankit Goyal1, Hei Law1, Bowei Liu1, Alejandro Newell1, Jia Deng1 
04 May 2021
TL;DR: The authors find that auxiliary factors such as evaluation schemes, data augmentation strategies, and loss functions, all independent of model architecture, make a large difference in performance; once these are controlled for, the older PointNet++ performs competitively with recent methods.
Abstract: Processing point cloud data is an important component of many real-world systems. As such, a wide variety of point-based approaches have been proposed, reporting steady benchmark improvements over time. We study the key ingredients of this progress and uncover two critical results. First, we find that auxiliary factors like different evaluation schemes, data augmentation strategies, and loss functions, which are independent of the model architecture, make a large difference in performance. The differences are large enough that they obscure the effect of architecture. When these factors are controlled for, PointNet++, a relatively older network, performs competitively with recent methods. Second, a very simple projection-based method, which we refer to as SimpleView, performs surprisingly well. It achieves on par or better results than sophisticated state-of-the-art methods on ModelNet40, while being half the size of PointNet++. It also outperforms state-of-the-art methods on ScanObjectNN, a real-world point cloud benchmark, and demonstrates better cross-dataset generalization.

2 citations


Posted Content
TL;DR: The authors use parallel subnetworks instead of stacking one layer after another, effectively reducing depth while maintaining high performance and achieving state-of-the-art results.
Abstract: Depth is the hallmark of deep neural networks. But more depth means more sequential computation and higher latency. This begs the question -- is it possible to build high-performing "non-deep" neural networks? We show that it is. To do so, we use parallel subnetworks instead of stacking one layer after another. This helps effectively reduce depth while maintaining high performance. By utilizing parallel substructures, we show, for the first time, that a network with a depth of just 12 can achieve top-1 accuracy over 80% on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. We also show that a network with a low-depth (12) backbone can achieve an AP of 48% on MS-COCO. We analyze the scaling rules for our design and show how to increase performance without changing the network's depth. Finally, we provide a proof of concept for how non-deep networks could be used to build low-latency recognition systems. Code is available at this https URL

1 citation
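The structural trick in the abstract above is that layers arranged as parallel streams over the same input do not add to the sequential depth (the longest chain of dependent computations), whereas stacked layers do. The toy numpy sketch below makes that concrete with plain fully connected layers; the real network uses convolutional blocks and a more careful fusion, so this is only an illustration of the depth accounting.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W):
    """One fully connected layer with ReLU."""
    return np.maximum(W @ x, 0.0)

x = rng.standard_normal(16)
Ws = [rng.standard_normal((16, 16)) for _ in range(4)]

# Sequential design: 4 stacked layers -> 4 dependent steps before output.
h = x
for W in Ws:
    h = layer(h, W)            # each step must wait for the previous one

# "Non-deep" design: the same 4 layers run as parallel streams on x and
# are fused by summation -> the longest dependent chain is only 2 steps.
streams = [layer(x, W) for W in Ws]   # independent, can run concurrently
fused = np.maximum(sum(streams), 0.0)

print(len(Ws), "sequential steps vs 2 on the parallel path")
```

Lower sequential depth is what translates into lower latency on parallel hardware, which is the motivation for the low-latency recognition systems mentioned at the end of the abstract.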


Posted Content
TL;DR: DROID-SLAM as mentioned in this paper is a deep learning-based SLAM system that consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer.
Abstract: We introduce DROID-SLAM, a new deep learning based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer. DROID-SLAM is accurate, achieving large improvements over prior work, and robust, suffering from substantially fewer catastrophic failures. Despite training on monocular video, it can leverage stereo or RGB-D video to achieve improved performance at test time. The URL to our open source code is this https URL.

1 citation


Proceedings Article
18 May 2021
TL;DR: In this article, the authors propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation, which enjoys the benefits of both eased training because of progressive growing and improved performance because of broader architecture design space.
Abstract: Recent work introduced progressive network growing as a promising way to ease the training of large GANs, but the model design and architecture-growing strategy remain under-explored and need manual design for different image data. In this paper, we propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation. The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator. It enjoys the benefits of both eased training because of progressive growing and improved performance because of broader architecture design space. Experimental results demonstrate a new state of the art in image generation. Observations in the search procedure also provide constructive insights into GAN model design, such as generator-discriminator balance and convolutional layer choices.

1 citation


Posted Content
TL;DR: RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT, uses multi-level convolutional GRUs to propagate information more efficiently across the image; it ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29%.
Abstract: We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT. We introduce multi-level convolutional GRUs, which more efficiently propagate information across the image. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29% and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at this https URL.
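The recurrent refinement at the heart of the RAFT family, including the GRUs mentioned above, follows the standard gated recurrent unit equations: an update gate z, a reset gate r, and a candidate state blended into the hidden state. The sketch below uses plain matrices in place of the 2D convolutions of the real multi-level ConvGRU, so it only illustrates the update rule, not the paper's architecture.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_update(h, x, Wz, Wr, Wh):
    """One GRU step: refine hidden state h from input features x.
    z gates how much of the state is replaced; r gates how much of the
    old state feeds the candidate."""
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)                           # update gate
    r = sigmoid(Wr @ hx)                           # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d = 8
h, x = np.zeros(d), rng.standard_normal(d)
Wz, Wr, Wh = (rng.standard_normal((d, 2 * d)) * 0.1 for _ in range(3))

# Iterative refinement: the same unit is applied repeatedly, as in the
# RAFT family, with each step producing an updated estimate.
for _ in range(5):
    h = gru_update(h, x, Wz, Wr, Wh)
print(h.shape)  # (8,)
```

In RAFT-Stereo the "multi-level" part runs such units at several resolutions and exchanges hidden states between them, which is how information travels across the whole image in few iterations.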

Posted Content
TL;DR: A plug-and-play PyTorch library exploits the group structure of the 3D transformation groups SO(3), SE(3), and Sim(3) and performs backpropagation in the tangent spaces of their manifolds.
Abstract: We address the problem of performing backpropagation for computation graphs involving 3D transformation groups SO(3), SE(3), and Sim(3). 3D transformation groups are widely used in 3D vision and robotics, but they do not form vector spaces and instead lie on smooth manifolds. The standard backpropagation approach, which embeds 3D transformations in Euclidean spaces, suffers from numerical difficulties. We introduce a new library, which exploits the group structure of 3D transformations and performs backpropagation in the tangent spaces of manifolds. We show that our approach is numerically more stable, easier to implement, and beneficial to a diverse set of tasks. Our plug-and-play PyTorch library is available at https://github.com/princeton-vl/lietorch.

Posted Content
TL;DR: This article proposes MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences, and MetaInduce, a learning algorithm that induces MetaQNL rules from training data consisting of questions and answers, with or without intermediate reasoning steps.
Abstract: Symbolic reasoning, rule-based symbol manipulation, is a hallmark of human intelligence. However, rule-based systems have had limited success competing with learning-based systems outside formalized domains such as automated theorem proving. We hypothesize that this is due to the manual construction of rules in past attempts. In this work, we ask how we can build a rule-based system that can reason with natural language input but without the manual construction of rules. We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences, and MetaInduce, a learning algorithm that induces MetaQNL rules from training data consisting of questions and answers, with or without intermediate reasoning steps. Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks; it learns compact models with much less data and produces not only answers but also checkable proofs. Further, experiments on a real-world morphological analysis benchmark show that it is possible for our method to handle noise and ambiguity. Code will be released at this https URL.
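Rule-based reasoning over quasi-natural sentences, as described above, amounts to pattern matching with variables followed by substitution. The toy sketch below applies a single one-premise rule where each variable binds exactly one token; MetaQNL's actual rules can bind phrases and combine multiple premises, so this only conveys the flavor, and the example rule is hypothetical.

```python
def match(pattern, sentence):
    """Unify a token pattern ('$'-prefixed tokens are variables binding
    one token) against a sentence; return the bindings or None."""
    if len(pattern) != len(sentence):
        return None
    binding = {}
    for p, s in zip(pattern, sentence):
        if p.startswith('$'):
            if binding.setdefault(p, s) != s:
                return None            # same variable bound to two values
        elif p != s:
            return None
    return binding

def apply_rule(premise, conclusion, sentence):
    """Forward-chain one rule: if the premise matches the sentence,
    emit the conclusion with variables substituted."""
    b = match(premise.split(), sentence.split())
    if b is None:
        return None
    return ' '.join(b.get(tok, tok) for tok in conclusion.split())

# A hypothetical learned rule in quasi-natural language:
print(apply_rule('$X is a bird', '$X can fly', 'tweety is a bird'))
# → tweety can fly
```

Chaining such applications produces a proof tree, which is why systems of this kind can return checkable proofs rather than bare answers.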