Proceedings Article
Reconstructing vehicles from a single image: Shape priors for road scene understanding
J. Krishna Murthy, G. V. Sai Krishna, Falak Chhaya, K. Madhava Krishna
- pp 724-731
TL;DR: Though the problem appears to be ill-posed, it is demonstrated that prior knowledge about how 3D shapes of vehicles project to an image can be used to reason about the reverse process, i.e., how shapes (back-)project from 2D to 3D.
Abstract:
We present an approach for reconstructing vehicles from a single (RGB) image, in the context of autonomous driving. Though the problem appears to be ill-posed, we demonstrate that prior knowledge about how 3D shapes of vehicles project to an image can be used to reason about the reverse process, i.e., how shapes (back-)project from 2D to 3D. We encode this knowledge in shape priors, which are learnt over a small keypoint-annotated dataset. We then formulate a shape-aware adjustment problem that uses the learnt shape priors to recover the 3D pose and shape of a query object from an image. For shape representation and inference, we leverage recent successes of Convolutional Neural Networks (CNNs) in object and keypoint localization, and train a novel cascaded fully-convolutional architecture to localize vehicle keypoints in images. The shape-aware adjustment then robustly recovers shape (3D locations of the detected keypoints) while simultaneously filling in occluded keypoints. To tackle estimation errors caused by erroneously detected keypoints, we use an Iteratively Re-weighted Least Squares (IRLS) scheme for robust optimization, and as a by-product characterize noise models for each predicted keypoint. We evaluate our approach on autonomous driving benchmarks, and report results superior to existing monocular as well as stereo approaches.
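The abstract names Iteratively Re-weighted Least Squares (IRLS) as the robust optimizer but gives no details. The following is only a minimal sketch of the general IRLS idea on a toy linear problem, not the authors' shape-aware adjustment: the Huber loss, the threshold `delta`, the function name `irls_huber`, and the line-fitting data are all assumptions chosen for illustration. It shows how measurements with large residuals (standing in for erroneously detected keypoints) are progressively down-weighted.

```python
import numpy as np


def irls_huber(A, b, delta=1.0, iters=20, eps=1e-8):
    """Hypothetical helper: robust linear least squares via IRLS with Huber weights."""
    # Start from the ordinary (non-robust) least-squares solution.
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    w = np.ones(len(b))
    for _ in range(iters):
        r = A @ x - b                              # current residuals
        abs_r = np.maximum(np.abs(r), eps)         # guard against division by zero
        # Huber weights: 1 for small residuals, delta/|r| for large ones.
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        sw = np.sqrt(w)
        # Weighted least squares: scale each row and target by sqrt(weight).
        x_new = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
        converged = np.linalg.norm(x_new - x) < 1e-10
        x = x_new
        if converged:
            break
    return x, w


# Toy data (assumed): points on y = 2t + 1 with one gross outlier at index 3.
t = np.arange(10, dtype=float)
A = np.column_stack([t, np.ones_like(t)])
b = 2.0 * t + 1.0
b[3] += 50.0

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]    # plain least squares, pulled by the outlier
x_irls, weights = irls_huber(A, b)             # robust estimate and final per-point weights
```

In this toy setting the final weights give a rough per-measurement reliability score, loosely analogous to the per-keypoint noise characterization the abstract mentions as a by-product; how the paper actually derives its noise models is not stated here.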
Citations
Proceedings Article
Stereo R-CNN Based 3D Object Detection for Autonomous Driving
TL;DR: Stereo R-CNN is a 3D object detection method for autonomous driving that fully exploits the sparse and dense, semantic and geometric information in stereo imagery. It adds extra branches after a stereo Region Proposal Network (RPN) to predict sparse keypoints, viewpoints, and object dimensions, which are combined with 2D left-right boxes to calculate a coarse 3D bounding box.
Proceedings Article
6-DoF object pose from semantic keypoints
TL;DR: In this paper, the authors combine semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model to estimate the continuous 6-DoF pose of an object from a single RGB image.
Proceedings Article
Disentangling Monocular 3D Object Detection
TL;DR: In this paper, a disentangling transformation for 2D and 3D detection losses and a self-supervised confidence score for 3D bounding boxes are proposed for monocular 3D object detection.
Journal Article
CubeSLAM: Monocular 3-D Object SLAM
Shichao Yang, Sebastian Scherer
TL;DR: The SLAM method achieves state-of-the-art monocular camera pose estimation and, at the same time, improves 3-D object detection accuracy.
Proceedings Article
MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships
TL;DR: This work proposes a novel method to improve monocular 3D object detection by considering the relationship of paired samples, which allows spatial constraints for partially occluded objects to be encoded from their adjacent neighbors.
References
Posted Content
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings Article
Are we ready for autonomous driving? The KITTI vision benchmark suite
TL;DR: The autonomous driving platform is used to develop novel, challenging benchmarks for stereo, optical flow, visual odometry/SLAM, and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world.
Proceedings Article
Deep Convolutional Network Cascade for Facial Point Detection
Yi Sun, Xiaogang Wang, Xiaoou Tang
TL;DR: The proposed approach outperforms state-of-the-art methods in both detection accuracy and reliability, and can avoid local minima caused by ambiguity and data corruption in difficult image samples due to occlusions, large pose variations, and extreme lighting.
Proceedings Article
Monocular 3D Object Detection for Autonomous Driving
TL;DR: This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground-plane, and achieves the best detection performance on the challenging KITTI benchmark, among published monocular competitors.
Journal Article
Articulated Human Detection with Flexible Mixtures of Parts
Yi Yang, Deva Ramanan
TL;DR: This work describes a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations.