Reconstructing vehicles from a single image: Shape priors for road scene understanding

doi:10.1109/ICRA.2017.7989089

Proceedings Article

- pp 724-731

TLDR

Though the problem appears to be ill-posed, it is demonstrated that prior knowledge about how 3D shapes of vehicles project to an image can be used to reason about the reverse process, i.e., how shapes (back-)project from 2D to 3D.

Abstract:

We present an approach for reconstructing vehicles from a single (RGB) image, in the context of autonomous driving. Though the problem appears to be ill-posed, we demonstrate that prior knowledge about how 3D shapes of vehicles project to an image can be used to reason about the reverse process, i.e., how shapes (back-)project from 2D to 3D. We encode this knowledge in shape priors, which are learnt over a small keypoint-annotated dataset. We then formulate a shape-aware adjustment problem that uses the learnt shape priors to recover the 3D pose and shape of a query object from an image. For shape representation and inference, we leverage recent successes of Convolutional Neural Networks (CNNs) for the task of object and keypoint localization, and train a novel cascaded fully-convolutional architecture to localize vehicle keypoints in images. The shape-aware adjustment then robustly recovers shape (3D locations of the detected keypoints) while simultaneously filling in occluded keypoints. To tackle estimation errors incurred due to erroneously detected keypoints, we use an Iteratively Re-weighted Least Squares (IRLS) scheme for robust optimization, and as a by-product characterize noise models for each predicted keypoint. We evaluate our approach on autonomous driving benchmarks, and present superior results to existing monocular, as well as stereo approaches.
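
The robust optimization step mentioned in the abstract can be illustrated with a generic Iteratively Re-weighted Least Squares loop. The following is a minimal sketch, not the authors' formulation: it assumes a linear measurement model, a Huber-style weight function, and a toy line-fitting problem, none of which are stated in the abstract. The function name `irls` and its parameters `delta` and `n_iters` are hypothetical.

```python
import numpy as np

def irls(A, b, n_iters=20, delta=1.0, eps=1e-8):
    """Robustly solve A x ~ b by repeatedly re-weighting residuals.

    Assumption: Huber-style weights (1 inside `delta`, delta/|r| outside).
    The abstract does not say which weight function the paper uses.
    """
    # Start from the ordinary least-squares solution.
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    w = np.ones(len(b))
    for _ in range(n_iters):
        r = A @ x - b
        abs_r = np.maximum(np.abs(r), eps)  # avoid division by zero
        # Large residuals (likely bad measurements) get small weights.
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)
        sw = np.sqrt(w)
        # Weighted least squares: scale each row by sqrt(weight).
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x, w

# Toy data: a line y = 2 t + 1 with small noise and a few gross outliers,
# standing in for erroneously detected keypoints.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([t, np.ones_like(t)])
b = 2.0 * t + 1.0 + 0.01 * rng.standard_normal(50)
b[::10] += 5.0  # every 10th measurement is corrupted

x_robust, weights = irls(A, b, delta=0.1)
x_plain = np.linalg.lstsq(A, b, rcond=None)[0]
```

In this sketch the final `weights` act as a per-measurement confidence: corrupted measurements end up with weights near zero. This loosely mirrors the abstract's remark that noise models for each keypoint come out as a by-product, though how the paper derives those models is not described here.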
