Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection
Zhixin Wang, Kui Jia, and 1 more
pp. 1742–1749
TL;DR: A novel method termed Frustum ConvNet (F-ConvNet) aggregates point-wise features as frustum-level feature vectors and arrays these feature vectors as a feature map for its subsequent fully convolutional network (FCN) component.

Abstract:
In this work, we propose a novel method termed Frustum ConvNet (F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for use of its subsequent component of fully convolutional network (FCN), which spatially fuses frustum-level features and supports an end-to-end and continuous estimation of oriented boxes in the 3D space. We also propose component variants of F-ConvNet, including an FCN variant that extracts multi-resolution frustum features, and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. Code has been made available at: https://github.com/zhixinwang/frustum-convnet.
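The core aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. This is a simplification under stated assumptions: real frustums are carved out by a 2D box proposal and the camera geometry, and the per-point features come from a learned PointNet, whereas here frustums are reduced to depth slices along the camera z-axis and features are given as input. The function name `frustum_feature_map` and all parameters are hypothetical, not from the paper's code.

```python
import numpy as np

def frustum_feature_map(points, feats, num_frustums=8, z_min=0.0, z_max=8.0):
    """Group points into depth slices (stand-ins for sliding frustums) and
    max-pool per-point features into one frustum-level vector per slice.

    points : (N, 3) array of xyz coordinates (z is viewing depth).
    feats  : (N, d) array of per-point features.
    Returns a (num_frustums, d) array: a 1-D "feature map" that a
    downstream fully convolutional network could consume.
    """
    edges = np.linspace(z_min, z_max, num_frustums + 1)
    fmap = np.zeros((num_frustums, feats.shape[1]))
    for i in range(num_frustums):
        # Points falling inside the i-th depth slice.
        mask = (points[:, 2] >= edges[i]) & (points[:, 2] < edges[i + 1])
        if mask.any():
            # Symmetric (order-invariant) pooling over the grouped points.
            fmap[i] = feats[mask].max(axis=0)
    return fmap
```

Because max-pooling is symmetric, the resulting feature map is invariant to the order of the input points, which is why arbitrary point clouds can be arrayed into a regular map for convolution.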
Citations
Journal Article
Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges
Di Feng, Christian Haase-Schutz, Lars Rosenbaum, Heinz Hertlein, Claudius Gläser, Fabian Timm, Werner Wiesbeck, Klaus Dietmayer, and 7 more
TL;DR: In this article, the authors systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving and provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection.
Book Chapter
3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-view Spatial Feature Fusion for 3D Object Detection
TL;DR: The authors propose 3D-CVF, which combines camera and LiDAR features using a cross-view spatial feature fusion strategy and achieves state-of-the-art performance on the KITTI benchmark.
Journal Article
Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review
TL;DR: A review of recent deep-learning-based data fusion approaches that leverage both image and point cloud data, identifying gaps and overlooked challenges between current academic research and real-world applications.
Journal Article
Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review
TL;DR: Three key tasks in vision-based robotic grasping are identified: object localization, object pose estimation, and grasp estimation; the surveyed grasp-estimation methods comprise 2D planar grasp methods and 6-DoF grasp methods.
Proceedings Article
Joint 3D Instance Segmentation and Object Detection for Autonomous Driving
TL;DR: A simple but practical detection framework is proposed to jointly predict the 3D bounding box and instance segmentation, together with a Spatial Embeddings (SEs) strategy that assembles all foreground points into their corresponding object centers.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.
Book Chapter
SSD: Single Shot MultiBox Detector
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, and 6 more
TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
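The default-box discretization that the SSD summary describes can be sketched in a few lines of NumPy. This is a simplified illustration, not SSD's actual implementation: the function name `default_boxes` and its parameters are hypothetical, the real detector tiles boxes over several feature maps of different scales, and it adds an extra box for aspect ratio 1.

```python
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate SSD-style default (anchor) boxes for one square feature map.

    Returns an (fmap_size * fmap_size * len(aspect_ratios), 4) array of
    (cx, cy, w, h) boxes in normalized [0, 1] image coordinates: one box
    per aspect ratio, centered at each feature map cell.
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size  # cell center, x
            cy = (i + 0.5) / fmap_size  # cell center, y
            for ar in aspect_ratios:
                # Scale the box area by `scale`^2, shaped by the ratio.
                boxes.append((cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)))
    return np.array(boxes)
```

The detector then predicts, per default box, class scores plus offsets to these fixed (cx, cy, w, h) values, which is what makes the output space discrete and easy to train against.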
Proceedings Article
Fast R-CNN
TL;DR: Fast R-CNN, a fast region-based convolutional network method for object detection, employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012.
Proceedings Article
Are we ready for autonomous driving? The KITTI vision benchmark suite
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM, and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world.
Proceedings Article
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
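The permutation invariance this summary highlights comes from applying one shared network to every point and then pooling with a symmetric function. A minimal NumPy sketch, assuming random weights in place of PointNet's learned ones (the two-layer sizes here are arbitrary illustrations, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared per-point MLP weights. The real PointNet learns these; random
# weights suffice to demonstrate the architecture's order invariance.
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def pointnet_global_feature(points):
    """Apply the same MLP to every point, then max-pool across points.

    Max is a symmetric function, so the returned global feature does not
    depend on the order of the (N, 3) input points -- PointNet's core idea.
    """
    h = np.maximum(points @ W1 + b1, 0.0)  # shared layer 1 (ReLU)
    h = np.maximum(h @ W2 + b2, 0.0)       # shared layer 2 (ReLU)
    return h.max(axis=0)                   # symmetric pooling over points
```

The same global feature can then feed a classifier, or be concatenated back onto per-point features for segmentation, which is how one architecture serves classification, part segmentation, and semantic parsing.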