Open Access · Proceedings ArticleDOI

Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection

TLDR
A novel method termed Frustum ConvNet (F-ConvNet), which aggregates point-wise features as frustum-level feature vectors and arrays these feature vectors as a feature map for use by its subsequent fully convolutional network (FCN) component.
Abstract
In this work, we propose a novel method termed Frustum ConvNet (F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for use of its subsequent component of fully convolutional network (FCN), which spatially fuses frustum-level features and supports an end-to-end and continuous estimation of oriented boxes in the 3D space. We also propose component variants of F-ConvNet, including an FCN variant that extracts multi-resolution frustum features, and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. Code has been made available at: https://github.com/zhixinwang/frustum-convnet.
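The abstract's core step — slicing a region proposal's viewing cone into a sequence of frustums, grouping points per frustum, and aggregating point-wise features into an ordered feature map for the FCN — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the depth range, and the use of max pooling as the aggregator are assumptions made here for clarity.

```python
import numpy as np

def sliding_frustum_features(points, feats, depth_range=(0.0, 40.0),
                             stride=1.0, frustum_len=2.0):
    """Group points (assumed already inside one 2D proposal's viewing cone)
    into overlapping frustums sliced along the depth axis, and pool the
    point-wise features inside each frustum into one frustum-level vector.

    points : (N, 3) array in camera coordinates, column 2 = depth
    feats  : (N, C) point-wise features (e.g., from a point-set encoder)
    Returns an (L, C) feature map, one row per frustum, whose row order
    follows depth -- suitable input for a subsequent 1-D FCN.
    """
    z0, z1 = depth_range
    starts = np.arange(z0, z1, stride)
    fmap = np.zeros((len(starts), feats.shape[1]), dtype=feats.dtype)
    for i, s in enumerate(starts):
        # Overlapping slab of the cone: [s, s + frustum_len) in depth.
        mask = (points[:, 2] >= s) & (points[:, 2] < s + frustum_len)
        if mask.any():
            fmap[i] = feats[mask].max(axis=0)  # frustum-level feature vector
    return fmap
```

Because the frustums overlap (frustum length exceeds the stride), adjacent rows of the feature map share points, which is what lets the FCN fuse frustum-level features spatially.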


Citations
Journal ArticleDOI

Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

TL;DR: In this article, the authors systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving and provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection.
Book ChapterDOI

3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-view Spatial Feature Fusion for 3D Object Detection

TL;DR: The authors propose 3D-CVF, which combines camera and LiDAR features using a cross-view spatial feature fusion strategy and achieves state-of-the-art performance on the KITTI benchmark.
Journal ArticleDOI

Deep Learning for Image and Point Cloud Fusion in Autonomous Driving: A Review

TL;DR: A review of recent deep-learning-based data fusion approaches that leverage both image and point cloud data processing, identifying gaps and overlooked challenges between current academic research and real-world applications.
Journal ArticleDOI

Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review

TL;DR: Three key tasks in vision-based robotic grasping are identified — object localization, object pose estimation, and grasp estimation — where grasp estimation covers both 2D planar grasp methods and 6-DoF grasp methods.
Proceedings ArticleDOI

Joint 3D Instance Segmentation and Object Detection for Autonomous Driving

TL;DR: A simple but practical detection framework is proposed to jointly predict 3D bounding boxes and instance segmentation, with a Spatial Embeddings (SEs) strategy that assembles all foreground points into their corresponding object centers.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting networks won 1st place in the ILSVRC 2015 classification task.
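The residual idea in that TLDR is that each block learns a residual function F(x) on top of an identity shortcut, so the block outputs relu(F(x) + x). A minimal NumPy sketch (illustrative names and a two-layer fully connected residual branch; the paper uses convolutional branches with batch normalization):

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is a small learned branch.
    When F is driven toward zero, the block reduces to the identity,
    which is what eases optimization of very deep networks."""
    f = np.maximum(x @ w1, 0.0) @ w2   # residual branch F(x)
    return np.maximum(f + x, 0.0)      # identity shortcut, then ReLU
```

With the branch weights at zero, the block simply passes its input through a ReLU — the degenerate case that makes deeper stacks no harder to fit than shallower ones.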
Book ChapterDOI

SSD: Single Shot MultiBox Detector

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
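The default-box discretization described in that TLDR — one box per aspect ratio and scale at every feature map location — can be sketched as below. Function name, normalized-coordinate convention, and the particular aspect ratios are assumptions for illustration, not SSD's exact configuration.

```python
import itertools

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Tile one default box per aspect ratio at every cell of a square
    feature map. Boxes are (cx, cy, w, h) with coordinates normalized
    to [0, 1]; width/height preserve the area implied by `scale`."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # cell center
        for ar in aspect_ratios:
            w, h = scale * ar ** 0.5, scale / ar ** 0.5
            boxes.append((cx, cy, w, h))
    return boxes
```

SSD repeats this tiling over several feature maps of decreasing resolution, so small-scale boxes come from early, high-resolution maps and large-scale boxes from later ones.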
Proceedings ArticleDOI

Fast R-CNN

TL;DR: Fast R-CNN, a Fast Region-based Convolutional Network method for object detection, employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012.
Proceedings ArticleDOI

Are we ready for autonomous driving? The KITTI vision benchmark suite

TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
Proceedings ArticleDOI

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
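The permutation invariance noted in that TLDR comes from applying the same per-point function to every point and then aggregating with a symmetric operator (max pooling). A toy sketch of this structure, with an assumed single linear-plus-ReLU layer standing in for PointNet's shared MLP:

```python
import numpy as np

def pointnet_global_feature(points, w):
    """Shared per-point transform followed by max pooling.
    Because every point passes through the same weights and the pooling
    is symmetric, the result is invariant to input point order."""
    per_point = np.maximum(points @ w, 0.0)  # same weights for every point
    return per_point.max(axis=0)             # symmetric aggregation
```

Reordering the rows of `points` leaves the output unchanged — the property that lets such networks consume raw, unordered point clouds directly.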