Dense 3D semantic mapping of indoor scenes from RGB-D images

doi:10.1109/ICRA.2014.6907236

Proceedings Article


pp. 2631-2638

TL;DR

The paper proposes a novel 2D-3D label transfer based on Bayesian updates and dense pairwise 3D Conditional Random Fields, and shows that a semantic segmentation is not needed for every frame in a sequence to create accurate semantic 3D reconstructions.

Abstract:

Dense semantic segmentation of 3D point clouds is a challenging task. Many approaches deal with 2D semantic segmentation and can obtain impressive results. With the availability of cheap RGB-D sensors, the field of indoor semantic segmentation has seen a lot of progress. Still, it remains unclear how best to approach 3D semantic segmentation. We propose a novel 2D-3D label transfer based on Bayesian updates and dense pairwise 3D Conditional Random Fields. This approach allows us to use 2D semantic segmentations to create a consistent 3D semantic reconstruction of indoor scenes. To this end, we also propose a fast 2D semantic segmentation approach based on Randomized Decision Forests. Furthermore, we show that it is not necessary to obtain a semantic segmentation for every frame in a sequence in order to create accurate semantic 3D reconstructions. We evaluate our approach on both NYU Depth datasets and show that we can obtain a significant speed-up compared to other methods.
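The abstract does not spell out the update rule. A common form of recursive Bayesian label fusion, assuming each segmented frame is an independent observation of a 3D point's label, multiplies the point's current class distribution by the per-class scores of the pixel it projects to and renormalizes. The sketch below shows that generic rule, not necessarily the paper's exact formulation; the frame format `(point_ids, likelihood)` and the names `bayesian_label_update`, `fuse_sequence` and `every_k` are hypothetical. Skipping frames with `every_k` illustrates the idea that not every frame has to be segmented.

```python
import numpy as np

# Hypothetical helper: generic recursive Bayesian label fusion, not
# necessarily the exact update used in the paper.
def bayesian_label_update(prior, likelihood, eps=1e-12):
    """Return the normalized posterior, proportional to prior * likelihood.

    prior:      (N, C) current class distribution of N 3D points
    likelihood: (N, C) per-class 2D classifier scores for the pixels those
                points project to in the current frame
    """
    # Clip so a zero score cannot wipe out a class (or a whole row) for good
    posterior = prior * np.clip(likelihood, eps, None)
    return posterior / posterior.sum(axis=1, keepdims=True)


# Hypothetical driver: each frame is a (point_ids, likelihood) pair, where
# point_ids lists the 3D points visible in that frame.
def fuse_sequence(num_points, num_classes, frames, every_k=1):
    # Start every point from a uniform class distribution
    belief = np.full((num_points, num_classes), 1.0 / num_classes)
    for t, (ids, lik) in enumerate(frames):
        if t % every_k != 0:
            continue  # this frame is not segmented at all
        belief[ids] = bayesian_label_update(belief[ids], lik)
    return belief


# Toy example: 3 points, 3 classes, 3 identical frames; point 2 is never seen
lik = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.8, 0.1]])
frames = [(np.array([0, 1]), lik) for _ in range(3)]
belief = fuse_sequence(3, 3, frames, every_k=2)  # uses frames 0 and 2 only
print(belief.argmax(axis=1))  # -> [0 1 0]; point 2's uniform row ties, argmax returns 0
```

Because the update only touches points visible in a segmented frame, skipped frames cost nothing, and unobserved points keep their prior.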

Citations

Journal Article

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Vijay Badrinarayanan and 2 co-authors

01 Dec 2017

IEEE Transactions on Pattern Analysis an...

TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and the most memory-efficient inference compared with other architectures, including FCN and DeconvNet.

Proceedings Article

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

Angela Dai and 5 co-authors

TL;DR: This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.

Proceedings Article

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture

David Eigen and 1 co-author

TL;DR: This paper addresses three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling using a multiscale convolutional network that is able to adapt easily to each task using only small modifications.

Proceedings Article

3D Semantic Parsing of Large-Scale Indoor Spaces

Iro Armeni and 6 co-authors

TL;DR: This paper argues that identifying structural elements in indoor spaces is essentially a detection problem rather than the commonly used segmentation problem, and proposes a hierarchical method for semantically parsing the 3D point cloud of an entire building.

Posted Content

ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes

Angela Dai and 5 co-authors

14 Feb 2017

arXiv: Computer Vision and Pattern Recog...

TL;DR: The ScanNet dataset contains 2.5M RGB-D views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations.

References

Journal Article

Extremely randomized trees

Pierre Geurts and 2 co-authors

01 Apr 2006

Machine Learning

TL;DR: A new tree-based ensemble method for supervised classification and regression problems that strongly randomizes both the attribute and the cut-point choice when splitting a tree node and, in the extreme case, builds totally randomized trees whose structures are independent of the output values of the learning sample.
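The node-splitting rule summarized above can be sketched as follows, assuming a classification task with integer labels. Draw k attributes at random, draw one cut-point per attribute uniformly between its minimum and maximum in the node, and keep the best-scoring candidate. Gini impurity decrease is used as the score here for brevity; that is an assumption, not necessarily the score used in the paper. The names `gini` and `extra_trees_split` are hypothetical, and only a single split is shown, not a full tree or ensemble.

```python
import numpy as np

def gini(y):
    # Gini impurity of an integer label vector (0 for an empty node)
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

# Hypothetical helper illustrating one Extra-Trees style node split.
def extra_trees_split(X, y, k, rng):
    """Return (attribute, cut_point) for the best of k random candidates.

    With k = 1 there is a single candidate, so the split no longer depends
    on y: this is the totally randomized case.
    """
    n, d = X.shape
    best, best_score = None, -np.inf
    for a in rng.choice(d, size=min(k, d), replace=False):
        lo, hi = X[:, a].min(), X[:, a].max()
        if lo == hi:
            continue  # constant attribute in this node: cannot split on it
        cut = rng.uniform(lo, hi)  # random cut-point, not an optimized one
        left = X[:, a] < cut
        frac = left.mean()
        # Impurity decrease of the candidate split (assumed score)
        score = gini(y) - (frac * gini(y[left]) + (1 - frac) * gini(y[~left]))
        if score > best_score:
            best, best_score = (int(a), float(cut)), score
    return best


rng = np.random.default_rng(0)
X = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
y = np.array([0, 0, 1, 1])
split = extra_trees_split(X, y, k=2, rng=rng)  # attribute 1 is constant
print(split)
```

Drawing the cut-point at random instead of searching over all thresholds is what makes training cheap; averaging many such trees recovers accuracy.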

Book Chapter

Indoor segmentation and support inference from RGBD images

Nathan Silberman and 3 co-authors

TL;DR: The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.

Proceedings Article

Real-time human pose recognition in parts from single depth images

Jamie Shotton and 7 co-authors

TL;DR: This work takes an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem, and generates confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.

Proceedings Article

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Philipp Krähenbühl and 1 co-author

TL;DR: This paper considers fully connected CRF models defined on the complete set of pixels in an image and proposes a highly efficient approximate inference algorithm in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels.
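As a rough illustration of the model this entry describes, the following brute-force mean-field sketch uses a Potts label-compatibility function and pairwise potentials that are weighted sums of Gaussian kernels over feature vectors. For 3D points the features could be, for example, scaled positions and colors; that choice is an assumption. The sketch builds the full N x N kernel matrix, so each iteration costs O(N^2) and it only suits small inputs; the efficient algorithm in the paper avoids exactly this by approximating the message-passing step with high-dimensional Gaussian filtering. The names `softmax` and `dense_crf_meanfield` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical helper: naive (O(N^2)) mean-field for a fully connected CRF.
def dense_crf_meanfield(unary, feats, weights, n_iters=5):
    """unary:   (N, C) unary energies, e.g. negative log-probabilities
    feats:   list of (N, D_m) feature arrays, already divided by bandwidths
    weights: list of kernel weights w_m, one per feature array
    Pairwise energy: [x_i != x_j] * sum_m w_m * exp(-|f_i - f_j|^2 / 2)
    """
    n = unary.shape[0]
    K = np.zeros((n, n))
    for w, f in zip(weights, feats):
        sq = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
        K += w * np.exp(-0.5 * sq)
    np.fill_diagonal(K, 0.0)  # no self-interaction
    Q = softmax(-unary)  # initialize from the unaries
    for _ in range(n_iters):
        msg = K @ Q  # msg[i, l] = sum_j K[i, j] * Q[j, l]
        # Potts: label l pays for neighbors' mass on every other label l' != l
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        Q = softmax(-unary - pairwise)  # parallel update of all points
    return Q


# Points 0 and 1 are close, point 2 is far; point 1 is ambiguous on its own
pos = np.array([[0.0], [0.1], [5.0]])
unary = -np.log(np.array([[0.9, 0.1], [0.45, 0.55], [0.1, 0.9]]))
Q = dense_crf_meanfield(unary, [pos], [3.0], n_iters=5)
print(Q.argmax(axis=1))  # -> [0 0 1]
```

The ambiguous point 1 is pulled to its close neighbor's label, while the distant point 2 is essentially unaffected, which is the smoothing behavior dense pairwise terms are used for.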

Proceedings Article

A benchmark for the evaluation of RGB-D SLAM systems

Jürgen Sturm and 4 co-authors

TL;DR: A large set of Microsoft Kinect image sequences, with highly accurate, time-synchronized ground-truth camera poses from a motion capture system, is recorded for the evaluation of RGB-D SLAM systems.

Related Papers (5)

Indoor segmentation and support inference from RGBD images

Fully convolutional networks for semantic segmentation

KinectFusion: Real-time dense surface mapping and tracking

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes