scispace - formally typeset
Proceedings ArticleDOI

Dense 3D semantic mapping of indoor scenes from RGB-D images

TLDR
A novel 2D-3D label transfer based on Bayesian updates and dense pairwise 3D Conditional Random Fields and it is shown that it is not needed to obtain a semantic segmentation for every frame in a sequence in order to create accurate semantic 3D reconstructions.
Abstract
Dense semantic segmentation of 3D point clouds is a challenging task. Many approaches deal with 2D semantic segmentation and can obtain impressive results. With the availability of cheap RGB-D sensors the field of indoor semantic segmentation has seen a lot of progress. Still it remains unclear how to deal with 3D semantic segmentation in the best way. We propose a novel 2D-3D label transfer based on Bayesian updates and dense pairwise 3D Conditional Random Fields. This approach allows us to use 2D semantic segmentations to create a consistent 3D semantic reconstruction of indoor scenes. To this end, we also propose a fast 2D semantic segmentation approach based on Randomized Decision Forests. Furthermore, we show that it is not needed to obtain a semantic segmentation for every frame in a sequence in order to create accurate semantic 3D reconstructions. We evaluate our approach on both NYU Depth datasets and show that we can obtain a significant speed-up compared to other methods.

read more

Citations
More filters
Journal ArticleDOI

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures, including FCN and DeconvNet.
Proceedings ArticleDOI

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

TL;DR: This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Proceedings ArticleDOI

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture

TL;DR: This paper addresses three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling using a multiscale convolutional network that is able to adapt easily to each task using only small modifications.
Proceedings ArticleDOI

3D Semantic Parsing of Large-Scale Indoor Spaces

TL;DR: This paper argues that identification of structural elements in indoor spaces is essentially a detection problem, rather than segmentation which is commonly used, and proposes a method for semantic parsing the 3D point cloud of an entire building using a hierarchical approach.
Posted Content

ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes

TL;DR: The ScanNet dataset as discussed by the authors contains 2.5M RGB-D views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations.
References
More filters
Journal ArticleDOI

Extremely randomized trees

TL;DR: A new tree-based ensemble method for supervised classification and regression problems that consists of randomizing strongly both attribute and cut-point choice while splitting a tree node and builds totally randomized trees whose structures are independent of the output values of the learning sample.
Book ChapterDOI

Indoor segmentation and support inference from RGBD images

TL;DR: The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.
Proceedings ArticleDOI

Real-time human pose recognition in parts from single depth images

TL;DR: This work takes an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem, and generates confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.
Proceedings Article

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

TL;DR: This paper considers fully connected CRF models defined on the complete set of pixels in an image and proposes a highly efficient approximate inference algorithm in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels.
Proceedings ArticleDOI

A benchmark for the evaluation of RGB-D SLAM systems

TL;DR: A large set of image sequences from a Microsoft Kinect with highly accurate and time-synchronized ground truth camera poses from a motion capture system is recorded for the evaluation of RGB-D SLAM systems.
Related Papers (5)