Proceedings ArticleDOI

RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation

TL;DR
This paper presents a novel network that extends the core idea of residual learning to RGB-D semantic segmentation by including multi-modal feature fusion blocks and multi-level feature refinement blocks, and achieves state-of-the-art accuracy on two challenging RGB-D indoor datasets, NYUDv2 and SUN RGB-D.
Abstract
In multi-class indoor semantic segmentation using RGB-D data, it has been shown that incorporating depth features into RGB features helps improve segmentation accuracy. However, previous studies have not fully exploited the potential of multi-modal feature fusion, e.g., simply concatenating RGB and depth features or averaging RGB and depth score maps. To learn the optimal fusion of multi-modal features, this paper presents a novel network that extends the core idea of residual learning to RGB-D semantic segmentation. Our network effectively captures multi-level RGB-D CNN features by including multi-modal feature fusion blocks and multi-level feature refinement blocks. Feature fusion blocks learn residual RGB and depth features and their combinations to fully exploit the complementary characteristics of RGB and depth data. Feature refinement blocks learn the combination of fused features from multiple levels to enable high-resolution prediction. Our network can efficiently train discriminative multi-level features from each modality end-to-end by taking full advantage of skip-connections. Our comprehensive experiments demonstrate that the proposed architecture achieves state-of-the-art accuracy on two challenging RGB-D indoor datasets, NYUDv2 and SUN RGB-D.
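To make the residual-fusion idea concrete, here is a minimal NumPy sketch of a fusion block in the spirit the abstract describes: each modality passes through its own residual unit, and the refined depth features are added to the refined RGB features as a residual correction. All function names, weight shapes, and the use of 1x1 channel mixes in place of full convolutions are simplifying assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(feat, w1, w2):
    # Two 1x1 "convolutions" (per-pixel channel mixes) with a
    # skip-connection, mirroring the residual-learning building block.
    out = relu(np.einsum('chw,cd->dhw', feat, w1))   # channel mix + ReLU
    out = np.einsum('dhw,dc->chw', out, w2)          # channel mix back
    return relu(feat + out)                          # residual addition

def fuse_rgbd(rgb_feat, depth_feat, params):
    # Hypothetical fusion-block sketch: refine each modality with its own
    # residual unit, then sum so the depth stream contributes a residual
    # correction to the RGB stream.
    r = residual_unit(rgb_feat, params['w_r1'], params['w_r2'])
    d = residual_unit(depth_feat, params['w_d1'], params['w_d2'])
    return relu(r + d)

# Toy example: C=4 channels on an 8x8 spatial map.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
params = {k: rng.standard_normal((C, C)) * 0.1
          for k in ('w_r1', 'w_r2', 'w_d1', 'w_d2')}
fused = fuse_rgbd(rng.standard_normal((C, H, W)),
                  rng.standard_normal((C, H, W)),
                  params)
print(fused.shape)  # (4, 8, 8)
```

The fused map keeps the input resolution and channel count, which is what lets such blocks be inserted at multiple levels of the backbone and later combined by the refinement blocks.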


Citations
Posted Content

Image Segmentation Using Deep Learning: A Survey

TL;DR: A comprehensive review of recent pioneering efforts in semantic and instance segmentation is provided, covering convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings.
Journal ArticleDOI

Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks

TL;DR: In this paper, the authors investigate various methods to deal with semantic labeling of very high-resolution multi-modal remote sensing data and propose an efficient multi-scale approach to leverage both a large spatial context and the high resolution data, and investigate early and late fusion of Lidar and multispectral data.
Journal ArticleDOI

Survey on semantic segmentation using deep learning techniques

TL;DR: A survey of semantic segmentation methods is presented, categorizing them into ten classes according to the common concepts underlying their architectures and providing an overview of the publicly available datasets on which they have been assessed.
Proceedings ArticleDOI

Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection

TL;DR: A novel complementarity-aware fusion (CA-Fuse) module is proposed within a convolutional neural network (CNN); the resulting RGB-D fusion network disambiguates both cross-modal and cross-level fusion processes and enables more complete fusion results.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art image classification performance was achieved with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, as discussed by the authors.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Book ChapterDOI

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; the network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.