
Showing papers on "Image segmentation published in 2018"


Proceedings ArticleDOI
Mark Sandler1, Andrew Howard1, Menglong Zhu1, Andrey Zhmoginov1, Liang-Chieh Chen1 
18 Jun 2018
TL;DR: MobileNetV2 as mentioned in this paper is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers, and the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as actual latency and the number of parameters.

9,381 citations
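
As a rough illustration of the inverted residual design described above, the following PyTorch sketch builds one block: a 1x1 expansion, a depthwise 3x3 convolution, and a linear 1x1 projection back to a thin bottleneck, with the shortcut connecting the bottlenecks. The expansion factor, stride handling, and channel counts are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of an inverted residual block (illustrative hyperparameters).
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """Expand -> depthwise 3x3 -> linear project, with a shortcut between thin bottlenecks."""

    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion with ReLU6 non-linearity
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution filters features in the expanded space
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection back to a thin bottleneck (no non-linearity)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out


# Example: a stride-1 block keeps the shortcut between thin bottleneck layers.
block = InvertedResidual(32, 32)
x = torch.randn(1, 32, 56, 56)
print(block(x).shape)  # torch.Size([1, 32, 56, 56])
```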


Posted Content
Mark Sandler1, Andrew Howard1, Menglong Zhu1, Andrey Zhmoginov1, Liang-Chieh Chen1 
TL;DR: A new mobile architecture, MobileNetV2, is described that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes and allows decoupling of the input/output domains from the expressiveness of the transformation.
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as the number of parameters.

8,807 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Abstract: Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our nonlocal models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.

8,059 citations
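
A minimal PyTorch sketch of the non-local idea described above: the response at each position is a softmax-weighted sum of the features at all positions (the embedded-Gaussian form, wrapped in a residual connection). The channel-reduction factor of 2 and the 2D input layout are illustrative assumptions.

```python
# Minimal sketch of a non-local block (embedded-Gaussian form).
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalBlock2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv2d(channels, inter, 1)   # query embedding
        self.phi = nn.Conv2d(channels, inter, 1)     # key embedding
        self.g = nn.Conv2d(channels, inter, 1)       # value embedding
        self.out = nn.Conv2d(inter, channels, 1)     # restore channel count

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)          # (b, hw, c')
        k = self.phi(x).flatten(2)                            # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)              # (b, hw, c')
        attn = F.softmax(q @ k, dim=-1)                       # pairwise weights f(x_i, x_j)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)   # weighted sum over all positions
        return x + self.out(y)                                # residual connection
```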


Proceedings ArticleDOI
18 Jun 2018
TL;DR: PANet as mentioned in this paper enhances the entire feature hierarchy with accurate localization signals in lower layers by bottom-up path augmentation, which shortens the information path between lower layers and topmost feature.
Abstract: The way that information propagates in neural networks is of great importance. In this paper, we propose Path Aggregation Network (PANet) aiming at boosting information flow in proposal-based instance segmentation framework. Specifically, we enhance the entire feature hierarchy with accurate localization signals in lower layers by bottom-up path augmentation, which shortens the information path between lower layers and topmost feature. We present adaptive feature pooling, which links feature grid and all feature levels to make useful information in each level propagate directly to following proposal subnetworks. A complementary branch capturing different views for each proposal is created to further improve mask prediction. These improvements are simple to implement, with subtle extra computational overhead. Yet they are useful and make our PANet reach the 1st place in the COCO 2017 Challenge Instance Segmentation task and the 2nd place in Object Detection task without large-batch training. PANet is also state-of-the-art on MVD and Cityscapes.

3,784 citations
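
The bottom-up path augmentation described above can be sketched roughly as follows: starting from the finest FPN level, each coarser level is fused with a stride-2 convolution of the previous augmented map and then smoothed by a 3x3 convolution. The fixed 256-channel pyramid and the layer names are illustrative assumptions, not the authors' implementation.

```python
# Rough sketch of PANet-style bottom-up path augmentation over FPN outputs.
import torch.nn as nn


class BottomUpPath(nn.Module):
    """Propagate accurate low-level localization signals up the feature pyramid."""

    def __init__(self, channels=256, num_levels=4):
        super().__init__()
        # stride-2 conv downsamples N_i before fusing it with P_{i+1}
        self.down = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(num_levels - 1))
        # 3x3 conv smooths each fused map
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1)
            for _ in range(num_levels - 1))

    def forward(self, p_levels):           # [P2, P3, P4, P5], fine to coarse
        n_levels = [p_levels[0]]           # N2 = P2
        for i, p in enumerate(p_levels[1:]):
            fused = self.down[i](n_levels[-1]) + p
            n_levels.append(self.smooth[i](fused))
        return n_levels                    # [N2, N3, N4, N5]
```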


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented.
Abstract: We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic. In this work, we generate 2048 × 1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

3,457 citations


Posted Content
TL;DR: A novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes is proposed to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs).
Abstract: We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.

2,452 citations
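
A minimal sketch of an additive attention gate in the spirit of the abstract above: skip-connection features and a coarser gating signal are projected, summed, passed through a sigmoid to produce per-pixel attention coefficients, and used to gate the skip features. It assumes the gating signal has already been resized to the skip connection's resolution; channel sizes are illustrative.

```python
# Minimal sketch of an additive attention gate (gating signal pre-resized to x).
import torch
import torch.nn as nn


class AttentionGate(nn.Module):
    """Suppress irrelevant skip-connection features using a coarser gating signal."""

    def __init__(self, x_ch, g_ch, inter_ch):
        super().__init__()
        self.w_x = nn.Conv2d(x_ch, inter_ch, 1, bias=False)   # project skip features
        self.w_g = nn.Conv2d(g_ch, inter_ch, 1, bias=False)   # project gating signal
        self.psi = nn.Conv2d(inter_ch, 1, 1)                  # scalar attention per pixel

    def forward(self, x, g):
        a = torch.relu(self.w_x(x) + self.w_g(g))   # additive attention
        alpha = torch.sigmoid(self.psi(a))          # attention coefficients in [0, 1]
        return x * alpha                            # gate the skip features
```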


Posted Content
TL;DR: This paper presents UNet++, a new, more powerful architecture for medical image segmentation where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways, and argues that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar.
Abstract: In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with U-Net and wide U-Net architectures across multiple medical image segmentation tasks: nodule segmentation in the low-dose CT scans of chest, nuclei segmentation in the microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over U-Net and wide U-Net, respectively.

2,254 citations
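
One nested decoder node of the kind described above can be sketched as follows: node X^{i,j} concatenates all earlier nodes at the same depth with an upsampled version of the node below and applies a convolution block. The assumption that the coarser map has twice the channels and the single conv-BN-ReLU block are illustrative simplifications, not the full UNet++ network.

```python
# Rough sketch of one UNet++ node: X^{i,j} = H([X^{i,0..j-1}, Up(X^{i+1,j-1})]).
import torch
import torch.nn as nn


class NestedNode(nn.Module):
    def __init__(self, ch, num_prev):
        super().__init__()
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)   # upsample the coarser node
        self.conv = nn.Sequential(                               # H(.): conv block
            nn.Conv2d(ch * (num_prev + 1), ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, same_depth_feats, below_feat):
        # concatenate all earlier nodes at this depth with the upsampled node from below
        return self.conv(torch.cat(same_depth_feats + [self.up(below_feat)], dim=1))
```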


Book ChapterDOI
20 Sep 2018
TL;DR: UNet++ as discussed by the authors is a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways.
Abstract: In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The re-designed skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with U-Net and wide U-Net architectures across multiple medical image segmentation tasks: nodule segmentation in the low-dose CT scans of chest, nuclei segmentation in the microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over U-Net and wide U-Net, respectively.

2,067 citations


Journal ArticleDOI
TL;DR: A semantic segmentation neural network that combines the strengths of residual learning and U-Net is proposed for road area extraction; it outperforms all the comparing methods, demonstrating its superiority over recently developed state-of-the-art methods.
Abstract: Road extraction from aerial images has been a hot research topic in the field of remote sensing image analysis. In this letter, a semantic segmentation neural network, which combines the strengths of residual learning and U-Net, is proposed for road area extraction. The network is built with residual units and has an architecture similar to that of U-Net. The benefits of this model are twofold: first, residual units ease training of deep networks. Second, the rich skip connections within the network facilitate information propagation, allowing us to design networks with fewer parameters but better performance. We test our network on a public road data set and compare it with U-Net and two other state-of-the-art deep-learning-based road extraction methods. The proposed approach outperforms all the comparing methods, which demonstrates its superiority over recently developed state-of-the-art methods.

1,564 citations
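
A rough sketch of a pre-activation residual unit of the kind combined with a U-Net layout here: two BN-ReLU-conv stages plus a shortcut, with a 1x1 projection when shapes change. The exact unit layout used in the letter may differ; this is an illustrative assumption.

```python
# Rough sketch of a pre-activation residual unit for a residual U-Net.
import torch.nn as nn


class ResidualUnit(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(                       # BN -> ReLU -> conv, twice
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
        )
        # identity shortcut, or a 1x1 projection when the shape changes
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))

    def forward(self, x):
        return self.body(x) + self.shortcut(x)
```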


Journal ArticleDOI
TL;DR: This work proposes a novel hybrid densely connected UNet (H-DenseUNet), which consists of a 2-D Dense UNet for efficiently extracting intra-slice features and a 3-D counterpart for hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm for liver and tumor segmentation.
Abstract: Liver cancer is one of the leading causes of cancer death. To assist doctors in hepatocellular carcinoma diagnosis and treatment planning, an accurate and automatic liver and tumor segmentation method is highly demanded in clinical practice. Recently, fully convolutional neural networks (FCNs), including 2-D and 3-D FCNs, serve as the backbone in many volumetric image segmentation methods. However, 2-D convolutions cannot fully leverage the spatial information along the third dimension while 3-D convolutions suffer from high computational cost and GPU memory consumption. To address these issues, we propose a novel hybrid densely connected UNet (H-DenseUNet), which consists of a 2-D DenseUNet for efficiently extracting intra-slice features and a 3-D counterpart for hierarchically aggregating volumetric contexts under the spirit of the auto-context algorithm for liver and tumor segmentation. We formulate the learning process of the H-DenseUNet in an end-to-end manner, where the intra-slice representations and inter-slice features can be jointly optimized through a hybrid feature fusion layer. We extensively evaluated our method on the data set of the MICCAI 2017 Liver Tumor Segmentation Challenge and the 3DIRCADb data set. Our method outperformed other state-of-the-art methods on the segmentation of tumors and achieved very competitive performance for liver segmentation even with a single model.

1,561 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: MCD-DA as discussed by the authors aligns distributions of source and target by utilizing the task-specific decision boundaries between classes to detect target samples that are far from the support of the source.
Abstract: In this work, we present a method for unsupervised domain adaptation. Many adversarial learning methods train domain classifier networks to distinguish the features as either a source or target and train a feature generator network to mimic the discriminator. Two problems exist with these methods. First, the domain classifier only tries to distinguish the features as a source or target and thus does not consider task-specific decision boundaries between classes. Therefore, a trained generator can generate ambiguous features near class boundaries. Second, these methods aim to completely match the feature distributions between different domains, which is difficult because of each domain's characteristics. To solve these problems, we introduce a new approach that attempts to align distributions of source and target by utilizing the task-specific decision boundaries. We propose to maximize the discrepancy between two classifiers' outputs to detect target samples that are far from the support of the source. A feature generator learns to generate target features near the support to minimize the discrepancy. Our method outperforms other methods on several datasets of image classification and semantic segmentation. The codes are available at https://github.com/mil-tokyo/MCD_DA
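
The discrepancy that the two classifiers maximize and the feature generator minimizes can be sketched as the mean absolute difference between their softmax outputs. The alternating training steps and the networks G, F1, F2 named in the comment are assumptions for illustration, not the authors' exact training code.

```python
# Minimal sketch of the classifier-discrepancy measure used to detect target
# samples outside the source support.
import torch
import torch.nn.functional as F


def classifier_discrepancy(logits1, logits2):
    """L1 distance between the class-probability outputs of the two classifiers."""
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)
    return (p1 - p2).abs().mean()


# Illustrative use with a hypothetical feature generator G and classifiers F1, F2:
#   step B: maximize classifier_discrepancy(F1(G(x_t)), F2(G(x_t))) w.r.t. F1, F2
#   step C: minimize the same quantity w.r.t. G
```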

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a multi-level adversarial network is proposed to perform output-space domain adaptation at different feature levels, evaluated under various settings including synthetic-to-real and cross-city scenarios.
Abstract: Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the state-of-the-art methods in terms of accuracy and visual quality.
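
A rough sketch of output-space adversarial adaptation: a fully convolutional discriminator classifies segmentation softmax maps as source or target, and the segmentation network is trained on target images to fool it. The discriminator depth, channel widths, and loss weighting below are illustrative assumptions.

```python
# Rough sketch of a discriminator on segmentation outputs and the adversarial loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OutputDiscriminator(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=2, padding=1),   # per-patch source/target logit
        )

    def forward(self, seg_softmax):
        return self.net(seg_softmax)


def adversarial_loss(disc, target_softmax, source_label=1.0):
    # encourage target outputs to look like source outputs to the discriminator
    logits = disc(target_softmax)
    return F.binary_cross_entropy_with_logits(
        logits, torch.full_like(logits, source_label))
```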

Proceedings ArticleDOI
18 Jun 2018
TL;DR: Yu et al. as discussed by the authors proposed a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions.
Abstract: Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feedforward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting.

Proceedings ArticleDOI
12 Mar 2018
TL;DR: DUC is designed to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling, and a hybrid dilated convolution (HDC) framework in the encoding phase is proposed.
Abstract: Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have led to significant improvement over previous semantic segmentation systems. Here we show how to improve pixel-wise semantic segmentation by manipulating convolution-related operations that are of both theoretical and practical value. First, we design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling. Second, we propose a hybrid dilated convolution (HDC) framework in the encoding phase. This framework 1) effectively enlarges the receptive fields (RF) of the network to aggregate global information; 2) alleviates what we call the "gridding issue" caused by the standard dilated convolution operation. We evaluate our approaches thoroughly on the Cityscapes dataset, and achieve a state-of-the-art result of 80.1% mIOU in the test set at the time of submission. We also achieved state-of-the-art results overall on the KITTI road estimation benchmark and the PASCAL VOC2012 segmentation task. Our source code can be found at https://github.com/TuSimple/TuSimple-DUC.
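
The dense upsampling convolution (DUC) idea can be sketched as a convolution that predicts num_classes × r² channels at low resolution, followed by a pixel-shuffle rearrangement into a full-resolution, pixel-level prediction. The upscale factor and channel counts below are illustrative assumptions.

```python
# Minimal sketch of dense upsampling convolution (DUC) with PixelShuffle.
import torch
import torch.nn as nn


class DUC(nn.Module):
    def __init__(self, in_ch, num_classes, upscale=8):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, num_classes * upscale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale)   # (C*r^2, H, W) -> (C, H*r, W*r)

    def forward(self, x):
        return self.shuffle(self.conv(x))


x = torch.randn(1, 2048, 64, 64)            # backbone feature map (illustrative)
print(DUC(2048, 19)(x).shape)               # torch.Size([1, 19, 512, 512])
```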

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN, and can improve the feature representation of relatively shallow networks for the image classification on CIFAR-10 dataset.
Abstract: Recent work has made significant progress in improving spatial resolution for pixelwise labeling with the Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results of 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on the ADE20K test set, which surpasses the winning entry of the COCO-Place Challenge 2017. In addition, we also explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset. Our 14-layer network has achieved an error rate of 3.45%, which is comparable with state-of-the-art approaches with over 10× more layers. The source code for the complete system is publicly available.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: Densely connected Atrous Spatial Pyramid Pooling (DenseASPP) is proposed, which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size.
Abstract: Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in the autonomous driving scene exhibit very large scale changes, which poses great challenges for high-level feature representation in the sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution [14] was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP) [2] was proposed to concatenate multiple atrous-convolved features using different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue the feature resolution in the scale-axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes [4] and achieve state-of-the-art performance.
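
A minimal sketch in the DenseASPP spirit: a sequence of atrous convolutions in which each layer receives the concatenation of the input and all previous outputs, densifying the covered scale range. The dilation rates, growth width, and omission of the paper's channel-reduction layers are illustrative assumptions.

```python
# Rough sketch of densely connected atrous convolutions.
import torch
import torch.nn as nn


class DenseASPP(nn.Module):
    def __init__(self, in_ch, growth=64, rates=(3, 6, 12, 18, 24)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            ch += growth          # dense connectivity grows the next layer's input

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)   # multi-scale features covering a dense scale range
```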

Proceedings ArticleDOI
18 Jun 2018
TL;DR: It is argued that the organization of 3D point clouds can be efficiently captured by a structure called superpoint graph (SPG), derived from a partition of the scanned scene into geometrically homogeneous elements.
Abstract: We propose a novel deep learning-based framework to tackle the challenge of semantic segmentation of large-scale point clouds of millions of points. We argue that the organization of 3D point clouds can be efficiently captured by a structure called superpoint graph (SPG), derived from a partition of the scanned scene into geometrically homogeneous elements. SPGs offer a compact yet rich representation of contextual relationships between object parts, which is then exploited by a graph convolutional network. Our framework sets a new state of the art for segmenting outdoor LiDAR scans (+11.9 and +8.8 mIoU points for both Semantic3D test sets), as well as indoor scans (+12.4 mIoU points for the S3DIS dataset).

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision is presented, which operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass.
Abstract: This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A lightweight and ground-optimized lidar odometry and mapping method, LeGO-LOAM, is proposed for realtime six degree-of-freedom pose estimation with ground vehicles, and is integrated into a SLAM framework to eliminate the pose estimation error caused by drift.
Abstract: We propose a lightweight and ground-optimized lidar odometry and mapping method, LeGO-LOAM, for realtime six degree-of-freedom pose estimation with ground vehicles. LeGO-LOAM is lightweight, as it can achieve realtime pose estimation on a low-power embedded system. LeGO-LOAM is ground-optimized, as it leverages the presence of a ground plane in its segmentation and optimization steps. We first apply point cloud segmentation to filter out noise, and feature extraction to obtain distinctive planar and edge features. A two-step Levenberg-Marquardt optimization method then uses the planar and edge features to solve different components of the six degree-of-freedom transformation across consecutive scans. We compare the performance of LeGO-LOAM with a state-of-the-art method, LOAM, using datasets gathered from variable-terrain environments with ground vehicles, and show that LeGO-LOAM achieves similar or better accuracy with reduced computational expense. We also integrate LeGO-LOAM into a SLAM framework to eliminate the pose estimation error caused by drift, which is tested using the KITTI dataset.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work introduces new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and uses them to develop Spatially-Sparse Convolutional networks, which outperform all prior state-of-the-art models on two tasks involving semantic segmentation of 3D point clouds.
Abstract: Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (e.g., photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them to develop spatially-sparse convolutional networks. We demonstrate the strong performance of the resulting models, called submanifold sparse convolutional networks (SS-CNs), on two tasks involving semantic segmentation of 3D point clouds. In particular, our models outperform all prior state-of-the-art on the test set of a recent semantic segmentation competition.

Posted Content
TL;DR: A Recurrent Convolutional Neural Network (RCNN) based on U-Net as well as a Recurrent Residual Convolutional Neural Network (RRCNN), named RU-Net and R2U-Net respectively, are proposed and show superior performance on segmentation tasks compared to equivalent models including U-Net and residual U-Net.
Abstract: Deep learning (DL) based semantic segmentation methods have been providing state-of-the-art performance in the last few years. More specifically, these techniques have been successfully applied to medical image classification, segmentation, and detection tasks. One deep learning technique, U-Net, has become one of the most popular for these applications. In this paper, we propose a Recurrent Convolutional Neural Network (RCNN) based on U-Net as well as a Recurrent Residual Convolutional Neural Network (RRCNN) based on U-Net models, which are named RU-Net and R2U-Net respectively. The proposed models utilize the power of U-Net, Residual Network, as well as RCNN. There are several advantages of these proposed architectures for segmentation tasks. First, a residual unit helps when training deep architectures. Second, feature accumulation with recurrent residual convolutional layers ensures better feature representation for segmentation tasks. Third, it allows us to design a better U-Net architecture with the same number of network parameters and better performance for medical image segmentation. The proposed models are tested on three benchmark datasets: blood vessel segmentation in retina images, skin cancer segmentation, and lung lesion segmentation. The experimental results show superior performance on segmentation tasks compared to equivalent models including U-Net and residual U-Net (ResU-Net).
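
A rough sketch of a recurrent residual convolutional unit in the spirit of RU-Net/R2U-Net: a convolution applied recurrently with the block input fed back in at each step, wrapped in a residual connection. The number of recurrent steps and the 1x1 channel projection follow common re-implementations and are assumptions here.

```python
# Rough sketch of a recurrent residual convolutional unit.
import torch.nn as nn


class RecurrentConv(nn.Module):
    def __init__(self, ch, t=2):
        super().__init__()
        self.t = t
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h = self.conv(x)                 # initial response
        for _ in range(self.t):
            h = self.conv(x + h)         # feature accumulation over recurrent steps
        return h


class RRConvUnit(nn.Module):
    def __init__(self, in_ch, out_ch, t=2):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, 1)                           # match channel count
        self.body = nn.Sequential(RecurrentConv(out_ch, t), RecurrentConv(out_ch, t))

    def forward(self, x):
        x = self.proj(x)
        return x + self.body(x)          # residual connection eases training
```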

Posted Content
13 Jan 2018
TL;DR: A new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes is described.
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as the number of parameters.

Journal ArticleDOI
TL;DR: M-Net as discussed by the authors combines a multi-scale input layer, U-shape convolutional network, side-output layer, and multi-label loss function for joint OD and OC segmentation.
Abstract: Glaucoma is a chronic eye disease that leads to irreversible vision loss. The cup to disc ratio (CDR) plays an important role in the screening and diagnosis of glaucoma. Thus, the accurate and automatic segmentation of optic disc (OD) and optic cup (OC) from fundus images is a fundamental task. Most existing methods segment them separately, and rely on hand-crafted visual feature from fundus images. In this paper, we propose a deep learning architecture, named M-Net, which solves the OD and OC segmentation jointly in a one-stage multi-label system. The proposed M-Net mainly consists of multi-scale input layer, U-shape convolutional network, side-output layer, and multi-label loss function. The multi-scale input layer constructs an image pyramid to achieve multiple level receptive field sizes. The U-shape convolutional network is employed as the main body network structure to learn the rich hierarchical representation, while the side-output layer acts as an early classifier that produces a companion local prediction map for different scale layers. Finally, a multi-label loss function is proposed to generate the final segmentation map. For improving the segmentation performance further, we also introduce the polar transformation, which provides the representation of the original image in the polar coordinate system. The experiments show that our M-Net system achieves state-of-the-art OD and OC segmentation result on ORIGA data set. Simultaneously, the proposed method also obtains the satisfactory glaucoma screening performances with calculated CDR value on both ORIGA and SCES datasets.

Proceedings ArticleDOI
04 Apr 2018
TL;DR: This work proposes a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships, and validates this approach on Visual Genome and COCO-Stuff.
Abstract: To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and relationships. To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships. Our model uses graph convolution to process input graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout to an image with a cascaded refinement network. The network is trained adversarially against a pair of discriminators to ensure realistic outputs. We validate our approach on Visual Genome and COCO-Stuff, where qualitative results, ablations, and user studies demonstrate our method's ability to generate complex images with multiple objects.

Journal ArticleDOI
TL;DR: A novel brain tumor segmentation method is developed by integrating fully convolutional neural networks (FCNNs) and Conditional Random Fields (CRFs) in a unified framework to obtain segmentation results with appearance and spatial consistency; it segments brain images slice-by-slice, much faster than methods based on image patches.

Journal ArticleDOI
TL;DR: This work introduces an effective technique to enhance the images captured underwater and degraded due to the medium scattering and absorption by building on the blending of two images that are directly derived from a color-compensated and white-balanced version of the original degraded image.
Abstract: We introduce an effective technique to enhance images captured underwater and degraded due to medium scattering and absorption. Our method is a single image approach that does not require specialized hardware or knowledge about the underwater conditions or scene structure. It builds on the blending of two images that are directly derived from a color-compensated and white-balanced version of the original degraded image. The two images to be fused, as well as their associated weight maps, are defined to promote the transfer of edges and color contrast to the output image. To avoid sharp weight map transitions creating artifacts in the low frequency components of the reconstructed image, we also adopt a multiscale fusion strategy. Our extensive qualitative and quantitative evaluation reveals that our enhanced images and videos are characterized by better exposedness of the dark regions, improved global contrast, and edge sharpness. Our validation also proves that our algorithm is reasonably independent of the camera settings, and improves the accuracy of several image processing applications, such as image segmentation and keypoint matching.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a spatial feature transform (SFT) layer was proposed to generate affine transformation parameters for spatial-wise feature modulation in a single-image super-resolution network.
Abstract: Despite that convolutional neural networks (CNN) have recently demonstrated high-quality reconstruction for single-image super-resolution (SR), recovering natural and realistic texture remains a challenging problem. In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network conditioned on semantic segmentation probability maps. This is made possible through a novel Spatial Feature Transform (SFT) layer that generates affine transformation parameters for spatial-wise feature modulation. SFT layers can be trained end-to-end together with the SR network using the same loss function. During testing, it accepts an input image of arbitrary size and generates a high-resolution image with just a single forward pass conditioned on the categorical priors. Our final results show that an SR network equipped with SFT can generate more realistic and visually pleasing textures in comparison to state-of-the-art SRGAN [27] and EnhanceNet [38].
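
A minimal sketch of a Spatial Feature Transform (SFT) layer as described above: a small condition network maps segmentation probability maps to spatial-wise affine parameters (gamma, beta) that modulate intermediate features. Channel sizes and the single shared layer are illustrative assumptions.

```python
# Minimal sketch of a Spatial Feature Transform (SFT) layer.
import torch.nn as nn


class SFTLayer(nn.Module):
    def __init__(self, feat_ch, cond_ch, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(                        # condition network on the prior maps
            nn.Conv2d(cond_ch, hidden, 1), nn.LeakyReLU(0.1, inplace=True))
        self.to_gamma = nn.Conv2d(hidden, feat_ch, 1)       # per-pixel scale
        self.to_beta = nn.Conv2d(hidden, feat_ch, 1)        # per-pixel shift

    def forward(self, feat, cond):
        h = self.shared(cond)
        gamma, beta = self.to_gamma(h), self.to_beta(h)
        return feat * gamma + beta                          # spatial-wise affine modulation
```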

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper introduces the binary segmentation masks to construct synthetic RGB-Mask pairs as inputs, then designs a mask-guided contrastive attention model (MGCAM) to learn features separately from the body and background regions, and proposes a novel region-level triplet loss to restrain the features learnt from different regions.
Abstract: Person Re-identification (ReID) is an important yet challenging task in computer vision. Due to diverse background clutter and variations in viewpoints and body poses, it is far from solved. How to extract discriminative and robust features invariant to background clutter is the core problem. In this paper, we first introduce binary segmentation masks to construct synthetic RGB-Mask pairs as inputs, then we design a mask-guided contrastive attention model (MGCAM) to learn features separately from the body and background regions. Moreover, we propose a novel region-level triplet loss to restrain the features learnt from different regions, i.e., pulling the features from the full image and body region close, whereas pushing the features from backgrounds away. We may be the first to successfully introduce binary masks into the person ReID task and the first to propose region-level contrastive learning. We evaluate the proposed method on three public datasets, including MARS, Market-1501 and CUHK03. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results. Mask and code will be released upon request.

Journal ArticleDOI
TL;DR: A novel deep learning-based interactive segmentation framework by incorporating CNNs into a bounding box and scribble-based segmentation pipeline and proposing a weighted loss function considering network and interaction-based uncertainty for the fine tuning is proposed.
Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they have not demonstrated sufficiently accurate and robust results for clinical use. In addition, they are limited by the lack of image-specific adaptation and the lack of generalizability to previously unseen object classes (a.k.a. zero-shot learning). To address these problems, we propose a novel deep learning-based interactive segmentation framework by incorporating CNNs into a bounding box and scribble-based segmentation pipeline. We propose image-specific fine tuning to make a CNN model adaptive to a specific test image, which can be either unsupervised (without additional user interactions) or supervised (with additional scribbles). We also propose a weighted loss function considering network and interaction-based uncertainty for the fine tuning. We applied this framework to two applications: 2-D segmentation of multiple organs from fetal magnetic resonance (MR) slices, where only two types of these organs were annotated for training and 3-D segmentation of brain tumor core (excluding edema) and whole brain tumor (including edema) from different MR sequences, where only the tumor core in one MR sequence was annotated for training. Experimental results show that: 1) our model is more robust to segment previously unseen objects than state-of-the-art CNNs; 2) image-specific fine tuning with the proposed weighted loss function significantly improves segmentation accuracy; and 3) our method leads to accurate results with fewer user interactions and less user time than traditional interactive segmentation methods.

Journal ArticleDOI
TL;DR: The major issues regarding this multi-step process are summarised, focussing in particular on challenges of the extraction of radiomic features from data sets provided by computed tomography, positron emission tomography, and magnetic resonance imaging.
Abstract: Radiomics is an emerging translational field of research aiming to extract mineable high-dimensional data from clinical images. The radiomic process can be divided into distinct steps with definable inputs and outputs, such as image acquisition and reconstruction, image segmentation, feature extraction and qualification, analysis, and model building. Each step needs careful evaluation for the construction of robust and reliable models to be transferred into clinical practice for the purposes of prognosis, non-invasive disease tracking, and evaluation of disease response to treatment. After the definition of texture parameters (shape features; first-, second-, and higher-order features), we briefly discuss the origin of the term radiomics and the methods for selecting the parameters useful for a radiomic approach, including cluster analysis, principal component analysis, random forest, neural network, linear/logistic regression, and others. Reproducibility and clinical value of parameters should first be tested with internal cross-validation and then validated on independent external cohorts. This article summarises the major issues regarding this multi-step process, focussing in particular on challenges of the extraction of radiomic features from data sets provided by computed tomography, positron emission tomography, and magnetic resonance imaging.