Showing papers on "Image segmentation" published in 2020

Journal Article•DOI•

Mask R-CNN

Kaiming He¹, Georgia Gkioxari¹, Piotr Dollár¹, Ross Girshick¹•Institutions (1)

01 Feb 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Mask R-CNN extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition, achieving state-of-the-art performance in instance segmentation.

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron .

1,506 citations

Journal Article•DOI•

UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation

Zongwei Zhou¹, Mahfuzur Rahman Siddiquee¹, Nima Tajbakhsh¹, Jianming Liang¹•Institutions (1)

Arizona State University¹

01 Jun 2020-IEEE Transactions on Medical Imaging

TL;DR: UNet++ is an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; its redesigned skip connections lead to a highly flexible feature fusion scheme.

Abstract: The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is apriori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects—an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus .

1,487 citations

Proceedings Article•DOI•

nuScenes: A Multimodal Dataset for Autonomous Driving

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, Oscar Beijbom

14 Jun 2020

TL;DR: nuScenes is the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view.

Abstract: Robust detection and tracking of objects is crucial for the deployment of autonomous vehicle technology. Image based benchmark datasets have driven development in computer vision tasks such as object detection, tracking and segmentation of agents in the environment. Most autonomous vehicles, however, carry a combination of cameras and range sensors such as lidar and radar. As machine learning based methods for detection and tracking become more prevalent, there is a need to train and evaluate such methods on datasets containing range sensor data along with images. In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset. We define novel 3D detection and tracking metrics. We also provide careful dataset analysis as well as baselines for lidar and image based detection and tracking. Data, development kit and more information are available online.

1,378 citations

Journal Article•DOI•

MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation

Nabil Ibtehaz¹, M. Sohel Rahman²•Institutions (2)

Samsung¹, Bangladesh University of Engineering and Technology²

01 Jan 2020-Neural Networks

TL;DR: This work develops a novel architecture, MultiResUNet, as a potential successor to the U-Net architecture, and tests and compares it with the classical U-Net on a vast repertoire of multimodal medical images.

1,027 citations

Proceedings Article•DOI•

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Qingyong Hu¹, Bo Yang¹, Linhai Xie¹, Stefano Rosa¹, Yulan Guo², Zhihua Wang¹, Niki Trigoni¹, Andrew Markham¹•Institutions (2)

University of Oxford¹, National University of Defense Technology²

14 Jun 2020

TL;DR: This paper introduces RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds, and introduces a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details.

Abstract: We study the problem of efficient semantic segmentation for large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches are only able to be trained and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation and memory efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass with up to 200x faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks Semantic3D and SemanticKITTI.

977 citations

Posted Content•

Image Segmentation Using Deep Learning: A Survey

Shervin Minaee, Yuri Boykov¹, Fatih Porikli², Antonio Plaza³, Nasser Kehtarnavaz⁴, Demetri Terzopoulos⁵•Institutions (5)

University of Waterloo¹, Australian National University², University of Extremadura³, University of Texas at Dallas⁴, University of California, Los Angeles⁵

15 Jan 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A comprehensive review of recent pioneering efforts in semantic and instance segmentation is provided, covering convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings.

Abstract: Image segmentation is a key topic in image processing and computer vision with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarity, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.

950 citations

Proceedings Article•DOI•

UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation

Huimin Huang¹, Lanfen Lin¹, Ruofeng Tong¹, Hongjie Hu², Qiaowei Zhang², Yutaro Iwamoto³, Xian-Hua Han³, Yen-Wei Chen³, Jian Wu¹•Institutions (3)

Zhejiang University¹, Sir Run Run Shaw Hospital², Ritsumeikan University³

04 May 2020

TL;DR: A novel UNet 3+ is proposed, which takes advantage of full-scale skip connections and deep supervisions, and can reduce the network parameters to improve the computation efficiency.

Abstract: Recently, a growing interest has been seen in deep learning-based semantic segmentation. UNet, which is one of deep learning networks with an encoder-decoder architecture, is widely used in medical image segmentation. Combining multi-scale features is one of important factors for accurate segmentation. UNet++ was developed as a modified Unet by designing an architecture with nested and dense skip connections. However, it does not explore sufficient information from full scales and there is still a large room for improvement. In this paper, we propose a novel UNet 3+, which takes advantage of full-scale skip connections and deep supervisions. The full-scale skip connections incorporate low-level details with high-level semantics from feature maps in different scales; while the deep supervision learns hierarchical representations from the full-scale aggregated feature maps. The proposed method is especially benefiting for organs that appear at varying scales. In addition to accuracy improvements, the proposed UNet 3+ can reduce the network parameters to improve the computation efficiency. We further propose a hybrid loss function and devise a classification-guided module to enhance the organ boundary and reduce the over-segmentation in a non-organ image, yielding more accurate segmentation results. The effectiveness of the proposed method is demonstrated on two datasets. The code is available at: github.com/ZJUGiveLab/UNet-Version

897 citations

Journal Article•DOI•

Inf-Net: Automatic COVID-19 Lung Infection Segmentation From CT Images

Deng-Ping Fan, Tao Zhou, Ge-Peng Ji¹, Yi Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao²•Institutions (2)

Wuhan University¹, Zayed University²

22 May 2020-IEEE Transactions on Medical Imaging

TL;DR: This work proposes a COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) to automatically identify infected regions from chest CT slices, in which a parallel partial decoder aggregates the high-level features and generates a global map.

Abstract: Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network ( Inf-Net ) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net , a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.

633 citations

Journal Article•DOI•

Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation

Nima Tajbakhsh, Laura Jeyaseelan, Qian Li, Jeffrey N. Chiang, Zhihao Wu, Xiaowei Ding

01 Jul 2020-Medical Image Analysis

TL;DR: This article provides a detailed review of deep learning solutions for medical image segmentation with imperfect datasets, summarizing both the technical novelties and empirical results; it also compares the benefits and requirements of the surveyed methodologies and provides recommended solutions.

487 citations

Proceedings Article•DOI•

PointPainting: Sequential Fusion for 3D Object Detection

Sourabh Vora, Alex H. Lang, Bassam Helou, Oscar Beijbom

14 Jun 2020

TL;DR: PointPainting projects lidar points into the output of an image-only semantic segmentation network and appends the class scores to each point; the painted point cloud can then be fed to any lidar-only method.

Abstract: Camera and lidar are important sensor modalities for robotics in general and self-driving cars in particular. The sensors provide complementary information offering an opportunity for tight sensor-fusion. Surprisingly, lidar-only methods outperform fusion methods on the main benchmark datasets, suggesting a gap in the literature. In this work, we propose PointPainting: a sequential fusion method to fill this gap. PointPainting works by projecting lidar points into the output of an image-only semantic segmentation network and appending the class scores to each point. The appended (painted) point cloud can then be fed to any lidar-only method. Experiments show large improvements on three different state-of-the art methods, Point-RCNN, VoxelNet and PointPillars on the KITTI and nuScenes datasets. The painted version of PointRCNN represents a new state of the art on the KITTI leaderboard for the bird's-eye view detection task. In ablation, we study how the effects of Painting depends on the quality and format of the semantic segmentation output, and demonstrate how latency can be minimized through pipelining.
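
The painting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the lidar points are already expressed in the camera frame, uses a simple pinhole projection with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and drops points that fall outside the image, none of which the abstract specifies.

```python
import math

def paint_points(points, seg_scores, fx, fy, cx, cy):
    """Append per-pixel class scores to 3D points (illustrative sketch).

    points: iterable of (x, y, z) in the camera frame (assumption).
    seg_scores: seg_scores[v][u] is the list of class scores at pixel (u, v).
    fx, fy, cx, cy: hypothetical pinhole intrinsics.
    """
    height = len(seg_scores)
    width = len(seg_scores[0])
    painted = []
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            # "Painting": concatenate the class scores onto the point.
            painted.append((x, y, z, *seg_scores[v][u]))
    return painted
```

Each returned tuple is the original point followed by its class scores, which is the form a lidar-only detector would consume in place of the raw point.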

486 citations

Proceedings Article•DOI•

A survey of loss functions for semantic segmentation

Shruti Jadon¹•Institutions (1)

University of Massachusetts Amherst¹

27 Oct 2020

TL;DR: A new log-cosh dice loss function is introduced and it is showcased that certain loss functions perform well across all data-sets and can be taken as a good baseline choice in unknown data distribution scenarios.

Abstract: Image Segmentation has been an active field of research as it has a wide range of applications, ranging from automated disease detection to self driving cars. In the past five years, various papers came up with different objective loss functions used in different cases such as biased data, sparse segmentation, etc. In this paper, we have summarized some of the well-known loss functions widely used for Image Segmentation and listed out the cases where their usage can help in fast and better convergence of a model. Furthermore, we have also introduced a new log-cosh dice loss function and compared its performance on NBFS skull-segmentation open source data-set with widely used loss functions. We also showcased that certain loss functions perform well across all data-sets and can be taken as a good baseline choice in unknown data distribution scenarios.
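
The abstract names a log-cosh dice loss without giving its formula. The sketch below assumes it is the logarithm of the hyperbolic cosine of a standard soft Dice loss; the smoothing term `eps` and the flat-list input format are illustrative choices, not details from the abstract.

```python
import math

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss over flat sequences of probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)

def log_cosh_dice_loss(pred, target):
    """Assumed form: log(cosh(Dice loss))."""
    return math.log(math.cosh(dice_loss(pred, target)))
```

Under this assumed form the loss is zero for a perfect prediction and grows smoothly as the overlap shrinks.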

Journal Article•DOI•

The ApolloScape Open Dataset for Autonomous Driving and Its Application

Xinyu Huang¹, Peng Wang¹, Cheng Xinjing¹, Dingfu Zhou¹, Qichuan Geng¹, Ruigang Yang¹•Institutions (1)

Baidu¹

01 Oct 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper provides a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving.

Abstract: Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks like 3D map construction, self-localization, parsing the driving road and understanding objects, which enable vehicles to reason and act. However, large-scale datasets for training and system evaluation are still a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with existing public datasets from real scenes, e.g., KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labelling, including a holistic semantic dense point cloud for each site, stereo, per-pixel semantic labelling, lanemark labelling, instance segmentation, 3D car instances, and highly accurate location for every frame in various driving videos from multiple sites, cities and daytimes. For each task, it contains at least 15x more images than SOTA datasets. To label such a complete dataset, we develop various tools and algorithms specific to each task to accelerate the labelling process, such as joint 3D-2D segment labeling, active labelling in videos, etc. Building on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial for achieving a more robust and accurate system. We expect that our dataset and the proposed algorithms can support and motivate researchers in the further development of multi-sensor fusion and multi-task learning in the field of computer vision.

Proceedings Article•DOI•

PointRend: Image Segmentation As Rendering

Alexander Kirillov¹, Yuxin Wu¹, Kaiming He¹, Ross Girshick¹•Institutions (1)

Facebook¹

14 Jun 2020

TL;DR: PointRend is a point-based rendering module that performs segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm, producing crisp object boundaries in regions that are over-smoothed by previous methods.

Abstract: We present a new method for efficient high-quality image segmentation of objects and scenes. By analogizing classical computer graphics methods for efficient rendering with over- and undersampling challenges faced in pixel labeling tasks, we develop a unique perspective of image segmentation as a rendering problem. From this vantage, we present the PointRend (Point-based Rendering) neural network module: a module that performs point-based segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm. PointRend can be flexibly applied to both instance and semantic segmentation tasks by building on top of existing state-of-the-art models. While many concrete implementations of the general idea are possible, we show that a simple design already achieves excellent results. Qualitatively, PointRend outputs crisp object boundaries in regions that are over-smoothed by previous methods. Quantitatively, PointRend yields significant gains on COCO and Cityscapes, for both instance and semantic segmentation. PointRend's efficiency enables output resolutions that are otherwise impractical in terms of memory or computation compared to existing approaches. Code has been made available at https://github.com/facebookresearch/detectron2/tree/master/projects/PointRend.

Proceedings Article•DOI•

Multi-Scale Progressive Fusion Network for Single Image Deraining

Kui Jiang¹, Zhongyuan Wang¹, Peng Yi¹, Chen Chen², Baojin Huang¹, Yimin Luo³, Jiayi Ma¹, Junjun Jiang⁴•Institutions (4)

Wuhan University¹, University of North Carolina at Charlotte², King's College London³, Harbin Institute of Technology⁴

14 Jun 2020

TL;DR: This work explores the multi-scale collaborative representation for rain streaks from the perspective of input image scales and hierarchical deep features in a unified framework, termed multi-scale progressive fusion network (MSPFN) for single image rain streak removal.

Abstract: Rain streaks in the air appear in various blurring degrees and resolutions due to different distances from their positions to the camera. Similar rain patterns are visible in a rain image as well as its multi-scale (or multi-resolution) versions, which makes it possible to exploit such complementary information for rain streak representation. In this work, we explore the multi-scale collaborative representation for rain streaks from the perspective of input image scales and hierarchical deep features in a unified framework, termed multi-scale progressive fusion network (MSPFN) for single image rain streak removal. For the similar rain streaks at different positions, we employ recurrent calculation to capture the global texture, thus allowing to explore the complementary and redundant information at the spatial dimension to characterize target rain streaks. Besides, we construct multi-scale pyramid structure, and further introduce the attention mechanism to guide the fine fusion of these correlated information from different scales. This multi-scale progressive fusion strategy not only promotes the cooperative representation, but also boosts the end-to-end training. Our proposed method is extensively evaluated on several benchmark datasets and achieves the state-of-the-art results. Moreover, we conduct experiments on joint deraining, detection, and segmentation tasks, and inspire a new research direction of vision task driven image deraining. The source code is available at https://github.com/kuihua/MSPFN.

Proceedings Article•DOI•

Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings

Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger

14 Jun 2020

TL;DR: A powerful student-teacher framework for the challenging problem of unsupervised anomaly detection and pixel-precise anomaly segmentation in high-resolution images, in which student networks are trained to regress the output of a descriptive teacher network that was pretrained on a large dataset of patches from natural images.

Abstract: We introduce a powerful student-teacher framework for the challenging problem of unsupervised anomaly detection and pixel-precise anomaly segmentation in high-resolution images. Student networks are trained to regress the output of a descriptive teacher network that was pretrained on a large dataset of patches from natural images. This circumvents the need for prior data annotation. Anomalies are detected when the outputs of the student networks differ from that of the teacher network. This happens when they fail to generalize outside the manifold of anomaly-free training data. The intrinsic uncertainty in the student networks is used as an additional scoring function that indicates anomalies. We compare our method to a large number of existing deep learning based methods for unsupervised anomaly detection. Our experiments demonstrate improvements over state-of-the-art methods on a number of real-world datasets, including the recently introduced MVTec Anomaly Detection dataset that was specifically designed to benchmark anomaly segmentation algorithms.
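
The two scoring signals mentioned in this abstract (student-teacher disagreement and the students' own uncertainty) can be sketched for a single descriptor as follows. This is an assumed formulation, not the authors' exact one: disagreement is taken as the squared distance between the mean student output and the teacher output, and uncertainty as the average squared deviation of the students from their mean.

```python
def anomaly_scores(teacher_out, student_outs):
    """Return (regression_error, uncertainty) for one descriptor.

    teacher_out: list of floats, the teacher's descriptor.
    student_outs: list of descriptors, one per student network.
    Both score definitions are illustrative assumptions.
    """
    n = len(student_outs)
    dim = len(teacher_out)
    mean = [sum(s[i] for s in student_outs) / n for i in range(dim)]
    # How far the student ensemble's mean prediction is from the teacher.
    regression_error = sum((m - t) ** 2 for m, t in zip(mean, teacher_out))
    # How much the students disagree among themselves.
    uncertainty = sum(
        sum((s[i] - mean[i]) ** 2 for i in range(dim)) for s in student_outs
    ) / n
    return regression_error, uncertainty
```

On anomaly-free data both values should stay small; either one rising at a location flags it as a candidate anomaly.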

Journal Article•DOI•

Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and segmentation

Amine Amyar¹,², Romain Modzelewski², Hua Li³, Su Ruan²•Institutions (3)

GE Healthcare¹, University of Rouen², University of Illinois at Urbana–Champaign³

08 Oct 2020-Computers in Biology and Medicine

TL;DR: An automatic classification and segmentation tool to help screen for COVID-19 pneumonia using chest CT imaging, which shows very encouraging performance, with a Dice coefficient higher than 0.88 for the segmentation and an area under the ROC curve higher than 97% for the classification.

Proceedings Article•DOI•

PolarMask: Single Shot Instance Segmentation With Polar Representation

Enze Xie¹, Peize Sun², Xiaoge Song³, Wenhai Wang³, Xuebo Liu⁴, Ding Liang⁴, Chunhua Shen⁵, Ping Luo¹•Institutions (5)

University of Hong Kong¹, Xi'an Jiaotong University², Nanjing University³, SenseTime⁴, University of Adelaide⁵

14 Jun 2020

TL;DR: PolarMask formulates instance segmentation as predicting the contour of an instance through instance center classification and dense distance regression in polar coordinates, and can be easily embedded into most off-the-shelf detection methods.

Abstract: In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used by easily embedding it into most off-the-shelf detection methods. Our method, termed PolarMask, formulates the instance segmentation problem as predicting contour of instance through instance center classification and dense distance regression in a polar coordinate. Moreover, we propose two effective approaches to deal with sampling high-quality center examples and optimization for dense distance regression, respectively, which can significantly improve the performance and simplify the training process. Without any bells and whistles, PolarMask achieves 32.9% in mask mAP with single-model and single-scale training/testing on the challenging COCO dataset. For the first time, we show that the complexity of instance segmentation, in terms of both design and computation complexity, can be the same as bounding box object detection and this much simpler and flexible instance segmentation framework can achieve competitive accuracy. We hope that the proposed PolarMask framework can serve as a fundamental and strong baseline for single shot instance segmentation task.
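
To make the polar representation concrete, the sketch below turns a predicted center and a list of ray lengths into contour vertices. It assumes the rays are evenly spaced over 360 degrees starting at angle zero; the abstract does not state the number of rays or their layout, so both are illustrative.

```python
import math

def polar_to_contour(center, distances):
    """Convert a center and per-ray distances into (x, y) contour vertices.

    Assumes len(distances) rays at evenly spaced angles (an assumption).
    """
    cx, cy = center
    n = len(distances)
    contour = []
    for i, d in enumerate(distances):
        angle = 2.0 * math.pi * i / n
        contour.append((cx + d * math.cos(angle), cy + d * math.sin(angle)))
    return contour
```

With this encoding a mask is described by one center plus a fixed-length vector of distances, which is why it fits naturally next to bounding-box regression in a detector head.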

Journal Article•DOI•

A Noise-Robust Framework for Automatic Segmentation of COVID-19 Pneumonia Lesions From CT Images

Guotai Wang¹, Xinglong Liu², Chaoping Li, Zhiyong Xu, Jiugen Ruan, Haifeng Zhu, Tao Meng, Kang Li³, Ning Huang², Shaoting Zhang¹•Institutions (3)

University of Electronic Science and Technology of China¹, SenseTime², Sichuan University³

05 Jun 2020-IEEE Transactions on Medical Imaging

TL;DR: A noise-robust Dice loss that is a generalization of Dice loss for segmentation and Mean Absolute Error (MAE) loss for robustness against noise is introduced and combined with an adaptive self-ensembling framework for training.

Abstract: Segmentation of pneumonia lesions from CT scans of COVID-19 patients is important for accurate diagnosis and follow-up. Deep learning has a potential to automate this task but requires a large set of high-quality annotations that are difficult to collect. Learning from noisy training labels that are easier to obtain has a potential to alleviate this problem. To this end, we propose a novel noise-robust framework to learn from noisy labels for the segmentation task. We first introduce a noise-robust Dice loss that is a generalization of Dice loss for segmentation and Mean Absolute Error (MAE) loss for robustness against noise, then propose a novel COVID-19 Pneumonia Lesion segmentation network (COPLE-Net) to better deal with the lesions with various scales and appearances. The noise-robust Dice loss and COPLE-Net are combined with an adaptive self-ensembling framework for training, where an Exponential Moving Average (EMA) of a student model is used as a teacher model that is adaptively updated by suppressing the contribution of the student to EMA when the student has a large training loss. The student model is also adaptive by learning from the teacher only when the teacher outperforms the student. Experimental results showed that: (1) our noise-robust Dice loss outperforms existing noise-robust loss functions, (2) the proposed COPLE-Net achieves higher performance than state-of-the-art image segmentation networks, and (3) our framework with adaptive self-ensembling significantly outperforms a standard training process and surpasses other noise-robust training approaches in the scenario of learning from noisy labels for COVID-19 pneumonia lesion segmentation.
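
The adaptive teacher update described here can be sketched as an exponential moving average whose student contribution is switched off when the student's loss is large. The hard threshold `loss_threshold`, the `decay` value, and the dictionary-of-floats weight format are all hypothetical; the abstract does not give the actual suppression rule.

```python
def update_teacher(teacher, student, student_loss, loss_threshold, decay=0.99):
    """EMA update of teacher weights with adaptive suppression (sketch).

    teacher, student: dicts mapping parameter names to float values.
    If the student's loss exceeds loss_threshold (hypothetical rule),
    the student contributes nothing and the teacher is left unchanged.
    """
    alpha = 1.0 if student_loss > loss_threshold else decay
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }
```

A softer schedule, in which `alpha` rises gradually with the loss, would fit the same description; the hard switch is used here only to keep the example short.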

Journal Article•DOI•

Deep Neural Networks Motivated by Partial Differential Equations

Lars Ruthotto¹, Eldad Haber²•Institutions (2)

Emory University¹, University of British Columbia²

01 Apr 2020-Journal of Mathematical Imaging and Vision

TL;DR: This article establishes a new PDE interpretation of a class of deep convolutional neural networks (CNNs) that are commonly used to learn from speech, image, and video data.

Abstract: Partial differential equations (PDEs) are indispensable for modeling many physical phenomena and also commonly used for solving image processing tasks. In the latter area, PDE-based approaches interpret image data as discretizations of multivariate functions and the output of image processing algorithms as solutions to certain PDEs. Posing image processing problems in the infinite-dimensional setting provides powerful tools for their analysis and solution. For the last few decades, the reinterpretation of classical image processing problems through the PDE lens has been creating multiple celebrated approaches that benefit a vast area of tasks including image segmentation, denoising, registration, and reconstruction. In this paper, we establish a new PDE interpretation of a class of deep convolutional neural networks (CNN) that are commonly used to learn from speech, image, and video data. Our interpretation includes convolution residual neural networks (ResNet), which are among the most promising approaches for tasks such as image classification having improved the state-of-the-art performance in prestigious benchmark challenges. Despite their recent successes, deep ResNets still face some critical challenges associated with their design, immense computational costs and memory requirements, and lack of understanding of their reasoning. Guided by well-established PDE theory, we derive three new ResNet architectures that fall into two new classes: parabolic and hyperbolic CNNs. We demonstrate how PDE theory can provide new insights and algorithms for deep learning and demonstrate the competitiveness of three new CNN architectures using numerical experiments.
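
The abstract does not state the equations behind this interpretation. One common way to read a residual network as a differential equation, given here as an assumed illustration with step size $h$, layer function $F$, features $Y_j$ and weights $\theta_j$ as notation introduced for this sketch, is to view each residual block as a forward Euler step:

```latex
% Residual block with features Y_j and weights \theta_j:
Y_{j+1} = Y_j + h \, F(\theta_j, Y_j), \qquad j = 0, \dots, N-1 .

% As h \to 0 this is a forward Euler discretization of the initial value problem
\partial_t Y(t) = F\bigl(\theta(t), Y(t)\bigr), \qquad Y(0) = Y_0 .
```

When $F$ is built from convolutions, which act as discretized differential operators, the continuous problem becomes a PDE, and properties such as stability can then be studied with PDE tools.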

Journal Article•DOI•

Xiaolin Zhang¹, Yunchao Wei¹, Yi Yang¹, Thomas S. Huang²•Institutions (2)

University of Technology, Sydney¹, University of Illinois at Urbana–Champaign²

04 Jun 2020-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This article proposes a simple yet effective similarity guidance network (SG-One) for one-shot segmentation, which predicts the segmentation mask of a query image with reference to one densely labeled support image of the same category.

Abstract: One-shot image semantic segmentation poses a challenging task of recognizing the object regions from unseen categories with only one annotated example as supervision. In this article, we propose a simple yet effective similarity guidance network to tackle the one-shot (SG-One) segmentation problem. We aim at predicting the segmentation mask of a query image with the reference to one densely labeled support image of the same category. To obtain the robust representative feature of the support image, we first adopt a masked average pooling strategy for producing the guidance features by only taking the pixels belonging to the support image into account. We then leverage the cosine similarity to build the relationship between the guidance features and features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adopted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework that can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves the mIoU score of 46.3%, surpassing the baseline methods.
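
The two steps named in the abstract above, masked average pooling followed by a cosine-similarity map, can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation: it assumes feature maps of shape (C, H, W), a binary support mask that marks the object pixels, and the hypothetical function names `masked_average_pooling` and `cosine_similarity_map`.

```python
import numpy as np


def masked_average_pooling(support_features, support_mask):
    """Average the support features over the masked (object) pixels only.

    support_features: array of shape (C, H, W).
    support_mask: binary array of shape (H, W); 1 marks object pixels (assumed).
    Returns a guidance vector of shape (C,).
    """
    mask = support_mask.astype(support_features.dtype)
    area = mask.sum()
    if area == 0:
        raise ValueError("support mask has no foreground pixels")
    # Broadcasting applies the (H, W) mask to every channel.
    return (support_features * mask).sum(axis=(1, 2)) / area


def cosine_similarity_map(query_features, guidance, eps=1e-8):
    """Cosine similarity between the guidance vector and each query pixel.

    query_features: array of shape (C, H, W).
    guidance: array of shape (C,).
    Returns a similarity map of shape (H, W) with values in [-1, 1].
    """
    dots = np.tensordot(guidance, query_features, axes=([0], [0]))
    norms = np.linalg.norm(query_features, axis=0) * np.linalg.norm(guidance)
    # eps guards against division by zero for all-zero feature vectors.
    return dots / np.maximum(norms, eps)
```

In this sketch the similarity map is the only output; how it is used to guide the segmentation branch is not specified in the abstract and is left out here.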

Proceedings Article•DOI•

FDA: Fourier Domain Adaptation for Semantic Segmentation

Yanchao Yang¹, Stefano Soatto¹•Institutions (1)

University of California, Los Angeles¹

14 Jun 2020

TL;DR: A simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other. The results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away.

Abstract: We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other. We illustrate the method in semantic segmentation, where densely annotated images are aplenty in one domain (synthetic data), but difficult to obtain in another (real images). Current state-of-the-art methods are complex, some requiring adversarial optimization to render the backbone of a neural network invariant to the discrete domain selection variable. Our method does not require any training to perform the domain alignment, just a simple Fourier Transform and its inverse. Despite its simplicity, it achieves state-of-the-art performance in the current benchmarks, when integrated into a relatively standard semantic segmentation model. Our results indicate that even simple procedures can discount nuisance variability in the data that more sophisticated methods struggle to learn away.
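
The spectrum swap described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: it assumes single-channel images of equal shape, assumes that only the amplitude of the low frequencies is swapped while the source phase is kept (the abstract says only "low-frequency spectrum"), and uses a hypothetical parameter `beta` to set the size of the swapped window.

```python
import numpy as np


def swap_low_frequency(source, target, beta=0.1):
    """Give `source` the low-frequency amplitude of `target`.

    source, target: 2-D float arrays of the same shape.
    beta: hypothetical knob; the swapped window is a square of half-width
          floor(min(H, W) * beta) centred on the zero frequency.
    Returns a real 2-D array with the same shape as `source`.
    """
    if source.shape != target.shape:
        raise ValueError("source and target must have the same shape")

    # After fftshift the zero frequency sits at index (H // 2, W // 2).
    src_fft = np.fft.fftshift(np.fft.fft2(source))
    tgt_fft = np.fft.fftshift(np.fft.fft2(target))

    src_amp, src_phase = np.abs(src_fft), np.angle(src_fft)
    tgt_amp = np.abs(tgt_fft)

    h, w = source.shape
    half = int(np.floor(min(h, w) * beta))
    cy, cx = h // 2, w // 2
    rows = slice(cy - half, cy + half + 1)
    cols = slice(cx - half, cx + half + 1)

    # Replace only the centred low-frequency block of the amplitude.
    mixed_amp = src_amp.copy()
    mixed_amp[rows, cols] = tgt_amp[rows, cols]

    # Recombine with the source phase and transform back to image space.
    mixed_fft = mixed_amp * np.exp(1j * src_phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed_fft)))
```

No training is involved, which matches the abstract's point that the alignment needs only a Fourier transform and its inverse. With `beta=0` only the zero-frequency term is exchanged, so for non-negative images the output keeps the source structure and takes the target's mean intensity.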

Proceedings Article•DOI•

DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation

Debesh Jha, Michael Riegler, Dag Johansen, Pål Halvorsen¹, Håvard D. Johansen•Institutions (1)

Metropolitan University¹

28 Jul 2020

TL;DR: Encouraging results show that DoubleU-Net can be used as a strong baseline for both medical image segmentation and cross-dataset evaluation testing to measure the generalizability of Deep Learning (DL) models.

Abstract: Semantic image segmentation is the process of labeling each pixel of an image with its corresponding class. An encoder-decoder based approach, like U-Net and its variants, is a popular strategy for solving medical image segmentation tasks. To improve the performance of U-Net on various segmentation tasks, we propose a novel architecture called DoubleU-Net, which is a combination of two U-Net architectures stacked on top of each other. The first U-Net uses a pre-trained VGG-19 as the encoder, which has already learned features from ImageNet and can be transferred to another task easily. To capture more semantic information efficiently, we added another U-Net at the bottom. We also adopt Atrous Spatial Pyramid Pooling (ASPP) to capture contextual information within the network. We have evaluated DoubleU-Net using four medical segmentation datasets, covering various imaging modalities such as colonoscopy, dermoscopy, and microscopy. Experiments on the MICCAI 2015 segmentation challenge, the CVC-ClinicDB, the 2018 Data Science Bowl challenge, and the Lesion boundary segmentation datasets demonstrate that the DoubleU-Net outperforms U-Net and the baseline models. Moreover, DoubleU-Net produces more accurate segmentation masks, especially in the case of the CVC-ClinicDB and MICCAI 2015 segmentation challenge datasets, which have challenging images such as smaller and flat polyps. These results show the improvement over the existing U-Net model. The encouraging results, produced on various medical image segmentation datasets, show that DoubleU-Net can be used as a strong baseline for both medical image segmentation and cross-dataset evaluation testing to measure the generalizability of Deep Learning (DL) models.

Proceedings Article•DOI•

SEAN: Image Synthesis With Semantic Region-Adaptive Normalization

Peihao Zhu¹, Rameen Abdal¹, Yipeng Qin, Peter Wonka¹•Institutions (1)

King Abdullah University of Science and Technology¹

14 Jun 2020

TL;DR: Semantic Region Adaptive Normalization (SEAN) as mentioned in this paper is a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image.

Abstract: We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.

Proceedings Article•DOI•

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Bowen Cheng¹, Maxwell D. Collins¹, Yukun Zhu¹, Ting Liu¹, Thomas S. Huang², Hartwig Adam¹, Liang-Chieh Chen¹•Institutions (2)

Google¹, University of Illinois at Urbana–Champaign²

14 Jun 2020

TL;DR: Panoptic-DeepLab adopts dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods.

Abstract: In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve comparable performance of two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first at all three Cityscapes benchmarks, setting the new state-of-art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real-time with a single 1025x2049 image (15.8 frames per second), while achieving a competitive performance on Cityscapes (54.1 PQ% on test set). On Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the challenge winner in 2018 by a healthy margin of 1.5%. Finally, our Panoptic-DeepLab also performs on par with several top-down approaches on the challenging COCO dataset. For the first time, we demonstrate a bottom-up approach could deliver state-of-the-art results on panoptic segmentation.

Book Chapter•DOI•

Kvasir-SEG: A Segmented Polyp Dataset

Debesh Jha, Pia H. Smedsrud, Michael Riegler, Pål Halvorsen, Thomas de Lange¹, Dag Johansen, Håvard D. Johansen•Institutions (1)

University of Oslo¹

05 Jan 2020

TL;DR: This paper presents Kvasir-SEG: an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist, and demonstrates the use of the dataset with a traditional segmentation approach and a modern deep-learning based Convolutional Neural Network approach.

Abstract: Pixel-wise image segmentation is a highly demanding task in medical-image analysis. In practice, it is difficult to find annotated medical images with corresponding segmentation masks. In this paper, we present Kvasir-SEG: an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist. Moreover, we also generated the bounding boxes of the polyp regions with the help of segmentation masks. We demonstrate the use of our dataset with a traditional segmentation approach and a modern deep-learning based Convolutional Neural Network (CNN) approach. The dataset will be of value for researchers to reproduce results and compare methods. By adding segmentation masks to the Kvasir dataset, which only provide frame-wise annotations, we enable multimedia and computer vision researchers to contribute in the field of polyp segmentation and automatic analysis of colonoscopy images.

Proceedings Article•DOI•

Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

Yude Wang¹, Jie Zhang¹, Meina Kan¹, Shiguang Shan¹, Xilin Chen¹•Institutions (1)

Chinese Academy of Sciences¹

14 Jun 2020

TL;DR: This paper proposes a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap between full and weak supervision.

Abstract: Image-level weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years. Most of advanced solutions exploit class activation map (CAM). However, CAMs can hardly serve as the object mask due to the gap between full and weak supervisions. In this paper, we propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap. Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation, whose pixel-level labels take the same spatial transformation as the input images during data augmentation. However, this constraint is lost on the CAMs trained by image-level supervision. Therefore, we propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning. Moreover, we propose a pixel correlation module (PCM), which exploits context appearance information and refines the prediction of current pixel by its similar neighbors, leading to further improvement on CAMs consistency. Extensive experiments on PASCAL VOC 2012 dataset demonstrate our method outperforms state-of-the-art methods using the same level of supervision. The code is released online.
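
The equivariance constraint described above says that a class activation map computed on a transformed image should match the transformed map of the original image. A minimal NumPy sketch of such a consistency term follows. It is an illustration only: the mean absolute difference is an assumed choice of distance, `cam_fn` stands in for whatever network produces the CAMs, and `equivariance_loss` and `horizontal_flip` are hypothetical names.

```python
import numpy as np


def horizontal_flip(array):
    """Flip an array along its last (width) axis."""
    return array[..., ::-1]


def equivariance_loss(cam_fn, image, transform):
    """Mean absolute gap between CAM(transform(x)) and transform(CAM(x)).

    cam_fn: callable mapping an image array to an activation map whose
            spatial layout matches the input (a stand-in for a network).
    image: input array; the last axis is treated as width.
    transform: spatial transformation applicable to images and maps alike.
    The loss is zero exactly when cam_fn commutes with transform on `image`.
    """
    cam_of_transformed = cam_fn(transform(image))
    transformed_cam = transform(cam_fn(image))
    return float(np.mean(np.abs(cam_of_transformed - transformed_cam)))
```

A pixel-wise function commutes with a flip and therefore yields zero loss, whereas a function that depends on pixel order does not; the second case is the kind of inconsistency the regularization is meant to penalize.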

Journal Article•DOI•

A Multi-Organ Nucleus Segmentation Challenge

Neeraj Kumar¹, Ruchika Verma², Deepak Anand³, Yanning Zhou⁴, Omer Fahri Onder, E. D. Tsougenis, Hao Chen, Pheng-Ann Heng⁴, Jiahui Li⁵, Zhiqiang Hu⁶, Yunzhi Wang⁷, Navid Alemi Koohbanani⁸, Mostafa Jahanifar⁸, Neda Zamani Tajeddin⁸, Ali Gooya⁸, Nasir M. Rajpoot⁸, Xuhua Ren⁹, Sihang Zhou¹⁰, Qian Wang⁹, Dinggang Shen¹⁰, Cheng-Kun Yang, Chi-Hung Weng, Wei-Hsiang Yu, Chao-Yuan Yeh, Shuang Yang¹¹, Shuoyu Xu¹², Pak-Hei Yeung¹³, Peng Sun¹², Amirreza Mahbod¹⁴, Gerald Schaefer¹⁵, Isabella Ellinger¹⁴, Rupert Ecker, Örjan Smedby¹⁶, Chunliang Wang¹⁶, Benjamin Chidester¹⁷, That-Vinh Ton¹⁸, Minh-Triet Tran¹⁹, Jian Ma¹⁷, Minh N. Do¹⁸, Simon Graham⁸, Quoc Dang Vu²⁰, Jin Tae Kwak²⁰, Akshaykumar Gunda²¹, Raviteja Chunduri³, Corey Hu²², Xiaoyang Zhou²³, Dariush Lotfi²⁴, Reza Safdari²⁴, Antanas Kascenas, Alison O'Neil, Dennis Eschweiler²⁵, Johannes Stegmaier²⁵, Yanping Cui²⁶, Baocai Yin, Kailin Chen, Xinmei Tian²⁶, Philipp Gruening²⁷, Erhardt Barth²⁷, Elad Arbel²⁸, Itay Remer²⁸, Amir Ben-Dor²⁸, Ekaterina Sirazitdinova, Matthias Kohl, Stefan Braunewell, Yuexiang Li²⁹, Xinpeng Xie²⁹, Linlin Shen²⁹, Jun Ma³⁰, Krishanu Das Baksi³¹, Mohammad Azam Khan³², Jaegul Choo³², Adrián Colomer³³, Valery Naranjo³³, Linmin Pei³⁴, Khan M. Iftekharuddin³⁴, Kaushiki Roy³⁵, Debotosh Bhattacharjee³⁵, Anibal Pedraza³⁶, Maria Gloria Bueno³⁶, Sabarinathan Devanathan³⁷, Saravanan Radhakrishnan³⁷, Praveen Koduganty³⁷, Zihan Wu³⁸, Guanyu Cai³⁹, Xiaojie Liu³⁹, Yuqin Wang³⁹, Amit Sethi³•Institutions (39)

University of Illinois at Chicago¹, Case Western Reserve University², Indian Institute of Technology Bombay³, The Chinese University of Hong Kong⁴, Beijing University of Posts and Telecommunications⁵, Peking University⁶, University of Oklahoma⁷, University of Warwick⁸, Shanghai Jiao Tong University⁹, University of North Carolina at Chapel Hill¹⁰, Zhejiang University¹¹, Sun Yat-sen University¹², University of Hong Kong¹³, Medical University of Vienna¹⁴, Loughborough University¹⁵, Royal Institute of Technology¹⁶, Carnegie Mellon University¹⁷, University of Illinois at Urbana–Champaign¹⁸, Vietnam National University, Ho Chi Minh City¹⁹, Sejong University²⁰, Indian Institute of Technology Madras²¹, University of California, Berkeley²², Hong Kong University of Science and Technology²³, Islamic Azad University²⁴, RWTH Aachen University²⁵, University of Science and Technology of China²⁶, University of Lübeck²⁷, Agilent Technologies²⁸, Shenzhen University²⁹, Nanjing University of Science and Technology³⁰, Tata Consultancy Services³¹, Korea University³², Polytechnic University of Valencia³³, Old Dominion University³⁴, Jadavpur University³⁵, University of Castilla–La Mancha³⁶, Cognizant³⁷, Xiamen University³⁸, Tongji University³⁹

01 May 2020-IEEE Transactions on Medical Imaging

TL;DR: In the MoNuSeg 2018 challenge, several of the top techniques compared favorably to an individual human annotator and can be used with confidence for nuclear morphometrics; color normalization and heavy data augmentation were among the trends that contributed to increased accuracy.

Abstract: Generalized nucleus segmentation techniques can contribute greatly to reducing the time to develop and validate visual biomarkers for new digital pathology datasets. We summarize the results of MoNuSeg 2018 Challenge whose objective was to develop generalizable nuclei segmentation techniques in digital pathology. The challenge was an official satellite event of the MICCAI 2018 conference in which 32 teams with more than 80 participants from geographically diverse institutes participated. Contestants were given a training set with 30 images from seven organs with annotations of 21,623 individual nuclei. A test dataset with 14 images taken from seven organs, including two organs that did not appear in the training set was released without annotations. Entries were evaluated based on average aggregated Jaccard index (AJI) on the test set to prioritize accurate instance segmentation as opposed to mere semantic segmentation. More than half the teams that completed the challenge outperformed a previous baseline. Among the trends observed that contributed to increased accuracy were the use of color normalization as well as heavy data augmentation. Additionally, fully convolutional networks inspired by variants of U-Net, FCN, and Mask-RCNN were popularly used, typically based on ResNet or VGG base architectures. Watershed segmentation on predicted semantic segmentation maps was a popular post-processing strategy. Several of the top techniques compared favorably to an individual human annotator and can be used with confidence for nuclear morphometrics.

Journal Article•DOI•

Generalizing Deep Learning for Medical Image Segmentation to Unseen Domains via Deep Stacked Transformation

Ling Zhang¹, Xiaosong Wang¹, Dong Yang¹, Thomas Sanford², Stephanie Harmon, Baris Turkbey², Bradford J. Wood², Holger R. Roth¹, Andriy Myronenko¹, Daguang Xu¹, Ziyue Xu¹•Institutions (2)

Nvidia¹, National Institutes of Health²

12 Feb 2020-IEEE Transactions on Medical Imaging

TL;DR: A deep stacked transformation approach for domain generalization (BigAug) that, when trained on large datasets and applied to unseen domains, reaches the performance of state-of-the-art fully supervised models trained and tested on their source domains, and that can be generalized to the design of highly robust deep segmentation models for clinical deployment.

Abstract: Recent advances in deep learning for medical image segmentation demonstrate expert-level accuracy. However, application of these models in clinically realistic environments can result in poor generalization and decreased accuracy, mainly due to the domain shift across different hospitals, scanner vendors, imaging protocols, and patient populations etc. Common transfer learning and domain adaptation techniques are proposed to address this bottleneck. However, these solutions require data (and annotations) from the target domain to retrain the model, and are therefore restrictive in practice for widespread model deployment. Ideally, we wish to have a trained (locked) model that can work uniformly well across unseen domains without further training. In this paper, we propose a deep stacked transformation approach for domain generalization. Specifically, a series of $n$ stacked transformations are applied to each image during network training. The underlying assumption is that the “expected” domain shift for a specific medical imaging modality could be simulated by applying extensive data augmentation on a single source domain, and consequently, a deep model trained on the augmented “big” data (BigAug) could generalize well on unseen domains. We exploit four surprisingly effective, but previously understudied, image-based characteristics for data augmentation to overcome the domain generalization problem. We train and evaluate the BigAug model (with $n = 9$ transformations) on three different 3D segmentation tasks (prostate gland, left atrial, left ventricle) covering two medical imaging modalities (MRI and ultrasound) involving eight publicly available challenge datasets.
The results show that when training on a relatively small dataset (n = 10~32 volumes, depending on the size of the available datasets) from a single source domain: (i) BigAug models degrade an average of 11% (Dice score change) from source to unseen domain, substantially better than conventional augmentation (degrading 39%) and CycleGAN-based domain adaptation method (degrading 25%), (ii) BigAug is better than “shallower” stacked transforms (i.e. those with fewer transforms) on unseen domains and demonstrates modest improvement to conventional augmentation on the source domain, (iii) after training with BigAug on one source domain, performance on an unseen domain is similar to training a model from scratch on that domain when using the same number of training samples. When training on large datasets (n = 465 volumes) with BigAug, (iv) application to unseen domains reaches the performance of state-of-the-art fully supervised models that are trained and tested on their source domains. These findings establish a strong benchmark for the study of domain generalization in medical imaging, and can be generalized to the design of highly robust deep segmentation models for clinical deployment.

Journal Article•DOI•

CPFNet: Context Pyramid Fusion Network for Medical Image Segmentation

Shuanglang Feng¹, Heming Zhao¹, Fei Shi¹, Xuena Cheng¹, Meng Wang¹, Yuhui Ma¹, Dehui Xiang¹, Weifang Zhu¹, Xinjian Chen¹•Institutions (1)

Soochow University (Suzhou)¹

27 Mar 2020-IEEE Transactions on Medical Imaging

TL;DR: Experimental results show that the proposed Context Pyramid Fusion Network (CPFNet) is very competitive with other state-of-the-art methods on four different challenging tasks: skin lesion segmentation, retinal linear lesion segmentation, multi-class segmentation of thoracic organs at risk, and multi-class segmentation of retinal edema lesions.

Abstract: Accurate and automatic segmentation of medical images is a crucial step for clinical diagnosis and analysis. The convolutional neural network (CNN) approaches based on the U-shape structure have achieved remarkable performances in many different medical image segmentation tasks. However, the context information extraction capability of single stage is insufficient in this structure, due to the problems such as imbalanced class and blurred boundary. In this paper, we propose a novel Context Pyramid Fusion Network (named CPFNet) by combining two pyramidal modules to fuse global/multi-scale context information. Based on the U-shape structure, we first design multiple global pyramid guidance (GPG) modules between the encoder and the decoder, aiming at providing different levels of global context information for the decoder by reconstructing skip-connection. We further design a scale-aware pyramid fusion (SAPF) module to dynamically fuse multi-scale context information in high-level features. These two pyramidal modules can exploit and fuse rich context information progressively. Experimental results show that our proposed method is very competitive with other state-of-the-art methods on four different challenging tasks, including skin lesion segmentation, retinal linear lesion segmentation, multi-class segmentation of thoracic organs at risk and multi-class segmentation of retinal edema lesions.

Journal Article•DOI•

Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation

Cheng Chen¹, Qi Dou¹, Hao Chen¹, Jing Qin², Pheng-Ann Heng¹•Institutions (2)

The Chinese University of Hong Kong¹, Hong Kong Polytechnic University²

10 Feb 2020-IEEE Transactions on Medical Imaging

TL;DR: In this paper, the authors proposed a novel unsupervised domain adaptation framework, named as synergistic image and feature alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain.

Abstract: Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when being deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named as Synergistic Image and Feature Alignment (SIFA) , to effectively adapt a segmentation network to an unlabeled target domain. Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features by leveraging adversarial learning in multiple aspects and with a deeply supervised mechanism. The feature encoder is shared between both adaptive perspectives to leverage their mutual benefits via end-to-end learning. We have extensively evaluated our method with cardiac substructure segmentation and abdominal multi-organ segmentation for bidirectional cross-modality adaptation between MRI and CT images. Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images, and outperforms the state-of-the-art domain adaptation approaches by a large margin.
