Deep Learning for Multi-task Medical Image
Segmentation in Multiple Modalities
Pim Moeskops (1,2; corresponding author), Jelmer M. Wolterink (1),
Bas H.M. van der Velden (1), Kenneth G.A. Gilhuijs (1), Tim Leiner (3),
Max A. Viergever (1), and Ivana Išgum (1)
(1) Image Sciences Institute, University Medical Center Utrecht,
Utrecht, The Netherlands
pim@isi.uu.nl
(2) Medical Image Analysis, Eindhoven University of Technology,
Eindhoven, The Netherlands
(3) Department of Radiology, University Medical Center Utrecht,
Utrecht, The Netherlands
Abstract. Automatic segmentation of medical images is an important
task for many clinical applications. In practice, a wide range of
anatomical structures are visualised using different imaging modalities.
In this paper, we investigate whether a single convolutional neural
network (CNN) can be trained to perform different segmentation tasks.
A single CNN is trained to segment six tissues in MR brain images,
the pectoral muscle in MR breast images, and the coronary arteries in
cardiac CTA. The CNN therefore learns to identify the imaging modality,
the visualised anatomical structures, and the tissue classes.
For each of the three tasks (brain MRI, breast MRI and cardiac CTA),
this combined training procedure resulted in a segmentation performance
equivalent to that of a CNN trained specifically for that task,
demonstrating the high capacity of CNN architectures. Hence, a single
system could be used in clinical practice to automatically perform
diverse segmentation tasks without task-specific training.
Keywords: Deep learning · Convolutional neural networks · Medical
image segmentation · Brain MRI · Breast MRI · Cardiac CTA
P. Moeskops and J.M. Wolterink contributed equally.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part II, LNCS 9901, pp. 478–486, 2016.
DOI: 10.1007/978-3-319-46723-8_55

1 Introduction
Automatic segmentation is an important task in medical images acquired with
different modalities visualising a wide range of anatomical structures. A common
approach to automatic segmentation is the use of supervised voxel classification,
where a classifier is trained to assign a class label to each voxel. The classical
approach to supervised classification is to train a classifier that discriminates
between tissue classes based on a set of hand-crafted features. In contrast to this
approach, convolutional neural networks (CNNs) automatically extract features
that are optimised for the classification task at hand. CNNs have been
successfully applied to medical image segmentation of e.g. knee cartilage [11],
brain regions [1,10], the pancreas [12], and coronary artery calcifications [18].
Each of these studies employed CNNs, but problem-specific optimisations with
respect to the network architecture were still performed and networks were only
trained to perform one specific task.
CNNs have not only been used for processing of medical images, but also for
natural images. CNN architectures designed for image classification in natural
images [7] have shown great generalisability for divergent tasks such as image
segmentation [13], object detection [3], and object localisation in medical image
analysis [17]. Hence, CNN architectures may have the flexibility to be used for
different tasks with limited modifications.
In this study, we first investigate the feasibility of using a single CNN
architecture for different medical image segmentation tasks in different imaging
modalities visualising different anatomical structures. Secondly, we investigate
the feasibility of using a single trained instance of this CNN architecture for
different segmentation tasks. Such a system would be able to perform multiple
tasks in different modalities without problem-specific tailoring of the network
architecture or hyperparameters. Hence, the network recognises the modality of
the image, the anatomy visualised in the image, and the tissues of interest. We
demonstrate this concept using three different and potentially adversarial
medical image segmentation problems: segmentation of six brain tissues in brain
MRI, pectoral muscle segmentation in breast MRI, and coronary artery
segmentation in cardiac CT angiography (CTA).
2 Data
Brain MRI. 34 T1-weighted MR brain images from the OASIS project [9]
were acquired on a Siemens Vision 1.5 T scanner, as provided by the MICCAI
challenge on multi-atlas labelling [8]¹. The images were acquired with voxel
sizes of 1.0 × 1.0 × 1.25 mm³ and resampled to isotropic voxel sizes of
1.0 × 1.0 × 1.0 mm³. The images were manually segmented, in the coronal plane,
into 134 classes that were, for the purpose of this paper, combined into six
commonly used tissue classes: white matter, cortical grey matter, basal ganglia
and thalami, ventricular cerebrospinal fluid, cerebellum, and brain stem.
Breast MRI. 34 T1-weighted MR breast images were acquired on a Siemens
Magnetom 1.5 T scanner with a dedicated double breast array coil [16]. The
images were acquired with in-plane voxel sizes between 1.21 and 1.35 mm and
slice thicknesses between 1.35 and 1.69 mm. All images were resampled to
isotropic voxel sizes corresponding to their in-plane voxel size. The pectoral
muscle was manually segmented in the axial plane by contour drawing.
¹ https://masi.vuse.vanderbilt.edu/workshop2012

Cardiac CTA. Ten cardiac CTA scans were acquired on a 256-detector row
Philips Brilliance iCT scanner using 120 kVp and 200–300 mAs, with
ECG-triggering and contrast enhancement. The reconstructed images had between
0.4 and 0.5 mm in-plane voxel sizes and 0.45/0.90 mm slice spacing/thickness.
All images were resampled to isotropic 0.45 × 0.45 × 0.45 mm³ voxel size. To
set a manual reference standard, a human observer traversed the scan in the
craniocaudal direction and painted voxels in the main coronary arteries and
their branches in the axial plane.
3 Method
All voxels in the images were labelled by a CNN using seven different training
experiments (Fig. 1).
3.1 CNN Architecture
For each voxel, three orthogonal (axial, sagittal, and coronal) patches of
51 × 51 voxels centred at the target voxel were extracted. For each of these
three patches, features were determined using a deep stack of convolution
layers. Each convolution layer contained 32 small (3 × 3 voxels) convolution
kernels for a total of 25 convolution layers [14]. To prevent over- or
undersegmentation of structures due to translational invariance, no subsampling
layers were used. To reduce the number of trainable parameters in the network
and hence the risk of over-fitting, the same stack of convolutional layers was
used for the axial, sagittal and coronal patches.
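The triplanar patch extraction described at the start of this subsection could be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: the function name `triplanar_patches`, the `volume[z][y][x]` axis order, the mapping of planes to fixed axes, and the choice to reject voxels too close to the border are all assumptions, since the text does not state how the image was indexed or how borders were handled.

```python
def triplanar_patches(volume, z, y, x, size=51):
    """Return (axial, sagittal, coronal) size-by-size patches centred on (z, y, x).

    `volume` is a nested list indexed as volume[z][y][x] (assumed axis order).
    """
    half = size // 2
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])

    # Assumption: voxels whose patches would leave the volume are rejected.
    for centre, extent in ((z, depth), (y, height), (x, width)):
        if centre - half < 0 or centre + half >= extent:
            raise ValueError("patch would extend beyond the volume")

    zs = range(z - half, z + half + 1)
    ys = range(y - half, y + half + 1)
    xs = range(x - half, x + half + 1)

    axial = [[volume[z][j][i] for i in xs] for j in ys]     # z held fixed
    coronal = [[volume[k][y][i] for i in xs] for k in zs]   # y held fixed
    sagittal = [[volume[k][j][x] for j in ys] for k in zs]  # x held fixed
    return axial, sagittal, coronal
```

All three patches share the target voxel as their central element, which is what lets a single shared convolution stack process them in the same way.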
[Figure 1: three 51 × 51 input patches (axial, sagittal, coronal), each passed
through 25 convolution layers of 32 3 × 3 kernels to give 32 features of size
1 × 1; two fully connected layers of 192 nodes; output layer with the classes
cerebellum, basal ganglia and thalami, ventricular cerebrospinal fluid, white
matter, brain stem, cortical grey matter, pectoral muscle, coronary artery, and
background; overview of training experiments 1–7.]
Fig. 1. Example 51×51 triplanar input patches (left). CNN architecture with 25 shared
convolution layers, 2 fully connected layers and an output layer with at most 9 classes,
including a background class common among tasks (centre). Output classes included
in each training experiment (right).

The output of the convolution layers was 32 features for each of the three
orthogonal input patches, hence 96 features in total. These features were input
to two subsequent fully connected layers, each with 192 nodes. The second fully
connected layer was connected to a softmax classification layer. Depending on
the tasks of the network, this layer contained 2, 3, 7, 8 or 9 output nodes. The
fully connected layers were implemented as 1 × 1 voxel convolutions, to allow
fast processing of arbitrarily sized images. Exponential linear units [2] were
used for all non-linear activation functions. Batch normalisation [5] was used
on all layers and dropout [15] was used on the fully connected layers.
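The layer sizes above can be checked with a little arithmetic. The sketch below assumes unpadded, stride-1 convolutions: the text does not state the padding, but Fig. 1 shows 1 × 1 feature maps for 51 × 51 inputs, which is what 25 unpadded 3 × 3 layers would give. The parameter count further assumes a single input channel, a bias per kernel and per node, and ignores batch-normalisation parameters. The function names are hypothetical.

```python
def valid_conv_output_size(input_size, kernel_size=3, num_layers=25):
    """Spatial size after stacking unpadded, stride-1 convolutions.

    Each layer trims (kernel_size - 1) voxels from every spatial dimension.
    """
    return input_size - num_layers * (kernel_size - 1)


def count_parameters(num_classes, in_channels=1, kernels=32, kernel_size=3,
                     num_layers=25, fc_nodes=192, num_planes=3):
    """Weights plus biases (assumed), excluding batch-normalisation parameters."""
    conv = 0
    channels = in_channels
    for _ in range(num_layers):
        conv += channels * kernels * kernel_size ** 2 + kernels
        channels = kernels
    # The convolution stack is shared by the three planes, so it counts once.
    features = num_planes * kernels              # 3 x 32 = 96 features
    fc1 = features * fc_nodes + fc_nodes
    fc2 = fc_nodes * fc_nodes + fc_nodes
    out = fc_nodes * num_classes + num_classes
    return conv + fc1 + fc2 + out


print(valid_conv_output_size(51))    # a 51 x 51 patch shrinks to 1 x 1
print(valid_conv_output_size(200))   # a 200 x 200 slice yields 150 x 150 outputs
print(count_parameters(num_classes=9))
```

Under the same assumptions, implementing the fully connected layers as 1 × 1 convolutions means a larger input simply produces a larger output map, one prediction per position, instead of requiring a separate forward pass per voxel.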
3.2 Training Experiments
The same model was trained for each combination of the three tasks. In total
seven training experiments were performed (Fig. 1, right): three networks were
trained to perform one task (Experiments 1–3), three networks were trained to
perform two tasks (Experiments 4–6), and one network was trained to perform
three tasks (Experiment 7). The number of output nodes in the CNN was modified
accordingly. In each experiment, background classes of the target tasks were
merged into one class.
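The seven task combinations and their output-layer sizes follow directly from the class counts given earlier (six brain tissues, one pectoral muscle class, one coronary artery class, plus one shared background class). The short enumeration below illustrates this; the order in which it lists the experiments within each group is an assumption and need not match the numbering in Fig. 1.

```python
from itertools import combinations

# Foreground classes per task as described in the text. Every task also has a
# background class, and the backgrounds are merged into one shared class.
FOREGROUND = {
    "brain MRI": 6,
    "breast MRI": 1,
    "cardiac CTA": 1,
}


def output_nodes(tasks):
    """Number of softmax output nodes for a network trained on `tasks`."""
    return sum(FOREGROUND[task] for task in tasks) + 1  # +1 shared background


# All non-empty task combinations: 3 single, 3 pairs, 1 triple = 7 experiments.
experiments = [combo
               for size in (1, 2, 3)
               for combo in combinations(FOREGROUND, size)]

for number, tasks in enumerate(experiments, start=1):
    print(number, " + ".join(tasks), output_nodes(tasks))
```

The distinct node counts that result are 2, 3, 7, 8 and 9, matching the output layer sizes listed for the softmax layer.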
Each CNN was trained using mini-batch learning. A mini-batch contained
210 samples, equally balanced over the tasks of the network. For each task, the
training samples were randomly drawn from all training images, balanced over
the task-specific classes. All voxels with image intensity > 0 were considered
samples. The network parameters were optimised using Adam stochastic
optimisation [6] with categorical cross-entropy as the cost function.
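A balanced mini-batch of this kind could be drawn roughly as follows. This is a sketch under stated assumptions, not the authors' sampler: the function name and the `candidates` data layout are hypothetical, samples are drawn with replacement, and class balance is achieved in expectation by picking a class uniformly at random for every draw (the text does not say whether class balance was exact). Note that 210 divides evenly by 1, 2 and 3 tasks.

```python
import random


def sample_minibatch(candidates, batch_size=210, rng=None):
    """Draw a mini-batch balanced over tasks and, in expectation, over classes.

    candidates: {task: {class_label: [sample, ...]}} (hypothetical layout),
    where a sample could be, for example, a voxel coordinate.
    Returns a shuffled list of (task, class_label, sample) tuples.
    """
    rng = rng or random.Random()
    if batch_size % len(candidates) != 0:
        raise ValueError("batch size must divide evenly over the tasks")
    per_task = batch_size // len(candidates)

    batch = []
    for task, by_class in candidates.items():
        labels = list(by_class)
        for _ in range(per_task):
            label = rng.choice(labels)             # uniform over classes
            sample = rng.choice(by_class[label])   # uniform within the class
            batch.append((task, label, sample))
    rng.shuffle(batch)
    return batch
```

Drawing the class before the voxel keeps small structures such as the coronary arteries from being swamped by background voxels, which dominate the images.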
4 Experiments and Results
The data for brain MRI, breast MRI and cardiac CTA were split into 14/20,
14/20 and 6/4 training/test images, respectively. Four results were obtained for
each task: one with a network trained for only that task, two with networks
trained for that task and an additional task, and one with a network trained for
all tasks together. Each network was trained with 25000 mini-batches per task.
No post-processing steps other than probability thresholding for evaluation
purposes were performed. The results are presented on the full test set. In brain
MRI, the voxel class labels were determined by the highest class activation.
The performance was evaluated per brain tissue type, using the Dice coefficient
between the manual and automatic segmentations. In breast MRI and cardiac
CTA, precision-recall curve analysis was performed to identify the optimal
operating point, defined, for each experiment, as the highest Dice coefficient
over the whole test set. The thresholds at this optimal operating point were
then applied to all images.
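The evaluation can be illustrated with a small sketch. The Dice coefficient between a reference set A and an automatic segmentation B is 2|A ∩ B| / (|A| + |B|). The code below computes it on flattened binary voxel lists and picks the probability threshold with the highest Dice over all pooled test voxels. The function names, the candidate threshold grid, and the convention of returning 1.0 when both segmentations are empty are assumptions made for illustration.

```python
def dice(reference, prediction):
    """Dice coefficient between two equally long binary voxel sequences."""
    overlap = sum(1 for r, p in zip(reference, prediction) if r and p)
    total = sum(1 for r in reference if r) + sum(1 for p in prediction if p)
    # Assumed convention: two empty segmentations agree perfectly.
    return 2.0 * overlap / total if total else 1.0


def best_threshold(reference, probabilities, thresholds):
    """Return (threshold, dice) with the highest Dice over the pooled voxels."""
    best = None
    for threshold in thresholds:
        prediction = [p >= threshold for p in probabilities]
        score = dice(reference, prediction)
        if best is None or score > best[1]:
            best = (threshold, score)
    return best


# Toy example: four voxels pooled over a (hypothetical) test set.
reference = [1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.1]
print(best_threshold(reference, probabilities, [0.25, 0.5, 0.75]))
```

For the multi-class brain task, the same `dice` function would be applied per tissue, treating the voxels whose highest class activation is that tissue as the prediction.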
Figure 2 shows the results of the described quantitative analysis, performed at
intervals of 1000 mini-batches per task. As the networks learned, the obtained
Dice coefficients increased and the stability of the results improved. For each
segmentation task, the learning curves were similar for all experiments.
Nevertheless, slight differences were visible between the obtained learning
curves. To assess whether these differences were systematic or caused by the
stochastic

Fig. 2. Learning curves showing Dice coefficients for tissue segmentation in brain MRI
(top three rows), breast MRI (bottom left), and cardiac CTA (bottom right), reported
at 1000 mini-batch intervals for experiments including that task. The line colours
correspond to the training experiments in Fig. 1.

References
[5] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
[6] Adam: A Method for Stochastic Optimization
[7] ImageNet Classification with Deep Convolutional Neural Networks
[14] Very Deep Convolutional Networks for Large-Scale Image Recognition
[15] Dropout: A Simple Way to Prevent Neural Networks from Overfitting