Deep learning for multi-task medical image segmentation in multiple modalities

doi:10.1007/978-3-319-46723-8_55


Pim Moeskops¹,² (✉), Jelmer M. Wolterink¹, Bas H.M. van der Velden¹, Kenneth G.A. Gilhuijs¹, Tim Leiner³, Max A. Viergever¹, and Ivana Išgum¹

¹ Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands
pim@isi.uu.nl

² Medical Image Analysis, Eindhoven University of Technology, Eindhoven, The Netherlands

³ Department of Radiology, University Medical Center Utrecht, Utrecht, The Netherlands

Abstract. Automatic segmentation of medical images is an important task for many clinical applications. In practice, a wide range of anatomical structures are visualised using different imaging modalities. In this paper, we investigate whether a single convolutional neural network (CNN) can be trained to perform different segmentation tasks.

A single CNN is trained to segment six tissues in MR brain images, the pectoral muscle in MR breast images, and the coronary arteries in cardiac CTA. The CNN therefore learns to identify the imaging modality, the visualised anatomical structures, and the tissue classes.

For each of the three tasks (brain MRI, breast MRI and cardiac CTA), this combined training procedure resulted in a segmentation performance equivalent to that of a CNN trained specifically for that task, demonstrating the high capacity of CNN architectures. Hence, a single system could be used in clinical practice to automatically perform diverse segmentation tasks without task-specific training.

Keywords: Deep learning · Convolutional neural networks · Medical image segmentation · Brain MRI · Breast MRI · Cardiac CTA

1 Introduction

Automatic segmentation is an important task in medical images acquired with different modalities visualising a wide range of anatomical structures. A common approach to automatic segmentation is the use of supervised voxel classification, where a classifier is trained to assign a class label to each voxel. The classical approach to supervised classification is to train a classifier that discriminates between tissue classes based on a set of hand-crafted features. In contrast to this approach, convolutional neural networks (CNNs) automatically extract features that are optimised for the classification task at hand. CNNs have been successfully applied to medical image segmentation of e.g. knee cartilage [11], brain regions [1,10], the pancreas [12], and coronary artery calcifications [18]. Each of these studies employed CNNs, but problem-specific optimisations with respect to the network architecture were still performed and networks were only trained to perform one specific task.

P. Moeskops and J.M. Wolterink contributed equally.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part II, LNCS 9901, pp. 478–486, 2016. DOI: 10.1007/978-3-319-46723-8_55

CNNs have not only been used for processing of medical images, but also for natural images. CNN architectures designed for image classification in natural images [7] have shown great generalisability for divergent tasks such as image segmentation [13], object detection [3], and object localisation in medical image analysis [17]. Hence, CNN architectures may have the flexibility to be used for different tasks with limited modifications.

In this study, we first investigate the feasibility of using a single CNN architecture for different medical image segmentation tasks in different imaging modalities visualising different anatomical structures. Secondly, we investigate the feasibility of using a single trained instance of this CNN architecture for different segmentation tasks. Such a system would be able to perform multiple tasks in different modalities without problem-specific tailoring of the network architecture or hyperparameters. Hence, the network recognises the modality of the image, the anatomy visualised in the image, and the tissues of interest. We demonstrate this concept using three different and potentially adversarial medical image segmentation problems: segmentation of six brain tissues in brain MRI, pectoral muscle segmentation in breast MRI, and coronary artery segmentation in cardiac CT angiography (CTA).

2 Data

Brain MRI – 34 T1-weighted MR brain images from the OASIS project [9] were acquired on a Siemens Vision 1.5 T scanner, as provided by the MICCAI challenge on multi-atlas labelling [8]¹. The images were acquired with voxel sizes of 1.0 × 1.0 × 1.25 mm³ and resampled to isotropic voxel sizes of 1.0 × 1.0 × 1.0 mm³. The images were manually segmented, in the coronal plane, into 134 classes that were, for the purpose of this paper, combined into six commonly used tissue classes: white matter, cortical grey matter, basal ganglia and thalami, ventricular cerebrospinal fluid, cerebellum, and brain stem.

Breast MRI – 34 T1-weighted MR breast images were acquired on a Siemens Magnetom 1.5 T scanner with a dedicated double breast array coil [16]. The images were acquired with in-plane voxel sizes between 1.21 and 1.35 mm and slice thicknesses between 1.35 and 1.69 mm. All images were resampled to isotropic voxel sizes corresponding to their in-plane voxel size. The pectoral muscle was manually segmented in the axial plane by contour drawing.

¹ https://masi.vuse.vanderbilt.edu/workshop2012.


Cardiac CTA – Ten cardiac CTA scans were acquired on a 256-detector row Philips Brilliance iCT scanner using 120 kVp and 200–300 mAs, with ECG-triggering and contrast enhancement. The reconstructed images had between 0.4 and 0.5 mm in-plane voxel sizes and 0.45/0.90 mm slice spacing/thickness. All images were resampled to isotropic 0.45 × 0.45 × 0.45 mm³ voxel size. To set a manual reference standard, a human observer traversed the scan in the craniocaudal direction and painted voxels in the main coronary arteries and their branches in the axial plane.
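All three data sets were resampled to isotropic voxels. The sketch below only illustrates the arithmetic behind such a step: the zoom factor per axis is the original voxel size divided by the target voxel size. The function name `isotropic_resampling` and the example image size are hypothetical, and the interpolation method is not stated in the text, so none is shown.

```python
def isotropic_resampling(shape, spacing, target):
    """Zoom factors and output grid size for resampling to `target` mm voxels.

    shape:   number of voxels along each axis
    spacing: voxel size along each axis in mm
    target:  desired isotropic voxel size in mm
    """
    factors = tuple(s / target for s in spacing)
    new_shape = tuple(round(n * f) for n, f in zip(shape, factors))
    return factors, new_shape


# Hypothetical image size; only the voxel sizes are taken from the text
# (brain MRI: 1.0 x 1.0 x 1.25 mm resampled to 1.0 mm isotropic).
factors, new_shape = isotropic_resampling(
    shape=(256, 256, 160), spacing=(1.0, 1.0, 1.25), target=1.0
)
print(factors, new_shape)
```

For the breast MRI data the target would be each image's own in-plane voxel size, and for the cardiac CTA data 0.45 mm.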

3 Method

All voxels in the images were labelled by a CNN using seven different training experiments (Fig. 1).

3.1 CNN Architecture

For each voxel, three orthogonal (axial, sagittal, and coronal) patches of 51 × 51 voxels centred at the target voxel were extracted. For each of these three patches, features were determined using a deep stack of convolution layers. Each convolution layer contained 32 small (3 × 3 voxels) convolution kernels for a total of 25 convolution layers [14]. To prevent over- or undersegmentation of structures due to translational invariance, no subsampling layers were used. To reduce the number of trainable parameters in the network and hence the risk of over-fitting, the same stack of convolutional layers was used for the axial, sagittal and coronal patches.
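The triplanar input described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the `(z, y, x)` axis order, the zero padding at the image border, and the names `triplanar_patches` and `valid_conv_output_size` are assumptions. The second function shows why a 51 × 51 patch fits 25 layers of 3 × 3 kernels: assuming unpadded, stride-1 convolutions, each layer removes 2 voxels per dimension, so 51 − 25 · 2 = 1, which matches the 1 × 1 feature maps shown in Fig. 1.

```python
import numpy as np

PATCH = 51          # patch width in voxels, from the text
HALF = PATCH // 2   # 25 voxels on each side of the target voxel


def triplanar_patches(volume, z, y, x):
    """Return the axial, sagittal and coronal patches centred at (z, y, x).

    Assumption: `volume` is indexed (z, y, x), so an axial slice has fixed z,
    a coronal slice fixed y and a sagittal slice fixed x. The volume is
    zero-padded here so that patches near the border stay inside the array.
    """
    padded = np.pad(volume, HALF, mode="constant")
    z, y, x = z + HALF, y + HALF, x + HALF  # indices in the padded array
    axial = padded[z, y - HALF:y + HALF + 1, x - HALF:x + HALF + 1]
    coronal = padded[z - HALF:z + HALF + 1, y, x - HALF:x + HALF + 1]
    sagittal = padded[z - HALF:z + HALF + 1, y - HALF:y + HALF + 1, x]
    return axial, sagittal, coronal


def valid_conv_output_size(input_size, kernel_size=3, n_layers=25):
    """Spatial size after stacking unpadded, stride-1 convolutions."""
    return input_size - n_layers * (kernel_size - 1)


volume = np.random.default_rng(0).random((60, 64, 70))  # synthetic volume
axial, sagittal, coronal = triplanar_patches(volume, 10, 20, 30)
print(axial.shape, sagittal.shape, coronal.shape)
print(valid_conv_output_size(PATCH))
```

Because the same convolution stack is shared by the three orientations, the three patches would be passed through identical weights before their features are combined.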

The output of the convolution layers was 32 features for each of the three orthogonal input patches, hence, 96 features in total. These features were input to two subsequent fully connected layers, each with 192 nodes. The second fully connected layer was connected to a softmax classification layer. Depending on the tasks of the network, this layer contained 2, 3, 7, 8 or 9 output nodes. The fully connected layers were implemented as 1 × 1 voxel convolutions, to allow fast processing of arbitrarily sized images. Exponential linear units [2] were used for all non-linear activation functions. Batch normalisation [5] was used on all layers and dropout [15] was used on the fully connected layers.

Fig. 1. Example 51 × 51 triplanar input patches (left). CNN architecture with 25 shared convolution layers, 2 fully connected layers and an output layer with at most 9 classes, including a background class common among tasks (centre). Output classes included in each training experiment (right).
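To illustrate why fully connected layers can be written as 1 × 1 convolutions, the sketch below applies the same dense layers at every position of a small feature grid and checks one voxel against an ordinary dense computation. It is a simplified illustration under stated assumptions: the weights are random and untrained, batch normalisation and dropout are omitted, the 2D grid of 4 × 5 positions is arbitrary, and the helper names `elu`, `softmax` and `conv1x1` are not from the paper.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def softmax(x, axis=0):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def conv1x1(features, weight, bias):
    """Apply the same dense layer at every voxel position.

    features: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    """
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]


rng = np.random.default_rng(0)
n_classes = 9                           # three tasks plus shared background
features = rng.normal(size=(96, 4, 5))  # 3 x 32 features on a 4 x 5 grid

# Randomly initialised, untrained weights for illustration only.
w1, b1 = rng.normal(size=(192, 96)) * 0.1, np.zeros(192)
w2, b2 = rng.normal(size=(192, 192)) * 0.1, np.zeros(192)
w3, b3 = rng.normal(size=(n_classes, 192)) * 0.1, np.zeros(n_classes)

hidden = elu(conv1x1(features, w1, b1))
hidden = elu(conv1x1(hidden, w2, b2))
probabilities = softmax(conv1x1(hidden, w3, b3), axis=0)

# The same result for a single voxel, computed as an ordinary dense network.
v = features[:, 2, 3]
dense = softmax(w3 @ elu(w2 @ elu(w1 @ v + b1) + b2) + b3)
print(probabilities.shape, np.allclose(probabilities[:, 2, 3], dense))
```

Since the weights do not depend on the grid size, the same layers can be applied to feature maps of any spatial extent, which is what allows whole images to be processed in one pass.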

3.2 Training Experiments

The same model was trained for each combination of the three tasks. In total seven training experiments were performed (Fig. 1, right): three networks were trained to perform one task (Experiments 1–3), three networks were trained to perform two tasks (Experiments 4–6), and one network was trained to perform three tasks (Experiment 7). The number of output nodes in the CNN was modified accordingly. In each experiment, background classes of the target tasks were merged into one class.
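The seven task combinations and the resulting number of output nodes can be enumerated with a few lines of Python. The dictionary and function names are illustrative, and the order in which combinations are listed here is not claimed to match the experiment numbering in Fig. 1.

```python
from itertools import combinations

# Foreground classes per task, as described in the Data section.
FOREGROUND_CLASSES = {
    "brain MRI": 6,     # six brain tissue classes
    "breast MRI": 1,    # pectoral muscle
    "cardiac CTA": 1,   # coronary arteries
}


def output_nodes(tasks):
    """Foreground classes of all tasks plus one shared background class."""
    return sum(FOREGROUND_CLASSES[t] for t in tasks) + 1


# All non-empty combinations of the three tasks: 3 + 3 + 1 = 7 experiments.
experiments = [
    tasks
    for size in (1, 2, 3)
    for tasks in combinations(FOREGROUND_CLASSES, size)
]

for tasks in experiments:
    print(f"{' + '.join(tasks):40s} {output_nodes(tasks)} output nodes")
```

The distinct node counts produced this way are 2, 3, 7, 8 and 9, in agreement with the sizes of the softmax layer given in Sect. 3.1.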

Each CNN was trained using mini-batch learning. A mini-batch contained 210 samples, equally balanced over the tasks of the network. For each task, the training samples were randomly drawn from all training images, balanced over the task-specific classes. All voxels with image intensity > 0 were considered samples. The network parameters were optimised using Adam stochastic optimisation [6] with categorical cross-entropy as the cost-function.
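The composition of a balanced mini-batch can be sketched as a two-level split: first over tasks, then over the classes of each task. The sketch assumes that background counts as one of the task-specific classes and that any remainder is given to the first classes in the list. Neither detail is stated in the text, and the function names are hypothetical.

```python
BATCH_SIZE = 210  # from the text


def allocate(total, keys):
    """Split `total` samples over `keys` as evenly as integers allow."""
    base, remainder = divmod(total, len(keys))
    # The first `remainder` keys receive one extra sample (an assumption).
    return {key: base + (i < remainder) for i, key in enumerate(keys)}


def batch_composition(task_classes, batch_size=BATCH_SIZE):
    """Number of samples per task and per class in one mini-batch."""
    per_task = allocate(batch_size, list(task_classes))
    return {
        task: allocate(per_task[task], classes)
        for task, classes in task_classes.items()
    }


task_classes = {
    "brain MRI": ["background", "white matter", "cortical grey matter",
                  "basal ganglia and thalami", "ventricular CSF",
                  "cerebellum", "brain stem"],
    "breast MRI": ["background", "pectoral muscle"],
    "cardiac CTA": ["background", "coronary artery"],
}

composition = batch_composition(task_classes)
for task, counts in composition.items():
    print(task, sum(counts.values()), counts)
```

With all three tasks, 210 samples divide evenly into 70 per task. For a two-task network each task receives 105 samples, which cannot be split evenly over two classes, so some rule for the remainder is needed in that case.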

4 Experiments and Results

The data for brain MRI, breast MRI and cardiac CTA were split into 14/20, 14/20 and 6/4 training/test images, respectively. Four results were obtained for each task: one with a network trained for only that task, two with networks trained for that task and an additional task, and one with a network trained for all tasks together. Each network was trained with 25000 mini-batches per task.

No post-processing steps other than probability thresholding for evaluation purposes were performed. The results are presented on the full test set. In brain MRI, the voxel class labels were determined by the highest class activation. The performance was evaluated per brain tissue type, using the Dice coefficient between the manual and automatic segmentations. In breast MRI and cardiac CTA, precision-recall curve analysis was performed to identify the optimal operating point, defined, for each experiment, as the highest Dice coefficient over the whole test set. The thresholds at this optimal operating point were then applied to all images.
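The evaluation described above can be sketched as a Dice computation combined with a sweep over probability thresholds. This is a simplified stand-in for the precision-recall analysis: the threshold grid, the handling of two empty masks, the function names and the toy data are all assumptions, and the numbers are not results from the paper.

```python
import numpy as np


def dice(reference, prediction):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    reference = np.asarray(reference, dtype=bool)
    prediction = np.asarray(prediction, dtype=bool)
    denominator = reference.sum() + prediction.sum()
    if denominator == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * np.logical_and(reference, prediction).sum() / denominator


def best_threshold(reference, probability, thresholds):
    """Threshold that maximises Dice over all voxels pooled together."""
    scores = [dice(reference, probability >= t) for t in thresholds]
    best = int(np.argmax(scores))
    return thresholds[best], scores[best]


# Toy data standing in for the pooled test-set voxels (not real results).
reference = np.array([0, 0, 0, 1, 1, 1, 1, 0], dtype=bool)
probability = np.array([0.12, 0.42, 0.22, 0.91, 0.62, 0.83, 0.33, 0.71])
thresholds = np.linspace(0.05, 0.95, 19)

threshold, score = best_threshold(reference, probability, thresholds)
print(f"threshold={threshold:.2f}, Dice={score:.3f}")
```

For the multi-class brain MRI task no threshold is involved: the label of each voxel would be the index of the largest softmax output, and `dice` would then be evaluated once per tissue class.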

Figure 2 shows the results of the described quantitative analysis, performed at intervals of 1000 mini-batches per task. As the networks learned, the obtained Dice coefficients increased and the stability of the results improved. For each segmentation task, the learning curves were similar for all experiments. Nevertheless, slight differences were visible between the obtained learning curves. To assess whether these differences were systematic or caused by the stochastic


Fig. 2. Learning curves showing Dice coefficients for tissue segmentation in brain MRI (top three rows), breast MRI (bottom left), and cardiac CTA (bottom right), reported at 1000 mini-batch intervals for experiments including that task. The line colours correspond to the training experiments in Fig. 1.


References

Adam: A Method for Stochastic Optimization

ImageNet Classification with Deep Convolutional Neural Networks

Very Deep Convolutional Networks for Large-Scale Image Recognition

Dropout: a simple way to prevent neural networks from overfitting

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
