
1562 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 37, NO. 7, JULY 2018
Interactive Medical Image Segmentation Using
Deep Learning With Image-Specific Fine Tuning
Guotai Wang, Wenqi Li, Maria A. Zuluaga, Rosalind Pratt, Premal A. Patel, Michael Aertsen,
Tom Doel, Anna L. David, Jan Deprest, Sébastien Ourselin, and Tom Vercauteren
Abstract
Convolutional neural networks (CNNs) have achieved state-of-the-art performance for automatic medical image segmentation. However, they have not demonstrated sufficiently accurate and robust results for clinical use. In addition, they are limited by the lack of image-specific adaptation and the lack of generalizability to previously unseen object classes (a.k.a. zero-shot learning). To address these problems, we propose a novel deep learning-based interactive segmentation framework by incorporating CNNs into a bounding box and scribble-based segmentation pipeline. We propose image-specific fine tuning to make a CNN model adaptive to a specific test image, which can be either unsupervised (without additional user interactions) or supervised (with additional scribbles). We also propose a weighted loss function considering network and interaction-based uncertainty for the fine tuning. We applied this framework to two applications: 2-D segmentation of multiple organs from fetal magnetic resonance (MR) slices, where only two types of these organs were annotated for training, and 3-D segmentation of brain tumor core (excluding edema) and whole brain tumor (including edema) from different MR sequences, where only the tumor core in one MR sequence was annotated for training. Experimental results show that: 1) our model is more robust to segment previously unseen objects than state-of-the-art CNNs; 2) image-specific fine tuning with the proposed weighted loss function significantly improves segmentation accuracy; and 3) our method leads to accurate results with fewer user interactions and less user time than traditional interactive segmentation methods.

Index Terms
Interactive image segmentation, convolutional neural network, fine-tuning, fetal MRI, brain tumor.

Manuscript received October 11, 2017; revised January 4, 2018; accepted January 5, 2018. Date of publication January 26, 2018; date of current version June 30, 2018. This work was supported in part by the Wellcome Trust under Grant WT101957, Grant WT97914, and Grant HICF-T4-275, in part by the EPSRC under Grant NS/A000027/1, Grant EP/H046410/1, Grant EP/J020990/1, Grant EP/K005278, and Grant NS/A000050/1, in part by the Wellcome/EPSRC under Grant 203145Z/16/Z, in part by the Royal Society under Grant RG160569, in part by the National Institute for Health Research University College London (UCL) Hospitals Biomedical Research Centre, in part by the Great Ormond Street Hospital Charity, in part by UCL ORS and GRS, in part by NVIDIA, and in part by Emerald, a GPU-accelerated High Performance Computer, made available by the Science and Engineering South Consortium operated in partnership with the STFC Rutherford-Appleton Laboratory. (Corresponding author: Guotai Wang.)

G. Wang, W. Li, R. Pratt, P. A. Patel, T. Doel, and S. Ourselin are with the Wellcome EPSRC Centre for Interventional and Surgical Sciences, Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, U.K. (e-mail: guotai.wang.14@ucl.ac.uk).

M. A. Zuluaga is with the Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, U.K., with the Facultad de Medicina, Universidad Nacional de Colombia, Bogotá 111321, Colombia, and also with Amadeus S.A.S., 06560 Sophia-Antipolis, France.

M. Aertsen is with the Department of Radiology, University Hospitals KU Leuven, 3000 Leuven, Belgium.

A. L. David and J. Deprest are with the Wellcome EPSRC Centre for Interventional and Surgical Sciences, Institute for Women’s Health, University College London, London WC1E 6BT, U.K., and also with the Department of Obstetrics and Gynaecology, KU Leuven, 3000 Leuven, Belgium.

T. Vercauteren is with the Wellcome EPSRC Centre for Interventional and Surgical Sciences, Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, U.K., and also with KU Leuven, 3000 Leuven, Belgium.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMI.2018.2791721

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
I. INTRODUCTION
Deep learning with convolutional neural networks (CNNs) has achieved state-of-the-art performance for
automated medical image segmentation [1]. However, auto-
matic segmentation methods have not demonstrated suffi-
ciently accurate and robust results for clinical use due to
the inherent challenges of medical images, such as poor
image quality, different imaging and segmentation protocols,
and variations among patients [2]. Alternatively, interactive
segmentation methods are widely adopted, as they integrate
the user’s knowledge and take into account the application
requirements for more robust segmentation performance [2].
As such, interactive segmentation remains the state of the art
for existing commercial surgical planning and navigation prod-
ucts. Though leveraging user interactions often leads to more
robust segmentations, an interactive method should require as little user time as possible to reduce the burden on users.
Motivated by these observations, we investigate combining
CNNs with user interactions for medical image segmentation
to achieve higher segmentation accuracy and robustness with
fewer user interactions and less user time. However, there are
very few studies on using CNNs for interactive segmenta-
tion [3]–[5]. This is mainly due to the requirement of large
amounts of annotated images for training, the lack of image-
specific adaptation and the demanding balance among model
complexity, inference time and memory space efficiency.
The first challenge of using CNNs for interactive segmenta-
tion is that current CNNs do not generalize well to previously
unseen object classes that are not present in the training set.
As a result, they require labeled instances of each object
class to be present in the training set. For medical images,
annotations are often expensive to acquire as both expertise
and time are needed to produce accurate annotations. This
limits the performance of CNNs to segment objects for which
annotations are not available in the training stage.
Second, interactive segmentation often requires image-
specific learning to deal with large context variations among
different images, but current CNNs are not adaptive to dif-
ferent test images, as parameters of the model are learned
from training images and then fixed in the testing stage,
without image-specific adaptation. It has been shown that
image-specific adaptation of a pre-trained Gaussian Mixture
Model (GMM) helps to improve segmentation accuracy [6].
However, transitioning from simple GMMs to powerful but
complex CNNs in this context has not yet been demonstrated.
Third, fast inference and memory efficiency are required for interactive segmentation. They can be relatively easily
achieved for 2D images, but become much more problematic
for 3D images. For example, DeepMedic [7] works on 3D
local patches to reduce memory requirements but results in a
slow inference. HighRes3DNet [8] works on 3D whole images
with relatively fast inference but needs a large amount of GPU
memory, leading to high hardware requirements. To make a
CNN-based interactive segmentation method efficient to use,
enabling CNNs to respond quickly to user interactions and
to work on a machine with limited GPU resources (e.g.,
a standard desktop PC or a laptop) is desirable. DeepIGeoS [5]
combines CNNs with user interactions and has demonstrated
good interactivity. However, it lacks adaptability to unseen image contexts.
This paper presents a new framework to address these
challenges for deep learning-based interactive segmentation.
To generalize to previously unseen objects, we propose a
bounding-box-based segmentation pipeline that extracts the
foreground from a given region of interest, and design a 2D
and a 3D CNN with good compactness to avoid over-fitting.
To make CNNs adaptive to different test images, we propose
image-specific fine-tuning. In addition, our networks consider
a balance among receptive field, inference time and memory
efficiency so as to be responsive to user interactions and have
low requirements in terms of GPU resources.
A. Contributions
The contributions of this work are four-fold. First, we pro-
pose a novel deep learning-based framework for interactive 2D
and 3D medical image segmentation by incorporating CNNs
into a bounding box and scribble-based binary segmentation
pipeline. Second, we propose image-specific fine-tuning to
adapt a CNN model to each test image independently. The
fine-tuning can be either unsupervised (without additional user
interactions) or supervised by user-provided scribbles. Third,
we propose a weighted loss function considering network
and interaction-based uncertainty during the image-specific
fine-tuning. Fourth, we present the first attempt to employ
CNNs to deal with previously unseen objects (a.k.a. zero-
shot learning) in the context of image segmentation. The
proposed framework does not require all the object classes
to be annotated for training. Thus, it can be applied to new
organs or new segmentation protocols directly.
B. Related Works
1) CNNs for Image Segmentation:
For natural image
segmentation, FCN [9] and DeepLab [10] are among the
state-of-the-art performing methods. For 2D biomedical
image segmentation, efficient networks such as U-Net [11],
DCAN [12] and Nabla-net [13] have been proposed. For
3D volumes, patch-based CNNs have been proposed for
segmentation of the brain tumor [7] and pancreas [14], and
more powerful end-to-end 3D CNNs include V-Net [15],
HighRes3DNet [8], and 3D deeply supervised network [16].
2) Interactive Segmentation Methods:
A wide range of
interactive segmentation methods have been proposed [2].
Representative methods include Graph Cuts [17], Random
Walks [18] and GeoS [19]. Machine learning has been popu-
larly used to achieve high accuracy and interaction efficiency.
For example, GMMs are used by GrabCut [20] to segment
color images. Online Random Forests (ORFs) are employed
by Slic-Seg [21] for placenta segmentation from fetal Magnetic
Resonance images (MRI). In [22], active learning is used
to segment 3D Computed Tomography (CT) images. They
have achieved more accurate segmentations with fewer user
interactions than traditional interactive segmentation methods.
To combine user interactions with CNNs, DeepCut [3] and
ScribbleSup [23] propose to leverage user-provided bounding
boxes or scribbles, but they employ user interactions as sparse
annotations for the training set rather than as guidance for
dealing with test images. 3D U-Net [24] learns from anno-
tations of some slices in a volume and produces a dense
3D segmentation, but is not responsive to user interactions.
In [4], an FCN is combined with user interactions for 2D RGB
image segmentation, without adaptation for medical images.
DeepIGeoS [5] uses geodesic distance transforms of scribbles
as additional channels of CNNs for interactive segmentation,
but cannot deal with previously unseen object classes.
3) Model Adaptation:
Previous learning-based interactive
segmentation methods often employ image-specific models.
For example, GrabCut [20] and Slic-Seg [21] learn from the
target image with GMMs and ORFs, respectively, so that they
can be well adapted to the specific target image. Learning a
model from a training set with image-specific adaptation in the
testing stage has also been used to improve the segmentation
performance. For example, an adaptive GMM has been used to
address the distribution mismatch between the training and test
images [6]. For CNNs, fine-tuning [25] is used for domain-
wise model adaptation to address the distribution mismatch
between different training sets. However, to the best of our
knowledge, this paper is the first work to propose image-
specific model adaptation for CNNs.
II. METHOD

The proposed interactive framework with Bounding box and Image-specific Fine-tuning-based Segmentation (BIFSeg) is depicted in Fig. 1. To deal with different (including previously unseen) objects in a unified framework, we propose to use a CNN that takes as input the content of a bounding box of one instance and gives a binary segmentation for that instance. In the testing stage, the user provides a bounding box, and BIFSeg extracts the region inside the bounding box and feeds it into the pre-trained CNN with a forward pass to obtain an initial segmentation. This is based on the fact that our CNNs are designed and trained to learn some common features, such as saliency, contrast and hyper-intensity, across different objects, which helps to generalize to unseen objects. Then we use unsupervised (without additional user interactions) or supervised (with user-provided scribbles) image-specific fine-tuning to further refine the segmentation. This is because there is likely a mismatch between the common features learned from the training set and those in (previously unseen) test objects. Therefore, we use fine-tuning to leverage image-specific features and make our CNNs adaptive to a specific test image for better segmentation. Our framework is general, flexible and can handle both 2D and 3D segmentations with few assumptions of network structures. In this paper, we choose to use the state-of-the-art network structures proposed in [5] for their compactness and efficiency. The contribution of BIFSeg is nonetheless largely different from [5] as BIFSeg focuses on segmentation of previously unseen object classes and fine-tunes the CNN model on the fly for image-wise adaptation that can be guided by user interactions.

Fig. 1. The proposed Bounding box and Image-specific Fine-tuning-based Segmentation (BIFSeg). 2D images are shown as examples. During training, each instance is cropped with its bounding box, and the CNN is trained for binary segmentation. In the testing stage, image-specific fine-tuning with optional scribbles and a weighted loss function is used. Note that the object class (e.g. maternal kidneys) for testing may have not been present in the training set.

Fig. 2. Our resolution-preserving networks with dilated convolution for 2D segmentation (a) and 3D segmentation (b). The numbers in dark blue boxes denote convolution kernel sizes and numbers of output channels, and the numbers on the top of these boxes denote dilation parameters.
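As a concrete illustration of this test-time workflow, the sketch below (Python, using PyTorch) crops the user-provided bounding box, normalises the region, and obtains the initial segmentation with a single forward pass of the pre-trained CNN. The helper name, the normalisation, and the two-channel softmax output are illustrative assumptions rather than the authors' released implementation; the subsequent image-specific fine-tuning is described in Sections II-A to II-C.

```python
import torch

def bifseg_initial_segmentation(image, bbox, model):
    """Sketch of the BIFSeg test-time entry point (hypothetical helper).

    image: full 2D image tensor; bbox: tuple of slices from the user-drawn
    box; model: pre-trained binary segmentation CNN (e.g. a P-Net-like net).
    """
    # Extract and normalise the region of interest inside the bounding box.
    roi = image[bbox].float()
    roi = (roi - roi.mean()) / (roi.std() + 1e-6)

    # One forward pass of the pre-trained CNN gives foreground probabilities.
    with torch.no_grad():
        logits = model(roi[None, None])             # add batch and channel dims
        prob_fg = torch.softmax(logits, dim=1)[:, 1]

    # Initial binary segmentation, to be refined by image-specific fine-tuning.
    return (prob_fg > 0.5).squeeze(0)
```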
A. CNN Models
For 2D images, we adopt the P-Net [5] for bounding
box-based binary segmentation. The network is resolution-
preserving using dilated convolution [10]. As shown in
Fig. 2(a), it consists of six blocks with a receptive field
of 181×181. The first five blocks have dilation parameters
of 1, 2, 4, 8 and 16, respectively, so they capture features at
different scales. Features from these five blocks are concate-
nated and fed into block6 that serves as a classifier. A softmax
layer is used to obtain probability-like outputs. In the testing
stage, we update the model based on image-specific fine-
tuning. To ensure efficient fine-tuning and fast response to
user interactions, we only fine-tune parameters of the classifier
(block6). Thus, features in the concatenation layer for the test
image can be stored before the fine-tuning.
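To make this classifier-only fine-tuning concrete, here is a minimal PyTorch sketch of a P-Net-like 2D network; the layer and channel counts are assumptions for illustration and do not reproduce the exact configuration of [5].

```python
import torch
import torch.nn as nn

class PNetLike2D(nn.Module):
    """Illustrative P-Net-style network: five resolution-preserving blocks with
    dilations 1, 2, 4, 8 and 16, whose outputs are concatenated and classified
    by a final 1x1 block (the counterpart of block6)."""

    def __init__(self, in_channels=1, features=64, num_classes=2):
        super().__init__()
        self.blocks = nn.ModuleList()
        channels = in_channels
        for dilation in (1, 2, 4, 8, 16):      # block1-block5: multi-scale context
            self.blocks.append(nn.Sequential(
                nn.Conv2d(channels, features, 3, padding=dilation, dilation=dilation),
                nn.ReLU(inplace=True),
                nn.Conv2d(features, features, 3, padding=dilation, dilation=dilation),
                nn.ReLU(inplace=True)))
            channels = features
        # block6: the classifier, the only part updated during image-specific
        # fine-tuning, so the concatenated features can be cached per image.
        self.classifier = nn.Sequential(
            nn.Conv2d(5 * features, features, 1), nn.ReLU(inplace=True),
            nn.Conv2d(features, num_classes, 1))

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return self.classifier(torch.cat(feats, dim=1))  # softmax applied in the loss
```

Because block1 to block5 stay frozen at test time, the concatenated feature map can be computed once per bounding box and cached, and only the classifier needs forward and backward passes during fine-tuning, which keeps the response to user interactions fast.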
For 3D images, we use a network extended from P-Net,
as shown in Fig. 2(b). It considers a trade-off among receptive
field, inference time and memory efficiency. The network has
an anisotropic receptive field of 85×85×9. Compared with
slice-based networks, it employs 3D contexts. Compared with
large isotropic 3D receptive fields [8], it has less memory con-
sumption [26]. Besides, anisotropic acquisition is often used in
Magnetic Resonance (MR) imaging. We use 3×3×3 kernels
in the first two blocks and 3×3×1 kernels in block3 to block5.
Similar to P-Net, we fine-tune the classifier (block6) with pre-
computed concatenated features. To save space for storing
the concatenated features, we use 1×1×1 convolutions to
compress the features in block1 to block5 and then concatenate
them. We refer to this 3D network with feature compression
as PC-Net.
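The feature-compression step of PC-Net can be sketched as follows; the channel numbers (64 compressed to 16 per block) are assumptions for illustration, not the published configuration.

```python
import torch
import torch.nn as nn

# Each block output is reduced with a 1x1x1 convolution before concatenation,
# so the features cached for classifier-only fine-tuning occupy less memory.
block_channels, compressed_channels = 64, 16
compress = nn.ModuleList(
    nn.Conv3d(block_channels, compressed_channels, kernel_size=1) for _ in range(5))

# Dummy outputs of block1-block5 for one anisotropic 3D region of interest.
block_features = [torch.randn(1, block_channels, 9, 64, 64) for _ in range(5)]
cached = torch.cat([c(f) for c, f in zip(compress, block_features)], dim=1)  # 80 channels
```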
B. Training of CNNs
The training stage for 2D/3D segmentation is shown in the
first row of Fig. 1. Consider a $K$-ary segmentation training set $T = \{(X_1, Y_1), (X_2, Y_2), \ldots\}$, where $X_p$ is one training image and $Y_p$ is the corresponding label map. The label set of $T$ is $\{0, 1, 2, \ldots, K-1\}$ with 0 being the background label. Let $N_k$ denote the number of instances of the $k$th object type, so the total number of instances is $\hat{N} = \sum_k N_k$. Each image $X_p$ can have instances of multiple object classes. Suppose the label of the $q$th instance in $X_p$ is $l_{pq}$; $Y_p$ is converted into a binary image $Y_{pq}$ based on whether the value of each pixel in $Y_p$ equals $l_{pq}$. The bounding box $B_{pq}$ of that training instance is automatically calculated based on $Y_{pq}$ and expanded by a random margin in the range of 0 to 10 pixels/voxels. $X_p$ and $Y_{pq}$ are cropped based on $B_{pq}$. Thus, $T$ is converted into a cropped set $\hat{T} = \{(\hat{X}_1, \hat{Y}_1), (\hat{X}_2, \hat{Y}_2), \ldots\}$ with size $\hat{N}$ and label set $\{0, 1\}$, where 1 is the label of the instance foreground and 0 the background. With $\hat{T}$, the CNN model (e.g., P-Net or PC-Net) is trained to extract the target from its bounding box, which is a binary segmentation problem irrespective of the object type. A cross entropy loss function is used for training.
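The instance-cropping step described above can be sketched in a few lines of NumPy; the function name and details such as clipping at the image border are illustrative rather than the authors' code, and the same indexing works for 2D and 3D arrays.

```python
import numpy as np

def crop_instance(image, label_map, instance_label, max_margin=10, rng=np.random):
    """Build one binary training example: compute the tight bounding box B_pq of
    the instance from the label map Y_p, expand it by a random margin of 0-10
    pixels/voxels, and crop both the image X_p and the binary mask Y_pq."""
    mask = (label_map == instance_label).astype(np.uint8)       # binary Y_pq
    coords = np.argwhere(mask)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1         # tight box B_pq
    box = []
    for axis in range(mask.ndim):
        start = max(lo[axis] - rng.randint(0, max_margin + 1), 0)
        stop = min(hi[axis] + rng.randint(0, max_margin + 1), mask.shape[axis])
        box.append(slice(start, stop))
    box = tuple(box)
    return image[box], mask[box]                                # one cropped pair of T_hat
```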
C. Unsupervised and Supervised Image-Specific Fine-Tuning

In the testing stage, let $\hat{X}$ denote the sub-image inside a user-provided bounding box and $\hat{Y}$ be the target label of $\hat{X}$. The set of parameters of the trained CNN is $\theta$. With the initial segmentation $\hat{Y}^0$ obtained by the trained CNN, the user may provide (i.e., supervised) or not provide (i.e., unsupervised) a set of scribbles to guide the update of $\hat{Y}^0$. Let $S_f$ and $S_b$ denote the scribbles for foreground and background, respectively, so the entire set of scribbles is $S = S_f \cup S_b$. Let $s_i$ denote the user-provided label of a pixel in the scribbles, then we have $s_i = 1$ if $i \in S_f$ and $s_i = 0$ if $i \in S_b$. We minimize an objective function that is similar to GrabCut [20] but we use P-Net or PC-Net instead of a GMM:

$$\arg\min_{\hat{Y}} \; E(\hat{Y}) = \sum_{i} \phi(\hat{y}_i \mid \hat{X}) + \lambda \sum_{i,j} \psi(\hat{y}_i, \hat{y}_j \mid \hat{X}), \quad \text{subject to } \hat{y}_i = s_i \text{ if } i \in S \tag{1}$$

where $E(\hat{Y})$ is constrained by user interactions if $S$ is not empty. $\phi$ and $\psi$ are the unary and pairwise energy terms, respectively. $\lambda$ is the weight of $\psi$. An unconstrained optimization of an energy similar to $E$ was used in [3] for weakly supervised learning. In that work, the energy was based on the probability and label map of all the images in a training set, which was a different task from ours, as we focus on a single test image. We follow a typical choice of $\psi$ [17]:

$$\psi(\hat{y}_i, \hat{y}_j \mid \hat{X}) = [\hat{y}_i \neq \hat{y}_j] \exp\!\left(-\frac{\big(\hat{X}(i) - \hat{X}(j)\big)^2}{2\sigma^2}\right) \cdot \frac{1}{d_{ij}} \tag{2}$$

where $[\cdot]$ is 1 if $\hat{y}_i \neq \hat{y}_j$ and 0 otherwise. $d_{ij}$ is the Euclidean distance between pixel $i$ and pixel $j$. $\sigma$ controls the effect of intensity difference. $\phi$ is defined as:

$$\phi(\hat{y}_i \mid \hat{X}) = -\log P(\hat{y}_i \mid \hat{X}) = -\big(\hat{y}_i \log p_i + (1 - \hat{y}_i)\log(1 - p_i)\big) \tag{3}$$

where $P(\hat{y}_i \mid \hat{X})$ is the probability given by the softmax output of the CNN, and $p_i = P(\hat{y}_i = 1 \mid \hat{X})$ is the probability of pixel $i$ belonging to the foreground.
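To illustrate how these energy terms map to code, the NumPy sketch below evaluates the unary cost of Eq. (3) from the CNN's foreground probabilities and the pairwise weight of Eq. (2) for a pair of neighbouring pixels; it is a simplified illustration under the notation above, not the authors' implementation.

```python
import numpy as np

def unary_costs(prob_fg, eps=1e-6):
    """Eq. (3): phi(y_i | X) = -log P(y_i | X). Returns an array whose last axis
    holds the cost of assigning background (index 0) and foreground (index 1)."""
    return -np.log(np.stack([1.0 - prob_fg, prob_fg], axis=-1) + eps)

def pairwise_weight(intensity_i, intensity_j, distance_ij, sigma):
    """Eq. (2): weight paid only when the labels of pixels i and j differ."""
    return np.exp(-(intensity_i - intensity_j) ** 2 / (2.0 * sigma ** 2)) / distance_ij
```

In the label update step described below, these terms are assembled into a graph over the cropped image, scribbled pixels receive an infinite unary cost for the disallowed label (Eq. (6)), and the resulting submodular energy can be minimised with an off-the-shelf max-flow/min-cut (Graph Cuts) solver.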
The optimization of Eq. (1) can be decomposed into steps that alternately update the segmentation label $\hat{Y}$ and the network parameters $\theta$ [3], [20]. In the label update step, we fix $\theta$ and solve for $\hat{Y}$, and Eq. (1) becomes a Conditional Random Field (CRF) problem:

$$\arg\min_{\hat{Y}} \; E(\hat{Y}) = \sum_{i} \phi(\hat{y}_i \mid \hat{X}) + \lambda \sum_{i,j} \psi(\hat{y}_i, \hat{y}_j \mid \hat{X}), \quad \text{subject to } \hat{y}_i = s_i \text{ if } i \in S \tag{4}$$

For ease of implementation, the constrained optimization in Eq. (4) is converted to an unconstrained equivalent:

$$\arg\min_{\hat{Y}} \; \sum_{i} \phi'(\hat{y}_i \mid \hat{X}) + \lambda \sum_{i,j} \psi(\hat{y}_i, \hat{y}_j \mid \hat{X}) \tag{5}$$

$$\phi'(\hat{y}_i \mid \hat{X}) = \begin{cases} +\infty & \text{if } i \in S \text{ and } \hat{y}_i \neq s_i \\ 0 & \text{if } i \in S \text{ and } \hat{y}_i = s_i \\ -\log P(\hat{y}_i \mid \hat{X}) & \text{otherwise} \end{cases} \tag{6}$$

Since $\theta$, and therefore $\phi'$, are fixed, and $\psi$ is submodular, Eq. (5) can be solved by Graph Cuts [17]. In the network update step, we fix $\hat{Y}$ and solve for $\theta$:

$$\arg\min_{\theta} \; E(\hat{Y}) = \sum_{i} \phi(\hat{y}_i \mid \hat{X}), \quad \text{subject to } \hat{y}_i = s_i \text{ if } i \in S \tag{7}$$

Thanks to the constrained optimization in Eq. (4), the label update step necessarily leads to $\hat{y}_i = s_i$ for $i \in S$. Eq. (7) can

References

O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. MICCAI, 2015.
J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE CVPR, 2015.
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Trans. Pattern Anal. Mach. Intell., 2018.
Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. ACM Int. Conf. Multimedia, 2014.
G. Litjens et al., "A survey on deep learning in medical image analysis," Medical Image Analysis, 2017.