End-to-end Adversarial Retinal Image Synthesis
Pedro Costa, Adrian Galdran, Maria Ines Meyer, Meindert Niemeijer, Michael Abràmoff, Ana Maria Mendonça, and Aurélio Campilho

P. Costa, A. Galdran, M. I. Meyer, A. M. Mendonça, and A. Campilho are with INESC TEC, Porto, Portugal; e-mails: {pvcosta, adrian.galdran, maria.i.meyer}@inesctec.pt. M. Niemeijer is with IDx LLC; e-mail: niemeijer@eyediagnosis.net. M. Abràmoff is with the Stephen A. Wynn Institute for Vision Research, University of Iowa; e-mail: michael-abramoff@uiowa.edu. A. M. Mendonça and A. Campilho are also with Faculdade de Engenharia, Universidade do Porto, Portugal; e-mail: {amendon, campilho}@fe.up.pt. Corresponding Authors. Equal Contribution.

Copyright (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
Abstract—In medical image analysis applications, the availability of large amounts of annotated data is becoming increasingly critical. However, annotated medical data is often scarce and costly to obtain. In this paper, we address the problem of synthesizing retinal color images by applying recent techniques based on adversarial learning. In this setting, a generative model is trained to maximize a loss function provided by a second model attempting to classify its output into real or synthetic. In particular, we propose to implement an adversarial autoencoder for the task of retinal vessel network synthesis. We use the generated vessel trees as an intermediate stage for the generation of color retinal images, which is accomplished with a Generative Adversarial Network. Both models require the optimization of almost everywhere differentiable loss functions, which allows us to train them jointly. The resulting model offers an end-to-end retinal image synthesis system capable of generating as many retinal images as the user requires, with their corresponding vessel networks, by sampling from a simple probability distribution that we impose on the associated latent space. We show that the learned latent space contains a well-defined semantic structure, implying that we can perform calculations in the space of retinal images, e.g., smoothly interpolating new data points between two retinal images. Visual and quantitative results demonstrate that the synthesized images are substantially different from those in the training set, while being also anatomically consistent and displaying a reasonable visual quality.

Index Terms—Retinal Image Synthesis, Retinal Image Analysis, Generative Adversarial Networks, Adversarial Autoencoders.
I. INTRODUCTION
The ability to generate meaningful synthetic information
is highly desirable for many computer-aided medical
applications, where annotated data is often scarce and costly
to obtain. A wide availability of such data may allow re-
searchers to develop and validate more sophisticated com-
putational techniques. This pressing need for annotated data,
particularly images, has largely increased with the advent
of deep neural networks, which are progressively becoming
the standard approach in most machine learning tasks [1].
However, these techniques require large amounts of data to
be trained. Therefore, the problem of medical data generation
is of great interest, and as such, it has been deeply studied in
recent years [2]. Nevertheless, the realistic synthesis of high-
quality medical data still remains a widely unsolved challenge.
Most medical image generation methods follow two main
strategies. The most conventional approach endeavors to for-
mulate a mathematical model of the observed data. These
models can range from simple digital phantoms [3] to more
complex methodologies attempting to mimic anatomical and
physiological medical knowledge [4]. In combination with
the modeling of relevant characteristics of the different ac-
quisition devices, these techniques can generate new high-
quality images by sampling an appropriate parameter space.
This approach is often referred to as image simulation.
In recent years the data-driven approach of image synthesis
has started gaining popularity. In this context, the intrinsic
variability within a large pool of training images is extracted
by means of machine learning techniques. Ideally, the model is
able to learn the underlying probability distribution that defines
the manifold of real images. Once trained, the same system
can be sampled to output new images that are likely to lie
on that manifold, i.e. realistic synthetic images. This approach
has recently been successfully applied to improve classification
of multi-sequence MRI with missing/corrupted sequences [5],
to estimate cross-modality transformations [6], or to perform
knowledge transfer by learning features invariant to the MR
scanning protocol [7].
In the retinal image analysis field, in [8] the authors propose
an algorithm for the generation of the retinal background and
the fovea, and a separate technique for the generation of the
optical disk. For the former, the method relies on the construc-
tion of a large dictionary of small vessel-free image patches.
These patches are extracted from a dataset of co-registered
real images and clustered together, before tiling them in a
consistent manner. For the latter, a parametric intensity model
is proposed, with the parameters being estimated over a dataset
of real images.
The work in [2] is complementary to [8], since it focuses
on the generation of the vascular network only. The authors
propose a method to generate realistic retinal vessel trees.
The parameters controlling the geometry are learned from
real vessel trees. The method also enforces meaningful vessel
orientation and calibers by following a physical bifurcation
law describing the correct oxygenation of the retinal surface
[9]. The output of both approaches can then be superimposed,
allowing for the generation of high-quality large-resolution
images. However, concatenation of both techniques results in
a considerably complex computational pipeline, relying on
sensitive sub-processes such as image registration, patch-to-
image stitching or image blending.
Recently, a purely data-driven approach has been proposed
in [10]. It consists of a simple application of adversarial

Fig. 1. Overview of our approach. The pair (p, q) is an adversarial autoencoder trained to reconstruct retinal vessel maps. The pair (G, D) is a Generative Adversarial Network trained to generate color retinal images out of vessel maps. Once the model is trained, the system can generate a new retinal image and an associated vessel map. The only required input is sampling a distribution p, which is enforced to follow a simple multi-dimensional Gaussian distribution during training by means of an adversarial loss.
learning methods [11], in which a model is trained on pairs
of real vessel networks and their corresponding retinal fundus
images. The goal is to learn a transformation between them,
and once trained, this technique can generate a plausible retinal
image out of a pre-existing binary vessel tree. Unfortunately,
this approach has been shown to have a relevant drawback: the
model is dependent on the availability of a pre-existing vessel
network in order to generate a new retinal image. The vessel
networks employed for generating images were obtained by
application of an independent vessel segmentation method to
real retinal images. If the original image is defocused, the
retrieved vessel tree will be undercomplete, and the obtained
synthetic image will contain visual artifacts [10].
In this work, we substantially improve upon [10] by remov-
ing the dependence of the model on the previous existence of
a retinal vessel tree. This is achieved by building an autoen-
coder that can learn to generate realistic retinal vessel trees.
Moreover, by minimizing an adversarial loss, the autoencoder
allows us to generate vessel networks by simply sampling a multi-dimensional Normal distribution. A schematic representation
of our approach is depicted in Fig. 1.
It is worth noting that it is theoretically possible to perform a
separate training of the retinal vessel synthesis module and the
vessel network to retinal image mapping. However, since both
tasks are closely related, it is more natural to train both systems
jointly. We achieve this by combining the loss functions
associated to each task in a more general framework. The
resulting method presents several advantages over previously
proposed approaches:
1) The adversarial learning framework allows us to model
the underlying distribution of plausible retinal images
only from training data, without manually interacting
with parameters controlling complex mathematical mod-
els of the retinal anatomy.
2) Once trained, the model improves upon [10] by allowing the generation of any number of realistic retinal images, with associated vessel trees, in an efficient manner.
3) Unlike [2], [8], we generate separate parts of the retinal
anatomy through the same process, avoiding the combi-
nation of complex image processing tasks.
The proposed framework provides an effective end-to-end
retinal image synthesis tool, capable of producing realistic eye
fundus images and associated vessel networks with a simple
sampling procedure. We provide objective evaluation of both
the quality and the applicability of our synthetic images. Even
if the generated images and associated vessel maps are of
low resolution, suffer from small inconsistencies, and may not yet be suitable for training more complex retinal image analysis
algorithms, we show them to be useful for learning a retinal
vessel segmentation model with reasonable performance. This
represents a promising first step towards achieving synthetic
data that can be used in more complex automatic retinal image
analysis applications.
II. ADVERSARIAL IMAGE GENERATION
A. Vessel Network to Retinal Image Translation
The research herein reported considers retinal color image
generation out of an existing vessel network as an image-
to-image translation problem, learning a mapping G from
a binary vessel map v into another representation r [12].
Since many retinal images could share a similar binary vessel network due to variations in color, texture, illumination, etc., in our case G is a multi-valued mapping $G : v \mapsto \{r_1, \ldots, r_m\}$. As such, learning G is an ill-posed problem and some uncertainty is present.

Fig. 2. The discriminator D learns to distinguish between real pairs of vessel networks and eye fundus images (v, r) and synthetic pairs. The generator G maps an input vessel network v to a color eye fundus image r.
Connected to this is the choice of the objective function to be minimized while learning G. Training a model to minimize the $L_2$ distance between $G(v_i)$ and $r_i$ for a collection of training pairs $\{(r_1, v_1), \ldots, (r_n, v_n)\}$ will produce low-quality results with a lack of detail [13], due to the model selecting an average of many potentially valid representations.
Recent ideas based on Generative Adversarial Networks
(GANs) [11] are able to overcome this problem by learning
a more suitable loss function directly from data [12]. The
underlying strategy of adversarial methods consists of emulat-
ing a competition, in which the mapping G, called Generator,
attempts to produce realistic images, while a second player, the
Discriminator D, is trained to distinguish the output generated
by G from real examples. Here, both G and D are neural
networks, and act as adversaries, since the goal of G is to
maximize the misclassification error of D, while D's objective
is to beat G by learning to identify generated images. As in
[11], the adversarial loss, driving the learning of G and D, is:
$$\mathcal{L}_{adv}(G, D) = \mathbb{E}_{v,r \sim p_{data}(v,r)}[\log(D(v, r))] + \mathbb{E}_{v \sim p_{data}(v)}[\log(1 - D(v, G(v)))], \qquad (1)$$

where $\mathbb{E}_{v,r \sim p_{data}(v,r)}$ is the expectation over the pairs (v, r) sampled from the joint data distribution of real pairs $p_{data}(v, r)$, and $p_{data}(v)$ is the distribution of real vessel trees. The Discriminator's objective is to maximize (1), while the Generator's goal is to minimize it. Therefore, it is D that provides the training signal to G, replacing more conventional loss functions.
Although minimizing the above loss function induces G to produce visually sharp results, recent work in [12], [14] has shown that combining Eq. (1) with a global $L_1$ loss provides more consistent results. Thus, the loss function to optimize becomes:

$$\mathcal{L}_{im2im}(G, D) = \mathcal{L}_{adv}(G, D) + \lambda \, \mathbb{E}_{v,r \sim p_{data}(v,r)}[\| r - G(v) \|_1], \qquad (2)$$

where λ balances the contribution of the two losses. The discriminator's objective is in this case local, i.e., it attempts to discriminate N × N image regions as real or generated, while the goal of G is supplemented with the requirement not only to generate realistic-looking images but also images that preserve a global regularity. Since the $L_1$ loss guarantees that the output of G is globally consistent, D can concentrate on modeling only high-frequency structures. Thus, while D penalizes locally over-smooth image regions, the $L_1$ loss promotes the consistency of global visual features, such as the presence of a single optical disk and macula in the image.
An overview of this model is shown in Figure 2.
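To make Eqs. (1) and (2) concrete, the following is a minimal PyTorch-style sketch of the two objectives, not the authors' implementation: G and D are assumed to be torch.nn.Module networks, with D returning patch-wise real/fake logits, and the function names and the weight lam are hypothetical.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, v, r):
    # Eq. (1), discriminator side: real pairs (v, r) vs. synthetic pairs (v, G(v)).
    real_logits = D(v, r)
    fake_logits = D(v, G(v).detach())          # detach: only D is updated in this step
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def generator_loss(D, G, v, r, lam=100.0):
    # Eq. (2): adversarial term plus a global L1 term weighted by lambda.
    fake = G(v)
    fake_logits = D(v, fake)
    adv = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    return adv + lam * F.l1_loss(fake, r)      # the L1 term keeps the output globally consistent
```

In this sketch, maximizing the misclassification of D is expressed in the common non-saturating form, i.e., the generator minimizes the cross-entropy of its fake logits against the "real" label rather than directly minimizing Eq. (1).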
B. Adversarial Autoencoders for Vessel Trees Generation
Ideally, an end-to-end retinal image synthesis system should
also generate realistic vessel networks. Such a model would
also learn from data and generate as many vessel networks
as the user requires, with a high degree of variability, while
remaining anatomically plausible. In this work, we propose to
achieve this goal by means of an adversarial autoencoder.
Autoencoders are models trained to reconstruct their input.
They are composed of two submodels: 1) an encoder Q, that
maps a training example v to a latent (hidden) representation
z = Q(v), and 2) a decoder P , mapping z to an output that
aims to be a replica of the input. An autoencoder can thus be
trained on a training set of vessel trees v, in order to minimize
a reconstruction objective $\mathcal{L}_{rec}(Q, P)$. Modern autoencoders feature deep neural networks both for the encoder and the decoder, and introduce stochasticity by considering probability distributions instead of deterministic mappings Q, P. Here we define the encoder and the decoder to be conditional probability distributions, q(z|v) and p(v|z), respectively.
Autoencoders can be employed to learn useful abstractions
of the data through their latent representations. These can then
be applied in other contexts, e.g. data compression or semi-
supervised learning. However, in the above form, the trivial
mapping that associates each vessel tree example v in the
training set to itself can succeed in minimizing the recon-
struction loss while failing to learn any valuable abstraction.
To avoid this, several types of regularization can be added to
the loss, e.g. minimizing $\mathcal{L}_{rec}(q, p)$ while requiring the latent representation to be sparse [15].
However, even when properly regularized, an autoencoder
still has no ability to fulfill the goal of generating new
elements close to the true data manifold, since we do not
have knowledge of the underlying probability distribution q(z)
governing the space of latent representations. This prevents us
from sampling it in order to obtain a new code z that can then
be mapped by p to a retinal image.
To achieve the twofold goal of turning the autoencoder into
a generative model while regularizing it in such a way that
it can learn interesting representations of retinal vessel trees,
we apply the adversarial autoencoder framework, proposed
in [16]. In this case, the autoencoder learning process is
embedded in an adversarial competition, similar to the one
described in the previous section. The goal of the autoencoder
is to minimize the reconstruction error, but at the same time,
we attempt to gain control over the probabilistic structure of q(z)
by matching it to a prior distribution p(z) that can be easily
sampled (e.g. a multi-dimensional unit normal distribution).
The encoding distribution q(z|v) in the autoencoder is the
generator component of the adversarial game. This consists
of a neural network enforced to produce latent representations

Fig. 3. At first, the discriminator D_code is trained to distinguish between samples from the given prior p(z) and latent representations of training vessel networks v from the encoder q(z|v). Then, the autoencoder is trained to minimize the reconstruction loss between its output and v and, at the same time, maximize the misclassification of D_code.
z following the pre-specified prior distribution p(z). This is achieved via the maximization of the classification error of the discriminator module D_code, which is trained to classify codes z sampled from q(z) according to whether they come from the true prior distribution p(z) or not. Figure 3 depicts a schematic representation of this process.
The autoencoder training is performed by gradient descent,
with the gradients computed by standard backpropagation. The
optimization process consists of two alternate stages. In the
first step, the discriminator is updated to distinguish samples
generated by q from those coming from the prior distribution
p(z). This is achieved by maximizing the following loss:
$$\mathcal{L}_{code}(D_{code}, q) = \mathbb{E}_{z \sim p(z)}[\log(D_{code}(z))] + \mathbb{E}_{v \sim p_{data}(v)}[\log(1 - D_{code}(q(z|v)))]. \qquad (3)$$
In addition, both the encoder and the decoder weights are
updated to minimize the reconstruction error and, at the same
time, to maximize the classification error of the discriminator.
In this way, the complete loss function that drives the learning
of the adversarial autoencoder is a combination of both losses:
$$\mathcal{L}_{AAE}(D_{code}, q, p) = \mathcal{L}_{code}(D_{code}, q) + \gamma \, \mathcal{L}_{rec}(q, p), \qquad (4)$$

where γ weights the importance of the two losses. The goal of q and p is to minimize $\mathcal{L}_{AAE}$, while D_code attempts to maximize it. When the optimization process reaches an equilibrium point of Eq. (4), the decoder p defines a generative model that can be employed to generate new vessel trees starting from a sample of the imposed prior p(z) on the latent distribution.
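As an illustration of the two alternating stages in Eqs. (3) and (4), here is a PyTorch-style sketch, not the authors' code: q, p, and D_code are assumed to be torch.nn.Module networks, the decoder p is assumed to end in a sigmoid so its output lies in [0, 1], and p(z) is taken to be a unit Gaussian.

```python
import torch
import torch.nn.functional as F

def code_discriminator_loss(D_code, q, v):
    # Stage 1, Eq. (3): D_code separates prior samples z ~ p(z) from encoder codes q(z|v).
    z_fake = q(v)
    z_real = torch.randn_like(z_fake)               # samples from the unit Gaussian prior p(z)
    real_logits = D_code(z_real)
    fake_logits = D_code(z_fake.detach())
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def adversarial_autoencoder_loss(D_code, q, p, v, gamma=1.0):
    # Stage 2, Eq. (4): reconstruct v while pushing the codes q(z|v) toward the prior.
    z = q(v)
    v_rec = p(z)
    rec = F.binary_cross_entropy(v_rec, v)          # reconstruction loss for vessel maps in [0, 1]
    logits = D_code(z)
    fool = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return fool + gamma * rec
```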
C. From Random Samples to Retinal Images
The vessel-to-retinal image model presented in section II-A
can map a vessel tree v to a realistic eye fundus image r, while
the adversarial autoencoder defined in the previous section
generates a vessel network v from a random sample z coming
from a simple probability distribution. When both models are
combined, we obtain a single system capable of generating a
vessel map and a retinal image r from a random sample z.
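Once trained, generating a new pair only requires sampling the prior and running the decoder and the generator in sequence. A minimal sketch, assuming p and G are the trained decoder and vessel-to-image generator (torch.nn.Module) and latent_dim is a hypothetical choice for the dimensionality of z:

```python
import torch

@torch.no_grad()
def sample_pairs(p, G, n=4, latent_dim=64):
    # Sample the imposed Gaussian prior p(z), decode a vessel tree, then render a fundus image.
    z = torch.randn(n, latent_dim)
    v_tilde = p(z)          # decoder: latent code -> synthetic vessel map
    r_tilde = G(v_tilde)    # generator: vessel map -> synthetic color eye fundus image
    return v_tilde, r_tilde
```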
However, both sub-tasks are deeply interconnected. The
generation of vessel networks of better quality will lead to
a more realistic retinal image r. Conversely, if the generated
image r is able to deceive the discriminator in such a way that
it classifies it as plausible, it means that the vessel network v
contained in it also needs to be plausible.
Following this argument, we build a single joint model, in
which both sub-systems are trained at the same time, instead
of independently. In our case, the loss functions defining both
models are differentiable almost everywhere. Accordingly, to
build a joint loss function we can directly combine them by
simple addition. Nonetheless, we need to redefine the image-
to-image losses in Eqs. (1) and (2), so that they take the output
of the adversarial autoencoder as the input to G:
$$\tilde{\mathcal{L}}_{adv}(G, D) = \mathbb{E}_{v,r \sim p_{data}(v,r)}[\log(D(v, r))] + \mathbb{E}_{v \sim p_{data}(v)}[\log(1 - D(\tilde{v}, G(\tilde{v})))], \qquad (5)$$

$$\tilde{\mathcal{L}}_{im2im}(G, D) = \tilde{\mathcal{L}}_{adv}(G, D) + \lambda \, \mathbb{E}_{v,r \sim p_{data}(v,r)}[\| r - G(\tilde{v}) \|_1], \qquad (6)$$

where $\tilde{v} = p(q(v))$ is the vessel tree generated by the adversarial autoencoder. With this modification, both loss
functions can be linearly combined into a global one:
$$\mathcal{L}(G, D, D_{code}, q, p) = \tilde{\mathcal{L}}_{im2im}(G, D) + \mathcal{L}_{AAE}(D_{code}, q, p). \qquad (7)$$
In this formulation, the goal of G, q and p is to minimize the loss function in Eq. (7), while D and D_code attempt to maximize it. The main advantage of this joint training scheme is that the discriminator D also provides a better loss function for the adversarial autoencoder. The decoder p needs to produce realistic-looking vessels in order to maximize the misclassification of D. Also, part of the training signal that arrives at p flows through G. As a consequence, the adversarial autoencoder also benefits when the generator produces realistic eye fundus images. A schematic representation of the whole model is shown in Figure 4.
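The generator-side objective of the joint model (Eqs. (5) to (7)) can be sketched as a single differentiable loss through which the signal from D reaches the decoder p via G. This is an illustrative PyTorch-style sketch under the same assumptions as before (models as torch.nn.Module, hypothetical weights lam and gamma), not the authors' code.

```python
import torch
import torch.nn.functional as F

def joint_generator_loss(q, p, G, D, D_code, v, r, lam=100.0, gamma=1.0):
    # Eq. (7), generator side: the autoencoder (q, p) and the image generator G are updated together.
    z = q(v)
    v_tilde = p(z)                        # synthetic vessel tree fed to G, Eqs. (5)-(6)
    r_tilde = G(v_tilde)                  # synthetic eye fundus image

    # image-to-image terms: fool D and stay close to the real image r
    img_logits = D(v_tilde, r_tilde)
    adv_img = F.binary_cross_entropy_with_logits(img_logits, torch.ones_like(img_logits))
    l1 = F.l1_loss(r_tilde, r)

    # adversarial autoencoder terms: reconstruct v and fool D_code
    rec = F.binary_cross_entropy(v_tilde, v)
    code_logits = D_code(z)
    adv_code = F.binary_cross_entropy_with_logits(code_logits, torch.ones_like(code_logits))

    return adv_img + lam * l1 + adv_code + gamma * rec
```

The discriminators D and D_code are still updated in separate steps, as in Fig. 4; because v_tilde is not detached here, gradients from D flow back through G into p and q, which is exactly the benefit described above.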
D. Understanding the Latent Space
After training the model as described above, it is possible
to sample from p(z) in order to produce a synthetic pair of
vessel network and eye fundus images. Nonetheless, the latent
space might contain zones that are not on the manifold learned
during training. This implies that points sampled from p(z)
that are far from the latent representations of all the training
examples might produce pairs that are not plausible (e.g. an
eye fundus image with two optical disks).
Fortunately, there are techniques that allow us to sample from generative models in a way that avoids these cases. For instance, given two real vessel network images $v_1$, $v_n$, we may apply the encoder q to obtain their latent representations $z_1$, $z_n$, and interpolate between these two known locations in the latent space to obtain a smooth transition between the two images, $\{z_2, \ldots, z_{n-1}\}$. If the model did not overfit the training data, the vessel trees obtained by decoding these intermediate representations, i.e., $\{p(z_2), \ldots, p(z_{n-1})\}$, will be plausible vessel networks that are not present in the set of real vessel networks on which the model was trained.

Fig. 4. The model consists of an adversarial autoencoder followed by a conditional Generative Adversarial Network. The adversarial autoencoder and the conditional GAN are trained to minimize the distance between their output and the training pair (v, r) and, at the same time, maximize the misclassification of D and D_code. Simultaneously, D learns to distinguish between real pairs (v, r) and synthetic pairs, and D_code learns to distinguish between latent representations produced by the encoder q and samples from the given prior p(z).
Fig. 5. An example of a spherical interpolation between two points z_0 and z_3 from the latent space.
To find a correct path linking $z_1$ to $z_n$, linear interpolation is typically applied. However, this is not recommended when a Gaussian prior is used [17], as in our case. Linearly interpolated latent representations traverse points that are unlikely given this prior. Instead, it has been shown that applying a spherical interpolation (slerp) [17] produces better results. This is defined by the following equation:

$$\mathrm{slerp}(z_1, z_n, t) = \frac{\sin((1 - t)\theta)}{\sin(\theta)} z_1 + \frac{\sin(t\theta)}{\sin(\theta)} z_n, \qquad (8)$$

where θ is the angle between $z_1$ and $z_n$ and t is a value ranging from 0 to 1. For t = 0 the result of slerp is $z_1$, whereas for t = 1, it takes the value of $z_n$. For every intermediate value, the slerp interpolation outputs a point on a great arc of a sphere containing $z_1$ and $z_n$.
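A small NumPy sketch of Eq. (8) follows; the function name and the 64-dimensional stand-in codes are illustrative only, and in practice the endpoints would be encoder outputs q(v_1) and q(v_n).

```python
import numpy as np

def slerp(z1, zn, t):
    # Spherical interpolation (Eq. 8) between two latent codes, with t in [0, 1].
    cos_theta = np.dot(z1, zn) / (np.linalg.norm(z1) * np.linalg.norm(zn))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))     # angle between z1 and zn
    if np.isclose(np.sin(theta), 0.0):                   # nearly collinear: fall back to lerp
        return (1.0 - t) * z1 + t * zn
    return (np.sin((1.0 - t) * theta) * z1 + np.sin(t * theta) * zn) / np.sin(theta)

# Hypothetical usage: a smooth path of 10 codes between two encoded vessel trees.
z1, zn = np.random.randn(64), np.random.randn(64)
path = [slerp(z1, zn, t) for t in np.linspace(0.0, 1.0, 10)]
```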
It is also well known that the latent space learned by an
autoencoder contains a semantic structure, which implies that
it allows us to perform meaningful vector space arithmetic. As
an example, in this vector space we are able to solve visual
analogies [18]. An analogy is defined as a 4-tuple:
$$z_1 : z_2 :: z_3 : z_4, \qquad (9)$$

which symbolizes that the relationship between $z_1$ and $z_2$ is the same as the relationship between $z_3$ and $z_4$.

For instance, we can analyze the result of applying the same transformation between $z_1$ and $z_2$ to $z_3$, which would be written in analogy terminology as $z_1 : z_2 :: z_3 : \,?$. If the points $z_i$ lie in a space supporting vector arithmetic, this analogy can be resolved by vector addition, simply computing:

$$z_4 = z_1 - z_2 + z_3. \qquad (10)$$

For instance, given two images encoded by the latent factors $z_1$, $z_2$, we can compute a transformation mapping one image to the other by simply obtaining the vector given by $z_1 - z_2$. After this, we can apply that same transformation to a third image by encoding it into a latent representation $z_3$, and computing $z_4 = z_3 + z_1 - z_2$.
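In code, the analogy of Eq. (10) reduces to a one-line vector operation on latent codes; the 64-dimensional stand-ins below are illustrative, and in practice each z_i would be an encoder output q(v_i).

```python
import numpy as np

def solve_analogy(z1, z2, z3):
    # Eq. (10): apply the transformation taking z2 to z1 onto z3.
    return z1 - z2 + z3

# Hypothetical usage: the result z4 would then be decoded with p and rendered with G
# to obtain the transformed vessel tree and eye fundus image.
z1, z2, z3 = np.random.randn(64), np.random.randn(64), np.random.randn(64)
z4 = solve_analogy(z1, z2, z3)
```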
In the case of the retinal images synthesized by our model,
the latent space is embedded in an N–dimensional vector
space, where N is a hyperparameter of the model. This
provides a finer degree of control over the high-level properties
of the generated images. Applying the above technique, we can
isolate factors of variation in the associated space of vessel
trees defined by $p_{data}(v)$. In this case, we gain control over global visual properties such as the position of the optical disk or the amount of vessels. Visual examples of these concepts are demonstrated in the Evaluation section below.
E. Implementation and Training
To be trained, the proposed model requires a dataset of
vessel trees and associated eye fundus image pairs. In or-
der to have enough training data, automatic retinal vessel
segmentations of the Messidor-1 dataset [19] were used.
As this dataset does not include manual segmentations, the
vessel tree was extracted using a U-Net model trained on
the DRIVE dataset [20]. This model achieved a 0.9755 AUC
on the DRIVE test set, a result aligned with state-of-the-
art methods for retinal vessel segmentation [21]–[24]. Further
details about the implementation are described in [10]. The model trained on DRIVE was then used to segment the images of the Messidor-1 dataset [19]. The obtained segmentations
