End-to-end Adversarial Retinal Image Synthesis
Pedro Costa, Adrian Galdran, Maria Ines Meyer, Meindert Niemeijer, Michael Abràmoff, Ana Maria Mendonça, and Aurélio Campilho

P. Costa, A. Galdran, M. I. Meyer, A. M. Mendonça, and A. Campilho are with INESC TEC, Porto, Portugal; e-mails: {pvcosta, adrian.galdran, maria.i.meyer}@inesctec.pt. M. Niemeijer is with IDx LLC; e-mail: niemeijer@eyediagnosis.net. M. Abràmoff is with the Stephen A. Wynn Institute for Vision Research, University of Iowa; e-mail: michael-abramoff@uiowa.edu. A. M. Mendonça and A. Campilho are also with Faculdade de Engenharia, Universidade do Porto, Portugal; e-mail: {amendon, campilho}@fe.up.pt. Corresponding Authors. Equal Contribution.

Copyright (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
Abstract—In medical image analysis applications, the availability of large amounts of annotated data is becoming increasingly critical. However, annotated medical data is often scarce and costly to obtain. In this paper, we address the problem of synthesizing retinal color images by applying recent techniques based on adversarial learning. In this setting, a generative model is trained to maximize a loss function provided by a second model attempting to classify its output into real or synthetic. In particular, we propose to implement an adversarial autoencoder for the task of retinal vessel network synthesis. We use the generated vessel trees as an intermediate stage for the generation of color retinal images, which is accomplished with a Generative Adversarial Network. Both models require the optimization of almost everywhere differentiable loss functions, which allows us to train them jointly. The resulting model offers an end-to-end retinal image synthesis system capable of generating as many retinal images as the user requires, with their corresponding vessel networks, by sampling from a simple probability distribution that we impose on the associated latent space. We show that the learned latent space contains a well-defined semantic structure, implying that we can perform calculations in the space of retinal images, e.g., smoothly interpolating new data points between two retinal images. Visual and quantitative results demonstrate that the synthesized images are substantially different from those in the training set, while being also anatomically consistent and displaying a reasonable visual quality.

Index Terms—Retinal Image Synthesis, Retinal Image Analysis, Generative Adversarial Networks, Adversarial Autoencoders.
I. INTRODUCTION
The ability to generate meaningful synthetic information
is highly desirable for many computer-aided medical
applications, where annotated data is often scarce and costly
to obtain. A wide availability of such data may allow re-
searchers to develop and validate more sophisticated com-
putational techniques. This pressing need for annotated data,
particularly images, has largely increased with the advent
of deep neural networks, which are progressively becoming
the standard approach in most machine learning tasks [1].
However, these techniques require large amounts of data to
be trained. Therefore, the problem of medical data generation
is of great interest, and as such, it has been deeply studied in
recent years [2]. Nevertheless, the realistic synthesis of high-
quality medical data still remains a widely unsolved challenge.
Most medical image generation methods follow two main
strategies. The most conventional approach endeavors to for-
mulate a mathematical model of the observed data. These
models can range from simple digital phantoms [3] to more
complex methodologies attempting to mimic anatomical and
physiological medical knowledge [4]. In combination with
the modeling of relevant characteristics of the different ac-
quisition devices, these techniques can generate new high-
quality images by sampling an appropriate parameter space.
This approach is often referred to as image simulation.
In recent years the data-driven approach of image synthesis
has started gaining popularity. In this context, the intrinsic
variability within a large pool of training images is extracted
by means of machine learning techniques. Ideally, the model is
able to learn the underlying probability distribution that defines
the manifold of real images. Once trained, the same system
can be sampled to output new images that are likely to lie
on that manifold, i.e. realistic synthetic images. This approach
has recently been successfully applied to improve classification
of multi-sequence MRI with missing/corrupted sequences [5],
to estimate cross-modality transformations [6], or to perform
knowledge transfer by learning features invariant to the MR
scanning protocol [7].
In the retinal image analysis field, in [8] the authors propose
an algorithm for the generation of the retinal background and
the fovea, and a separate technique for the generation of the
optical disk. For the former, the method relies on the construc-
tion of a large dictionary of small vessel-free image patches.
These patches are extracted from a dataset of co-registered
real images and clustered together, before tiling them in a
consistent manner. For the latter, a parametric intensity model
is proposed, with the parameters being estimated over a dataset
of real images.
The work in [2] is complementary to [8], since it focuses
on the generation of the vascular network only. The authors
propose a method to generate realistic retinal vessel trees.
The parameters controlling the geometry are learned from
real vessel trees. The method also enforces meaningful vessel
orientation and calibers by following a physical bifurcation
law describing the correct oxygenation of the retinal surface
[9]. The output of both approaches can then be superimposed,
allowing for the generation of high-quality large-resolution
images. However, concatenation of both techniques results in
a considerably complex computational pipeline, relying on
sensitive sub-processes such as image registration, patch-to-
image stitching or image blending.
Recently, a purely data-driven approach has been proposed
in [10]. It consists of a simple application of adversarial

Fig. 1. Overview of our approach. The pair (p, q) is an adversarial autoencoder trained to reconstruct retinal vessel maps. The pair (G, D) is a Generative Adversarial Network trained to generate color retinal images out of vessel maps. Once the model is trained, the system can generate a new retinal image and an associated vessel map. The only required input is sampling a distribution p, which is enforced to follow a simple multi-dimensional Gaussian distribution during training by means of an adversarial loss.
learning methods [11], in which a model is trained on pairs
of real vessel networks and their corresponding retinal fundus
images. The goal is to learn a transformation between them,
and once trained, this technique can generate a plausible retinal
image out of a pre-existing binary vessel tree. Unfortunately,
this approach has been shown to have a relevant drawback: the
model is dependent on the availability of a pre-existing vessel
network in order to generate a new retinal image. The vessel
networks employed for generating images were obtained by
application of an independent vessel segmentation method to
real retinal images. If the original image is defocused, the
retrieved vessel tree will be undercomplete, and the obtained
synthetic image will contain visual artifacts [10].
In this work, we substantially improve upon [10] by remov-
ing the dependence of the model on the previous existence of
a retinal vessel tree. This is achieved by building an autoen-
coder that can learn to generate realistic retinal vessel trees.
Moreover, by minimizing an adversarial loss, the autoencoder
allows us to generate vessel networks by simply sampling a multi-dimensional Normal distribution. A schematic representation
of our approach is depicted in Fig. 1.
It is worth noting that it is theoretically possible to perform a
separate training of the retinal vessel synthesis module and the
vessel network to retinal image mapping. However, since both
tasks are closely related, it is more natural to train both systems
jointly. We achieve this by combining the loss functions
associated to each task in a more general framework. The
resulting method presents several advantages over previously
proposed approaches:
1) The adversarial learning framework allows us to model
the underlying distribution of plausible retinal images
only from training data, without manually interacting
with parameters controlling complex mathematical mod-
els of the retinal anatomy.
2) Once trained, the model improves upon [10] by allowing the generation of any number of realistic retinal images, with associated vessel trees, in an efficient manner.
3) Unlike [2], [8], we generate separate parts of the retinal
anatomy through the same process, avoiding the combi-
nation of complex image processing tasks.
The proposed framework provides an effective end-to-end
retinal image synthesis tool, capable of producing realistic eye
fundus images and associated vessel networks with a simple
sampling procedure. We provide objective evaluation of both
the quality and the applicability of our synthetic images. Even
if the generated images and associated vessel maps are of
low resolution, suffer from small inconsistencies, and may not yet be suitable for training more complex retinal image analysis
algorithms, we show them to be useful for learning a retinal
vessel segmentation model with reasonable performance. This
represents a promising first step towards achieving synthetic
data that can be used in more complex automatic retinal image
analysis applications.
II. ADVERSARIAL IMAGE GENERATION
A. Vessel Network to Retinal Image Translation
The research herein reported considers retinal color image
generation out of an existing vessel network as an image-
to-image translation problem, learning a mapping G from
a binary vessel map v into another representation r [12].
Since many retinal images could share a similar binary vessel network due to variations in color, texture, illumination, etc., in our case G is a multi-valued mapping $G : v \mapsto \{r_1, \ldots, r_m\}$. As such, learning G is an ill-posed problem and some uncertainty is present.

Fig. 2. The discriminator D learns to distinguish between real pairs of vessel networks and eye fundus images (v, r) and synthetic pairs. The generator G maps an input vessel network v to a color eye fundus image r.
Connected to this is the choice of the objective function to be minimized while learning G. Training a model to minimize the $L_2$ distance between $G(v_i)$ and $r_i$ for a collection of training pairs $\{(r_1, v_1), \ldots, (r_n, v_n)\}$ will produce low-quality results with a lack of detail [13], due to the model selecting an average of many potentially valid representations.
Recent ideas based on Generative Adversarial Networks
(GANs) [11] are able to overcome this problem by learning
a more suitable loss function directly from data [12]. The
underlying strategy of adversarial methods consists of emulat-
ing a competition, in which the mapping G, called Generator,
attempts to produce realistic images, while a second player, the
Discriminator D, is trained to distinguish the output generated
by G from real examples. Here, both G and D are neural
networks, and act as adversaries, since the goal of G is to
maximize the misclassification error of D, while D's objective
is to beat G by learning to identify generated images. As in
[11], the adversarial loss, driving the learning of G and D, is:
$$\mathcal{L}_{adv}(G, D) = \mathbb{E}_{v,r \sim p_{data}(v,r)}[\log(D(v, r))] + \mathbb{E}_{v \sim p_{data}(v)}[\log(1 - D(v, G(v)))], \qquad (1)$$

where $\mathbb{E}_{v,r \sim p_{data}(v,r)}$ is the expectation over the pairs (v, r) sampled from the joint data distribution of real pairs $p_{data}(v, r)$, and $p_{data}(v)$ is the distribution of real vessel trees. The Discriminator's objective is to maximize (1), while the Generator's goal is to minimize it. Therefore, it is D that provides the training signal to G, replacing more conventional loss functions.
Although minimizing the above loss function induces G to produce visually sharp results, recent work in [12], [14] has shown that combining Eq. (1) with a global $L_1$ loss provides more consistent results. Thus, the loss function to optimize becomes:

$$\mathcal{L}_{im2im}(G, D) = \mathcal{L}_{adv}(G, D) + \lambda \, \mathbb{E}_{v,r \sim p_{data}(v,r)}[\| r - G(v) \|_1], \qquad (2)$$

where λ balances the contribution of the two losses. The discriminator's objective is in this case local, i.e., it attempts to discriminate N × N image regions as real or generated, while the goal of G is supplemented with the requirement not only to generate realistic-looking images but also images that preserve a global regularity. Since the $L_1$ loss guarantees that the output of G is globally consistent, D can concentrate on modeling only high-frequency structures. Thus, while D penalizes locally over-smooth image regions, the $L_1$ loss promotes the consistency of global visual features, such as the presence of a single optical disk and macula in the image.
An overview of this model is shown in Figure 2.
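To make Eqs. (1) and (2) concrete, the following is a minimal PyTorch-style sketch of the two objectives, not the authors' implementation: G and D are assumed to be torch.nn.Module networks, with D returning patch-wise real/fake logits, and the function names and the weight lam are hypothetical.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, v, r):
    # Eq. (1), discriminator side: real pairs (v, r) vs. synthetic pairs (v, G(v)).
    real_logits = D(v, r)
    fake_logits = D(v, G(v).detach())          # detach: only D is updated in this step
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def generator_loss(D, G, v, r, lam=100.0):
    # Eq. (2): adversarial term plus a global L1 term weighted by lambda.
    fake = G(v)
    fake_logits = D(v, fake)
    adv = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    return adv + lam * F.l1_loss(fake, r)      # the L1 term keeps the output globally consistent
```

In this sketch, maximizing the misclassification of D is expressed in the common non-saturating form, i.e., the generator minimizes the cross-entropy of its fake logits against the "real" label rather than directly minimizing Eq. (1).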
B. Adversarial Autoencoders for Vessel Trees Generation
Ideally, an end-to-end retinal image synthesis system should
also generate realistic vessel networks. Such a model would
also learn from data and generate as many vessel networks
as the user requires, with a high degree of variability, while
remaining anatomically plausible. In this work, we propose to
achieve this goal by means of an adversarial autoencoder.
Autoencoders are models trained to reconstruct their input.
They are composed of two submodels: 1) an encoder Q, that
maps a training example v to a latent (hidden) representation
z = Q(v), and 2) a decoder P , mapping z to an output that
aims to be a replica of the input. An autoencoder can thus be
trained on a training set of vessel trees v, in order to minimize
a reconstruction objective $\mathcal{L}_{rec}(Q, P)$. Modern autoencoders feature deep neural networks both for the encoder and the decoder, and introduce stochasticity by considering probability distributions instead of deterministic mappings Q, P. Here we define the encoder and the decoder to be conditional probability distributions, q(z|v) and p(v|z), respectively.
Autoencoders can be employed to learn useful abstractions
of the data through their latent representations. These can then
be applied in other contexts, e.g. data compression or semi-
supervised learning. However, in the above form, the trivial
mapping that associates each vessel tree example v in the
training set to itself can succeed in minimizing the recon-
struction loss while failing to learn any valuable abstraction.
To avoid this, several types of regularization can be added to
the loss, e.g. minimizing $\mathcal{L}_{rec}(q, p)$ while requiring the latent representation to be sparse [15].
However, even when properly regularized, an autoencoder
still has no ability to fulfill the goal of generating new
elements close to the true data manifold, since we do not
have knowledge of the underlying probability distribution q(z)
governing the space of latent representations. This prevents us
from sampling it in order to obtain a new code z that can then
be mapped by p to a retinal image.
To achieve the twofold goal of turning the autoencoder into
a generative model while regularizing it in such a way that
it can learn interesting representations of retinal vessel trees,
we apply the adversarial autoencoder framework, proposed
in [16]. In this case, the autoencoder learning process is
embedded in an adversarial competition, similar to the one
described in the previous section. The goal of the autoencoder
is to minimize the reconstruction error, but at the same time,
we attempt to gain control over the probabilistic structure of q(z)
by matching it to a prior distribution p(z) that can be easily
sampled (e.g. a multi-dimensional unit normal distribution).
The encoding distribution q(z|v) in the autoencoder is the
generator component of the adversarial game. This consists
of a neural network enforced to produce latent representations

Fig. 3. At first, the discriminator D_code is trained to distinguish between samples from the given prior p(z) and latent representations of training vessel networks v from the encoder q(z|v). Then, the autoencoder is trained to minimize the reconstruction loss between its output and v and, at the same time, maximize the misclassification of D_code.
z following the pre-specified prior distribution p(z). This is achieved via the maximization of the classification error of the discriminator module D_code, which is trained to classify codes z sampled from q(z) according to whether they come from the true prior distribution p(z) or not. Figure 3 depicts a schematic representation of this process.
The autoencoder training is performed by gradient descent,
with the gradients computed by standard backpropagation. The
optimization process consists of two alternate stages. In the
first step, the discriminator is updated to distinguish samples
generated by q from those coming from the prior distribution
p(z). This is achieved by maximizing the following loss:
$$\mathcal{L}_{code}(D_{code}, q) = \mathbb{E}_{z \sim p(z)}[\log(D_{code}(z))] + \mathbb{E}_{v \sim p_{data}(v)}[\log(1 - D_{code}(q(z|v)))]. \qquad (3)$$
In addition, both the encoder and the decoder weights are
updated to minimize the reconstruction error and, at the same
time, to maximize the classification error of the discriminator.
In this way, the complete loss function that drives the learning
of the adversarial autoencoder is a combination of both losses:
$$\mathcal{L}_{AAE}(D_{code}, q, p) = \mathcal{L}_{code}(D_{code}, q) + \gamma \, \mathcal{L}_{rec}(q, p), \qquad (4)$$

where γ weights the importance of the two losses. The goal of q and p is to minimize $\mathcal{L}_{AAE}$, while D_code attempts to maximize it. When the optimization process reaches an equilibrium point of Eq. (4), the decoder p defines a generative model that can be employed to generate new vessel trees starting from a sample of the imposed prior p(z) on the latent distribution.
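As an illustration of the two alternating stages in Eqs. (3) and (4), here is a PyTorch-style sketch, not the authors' code: q, p, and D_code are assumed to be torch.nn.Module networks, the decoder p is assumed to end in a sigmoid so its output lies in [0, 1], and p(z) is taken to be a unit Gaussian.

```python
import torch
import torch.nn.functional as F

def code_discriminator_loss(D_code, q, v):
    # Stage 1, Eq. (3): D_code separates prior samples z ~ p(z) from encoder codes q(z|v).
    z_fake = q(v)
    z_real = torch.randn_like(z_fake)               # samples from the unit Gaussian prior p(z)
    real_logits = D_code(z_real)
    fake_logits = D_code(z_fake.detach())
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def adversarial_autoencoder_loss(D_code, q, p, v, gamma=1.0):
    # Stage 2, Eq. (4): reconstruct v while pushing the codes q(z|v) toward the prior.
    z = q(v)
    v_rec = p(z)
    rec = F.binary_cross_entropy(v_rec, v)          # reconstruction loss for vessel maps in [0, 1]
    logits = D_code(z)
    fool = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return fool + gamma * rec
```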
C. From Random Samples to Retinal Images
The vessel-to-retinal image model presented in section II-A
can map a vessel tree v to a realistic eye fundus image r, while
the adversarial autoencoder defined in the previous section
generates a vessel network v from a random sample z coming
from a simple probability distribution. When both models are
combined, we obtain a single system capable of generating a
vessel map and a retinal image r from a random sample z.
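Once trained, generating a new pair only requires sampling the prior and running the decoder and the generator in sequence. A minimal sketch, assuming p and G are the trained decoder and vessel-to-image generator (torch.nn.Module) and latent_dim is a hypothetical choice for the dimensionality of z:

```python
import torch

@torch.no_grad()
def sample_pairs(p, G, n=4, latent_dim=64):
    # Sample the imposed Gaussian prior p(z), decode a vessel tree, then render a fundus image.
    z = torch.randn(n, latent_dim)
    v_tilde = p(z)          # decoder: latent code -> synthetic vessel map
    r_tilde = G(v_tilde)    # generator: vessel map -> synthetic color eye fundus image
    return v_tilde, r_tilde
```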
However, both sub-tasks are deeply interconnected. The
generation of vessel networks of better quality will lead to
a more realistic retinal image r. Conversely, if the generated
image r is able to deceive the discriminator in such a way that
it classifies it as plausible, it means that the vessel network v
contained in it also needs to be plausible.
Following this argument, we build a single joint model, in
which both sub-systems are trained at the same time, instead
of independently. In our case, the loss functions defining both
models are differentiable almost everywhere. Accordingly, to
build a joint loss function we can directly combine them by
simple addition. Nonetheless, we need to redefine the image-
to-image losses in Eqs. (1) and (2), so that they take the output
of the adversarial autoencoder as the input to G:
$$\tilde{\mathcal{L}}_{adv}(G, D) = \mathbb{E}_{v,r \sim p_{data}(v,r)}[\log(D(v, r))] + \mathbb{E}_{v \sim p_{data}(v)}[\log(1 - D(\tilde{v}, G(\tilde{v})))], \qquad (5)$$

$$\tilde{\mathcal{L}}_{im2im}(G, D) = \tilde{\mathcal{L}}_{adv}(G, D) + \lambda \, \mathbb{E}_{v,r \sim p_{data}(v,r)}[\| r - G(\tilde{v}) \|_1], \qquad (6)$$

where $\tilde{v} = p(q(v))$ is the vessel tree generated by the adversarial autoencoder. With this modification, both loss
functions can be linearly combined into a global one:
$$\mathcal{L}(G, D, D_{code}, q, p) = \tilde{\mathcal{L}}_{im2im}(G, D) + \mathcal{L}_{AAE}(D_{code}, q, p). \qquad (7)$$
In this formulation, the goal of G, q and p is to minimize the loss function in Eq. (7), while D and D_code attempt to maximize it. The main advantage of this joint training scheme is that the discriminator D also provides a better loss function for the adversarial autoencoder. The decoder p needs to produce realistic-looking vessels in order to maximize the misclassification of D. Also, part of the training signal that arrives at p flows through G. As a consequence, the adversarial autoencoder also benefits when the generator produces realistic eye fundus images. A schematic representation of the whole model is shown in Figure 4.
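The generator-side objective of the joint model (Eqs. (5) to (7)) can be sketched as a single differentiable loss through which the signal from D reaches the decoder p via G. This is an illustrative PyTorch-style sketch under the same assumptions as before (models as torch.nn.Module, hypothetical weights lam and gamma), not the authors' code.

```python
import torch
import torch.nn.functional as F

def joint_generator_loss(q, p, G, D, D_code, v, r, lam=100.0, gamma=1.0):
    # Eq. (7), generator side: the autoencoder (q, p) and the image generator G are updated together.
    z = q(v)
    v_tilde = p(z)                        # synthetic vessel tree fed to G, Eqs. (5)-(6)
    r_tilde = G(v_tilde)                  # synthetic eye fundus image

    # image-to-image terms: fool D and stay close to the real image r
    img_logits = D(v_tilde, r_tilde)
    adv_img = F.binary_cross_entropy_with_logits(img_logits, torch.ones_like(img_logits))
    l1 = F.l1_loss(r_tilde, r)

    # adversarial autoencoder terms: reconstruct v and fool D_code
    rec = F.binary_cross_entropy(v_tilde, v)
    code_logits = D_code(z)
    adv_code = F.binary_cross_entropy_with_logits(code_logits, torch.ones_like(code_logits))

    return adv_img + lam * l1 + adv_code + gamma * rec
```

The discriminators D and D_code are still updated in separate steps, as in Fig. 4; because v_tilde is not detached here, gradients from D flow back through G into p and q, which is exactly the benefit described above.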
D. Understanding the Latent Space
After training the model as described above, it is possible
to sample from p(z) in order to produce a synthetic pair of
vessel network and eye fundus images. Nonetheless, the latent
space might contain zones that are not on the manifold learned
during training. This implies that points sampled from p(z)
that are far from the latent representations of all the training
examples might produce pairs that are not plausible (e.g. an
eye fundus image with two optical disks).
Fortunately, there are techniques that allow us to sample from generative models in a way that avoids these cases. For instance, given two real vessel network images $v_1$, $v_n$, we may apply the encoder q to obtain their latent representations $z_1$, $z_n$, and interpolate between these two known locations in the latent space to obtain a smooth transition between the two images, $\{z_2, \ldots, z_{n-1}\}$. If the model did not overfit the training data, the vessel trees obtained by decoding these intermediate representations, i.e., $\{p(z_2), \ldots, p(z_{n-1})\}$, will be plausible vessel networks that are not present in the set of real vessel networks on which the model was trained.

Fig. 4. The model consists of an adversarial autoencoder followed by a conditional Generative Adversarial Network. The adversarial autoencoder and the conditional GAN are trained to minimize the distance between their output and the training pair (v, r) and, at the same time, maximize the misclassification of D and D_code. Simultaneously, D learns to distinguish between real pairs (v, r) and synthetic pairs, and D_code learns to distinguish between latent representations produced by the encoder q and samples from the given prior p(z).
Fig. 5. An example of a spherical interpolation between two points z_0 and z_3 from the latent space.
To find a correct path linking $z_1$ to $z_n$, linear interpolation is typically applied. However, this is not recommended when a Gaussian prior is used [17], as in our case. Linearly interpolated latent representations traverse points that are unlikely given this prior. Instead, it has been shown that applying a spherical interpolation (slerp) [17] produces better results. This is defined by the following equation:

$$\mathrm{slerp}(z_1, z_n, t) = \frac{\sin((1 - t)\theta)}{\sin(\theta)} z_1 + \frac{\sin(t\theta)}{\sin(\theta)} z_n, \qquad (8)$$

where θ is the angle between $z_1$ and $z_n$ and t is a value ranging from 0 to 1. For t = 0 the result of slerp is $z_1$, whereas for t = 1, it takes the value of $z_n$. For every intermediate value, the slerp interpolation outputs a point on a great arc of a sphere containing $z_1$ and $z_n$.
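A small NumPy sketch of Eq. (8) follows; the function name and the 64-dimensional stand-in codes are illustrative only, and in practice the endpoints would be encoder outputs q(v_1) and q(v_n).

```python
import numpy as np

def slerp(z1, zn, t):
    # Spherical interpolation (Eq. 8) between two latent codes, with t in [0, 1].
    cos_theta = np.dot(z1, zn) / (np.linalg.norm(z1) * np.linalg.norm(zn))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))     # angle between z1 and zn
    if np.isclose(np.sin(theta), 0.0):                   # nearly collinear: fall back to lerp
        return (1.0 - t) * z1 + t * zn
    return (np.sin((1.0 - t) * theta) * z1 + np.sin(t * theta) * zn) / np.sin(theta)

# Hypothetical usage: a smooth path of 10 codes between two encoded vessel trees.
z1, zn = np.random.randn(64), np.random.randn(64)
path = [slerp(z1, zn, t) for t in np.linspace(0.0, 1.0, 10)]
```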
It is also well known that the latent space learned by an
autoencoder contains a semantic structure, which implies that
it allows us to perform meaningful vector space arithmetic. As
an example, in this vector space we are able to solve visual
analogies [18]. An analogy is defined as a 4-tuple:
$$z_1 : z_2 :: z_3 : z_4, \qquad (9)$$

which symbolizes that the relationship between $z_1$ and $z_2$ is the same as the relationship between $z_3$ and $z_4$.

For instance, we can analyze the result of applying the same transformation between $z_1$ and $z_2$ to $z_3$, which would be written in analogy terminology as $z_1 : z_2 :: z_3 : \,?$. If the points $z_i$ lie in a space supporting vector arithmetic, this analogy can be resolved by vector addition, simply computing:

$$z_4 = z_1 - z_2 + z_3. \qquad (10)$$

For instance, given two images encoded by the latent factors $z_1$, $z_2$, we can compute a transformation mapping one image to the other by simply obtaining the vector given by $z_1 - z_2$. After this, we can apply that same transformation to a third image by encoding it into a latent representation $z_3$, and computing $z_4 = z_3 + z_1 - z_2$.
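In code, the analogy of Eq. (10) reduces to a one-line vector operation on latent codes; the 64-dimensional stand-ins below are illustrative, and in practice each z_i would be an encoder output q(v_i).

```python
import numpy as np

def solve_analogy(z1, z2, z3):
    # Eq. (10): apply the transformation taking z2 to z1 onto z3.
    return z1 - z2 + z3

# Hypothetical usage: the result z4 would then be decoded with p and rendered with G
# to obtain the transformed vessel tree and eye fundus image.
z1, z2, z3 = np.random.randn(64), np.random.randn(64), np.random.randn(64)
z4 = solve_analogy(z1, z2, z3)
```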
In the case of the retinal images synthesized by our model,
the latent space is embedded in an N–dimensional vector
space, where N is a hyperparameter of the model. This
provides a finer degree of control over the high-level properties
of the generated images. Applying the above technique, we can
isolate factors of variation in the associated space of vessel
trees defined by $p_{data}(v)$. In this case, we gain control over global visual properties such as the position of the optical disk or the amount of vessels. Visual examples of these concepts are demonstrated in the Evaluation section below.
E. Implementation and Training
To be trained, the proposed model requires a dataset of
vessel trees and associated eye fundus image pairs. In or-
der to have enough training data, automatic retinal vessel
segmentations of the Messidor-1 dataset [19] were used.
As this dataset does not include manual segmentations, the
vessel tree was extracted using a U-Net model trained on
the DRIVE dataset [20]. This model achieved a 0.9755 AUC
on the DRIVE test set, a result aligned with state-of-the-
art methods for retinal vessel segmentation [21]–[24]. Further
details about the implementation are described in [10]. The model trained on DRIVE was then used to segment the images of the Messidor-1 dataset [19]. The obtained segmentations
