
This is the author submitted version of an article whose final and definitive form has been published in International Journal of Computer Assisted Radiology and Surgery - Special issue: BVM 2009 Advances and recent developments in medical image computing. © 2011 Springer. The original publication is available at www.springerlink.com with DOI: 10.1007/s11548-010-0479-7

Automated Quality Assessment of Retinal Fundus Photos
Jan Paulus · Jörg Meier · Rüdiger Bock · Joachim Hornegger · Georg Michelson
Abstract Objective Automated, objective and fast measurement of the image quality of single retinal fundus photos to allow a stable and reliable medical evaluation.
Methods The proposed technique maps diagnosis-relevant criteria, inspired by diagnosis procedures based on the advice of an eye expert, to quantitative and objective features related to image quality. Independent of segmentation methods, it combines global clustering with local sharpness and texture features for classification.
Results On a test dataset of 301 retinal fundus images we evaluated our method against a gold standard given by human observers and compared it to a state-of-the-art approach. With an area under the ROC curve of 95.3% compared to 87.2%, our method outperformed the state-of-the-art approach. A significant p-value of 0.019 emphasizes the statistical difference between the two approaches.
Conclusions The combination of local and global image statistics models the defined quality criteria and automatically produces reliable and objective results in determining the image quality of retinal fundus photos.
J. Paulus · J. Meier · R. Bock · J. Hornegger
Pattern Recognition Lab
Graduate School in Advanced Optical Technologies (SAOT)
Friedrich-Alexander-University Erlangen-Nuremberg
Martensstr. 3, 91058 Erlangen, Germany
G. Michelson
Department of Ophthalmology
Graduate School in Advanced Optical Technologies (SAOT)
Interdisciplinary Center of Ophthalmic Preventive Medicine and Imaging (IZPI)
Friedrich-Alexander-University Erlangen-Nuremberg
Schwabachanlage 6, 91054 Erlangen, Germany
Keywords Retina · fundus image · quality assessment · non-reference image quality metric
1 Introduction
1.1 Motivation
Medical images are a very important basis for diagnosis and patient treatment. In particular in ophthalmology, photos of the eye background are used by medical experts to diagnose and document diseases like glaucoma or diabetic retinopathy. In addition, the images are commonly further evaluated by automatic software tools to support the diagnosis [1–3].

Sufficient image quality is essential to ensure a reliable diagnosis and valid automated processing. Because of the operating personnel's varying level of experience, different types of cameras or the individual properties of the acquired eye, image quality varies strongly. Photos of poor quality should not be used further for diagnosis; a reacquisition would be necessary. However, in many cases, like in reading centers in Germany and the USA [4], the image acquisition is independent in time and location from its medical assessment. A reacquisition of the images would be time-consuming and expensive. Thus, sufficient image quality has to be assured already during the acquisition procedure.

Unfortunately, the rating of image quality is subjective and application dependent. It is an individual decision at which point the image quality becomes too poor for a stable diagnosis. There is a strong need to objectify image quality during the acquisition. This would help to ensure an overall sufficient quality level for the acquired image data, which is essential for a stable and reliable diagnosis.

1.2 State of the art
In the literature, the main purpose of automated quality assessment of common images is to compare original images to their compressed versions to quantify quality loss, so-called reference approaches. Eskicioglu et al. [5] provide an overview of basic quality metrics for this problem, such as average difference or normalized cross-correlation. Several works in that field develop extended approaches [6], e.g. driven by the human eye's function of finding structures [7].

In the field of medical imaging those reference approaches used for common images are not feasible, as comparable reference images are rarely available. Despite the importance of this problem, it is still a widely neglected field of research, especially with regard to ophthalmic fundus imaging. To the authors' knowledge there are only five relevant publications dealing with retinal image quality assessment: (i) Segmentation-based approaches detect anatomical structures, under the assumption that the segmentation will fail on low-quality images due to poor recognizability. Fleming et al. [8] measure the quality by evaluating the vessel tree in the region around the macula (the point of sharpest vision in the retina). In addition, anatomical criteria related to the optic nerve head (the exit of the optic nerve out of the retina) and the macula describe an image formation that is required to achieve good-quality images. Giancardo et al. [9] measure the densities of vessels in different regions of the image. The vessel densities and a 5-bin histogram of each color channel are used as features for classification. (ii) Histogram-based approaches use information gained from image statistics to identify low-quality photos. Lalonde et al. [10] evaluate the histogram of an input image's gradient magnitude image and local histogram information of its gray values. Reference histograms are calculated from images showing good quality and compared with the input image's histograms for classification. Lee et al. [11] compute a quality index by convolving the intensity histogram of the input image with a template intensity histogram from good retinal images. Image Structure Clustering (ISC) [12] characterizes the image quality by the distribution of image intensities itself and the ability to cluster the image into the contained anatomical structures. Five clusters are calculated from the input image using a bank of filters to transform the pixels into the gauge coordinate system that is defined at each point by the direction of its gradients.
1.3 Contribution
Most of the state-of-the-art methods focus either on segmentation methods, which can be error-prone, or on histogram information, which misses the structural information of relevant components. As an exception, ISC incorporates the promising idea of assessing the structural recognizability of anatomical components, but mainly uses local gradient information and a non-objective gold standard. We seize the idea but present a new method that introduces a combination of global and local structural characteristics as a non-reference approach and waives error-prone segmentation. In contrast to the state of the art it is driven by four criteria inspired by diagnosis procedures based on the advice of an eye expert. By judging an image according to these criteria, quality assessment becomes a more objective task and enables the building of an objective gold standard (figure 1). The criteria are designed for application on optic nerve head centered fundus images with a 22.5° field of view. Anatomical components like the fovea are not visible and will not be considered in the following:

Structural criteria
1. Optic disk structure
   Can we recognize and differentiate the structure of the optic disk?
2. Vessel structure
   Can we recognize and differentiate the fine structure of the vessels?

Generic criteria
3. Homogeneous illumination
   Are the illumination and brightness approximately equal in all parts of the image?
4. Bright and high-contrast background
   Is the eye's background bright enough and of sufficient contrast?
The structural criteria are covered by an unsupervised clustering and a sharpness metric. Like ISC, the clustering groups the anatomical structures into clusters. ISC uses a bank of complex filters for a gauge coordinate transformation; therefore, it mainly focuses on gradient and thus local information. In contrast, we gain global information using a more basic operation by applying k-means clustering directly to the pixel intensities. We also utilize cluster sizes to express the size of relevant components. Another advantage of this basic operation is the possibility to compute inter-cluster differences for the description of the recognizability and dissimilarity of these anatomical structures. Like ISC we incorporate local gradient information, but we gain it separately, as the sharpness metric measures the clearness of separation between the components.

Fig. 1 Examples of retinal fundus images of excellent (upper row, all criteria fulfilled), average (middle row, two criteria fulfilled) and insufficient (lower row, no criteria fulfilled) quality. The images of excellent quality clearly show the optic disk (the bright circular spot in the middle where the optic nerve exits the eye, also known as the "blind spot"), the vessel tree (entering the eye at the optic disk), a high-contrast background and an overall homogeneous illumination. The rating is based on the majority decision of three human evaluators using the criteria defined in section 1.3. Excellent and average quality will be considered sufficient for further use and referred to as good quality. The average quality images show the problem of judging quality at the class border. Insufficient quality indicates a reacquisition and will be included in the set of bad quality images in the following.
As a major improvement we introduce the Haralick texture metrics [13] into the field of retinal quality assessment to describe the generic criteria. Besides the sharpness of the image, the Haralick metrics evaluate the homogeneity and the contrast.

Summarizing, the clustering describes the recognizability, dissimilarity and contrast of relevant structures. The sharpness metric evaluates the separation between components. The Haralick features measure common image sharpness, homogeneity and generic contrast. Thus we combine global and local information in a way that is not yet present in this form in the state of the art.
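As a rough illustration of the three Haralick statistics used here (entropy, energy and contrast, averaged over the four co-occurrence directions), the following sketch computes them with plain NumPy. The function name `haralick_features` and the 8-level gray quantization are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def haralick_features(img, levels=8):
    """Entropy, energy and contrast averaged over the four gray-level
    co-occurrence directions (0, 45, 90, 135 degrees). Minimal sketch;
    `levels` is an assumed quantization, not taken from the paper."""
    # Quantize gray values to a small number of levels.
    q = (img.astype(float) / (img.max() + 1e-12) * (levels - 1)).astype(int)
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # the four directions
    h1 = h2 = h3 = 0.0
    n, m = q.shape
    for dy, dx in offsets:
        P = np.zeros((levels, levels))
        for y in range(n):
            for x in range(m):
                yy, xx = y + dy, x + dx
                if 0 <= yy < n and 0 <= xx < m:
                    P[q[y, x], q[yy, xx]] += 1
                    P[q[yy, xx], q[y, x]] += 1  # symmetric counts
        p = P / P.sum()                          # normalized co-occurrences
        nz = p[p > 0]
        h1 += -(nz * np.log(nz)).sum() / 4       # entropy (image sharpness)
        h2 += (p ** 2).sum() / 4                 # energy (homogeneity)
        i, j = np.indices(p.shape)
        h3 += (((i - j) ** 2) * p).sum() / 4     # contrast
    return h1, h2, h3
```

For a perfectly homogeneous image the entropy and contrast vanish and the energy is maximal, which matches the intended interpretation of the three metrics.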
2 Methods
Our algorithm models the criteria defined above to measure the image quality that is relevant for a reliable assessment of fundus images. The method consists of a clustering, a sharpness metric and Haralick texture features. We combine all features into one final vector. For all computations only the green channel was considered, as it shows the best contrast.
2.1 Clustering
As we want to assure sufficient recognizability and differentiation of anatomical structures (e.g. optic disk, vessels), we identify these components by applying a k-means clustering to the input image $I$ of size $n \times m$ with $k$ clusters $C_i$, $i \in \{1, \ldots, k\}$. The gray values $g_{xy}$ with $x \in \{1, \ldots, n\}$ and $y \in \{1, \ldots, m\}$ are grouped into clusters without further preprocessing.

(a) Input image (b) Clustering result (proposed method) (c) Clustering result (ISC)
Fig. 2 Clustering examples: good (first row) and bad (second row) quality images (a) and clustering results for the proposed method (b) and ISC as state of the art (c), coded as gray values. For good quality images the clustering images show the characteristic anatomical structures. In the case of bad quality they are not recognizable.
The cluster centers are initialized with the mean values of the $k$ structures (e.g. vessels) in 10 images manually segmented by one person. The images showed good quality and were considered by three human evaluators to fulfill all quality criteria. In each image, representative pixels for each cluster were identified and their intensities averaged per cluster over all 10 images.
In good quality images each anatomical structure has an expected size, where significant variations indicate bad recognizability and thus bad quality. We assess the structure size by using the normalized cluster sizes $c_i$ as features, where $\#$ denotes the cardinality:

$$c_i = \frac{\#\{g_{xy} \mid g_{xy} \in C_i\}}{n \cdot m} \quad (1)$$
The clearer we can recognize certain structures and differentiate between them, the higher their inter-cluster contrast. We use inter-cluster differences as essential features to express this structural contrast. They are generated by computing the difference $d_{ij}$ between the mean value $m_i$ of a certain cluster $C_i$ and all other clusters' mean values $m_j$:

$$d_{ij} = m_i - m_j, \quad i \in \{1, \ldots, k\},\ j \in \{1, \ldots, k\},\ i > j \quad (2)$$
Thus the cluster sizes $c_i$ and the inter-cluster differences $d_{ij}$ evaluate the structural recognizability and dissimilarity of relevant image components, e.g. the optic disk. For bad quality images the clustering will consequently fail, resulting in abnormal cluster sizes and low inter-cluster differences (figure 2).
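The clustering features of Eqs. (1) and (2) can be sketched as follows, assuming a plain 1-D Lloyd-style k-means on gray values. The function name, the iteration cap and the caller-supplied initial centers are illustrative; the paper derives the initial centers from 10 manually segmented images.

```python
import numpy as np

def clustering_features(img, init_centers):
    """Cluster gray values with 1-D k-means and derive quality features:
    normalized cluster sizes c_i (Eq. 1) and pairwise differences d_ij
    between cluster means for i > j (Eq. 2). Illustrative sketch only."""
    g = img.astype(float).ravel()
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(50):  # plain Lloyd iterations on pixel intensities
        labels = np.argmin(np.abs(g[:, None] - centers[None, :]), axis=1)
        new = np.array([g[labels == i].mean() if np.any(labels == i)
                        else centers[i] for i in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    k = len(centers)
    sizes = np.array([(labels == i).mean() for i in range(k)])  # c_i
    # d_ij = m_i - m_j for i > j, as in Eq. (2)
    diffs = np.array([centers[i] - centers[j]
                      for i in range(k) for j in range(i)])
    return sizes, diffs
```

On a good-quality image the cluster sizes stay near their expected values and the inter-cluster differences are large; on a washed-out image the differences collapse, which is exactly the failure mode the features are meant to detect.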
2.2 Sharpness
Our clustering (section 2.1) measures the differentiation of relevant structures globally. It does not cover local properties at the structures' borders, where a clear and sharp edge is important for good quality, as it separates the components (e.g. optic disk, vessels) from each other more clearly. Therefore we incorporate a sharpness metric that evaluates the edge strength in the image. Since high gradients identify sharp edges, we calculate the gradient magnitude image $G$ of the input image $I$ by combining the derivative $I_x$ in x-direction and the derivative $I_y$ in y-direction using the Euclidean norm:

$$G = \sqrt{I_x^2 + I_y^2} \quad \text{with} \quad I_x = \frac{\partial I}{\partial x},\ I_y = \frac{\partial I}{\partial y} \quad (3)$$
The gray values $e_{xy}$ in the gradient magnitude image $G$ are normalized to the range $[0, 1]$ by a minimum-maximum scaling. We use the normalized number of pixels identifying strong edges, $s_1$, and the average strength of strong edges, $s_2$, to express the image sharpness. Strong edges have to lie above a threshold $\alpha \in [0, 1]$, that was
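The strong-edge features $s_1$ and $s_2$ described above can be sketched as follows. Here `np.gradient` stands in for the paper's derivative operators, and the default threshold value for $\alpha$ is a placeholder assumption.

```python
import numpy as np

def sharpness_features(img, alpha=0.5):
    """Edge-strength features from the normalized gradient magnitude
    (Eq. 3). alpha is the strong-edge threshold in [0, 1]; its default
    here is an assumption, not the value used in the paper."""
    I = img.astype(float)
    Iy, Ix = np.gradient(I)                 # derivatives in y- and x-direction
    G = np.hypot(Ix, Iy)                    # gradient magnitude (Euclidean norm)
    # Minimum-maximum scaling of the gradient magnitudes to [0, 1].
    e = (G - G.min()) / (G.max() - G.min() + 1e-12)
    strong = e > alpha
    s1 = strong.mean()                      # normalized count of strong-edge pixels
    s2 = e[strong].mean() if strong.any() else 0.0  # average strong-edge strength
    return s1, s2
```

For a sharp step edge the few edge pixels carry near-maximal normalized strength, while a blurred version of the same image spreads weaker gradients over more pixels, lowering $s_2$.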

References

- Textural Features for Image Classification (Journal Article)
- Image quality measures and their performance (Journal Article)
- Why is image quality assessment so difficult (Proceedings Article)
- Statistical evaluation of image quality measures (Journal Article)
- Automated detection of diabetic retinopathy on digital fundus images (Journal Article)
Frequently Asked Questions

Q1. What are the contributions in "Automated quality assessment of retinal fundus photos"?

On a test dataset of 301 retinal fundus images the authors evaluated their method against a gold standard given by human observers and compared it to a state-of-the-art approach.

Since high gradients identify sharp edges, the authors calculate the gradient magnitude image $G$ of the input image $I$ by combining the derivative $I_x$ in x-direction and the derivative $I_y$ in y-direction.

The criteria are based on the recognizability and dissimilarity of certain structures in the eye background as well as on illumination homogeneity and sharpness.

The average computation time is 0.8 seconds for the sharpness metrics, 2.2 seconds for the clustering features and 2.4 seconds for the Haralick features on an Intel Core 2 Quad Q9550 system with 2.4 GHz and 3 GB RAM.

The final Haralick features $h_1$, $h_2$ and $h_3$ are generated by averaging over all directions:

$$h_1 = \frac{1}{4}\sum_r h_1^r \quad (15) \qquad h_2 = \frac{1}{4}\sum_r h_2^r \quad (16) \qquad h_3 = \frac{1}{4}\sum_r h_3^r \quad (17)$$

Thus, texture statistics are used to calculate generic quality features: entropy $h_1$ for common image sharpness, energy $h_2$ for image homogeneity and contrast $h_3$.

Five subsets consisted of 6 bad and 24 good images, four subsets of 7 bad and 23 good images and one subset of 7 bad and 24 good images.

For quantifying the performance of the proposed method the authors calculated the area under the ROC curve (AUC), the p-value related to ISC and the p-value related to the final feature combination of Haralick, clustering and sharpness features.

The co-occurrence matrices for the four directions are defined as

$$P(i,j,0^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ a - x = 0,\ |b - y| = 1\} \quad (7)$$
$$P(i,j,45^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ (a - x = 1,\ b - y = -1) \lor (a - x = -1,\ b - y = 1)\} \quad (8)$$
$$P(i,j,90^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ |a - x| = 1,\ b - y = 0\} \quad (9)$$
$$P(i,j,135^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ (a - x = 1,\ b - y = 1) \lor (a - x = -1,\ b - y = -1)\} \quad (10)$$

Each matrix entry is normalized by the total number of neighbored pixel pairs in its direction, $N_r$:

$$p(i,j,r) = \frac{P(i,j,r)}{N_r} \quad (11)$$

Based on the four co-occurrence matrices, entropy $h_1^r$, energy $h_2^r$ and contrast $h_3^r$ are calculated for each direction $r$:

$$h_1^r = -\sum_{i=1}^{m \cdot n}\sum_{j=1}^{m \cdot n} p(i,j,r)\log(p(i,j,r)) \quad (12)$$
$$h_2^r = \sum_{i=1}^{m \cdot n}\sum_{j=1}^{m \cdot n} p(i,j,r)^2 \quad (13)$$
$$h_3^r = \sum_{l=0}^{m \cdot n - 1} l^2 \Bigl\{\sum_{i=1}^{m \cdot n}\sum_{\substack{j=1 \\ |i-j|=l}}^{m \cdot n} p(i,j,r)\Bigr\} \quad (14)$$

The variance $\gamma$ of the radial basis kernel and the penalty factor $C$ were calculated using a grid search strategy in order to find the best parameter set for each method.