
This is the author submitted version of an article whose final and definitive form has been published in International Journal of Computer Assisted Radiology and Surgery - Special issue: BVM 2009 Advances and recent developments in medical image computing. © 2011 Springer. The original publication is available at www.springerlink.com with DOI: 10.1007/s11548-010-0479-7

Automated Quality Assessment of Retinal Fundus Photos
Jan Paulus · Jörg Meier · Rüdiger Bock · Joachim Hornegger · Georg Michelson
Abstract Objective Automated, objective and fast measurement of the image quality of single retinal fundus photos to allow a stable and reliable medical evaluation.
Methods The proposed technique maps diagnosis-relevant criteria, inspired by diagnosis procedures based on the advice of an eye expert, to quantitative and objective features related to image quality. Independent of segmentation methods, it combines global clustering with local sharpness and texture features for classification.
Results On a test dataset of 301 retinal fundus images we evaluated our method against a gold standard given by human observers and compared it to a state-of-the-art approach. With an area under the ROC curve of 95.3% compared to 87.2%, our method outperformed the state-of-the-art approach. A significant p-value of 0.019 emphasizes the statistical difference between the two approaches.
Conclusions The combination of local and global image statistics models the defined quality criteria and automatically produces reliable and objective results in determining the image quality of retinal fundus photos.
J. Paulus · J. Meier · R. Bock · J. Hornegger
Pattern Recognition Lab
Graduate School in Advanced Optical Technologies (SAOT)
Friedrich-Alexander-University Erlangen-Nuremberg
Martensstr. 3, 91058 Erlangen, Germany
G. Michelson
Department of Ophthalmology
Graduate School in Advanced Optical Technologies (SAOT)
Interdisciplinary Center of Ophthalmic Preventive Medicine and Imaging (IZPI)
Friedrich-Alexander-University Erlangen-Nuremberg
Schwabachanlage 6, 91054 Erlangen, Germany
Keywords Retina · fundus image · quality assessment · non-reference image quality metric
1 Introduction
1.1 Motivation
Medical images are a very important basis for diagnosis and patient treatment. In particular in ophthalmology, photos of the eye background are used by medical experts to diagnose and document diseases like glaucoma or diabetic retinopathy. In addition, the images are commonly further evaluated by automatic software tools to support the diagnosis [1–3].

Sufficient image quality is essential to ensure a reliable diagnosis and valid automated processing. Because of the operating personnel's varying level of experience, different types of cameras or the individual properties of the acquired eye, image quality varies strongly. Photos of poor quality should not be used further for diagnosis; a reacquisition would be necessary. However, in many cases, like in reading centers in Germany and the USA [4], the image acquisition is independent in time and location from its medical assessment. A reacquisition of the images would be time-consuming and expensive. Thus, sufficient image quality has to be assured already during the acquisition procedure.

Unfortunately, the rating of image quality is subjective and application dependent. It is an individual decision at which point the image quality becomes too poor for a stable diagnosis. There is a strong need to objectify image quality during the acquisition. This would help to ensure an overall sufficient quality level for the acquired image data, which is essential for a stable and reliable diagnosis.

1.2 State of the art
In the literature, the main purpose of automated quality assessment of common images is to compare original images to their compressed versions to quantify quality loss, so-called reference approaches. Eskicioglu et al. [5] provide an overview of basic quality metrics for this problem, such as average difference or normalized cross-correlation. Several works in that field develop extended approaches [6], e.g. driven by the human eye's function of finding structures [7].

In the field of medical imaging those reference approaches used for common images are not feasible, as comparable reference images are rarely available. Despite the importance of this problem, it is still a widely neglected field of research, especially with regard to ophthalmic fundus imaging. To the authors' knowledge there are only five relevant publications dealing with retinal image quality assessment: (i) Segmentation-based approaches detect anatomical structures, under the assumption that the segmentation will fail on low-quality images due to poor recognizability. Fleming et al. [8] measure the quality by evaluating the vessel tree in the region around the macula (the point of sharpest vision in the retina). In addition, anatomical criteria related to the optic nerve head (the exit of the optic nerve out of the retina) and the macula describe an image formation that is required to achieve good-quality images. Giancardo et al. [9] measure the densities of vessels in different regions of the image. The vessel densities and a 5-bin histogram of each color channel are used as features for classification. (ii) Histogram-based approaches use information gained from image statistics to identify low-quality photos. Lalonde et al. [10] evaluate the histogram of an input image's gradient magnitude image and local histogram information of its gray values. Reference histograms are calculated from images showing good quality and compared with the input image's histograms for classification. Lee et al. [11] compute a quality index by convolving the intensity histogram of the input image with a template intensity histogram from good retinal images. Image Structure Clustering (ISC) [12] characterizes the image quality by the distribution of image intensities itself and the ability to cluster the image into the contained anatomical structures. Five clusters are calculated from the input image using a bank of filters to transform the pixels into the gauge coordinate system that is defined at each point by the direction of its gradients.
1.3 Contribution
Most of the state-of-the-art methods focus either on segmentation methods, which can be error-prone, or on histogram information, which misses the structural information of relevant components. As an exception, ISC incorporates the promising idea of assessing the structural recognizability of anatomical components, but mainly uses local gradient information and a non-objective gold standard. We seize the idea but present a new method that introduces a combination of global and local structural characteristics as a non-reference approach and waives error-prone segmentation. In contrast to the state of the art it is driven by four criteria inspired by diagnosis procedures based on the advice of an eye expert. By judging an image according to these criteria, quality assessment becomes a more objective task and enables the building of an objective gold standard (figure 1). The criteria are designed for application on optic nerve head centered fundus images with a 22.5° field of view. Anatomical components like the fovea are not visible and will not be considered in the following:

Structural criteria
1. Optic disk structure
   Can we recognize and differentiate the structure of the optic disk?
2. Vessel structure
   Can we recognize and differentiate the fine structure of the vessels?

Generic criteria
3. Homogeneous illumination
   Are the illumination and brightness approximately equal in all parts of the image?
4. Bright and high-contrast background
   Is the eye's background bright enough and of sufficient contrast?
The structural criteria are covered by an unsupervised clustering and a sharpness metric. Like ISC, the clustering groups the anatomical structures into clusters. ISC uses a bank of complex filters for a gauge coordinate transformation; therefore, it mainly focuses on gradient and thus local information. In contrast, we gain global information using a more basic operation by applying k-means clustering directly to the pixel intensities. We also utilize cluster sizes to express the size of relevant components. Another advantage of this basic operation is the possibility to compute inter-cluster differences for the description of the recognizability and dissimilarity of these anatomical structures. Like ISC we incorporate local gradient information, but we gain it separately, as the sharpness metric measures the clearness of separation between the components.

Fig. 1 Examples of retinal fundus images of excellent (upper row, all criteria fulfilled), average (middle row, two criteria fulfilled) and insufficient (lower row, no criteria fulfilled) quality. The images of excellent quality clearly show the optic disk (the bright circular spot in the middle where the optic nerve exits the eye, also known as the "blind spot"), the vessel tree (entering the eye at the optic disk), a high-contrast background and an overall homogeneous illumination. The rating is based on the majority decision of three human evaluators using the criteria defined in section 1.3. Excellent and average quality will be considered sufficient for further use and referred to as good quality. The average quality images show the problem of judging quality at the class border. Insufficient quality indicates a reacquisition and will be included in the set of bad quality images in the following.
As a major improvement we introduce the Haralick texture metrics [13] into the field of retinal quality assessment to describe the generic criteria. Besides the sharpness of the image, the Haralick metrics evaluate the homogeneity and the contrast.

Summarizing, the clustering describes the recognizability, dissimilarity and contrast of relevant structures. The sharpness metric evaluates the separation between components. The Haralick features measure common image sharpness, homogeneity and generic contrast. Thus we combine global and local information in a way that is not yet present in this form in the state of the art.
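As a rough illustration of the three Haralick statistics used here (entropy, energy and contrast, averaged over the four co-occurrence directions), the following sketch computes them with plain NumPy. The function name `haralick_features` and the 8-level gray quantization are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def haralick_features(img, levels=8):
    """Entropy, energy and contrast averaged over the four gray-level
    co-occurrence directions (0, 45, 90, 135 degrees). Minimal sketch;
    `levels` is an assumed quantization, not taken from the paper."""
    # Quantize gray values to a small number of levels.
    q = (img.astype(float) / (img.max() + 1e-12) * (levels - 1)).astype(int)
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]  # the four directions
    h1 = h2 = h3 = 0.0
    n, m = q.shape
    for dy, dx in offsets:
        P = np.zeros((levels, levels))
        for y in range(n):
            for x in range(m):
                yy, xx = y + dy, x + dx
                if 0 <= yy < n and 0 <= xx < m:
                    P[q[y, x], q[yy, xx]] += 1
                    P[q[yy, xx], q[y, x]] += 1  # symmetric counts
        p = P / P.sum()                          # normalized co-occurrences
        nz = p[p > 0]
        h1 += -(nz * np.log(nz)).sum() / 4       # entropy (image sharpness)
        h2 += (p ** 2).sum() / 4                 # energy (homogeneity)
        i, j = np.indices(p.shape)
        h3 += (((i - j) ** 2) * p).sum() / 4     # contrast
    return h1, h2, h3
```

For a perfectly homogeneous image the entropy and contrast vanish and the energy is maximal, which matches the intended interpretation of the three metrics.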
2 Methods
Our algorithm models the criteria defined above to measure the image quality that is relevant for a reliable assessment of fundus images. The method consists of a clustering, a sharpness metric and Haralick texture features. We combine all features into one final vector. For all computations only the green channel was considered, as it shows the best contrast.
2.1 Clustering
As we want to assure sufficient recognizability and differentiation of anatomical structures (e.g. optic disk, vessels), we identify these components by applying a k-means clustering to the input image $I$ of size $n \times m$ with $k$ clusters $C_i$, $i \in \{1, \ldots, k\}$. The gray values $g_{xy}$ with $x \in \{1, \ldots, n\}$ and $y \in \{1, \ldots, m\}$ are grouped into clusters without further preprocessing.

(a) Input image (b) Clustering result (proposed method) (c) Clustering result (ISC)
Fig. 2 Clustering examples: good (first row) and bad (second row) quality images (a) and clustering results for the proposed method (b) and ISC as state of the art (c), coded as gray values. For good quality images the clustering images show the characteristic anatomical structures. In the case of bad quality they are not recognizable.
The cluster centers are initialized with the mean values of the $k$ structures (e.g. vessels) in 10 images manually segmented by one person. The images showed good quality and were considered by three human evaluators to fulfill all quality criteria. In each image, representative pixels for each cluster were identified and their intensities averaged per cluster over all 10 images.
In good quality images each anatomical structure has an expected size, where significant variations indicate bad recognizability and thus bad quality. We assess the structure size by using the normalized cluster sizes $c_i$ as features, where $\#$ denotes the cardinality:

$$c_i = \frac{\#\{g_{xy} \mid g_{xy} \in C_i\}}{n \cdot m} \quad (1)$$
The clearer we can recognize certain structures and differentiate between them, the higher their inter-cluster contrast. We use inter-cluster differences as essential features to express this structural contrast. They are generated by computing the difference $d_{ij}$ between the mean value $m_i$ of a certain cluster $C_i$ and all other clusters' mean values $m_j$:

$$d_{ij} = m_i - m_j, \quad i \in \{1, \ldots, k\},\ j \in \{1, \ldots, k\},\ i > j \quad (2)$$
Thus the cluster sizes $c_i$ and the inter-cluster differences $d_{ij}$ evaluate the structural recognizability and dissimilarity of relevant image components, e.g. the optic disk. For bad quality images the clustering will consequently fail, resulting in abnormal cluster sizes and low inter-cluster differences (figure 2).
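The clustering features of Eqs. (1) and (2) can be sketched as follows, assuming a plain 1-D Lloyd-style k-means on gray values. The function name, the iteration cap and the caller-supplied initial centers are illustrative; the paper derives the initial centers from 10 manually segmented images.

```python
import numpy as np

def clustering_features(img, init_centers):
    """Cluster gray values with 1-D k-means and derive quality features:
    normalized cluster sizes c_i (Eq. 1) and pairwise differences d_ij
    between cluster means for i > j (Eq. 2). Illustrative sketch only."""
    g = img.astype(float).ravel()
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(50):  # plain Lloyd iterations on pixel intensities
        labels = np.argmin(np.abs(g[:, None] - centers[None, :]), axis=1)
        new = np.array([g[labels == i].mean() if np.any(labels == i)
                        else centers[i] for i in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    k = len(centers)
    sizes = np.array([(labels == i).mean() for i in range(k)])  # c_i
    # d_ij = m_i - m_j for i > j, as in Eq. (2)
    diffs = np.array([centers[i] - centers[j]
                      for i in range(k) for j in range(i)])
    return sizes, diffs
```

On a good-quality image the cluster sizes stay near their expected values and the inter-cluster differences are large; on a washed-out image the differences collapse, which is exactly the failure mode the features are meant to detect.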
2.2 Sharpness
Our clustering (section 2.1) measures the differentiation of relevant structures globally. It does not cover local properties at the structures' borders, where a clear and sharp edge is important for good quality, as it separates the components (e.g. optic disk, vessels) from each other more clearly. Therefore we incorporate a sharpness metric that evaluates the edge strength in the image. Since high gradients identify sharp edges, we calculate the gradient magnitude image $G$ of the input image $I$ by combining the derivative $I_x$ in x-direction and the derivative $I_y$ in y-direction using the Euclidean norm:

$$G = \sqrt{I_x^2 + I_y^2} \quad \text{with} \quad I_x = \frac{\partial I}{\partial x},\ I_y = \frac{\partial I}{\partial y} \quad (3)$$
The gray values $e_{xy}$ in the gradient magnitude image $G$ are normalized to the range $[0, 1]$ by a minimum-maximum scaling. We use the normalized number of pixels identifying strong edges, $s_1$, and the average strength of strong edges, $s_2$, to express the image sharpness. Strong edges have to lie above a threshold $\alpha \in [0, 1]$, that was
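The strong-edge features $s_1$ and $s_2$ described above can be sketched as follows. Here `np.gradient` stands in for the paper's derivative operators, and the default threshold value for $\alpha$ is a placeholder assumption.

```python
import numpy as np

def sharpness_features(img, alpha=0.5):
    """Edge-strength features from the normalized gradient magnitude
    (Eq. 3). alpha is the strong-edge threshold in [0, 1]; its default
    here is an assumption, not the value used in the paper."""
    I = img.astype(float)
    Iy, Ix = np.gradient(I)                 # derivatives in y- and x-direction
    G = np.hypot(Ix, Iy)                    # gradient magnitude (Euclidean norm)
    # Minimum-maximum scaling of the gradient magnitudes to [0, 1].
    e = (G - G.min()) / (G.max() - G.min() + 1e-12)
    strong = e > alpha
    s1 = strong.mean()                      # normalized count of strong-edge pixels
    s2 = e[strong].mean() if strong.any() else 0.0  # average strong-edge strength
    return s1, s2
```

For a sharp step edge the few edge pixels carry near-maximal normalized strength, while a blurred version of the same image spreads weaker gradients over more pixels, lowering $s_2$.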

References

- Textural Features for Image Classification (Journal Article)
- Image quality measures and their performance (Journal Article)
- Why is image quality assessment so difficult (Proceedings Article)
- Statistical evaluation of image quality measures (Journal Article)
- Automated detection of diabetic retinopathy on digital fundus images (Journal Article)
Frequently Asked Questions

Q1. What are the contributions in "Automated quality assessment of retinal fundus photos"?

On a test dataset of 301 retinal fundus images the authors evaluated their method against a gold standard given by human observers and compared it to a state-of-the-art approach.

Since high gradients identify sharp edges, the authors calculate the gradient magnitude image $G$ of the input image $I$ by combining the derivative $I_x$ in x-direction and the derivative $I_y$ in y-direction.

The criteria are based on the recognizability and dissimilarity of certain structures in the eye background as well as on illumination homogeneity and sharpness.

The average computation time is 0.8 seconds for the sharpness metrics, 2.2 seconds for the clustering features and 2.4 seconds for the Haralick features on an Intel Core 2 Quad Q9550 system with 2.4 GHz and 3 GB RAM.

The final Haralick features $h_1$, $h_2$ and $h_3$ are generated by averaging over all directions:

$$h_1 = \frac{1}{4}\sum_r h_1^r \quad (15) \qquad h_2 = \frac{1}{4}\sum_r h_2^r \quad (16) \qquad h_3 = \frac{1}{4}\sum_r h_3^r \quad (17)$$

Thus, texture statistics are used to calculate generic quality features: entropy $h_1$ for common image sharpness, energy $h_2$ for image homogeneity and contrast $h_3$.

Five subsets consisted of 6 bad and 24 good images, four subsets of 7 bad and 23 good images and one subset of 7 bad and 24 good images.

For quantifying the performance of the proposed method the authors calculated the area under the ROC curve (AUC), the p-value related to ISC and the p-value related to the final feature combination of Haralick, clustering and sharpness features.

The co-occurrence matrices for the four directions are defined as

$$P(i,j,0^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ a - x = 0,\ |b - y| = 1\} \quad (7)$$
$$P(i,j,45^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ (a - x = 1,\ b - y = -1) \lor (a - x = -1,\ b - y = 1)\} \quad (8)$$
$$P(i,j,90^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ |a - x| = 1,\ b - y = 0\} \quad (9)$$
$$P(i,j,135^\circ) = \#\{(a,x) \in [1,\ldots,n],\ (b,y) \in [1,\ldots,m] \mid g_{ab} = i,\ g_{xy} = j,\ (a - x = 1,\ b - y = 1) \lor (a - x = -1,\ b - y = -1)\} \quad (10)$$

Each matrix entry is normalized by the total number of neighbored pixel pairs in its direction, $N_r$:

$$p(i,j,r) = \frac{P(i,j,r)}{N_r} \quad (11)$$

Based on the four co-occurrence matrices, entropy $h_1^r$, energy $h_2^r$ and contrast $h_3^r$ are calculated for each direction $r$:

$$h_1^r = -\sum_{i=1}^{m \cdot n}\sum_{j=1}^{m \cdot n} p(i,j,r)\log(p(i,j,r)) \quad (12)$$
$$h_2^r = \sum_{i=1}^{m \cdot n}\sum_{j=1}^{m \cdot n} p(i,j,r)^2 \quad (13)$$
$$h_3^r = \sum_{l=0}^{m \cdot n - 1} l^2 \Bigl\{\sum_{i=1}^{m \cdot n}\sum_{\substack{j=1 \\ |i-j|=l}}^{m \cdot n} p(i,j,r)\Bigr\} \quad (14)$$

The variance $\gamma$ of the radial basis kernel and the penalty factor $C$ were calculated using a grid search strategy in order to find the best parameter set for each method.