
A Partial Intensity Invariant Feature Descriptor for Multimodal Retinal Image Registration

TLDR
The proposed novel highly distinctive local feature descriptor named partial intensity invariant feature descriptor (PIIFD) is so distinctive that it can be correctly identified even in nonvascular areas and far outperforms existing algorithms in terms of robustness, accuracy, and computational efficiency.


IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 57, NO. 7, JULY 2010
A Partial Intensity Invariant Feature Descriptor for Multimodal Retinal Image Registration
Jian Chen, Jie Tian, Fellow, IEEE, Noah Lee, Jian Zheng, R. Theodore Smith, and Andrew F. Laine, Fellow, IEEE
Abstract—Detection of vascular bifurcations is a challenging task in multimodal retinal image registration. Existing algorithms based on bifurcations usually fail in correctly aligning poor quality retinal image pairs. To solve this problem, we propose a novel highly distinctive local feature descriptor named partial intensity invariant feature descriptor (PIIFD) and describe a robust automatic retinal image registration framework named Harris-PIIFD. PIIFD is invariant to image rotation, partially invariant to image intensity, affine transformation, and viewpoint/perspective change. Our Harris-PIIFD framework consists of four steps. First, corner points are used as control point candidates instead of bifurcations since corner points are sufficient and uniformly distributed across the image domain. Second, PIIFDs are extracted for all corner points, and a bilateral matching technique is applied to identify corresponding PIIFD matches between image pairs. Third, incorrect matches are removed and inaccurate matches are refined. Finally, an adaptive transformation is used to register the image pairs. PIIFD is so distinctive that it can be correctly identified even in nonvascular areas. When tested on 168 pairs of multimodal retinal images, the Harris-PIIFD far outperforms existing algorithms in terms of robustness, accuracy, and computational efficiency.
Index Terms—Harris detector, local feature, multimodal registration, partial intensity invariance, retinal images.
Manuscript received February 20, 2009; revised July 12, 2009 and November 20, 2009; accepted January 15, 2010. Date of publication February 18, 2010; date of current version June 16, 2010. This work was supported in part by the National Eye Institute under Grant R01 EY015520-01, by the NYC Community Trust (RTS), by unrestricted funds from Research to Prevent Blindness, by the Project for the National Basic Research Program of China (973) under Grant 2006CB705700, by the Changjiang Scholars and Innovative Research Team in University (PCSIRT) under Grant IRT0645, by the CAS Hundred Talents Program, by the CAS Scientific Research Equipment Development Program under Grant YZ200766, by the Knowledge Innovation Project of the Chinese Academy of Sciences under Grant KGCX2-YW-129 and Grant KSCX2-YW-R-262, by the National Natural Science Foundation of China under Grant 30672690, Grant 30600151, Grant 60532050, Grant 60621001, Grant 30873462, Grant 60910006, Grant 30970769, and Grant 30970771, by the Beijing Natural Science Fund under Grant 4071003, and by the Science and Technology Key Project of the Beijing Municipal Education Commission under Grant KZ200910005005. Asterisk indicates corresponding author.
J. Chen was with the Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China, and also with the Department of Biomedical Engineering, Columbia University, New York, NY 10027 USA. He is now with IBM China Research Laboratory, Beijing 100027, China (e-mail: jianchen@cn.ibm.com).
J. Tian is with the Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: tian@ieee.org).
N. Lee is with the Department of Biomedical Engineering, Columbia University, New York, NY 10027 USA (e-mail: nl2168@columbia.edu).
J. Zheng is with the Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: zhengjian@fingerpass.net.cn).
R. T. Smith is with the Retinal Image Analysis Laboratory, Edward S. Harkness Eye Institute, and the Department of Ophthalmology, Columbia University, New York, NY 10027 USA (e-mail: rts1@columbia.edu).
A. F. Laine is with the Heffner Biomedical Imaging Laboratory, Department of Biomedical Engineering, Columbia University, New York, NY 10027 USA (e-mail: laine@columbia.edu).
Digital Object Identifier 10.1109/TBME.2010.2042169
Fig. 1. (a) and (b) Poor quality retinal images taken at different stages. Traditional feature-based approaches usually fail to register this image pair since it is hard to detect the vasculatures in (b).
I. INTRODUCTION
THE PURPOSE of retinal image registration is to spatially align two or more retinal images for clinical review of disease progression. These images come from different screening events and are usually taken at different times or with different fields of view. An accurate registration is helpful to diagnose various kinds of retinal diseases such as glaucoma, diabetes, and age-related macular degeneration [1]–[4], [54]. However, automatic accurate registration becomes a problem when registering poor quality multimodal retinal images (severely affected by noise or pathology). For example, it is difficult to register an image pair taken years apart and acquired with different sensors, due to possible differences in the field of view and modality characteristics [5]–[8]. Retinopathy may cause severe changes in the appearance of the whole retina, such as obscure vasculature patterns (see Fig. 1). Registration algorithms that rely on vascular information may fail to correctly align such image pairs.
Thus, in this paper, we propose a novel distinctive partial
intensity invariant feature descriptor (PIIFD) and describe a fully
automatic algorithm to register poor quality multimodal retinal
image pairs. In the following, we will first briefly introduce prior
work regarding existing retinal registration algorithms and then
propose our Harris-PIIFD framework.
A. Prior Work
Existing registration algorithms can be classified as area-based and feature-based approaches [9]–[11]. The area-based approaches [12]–[24] compare and match the intensity differences of an image pair under a similarity metric such as mutual information [19]–[22] and cross correlation [12], [15], and then apply an optimization technique [23], [24] to maximize the similarity metric by searching in the transformation space.
The similarity metric is expected to reach its optimum when two images are properly registered. However, in the case of low overlapping area registration, the area-based approaches usually fail [55]. In other words, the similarity metric is usually misled by nonoverlapping areas. To overcome this problem, a widely used solution is to assign a region of interest (ROI) within one or both images for computing the similarity metric [24]. The area-based approaches are also sensitive to illumination changes and significant initial misalignment, suggesting that area-based approaches may be susceptible to occlusion, background changes caused by pathologies, and pose changes of the camera [35].
Compared with area-based registration, feature-based approaches [25]–[41] are more appropriate for retinal image registration. Feature-based approaches typically involve extracting features and searching for a transformation that optimizes the correspondence between these features. The bifurcations of the retinal vasculature, the optic disc, and the fovea [27], [28] are examples of such widely used feature cues. The main advantage of feature-based approaches is their robustness against illumination changes. However, extraction of such features in poor quality images is difficult. Feature-based approaches for retinal image registration usually distinguish themselves through minor differences and rely on the assumption that the vasculature network can be extracted. For instance, the use of different structures of the retina as landmark points [27], a focus on improving the performance of the landmark extraction algorithm [36], a narrowing down of the search space by manually or automatically assigning “matched” points [30], and a more complicated mapping strategy to estimate the most plausible transformation from a pool of possible landmark matches [26] have been described, and all of them rely on the extraction of the retinal vasculature.
A hybrid approach that effectively combines both area-based and feature-based approaches has also been proposed [55]; however, it still relies on retinal vasculature.
General feature-based approaches that do not rely on the vasculature have also been discussed. The scale invariant feature transform (SIFT), an algorithm for extracting distinctive invariant features, has been proposed [42]–[46]. The SIFT features proposed in this algorithm are invariant to image scale and rotation and provide robust matching across a substantial range of affine distortion, change in 3-D viewpoint/perspective, addition of noise, and changes in illumination. These features are highly distinctive in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. However, SIFT is designed for monomodal image registration, and its scale invariance strategy usually cannot provide sufficient control points for high order transformations. Another local feature named speeded up robust features (SURF) [57] has also been proposed, which, as claimed by its authors, is several times faster and more robust against different image transformations than SIFT. SURF is based on the Haar wavelet, and its good performance is achieved by building on the strengths of SIFT and simplifying SIFT to the essential [57]. Soon after, a SURF-based retinal image registration method that does not depend on the vasculature was proposed [59]; however, it is still only applicable to monomodal image registration.
Fig. 2. Pair of poor quality multimodal retinal images. These two images were
taken from the same eye.
The general dual bootstrap iterative closest point algorithm (GDB-ICP) [35], [60], which uses “corner” points and “face” points as correspondence cues, is more efficient than other existing algorithms. To our knowledge, the GDB-ICP algorithm is the best algorithm reported for poor quality retinal image registration. There are two versions of this approach. The first version uses Lowe’s multiscale keypoint detector and the SIFT descriptor [42]–[46] to provide initial matches. In comparison, the second version uses the central line extraction algorithm [36] to extract the bifurcations of the vasculature to provide initial matches. The GDB-ICP algorithm is then applied to iteratively expand the area around the initial matches by mapping the “corner” or “face” points. The authors state that only one correct initial match is enough for the subsequent iterative registration process. However, in some extreme cases no correct match can be detected by their two initial matching methods. Further, for very poor quality images, even if there are some correct initial matches, the GDB-ICP algorithm may still fail because the distribution of “corner” and “face” points is severely affected by noise.
B. Problem Statement and Proposed Method
As mentioned earlier, the existing algorithms cannot register poor quality multimodal image pairs in which the vasculature is severely affected by noise or artifacts. Retinal image registration can be broken down into two situations: multimodal image registration and poor quality image registration. The existing algorithms can achieve good performance when these two situations are not combined. On one hand, vasculature-based registration methods can correctly align good-quality multimodal retinal image pairs. On the other hand, some robust local features such as SIFT and SURF can achieve satisfactory results for poor quality monomodal registration. However, it is hard to register poor quality multimodal retinal images. An illustration of retinal image registration combining these two situations is shown in Fig. 2, in which the two images are of poor quality and different modalities.
A robust local feature descriptor may make the registration of poor quality multimodal retinal images successful, as long as it solves the following two problems: 1) the gradient orientations at corresponding locations in multimodal images may point in opposite directions and the gradient magnitudes usually change; thus, how can a local feature achieve intensity invariance, or at least partial intensity invariance? and 2) the main orientations of corresponding control points in multimodal images usually point in opposite directions supposing that the two images are properly registered; how can a local feature achieve rotation invariance?
Fig. 3. Flowchart of our registration framework. The key contribution of this study (see Section II-C) is highlighted in bold.
In this paper, we propose a novel highly distinctive local feature descriptor named PIIFD [58] and describe a robust automatic retinal image registration framework named Harris-PIIFD to solve the aforementioned registration problem. PIIFD is invariant to image rotation, partially invariant to image intensity, affine transformation, and viewpoint/perspective change. Note that PIIFD is a hybrid area-feature descriptor since the area-based structural outline is transformed into a feature vector.
The remainder of this paper is organized as follows. Section II is devoted to the proposed Harris-PIIFD framework, including the novel PIIFD feature descriptor. Section III describes the experimental settings and reports the experimental results. Discussion and conclusion are given in Section IV.
II. PROPOSED REGISTRATION FRAMEWORK
Our suggested Harris-PIIFD framework comprises the following seven distinct steps.
1) Detect corner points by a Harris detector [47].
2) Assign a main orientation for each corner point.
3) Extract the PIIFD surrounding each corner point.
4) Match the PIIFDs with bilateral matching.
5) Remove any incorrect matches.
6) Refine the locations of each match.
7) Select the transformation mode.
The flowchart of the Harris-PIIFD framework is shown in Fig. 3. First, corner points are used as control point candidates instead of bifurcations (step 1) since corner points are sufficient and uniformly distributed across the image domain. We assume that there are two subsets of control point candidates that can be identically matched across the two images. Second, PIIFDs are extracted relative to the main orientations of the control point candidates, and therefore achieve invariance to image rotation, and a bilateral matching technique is applied to identify corresponding PIIFD matches between image pairs (steps 2–4). Third, incorrect matches are removed and inaccurate matches are refined (steps 5–6). Finally, an adaptive transformation is applied to register the image pairs based on these matched control point candidates (step 7).
Fig. 4. Spatial distributions of the control point candidates represented by (a) bifurcations of the vasculature detected by an automatic central line extraction method and (b) corner points detected by a Harris detector.
Three preprocessing operations are applied before detecting control point candidates: 1) convert the input image to grayscale; 2) scale the intensities of the input image to the full 8-bit intensity range [0, 255]; and 3) zoom the image out or in to a fixed size (about 1000 × 1000 pixels in this paper). The third operation is not necessary but has two advantages: 1) some image-size-sensitive parameters can be held fixed and 2) the scale difference can be reduced in some cases.
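As a rough illustration of these three operations, the sketch below uses OpenCV; the min-max intensity stretching, the choice of resampling filter, and the exact target size handling are assumptions of this sketch, since the paper does not specify them.

```python
import cv2
import numpy as np

def preprocess(img, target=1000):
    """Sketch of the three preprocessing steps: grayscale, 8-bit stretch, fixed size."""
    # 1) convert to grayscale if the input is a color image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    # 2) stretch intensities to the full 8-bit range [0, 255]
    gray = gray.astype(np.float32)
    gray = (gray - gray.min()) / max(float(gray.max() - gray.min()), 1e-6) * 255.0
    gray = gray.astype(np.uint8)
    # 3) resize so that the longer side is about 1000 pixels (fixed working size)
    scale = target / max(gray.shape[:2])
    return cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
```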
A. Detect Corner Points by Harris Detector
The lack of control points is likely to result in an unsuccessful registration for a feature-based algorithm. In retinal image registration, bifurcations are usually regarded as control point candidates. However, it is hard to extract the bifurcations in some cases, especially in poor quality retinal images. Take the image in Fig. 1(b), for example: only four bifurcations are detected by a central line extraction algorithm [see Fig. 4(a)] [36]. On the contrary, a large number of Harris corners are detected and uniformly distributed across the image domain [see Fig. 4(b)]. Therefore, we introduce the Harris detector [47] to generate control point candidates in our registration framework. The basic concept of the Harris detector is to measure the changes in all directions when the image is convolved with a Gaussian window, and the changes can be represented by image gradients. For an image I, assume the traditional image gradients are given as follows:
$$\begin{bmatrix} G_{xt} \\ G_{yt} \end{bmatrix} = \begin{bmatrix} \partial I/\partial x \\ \partial I/\partial y \end{bmatrix}. \qquad (1)$$
Thus, the Harris detector can be mathematically expressed as

$$M = \begin{bmatrix} G_{xt}^2 & G_{xt}G_{yt} \\ G_{yt}G_{xt} & G_{yt}^2 \end{bmatrix} \otimes h \qquad (2)$$

$$R = \det(M) - k\,\mathrm{tr}^2(M) \qquad (3)$$

where h is a Gaussian window, k is a constant (usually k = 0.04–0.06 [47]), and det and tr are the determinant and trace of the matrix, respectively. Given a point p(x, y), it is considered
as a corner point if and only if R(p) > 0. For more details about the Harris detector, please refer to [47].
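For illustration, a minimal NumPy/SciPy sketch of the response in (1)–(3) could look as follows; the local-maximum suppression, the window sizes, and the value k = 0.04 are assumptions layered on top of the text above, and this is not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(img, sigma=1.5, k=0.04):
    """Harris measure R = det(M) - k*tr(M)^2 from (2)-(3)."""
    I = img.astype(np.float64)
    Gyt, Gxt = np.gradient(I)                      # traditional gradients, as in (1)
    # entries of M: squared/cross gradients smoothed by the Gaussian window h
    Sxx = gaussian_filter(Gxt * Gxt, sigma)
    Syy = gaussian_filter(Gyt * Gyt, sigma)
    Sxy = gaussian_filter(Gxt * Gyt, sigma)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

def harris_corners(img, sigma=1.5, k=0.04, max_points=200):
    """Keep up to max_points local maxima with positive response, strongest first."""
    R = harris_response(img, sigma, k)
    peaks = (R == maximum_filter(R, size=9)) & (R > 0)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(R[ys, xs])[::-1][:max_points]
    return list(zip(xs[order], ys[order]))         # (x, y) candidate locations
```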
Extracting the PIIFDs is the most time-consuming stage of the proposed Harris-PIIFD framework, and its runtime is directly proportional to the number of corner points (control point candidates). It has been confirmed that 200 Harris corner points are sufficient for subsequent processing; thus, in our experiments, about 200 Harris corner points are detected by automatically tuning the σ of the Gaussian window.
The corner points in our framework are not directly used as features for the registration algorithm. Instead, they just provide the locations for calculating PIIFDs. Thus, the proposed method can still work if these corner points are disturbed within the neighborhood or even replaced by a set of randomly distributed points. The only difference may be a change in accuracy.
B. Assign Main Orientation to Each Corner Point
A main orientation that is relative to the local gradient is assigned to each control point candidate before extracting the PIIFD. Thus, the PIIFD can be represented relative to this orientation and therefore achieves invariance to image rotation. In the present study, we introduce a continuous method, average squared gradients [48], [49], to assign the main orientation. This method uses the averaged perpendicular direction of the gradient, which is limited within [0, π), to represent a control point candidate's main orientation. For an image I, the new gradient [G_x  G_y]^T is expressed as follows:
$$\begin{bmatrix} G_x \\ G_y \end{bmatrix} = \operatorname{sgn}(G_{yt}) \begin{bmatrix} G_{xt} \\ G_{yt} \end{bmatrix} \qquad (4)$$

where G_xt and G_yt are the traditional gradients defined in (1).
In this equation, the second element of the gradient vector is always positive for the reason that opposite directions of gradients indicate equivalent main orientations. To compute the main orientation, the image gradients should be averaged or accumulated within an image window. Opposite gradients will cancel each other if they are directly averaged or accumulated, but they are supposed to reinforce each other because they indicate the same main orientation. A solution to this problem is to square the gradient vector in the complex domain before averaging. The squared gradient vector [G_s,x  G_s,y]^T is given by

$$\begin{bmatrix} G_{s,x} \\ G_{s,y} \end{bmatrix} = \begin{bmatrix} G_x^2 - G_y^2 \\ 2\,G_x G_y \end{bmatrix}. \qquad (5)$$
Next, the average squared gradient [Ḡ_s,x  Ḡ_s,y]^T is calculated within a Gaussian-weighted circular window

$$\begin{bmatrix} \bar{G}_{s,x} \\ \bar{G}_{s,y} \end{bmatrix} = \begin{bmatrix} G_{s,x} \otimes h_\sigma \\ G_{s,y} \otimes h_\sigma \end{bmatrix} \qquad (6)$$

where h_σ is the Gaussian-weighted kernel and the operator ⊗ means convolution. The σ of the Gaussian window can neither be too small nor too big, for the reason that the average orientation computed in a small window is sensitive to noise and in a large window cannot represent the local orientation. In this study, the σ of the Gaussian window is set to five pixels empirically.
Fig. 5. Extracting PIIFD relative to the main orientation of a control point candidate. (a) Neighborhood surrounding the control point candidate (centered point) is decided relative to the main orientation. (b) Orientation histogram extracted from the highlighted small square in (a).
The main orientation φ of each neighborhood, with 0 ≤ φ < π, is given by

$$\varphi = \frac{1}{2}\begin{cases} \tan^{-1}\!\bigl(\bar{G}_{s,y}/\bar{G}_{s,x}\bigr) + \pi, & \bar{G}_{s,x} \ge 0 \\ \tan^{-1}\!\bigl(\bar{G}_{s,y}/\bar{G}_{s,x}\bigr) + 2\pi, & \bar{G}_{s,x} < 0 \wedge \bar{G}_{s,y} \ge 0 \\ \tan^{-1}\!\bigl(\bar{G}_{s,y}/\bar{G}_{s,x}\bigr), & \bar{G}_{s,x} < 0 \wedge \bar{G}_{s,y} < 0. \end{cases} \qquad (7)$$

Thus, for each control point candidate p(x, y), its main orientation is assigned to φ(x, y).
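A compact sketch of (4)–(7) is shown below; the three-branch form of (7) collapses to a single arctan2 expression, which the code uses. The dense (whole-image) evaluation and the handling of zero G_yt are implementation assumptions, not the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def main_orientation_map(img, sigma=5.0):
    """Averaged-squared-gradient orientation of (4)-(7); sigma = 5 pixels as in the text."""
    I = img.astype(np.float64)
    Gyt, Gxt = np.gradient(I)
    # (4): flip each gradient so that its y-component is non-negative
    s = np.where(Gyt >= 0, 1.0, -1.0)
    Gx, Gy = s * Gxt, s * Gyt
    # (5): squared gradient vector in the complex domain
    Gsx = Gx ** 2 - Gy ** 2
    Gsy = 2.0 * Gx * Gy
    # (6): average within a Gaussian-weighted window
    Gsx_bar = gaussian_filter(Gsx, sigma)
    Gsy_bar = gaussian_filter(Gsy, sigma)
    # (7): halve the doubled angle; equivalent to the piecewise form, an orientation modulo pi
    return 0.5 * (np.arctan2(Gsy_bar, Gsx_bar) + np.pi)

# the main orientation of a candidate at (x, y) is then phi[y, x]
# phi = main_orientation_map(img); orientation = phi[y, x]
```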
The SIFT algorithm uses an orientation histogram to calculate the main orientation [42]. However, the main orientations in multimodal images calculated by the orientation histogram may point in unrelated directions. This may result in many incorrect matches. In addition, the orientation histogram is discrete, so its directional resolution is limited by the number of histogram bins. Compared with the orientation histogram, our average squared gradients method is continuous, more accurate, and more computationally efficient. As long as the structural outlines are the same, the main orientations calculated by our method remain the same. Therefore, our method for calculating the main orientation is suitable for multimodal image registration.
C. Extract PIIFD Surrounding Each Corner Point
Given the main orientation of each control point candidate (corner point extracted by the Harris detector), we can extract the local feature in a manner invariant to image rotation [42] and partially invariant to image intensity. As shown in Fig. 5(a), suppose the centered point is a control point candidate and the big square, which consists of 4 × 4 small squares, is the local neighborhood surrounding this control point candidate. Note that the main orientation of this control point candidate is illustrated by the arrow. The size of the neighborhood is a tradeoff between distinctiveness and computational efficiency. In Lowe's SIFT algorithm, the size of the neighborhood is automatically decided by the scale of the control point. By carefully investigating the retinal images, we empirically set the size to a fixed 40 × 40 pixels in our experiments, for the reason that the scale difference is slight.
To extract the PIIFD, the image gradient magnitudes and orientations are sampled in this local neighborhood. In order to
achieve orientation invariance, the gradient orientations are rotated relative to the main orientation. For a given small square in this neighborhood [e.g., the highlighted small square shown in Fig. 5(a)], an orientation histogram that evenly covers 0°–360° with 16 bins (0°, 22.5°, 45°, ..., 337.5°) is formed. The
gradient magnitude of each pixel that falls into this small square is accumulated into the corresponding histogram entry. It is important to avoid boundary effects, in which the descriptor abruptly changes as a sample shifts smoothly from one histogram to another or from one orientation to another. Therefore, bilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins. The processes of extracting PIIFD and SIFT are almost the same; therefore, PIIFD and SIFT share some common characteristics. For example, both PIIFD and SIFT are partially invariant to affine transformation [42]–[46].
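The following sketch illustrates the bin-interpolation idea for one subsquare's 16-bin histogram; splitting each sample's magnitude linearly between the two nearest orientation bins is one way to realize the interpolation mentioned above (the spatial part of the bilinear weighting is omitted), and the details are assumptions of this sketch rather than the authors' code.

```python
import numpy as np

def orientation_histogram(mag, ang, n_bins=16):
    """16-bin orientation histogram of one subsquare.
    `mag`, `ang`: gradient magnitudes and orientations (radians) already
    rotated relative to the main orientation.  Each sample's magnitude is
    split between the two nearest bins to soften boundary effects."""
    hist = np.zeros(n_bins)
    bin_width = 2.0 * np.pi / n_bins                    # 22.5 degrees per bin
    pos = (np.asarray(ang, float).ravel() % (2.0 * np.pi)) / bin_width
    lo = np.floor(pos).astype(int) % n_bins
    hi = (lo + 1) % n_bins
    w = pos - np.floor(pos)                             # weight of the upper bin
    m = np.asarray(mag, float).ravel()
    np.add.at(hist, lo, m * (1.0 - w))
    np.add.at(hist, hi, m * w)
    return hist
```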
In an image, an outline is a line marking the multiple contours or boundaries of an object or a figure. The basic idea of achieving partial intensity invariance involves extracting the descriptor from the image outlines. This is based on the assumption that regions of similar anatomical structure in one image correspond to regions in the other image that also consist of similar outlines (although probably with different intensity values from those of the first image). In this study, image outline extraction is simplified to extracting the constrained image gradients. The gradient orientations at corresponding locations in multimodal images may point in opposite directions, and the gradient magnitudes usually change. In order to achieve partial intensity invariance, two operations are applied to the image gradients. First, we normalize the gradient magnitudes piecewise to reduce the influence of changes in gradient magnitude. In the neighborhood surrounding each control point candidate, we normalize the strongest 20% of gradient magnitudes to 1, the second 20% to 0.75, and, continuing in the same way, the last 20% to 0. Second, we convert the orientation histogram with 16 bins to a degraded orientation histogram with only 8 bins (0°, 22.5°, 45°, ..., 157.5°) by summing opposite directions [see Fig. 5(b)]. If the intensities of this local neighborhood change between two image modalities (for instance, some dark vessels become bright), then the gradients in this area will also change. However, the outlines of this area will remain almost unchanged. The degraded orientation histogram constrains the gradient orientation from 0 to π, so the histogram is invariant when the gradient orientation rotates by 180°. Consequently, the descriptor achieves partial invariance to the aforementioned intensity change. The second operation is based on the assumption that the gradient orientations at corresponding locations in multimodal images point in the same or opposite directions. It is difficult to prove this assumption mathematically since "multimodal image" is not a well-defined notion, although for intensity-inverse images (an ideal situation) the assumption holds exactly. Admittedly, the degraded orientation histogram is not as distinctive as the original one, but this loss of distinctiveness is an acceptable cost for achieving partial invariance to image intensity. For the case shown in Fig. 5, there are in total 4 × 4 = 16 orientation histograms (one for each small square). All these histograms can be denoted by
$$H = \begin{bmatrix} H_{11} & H_{12} & H_{13} & H_{14} \\ H_{21} & H_{22} & H_{23} & H_{24} \\ H_{31} & H_{32} & H_{33} & H_{34} \\ H_{41} & H_{42} & H_{43} & H_{44} \end{bmatrix} \qquad (8)$$

where H_ij denotes an orientation histogram with eight bins.
The main orientations of corresponding control points may point in opposite directions in a multimodal image pair. This situation will still occur even if we have already constrained the gradient orientations to the range [0°, 180°], and it breaks the rotation invariance. For example, the main orientations of corresponding control points extracted from an image and its version rotated by 180° always point in opposite directions. In this paper, we propose a linear combination of two subdescriptors to solve this problem. One subdescriptor is the matrix H computed by (8). The other subdescriptor is a rotated version of H: Q = rot(H, 180°). The combined descriptor, PIIFD, can be calculated as follows:
$$\mathrm{des} = \begin{bmatrix} (H_1 + Q_1) \\ (H_2 + Q_2) \\ c\,|H_3 - Q_3| \\ c\,|H_4 - Q_4| \end{bmatrix} \qquad (9)$$

$$H_i = [\, H_{i1} \;\; H_{i2} \;\; H_{i3} \;\; H_{i4} \,] \qquad (10)$$

$$Q_i = [\, Q_{i1} \;\; Q_{i2} \;\; Q_{i3} \;\; Q_{i4} \,] \qquad (11)$$
where c is a parameter to tune the proportion of magnitude in this local descriptor. The absolute value of the descriptor is normalized in the next step. In our algorithm, c is adaptively determined by making the maxima of the two parts the same. The goal of the linear combination is to make the final descriptor invariant to the two opposite directions. This linear combination is reversible, so it does not reduce the distinctiveness of the descriptor. It is obvious that PIIFD is a 4 × 4 × 8 matrix. For the convenience of matching, it is quantized to a vector with 128 elements. Finally, the PIIFD is normalized to unit length.
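To illustrate how (8)–(11) fit together, the sketch below folds the 16-bin histograms to 8 degraded bins, forms the rotated subdescriptor Q, and combines the two parts as in (9). It assumes the piecewise magnitude normalization was already applied when the histograms were accumulated, and it interprets rot(H, 180°) as a reversal of the 4 × 4 spatial order with the folded bins left unchanged; both are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def combine_piifd(H16):
    """Build the 128-element PIIFD vector from the 4x4 grid of 16-bin
    histograms, following (8)-(11).  H16 has shape (4, 4, 16)."""
    # fold opposite gradient directions together: 16 bins -> 8 degraded bins
    H = H16[..., :8] + H16[..., 8:]
    # rotated subdescriptor Q = rot(H, 180 deg): reverse the spatial order of
    # the subsquares (assumption: the folded bins themselves are unchanged)
    Q = H[::-1, ::-1, :]
    # (9): sums for the first two rows, scaled absolute differences for the rest
    top = np.concatenate([(H[0] + Q[0]).ravel(), (H[1] + Q[1]).ravel()])
    bottom = np.concatenate([np.abs(H[2] - Q[2]).ravel(), np.abs(H[3] - Q[3]).ravel()])
    # c chosen so that the maxima of the two parts coincide, as described above
    c = top.max() / max(bottom.max(), 1e-12)
    des = np.concatenate([top, c * bottom])            # 64 + 64 = 128 elements
    return des / max(np.linalg.norm(des), 1e-12)       # normalize to unit length
```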
D. Match PIIFDs by Bilateral Matching Method
We use the best-bin-first (BBF) algorithm [50] to match the correspondences between two images. This algorithm identifies the approximate closest neighbors of points in high-dimensional spaces. It is approximate in the sense that it returns the closest neighbor with high probability. Suppose that the set of all PIIFDs of image I_1 is F_1 and the set of I_2 is F_2; then, for a given PIIFD f_1i ∈ F_1, a set of distances from f_1i to F_2 is defined as follows:
$$D(f_{1i}, F_2) = \{\, f_{1i} \cdot f_{2j} \mid f_{2j} \in F_2 \,\} \qquad (12)$$
where · is the dot product of vectors. It is obvious that this set comprises all the distances between f_1i and the descriptors in I_2. Let f_2j+ and f_2j++ be the biggest and second-biggest elements of D(f_1i, F_2), which correspond to f_1i's closest and second-closest neighbors, respectively. If the closest neighbor is significantly closer than the second-closest neighbor, f_2j++/f_2j+ < t,
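The excerpt breaks off here, but the nearest-neighbor test just introduced can be illustrated with a brute-force sketch: the similarities are the dot products of (12), the ratio test keeps a candidate match only when the second-best similarity is clearly smaller than the best, and the "bilateral" aspect is read here as requiring agreement in both matching directions. The threshold value t, the brute-force search in place of BBF, and the mutual-consistency reading are assumptions of this sketch.

```python
import numpy as np

def match_piifds(F1, F2, t=0.9):
    """Ratio-test matching between two sets of unit-length PIIFDs
    (rows of F1 and F2), kept only when the match is mutual."""
    S = F1 @ F2.T                                    # dot-product similarities, as in (12)
    def one_way(S):
        best = np.argmax(S, axis=1)
        ranked = np.sort(S, axis=1)
        first, second = ranked[:, -1], ranked[:, -2]
        keep = second < t * first                    # second-best must be clearly worse
        return {i: int(j) for i, (j, ok) in enumerate(zip(best, keep)) if ok}
    fwd, bwd = one_way(S), one_way(S.T)
    # keep only pairs that agree in both directions ("bilateral" matches)
    return [(i, j) for i, j in fwd.items() if bwd.get(j) == i]
```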

Citations
Journal ArticleDOI

Retinal Imaging and Image Analysis

TL;DR: Methods for 2-D fundus imaging and techniques for 3-D optical coherence tomography (OCT) imaging are reviewed and aspects of image acquisition, image analysis, and clinical relevance are treated together considering their mutually interlinked relationships.
Journal ArticleDOI

Image Matching from Handcrafted to Deep Features: A Survey

TL;DR: This survey introduces feature detection, description, and matching techniques from handcrafted methods to trainable ones and provides an analysis of the development of these methods in theory and practice, and briefly introduces several typical image matching-based applications.
Journal ArticleDOI

RIFT: Multi-Modal Image Matching Based on Radiation-Variation Insensitive Feature Transform

TL;DR: Experimental results show that RIFT is superior to SIFT and SAR-SIFT on multi-modal images and is the first feature matching algorithm that can achieve good performance on all the abovementioned types of multi-modal images.
Journal ArticleDOI

Remote Sensing Image Matching Based on Adaptive Binning SIFT Descriptor

TL;DR: Experimental results show that the proposed AB-SIFT matching method is more robust and accurate than state-of-the-art methods, including the SIFT, DAISY, the gradient location and orientation histogram, the local intensity order pattern, and the binary robust invariant scale keypoint.
Journal ArticleDOI

MODS: Fast and robust method for two-view matching

TL;DR: An improved method for tentative correspondence selection, applicable both with and without view synthesis, and a modification of the standard first-to-second nearest distance rule that increases the number of correct matches by 5–20% at no additional computational cost are introduced.
References
Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings ArticleDOI

Object recognition from local scale-invariant features

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Proceedings ArticleDOI

A Combined Corner and Edge Detector

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Book ChapterDOI

SURF: speeded up robust features

TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Frequently Asked Questions (15)
Q1. What are the contributions in "A partial intensity invariant feature descriptor for multimodal retinal image registration" ?

To solve this problem, the authors propose a novel highly distinctive local feature descriptor named partial intensity invariant feature descriptor ( PIIFD ) and describe a robust automatic retinal image registration framework named Harris-PIIFD. 

The degraded orientation histogram constrains the gradient orientation from 0 to π, and then the histogram achieves invariance when the gradient orientation rotates by 180◦. 

A main orientation that is relative to the local gradient is assigned to each control point candidate before extracting the PIIFD. 

It takes approximately 41.3 min to register all 168 pairs of retinal images using their Harris-PIIFD algorithm (14.75 s per pair, standard deviation of 4.65 s). 

In a neighborhood surrounding each control point candidate, the authors normalize the first 20% strongest gradient magnitudes to 1, second 20% to 0.75, and by parity of reasoning the last 20% to 0. 

PIIFDs are extracted relative to the main orientations of control point candidates therefore achieve invariance to image rotation, and a bilateral matching technique is applied to identify corresponding PIIFDs matches between image pairs (steps 2–4). 

A reliable and fair evaluation method is very important for measuring the performance since there is no public retinal registration dataset. 

In this test, 400 pairs of corresponding control points and 400 pairs of noncorresponding control points are chosen from 20 pairs of different-modal retinal images. 

A robust local feature descriptor may bring to success the registration of poor quality multimodal retinal images, as long as it solves the following two problems: 1) the gradient orientations at corresponding locations in multimodal images may point to opposite directions and the gradient magnitudes usually change. 

Given the main orientation of each control point candidate (corner point extracted by Harris Detector), the authors can extract the local feature in a manner invariant to image rotation [42] and partially invariant to image intensity. 

In their experiments, the average number of control point candidates is 231, the average number of initial matches (including incorrect matches) is 64.6, and the average number of final matches (after removing incorrect matches) is 43.2. 

It has been confirmed that 200 Harris corner points are sufficient for subsequent processing, thus, in their experiments about 200 Harris corner points are detected by automatically tuning the sigma of Gaussian window. 

For a given small square in this neighborhood [e.g., the highlighted small square shown in Fig. 5(a)], an orientation histogram, which evenly covers 0°–360° with 16 bins (0°, 22.5°, 45°, ..., 337.5°), is formed.
This task takes approximately 10 h, and afterward the authors develop a program to estimate the transformation parameters and overlapping percentage. 

8. The results of this experiment indicate that their proposed Harris-PIIFD can provide robust matching when the scale factor is below 1.8.