What is the coefficient of variation of the bootstrap method?

The authors observe that the coefficient of variation decreases with the number of bootstrap resamples, but the decrease is slow and diminishes even further with increased noise level.

What is the method for estimating the accuracy of a pixel?

Although for the sake of simplicity the authors have considered only 2-D translations, the presented accuracy estimation techniques are directly usable for other registration methods that find transformation with more degrees of freedom.

(Open Access) Bootstrap Resampling for Image Registration Uncertainty Estimation Without Ground Truth (2010) | Jan Kybic

64 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 1, JANUARY 2010

Bootstrap Resampling for Image Registration Uncertainty Estimation Without Ground Truth

Jan Kybic, Senior Member, IEEE

Abstract: We address the problem of estimating the uncertainty of pixel based image registration algorithms, given just the two images to be registered, for cases when no ground truth data is available. Our novel method uses bootstrap resampling. It is very general, applicable to almost any registration method based on minimizing a pixel-based similarity criterion; we demonstrate it using the SSD, SAD, correlation, and mutual information criteria. We show experimentally that the bootstrap method provides better estimates of the registration accuracy than the state-of-the-art Cramér–Rao bound method. Additionally, we also evaluate a fast registration accuracy estimation (FRAE) method which is based on quadratic sensitivity analysis ideas and has a negligible computational overhead. FRAE mostly works better than the Cramér–Rao bound method but is outperformed by the bootstrap method.

Index Terms: Accuracy estimation, bootstrap, Cramér–Rao bound, image registration, motion estimation, performance limits, uncertainty estimation.

I. INTRODUCTION

IMAGE registration [1], [2] finds a geometric transformation relating coordinates of corresponding points in two given images. Image registration is used for motion analysis, video compression and coding, object tracking, image stabilization, segmentation, stereo reconstruction, and super-resolution [3]. Biomedical applications [4]–[8] include intrasubject, intersubject, and intermodality analysis, registration with atlases, quantification and qualification of feature shapes and sizes, elastography, distortion compensation, motion detection and compensation.

Most image registration algorithms return just a single, deterministic answer, a point-wise estimate of the unknown geometric transformation. However, in practice, there is always some associated uncertainty; the registration accuracy is limited. Knowing this uncertainty is useful to determine whether and to what extent the registration results can be trusted and whether the input data is suitable. It can be used to give more weight to more reliable image pairs or spatial locations, for example, in sequence registration, group-wise registration, flow-inpainting, or recovering elastography parameters from the displacement.

Manuscript received August 12, 2008; revised July 27, 2009. First published August 25, 2009; current version published December 16, 2009. This work was supported by the Czech Ministry of Education under Project 1M0567. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Pier Luigi Dragotti.

The author is with the Center for Applied Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic (e-mail: kybic@fel.cvut.cz).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2009.2030955

This paper presents a general method to estimate the uncertainty of area based (or pixel based, as opposed to landmark or feature based) image registration algorithms on a particular pair of images. This method (Section II) uses bootstrap resampling [9]–[11] and performs well at the cost of increasing the computational complexity 10 to 100 times with respect to the original algorithm. The key feature of our approach is that the uncertainty is estimated from the input images only, under very weak assumptions about the registration problem: no ground truth and no explicit model for the transformation, the noise, or the images is needed. Also, we aim to estimate the absolute uncertainty (in pixels), not a dimensionless confidence measure with only a relative interpretation.

There are two main limitations. (i) Only the variability of the returned transformation can be estimated, not the bias. Fortunately, the bias of image registration algorithms is often quite small, as can be seen experimentally (Section III-A). (ii) We need to assume some form of ergodicity of the image generating processes, so that their behavior across realizations can be deduced from their behavior in space.

The bootstrap method is compared experimentally with the Cramér–Rao bound method [12], [13] and also with a fast registration accuracy estimation (FRAE) method, which is based on Gaussian approximation and quadratic sensitivity analysis ideas [14] (Section I-E).

A. Problem Deﬁnition I—Image Registration

Most area based image registration algorithms can be cast into the following framework: We are given two images $f, g : \mathbb{R}^d \to \mathbb{R}^c$, with $c = 1$ for grayscale images. The images are considered to be random realizations of an image-generating process (e.g., sensor noise) and are related by an unknown geometrical transformation $T : \mathbb{R}^d \to \mathbb{R}^d$, so that pixel $f(x)$ corresponds to pixel $g(T(x))$ and their values are dependent. For simplicity of exposition, we consider here especially the case of a 2-D translation ($d = 2$)

$$T_\theta(x) = x + \theta \tag{1}$$

which is fully determined by a parameter vector $\theta = (\theta_1, \theta_2)$, $\theta \in \mathbb{R}^2$.

The quality of the registration is measured by a criterion

$$J(\theta) = J_D(\theta) + J_R(\theta) \tag{2}$$

where $J_R$ is a regularization part of the criterion, often penalizing unsmooth deformations. The data part $J_D$ measures the similarity of the image $f$ and the warped image $g \circ T_\theta$, using an image similarity measure. Again for simplicity we shall use the sum of square differences (SSD) similarity criterion and no regularization

$$J(\theta) = J_D(\theta) = \frac{1}{|\Omega|} \sum_{i \in \Omega} e_i^2 \quad \text{with} \quad e_i = f(x_i) - g(T_\theta(x_i)) \tag{3}$$

where $\Omega$ is a set of pixels of a suitable window.

The transformation parameters are estimated as a minimizer

$$\hat\theta = \arg\min_\theta J(\theta) \tag{4}$$

We expect the criterion to be relevant, so that the estimated transformation parameters are close to the true ones, $\hat\theta \approx \theta^*$.

Our choice of the transformation $T_\theta$ and the criterion $J$ makes the registration algorithm equivalent to the well-known block matching algorithm [15]. In our implementation, image $g$ is interpolated using cubic B-splines [16], [17], its derivative is calculated analytically, and the minimization (4) is performed using the BFGS (Broyden-Fletcher-Goldfarb-Shanno) pseudo-Newton algorithm [18], which incrementally updates the estimate of the Hessian matrix from the gradient.

B. Problem Deﬁnition II—Uncertainty Estimation

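Since images $f$, $g$ are random (across realizations) due to the stochastic nature of the image generation process (measurement noise), the criterion $J$ is also random, and, hence, the estimate $\hat\theta$ from (4) is random, too. The problem addressed in this article is to characterize the uncertainty of $\hat\theta$. In particular, we shall evaluate the covariance matrix

$$C_\theta = E\big[(\hat\theta - \bar\theta)(\hat\theta - \bar\theta)^T\big] \quad \text{with} \quad \bar\theta = E[\hat\theta] \tag{5}$$

and a mean displacement variance

$$\sigma_u^2 = \frac{1}{|\Omega|} \sum_{i \in \Omega} E\big[\|T_{\hat\theta}(x_i) - T_{\bar\theta}(x_i)\|^2\big] \tag{6}$$

For a translation (1), the expression simplifies to

$$\sigma_u^2 = \operatorname{tr} C_\theta \tag{7}$$

The mean displacement variance $\sigma_u^2$ is equal to the mean squared geometric error (MSE) provided that the estimator (4) is unbiased, $\bar\theta = \theta^*$. MSE is in turn closely related to the warping index [19]. We also define the root mean squared error $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$.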
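As an illustration of these definitions, the following minimal sketch computes a sample covariance matrix from a handful of translation estimates and takes its trace as the mean displacement variance. It assumes a pure 2-D translation, for which the displacement is the same at every pixel; the function names and the numbers in `estimates` are hypothetical and not taken from the paper.

```python
import math

def covariance_2d(estimates):
    """Sample mean and 2x2 sample covariance of 2-D parameter estimates."""
    n = len(estimates)
    mean = [sum(e[k] for e in estimates) / n for k in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for e in estimates:
        d = [e[0] - mean[0], e[1] - mean[1]]
        for r in range(2):
            for c in range(2):
                cov[r][c] += d[r] * d[c] / (n - 1)
    return mean, cov

def mean_displacement_variance(cov):
    """For a pure translation the displacement variance is the trace."""
    return cov[0][0] + cov[1][1]

# Hypothetical translation estimates (in pixels) from repeated registrations.
estimates = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.0)]
mean, cov = covariance_2d(estimates)
sigma2 = mean_displacement_variance(cov)
rmse_if_unbiased = math.sqrt(sigma2)  # matches the RMSE only without bias
```

The square root is labeled `rmse_if_unbiased` because the equality with the geometric error holds only when the estimator has no bias, as stated above.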

C. Related Work on Image Registration Accuracy Evaluation

Evaluation of image registration methods is most often done via simulations, generating the data artificially and comparing the recovered results with the known true transformation [20]–[22]. A more realistic but less widely applicable "gold standard" approach is to use some independent and sufficiently accurate method to determine the true deformation, such as using special markers for validation which are not used for registration [23]–[25]. A "bronze standard" [26], [27] uses a robust mean of several registration algorithms as a reference. The registration accuracy can also be estimated indirectly, from ground truth segmentations [28], [29] or by its ability to create good generative models [30]. An a posteriori estimate is possible for low-rank transformations and a large number of corresponding features [31], [32]. Confidence measures for block matching [33], [34] and optical flow estimation [35]–[38] are based either on the data part of the criterion (such as preferring high correlation) or on the regularization part of the criterion (penalizing unlikely deformations); they can be derived from the image derivative covariance matrices [39], [40], or from a posteriori probabilities [41] assuming a specific noise model. However, note that confidence measures typically do not attempt to recover absolute values of registration errors, only relative ordering between errors in different spatial positions within one image.

In some special cases, typically assuming i.i.d. Gaussian noise statistics, the expected accuracy can be evaluated analytically [42]–[46].

D. Cramér–Rao Bound

The most relevant prior art is based on estimating the Cramér–Rao bound [12], [13] for $\theta$, which we review here briefly using our notation for coherence. For tractability, an observation model is assumed in which $n_f$, $n_g$ are zero mean i.i.d. Gaussian additive measurement noises with variance $\sigma^2$; $f$, $g$ are the input images and $h$ is a fixed but unobservable "true" image. The corresponding log-likelihood is given by (8), and the elements of the Fisher information matrix (FIM) $F$ by (9). The second quadratic term in (8) is constant with respect to $\theta$ and the expected value of the noise is zero, which leads to (10); using the chain rule yields (11). In accordance with [13], we estimate the partial derivatives using first order differences.

The Cramér–Rao bound gives us a lower bound on the covariance $C_\theta$ of any unbiased estimator of $\theta$, including (4),

$$C_\theta \succeq F^{-1} \tag{12}$$

in the sense of positive-semidefiniteness. The estimate is described in [12] and [13].

In practice, neither the true image $h$ nor the noise variance $\sigma^2$ is available. We, therefore, first perform a registration as defined by (4) to obtain $\hat\theta$ and then plug the ML estimates (13) and (14) into (10) to obtain a realizable CRB estimate.
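To make the bound concrete, here is a minimal sketch under a simplified textbook model that is an assumption of this example and not necessarily the exact model of the paper: a single image corrupted by i.i.d. Gaussian noise of standard deviation `sigma`, for which the Fisher information of a 2-D translation is the sum of gradient outer products divided by the noise variance. Gradients are approximated by first order differences; the test image and window are hypothetical.

```python
def fisher_information(image, window, sigma):
    """FIM of a 2-D translation: sum of gradient outer products / sigma^2."""
    fxx = fxy = fyy = 0.0
    for x, y in window:
        gx = image[y][x + 1] - image[y][x]  # first order difference in x
        gy = image[y + 1][x] - image[y][x]  # first order difference in y
        fxx += gx * gx
        fxy += gx * gy
        fyy += gy * gy
    s2 = sigma * sigma
    return [[fxx / s2, fxy / s2], [fxy / s2, fyy / s2]]

def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0.0:
        raise ValueError("singular Fisher information (aperture problem)")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical 5x5 test image h(x, y) = x^2 + 3y and a 4x4 window inside it.
image = [[float(x * x + 3 * y) for x in range(5)] for y in range(5)]
window = [(x, y) for y in range(4) for x in range(4)]
fim = fisher_information(image, window, sigma=1.0)
crb = invert_2x2(fim)
bound = crb[0][0] + crb[1][1]  # lower bound on the trace of the covariance
```

A window whose gradients all point in the same direction yields a singular matrix, which is why the inversion guards against a zero determinant.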

E. Fast Registration Accuracy Estimation (FRAE)

The second method which we will review here briefly and later use for comparison is the fast registration accuracy estimation method (FRAE) [14], which is based on quadratic sensitivity analysis ideas. It is a fast method, incurring only a negligible computational overhead. Given a similarity criterion $J$ which can be written as a sum of pixel contributions (15), we start by determining a confidence interval (16) of the criterion value at $\hat\theta$ around a noiseless value. Assuming that the criterion value is normally distributed with a given standard deviation, the interval is characterized by (17), where $\Phi^{-1}$ is the inverse normal cumulative distribution function. The standard deviation can be estimated as (18). For uncorrelated pixel contributions, a practical estimator is (19), to which we might add the effect of quantization noise [14].

As the true criterion function is known only with a limited accuracy, the position of its minimum is, therefore, also known only with a limited accuracy. From the confidence interval (16) and properties of the minimum we get an inequality (20) for the true value of $\theta$ based on observable quantities. We approximate $J$ quadratically around $\hat\theta$ (21); an estimate of the Hessian is available for free as a by-product of the BFGS optimization procedure. This yields (22), from which we can get an equivalent covariance matrix (23) that a normally distributed $\theta$ would have for (22) to hold as an equality. Expression (23) involves an inverse cumulative distribution function, whose value can be precomputed.

F. Bootstrap—Introduction and Related Work

Bootstrap resampling [9]–[11], [47]–[50] is a powerful and versatile computational technique for assessing the accuracy of a parametric estimator in small sample situations. Let us have $N$ i.i.d. samples $X = \{x_1, \dots, x_N\}$ of a random variable with a probability distribution $F$. A bootstrap resample is constructed by randomly selecting $N$ points from $X$ with replacement. This is repeated $B$ times, forming multisets $X^{(b)}$, $b = 1, \dots, B$. The bootstrap resamples $X^{(b)}$ are conditionally independent given $X$ and follow the same distribution as $X$.

Let us further have a continuous statistic $\vartheta(F)$ (e.g., a mean) and its estimator $\hat\vartheta(X)$. We are interested in assessing the reliability of $\hat\vartheta(X)$, as measured for example by its variance or its confidence interval. We apply the estimator $\hat\vartheta$ to the bootstrap resamples $X^{(b)}$, obtaining values $\hat\vartheta(X^{(b)})$. The desired reliability measure is then evaluated using the empirical distribution of the $B$ bootstrap values.
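The generic procedure just described can be sketched in a few lines. This is a minimal illustration, assuming the statistic is the mean; the helper name `bootstrap_values`, the synthetic data, and the choice of 200 resamples are hypothetical. The bootstrap standard deviation is compared with the usual analytic standard error of the mean, which is available for this simple case.

```python
import random
import statistics

def bootstrap_values(samples, estimator, n_resamples, rng):
    """Apply `estimator` to resamples drawn from `samples` with replacement."""
    n = len(samples)
    values = []
    for _ in range(n_resamples):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        values.append(estimator(resample))
    return values

rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]    # synthetic samples
boot = bootstrap_values(data, statistics.fmean, 200, rng)
boot_std = statistics.stdev(boot)                   # bootstrap reliability
analytic = statistics.stdev(data) / len(data) ** 0.5  # standard error of mean
```

For estimators without a closed-form standard error, such as a registration result, only the bootstrap value would be available.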

Bootstrap resampling was used in image processing to evaluate the performance of detection and classification algorithms [51], [52] and edge detectors [53], to compensate the bias in estimation of ellipse parameters [54] and to improve image segmentation [55], [56]. Bootstrap was also used to assess the accuracy of a rigid motion estimation algorithm based on 3-D key points [57], [58].

II. BOOTSTRAP ACCURACY ESTIMATION

Bootstrap resampling accuracy estimation [59] is a general but computationally intensive method. Its inputs are a registration algorithm and the two input images $f$ and $g$. In contrast to FRAE and CRB (Sections I-D and I-E), the bootstrap method can provide a nonparametric estimate of the probability density of $\hat\theta$ and any desired statistics on $\hat\theta$, such as confidence intervals. However, for an easy comparison with the CRB and FRAE, we will concentrate on using bootstrap to obtain a covariance matrix estimate, and consequently the mean displacement variance from (7), which has the additional advantage of requiring only a small number of bootstrap resamples $B$ and thus being computationally tractable (see Section III-B).

A. Bootstrap Covariance Estimation

To determine the variability of $\hat\theta$ from (4) we will use bootstrap to "simulate" the behavior of the criterion function $J$ across realizations. Bootstrap can be applied to a criterion written as a sum of pixel contributions (15). However, we use a more general form, anticipating its use in Section II-D. We replace the sum by a more general operation, describing the data criterion as a function $\rho$ of a multiset of pairs of pixel intensities of corresponding pixels

$$J_D(\theta) = \rho\big(\{(f(x_i), g(T_\theta(x_i)));\ i \in \Omega\}\big) \tag{24}$$

(A multiset is a generalization of a set, which can contain each element several times.)

Following the bootstrap methodology (Section I-F), we take the pixel coordinates $\Omega$ and make a set of bootstrap resamples $\Omega^{(b)}$, $b = 1, \dots, B$, by sampling from $\Omega$ with replacement. We get a set of $B$ bootstrap versions of the data criterion $J_D^{(b)}$ with

$$J_D^{(b)}(\theta) = \rho\big(\{(f(x_i), g(T_\theta(x_i)));\ i \in \Omega^{(b)}\}\big) \tag{25}$$

For example for SSD (3), the bootstrap version is

$$J_{\mathrm{SSD}}^{(b)}(\theta) = \rho_{\mathrm{SSD}}\big(\{(f(x_i), g(T_\theta(x_i)));\ i \in \Omega^{(b)}\}\big) \tag{26}$$

with

$$\rho_{\mathrm{SSD}}(Z) = \frac{1}{|Z|} \sum_{(u,v) \in Z} (u - v)^2 \tag{27}$$

Finally, by minimization of each $J^{(b)}$ we get bootstrap versions of $\hat\theta$

$$\hat\theta^{(b)} = \arg\min_\theta J^{(b)}(\theta) \tag{28}$$

which can be used to estimate any desired statistics on $\hat\theta$, such as the covariance matrix

$$\hat C_\theta = \frac{1}{B - 1} \sum_{b=1}^{B} \big(\hat\theta^{(b)} - \bar\theta\big)\big(\hat\theta^{(b)} - \bar\theta\big)^T \tag{29}$$

with

$$\bar\theta = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{(b)} \tag{30}$$

B. Practical Bootstrap

Algorithm 1 describes a practical implementation of bootstrap resampling. At each bootstrap run, a multiset $\Omega^{(b)}$ is constructed containing pixels from $\Omega$, some several times, some not at all, by repeatedly drawing a random index $j$ from the uniform distribution over $\Omega$. This induces a bootstrap version of the criterion function (25) which is then optimized. The minimization (28) is repeated $B$ times. We have observed that a small number of resamples is normally sufficient to estimate the covariance matrix [9]. See also Section III-B. The starting point for each minimization (Algorithm 1, line 7) can be chosen randomly around the original starting point (used to find $\hat\theta$) to detect potential local minima.

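Algorithm 1: Bootstrap registration uncertainty estimation

Input: Images $f$, $g$, set of pixels $\Omega$.
Output: Parameter $\hat\theta$, covariance matrix $\hat C_\theta$.

1 $\hat\theta \leftarrow \arg\min_\theta J(\theta)$, evaluated over $\Omega$
2 for $b \leftarrow 1$ to $B$ do
3     $\Omega^{(b)} \leftarrow \emptyset$
4     for $k \leftarrow 1$ to $|\Omega|$ do
5         draw $j$ uniformly from $\Omega$;
6         add $j$ to the multiset $\Omega^{(b)}$
7     $\hat\theta^{(b)} \leftarrow \arg\min_\theta J^{(b)}(\theta)$, evaluated over $\Omega^{(b)}$
8 Calculate $\hat C_\theta$ from $\hat\theta^{(b)}$ using (29)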
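The bootstrap loop can be illustrated with a deliberately reduced sketch. It makes several simplifying assumptions that are not part of the paper: signals are 1-D, interpolation is linear rather than cubic B-spline, and a golden-section search stands in for BFGS. All function names, the synthetic signals, the search interval, and the number of resamples are hypothetical.

```python
import math
import random

def interp(signal, x):
    """Linear interpolation of a 1-D signal at real position x (clamped)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(x), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]

def ssd(f, g, pixels, theta):
    """SSD criterion evaluated over a multiset of pixel indices."""
    return sum((f[i] - interp(g, i + theta)) ** 2 for i in pixels) / len(pixels)

def minimize_1d(fun, lo, hi, tol=1e-4):
    """Golden-section search; assumes `fun` is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fun(c) < fun(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def bootstrap_registration(f, g, pixels, n_resamples, rng):
    """Return the estimate on all pixels and its bootstrap variance."""
    theta_hat = minimize_1d(lambda t: ssd(f, g, pixels, t), -2.0, 2.0)
    boot = []
    for _ in range(n_resamples):
        # Resample pixel indices with replacement, then re-register.
        resample = [pixels[rng.randrange(len(pixels))] for _ in pixels]
        boot.append(minimize_1d(lambda t: ssd(f, g, resample, t), -2.0, 2.0))
    mean = sum(boot) / len(boot)
    var = sum((t - mean) ** 2 for t in boot) / (len(boot) - 1)
    return theta_hat, var

def scene(x):
    """Hypothetical noiseless 1-D scene."""
    return math.sin(0.3 * x) + 0.5 * math.sin(0.11 * x + 1.0)

rng = random.Random(1)
true_shift = 0.7
f = [scene(x) + rng.gauss(0.0, 0.05) for x in range(80)]
g = [scene(x - true_shift) + rng.gauss(0.0, 0.05) for x in range(80)]
pixels = list(range(10, 70))
theta_hat, var = bootstrap_registration(f, g, pixels, 30, rng)
```

Only the list of pixel indices changes between runs; the images themselves are never modified, which is what allows the same registration routine to be reused unchanged.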

C. Block Bootstrap

In reality, the samples are not independent: they are based on different positions in the same images, which are spatially correlated, and the measurement noise can also be correlated. A possible approach is to decorrelate the samples by fitting an appropriate model before bootstrapping the residuals [9], [48], [49]. A more robust technique is a moving block bootstrap [9], [49], [60], which we extend here to multiple dimensions. Its essence is to sample from $\Omega$ not element by element but by spatially consecutive blocks. This way, the spatial dependency is preserved if the block size is chosen large enough. However, choosing the block size too large decreases the randomness of the sampling. Algorithm 2 is a modified version of Algorithm 1 using block bootstrap. The only difference is that pixel indices are added to the bootstrap multiset one block at a time. Alternatively, a different (not rectangular) neighborhood could be used by changing the norm at line 6 of Algorithm 2.

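Algorithm 2: Block bootstrap uncertainty estimation

Input: Images $f$, $g$, set of pixels $\Omega$, block size $r$.
Output: Parameter $\hat\theta$, covariance matrix $\hat C_\theta$.

1 $\hat\theta \leftarrow \arg\min_\theta J(\theta)$, evaluated over $\Omega$
2 for $b \leftarrow 1$ to $B$ do
3     $\Omega^{(b)} \leftarrow \emptyset$
4     repeat
5         draw $j$ uniformly from $\Omega$
6         $\Delta \leftarrow \{i \in \Omega;\ \|x_i - x_j\|_\infty \le r\}$
7         add $\Delta'$ to $\Omega^{(b)}$ with $\Delta' \subseteq \Delta$ truncated so that $|\Omega^{(b)}| \le |\Omega|$
8     until $|\Omega^{(b)}| = |\Omega|$
9     $\hat\theta^{(b)} \leftarrow \arg\min_\theta J^{(b)}(\theta)$, evaluated over $\Omega^{(b)}$
10 Calculate $\hat C_\theta$ from $\hat\theta^{(b)}$ using (29).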
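The block-wise construction of a resample can be sketched as follows. This is one possible reading, stated as an assumption: blocks are square neighborhoods (maximum-norm distance at most `radius`) around uniformly drawn centers, clipped to the window, and the last block is truncated so that the resample has as many elements as the original pixel set. The function name and the window dimensions are hypothetical.

```python
import random

def block_resample(width, height, radius, rng):
    """Draw a multiset of pixel coordinates one square block at a time."""
    target = width * height
    resample = []
    while len(resample) < target:
        cx, cy = rng.randrange(width), rng.randrange(height)
        # All window pixels within maximum-norm distance `radius` of the center.
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if len(resample) < target:  # truncate the last block
                    resample.append((x, y))
    return resample

rng = random.Random(0)
sample = block_resample(width=16, height=12, radius=2, rng=rng)
```

Replacing the two nested ranges by a test on a different norm would give a non-rectangular neighborhood, in the spirit of the remark about line 6 of Algorithm 2.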

D. Bootstrap for Different Similarity Criteria

To demonstrate the bootstrap generality, we show its application to several commonly used image similarity criteria besides SSD (3). The sum of absolute differences (SAD) criterion is written as follows [compare with (27)]:

$$\rho_{\mathrm{SAD}}(Z) = \frac{1}{|Z|} \sum_{(u,v) \in Z} |u - v|$$

Similarly, the (negative) normalized correlation criterion (NCC) is obtained as follows:

$$\rho_{\mathrm{NCC}}(Z) = -\frac{\sum_{(u,v) \in Z} (u - \bar u)(v - \bar v)}{\sqrt{\sum_{(u,v) \in Z} (u - \bar u)^2 \, \sum_{(u,v) \in Z} (v - \bar v)^2}} \tag{31}$$

where $Z$ is the multiset of intensity pairs from (24) and $\bar u$, $\bar v$ are the means of its first and second components.

The mutual information (MI) has no readily identifiable pixel contributions; nevertheless, it fits well into the formulation (24) through (32), which is built from a smooth joint histogram [61] and the corresponding marginal histograms. The histograms use a chosen windowing function; we are using a linear B-spline, i.e., P1 or linear interpolation.

The bootstrap algorithm (Algorithm 1) works unchanged for all four presented similarity criteria. Care must be taken when evaluating the criterion for the minimization on line 7 that it is calculated over the bootstrap multiset $\Omega^{(b)}$ instead of the original set of pixels. The bootstrap samples are not spatially independent, especially for the NCC and MI criteria, but in spite of that, the bootstrap works well and it is not even necessary to use the block bootstrap (see the experimental results in Section III-A).
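The idea of writing each criterion as a function of a multiset of intensity pairs can be sketched as below. The function names and the normalization by the number of pairs are assumptions of this example; mutual information is omitted because its histogram parameters are not specified here. Because each function sees only a list of pairs, passing the pairs of a bootstrap resample instead of the original pixel set requires no other change.

```python
import math

def rho_ssd(pairs):
    """Mean of squared intensity differences over the multiset."""
    return sum((u - v) ** 2 for u, v in pairs) / len(pairs)

def rho_sad(pairs):
    """Mean of absolute intensity differences over the multiset."""
    return sum(abs(u - v) for u, v in pairs) / len(pairs)

def rho_ncc(pairs):
    """Negative normalized correlation, so that lower values are better."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    suv = sum((u - mu) * (v - mv) for u, v in pairs)
    suu = sum((u - mu) ** 2 for u, _ in pairs)
    svv = sum((v - mv) ** 2 for _, v in pairs)
    return -suv / math.sqrt(suu * svv)

# Hypothetical multiset of intensity pairs; the pair (2.0, 4.0) occurs twice,
# as it could in a bootstrap resample.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (2.0, 4.0)]
values = (rho_ssd(pairs), rho_sad(pairs), rho_ncc(pairs))
```

In this toy example the second intensities are exactly twice the first, so the correlation criterion reaches its best possible value while SSD and SAD do not, which illustrates why the criteria can rank the same alignment differently.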

III. EXPERIMENTS

A. Block Matching Accuracy Prediction

The purpose of the first experiment is to measure the true root mean squared geometrical error (RMSE) of the block matching algorithm (Section I-A) and to compare it with the values (6), (7) predicted by the Cramér–Rao bound method CRB (Section I-D), the FRAE method [14] (Section I-E) and the bootstrap method (Section II).

We took the gray-scale 8-bit Lena image of size 512 × 512 pixels and selected three rectangular regions of interest (ROI) of size 61 × 61 containing high, medium, and low amount of texture and detail, respectively (Fig. 1). In each run, we displaced the ROI by a randomly selected displacement, uniformly distributed within a bounded range of pixels. We perturbed both the original ROI and the displaced ROI with one of three types of noise: (i) uncorrelated zero-mean i.i.d. Gaussian (white) noise with varying standard deviation $\sigma$; (ii) correlated Gaussian noise obtained by convolving the i.i.d. noise by a Gaussian kernel with standard deviation 0.8 pixels; (iii) salt & pepper noise obtained by changing with probability $p$ the value of each pixel to either 0 or 255 (chosen randomly), with $p$ up to 0.3. The block matching registration was run with a small random initial displacement. A constrained BFGS optimization was used with a bound on the maximum displacement to detect divergence. The experiment was repeated for each method, noise type and noise level. We report the RMSE in pixels and compare it with the mean displacement (6), (7) estimated by the evaluated methods. Bias is negligible in all cases. To eliminate the influence of outliers (the optimization program failing to converge), which would distort the statistics, we used a trimmed mean, discarding a fixed fraction of the highest and lowest values. This influences the reported results only in minor ways and only for the highest noise levels. We only report results for ROI size 61 × 61 because results for other ROI sizes were similar; the error slowly decreases with increasing ROI size for all methods because only translational motion is considered.

Fig. 1. Lena test image with three rectangular test areas 1, 2, 3 (ROIs) with progressively decreasing level of detail.
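A trimmed mean of the kind used for the reported statistics can be sketched as follows. The trimming fraction is a hypothetical parameter of this example, applied here to each end of the sorted values; the error values are made up, with one entry imitating a diverged optimization.

```python
def trimmed_mean(values, fraction):
    """Mean after discarding `fraction` of the sorted values at each end."""
    if not 0.0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical squared errors; the last run imitates a failed optimization.
errors = [0.11, 0.09, 0.10, 0.12, 0.08, 0.10, 0.09, 0.11, 0.10, 25.0]
plain = sum(errors) / len(errors)
robust = trimmed_mean(errors, 0.1)
```

Here the plain mean is dominated by the single failed run, whereas the trimmed mean stays close to the typical error level.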

Fig. 2 shows selected results. We can see that the Cramér–Rao bound (CRBi) gives a good estimate of the accuracy, especially for higher SNR [Fig. 2(a)–(c)]. It nevertheless consistently underperforms the bootstrap and often also the FRAE method. Bear in mind, however, that under practical conditions, CRBi cannot be evaluated because it depends on unknown quantities. We can calculate only CRBr (Section I-D), which gives exceedingly optimistic estimates, especially for low SNR, being the worst of the methods tested. The advantage of CRBr is its minimal computational cost. However, the results show that it is usable only for Gaussian noise and high SNR.

For medium to high SNR and Gaussian noise, the FRAE method (Section I-E) gives usable estimates that correctly follow the trend of the true error, even though the error is often overestimated [Fig. 2(a)–(f)]. The FRAE method fails for low SNR because the Hessian estimate is unreliable in this case. The FRAE method also fails for the salt & pepper noise at the SNR levels tested [Fig. 2(g)–(i)].

A clear winner is the bootstrap method (Section II-A). The estimated error follows the true error for both uncorrelated and correlated noise, as well as for the salt & pepper noise [Fig. 2(a)–(i)]. Most of the time the ratio between the two values is less than 2.

On the other hand, the benefit of the block bootstrap method (Section II-C) has not been demonstrated. In some cases block bootstrap performs better than normal bootstrap, such as for position 1 and correlated noise [Fig. 2(d)]. Most of the time there is no clear improvement, such as for the salt & pepper noise [Fig. 2(g)–(i)] or for uncorrelated noise (not shown). There are also cases when block bootstrap is inferior to standard bootstrap [Fig. 2(e)–(f)].

