Image registration using log-polar mappings for recovery of large-scale similarity and projective transformations

01 Oct 2005-IEEE Transactions on Image Processing (IEEE Trans Image Process)-Vol. 14, Iss: 10, pp 1422-1434
TL;DR: A novel technique to recover large similarity transformations (rotation/scale/translation) and moderate perspective deformations among image pairs and achieves subpixel accuracy through the use of nonlinear least squares optimization.
Abstract: This paper describes a novel technique to recover large similarity transformations (rotation/scale/translation) and moderate perspective deformations among image pairs. We introduce a hybrid algorithm that features log-polar mappings and nonlinear least squares optimization. The use of log-polar techniques in the spatial domain is introduced as a preprocessing module to recover large scale changes (e.g., at least four-fold) and arbitrary rotations. Although log-polar techniques are used in the Fourier–Mellin transform to accommodate rotation and scale in the frequency domain, its use in registering images subjected to very large scale changes has not yet been exploited in the spatial domain. In this paper, we demonstrate the superior performance of the log-polar transform in featureless image registration in the spatial domain. We achieve subpixel accuracy through the use of nonlinear least squares optimization. The registration process yields the eight parameters of the perspective transformation that best aligns the two input images. Extensive testing was performed on uncalibrated real images and an array of 10,000 image pairs with known transformations derived from the Corel Stock Photo Library of royalty-free photographic images.

Summary (3 min read)

Introduction

  • Therefore, the images differ by large rotation and scale.
  • See [2] for a recent survey of image registration techniques.
  • Similarity measures like the zero-mean normalized sum of squared differences (SSD) and correlation coefficient are invariant to the linear intensity changes.
  • In Section II, the authors discuss related work on the standard Levenberg–Marquardt algorithm (LMA) and log-polar techniques.

II. PREVIOUS WORK

  • The authors discuss related work on the LMA and the log-polar techniques.
  • In Section II-A, the authors present a background of the Levenberg–Marquardt nonlinear least-squares optimization algorithm that is useful for achieving subpixel registration accuracy.
  • The log-polar transform is described in Section II-B.
  • In Section II-C, the authors discuss the Fourier–Mellin transform, its limitations, and a review of related work.
  • Section II-D discusses a feature-based method that can register images subjected to large scale changes (i.e., ) and arbitrary rotation.

B. Log-Polar Transform

  • The log-polar transformation is a nonlinear and nonuniform sampling of the spatial domain.
  • The log-polar transform has received considerable attention.
  • If the authors assume that and lie along the horizontal and vertical axes, respectively, then image shown in Fig. 2(a) will be mapped to image in Fig. 2(b) after a log-polar coordinate transformation.
  • The log-polar mapping is an accepted model of the representation of the retina in the primary visual cortex in primates, also known as V1 [23]–[25].
  • This bandwidth reduction helps us process a high resolution image only at the focus of attention while remaining aware of a wider field of view.

D. Feature-Based Image Registration

  • Feature-based image registration algorithms extract salient structures, such as points, lines, curves, and regions, from graylevel images and establish correspondences between features using invariant descriptors.
  • In more recent feature-based work, registration for wide baseline applications has been reported in [48]–[52].
  • These results are promising in that they accommodate larger deformations.
  • These stages include corner detection, conversion to invariant descriptors, matching based on the Mahalanobis distance or k-d tree, and outlier removal using the RANSAC algorithm.
  • Whereas their methods are designed to operate under textured regions, they may fail in smooth regions.

III. MODIFIED LMA

  • The LMA solves the following system of equations in an iterative fashion: (5) where is the Hessian matrix and is the residual vector (6) (7).
  • The authors can improve the standard Levenberg–Marquardt optimization algorithm outlined above by adding two modifications.
  • The first modification includes the use of a multiresolution pyramid for both reference and target images.
  • The second modification virtually eliminates the calculation of the Hessian matrix (7) which would otherwise have been computed in every iteration.
  • The authors' second modification is based on the work of [16], whereby registration was performed on medical images subjected to similarity transforms (rotation/scale/translation).

A. Multiresolution Pyramid

  • The original image, sitting at the base of the pyramid, is downsampled by a constant scale factor in each dimension to form the next level.
  • This is repeated from one level to the next until the tip of the pyramid is reached.
  • Second, the smoothness conditions imposed by successively bandlimiting the pyramid levels cause to be computed on smoother images.
  • An example of computed on two different pyramid levels is shown in Fig.
  • Thus, the relation between parameters is (11).

B. Modified Levenberg–Marquardt Algorithm

  • In the standard LMA, the authors calculate the vector and Hessian matrix in each iteration.
  • This is achieved by casting this problem into one where is transformed into , leaving unchanged from one iteration to the next.
  • An important distinction between the standard and modified LMA methods lies in the manner in which the unknown parameters are updated in each iteration.
  • Instead of minimizing (15a), the authors minimize (15c) with respect to the parameters .
  • For further details about resampling, see [57].

IV. GLOBAL REGISTRATION USING LOG-POLAR TRANSFORM

  • The authors have implemented a new algorithm for automatically finding the translation between both input images in the presence of large scale and rotation.
  • The radius and the center of the template are optionally given by the user.
  • The authors compute the base of the logarithm for log-polar transformation as follows: (20) where is the width of the input image ( diameter).
  • The normalized correlation coefficient similarity measure is given as follows: (21) where is the average of image .
  • Furthermore, the bulk of their computation is performed at the coarsest level where there are fewest pixels.

V. EXPERIMENTAL RESULTS

  • An analytical evaluation of the robustness of image registration algorithms is an elusive task.
  • Performance is highly dependent on the content of the input images.
  • Many proposed image registration algorithms in the literature have limited their published results to the use of a few reference images and their synthetically generated target images.
  • The reference and target images are taken by a camera with optical zoom.
  • In Section V-B, the authors test the robustness of their algorithm with a large suite of 10 000 image pairs.

A. Uncalibrated Test Images

  • A Canon PowerShot G3 digital camera with 4× optical zoom was used to capture a set of test images taken from natural and man-made scenes.
  • The authors' method uses all pixels and does not depend on any specific feature set.
  • Images were acquired with (a) no magnification and (b) 4× magnification with unknown rotation about the optical axis.
  • In order to quantify registration accuracy, the authors compute the correlation coefficient in the overlapping area.
  • All of the non-SIFT methods failed to register the image pairs in Fig.

B. Calibrated Test Images

  • It is not feasible to capture a very large set of images with a variety of image content and transformation parameters.
  • The authors generated 10 000 target images from these random parameters.
  • First, the authors used their log-polar module to recover the global rotation, scale, and translation parameters.
  • The authors have tested their registration algorithm to create image mosaics by stitching together low resolution frames from several overlapping images.
  • In order to best expose any misalignment, the authors applied unweighted averaging upon the overlapping areas.

1422 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 10, OCTOBER 2005
Image Registration Using Log-Polar Mappings
for Recovery of Large-Scale Similarity and
Projective Transformations
Siavash Zokai and George Wolberg, Senior Member, IEEE
Abstract—This paper describes a novel technique to recover
large similarity transformations (rotation/scale/translation) and
moderate perspective deformations among image pairs. We in-
troduce a hybrid algorithm that features log-polar mappings
and nonlinear least squares optimization. The use of log-polar
techniques in the spatial domain is introduced as a preprocessing
module to recover large scale changes (e.g., at least four-fold) and
arbitrary rotations. Although log-polar techniques are used in the
Fourier–Mellin transform to accommodate rotation and scale in
the frequency domain, its use in registering images subjected to
very large scale changes has not yet been exploited in the spatial
domain. In this paper, we demonstrate the superior performance
of the log-polar transform in featureless image registration in the
spatial domain. We achieve subpixel accuracy through the use
of nonlinear least squares optimization. The registration process
yields the eight parameters of the perspective transformation that
best aligns the two input images. Extensive testing was performed
on uncalibrated real images and an array of 10,000 image pairs
with known transformations derived from the Corel Stock Photo
Library of royalty-free photographic images.
Index Terms—Image registration, Levenberg–Marquardt non-
linear least-squares optimization, log-polar transform, perspective
transformation, similarity transformation.
I. INTRODUCTION
Digital image registration is a branch of computer vision
that deals with the geometric alignment of a set of im-
ages. The set may consist of two or more digital images taken
of a single scene at different times, from different sensors, or
from different viewpoints. A large body of research has been
drawn to this area due to its importance in remote sensing, med-
ical imaging, computer graphics, and computer vision. Despite
comprehensive research spanning over thirty years, robust tech-
niques to register images in the presence of large deformations
remain elusive. Most techniques fail unless the input images
are misaligned by moderate deformations.
The goal of registration is to establish geometric correspon-
dence between the images so that they may be transformed,
compared, and analyzed in a common reference frame. Regis-
tration is often necessary for 1) integrating information taken
from different sensors (i.e., multisensor data fusion), 2) finding
changes in images taken at different times or under different
conditions, 3) inferring three-dimensional (3-D) information
from images in which either the camera or the objects in the
scene have moved, and 4) model-based object recognition.
Manuscript received April 27, 2004; revised October 11, 2004. This work was
supported in part by an ONR HBCU/MI Research and Education Program Grant
(N000140310511) and a PSC-CUNY Grant. The associate editor coordinating
the review of this manuscript and approving it for publication was Dr. Luca
Lucchese.
S. Zokai is with Brainstorm Technology LLC, New York, NY 10011 USA.
G. Wolberg is with the Department of Computer Science, City College of
New York, New York, NY 10031 USA (e-mail: wolberg@cs.ccny.cuny.edu).
Digital Object Identifier 10.1109/TIP.2005.854501
The most common task associated with image registration
is the generation of large panoramic images for viewing and
analysis. Image mosaics, created by warping and blending
together several overlapping images, are central to this process.
Other common registration tasks include producing super-reso-
lution images from multiple images of the same scene, change
detection, motion stabilization, topographic mapping, and
multisensor image fusion.
This work attempts to register two images using one global
perspective transformation even in the presence of arbitrary ro-
tation angles and large scale changes (up to 5× zoom). Our
work is motivated by the problem of registering airborne im-
ages. These images are taken at vastly different times, altitudes,
and directions. Therefore, the images differ by large rotation and
scale. Also, the pitch and roll introduce moderate perspective.
In general, images of a 3-D scene do not differ by just one per-
spective transformation because the depth between the camera
and the objects introduces parallax. A global transformation
cannot align all features in such cases. We must, therefore, place
constraints on camera motion and/or our 3-D scene to produce
images that are free of parallax. One constraint requires the
camera motion to be limited to rotation, pan, tilt, and zoom about
a fixed point, e.g., on a tripod. If this constraint is not satisfied,
then we may still have images free of parallax if the object’s 3-D
points
in the scene are far away from the camera, i.e.,
. This means that the scene is flat and we are looking
at a planar object. In either case, we assume that the scene is
static and the lighting is fixed between images. Nevertheless, we
have relaxed these conditions to accommodate local disparity
and linear changes in illumination.
A survey by Brown [1] introduces a framework in which all
registration techniques can be understood. The framework con-
sists of the feature space, similarity measure, search space, and
search strategy. The feature space extracts the information in
the images that will be used for matching. The search space is
the class of transformations, or deformation models, that is ca-
pable of aligning the images. The search strategy decides how
to choose the next transformation from this space, to be tested in
the search for the optimal transformation. The similarity mea-
sure determines the relative merit for each test. Search continues
according to the search strategy until a transformation is found
1057-7149/$20.00 © 2005 IEEE

ZOKAI AND WOLBERG: IMAGE REGISTRATION USING LOG-POLAR MAPPINGS 1423
Fig. 1. Airborne imagery. (a) Observed images. (b) Reference image. (c) Registration overlays.
whose similarity measure is satisfactory. Numerous registration
techniques have been proposed based on choosing a specific fea-
ture, deformation model, optimization method, and/or similarity
measure. See [2] for a recent survey of image registration tech-
niques.
For image registration, we need to recover the geo-
metric transformation and/or intensity function. Let
and be the reference and observed images, re-
spectively. The relationship between these images is
, where is a two-dimensional
(2-D) geometric transformation operator that relates the
coordinates in to the coordinates in and is the
intensity function.
The estimation of the intensity function
is useful when we
want to register images taken from different sensors or when il-
lumination is changed by automatic gain exposure of a camera.
Comparametric equations have been introduced to model the in-
tensity function
[3]. Although these equations are nonlinear,
a piecewise linear method has been developed to estimate
and simultaneously [4].
Mutual information is a similarity measure that has recently
been introduced for multimodal medical image registration [5],
[6]. Correlation ratio is another similarity measure for multi-
modal image registration and has proven to perform better than
mutual information [7]. Multimodal image registration has been
studied extensively in the medical imaging domain. In this work,
we assume that the intensity function is linear. Similarity mea-
sures like the zero-mean normalized sum of squared differences
(SSD) and correlation coefficient are invariant to linear in-
tensity changes.
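The invariance claim above can be checked numerically. The following is a minimal sketch, assuming NumPy and a synthetic random image; the function name `correlation_coefficient` and the gain and offset values are illustrative choices, not taken from the paper.

```python
import numpy as np

def correlation_coefficient(a, b):
    """Zero-mean normalized correlation between two equally sized images."""
    a0 = a.astype(np.float64) - a.mean()
    b0 = b.astype(np.float64) - b.mean()
    return float((a0 * b0).sum() / np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum()))

rng = np.random.default_rng(0)
image = rng.random((32, 32))

# Hypothetical linear intensity change: gain 1.7, offset 0.2.
brighter = 1.7 * image + 0.2

# The correlation coefficient ignores the gain and the offset.
cc_same = correlation_coefficient(image, brighter)

# The plain (unnormalized) SSD, by contrast, is strongly affected by them.
ssd_same = float(((image - brighter) ** 2).sum())
```

Here `cc_same` stays at 1 up to rounding error, while the unnormalized SSD is far from zero even though the two images show the same content.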
This paper describes a hierarchical image registration system.
We model the mapping function as a perspective transforma-
tion. The algorithm estimates the perspective parameters neces-
sary to register any two misaligned digital images. The parame-
ters are selected to minimize the SSD between the two images.
They are computed iteratively in a coarse-to-fine hierarchical
framework using a variation of the Levenberg–Marquardt non-
linear least squares optimization method. This approach yields
a robust solution that precisely registers images with subpixel
accuracy.
The primary drawback of the optimization-based approach
is that it may fail unless the two images are misaligned by a
moderate difference in scale, rotation, and translation. In order
to address this problem, we introduce a log-polar registration
module to bring the images into approximate alignment, even in
the presence of arbitrary rotation angles and large scale changes.
Its purpose is to furnish a good initial estimate to the perspec-
tive registration module that is based on nonlinear least squares
optimization.
The scope of this work shall prove useful for various ap-
plications, including the registration of aerial images, and the
formation of image mosaics. Note that aerial imagery may be
acquired from uncalibrated airborne cameras subjected to yaw,
pitch, and roll at various altitudes. Since the terrain appears flat
from moderately high altitude, it is an ideal candidate for reg-
istration using a single perspective transformation. An example
demonstrating the registration of two aerial images in the pres-
ence of large scale/rotation and moderate perspective is shown
in Fig. 1. The image in Fig. 1(a) is automatically registered to
that in Fig. 1(b), as depicted by the highlighted rectangle.
In Section II, we discuss related work on the standard Lev-
enberg–Marquardt algorithm (LMA) and log-polar techniques.
Section III describes a modified LMA for improving the per-
formance of the standard LMA and Section IV presents our
proposed log-polar method. In Section V, we demonstrate the
success of the log-polar transform in recovering large deforma-
tions by comparing registration accuracy with and without the
log-polar registration module. A significant increase in correct
matches is attributed to our algorithm. A secondary compar-
ison was made by replacing the log-polar module with the well-
known Fourier–Mellin transform. Again, our log-polar module
proved superior to the Fourier–Mellin transform for achieving
high perspective registration accuracy.
II. PREVIOUS WORK
In this section, we discuss related work on the LMA and the
log-polar techniques. In Section II-A, we present a background
of the Levenberg–Marquardt nonlinear least-squares optimiza-
tion algorithm that is useful for achieving subpixel registration
accuracy. The log-polar transform is described in Section II-B.

In Section II-C, we discuss the Fourier–Mellin transform, its
limitations, and a review of related work. Section II-D discusses
a feature-based method that can register images subjected to
large scale changes (i.e.,
) and arbitrary rotation.
A. LMA
There is a vast literature of work in the related fields of image
registration, motion estimation, image mosaics, and video in-
dexing that make use of a nonlinear least-squares optimization
technique known as the LMA. Most algorithms exploit a hier-
archical approach due to computational efficiency in handling
large displacements. Algorithms for hierarchical motion esti-
mation [8]–[10] and image mosaicing [11]–[20] usually assume
small deformations among image pairs. For instance, a dense
image sequence is required to stitch the frames together [14],
[18]. The problem of assembling a large set of images into a
common reference frame is simplified when the inter-frame de-
formations are small. The LMA uses the SSD as the similarity
measure between two images (or regions)
(1)
and the discrete form is
Note that is a geometric transformation applied to image
to map it from its coordinate system to the coordi-
nate system of
. In our case, the subscript is a 3 × 3 per-
spective transformation matrix and
is the number of pixels.
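As an illustration of the discrete SSD under a 3 × 3 perspective matrix, the following is a minimal sketch in NumPy. The function names, the nearest-neighbor resampling, and the convention that the matrix maps reference pixel coordinates into the target image are assumptions made for this example, not details confirmed by the text.

```python
import numpy as np

def warp_nearest(image, H):
    """Resample `image` so that output pixel (x, y) takes the value at H @ (x, y, 1).

    Returns the warped image and a mask of pixels that landed inside `image`.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    u, v, s = H @ pts                      # homogeneous coordinates
    u = np.rint(u / s).astype(int)         # nearest-neighbor sampling (assumption)
    v = np.rint(v / s).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros(h * w)
    out[inside] = image[v[inside], u[inside]]
    return out.reshape(h, w), inside.reshape(h, w)

def ssd(reference, target, H):
    """Sum of squared differences over the pixels where the warp is defined."""
    warped, valid = warp_nearest(target, H)
    diff = (warped - reference)[valid]
    return float((diff ** 2).sum())

rng = np.random.default_rng(1)
target = rng.random((40, 50))
# Synthetic reference: the target shifted by 3 pixels in x and 2 pixels in y.
reference = np.roll(target, shift=(-2, -3), axis=(0, 1))
H_true = np.array([[1.0, 0.0, 3.0],
                   [0.0, 1.0, 2.0],
                   [0.0, 0.0, 1.0]])
ssd_true = ssd(reference, target, H_true)
ssd_identity = ssd(reference, target, np.eye(3))
```

With the correct translation the SSD over the overlapping region drops to zero, whereas the identity matrix leaves a large residual. A practical implementation would likely use bilinear or higher-order interpolation, which subpixel accuracy requires.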
B. Log-Polar Transform
The log-polar transformation is a nonlinear and nonuniform
sampling of the spatial domain. Nonlinearity is introduced by
polar mapping, while nonuniform sampling is the result of log-
arithmic scaling. Despite the difficulties of nonlinear processing
for computer vision applications, the log-polar transform has re-
ceived considerable attention. Consider the log-polar $(\log r, \theta)$
coordinate system, where $r$ denotes radial distance from the
center $(x_c, y_c)$ and $\theta$ denotes angle. Any point $(x, y)$ can be rep-
resented in polar coordinates
$$r = \sqrt{(x - x_c)^2 + (y - y_c)^2} \qquad (2)$$
$$\theta = \tan^{-1}\left(\frac{y - y_c}{x - x_c}\right) \qquad (3)$$
Applying a polar coordinate transformation to an image
maps radial lines in Cartesian space to horizontal lines in the
polar coordinate space. We shall denote the transformed image
. If we assume that and lie along the horizontal and ver-
tical axes, respectively, then image
shown in Fig. 2(a) will be
mapped to image
in Fig. 2(b) after a log-polar coordinate
transformation.
Fig. 2. Log-polar coordinate transformation. (a) Input image. (b) Log-polar
transformation.
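To make the mapping concrete, the following is a minimal NumPy sketch of a log-polar resampling. The function name `log_polar`, the grid sizes, the image-center origin, and the nearest-neighbor sampling are all assumptions for illustration. It checks one property that motivates the transform: a rotation of the input about the center becomes a circular shift along the angular axis.

```python
import numpy as np

def log_polar(image, n_r=64, n_theta=64):
    """Sample `image` on a log-polar grid centered on the image center.

    Rows index logarithmically spaced radii, columns index angle.
    """
    h, w = image.shape
    xc, yc = (w - 1) / 2.0, (h - 1) / 2.0
    r_max = min(xc, yc)
    # Radii spaced uniformly in log r, from 1 pixel out to r_max.
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_r))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    x = np.rint(xc + rr * np.cos(tt)).astype(int)   # nearest-neighbor sampling
    y = np.rint(yc + rr * np.sin(tt)).astype(int)
    return image[y, x]

rng = np.random.default_rng(2)
image = rng.random((129, 129))          # odd size so the center is a pixel
lp = log_polar(image)
lp_rot = log_polar(np.rot90(image))     # input rotated by 90 degrees

# A 90-degree rotation should appear as a shift of n_theta / 4 = 16 columns.
match = float(np.mean(lp_rot == np.roll(lp, -16, axis=1)))
```

In the same way, scaling the input about the center by a factor `s` shifts the rows by an amount proportional to `log(s)`, which is why a search for rotation and scale reduces to a search for translation in this domain.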
The motivation for considering the log-polar transform stems
from its biological origins. The first discoveries of log-
polar mappings in the primate visual system were reported in
[21] and [22]. The log-polar mapping is an accepted model of
the representation of the retina in the primary visual cortex in
primates, also known as V1 [23]–[25]. The nonuniform sam-
pling that simulates logarithmic scale takes place in the retina
and the nerve endings from the retina are connected to the visual
cortex by a special mapping. This mapping realizes the polar
transformation by a simple rewiring. The radial nerve endings
are connected horizontally to the visual cortex. Due to these bi-
ological origins, the log-polar transform has often been referred
to as the retino-cortical transform [26]. The log-polar transform
has two principal advantages: 1) rotation and scale invariance
and 2) the spatially varying sampling in the retina is the solu-
tion to reduce the amount of information traversing the optical
nerve while maintaining high resolution in the fovea and cap-
turing a wide field of view. This bandwidth reduction helps us
process a high resolution image only at the focus of attention
while remaining aware of a wider field of view. Several researchers have
designed log-polar sensors for active and real-time vision appli-
cations [27]–[31]. These efforts sought to make the leap from
biological hardware to VLSI hardware.
C. Fourier–Mellin Transform
The Fourier–Mellin registration method is based on phase
correlation and the properties of Fourier analysis. The phase
correlation method can find the translation between two im-
ages. The Fourier–Mellin transform extends phase correlation
to handle images related by both translation and rotation
[32]–[39]. According to the rotation and translation properties
of the Fourier transform, the transforms are related by
We can see that the magnitude of spectra is a rotated
replica of
. Both spectra share the same center of rotation.

Fig. 3. Effects of optical and digital zoom on the power spectrum.
(a) Reference image. (b) Target image (real). (c) Target image (synthetic).
We can recover this rotation by representing the spectra and
in polar coordinates
(4)
The Fourier magnitude in polar coordinates differs only
by translation. We can use the phase-correlation method to
find this translation and estimate the rotation angle. This method has been
extended to find scale by mapping the Fourier magnitude to
log-polar coordinates. Therefore, one finds scale and rotation
by phase-correlation, which recovers the amount of shift in
log-polar space. One advantage of this method is that it tol-
erates additive noise. The method, however, can only recover
moderate scales and rotations. This difficulty can be understood
by realizing that large rotation and scale changes exacerbate the
border effects when computing the Fourier transform. These
problems are minimized in the rare case when the images are
periodic. Therefore, a large translation or scale change introduces
additional pixel information that can dramatically alter the
Fourier coefficients.
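The phase-correlation step referred to above can be sketched as follows. This is a minimal NumPy example under the idealized assumption of periodic images (a circular shift), which is exactly the rare case in which border effects vanish; the function name `phase_correlation` and the test shift are illustrative choices.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the circular shift (dy, dx) with b == np.roll(a, (dy, dx), axis=(0, 1)).

    Shifts are returned modulo the image size.
    """
    Fa = np.fft.fft2(a)
    Fb = np.fft.fft2(b)
    cross = Fb * np.conj(Fa)
    # Keep only the phase; the small floor avoids division by zero.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real     # ideally a single peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)

rng = np.random.default_rng(3)
a = rng.random((64, 64))
b = np.roll(a, shift=(5, 12), axis=(0, 1))   # known circular translation
shift = phase_correlation(a, b)
```

For real image pairs the shift is not circular, so new content enters at the borders and the correlation peak is weakened, which is the limitation the paragraph above describes.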
In early papers on Fourier–Mellin, the border problems were
not investigated. They were, however, reported recently in [40]
and [41], where the authors showed that rotation and scale in-
troduce aliasing in the low frequencies. They have suggested
that two preprocessing steps are needed to alleviate the aliasing
problem. First, the image must be multiplied by a radial mask
consisting of a 2-D Gaussian function. Second, a low-pass filter
must be applied to remove the offending low frequencies. The
researchers in [35] reported that they recovered scale factors up
to 1.8 and 80° rotations.
It is important to note that the literature is replete with syn-
thetic examples for the Fourier–Mellin registration method. In
particular, a reference image is always matched against a scaled
and rotated version of itself. This serves to defer the problem of
handling the fine details introduced by an actual optical zoom.
Conversely, when the image undergoes minification, translation,
or rotation, additional real data seeps into the target image, not
just black pixels. Note that artificial black backgrounds can help
register two images because they ensure that we consider the same
underlying content.
An example demonstrating the differences between digital
and optical zoom is shown in Fig. 3. As is expected, the shape
of the spectrum in Fig. 3(c) conforms to the inverse relationship
between space and frequency. However, the spectrum of Fig. 3(b)
reflects the fact that the images were taken with optical zoom
and minor perspective distortion was introduced due to real hand
movement. Although the Fourier–Mellin transform is able to
correctly register the synthetic image shown in Fig. 3(c), the
image in Fig. 3(b) defies recovery because of the lack of simi-
larity in its spectrum compared to that of the reference image.
An important contribution of this work is that we introduce
a new method based on the log-polar transform in the spatial
domain that works robustly with real images.
D. Feature-Based Image Registration
Feature-based image registration algorithms extract salient
structures, such as points, lines, curves, and regions, from
graylevel images and establish correspondences between
features using invariant descriptors. Early work in this area
includes [42]–[47]. This work, however, is generally limited to
small geometric deformations.
In more recent feature-based work, registration for wide base-
line applications has been reported in [48]–[52]. These results
are promising in that they accommodate larger deformations.
Finding local and invariant features is an important tool for
detecting correspondences between different views of a scene.
In [50], the authors detect quadrilateral and elliptical locally
affine regions for finding the fundamental matrix in wide-base-
line stereo images. In [51] and [53], the authors look for locally
affine regions. They compute several degrees of moments in
these regions to build feature vectors for wide-baseline stere-
oscopy [53] and image retrieval [51]. Their work tolerates only
small scale changes.
Recently, several researchers at INRIA and University of
British Columbia developed methods for recovering large-scale
deformation based on scale-space theory [49], [54]–[56]. The
INRIA method computes interest points at different scales, cal-
culating at each scale a set of local descriptors that are invariant
to rotation, translation, and illumination. The Mahalanobis
distance is then used to find the corresponding interest points
between two images. In order to remove outliers, they use the
RANSAC algorithm with constraints based on collections of
points. In the work of Lowe and his colleagues, a scale-invariant
feature transform (SIFT) is introduced to find features and a
k-d tree is used to match features across multiple images [48],
[49]. To our knowledge, the techniques described in [49] and
[54] are the only works that are applied to outdoor images
with large scale factors (i.e.,
) derived from optical zoom
cameras (not digital zoom). Our registration algorithm is able
to properly register all of their test data. Their methods consist
of a series of complex stages that are not amenable to direct hard-
ware implementation. These stages include corner detection,
conversion to invariant descriptors, matching based on the
Mahalanobis distance or k-d tree, and outlier removal using the
RANSAC algorithm. Whereas their methods are designed to
operate under textured regions, they may fail in smooth regions.
III. MODIFIED LMA
The LMA solves the following system of equations in an it-
erative fashion:
(5)

where is the Hessian matrix and is the residual
vector
(6)
(7)
We can improve the standard Levenberg–Marquardt opti-
mization algorithm outlined above by adding two modifications.
The first modification includes the use of a multiresolution
pyramid for both reference and target images. The second
modification virtually eliminates the calculation of the Hessian
matrix (7) which would otherwise have been computed in every
iteration. Our second modification is based on the work of [16],
whereby registration was performed on medical images sub-
jected to similarity transforms (rotation/scale/translation). We
have extended their method to recover perspective parameters.
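For orientation, the standard (unmodified) iteration can be sketched on a toy curve-fitting problem. This is a minimal example, assuming the common Gauss-Newton approximation of the Hessian (`J.T @ J`) and a simple tenfold damping schedule; the function name, the toy model, and the schedule are illustrative and are not the paper's equations (5) to (7). Note that the Hessian is rebuilt in every iteration, which is the cost the second modification aims to avoid.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, params, n_iter=50, lam=1e-3):
    """Minimize sum(residual(p)**2) with a damped Gauss-Newton iteration."""
    params = np.asarray(params, dtype=float)
    cost = float(np.sum(residual(params) ** 2))
    for _ in range(n_iter):
        r = residual(params)
        J = jacobian(params)
        H = J.T @ J                    # Gauss-Newton approximation of the Hessian
        g = J.T @ r                    # gradient direction of the cost
        # Damped normal equations: lam interpolates between Gauss-Newton
        # (small lam) and scaled gradient descent (large lam).
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
        trial = params + step
        trial_cost = float(np.sum(residual(trial) ** 2))
        if trial_cost < cost:          # accept the step and trust the model more
            params, cost, lam = trial, trial_cost, lam / 10.0
        else:                          # reject the step and increase damping
            lam *= 10.0
    return params

# Toy problem (assumption): fit y = a * exp(b * x) to noise-free samples.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)
residual = lambda p: p[0] * np.exp(p[1] * x) - y
jacobian = lambda p: np.stack([np.exp(p[1] * x), p[0] * x * np.exp(p[1] * x)], axis=1)
fit = levenberg_marquardt(residual, jacobian, [1.0, 0.0])
```

In the registration setting the residuals would be per-pixel intensity differences and the parameters the eight perspective coefficients, so each Jacobian evaluation involves image gradients over all pixels.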
A. Multiresolution Pyramid
A multiresolution pyramid consists of a set of images repre-
senting an image in multiple resolutions. The original image,
sitting at the base of the pyramid, is downsampled by a constant
scale factor in each dimension to form the next level. This is re-
peated from one level to the next until the tip of the pyramid is
reached. The image size at level
is reduced from the original
by a factor of
in each dimension. Level 0, at the base of the
pyramid, is referred to as the finest level. Level
, at the tip
of the pyramid, is known as the coarsest level.
Multiresolution pyramids supply us with two major advan-
tages. First, when we apply the Levenberg–Marquardt method
to the coarsest level of the pyramid, the number of pixels is re-
duced by a factor of
. We get large computational gains
because most of the iterations are executed in the coarsest level,
consisting of fewer pixels. Second, the smoothness conditions
imposed by successively bandlimiting the pyramid levels cause
to be computed on smoother images. This smoothness
property helps prevent getting trapped in local minima. An ex-
ample of
computed on two different pyramid levels is
shown in Fig. 4. Since the coarsest level retains large-scale fea-
tures only, the registration algorithm proceeds from the coarsest
level to progressively finer levels, where small corrections due to
finer details are integrated. This approach passes the computed
parameters as an initial estimate to the next finer level. The pa-
rameters must be scaled properly across successive levels. Let
the scale factor between the levels be
: ,
, , and , where
(8a)
(8b)
Fig. 4. Example of (a) computed on two different pyramid levels.
Substituting the coordinates of the next finer level into the above
equations yields
(9)
Multiplying both sides by
gives us
(10)
Thus, the relation between parameters is
(11)
In our case,
, so the translation parameters and are
multiplied by two and
and divided by two.
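The rescaling rule stated above can be checked numerically. The sketch below assumes the eight perspective parameters are ordered so that the third and sixth are the translations and the last two are the perspective terms of the denominator; that ordering, the helper names, and the sample values are illustrative assumptions.

```python
import numpy as np

def apply_perspective(a, x, y):
    """Map (x, y) with parameters a = (a1, ..., a8) in the assumed order."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    w = a7 * x + a8 * y + 1.0
    return (a1 * x + a2 * y + a3) / w, (a4 * x + a5 * y + a6) / w

def to_finer_level(a, s=2.0):
    """Rescale coarse-level parameters for use at the next finer level."""
    a = np.asarray(a, dtype=np.float64).copy()
    a[[2, 5]] *= s   # translation terms grow with the image size
    a[[6, 7]] /= s   # perspective terms shrink by the same factor
    return a

# Hypothetical coarse-level estimate.
coarse = np.array([0.9, -0.1, 3.0, 0.1, 0.9, -2.0, 1e-3, 2e-3])
fine = to_finer_level(coarse, s=2.0)

# A coarse-level point (x, y) corresponds to (2x, 2y) at the finer level,
# so its image at the finer level should be twice its coarse-level image.
xc, yc = 10.0, 7.0
xc_out, yc_out = apply_perspective(coarse, xc, yc)
xf_out, yf_out = apply_perspective(fine, 2 * xc, 2 * yc)
print(xf_out / xc_out, yf_out / yc_out)
```

The remaining four parameters are unchanged because they multiply coordinates that scale on both sides of the mapping.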
B. Modified Levenberg–Marquardt Algorithm
In the standard LMA, we calculate the residual vector and the Hessian matrix in each iteration. In this section, we review a modified LMA that realizes performance gains by eliminating the calculation of the Hessian matrix at each iteration. Consider the following objective function to establish a similarity measure between the two images:
(12)
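For orientation, the standard iteration that this section contrasts against can be sketched on a toy curve-fitting problem. This is a generic textbook Levenberg–Marquardt loop that rebuilds a Gauss–Newton approximation of the Hessian on every pass, not the modified variant reviewed here; the function names, the damping schedule, and the exponential model are illustrative assumptions.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, a0, iters=100, lam=1e-3):
    """Minimize sum(residual(a)**2) with a basic damped Gauss-Newton loop."""
    a = np.asarray(a0, dtype=np.float64)
    cost = np.sum(residual(a) ** 2)
    for _ in range(iters):
        r = residual(a)
        J = jacobian(a)
        H = J.T @ J            # Gauss-Newton approximation of the Hessian
        b = -J.T @ r           # right-hand side built from the residuals
        # Damped normal equations: lam blends Gauss-Newton and gradient descent.
        delta = np.linalg.solve(H + lam * np.diag(np.diag(H)), b)
        new_cost = np.sum(residual(a + delta) ** 2)
        if new_cost < cost:    # accept the step and reduce the damping
            a, cost, lam = a + delta, new_cost, lam / 10.0
        else:                  # reject the step and damp more heavily
            lam *= 10.0
    return a

# Toy problem (hypothetical): recover (2.0, -1.5) in y = a0 * exp(a1 * t).
t = np.linspace(0.0, 2.0, 20)
data = 2.0 * np.exp(-1.5 * t)

def residual(a):
    return a[0] * np.exp(a[1] * t) - data

def jacobian(a):
    e = np.exp(a[1] * t)
    return np.column_stack([e, a[0] * t * e])

estimate = levenberg_marquardt(residual, jacobian, a0=[1.0, -1.0])
print(estimate)
```

In image registration the residuals are per-pixel intensity differences, so `H` and `b` are sums over all pixels; that cost is what makes recomputing the Hessian at every iteration expensive.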

Citations
Journal ArticleDOI
TL;DR: This paper introduces adaptive polar transform (APT) technique and an innovative matching mechanism that serve as image processing tool for recovering scale and rotation change of the registered image.
Abstract: Image registration is an essential step in many image processing applications that need visual information from multiple images for comparison, integration, or analysis. Recently, researchers have introduced image registration techniques using the log-polar transform (LPT) for its rotation and scale invariant properties. However, it suffers from nonuniform sampling which makes it not suitable for applications in which the registered images are altered or occluded. Inspired by LPT, this paper presents a new registration algorithm that addresses the problems of the conventional LPT while maintaining the robustness to scale and rotation. We introduce a novel adaptive polar transform (APT) technique that evenly and effectively samples the image in the Cartesian coordinates. Combining APT with an innovative projection transform along with a matching mechanism, the proposed method yields less computational load and more accurate registration than that of the conventional LPT. Translation between the registered images is recovered with the new search scheme using Gabor feature extraction to accelerate the localization procedure. Moreover an image comparison scheme is proposed for locating the area where the image pairs differ. Experiments on real images demonstrate the effectiveness and robustness of the proposed approach for registering images that are subjected to occlusion and alteration in addition to scale, rotation, and translation.

73 citations


Cites methods from "Image registration using log-polar ..."

  • ...Zokai et al [3] uses multi-resolution framework to reduce the computation cost of the search....

  • ...Recently Zokia et al [3] proposed an innovative image registration technique by using Log-Polar transform (LPT)....

  • ...Zokai et al [4] uses a multi-resolution imaging technique to reduce the computation cost of translation recovery by searching from the coarsest to the finest level....

  • ...Unlike the conventional LPT that can use 2D correlation technique [3] for matching LPT images and obtaining scale and rotation information, APT requires the correlation to be performed separately for each sampling radius r of the transformed image....

  • ...In this section, the performance comparisons are made between the conventional LPT, as proposed in [3], and the...

Journal ArticleDOI
TL;DR: Three key factors underlying superresolution that enable the perceptual acuity to surpass the sensor resolution are identified and envisage that these principles will enable cheap high-acuity tactile sensors that are highly customizable to suit their robotic use.
Abstract: Motivated by the impact of superresolution methods for imaging, we undertake a detailed and systematic analysis of localization acuity for a biomimetic fingertip and a flat region of tactile skin. We identify three key factors underlying superresolution that enable the perceptual acuity to surpass the sensor resolution: 1) the sensor is constructed with multiple overlapping, broad but sensitive receptive fields; 2) the tactile perception method interpolates between receptors (taxels) to attain subtaxel acuity; and 3) active perception ensures robustness to unknown initial contact location. All factors follow from active Bayesian perception applied to biomimetic tactile sensors with an elastomeric covering that spreads the contact over multiple taxels. In consequence, we attain extreme superresolution with a 35-fold improvement of localization acuity (0.12 mm) over sensor resolution (4 mm). We envisage that these principles will enable cheap high-acuity tactile sensors that are highly customizable to suit their robotic use. Practical applications encompass any scenario where an end-effector must be placed accurately via the sense of touch.

71 citations

Journal ArticleDOI
TL;DR: The tactile sensing array can be seen as a coordinated system of touch sensors and the low spatial resolution measured with the FSRs compared to other force or pressure sensors required the use of a super-resolution algorithm.
Abstract: This paper presents a tactile sensor consisting of an array of force sensing resistors (FSRs). The tactile sensing array can be seen as a coordinated system of touch sensors. The low spatial resolution measured with the FSRs compared to other force or pressure sensors required the use of a super-resolution algorithm. Super-resolution algorithms are often used in digital image processing to enhance the resolution of images. Multiple images taken from slightly different orientations are superimposed in such a way that a single higher-resolution image is obtained. Different touch sensors are briefly discussed and the use of FSRs is motivated. Image-registration techniques are discussed and the super-resolution algorithm developed for the application is presented. Some tests performed using the tactile sensor in a neck palpation device and the results of these tests are also presented.

66 citations


Cites background or methods from "Image registration using log-polar ..."

  • ...Digital image registration is a branch of computer vision that deals with the geometric alignment of a set of images [8]....

  • ...The LMA is commonly used in image registration [8], motion estimates [21], image mosaics [22], [23], and video indexing....

  • ...The mapping between the two images can then be expressed as follows [8]:...

Journal ArticleDOI
TL;DR: A mathematical model is derived that describes an exact form for embedding the masking step fully into the Fourier domain so that all steps of translation registration can be computed efficiently using Fast Fourier Transforms.
Abstract: Registration is one of the most common tasks of image analysis and computer vision applications. The requirements of most registration algorithms include large capture range and fast computation so that the algorithms are robust to different scenarios and can be computed in a reasonable amount of time. For these purposes, registration in the Fourier domain using normalized cross-correlation is well suited and has been extensively studied in the literature. Another common requirement is masking, which is necessary for applications where certain regions of the image that would adversely affect the registration result should be ignored. To address these requirements, we have derived a mathematical model that describes an exact form for embedding the masking step fully into the Fourier domain so that all steps of translation registration can be computed efficiently using Fast Fourier Transforms. We provide algorithms and implementation details that demonstrate the correctness of our derivations. We also demonstrate how this masked FFT registration approach can be applied to improve the Fourier-Mellin algorithm that calculates translation, rotation, and scale in the Fourier domain. We demonstrate the computational efficiency, advantages, and correctness of our algorithm on a number of images from real-world applications. Our framework enables fast, global, parameter-free registration of images with masked regions.

66 citations


Cites methods from "Image registration using log-polar ..."

  • ...We show in Section II-C how this representation leads to a simple and general form for masked FFT NCC. Finally, in Section II-D, we demonstrate how the masked framework can overcome several limitations of the Fourier–Mellin method....

Posted Content
TL;DR: Wang et al. as mentioned in this paper proposed a new correlation filter-based tracker with a novel robust estimation of similarity transformation on the large displacements and formulated the problem into two 2-DoF sub-problems and applied an efficient Block Coordinates Descent solver to optimize the estimation result.
Abstract: Most of existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them is capable of recovering the underlying similarity transformation. To tackle this challenging problem, in this paper, we propose a new correlation filter-based tracker with a novel robust estimation of similarity transformation on the large displacements. In order to efficiently search in such a large 4-DoF space in real-time, we formulate the problem into two 2-DoF sub-problems and apply an efficient Block Coordinates Descent solver to optimize the estimation result. Specifically, we employ an efficient phase correlation scheme to deal with both scale and rotation changes simultaneously in log-polar coordinates. Moreover, a variant of correlation filter is used to predict the translational motion individually. Our experimental results demonstrate that the proposed tracker achieves very promising prediction performance compared with the state-of-the-art visual object tracking methods while still retaining the advantages of high efficiency and simplicity in conventional correlation filter-based tracking methods.

62 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Journal ArticleDOI
TL;DR: A review of recent as well as classic image registration methods to provide a comprehensive reference source for the researchers involved in image registration, regardless of particular application areas.

6,842 citations


"Image registration using log-polar ..." refers background in this paper

  • ...See [2] for a recent survey of image registration techniques....

Journal ArticleDOI
TL;DR: This paper organizes this material by establishing the relationship between the variations in the images and the type of registration techniques which can most appropriately be applied, and establishing a framework for understanding the merits and relationships between the wide variety of existing techniques.
Abstract: Registration is a fundamental task in image processing used to match two or more pictures taken, for example, at different times, from different sensors, or from different viewpoints. Virtually all large systems which evaluate images require the registration of images, or a closely related operation, as an intermediate step. Specific examples of systems where image registration is a significant component include matching a target with a real-time image of a scene for target recognition, monitoring global land usage using satellite images, matching stereo images to recover shape for autonomous navigation, and aligning images from different medical modalities for diagnosis.Over the years, a broad range of techniques has been developed for various types of data and problems. These techniques have been independently studied for several different applications, resulting in a large body of research. This paper organizes this material by establishing the relationship between the variations in the images and the type of registration techniques which can most appropriately be applied. Three major types of variations are distinguished. The first type are the variations due to the differences in acquisition which cause the images to be misaligned. To register images, a spatial transformation is found which will remove these variations. The class of transformations which must be searched to find the optimal transformation is determined by knowledge about the variations of this type. The transformation class in turn influences the general technique that should be taken. The second type of variations are those which are also due to differences in acquisition, but cannot be modeled easily such as lighting and atmospheric conditions. This type usually effects intensity values, but they may also be spatial, such as perspective distortions. 
The third type of variations are differences in the images that are of interest such as object movements, growths, or other scene changes. Variations of the second and third type are not directly removed by registration, but they make registration more difficult since an exact match is no longer possible. In particular, it is critical that variations of the third type are not removed. Knowledge about the characteristics of each type of variation effect the choice of feature space, similarity measure, search space, and search strategy which will make up the final technique. All registration techniques can be viewed as different combinations of these choices. This framework is useful for understanding the merits and relationships between the wide variety of existing techniques and for assisting in the selection of the most suitable technique for a specific problem.

4,769 citations


"Image registration using log-polar ..." refers background in this paper

  • ...Note that our implementation of SIFT came directly from the source code of Lowe and Brown....

  • ...A survey by Brown [1] introduces a framework in which all registration techniques can be understood....

Journal ArticleDOI
TL;DR: A new information-theoretic approach is presented for finding the pose of an object in an image that works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation.
Abstract: A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and may foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images. The method is based on a formulation of the mutual information between the model and the image. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation.

3,584 citations


"Image registration using log-polar ..." refers background in this paper

  • ...Mutual information is a similarity measure that has recently been introduced for multimodal medical image registration [5], [6]....

Journal ArticleDOI
TL;DR: An automatic subpixel registration algorithm that minimizes the mean square intensity difference between a reference and a test data set, which can be either images (two-dimensional) or volumes (three-dimensional).
Abstract: We present an automatic subpixel registration algorithm that minimizes the mean square intensity difference between a reference and a test data set, which can be either images (two-dimensional) or volumes (three-dimensional). It uses an explicit spline representation of the images in conjunction with spline processing, and is based on a coarse-to-fine iterative strategy (pyramid approach). The minimization is performed according to a new variation (ML*) of the Marquardt-Levenberg algorithm for nonlinear least-square optimization. The geometric deformation model is a global three-dimensional (3-D) affine transformation that can be optionally restricted to rigid-body motion (rotation and translation), combined with isometric scaling. It also includes an optional adjustment of image contrast differences. We obtain excellent results for the registration of intramodality positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) data. We conclude that the multiresolution refinement strategy is more robust than a comparable single-stage method, being less likely to be trapped into a false local optimum. In addition, our improved version of the Marquardt-Levenberg algorithm is faster.

2,801 citations


"Image registration using log-polar ..." refers methods in this paper

  • ...The modified LMA version implemented in [16] uses...

  • ...Our second modification is based on the work of [16], whereby registration was performed on medical images subjected to similarity transforms (rotation/scale/translation)....

Frequently Asked Questions (13)
Q1. What are the contributions in "Image registration using log-polar mappings for recovery of large-scale similarity and projective transformations"?

This paper describes a novel technique to recover large similarity transformations (rotation/scale/translation) and moderate perspective deformations among image pairs. The authors introduce a hybrid algorithm that features log-polar mappings and nonlinear least squares optimization. The use of log-polar techniques in the spatial domain is introduced as a preprocessing module to recover large scale changes (e.g., at least four-fold) and arbitrary rotations. In this paper, the authors demonstrate the superior performance of the log-polar transform in featureless image registration in the spatial domain.

Additional future work will accelerate correlation. It is worthwhile to examine whether this process may be accelerated by positioning the sliding window on areas of high information content only. Entropy, variance, or other statistically discriminating techniques can be used to quantify information content. Recent success with scale-invariant interest points (e.g., SIFT) suggests that the log-polar windows should be centered at these extracted positions.

The log-polar transform has two principal advantages: 1) rotation and scale invariance and 2) spatially varying sampling, which, as in the retina, reduces the amount of information traversing the optic nerve while maintaining high resolution in the fovea and capturing a wide field of view.
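The first advantage follows from how log-polar coordinates respond to a similarity transform about the origin: scaling becomes a shift along the log-radius axis and rotation becomes a shift along the angle axis. The sketch below checks this for a single point; the function names and sample values are illustrative assumptions.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Return (log-radius, angle) of (x, y) relative to the center (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def similarity(x, y, scale, angle):
    """Rotate (x, y) by `angle` radians about the origin, then scale it."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

# Hypothetical point and a four-fold scale change with a 30 degree rotation.
x, y = 3.0, 4.0
scale, angle = 4.0, math.radians(30)

rho0, theta0 = to_log_polar(x, y)
rho1, theta1 = to_log_polar(*similarity(x, y, scale, angle))

# Both differences are independent of the chosen point.
print(rho1 - rho0, math.log(scale))
print(theta1 - theta0, angle)
```

Because every point shifts by the same amount, recovering scale and rotation reduces to finding a translation between two log-polar images. Angles wrap around at plus or minus pi, so a practical implementation has to treat the angle axis as periodic.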

The INRIA method computes interest points at different scales, calculating at each scale a set of local descriptors that are invariant to rotation, translation, and illumination. 

The authors have extensively tested several similarity measures, including normalized correlation coefficient, phase correlation, and mutual information. 
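Of the measures listed, the normalized correlation coefficient is the simplest to illustrate. The sketch below uses the usual zero-mean, unit-norm definition and shows that a linear intensity change leaves the score at one; the function name and the random test patches are assumptions made for the example.

```python
import numpy as np

def correlation_coefficient(f, g):
    """Zero-mean normalized correlation of two equally sized patches."""
    f = np.asarray(f, dtype=np.float64).ravel()
    g = np.asarray(g, dtype=np.float64).ravel()
    f = f - f.mean()   # removing the mean cancels additive offsets
    g = g - g.mean()
    # Dividing by the norms cancels multiplicative gain.
    return float(f @ g / np.sqrt((f @ f) * (g @ g)))

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
brighter = 1.7 * patch + 0.3       # same content, linear intensity change
unrelated = rng.random((16, 16))   # different content

print(correlation_coefficient(patch, brighter))
print(correlation_coefficient(patch, unrelated))
```

The score is undefined for a constant patch, since its zero-mean norm vanishes, so flat regions need separate handling.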

Although the Fourier–Mellin transform is able to correctly register the synthetic image shown in Fig. 3(c), the image in Fig. 3(b) defies recovery because of the lack of similarity in its spectra compared to that of the reference image. 

An important contribution of this work is that the authors introduce a new method based on the log-polar transform in the spatial domain that works robustly with real images. 

The new method is based on multiresolution log-polar transformations to simultaneously find the best scale, rotation, and translation parameters. 

In order to show the importance of the log-polar module, the authors ran the LMA without the estimated initial parameters from the log-polar module. 

The correlation coefficient values for the thirty pairs mentioned above are all above 0.9, which is very good considering camera noise and artifacts introduced by warping to produce the target images. 

The LMA solves the following system of equations in an iterative fashion: (5), where the Hessian matrix and the residual vector are given by (6) and (7). The authors can improve the standard Levenberg–Marquardt optimization algorithm outlined above by adding two modifications.

They were, however, reported recently in [40] and [41], where the authors showed that rotation and scale introduce aliasing in the low frequencies. 

Note that artificial black backgrounds can help register two images because they ensure that the authors consider the same underlying content.