Journal ArticleDOI

Image registration using log-polar mappings for recovery of large-scale similarity and projective transformations

01 Oct 2005-IEEE Transactions on Image Processing (IEEE Trans Image Process)-Vol. 14, Iss: 10, pp 1422-1434
TL;DR: A novel technique to recover large similarity transformations (rotation/scale/translation) and moderate perspective deformations among image pairs and achieves subpixel accuracy through the use of nonlinear least squares optimization.
Abstract: This paper describes a novel technique to recover large similarity transformations (rotation/scale/translation) and moderate perspective deformations among image pairs. We introduce a hybrid algorithm that features log-polar mappings and nonlinear least squares optimization. The use of log-polar techniques in the spatial domain is introduced as a preprocessing module to recover large scale changes (e.g., at least four-fold) and arbitrary rotations. Although log-polar techniques are used in the Fourier–Mellin transform to accommodate rotation and scale in the frequency domain, their use in registering images subjected to very large scale changes has not yet been exploited in the spatial domain. In this paper, we demonstrate the superior performance of the log-polar transform in featureless image registration in the spatial domain. We achieve subpixel accuracy through the use of nonlinear least squares optimization. The registration process yields the eight parameters of the perspective transformation that best aligns the two input images. Extensive testing was performed on uncalibrated real images and an array of 10,000 image pairs with known transformations derived from the Corel Stock Photo Library of royalty-free photographic images.

Summary (3 min read)

Introduction

  • Therefore, the images differ by large rotation and scale.
  • See [2] for a recent survey of image registration techniques.
  • Similarity measures like the zero-mean normalized sum of squared differences (SSD) and correlation coefficient are invariant to linear intensity changes.
  • In Section II, the authors discuss related work on the standard Levenberg–Marquardt algorithm (LMA) and log-polar techniques.

II. PREVIOUS WORK

  • The authors discuss related work on the LMA and the log-polar techniques.
  • In Section II-A, the authors present a background of the Levenberg–Marquardt nonlinear least-squares optimization algorithm that is useful for achieving subpixel registration accuracy.
  • The log-polar transform is described in Section II-B.
  • In Section II-C, the authors discuss the Fourier–Mellin transform, its limitations, and a review of related work.
  • Section II-D discusses a feature-based method that can register images subjected to large scale changes (i.e., ) and arbitrary rotation.

B. Log-Polar Transform

  • The log-polar transformation is a nonlinear and nonuniform sampling of the spatial domain.
  • The log-polar transform has received considerable attention.
  • If the authors assume that and lie along the horizontal and vertical axes, respectively, then image shown in Fig. 2(a) will be mapped to image in Fig. 2(b) after a log-polar coordinate transformation.
  • The log-polar mapping is an accepted model of the representation of the retina in the primary visual cortex in primates, also known as V1 [23]–[25].
  • This bandwidth reduction helps us process a high resolution image only at the focus of attention while remaining aware of a wider field of view.

D. Feature-Based Image Registration

  • Feature-based image registration algorithms extract salient structures, such as points, lines, curves, and regions, from graylevel images and establish correspondences between features using invariant descriptors.
  • In more recent feature-based work, registration for wide baseline applications has been reported in [48]–[52].
  • These results are promising in that they accommodate larger deformations.
  • These stages include corner detection, conversion to invariant descriptors, matching based on the Mahalanobis distance or k-d tree, and outlier removal using the RANSAC algorithm.
  • Whereas their methods are designed to operate under textured regions, they may fail in smooth regions.

III. MODIFIED LMA

  • The LMA solves the following system of equations in an iterative fashion: (5) where is the Hessian matrix and is the residual vector (6) (7).
  • The authors can improve the standard Levenberg–Marquardt optimization algorithm outlined above by adding two modifications.
  • The first modification includes the use of a multiresolution pyramid for both reference and target images.
  • The second modification virtually eliminates the calculation of the Hessian matrix (7) which would otherwise have been computed in every iteration.
  • The authors' second modification is based on the work of [16], whereby registration was performed on medical images subjected to similarity transforms (rotation/scale/translation).

A. Multiresolution Pyramid

  • The original image, sitting at the base of the pyramid, is downsampled by a constant scale factor in each dimension to form the next level.
  • This is repeated from one level to the next until the tip of the pyramid is reached.
  • Second, the smoothness conditions imposed by successively bandlimiting the pyramid levels cause the objective function to be computed on smoother images.
  • An example of the objective function computed on two different pyramid levels is shown in Fig. 4.
  • Thus, the relation between parameters is (11).

B. Modified Levenberg–Marquardt Algorithm

  • In the standard LMA, the authors calculate the vector and Hessian matrix in each iteration.
  • This is achieved by casting this problem into one where is transformed into , leaving unchanged from one iteration to the next.
  • An important distinction between the standard and modified LMA methods lies in the manner in which the unknown parameters are updated in each iteration.
  • Instead of minimizing (15a), the authors minimize (15c) with respect to the parameters .
  • For further details about resampling, see [57].

IV. GLOBAL REGISTRATION USING LOG-POLAR TRANSFORM

  • The authors have implemented a new algorithm for automatically finding the translation between both input images in the presence of large scale and rotation.
  • The radius and the center of the template are optionally given by the user.
  • The authors compute the base of the logarithm for log-polar transformation as follows: (20) where is the width of the input image ( diameter).
  • The normalized correlation coefficient similarity measure is given as follows: (21) where is the average of image .
  • Furthermore, the bulk of their computation is performed at the coarsest level where there are fewest pixels.

V. EXPERIMENTAL RESULTS

  • An analytical evaluation of the robustness of image registration algorithms is an elusive task.
  • Performance is highly dependent on the content of the input images.
  • Many proposed image registration algorithms in the literature have limited their published results to the use of a few reference images and their synthetically generated target images.
  • The reference and target images are taken by a camera with optical zoom.
  • In Section V-B, the authors test the robustness of their algorithm with a large suite of 10 000 image pairs.

A. Uncalibrated Test Images

  • A Canon PowerShot G3 digital camera with 4× optical zoom was used to capture a set of test images taken from natural and man-made scenes.
  • The authors' method uses all pixels and does not depend on any specific feature set.
  • Images were acquired with (a) no magnification and (b) 4× magnification with unknown rotation about the optical axis.
  • In order to quantify registration accuracy, the authors compute the correlation coefficient in the overlapping area.
  • All of the non-SIFT methods failed to register the image pairs in Fig.

B. Calibrated Test Images

  • It is not feasible to capture a very large set of images with a variety of image content and transformation parameters.
  • The authors generated 10 000 target images from these random parameters.
  • First, the authors used their log-polar module to recover the global rotation, scale, and translation parameters.
  • The authors have tested their registration algorithm to create image mosaics by stitching together low resolution frames from several overlapping images.
  • In order to best expose any misalignment, the authors applied unweighted averaging upon the overlapping areas.


1422 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 10, OCTOBER 2005
Image Registration Using Log-Polar Mappings
for Recovery of Large-Scale Similarity and
Projective Transformations
Siavash Zokai and George Wolberg, Senior Member, IEEE
Abstract—This paper describes a novel technique to recover
large similarity transformations (rotation/scale/translation) and
moderate perspective deformations among image pairs. We in-
troduce a hybrid algorithm that features log-polar mappings
and nonlinear least squares optimization. The use of log-polar
techniques in the spatial domain is introduced as a preprocessing
module to recover large scale changes (e.g., at least four-fold) and
arbitrary rotations. Although log-polar techniques are used in the
Fourier–Mellin transform to accommodate rotation and scale in
the frequency domain, their use in registering images subjected to
very large scale changes has not yet been exploited in the spatial
domain. In this paper, we demonstrate the superior performance
of the log-polar transform in featureless image registration in the
spatial domain. We achieve subpixel accuracy through the use
of nonlinear least squares optimization. The registration process
yields the eight parameters of the perspective transformation that
best aligns the two input images. Extensive testing was performed
on uncalibrated real images and an array of 10,000 image pairs
with known transformations derived from the Corel Stock Photo
Library of royalty-free photographic images.
Index Terms—Image registration, Levenberg–Marquardt non-
linear least-squares optimization, log-polar transform, perspective
transformation, similarity transformation.
I. INTRODUCTION
DIGITAL image registration is a branch of computer vision
that deals with the geometric alignment of a set of im-
ages. The set may consist of two or more digital images taken
of a single scene at different times, from different sensors, or
from different viewpoints. A large body of research has been
drawn to this area due to its importance in remote sensing, med-
ical imaging, computer graphics, and computer vision. Despite
comprehensive research spanning over thirty years, robust tech-
niques to register images in the presence of large deformations
remain elusive. Most techniques fail unless the input images
are misaligned by moderate deformations.
The goal of registration is to establish geometric correspon-
dence between the images so that they may be transformed,
compared, and analyzed in a common reference frame. Regis-
tration is often necessary for 1) integrating information taken
Manuscript received April 27, 2004; revised October 11, 2004. This work was
supported in part by an ONR HBCU/MI Research and Education Program Grant
(N000140310511) and a PSC-CUNY Grant. The associate editor coordinating
the review of this manuscript and approving it for publication was Dr. Luca
Lucchese.
S. Zokai is with Brainstorm Technology LLC, New York, NY 10011 USA.
G. Wolberg is with the Department of Computer Science, City College of
New York, New York, NY 10031 USA (e-mail: wolberg@cs.ccny.cuny.edu).
Digital Object Identifier 10.1109/TIP.2005.854501
from different sensors (i.e., multisensor data fusion), 2) finding
changes in images taken at different times or under different
conditions, 3) inferring three-dimensional (3-D) information
from images in which either the camera or the objects in the
scene have moved, and 4) for model-based object recognition.
The most common task associated with image registration
is the generation of large panoramic images for viewing and
analysis. Image mosaics, created by warping and blending
together several overlapping images, are central to this process.
Other common registration tasks include producing super-reso-
lution images from multiple images of the same scene, change
detection, motion stabilization, topographic mapping, and
multisensor image fusion.
This work attempts to register two images using one global
perspective transformation even in the presence of arbitrary rotation
angles and large scale changes (up to 5× zoom). Our
work is motivated by the problem of registering airborne im-
ages. These images are taken at vastly different times, altitudes,
and directions. Therefore, the images differ by large rotation and
scale. Also, the pitch and roll introduce moderate perspective.
In general, images of a 3-D scene do not differ by just one per-
spective transformation because the depth between the camera
and the objects introduces parallax. A global transformation
cannot align all features in such cases. We must, therefore, place
constraints on camera motion and/or our 3-D scene to produce
images that are free of parallax. One constraint requires the
camera motion to be limited to rotation, pan, tilt, and zoom about
a fixed point, e.g., on a tripod. If this constraint is not satisfied,
then we may still have images free of parallax if the object’s 3-D
points
in the scene are far away from the camera, i.e.,
. This means that the scene is flat and we are looking
at a planar object. In either case, we assume that the scene is
static and the lighting is fixed between images. Nevertheless, we
have relaxed these conditions to accommodate local disparity
and linear changes in illumination.
A survey by Brown [1] introduces a framework in which all
registration techniques can be understood. The framework con-
sists of the feature space, similarity measure, search space, and
search strategy. The feature space extracts the information in
the images that will be used for matching. The search space is
the class of transformations, or deformation models, that is ca-
pable of aligning the images. The search strategy decides how
to choose the next transformation from this space, to be tested in
the search for the optimal transformation. The similarity mea-
sure determines the relative merit for each test. Search continues
according to the search strategy until a transformation is found
1057-7149/$20.00 © 2005 IEEE

ZOKAI AND WOLBERG: IMAGE REGISTRATION USING LOG-POLAR MAPPINGS 1423
Fig. 1. Airborne imagery. (a) Observed images. (b) Reference image. (c) Registration overlays.
whose similarity measure is satisfactory. Numerous registration
techniques have been proposed based on choosing a specific
feature, deformation model, optimization method, and/or similarity
measure. See [2] for a recent survey of image registration tech-
niques.
For image registration, we need to recover the geometric
transformation and/or intensity function. Let $I_1(x, y)$
and $I_2(x, y)$ be the reference and observed images, respectively.
The relationship between these images is
$$I_2(x', y') = g(I_1(x, y)), \qquad (x', y') = f(x, y),$$
where $f$ is a two-dimensional (2-D) geometric transformation
operator that relates the $(x, y)$ coordinates in $I_1$ to the
$(x', y')$ coordinates in $I_2$ and $g$ is the intensity function.
The estimation of the intensity function $g$ is useful when we
want to register images taken from different sensors or when
illumination is changed by automatic gain exposure of a camera.
Comparametric equations have been introduced to model the
intensity function $g$ [3]. Although these equations are nonlinear,
a piecewise linear method has been developed to estimate $f$
and $g$ simultaneously [4].
Mutual information is a similarity measure that has recently
been introduced for multimodal medical image registration [5],
[6]. Correlation ratio is another similarity measure for multi-
modal image registration and has proven to perform better than
mutual information [7]. Multimodal image registration has been
studied extensively in the medical imaging domain. In this work,
we assume that the intensity function is linear. Similarity mea-
sures like the zero-mean normalized sum of squared differences
(SSD) and correlation coefficient are invariant to linear
intensity changes.
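The invariance claimed above can be checked numerically. The following is a minimal sketch, assuming NumPy and a synthetic image; the function names, the gain of 2.5, and the offset of 10 are illustrative choices, not values from the paper.

```python
import numpy as np

def correlation_coefficient(a, b):
    """Normalized correlation coefficient between two equally sized images."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def zero_mean_normalized_ssd(a, b):
    """SSD computed after removing each image's mean and scaling to unit norm."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    a /= np.linalg.norm(a)
    b /= np.linalg.norm(b)
    return float(((a - b) ** 2).sum())

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
# Hypothetical linear intensity change (gain and offset are assumptions).
observed = 2.5 * reference + 10.0

cc = correlation_coefficient(reference, observed)
znssd = zero_mean_normalized_ssd(reference, observed)
```

With a positive gain, the two measures do not distinguish `observed` from `reference`, which is why a linear intensity model can be left out of the parameter search.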
This paper describes a hierarchical image registration system.
We model the mapping function as a perspective transforma-
tion. The algorithm estimates the perspective parameters neces-
sary to register any two misaligned digital images. The parame-
ters are selected to minimize the SSD between the two images.
They are computed iteratively in a coarse-to-fine hierarchical
framework using a variation of the Levenberg–Marquardt nonlinear
least squares optimization method. This approach yields
a robust solution that precisely registers images with subpixel
accuracy.
The primary drawback of the optimization-based approach
is that it may fail unless the two images are misaligned by a
moderate difference in scale, rotation, and translation. In order
to address this problem, we introduce a log-polar registration
module to bring the images into approximate alignment, even in
the presence of arbitrary rotation angles and large scale changes.
Its purpose is to furnish a good initial estimate to the perspec-
tive registration module that is based on nonlinear least squares
optimization.
The scope of this work shall prove useful for various ap-
plications, including the registration of aerial images, and the
formation of image mosaics. Note that aerial imagery may be
acquired from uncalibrated airborne cameras subjected to yaw,
pitch, and roll at various altitudes. Since the terrain appears flat
from moderately high altitude, it is an ideal candidate for reg-
istration using a single perspective transformation. An example
demonstrating the registration of two aerial images in the pres-
ence of large scale/rotation and moderate perspective is shown
in Fig. 1. The image in Fig. 1(a) is automatically registered to
that in Fig. 1(b), as depicted by the highlighted rectangle.
In Section II, we discuss related work on the standard Levenberg–Marquardt
algorithm (LMA) and log-polar techniques.
Section III describes a modified LMA for improving the performance
of the standard LMA and Section IV presents our
proposed log-polar method. In Section V, we demonstrate the
success of the log-polar transform in recovering large deformations
by comparing registration accuracy with and without the
log-polar registration module. A significant increase in correct
matches is attributed to our algorithm. A secondary comparison
was made by replacing the log-polar module with the well-known
Fourier–Mellin transform. Again, our log-polar module
proved superior to the Fourier–Mellin transform for achieving
high perspective registration accuracy.
II. PREVIOUS WORK
In this section, we discuss related work on the LMA and the
log-polar techniques. In Section II-A, we present a background
of the Levenberg–Marquardt nonlinear least-squares optimization
algorithm that is useful for achieving subpixel registration
accuracy. The log-polar transform is described in Section II-B.

In Section II-C, we discuss the Fourier–Mellin transform, its
limitations, and a review of related work. Section II-D discusses
a feature-based method that can register images subjected to
large scale changes (i.e.,
) and arbitrary rotation.
A. LMA
There is a vast literature of work in the related fields of image
registration, motion estimation, image mosaics, and video indexing
that makes use of a nonlinear least-squares optimization
technique known as the LMA. Most algorithms exploit a hierarchical
approach due to its computational efficiency in handling
large displacements. Algorithms for hierarchical motion estimation
[8]–[10] and image mosaicing [11]–[20] usually assume
small deformations among image pairs. For instance, a dense
image sequence is required to stitch the frames together [14],
[18]. The problem of assembling a large set of images into a
common reference frame is simplified when the inter-frame deformations
are small. The LMA uses the SSD as the similarity
measure between two images (or regions)
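$$\chi^2(\mathbf{a}) = \iint \left[ I_1(x, y) - Q_{\mathbf{a}}\{I_2(x', y')\} \right]^2 \, dx \, dy \qquad (1)$$
and the discrete form is
$$\chi^2(\mathbf{a}) = \sum_{i=1}^{N} \left[ I_1(x_i, y_i) - Q_{\mathbf{a}}\{I_2(x'_i, y'_i)\} \right]^2.$$
Note that $Q_{\mathbf{a}}$ is a geometric transformation applied to the observed image
$I_2$ to map it from its $(x', y')$ coordinate system to the $(x, y)$ coordinate
system of the reference image $I_1$. In our case, the subscript $\mathbf{a}$ is a 3 × 3 perspective
transformation matrix and $N$ is the number of pixels.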
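As an illustration of the discrete SSD, the sketch below evaluates the sum of squared differences between a reference image and a target image resampled through a 3 × 3 perspective matrix. It assumes NumPy, nearest-neighbor resampling, and a column-vector convention for the matrix `H`; these choices and all function names are assumptions made for the example, not the authors' implementation.

```python
import numpy as np

def warp_perspective_nn(image, H):
    """Resample so that out(x, y) = image(x', y'), where (x', y', w') = H (x, y, 1).

    Nearest-neighbor sampling; also returns a mask of pixels that map inside the image.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    xp, yp, wp = H @ pts
    xi = np.rint(xp / wp).astype(int)
    yi = np.rint(yp / wp).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros(h * w)
    out[valid] = image[yi[valid], xi[valid]]
    return out.reshape(h, w), valid.reshape(h, w)

def ssd(reference, target, H):
    """Sum of squared differences over the pixels where the warped target is defined."""
    warped, valid = warp_perspective_nn(target, H)
    diff = (reference - warped)[valid]
    return float((diff ** 2).sum())

rng = np.random.default_rng(1)
reference = rng.random((32, 32))
# Synthetic target: the reference shifted 3 pixels to the right.
target = np.roll(reference, 3, axis=1)
# The matrix that undoes this shift is a pure translation by 3 in x.
H_true = np.array([[1.0, 0.0, 3.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

ssd_aligned = ssd(reference, target, H_true)
ssd_identity = ssd(reference, target, np.eye(3))
```

The SSD vanishes at the correct parameters and is positive for the identity, which is the behavior a minimizer such as the LMA exploits.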
B. Log-Polar Transform
The log-polar transformation is a nonlinear and nonuniform
sampling of the spatial domain. Nonlinearity is introduced by
polar mapping, while nonuniform sampling is the result of logarithmic
scaling. Despite the difficulties of nonlinear processing
for computer vision applications, the log-polar transform has received
considerable attention. Consider the log-polar $(\log r, \theta)$
coordinate system, where $r$ denotes radial distance from the
center $(x_c, y_c)$ and $\theta$ denotes angle. Any point $(x, y)$ can be represented
in polar coordinates
$$r = \sqrt{(x - x_c)^2 + (y - y_c)^2} \qquad (2)$$
$$\theta = \tan^{-1}\left(\frac{y - y_c}{x - x_c}\right). \qquad (3)$$
Applying a polar coordinate transformation to an image
maps radial lines in Cartesian space to horizontal lines in the
polar coordinate space. We shall denote the transformed image
. If we assume that and lie along the horizontal and ver-
tical axes, respectively, then image
shown in Fig. 2(a) will be
mapped to image
in Fig. 2(b) after a log-polar coordinate
transformation.
Fig. 2. Log-polar coordinate transformation. (a) Input image. (b) Log-polar
transformation.
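A log-polar resampling of the kind shown in Fig. 2 can be sketched in a few lines. The example below assumes NumPy and nearest-neighbor sampling, and places log r along rows and the angle along columns. The grid sizes, the `r_min` parameter, and the function name are hypothetical choices for illustration only.

```python
import numpy as np

def log_polar(image, center, n_r=64, n_theta=128, r_min=1.0, r_max=None):
    """Sample `image` on a log-polar grid centered at `center` = (cx, cy).

    Rows are logarithmically spaced radii, columns are uniformly spaced angles.
    """
    h, w = image.shape
    cx, cy = center
    if r_max is None:
        r_max = min(cx, cy, w - 1 - cx, h - 1 - cy)
    log_r = np.linspace(np.log(r_min), np.log(r_max), n_r)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(np.exp(log_r), theta, indexing="ij")
    x = np.rint(cx + rr * np.cos(tt)).astype(int)
    y = np.rint(cy + rr * np.sin(tt)).astype(int)
    return image[np.clip(y, 0, h - 1), np.clip(x, 0, w - 1)]

rng = np.random.default_rng(2)
image = rng.random((65, 65))
center = (32, 32)

lp = log_polar(image, center)
# A 90-degree rotation about the center becomes a cyclic shift of
# n_theta / 4 columns along the angle axis of the log-polar image.
lp_rotated = log_polar(np.rot90(image), center)
lp_shifted = np.roll(lp, -128 // 4, axis=1)
mismatch = float(np.mean(lp_rotated != lp_shifted))
```

In the same way, a scale change about the center moves content along the log r axis, so rotation and scale both reduce to translations in the log-polar domain.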
The motivation for considering the log-polar transform stems
from its biological origins. The first discoveries of log-polar
mappings in the primate visual system were reported in
[21] and [22]. The log-polar mapping is an accepted model of
the representation of the retina in the primary visual cortex in
primates, also known as V1 [23]–[25]. The nonuniform sampling
that simulates logarithmic scale takes place in the retina
and the nerve endings from the retina are connected to the visual
cortex by a special mapping. This mapping realizes the polar
transformation by a simple rewiring. The radial nerve endings
are connected horizontally to the visual cortex. Due to these biological
origins, the log-polar transform has often been referred
to as the retino-cortical transform [26]. The log-polar transform
has two principal advantages: 1) rotation and scale invariance
and 2) spatially varying sampling, which, as in the retina, reduces
the amount of information traversing the optic
nerve while maintaining high resolution in the fovea and capturing
a wide field of view. This bandwidth reduction helps us
process a high resolution image only at the focus of attention
while remaining aware of a wider field of view. Several researchers have
designed log-polar sensors for active and real-time vision applications
[27]–[31]. These efforts sought to make the leap from
biological hardware to VLSI hardware.
C. Fourier–Mellin Transform
The Fourier–Mellin registration method is based on phase
correlation and the properties of Fourier analysis. The phase
correlation method can find the translation between two images.
The Fourier–Mellin transform extends phase correlation
to handle images related by both translation and rotation
[32]–[39]. According to the rotation and translation properties
of the Fourier transform, the transforms are related by
$$F_2(\xi, \eta) = e^{-j 2\pi(\xi x_0 + \eta y_0)} \, F_1(\xi \cos\theta_0 + \eta \sin\theta_0, \; -\xi \sin\theta_0 + \eta \cos\theta_0),$$
where $F_1$ and $F_2$ are the Fourier transforms of the two images, $\theta_0$ is the rotation angle, and $(x_0, y_0)$ is the translation.
We can see that the magnitude of spectra $|F_2|$ is a rotated
replica of $|F_1|$. Both spectra share the same center of rotation.
Fig. 3. Effects of optical and digital zoom on the power spectrum.
(a) Reference image. (b) Target image (real). (c) Target image (synthetic).
We can recover this rotation by representing the spectra $|F_1|$ and
$|F_2|$ in polar coordinates
$$|F_2(r, \theta)| = |F_1(r, \theta - \theta_0)|. \qquad (4)$$
The Fourier magnitude in polar coordinates differs only
by translation. We can use the phase-correlation method to
find this translation and estimate $\theta_0$. This method has been
extended to find scale by mapping the Fourier magnitude to
log-polar coordinates. Therefore, one finds scale and rotation
by phase-correlation, which recovers the amount of shifts in
$(\log r, \theta)$ space. One advantage of this method is that it tolerates
additive noise. The method, however, can only recover
moderate scales and rotations. This difficulty can be understood
by realizing that large rotation and scale changes exacerbate the
border effects when computing the Fourier transform. These
problems are minimized in the rare case when the images are
periodic. Otherwise, a large translation or scale change introduces
additional pixel information that can dramatically alter the
Fourier coefficients.
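Phase correlation, the building block of the Fourier–Mellin method described above, can be sketched as follows. This is a minimal NumPy example that recovers an integer translation; the function name and the small constant guarding the division are assumptions. The test shift is cyclic, which is exactly the periodic case in which border effects vanish, so the sketch says nothing about robustness on real image pairs.

```python
import numpy as np

def phase_correlation(ref, tgt):
    """Return (dy, dx) such that tgt is approximately ref shifted by (dy, dx)."""
    F_ref = np.fft.fft2(ref)
    F_tgt = np.fft.fft2(tgt)
    cross = F_tgt * np.conj(F_ref)
    # Keep only the phase of the cross-power spectrum.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Map peak positions in the upper half of the range to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(3)
ref = rng.random((64, 64))
tgt = np.roll(ref, (5, -7), axis=(0, 1))  # cyclic shift: 5 rows down, 7 columns left
shift = phase_correlation(ref, tgt)
```

Applying the same routine to log-polar resamplings of the two Fourier magnitudes would yield shifts along the log r and angle axes, that is, scale and rotation.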
In early papers on Fourier–Mellin, the border problems were
not investigated. They were, however, reported recently in [40]
and [41], where the authors showed that rotation and scale introduce
aliasing in the low frequencies. They have suggested
that two preprocessing steps are needed to alleviate the aliasing
problem. First, the image must be multiplied by a radial mask
consisting of a 2-D Gaussian function. Second, a low-pass filter
must be applied to remove the offending low frequencies. The
researchers in [35] reported that they recovered scale factors up
to 1.8 and 80° rotations.
It is important to note that the literature is replete with synthetic
examples for the Fourier–Mellin registration method. In
particular, a reference image is always matched against a scaled
and rotated version of itself. This serves to defer the problem of
handling the fine details introduced by an actual optical zoom.
Conversely, when the image undergoes minification, translation,
or rotation, additional real data seeps into the target image, not
just black pixels. Note that artificial black backgrounds can help
register two images because they ensure that we consider the same
underlying content.
An example demonstrating the differences between digital
and optical zoom is shown in Fig. 3. As is expected, the shape
of the spectrum in Fig. 3(c) conforms to the inverse relationship
between space and frequency. However, the spectrum of Fig. 3(b)
reflects the fact that the images were taken with optical zoom
and minor perspective distortion was introduced due to real hand
movement. Although the Fourier–Mellin transform is able to
correctly register the synthetic image shown in Fig. 3(c), the
image in Fig. 3(b) defies recovery because of the lack of similarity
in its spectrum compared to that of the reference image.
An important contribution of this work is that we introduce
a new method based on the log-polar transform in the spatial
domain that works robustly with real images.
D. Feature-Based Image Registration
Feature-based image registration algorithms extract salient
structures, such as points, lines, curves, and regions, from
graylevel images and establish correspondences between
features using invariant descriptors. Early work in this area
includes [42]–[47]. This work, however, is generally limited to
small geometric deformations.
In more recent feature-based work, registration for wide baseline
applications has been reported in [48]–[52]. These results
are promising in that they accommodate larger deformations.
Finding local and invariant features is an important tool for
detecting correspondences between different views of a scene.
In [50], the authors detect quadrilateral and elliptical locally
affine regions for finding the fundamental matrix in wide-baseline
stereo images. In [51] and [53], the authors look for locally
affine regions. They compute several degrees of moments in
these regions to build feature vectors for wide-baseline stereoscopy
[53] and image retrieval [51]. Their work tolerates only
small scale changes.
Recently, several researchers at INRIA and University of
British Columbia developed methods for recovering large-scale
deformation based on scale-space theory [49], [54]–[56]. The
INRIA method computes interest points at different scales, calculating
at each scale a set of local descriptors that are invariant
to rotation, translation, and illumination. The Mahalanobis
distance is then used to find the corresponding interest points
between two images. In order to remove outliers, they use the
RANSAC algorithm with constraints based on collections of
points. In the work of Lowe and his colleagues, a scale-invariant
feature transform (SIFT) is introduced to find features and a
k-d tree is used to match features across multiple images [48],
[49]. To our knowledge, the techniques described in [49] and
[54] are the only works that are applied to outdoor images
with large scale factors (i.e.,
) derived from optical zoom
cameras (not digital zoom). Our registration algorithm is able
to properly register all of their test data. Their methods consist
of a series of complex stages that are not amenable to direct hardware
implementation. These stages include corner detection,
conversion to invariant descriptors, matching based on the
Mahalanobis distance or k-d tree, and outlier removal using the
RANSAC algorithm. Whereas their methods are designed to
operate under textured regions, they may fail in smooth regions.
III. MODIFIED LMA
The LMA solves the following system of equations in an it-
erative fashion:
(5)

where is the Hessian matrix and is the residual
vector
(6)
(7)
We can improve the standard Levenberg–Marquardt optimization
algorithm outlined above by adding two modifications.
The first modification includes the use of a multiresolution
pyramid for both reference and target images. The second
modification virtually eliminates the calculation of the Hessian
matrix (7) which would otherwise have been computed in every
iteration. Our second modification is based on the work of [16],
whereby registration was performed on medical images subjected
to similarity transforms (rotation/scale/translation). We
have extended their method to recover perspective parameters.
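For orientation, a standard Levenberg–Marquardt iteration can be sketched on a small curve-fitting problem. This is the generic textbook form, with a Gauss–Newton approximation of the Hessian that is recomputed in every iteration and Marquardt's diagonal damping. It is not the modified LMA of this paper, and the damping schedule, function names, and test model are assumptions chosen for the example.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, a0, n_iter=50, lam=1e-3):
    """Minimize sum(residual(a)**2) with a basic Levenberg-Marquardt loop."""
    a = np.asarray(a0, dtype=float)
    r = residual(a)
    cost = float(r @ r)
    for _ in range(n_iter):
        J = jacobian(a)
        H = J.T @ J            # Gauss-Newton approximation of the Hessian
        g = J.T @ r            # half the gradient of the cost
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
        r_new = residual(a + step)
        cost_new = float(r_new @ r_new)
        if cost_new < cost:    # accept the step and trust the model more
            a, r, cost = a + step, r_new, cost_new
            lam *= 0.1
        else:                  # reject the step and increase damping
            lam *= 10.0
    return a

# Hypothetical test problem: fit y = a0 * exp(a1 * x) to noise-free samples.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)

def residual(a):
    return a[0] * np.exp(a[1] * x) - y

def jacobian(a):
    e = np.exp(a[1] * x)
    return np.stack([e, a[0] * x * e], axis=1)

fit = levenberg_marquardt(residual, jacobian, a0=[1.0, 0.0])
```

In image registration the residuals are per-pixel intensity differences, so forming `J` and `H` in every iteration touches every pixel. That cost is what the two modifications described above aim to reduce.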
A. Multiresolution Pyramid
A multiresolution pyramid consists of a set of images repre-
senting an image in multiple resolutions. The original image,
sitting at the base of the pyramid, is downsampled by a constant
scale factor in each dimension to form the next level. This is re-
peated from one level to the next until the tip of the pyramid is
reached. The image size at level
is reduced from the original
by a factor of
in each dimension. Level 0, at the base of the
pyramid, is referred to as the finest level. Level
, at the tip
of the pyramid, is known as the coarsest level.
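The pyramid construction can be sketched as repeated filtering and downsampling. The example below assumes NumPy, a scale factor of 2 between levels, and 2 × 2 block averaging as a crude low-pass filter; the paper does not state which filter the authors use, so the filter and the function name are assumptions.

```python
import numpy as np

def build_pyramid(image, n_levels):
    """Return [level 0 (finest), ..., level n_levels - 1 (coarsest)]."""
    levels = [np.asarray(image, dtype=float)]
    for _ in range(1, n_levels):
        prev = levels[-1]
        # Crop to even dimensions so the image splits into 2x2 blocks.
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        prev = prev[:h, :w]
        # Averaging each 2x2 block band-limits and downsamples in one step.
        coarse = prev.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(coarse)
    return levels

pyramid = build_pyramid(np.ones((64, 48)), n_levels=4)
shapes = [level.shape for level in pyramid]
```

With a factor of 2 per dimension, each level holds a quarter of the pixels of the level below it, so the coarsest of four levels has 1/64 of the original pixel count.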
Multiresolution pyramids supply us with two major advantages.
First, when we apply the Levenberg–Marquardt method
to the coarsest level of the pyramid, the number of pixels is re-
duced by a factor of
. We get large computational gains
because most of the iterations are executed in the coarsest level,
consisting of fewer pixels. Second, the smoothness conditions
imposed by successively bandlimiting the pyramid levels cause
the objective function $\chi^2(\mathbf{a})$ to be computed on smoother images. This smoothness
property helps prevent getting trapped in local minima. An example
of $\chi^2(\mathbf{a})$ computed on two different pyramid levels is
shown in Fig. 4. Since the coarsest level retains large-scale features
only, the registration algorithm proceeds from the coarsest
level to progressively finer levels, where small corrections due to
finer details are integrated. This approach passes the computed
parameters as an initial estimate to the next finer level. The parameters
must be scaled properly across successive levels. Let
the scale factor between the levels be
: ,
, , and , where
(8a)
(8b)
Fig. 4. Example of $\chi^2(\mathbf{a})$ computed on two different pyramid levels.
Substituting the coordinates of the next finer level into the above
equations yields
(9)
Multiplying both sides by
gives us
(10)
Thus, the relation between parameters is
(11)
In our case,
, so the translation parameters and are
multiplied by two and
and divided by two.
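The scaling rule stated above can be checked numerically. The sketch below assumes the common eight-parameter perspective form $x' = (ax + by + c)/(gx + hy + 1)$, $y' = (dx + ey + f)/(gx + hy + 1)$. The parameter names `a`..`h` and the helper names `warp` and `to_finer_level` are hypothetical and need not match the paper's own indexing in (8a) to (11).

```python
import numpy as np

def warp(p, x, y):
    """Apply an eight-parameter perspective mapping (assumed form)."""
    a, b, c, d, e, f, g, h = p
    w = g * x + h * y + 1.0
    return (a * x + b * y + c) / w, (d * x + e * y + f) / w

def to_finer_level(p, s=2.0):
    """Rescale parameters estimated at one level for the next finer level.

    Translations (c, f) are multiplied by s, perspective terms (g, h)
    are divided by s, and the remaining four terms are unchanged.
    """
    a, b, c, d, e, f, g, h = p
    return np.array([a, b, s * c, d, e, s * f, g / s, h / s])

# Arbitrary illustrative parameters at a coarse level.
p_coarse = np.array([0.9, -0.2, 3.0, 0.25, 1.1, -1.5, 1e-3, -2e-3])
p_fine = to_finer_level(p_coarse)

# A point (x, y) at the coarse level corresponds to (2x, 2y) one level finer,
# and its mapped position should double as well.
x, y = 5.0, 7.0
xc, yc = warp(p_coarse, x, y)
xf, yf = warp(p_fine, 2 * x, 2 * y)
print(np.isclose(xf, 2 * xc), np.isclose(yf, 2 * yc))
```

In a coarse-to-fine loop, `to_finer_level` would be applied to the converged parameters of each level before they seed the optimizer at the next one.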
B. Modified Levenberg–Marquardt Algorithm
In the standard LMA, we calculate the residual vector and the Hessian matrix in each iteration. In this section, we review a modified LMA that realizes performance gains by eliminating the calculation of the Hessian matrix at each iteration. Consider the following objective function to establish a similarity measure between the two input images:
(12)
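For orientation, the standard iteration referred to above can be sketched on a toy curve-fitting problem. This is the textbook Levenberg–Marquardt scheme, not the authors' modified variant: the exponential model, the damping schedule (divide or multiply by ten), and all function names are assumptions made for illustration. Note that the Gauss-Newton approximation of the Hessian is rebuilt on every pass, which is the cost the modified algorithm aims to avoid.

```python
import numpy as np

def model(params, t):
    amp, rate = params
    return amp * np.exp(-rate * t)

def jacobian(params, t):
    # Partial derivatives of the model with respect to (amp, rate).
    amp, rate = params
    e = np.exp(-rate * t)
    return np.column_stack([e, -amp * t * e])

def levenberg_marquardt(t, y, params, lam=1e-3, iters=50):
    params = np.asarray(params, dtype=np.float64)
    err = np.sum((y - model(params, t)) ** 2)
    for _ in range(iters):
        r = y - model(params, t)      # residuals at the current estimate
        J = jacobian(params, t)
        H = J.T @ J                   # Gauss-Newton approximation of the Hessian
        b = J.T @ r                   # right-hand-side vector
        # Damped normal equations: (H + lam * diag(H)) step = b
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), b)
        trial = params + step
        trial_err = np.sum((y - model(trial, t)) ** 2)
        if trial_err < err:
            # Accept the step and trust the quadratic model more.
            params, err, lam = trial, trial_err, lam / 10
        else:
            # Reject the step and move toward gradient descent.
            lam *= 10
    return params

t = np.linspace(0.0, 4.0, 40)
y = model((2.0, 0.7), t)              # noise-free synthetic data
fit = levenberg_marquardt(t, y, params=(1.0, 0.3))
print(fit)
```

In image registration the residuals would be per-pixel intensity differences and the parameters the eight perspective coefficients, but the structure of the update is the same.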

Citations
Proceedings Article
13 Jun 2010
TL;DR: A scale-invariant version of the heat kernel descriptor that can be used in the bag-of-features framework for shape retrieval in the presence of transformations such as isometric deformations, missing data, topological noise, and global and local scaling.
Abstract: One of the biggest challenges in non-rigid shape retrieval and comparison is the design of a shape descriptor that would maintain invariance under a wide class of transformations the shape can undergo. Recently, heat kernel signature was introduced as an intrinsic local shape descriptor based on diffusion scale-space analysis. In this paper, we develop a scale-invariant version of the heat kernel descriptor. Our construction is based on a logarithmically sampled scale-space in which shape scaling corresponds, up to a multiplicative constant, to a translation. This translation is undone using the magnitude of the Fourier transform. The proposed scale-invariant local descriptors can be used in the bag-of-features framework for shape retrieval in the presence of transformations such as isometric deformations, missing data, topological noise, and global and local scaling. We get significant performance improvement over state-of-the-art algorithms on recently established non-rigid shape retrieval benchmarks.

613 citations


Cites methods from "Image registration using log-polar ..."

  • ...The second way is to use a combination of logarithmic sampling with Fourier analysis to compensate for the scaling effects [17] (such an approach is also commonly used to compute a global image rotation and scaling in the context of registration [9, 46])....

    [...]

Book
28 Nov 2007
TL;DR: Digital Image Processing is the definitive textbook for students, researchers, and professionals in search of critical analysis and modern implementations of the most important algorithms in the field, and is also eminently suitable for self-study.
Abstract: This revised and expanded new edition of an internationally successful classic presents an accessible introduction to the key methods in digital image processing for both practitioners and teachers. Emphasis is placed on practical application, presenting precise algorithmic descriptions in an unusually high level of detail, while highlighting direct connections between the mathematical foundations and concrete implementation. The text is supported by practical examples and carefully constructed chapter-ending exercises drawn from the authors' years of teaching experience, including easily adaptable Java code and completely worked out examples. Source code, test images and additional instructor materials are also provided at an associated website. Digital Image Processing is the definitive textbook for students, researchers, and professionals in search of critical analysis and modern implementations of the most important algorithms in the field, and is also eminently suitable for self-study.

558 citations

Proceedings Article
16 Jun 2012
TL;DR: This work generalizes to surfaces the polar sampling of the image domain used in shape contexts and can leverage recent developments in intrinsic shape analysis and construct ISC based on state-of-the-art dense shape descriptors such as heat kernel signatures.
Abstract: In this work, we present intrinsic shape context (ISC) descriptors for 3D shapes. We generalize to surfaces the polar sampling of the image domain used in shape contexts: for this purpose, we chart the surface by shooting geodesic outwards from the point being analyzed; ‘angle’ is treated as tantamount to geodesic shooting direction, and radius as geodesic distance. To deal with orientation ambiguity, we exploit properties of the Fourier transform. Our charting method is intrinsic, i.e., invariant to isometric shape transformations. The resulting descriptor is a meta-descriptor that can be applied to any photometric or geometric property field defined on the shape, in particular, we can leverage recent developments in intrinsic shape analysis and construct ISC based on state-of-the-art dense shape descriptors such as heat kernel signatures. Our experiments demonstrate a notable improvement in shape matching on standard benchmarks.

183 citations


Cites methods from "Image registration using log-polar ..."

  • ...This technique has been used extensively in image registration [40], and was introduced for scale- and rotation- invariant descriptor construction in [23], and was also exploited to construct scale-invariant heat kernel signatures (SIHKS) for surfaces [11]....

    [...]

Journal Article
TL;DR: Performance comparison with classical brute-force image registration method reveals that the proposed quantum algorithm can achieve a quartic speedup.
Abstract: The power of quantum mechanics has been extensively exploited to meet the high computational requirement of classical image processing. However, existing quantum image models can only represent the images sampled in Cartesian coordinates. In this paper, quantum log-polar image (QUALPI), a novel quantum image representation is proposed for the storage and processing of images sampled in log-polar coordinates. In QUALPI, all the pixels of a QUALPI are stored in a normalized superposition and can be operated on simultaneously. A QUALPI can be constructed from a classical image via a preparation whose complexity is approximately linear in the image size. Some common geometric transformations, such as symmetry transformation, rotation, etc., can be performed conveniently with QUALPI. Based on these geometric transformations, a fast rotation-invariant quantum image registration algorithm is designed for log-polar images. Performance comparison with classical brute-force image registration method reveals that our quantum algorithm can achieve a quartic speedup.

177 citations


Cites methods from "Image registration using log-polar ..."

  • ...Based on log-polar sampling, Zokai and Wolberg [18], Matungka et al....

    [...]

Journal Article
TL;DR: This paper surveys the application of log-polar imaging in robotic vision, particularly in visual attention, target tracking, egomotion estimation, and 3D perception and to help readers identify promising research directions.

154 citations


Cites background from "Image registration using log-polar ..."

  • ...in image registration problems, allowing the recovery of large affine motions [181]....

    [...]

References
Proceedings Article
01 Jan 2000
TL;DR: This work presents an alternative method for extracting invariant regions that does not depend on the presence of edges or corners in the image but is purely intensity-based, and demonstrates the use of such regions for another application, which is wide baseline stereo matching.
Abstract: ‘Invariant regions’ are image patches that automatically deform with changing viewpoint as to keep on covering identical physical parts of a scene. Such regions are then described by a set of invariant features, which makes it relatively easy to match them between views and under changing illumination. In previous work, we have presented invariant regions that are based on a combination of corners and edges. The application discussed then was image database retrieval. Here, an alternative method for extracting (affinely) invariant regions is given, that does not depend on the presence of edges or corners in the image but is purely intensity-based. Also, we demonstrate the use of such regions for another application, which is wide baseline stereo matching. As a matter of fact, the goal is to build an opportunistic system that exploits several types of invariant regions as it sees fit. This yields more correspondences and a system that can deal with a wider range of images. To increase the robustness of the system even further, two semi-local constraints on combinations of region correspondences are derived (one geometric, the other photometric). They allow to test the consistency of correspondences and hence to reject falsely matched regions.

531 citations

Proceedings Article
05 Dec 1994
TL;DR: These techniques can be used to build environments and 3-D models for virtual reality application based on recreating a true scene and a number of novel applications based on tele-reality technology are discussed.
Abstract: This paper presents some techniques for automatically deriving realistic 2-D scenes and 3-D geometric models from video sequences. These techniques can be used to build environments and 3-D models for virtual reality application based on recreating a true scene, i.e., tele-reality applications. The fundamental technique used in this paper is image mosaicing, i.e., the automatic alignment of multiple images into larger aggregates which are then used to represent portions of a 3-D scene. The paper first examines the easiest problems, those of flat scene and panoramic scene mosaicing. It then progresses to more complicated scenes with depth, and concludes with full 3-D models. The paper also discusses a number of novel applications based on tele-reality technology.

525 citations

Book Chapter
01 Jun 2001
TL;DR: The automatic construction of large, high-resolution image mosaics is an active area of research in the fields of photogrammetry, computer vision, image processing, and computer graphics, for applications such as the construction of virtual environments and virtual travel.
Abstract: The automatic construction of large, high-resolution image mosaics is an active area of research in the fields of photogrammetry, computer vision, image processing, and computer graphics. Image mosaics can be used for many different applications [163, 1.22]. The most traditional application is the construction of large aerial and satellite photographs from collections of images [186]. More recent applications include scene stabilization and change detection [93], video compression [125, 122, 167] and video indexing [240], increasing the field of view [105, 177, 266] and resolution [126, 50] of a camera, and even simple photo editing [38]. A particularly popular application is the emulation of traditional film-based panoramic photography [175] with digital panoramic mosaics, for applications such as the construction of virtual environments [181, 267] and virtual travel [49].

421 citations

Proceedings Article
James Davis
23 Jun 1998
TL;DR: A complete system for creating visually pleasing mosaics in the presence of moving objects by solving a linear system of equations derived from many pairwise registration matrices and finding an optimal global registration is presented.
Abstract: Image mosaics are useful for a variety of tasks in vision and computer graphics. A particularly convenient way to generate mosaics is by 'stitching' together many ordinary photographs. Existing algorithms focus on capturing static scenes. This paper presents a complete system for creating visually pleasing mosaics in the presence of moving objects. There are three primary contributions. The first component of our system is a registration method that remains unbiased by movement-the Mellin transform is extended to register images related by a projective transform. Second an efficient method for finding a globally consistent registration of all images is developed. By solving a linear system of equations, derived from many pairwise registration matrices, we find an optimal global registration. Lastly, a new method of compositing images is presented. Blurred areas due to moving objects are avoided by segmenting the mosaic into disjoint regions and sampling pixels in each region from a single source image.

375 citations


"Image registration using log-polar ..." refers background in this paper

  • ...Remarkable progress has been documented during the last decade in this area [12], [14], [18], [19], [58], [59]....

    [...]

Journal Article
01 May 1998
TL;DR: A new set of methods for indexing into the video sequence based on the scene-based representation, based on geometric and dynamic information contained in the video, complement the more traditional content-based indexing methods.
Abstract: Video is a rich source of information. It provides visual information about scenes. This information is implicitly buried inside the raw video data, however, and is provided with the cost of very high temporal redundancy. While the standard sequential form of video storage is adequate for viewing in a movie mode, it fails to support rapid access to information of interest that is required in many of the emerging applications of video. This paper presents an approach for efficient access, use and manipulation of video data. The video data are first transformed from their sequential and redundant frame-based representation, in which the information about the scene is distributed over many frames, to an explicit and compact scene-based representation, to which each frame can be directly related. This compact reorganization of the video data supports nonlinear browsing and efficient indexing to provide rapid access directly to information of interest. This paper describes a new set of methods for indexing into the video sequence based on the scene-based representation. These indexing methods are based on geometric and dynamic information contained in the video. These methods complement the more traditional content-based indexing methods, which utilize image appearance information (namely, color and texture properties) but are considerably simpler to achieve and are highly computationally efficient.

334 citations


"Image registration using log-polar ..." refers background in this paper

  • ...Remarkable progress has been documented during the last decade in this area [12], [14], [18], [19], [58], [59]....

    [...]

Frequently Asked Questions (13)
Q1. What are the contributions in "Image registration using log-polar mappings for recovery of large-scale similarity and projective transformations"?

This paper describes a novel technique to recover large similarity transformations (rotation/scale/translation) and moderate perspective deformations among image pairs. The authors introduce a hybrid algorithm that features log-polar mappings and nonlinear least squares optimization. The use of log-polar techniques in the spatial domain is introduced as a preprocessing module to recover large scale changes (e.g., at least four-fold) and arbitrary rotations. In this paper, the authors demonstrate the superior performance of the log-polar transform in featureless image registration in the spatial domain.

Additional future work will accelerate correlation. It is worthwhile to examine whether this process may be accelerated by positioning the sliding window on areas of high information content only. Entropy, variance, or other statistically discriminating techniques can be used to quantify information content. Recent success with scale-invariant interest points (e.g., SIFT) suggests that the log-polar windows should be centered at these extracted positions.

The log-polar transform has two principal advantages: 1) rotation and scale invariance and 2) spatially varying sampling which, as in the retina, reduces the amount of information traversing the optic nerve while maintaining high resolution in the fovea and capturing a wide field of view.

The INRIA method computes interest points at different scales, calculating at each scale a set of local descriptors that are invariant to rotation, translation, and illumination. 

The authors have extensively tested several similarity measures, including normalized correlation coefficient, phase correlation, and mutual information. 

Although the Fourier–Mellin transform is able to correctly register the synthetic image shown in Fig. 3(c), the image in Fig. 3(b) defies recovery because of the lack of similarity in its spectra compared to that of the reference image. 

An important contribution of this work is that the authors introduce a new method based on the log-polar transform in the spatial domain that works robustly with real images. 

The new method is based on multiresolution log-polar transformations to simultaneously find the best scale, rotation, and translation parameters. 

In order to show the importance of the log-polar module, the authors ran the LMA without the estimated initial parameters from the log-polar module. 

The correlation coefficient values for the thirty pairs mentioned above are all above 0.9, which is very good considering camera noise and artifacts introduced by warping to produce the target images. 

The LMA solves the following system of equations in an iterative fashion: (5), in which the Hessian matrix and the residual vector are given by (6) and (7). The authors can improve the standard Levenberg–Marquardt optimization algorithm outlined above by adding two modifications.

They were, however, reported recently in [40] and [41], where the authors showed that rotation and scale introduce aliasing in the low frequencies. 

Note that artificial black backgrounds can help register two images because they ensure that the authors consider the same underlying content.
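The rotation and scale property of the log-polar transform mentioned in the answers above can be illustrated on point coordinates. The sketch below is not the authors' image-based pipeline: it only shows that, about a common origin, a similarity transform with scale $\sigma$ and rotation $\phi$ shifts every point by $(\log \sigma, \phi)$ in log-polar coordinates. The helper names and the four-fold scale are chosen for illustration.

```python
import numpy as np

def to_log_polar(x, y):
    """Map Cartesian points to (log radius, angle) about the origin."""
    return np.log(np.hypot(x, y)), np.arctan2(y, x)

def similarity(x, y, scale, angle):
    """Rotate by `angle` (radians) and scale by `scale` about the origin."""
    c, s = np.cos(angle), np.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

# Random points in the first quadrant, kept away from the origin so that
# the rotated angles do not wrap around at +/- pi.
rng = np.random.default_rng(1)
x, y = rng.uniform(1.0, 5.0, size=(2, 100))

scale, angle = 4.0, 0.5
x2, y2 = similarity(x, y, scale, angle)

rho1, theta1 = to_log_polar(x, y)
rho2, theta2 = to_log_polar(x2, y2)

# Every point moves by the same offset in log-polar space.
d_rho = rho2 - rho1
d_theta = theta2 - theta1
print(np.allclose(d_rho, np.log(scale)), np.allclose(d_theta, angle))
```

Because the offset is identical for all points, recovering scale and rotation reduces to finding a translation between the two log-polar images, which is what makes correlation-based search practical.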