Automatic Panoramic Image Stitching using Invariant Features
Matthew Brown and David G. Lowe
{mbrown|lowe}@cs.ubc.ca
Department of Computer Science,
University of British Columbia,
Vancouver, Canada.
Abstract
This paper concerns the problem of fully automated
panoramic image stitching. Though the 1D problem (single
axis of rotation) is well studied, 2D or multi-row stitching is
more difficult. Previous approaches have used human input
or restrictions on the image sequence in order to establish
matching images. In this work, we formulate stitching as a
multi-image matching problem, and use invariant local fea-
tures to find matches between all of the images. Because of
this our method is insensitive to the ordering, orientation,
scale and illumination of the input images. It is also insen-
sitive to noise images that are not part of a panorama, and
can recognise multiple panoramas in an unordered image
dataset. In addition to providing more detail, this paper ex-
tends our previous work in the area [BL03] by introducing
gain compensation and automatic straightening steps.
1 Introduction
Panoramic image stitching has an extensive research lit-
erature [Sze04, Mil75, BL03] and several commercial ap-
plications [Che95, REA, MSF]. The basic geometry of
the problem is well understood, and consists of estimat-
ing a 3 × 3 camera matrix or homography for each image
[HZ04, SS97]. This estimation process needs an initialisa-
tion, which is typically provided by user input to approxi-
mately align the images, or a fixed image ordering. For ex-
ample, the PhotoStitch software bundled with Canon digital
cameras requires a horizontal or vertical sweep, or a square
matrix of images. REALVIZ Stitcher version 4 [REA] has a
user interface to roughly position the images with a mouse,
before automatic registration proceeds. Our work is novel
in that we require no such initialisation to be provided.
In the research literature methods for automatic image
alignment and stitching fall broadly into two categories:
direct [SK95, IA99, SK99, SS00] and feature based
[ZFD97, CZ98, MJ02]. Direct methods have the advan-
tage that they use all of the available image data and hence
can provide very accurate registration, but they require a
close initialisation. Feature based registration does not re-
quire initialisation, but traditional feature matching meth-
ods (e.g., correlation of image patches around Harris cor-
ners [Har92, ST94]) lack the invariance properties needed
to enable reliable matching of arbitrary panoramic image
sequences.
In this paper we describe an invariant feature based ap-
proach to fully automatic panoramic image stitching. This
has several advantages over previous approaches. Firstly,
our use of invariant features enables reliable matching of
panoramic image sequences despite rotation, zoom and illu-
mination change in the input images. Secondly, by viewing
image stitching as a multi-image matching problem, we can
automatically discover the matching relationships between
the images, and recognise panoramas in unordered datasets.
Thirdly, we generate high-quality results using multi-band
blending to render seamless output panoramas. This paper
extends our earlier work in the area [BL03] by introducing
gain compensation and automatic straightening steps. We
also describe an efficient bundle adjustment implementation
and show how to perform multi-band blending for multiple
overlapping images with any number of bands.
The remainder of the paper is structured as follows. Sec-
tion 2 develops the geometry of the problem and motivates
our choice of invariant features. Section 3 describes our im-
age matching methodology (RANSAC) and a probabilistic
model for image match verification. In section 4 we de-
scribe our image alignment algorithm (bundle adjustment)
which jointly optimises the parameters of each camera. Sec-
tions 5 - 7 describe the rendering pipeline including au-
tomatic straightening, gain compensation and multi-band
blending. In section 9 we present conclusions and ideas for
future work.
2 Feature Matching
The first step in the panoramic recognition algorithm is
to extract and match SIFT [Low04] features between all of
the images. SIFT features are located at scale-space max-
ima/minima of a difference of Gaussian function. At each
feature location, a characteristic scale and orientation is es-
tablished. This gives a similarity-invariant frame in which
to make measurements. Although simply sampling inten-
sity values in this frame would be similarity invariant, the
invariant descriptor is actually computed by accumulating
local gradients in orientation histograms. This allows edges
to shift slightly without altering the descriptor vector, giving
some robustness to affine change. This spatial accumulation
is also important for shift invariance, since the interest point
locations are typically only accurate in the 0-3 pixel range
[BSW05, SZ03]. Illumination invariance is achieved by us-
ing gradients (which eliminates bias) and normalising the
descriptor vector (which eliminates gain).
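
As a concrete illustration, the following minimal sketch extracts SIFT keypoints and descriptors using OpenCV (an assumed modern API, not the implementation used in this paper); each returned keypoint carries the characteristic scale and orientation described above, and the descriptors are the normalised gradient histograms.

```python
import cv2

def extract_sift(image_path):
    # Work in greyscale: SIFT descriptors are built from local gradients,
    # which gives the bias invariance described in the text.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # Each keypoint has a location, characteristic scale and orientation;
    # each descriptor is a 128-D normalised orientation-histogram vector.
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors
```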
Since SIFT features are invariant under rotation and scale
changes, our system can handle images with varying orien-
tation and zoom (see figure 8). Note that this would not be
possible using traditional feature matching techniques such
as correlation of image patches around Harris corners. Or-
dinary (translational) correlation is not invariant under ro-
tation, and Harris corners are not invariant to changes in
scale.
Assuming that the camera rotates about its optical cen-
tre, the group of transformations the images may undergo
is a special group of homographies. We parameterise each
camera by a rotation vector $\boldsymbol{\theta} = [\theta_1, \theta_2, \theta_3]$ and focal length $f$.
This gives pairwise homographies $\tilde{\mathbf{u}}_i = \mathbf{H}_{ij} \tilde{\mathbf{u}}_j$ where

$$\mathbf{H}_{ij} = \mathbf{K}_i \mathbf{R}_i \mathbf{R}_j^T \mathbf{K}_j^{-1} \qquad (1)$$

and $\tilde{\mathbf{u}}_i$, $\tilde{\mathbf{u}}_j$ are the homogeneous image positions ($\tilde{\mathbf{u}}_i = s_i [\mathbf{u}_i, 1]$, where $\mathbf{u}_i$ is the 2-dimensional image position). The 4 parameter camera model is defined by

$$\mathbf{K}_i = \begin{bmatrix} f_i & 0 & 0 \\ 0 & f_i & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2)$$

and (using the exponential representation for rotations)

$$\mathbf{R}_i = e^{[\boldsymbol{\theta}_i]_\times}, \quad [\boldsymbol{\theta}_i]_\times = \begin{bmatrix} 0 & -\theta_{i3} & \theta_{i2} \\ \theta_{i3} & 0 & -\theta_{i1} \\ -\theta_{i2} & \theta_{i1} & 0 \end{bmatrix}. \qquad (3)$$
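
As a sketch of this camera model, the pairwise homography of equation (1) can be composed directly from the rotation vectors and focal lengths; the function names below are ours, and the matrix exponential of equation (3) is evaluated with scipy.

```python
import numpy as np
from scipy.linalg import expm

def cross_matrix(theta):
    """The skew-symmetric matrix [theta]_x of equation (3)."""
    t1, t2, t3 = theta
    return np.array([[0.0, -t3,  t2],
                     [t3,  0.0, -t1],
                     [-t2,  t1, 0.0]])

def homography(theta_i, f_i, theta_j, f_j):
    """H_ij = K_i R_i R_j^T K_j^{-1} (equation (1)) for the 4-parameter
    rotation-only camera model."""
    K_i = np.diag([f_i, f_i, 1.0])
    K_j = np.diag([f_j, f_j, 1.0])
    R_i = expm(cross_matrix(theta_i))
    R_j = expm(cross_matrix(theta_j))
    return K_i @ R_i @ R_j.T @ np.linalg.inv(K_j)
```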
Ideally one would use image features that are invariant
under this group of transformations. However, for small
changes in image position

$$\mathbf{u}_i = \mathbf{u}_{i0} + \left. \frac{\partial \mathbf{u}_i}{\partial \mathbf{u}_j} \right|_{\mathbf{u}_{i0}} \Delta \mathbf{u}_j \qquad (4)$$

or equivalently $\tilde{\mathbf{u}}_i = \mathbf{A}_{ij} \tilde{\mathbf{u}}_j$, where

$$\mathbf{A}_{ij} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} \qquad (5)$$

is an affine transformation obtained by linearising the homography about $\mathbf{u}_{i0}$. This implies that each small image patch undergoes an affine transformation, and justifies the use of SIFT features which are partially invariant under affine change.
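
This linearisation can be checked numerically: the sketch below (names ours) builds the local affine map $\mathbf{A}_{ij}$ of equation (5) from a finite-difference Jacobian of the homography about the expansion point.

```python
import numpy as np

def apply_homography(H, u):
    """Map a 2-D point through H and dehomogenise."""
    v = H @ np.array([u[0], u[1], 1.0])
    return v[:2] / v[2]

def local_affine(H, u0, eps=1e-6):
    """Affine map A_ij of equation (5): first-order expansion of the
    homography about the point u0."""
    u0 = np.asarray(u0, dtype=float)
    f0 = apply_homography(H, u0)
    J = np.zeros((2, 2))
    for d in range(2):           # finite-difference Jacobian
        u = u0.copy()
        u[d] += eps
        J[:, d] = (apply_homography(H, u) - f0) / eps
    A = np.eye(3)
    A[:2, :2] = J
    A[:2, 2] = f0 - J @ u0       # first-order Taylor: f(u) ~ f0 + J (u - u0)
    return A
```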
Once features have been extracted from all n images (lin-
ear time), they must be matched. Since multiple images
may overlap a single ray, each feature is matched to its k
nearest neighbours in feature space (we use k = 4). This
can be done in O(n log n) time by using a k-d tree to find
approximate nearest neighbours [BL97]. A k-d tree is an
axis aligned binary space partition, which recursively par-
titions the feature space at the mean in the dimension with
highest variance.
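
As a sketch of this matching step, the code below queries a k-d tree for the k = 4 nearest neighbours of every descriptor, keeping only matches that cross images. Names are ours, and scipy's exact k-d tree stands in for the approximate tree of [BL97].

```python
import numpy as np
from scipy.spatial import cKDTree

def match_features(descriptors_per_image, k=4):
    """descriptors_per_image: list of (n_i x 128) arrays, one per image."""
    all_desc = np.vstack(descriptors_per_image)
    image_id = np.concatenate([np.full(len(d), i)
                               for i, d in enumerate(descriptors_per_image)])
    tree = cKDTree(all_desc)
    # Query k+1 neighbours since each descriptor's nearest neighbour is itself.
    _, idx = tree.query(all_desc, k=k + 1)
    matches = []
    for q, neighbours in enumerate(idx):
        for n in neighbours[1:]:
            if image_id[n] != image_id[q]:   # discard same-image matches
                matches.append((q, int(n)))
    return matches
```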
3 Image Matching
At this stage the objective is to find all matching (i.e.
overlapping) images. Connected sets of image matches will
later become panoramas. Since each image could poten-
tially match every other one, this problem appears at first to
be quadratic in the number of images. However, it is only
necessary to match each image to a small number of over-
lapping images in order to get a good solution for the image
geometry.
From the feature matching step, we have identified im-
ages that have a large number of matches between them. We
consider a constant number m images, that have the greatest
number of feature matches to the current image, as potential
image matches (we use m = 6). First, we use RANSAC to
select a set of inliers that are compatible with a homography
between the images. Next we apply a probabilistic model to
verify the match.
3.1 Robust Homography Estimation using
RANSAC
RANSAC (random sample consensus) [FB81] is a robust
estimation procedure that uses a minimal set of randomly
sampled correspondences to estimate image transformation
parameters, and finds a solution that has the best consensus
with the data. In the case of panoramas we select sets of
r = 4 feature correspondences and compute the homogra-
phy H between them using the direct linear transformation
(DLT) method [HZ04]. We repeat this with n = 500 tri-
als and select the solution that has the maximum number
of inliers (whose projections are consistent with $\mathbf{H}$ within a tolerance $\epsilon$ pixels). Given the probability that a feature match is correct between a pair of matching images (the inlier probability) is $p_i$, the probability of finding the correct transformation after $n$ trials is

$$p(\mathbf{H} \text{ is correct}) = 1 - \left(1 - (p_i)^r\right)^n. \qquad (6)$$

After a large number of trials the probability of finding the correct homography is very high. For example, for an inlier probability $p_i = 0.5$, the probability that the correct homography is not found after 500 trials is approximately $1 \times 10^{-14}$.
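
A minimal sketch of the trial loop follows; parameter names and the tolerance value are ours, with OpenCV's DLT solver standing in for the 4-point homography fit, and the final comment checks the failure probability quoted above.

```python
import numpy as np
import cv2

def ransac_homography(pts_i, pts_j, n_trials=500, tol=3.0):
    """pts_i, pts_j: corresponding points as (N x 2) arrays, N >= 4."""
    pts_i = np.asarray(pts_i, dtype=np.float32)
    pts_j = np.asarray(pts_j, dtype=np.float32)
    best_H, best_inliers = None, 0
    for _ in range(n_trials):
        idx = np.random.choice(len(pts_j), 4, replace=False)  # r = 4 samples
        H = cv2.getPerspectiveTransform(pts_j[idx], pts_i[idx])
        proj = cv2.perspectiveTransform(pts_j.reshape(-1, 1, 2), H)
        errors = np.linalg.norm(proj.reshape(-1, 2) - pts_i, axis=1)
        n_inliers = int(np.sum(errors < tol))   # consistent within tolerance
        if n_inliers > best_inliers:
            best_H, best_inliers = H, n_inliers
    return best_H, best_inliers

# Equation (6) with p_i = 0.5, r = 4: the failure probability after 500
# trials is (1 - 0.5**4) ** 500, roughly 1e-14.
```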
RANSAC is essentially a sampling approach to estimat-
ing H. If instead of maximising the number of inliers one
maximises the sum of the log likelihoods, the result is max-
imum likelihood estimation (MLE). Furthermore, if priors
on the transformation parameters are available, one can
compute a maximum a posteriori estimate (MAP). These
algorithms are known as MLESAC and MAPSAC respec-
tively [Tor02].
3.2 Probabilistic Model for Image Match Verification
For each pair of potentially matching images we have
a set of feature matches that are geometrically consistent
(RANSAC inliers) and a set of features that are inside the
area of overlap but not consistent (RANSAC outliers). The
idea of our verification model is to compare the probabilities
that this set of inliers/outliers was generated by a correct
image match or by a false image match.
For a given image we denote the total number of features in the area of overlap $n_f$ and the number of inliers $n_i$. The event that this image matches correctly/incorrectly is represented by the binary variable $m \in \{0, 1\}$. The event that the $i^{th}$ feature match $f^{(i)} \in \{0, 1\}$ is an inlier/outlier is assumed to be independent Bernoulli, so that the total number of inliers is Binomial

$$p(f^{(1:n_f)} \mid m = 1) = B(n_i; n_f, p_1) \qquad (7)$$

$$p(f^{(1:n_f)} \mid m = 0) = B(n_i; n_f, p_0) \qquad (8)$$

where $p_1$ is the probability a feature is an inlier given a correct image match, and $p_0$ is the probability a feature is an inlier given a false image match. The set of feature match variables $\{f^{(i)}, i = 1, 2, \ldots, n_f\}$ is denoted $f^{(1:n_f)}$. The number of inliers is $n_i = \sum_{i=1}^{n_f} f^{(i)}$ and $B(\cdot)$ is the Binomial distribution

$$B(x; n, p) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}. \qquad (9)$$

We choose values $p_1 = 0.6$ and $p_0 = 0.1$. We can now evaluate the posterior probability that an image match is correct using Bayes' Rule

$$p(m = 1 \mid f^{(1:n_f)}) = \frac{p(f^{(1:n_f)} \mid m = 1)\, p(m = 1)}{p(f^{(1:n_f)})} \qquad (10)$$

$$= \frac{1}{1 + \frac{p(f^{(1:n_f)} \mid m = 0)\, p(m = 0)}{p(f^{(1:n_f)} \mid m = 1)\, p(m = 1)}}. \qquad (11)$$

We accept an image match if $p(m = 1 \mid f^{(1:n_f)}) > p_{\min}$

$$\frac{B(n_i; n_f, p_1)\, p(m = 1)}{B(n_i; n_f, p_0)\, p(m = 0)} \;\overset{\text{accept}}{\underset{\text{reject}}{\gtrless}}\; \frac{1}{\frac{1}{p_{\min}} - 1}. \qquad (12)$$

Choosing values $p(m = 1) = 10^{-6}$ and $p_{\min} = 0.999$ gives the condition

$$n_i > \alpha + \beta n_f \qquad (13)$$

for a correct image match, where $\alpha = 8.0$ and $\beta = 0.3$. Though in practice we have chosen values for $p_0$, $p_1$, $p(m = 0)$, $p(m = 1)$ and $p_{\min}$, they could in principle be learnt from the data. For example, $p_1$ could be estimated by computing the fraction of matches consistent with correct homographies over a large dataset.
Once pairwise matches have been established between
images, we can find panoramic sequences as connected sets
of matching images. This allows us to recognise multiple
panoramas in a set of images, and reject noise images which
match to no other images (see figure 2).
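
A sketch of this grouping step using union-find over the verified pairwise matches (names ours); singleton components are exactly the noise images mentioned above.

```python
def panorama_components(n_images, image_matches):
    """image_matches: list of (i, j) image-index pairs that passed
    verification; returns one list of image indices per component."""
    parent = list(range(n_images))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for i, j in image_matches:
        parent[find(i)] = find(j)           # union the two components
    groups = {}
    for i in range(n_images):
        groups.setdefault(find(i), []).append(i)
    # Components of size 1 are noise images; larger ones become panoramas.
    return list(groups.values())
```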
4 Bundle Adjustment
Given a set of geometrically consistent matches between
the images, we use bundle adjustment [TMHF99] to solve
for all of the camera parameters jointly. This is an essen-
tial step as concatenation of pairwise homographies would
cause accumulated errors and disregard multiple constraints
between images, e.g., that the ends of a panorama should
join up. Images are added to the bundle adjuster one by
one, with the best matching image (maximum number of
consistent matches) being added at each step. The new im-
age is initialised with the same rotation and focal length as
the image to which it best matches. Then the parameters are
updated using Levenberg-Marquardt.
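
As a sketch of this joint optimisation (building on the homography() sketch of section 2; all names are ours, and scipy's Huber loss stands in for the paper's robust error function), the residual vector stacks the projection error of every correspondence and is minimised over all camera parameters at once.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, correspondences):
    """params holds [theta1, theta2, theta3, f] per camera; correspondences
    is a list of (i, j, u_i, u_j) tuples of matched feature positions."""
    res = []
    for i, j, u_i, u_j in correspondences:
        theta_i, f_i = params[4*i:4*i + 3], params[4*i + 3]
        theta_j, f_j = params[4*j:4*j + 3], params[4*j + 3]
        H = homography(theta_i, f_i, theta_j, f_j)   # from the earlier sketch
        v = H @ np.array([u_j[0], u_j[1], 1.0])
        res.extend(np.asarray(u_i) - v[:2] / v[2])   # projection error
    return np.array(res)

# result = least_squares(residuals, params0, args=(correspondences,),
#                        method="trf", loss="huber")  # robustified LM-style fit
```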
The objective function we use is a robustified sum
squared projection error. That is, each feature is projected
into all the images in which it matches, and the sum of
squared image distances is minimised with respect to the
camera parameters¹. Given a correspondence $\mathbf{u}_i^k \leftrightarrow \mathbf{u}_j^l$ ($\mathbf{u}_i^k$ denotes the position of the $k$th feature in image $i$), …

¹Note that it would also be possible (and in fact statistically optimal) to represent the unknown ray directions $\mathbf{X}$ explicitly, and to estimate them jointly with the camera parameters. This would not increase the complexity of the algorithm if a sparse bundle adjustment method was used [TMHF99].

(a) Image 1 (b) Image 2
(c) SIFT matches 1 (d) SIFT matches 2
(e) RANSAC inliers 1 (f) RANSAC inliers 2
(g) Images aligned according to a homography
Figure 1. SIFT features are extracted from all of the images. After matching all of the features using a k-d tree, the m
images with the greatest number of feature matches to a given image are checked for an image match. First RANSAC
is performed to compute the homography, then a probabilistic model is invoked to verify the image match based on the
number of inliers. In this example the input images are 517 × 374 pixels and there are 247 correct feature matches.

(a) Image matches
(b) Connected components of image matches
(c) Output panoramas
Figure 2. Recognising panoramas. Given a noisy set of feature matches, we use RANSAC and a probabilistic verification
procedure to find consistent image matches (a). Each arrow between a pair of images indicates that a consistent set of
feature matches was found between that pair. Connected components of image matches are detected (b) and stitched
into panoramas (c). Note that the algorithm is insensitive to noise images that do not belong to a panorama (connected
components of size 1 image).

References

[FB81] M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 24(6):381–395, 1981.

[HZ04] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.

[Low04] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[ST94] J. Shi and C. Tomasi. Good Features to Track. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1994.