Proceedings ArticleDOI

A general imaging model and a method for finding its parameters

07 Jul 2001-Vol. 2, pp 108-115
TL;DR: A novel calibration method is presented that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system and experimental results for perspective as well as non-perspective imaging systems are included.
Abstract: Linear perspective projection has served as the dominant imaging model in computer vision. Recent developments in image sensing make the perspective model highly restrictive. This paper presents a general imaging model that can be used to represent an arbitrary imaging system. It is observed that all imaging systems perform a mapping from incoming scene rays to photo-sensitive elements on the image detector. This mapping can be conveniently described using a set of virtual sensing elements called raxels. Raxels include geometric, radiometric and optical properties. We present a novel calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. Experimental results for perspective as well as non-perspective imaging systems are included.

Summary (2 min read)

1 Introduction

  • After describing the general imaging model and its properties, the authors present a simple method for finding the parameters of the model for any arbitrary imaging system.
  • It is important to note that, given the non-perspective nature of a general device, conventional calibration methods based on known scene points [25] or self-calibration techniques that use unknown scene points [5], [10], [15], cannot be directly applied.
  • Since the authors are interested in the mapping from rays to image points, they need a ray-based calibration method.
  • The authors describe a simple and effective ray-based approach that uses structured light patterns.
  • This method allows a user to obtain the geometric, radiometric, and optical parameters of an arbitrarily complex imaging system in a matter of minutes.

2 General Imaging Model: Geometry

  • If the imaging system is perspective, all the incoming light rays are projected directly onto the detector plane through a single point, namely, the effective pinhole of the perspective system.
  • The goal of this section is to present a geometrical model that can represent such imaging systems.

2.1 Raxels

  • Each raxel includes a pixel that measures light energy and imaging optics (a lens) that collects the bundle of rays around an incoming ray.
  • The authors will focus on the geometric properties (locations and orientations) of raxels.
  • Each raxel can possess its own radiometric (brightness and wavelength) response as well as optical (point spread) properties.
  • These non-geometric properties will be discussed in subsequent sections.

Figure 3: (a)

  • It may be placed along the line of a principal ray of light entering the imaging system.
  • In addition to location and orientation, a raxel may have radiometric and optical parameters.
  • The notation for a raxel used in this paper.


  • Multiple raxels may be located at the same point (p1 = p2 = p3), but have different directions.
  • The choice of intersecting the incoming rays with a reference sphere is arbitrary.
  • Many of the arrays of photo-sensitive elements in the imaging devices described in section 1 are one- or two-dimensional.
  • Intensities usually do not change much along a ray (particularly when the medium is air) provided the displacement is small with respect to the total length of the ray.
  • In [9] and [14], it was suggested that the plenoptic function could also be restricted to a plane.
  • The important thing is to choose some reference surface so that each incoming ray intersects this surface at only one point.

3 Caustics

  • For example, when light refracts through the shallow water of a pool, bright curves can be seen where the caustics intersect the bottom (Figure 5).
  • The caustic is a good candidate for the ray surface of an imaging system as it is closely related to the geometry of the incoming rays; the incoming ray directions are tangent to the caustic.

4.1 Local Focal Length and Point Spread

  • An arbitrary imaging system cannot be expected to have a single global focal length.
  • Each raxel may be modeled to have its own focal length.
  • The authors can compute each raxel's focal length by measuring its point spread function for several depths.
  • A flexible approach models the point spread as an elliptical Gaussian.

4.3 Complete Imaging Model

  • In the case of perspective projection, the essential [5] or fundamental [10] matrix provides the relationship between points in one image and lines in another image (of the same scene).
  • In the general imaging model, this correspondence need no longer be projective.

5 Finding the Model Parameters

  • The major axis makes an angle ψ with the x-axis in the image.
  • Each raxel has two focal lengths, fa and fb; the angle ψ is only defined if the major and minor axes have different lengths.
  • In the preceding sections the authors described how to compute the model for a known optical system; in contrast, their goal in this section is to find the model parameters for an unknown system.

Figure 7: (a)

  • If these positions are known, the direction of the ray q may be determined for each pixel.
  • Now, the authors construct a calibration environment where the geometric and radiometric parameters can be efficiently estimated.
  • If a display has N locations, the authors can make each point distinct in log N images using simple grey coding or bit coding (a decoding sketch follows this list).
  • The authors may then compute the fall-off function across all the points.
  • The authors compute both the radiometric response function and the fall-off from seventeen uniform brightness levels.
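The grey-coded patterns make each display location identifiable from only about log2(N) captured images. The following is a minimal sketch of that decoding step (not the authors' implementation), assuming the captured pattern images have already been thresholded to booleans:

```python
# Minimal sketch (not the authors' code): decode Gray-coded structured-light
# patterns so that each display column becomes distinct across ceil(log2(N)) images.
# Assumes `bit_images` is a list of thresholded captures (booleans, H x W),
# ordered from most- to least-significant bit of the Gray code.
import numpy as np

def gray_to_binary(gray: np.ndarray) -> np.ndarray:
    """Convert Gray-coded integers to ordinary binary integers."""
    binary = gray.copy()
    shift = 1
    while (gray >> shift).any():
        binary ^= gray >> shift
        shift += 1
    return binary

def decode_columns(bit_images):
    """Return, per camera pixel, the index of the display column it sees."""
    gray = np.zeros(bit_images[0].shape, dtype=np.int64)
    for img in bit_images:
        gray = (gray << 1) | img.astype(np.int64)
    return gray_to_binary(gray)

# Example with synthetic data: 8 display columns need log2(8) = 3 patterns.
if __name__ == "__main__":
    cols = np.arange(8)                      # display column index per pixel
    g = cols ^ (cols >> 1)                   # Gray-encode
    bit_images = [((g >> b) & 1).astype(bool) for b in (2, 1, 0)]
    print(decode_columns([img[None, :] for img in bit_images]))  # [[0 1 2 3 4 5 6 7]]
```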

5.1 Experimental Apparatus

  • The laptop was oriented so as to give the maximum screen resolution along the axis of symmetry.
  • Figure 8 (b) shows a sample binary pattern as seen from the parabolic catadioptric system.
  • The perspective imaging system, consisting of just the camera itself, can be seen in Figure 11(a).


A General Imaging Model and a Method for Finding its Parameters

Michael D. Grossberg and Shree K. Nayar
Department of Computer Science, Columbia University
New York, New York 10027
{mdog, nayar}@cs.columbia.edu

Abstract

Linear perspective projection has served as the dominant imaging model in computer vision. Recent developments in image sensing make the perspective model highly restrictive. This paper presents a general imaging model that can be used to represent an arbitrary imaging system. It is observed that all imaging systems perform a mapping from incoming scene rays to photo-sensitive elements on the image detector. This mapping can be conveniently described using a set of virtual sensing elements called raxels. Raxels include geometric, radiometric and optical properties. We present a novel calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. Experimental results for perspective as well as non-perspective imaging systems are included.
1 Introduction

Since the Renaissance, artists have been fascinated by the visual manifestations of perspective projection. Geometers have studied the properties of the pinhole imaging model and derived a large suite of projective invariants that provide insights into the relation between a scene and its perspective image. The field of optics has developed high quality imaging lenses that closely adhere to the perspective model. Today, perspective projection serves as the dominant imaging model in computer vision and computer graphics.

Despite its great relevance, there are several reasons that make the perspective model far too restrictive. In recent years, the notion of a "vision sensor" has taken on a much broader meaning. A variety of devices have been developed that sample the light field [8] or the plenoptic function [1] associated with a scene in interesting and useful non-perspective ways. Figure 1 shows some examples of such imaging systems that are widely used today. Figure 1(a) shows a catadioptric sensor that uses a combination of lenses and mirrors. Even when such a sensor has a single effective viewpoint [21], its projection model can include a variety of non-perspective distortions (barrel, pincushion, or more complex) [2]. More interesting is the fact that certain applications (see [22] for example) require the system to not have a single viewpoint but rather a locus of viewpoints (catacaustic [4]). Similarly, wide-angle lens systems [20], like the one shown in Figure 1(b), include severe projective distortions and often have a locus of viewpoints (called a diacaustic). Recently, clusters of cameras, like the one shown in Figure 1(c), have become popular [17], [24]. It is clear that such a system includes multiple viewpoints or loci of viewpoints, each one associated with one of the cameras in the cluster. Finally, in the case of insects, nature has evolved eyes that have compound lenses [7], [6], such as the one shown in Figure 1(d). These eyes are composed of thousands of "ommatidia", each ommatidium including a receptor and lens. It is only a matter of time before we see solid-state cameras with flexible imaging surfaces that include a large number of such ommatidia.

Figure 1: Examples of non-perspective imaging systems: (a) a catadioptric system, (b) a dioptric wide-angle system, (c) an imaging system made of a camera cluster, and (d) a compound camera made of individual sensing elements, each including a receptor and a lens. In all of these cases, the imaging model of the system deviates from perspective projection.

In this paper we address two questions that we believe are fundamental to imaging:

  • Is there an imaging model that is general enough to represent any arbitrary imaging system? Note that we are not placing any restrictions on the properties of the imaging system. It could be perspective or non-perspective.

  • Given an unknown imaging system (a black box), is there a simple calibration method that can compute the parameters of the imaging model? Note that for the types of systems we wish to subsume in our imaging model, conventional camera calibration techniques will not suffice.

Such a general imaging model must be flexible enough to cover the wide range of devices that are of interest to us. Yet, it should be specific enough in terms of its parameters that it is useful in practice. Our approach is to exploit the fact that all imaging systems perform a mapping from incoming scene rays to photo-sensitive elements on the image detector. This mapping can be conveniently described by a ray surface, which is a surface in three-dimensional space from which the rays are measured in various directions. The smallest element of our imaging model is a virtual photo-sensitive element that measures light in essentially a single direction. We refer to these virtual elements as ray pixels, or raxels. It turns out that a convenient way to represent the ray surface on which the raxels reside is the caustic of the imaging system. In addition to its geometric parameters, each raxel has its own radiometric response function and local point spread function.

After describing the general imaging model and its properties, we present a simple method for finding the parameters of the model for any arbitrary imaging system. It is important to note that, given the non-perspective nature of a general device, conventional calibration methods based on known scene points [25] or self-calibration techniques that use unknown scene points [5], [10], [15], cannot be directly applied. Since we are interested in the mapping from rays to image points, we need a ray-based calibration method. We describe a simple and effective ray-based approach that uses structured light patterns. This method allows a user to obtain the geometric, radiometric, and optical parameters of an arbitrarily complex imaging system in a matter of minutes.
2 General Imaging Model: Geometry

We will consider the imaging system shown in Figure 2 when formulating our mathematical model of the imaging sensors mentioned in section 1. The system includes a detector with a large number of photo-sensitive elements (pixels). The detector could be an electronic chip, film, or any other light sensitive device; the technology used to implement it is not important. The imaging optics typically include several elements. Even a relatively simple optical component has about five individual lenses within it. In our arbitrary system, there may be additional optical elements such as mirrors, prisms, or beam-splitters. In fact, the system could be comprised of multiple individual imaging systems, each with its own imaging optics and image detector.

Irrespective of its specific design, the purpose of an imaging system is to map incoming rays of light from the scene onto pixels on the detector. Each pixel collects light energy from a bundle of closely packed rays in any system that has a non-zero aperture size. However, the bundle can be represented by a single chief (or principal) ray when studying the geometric properties of the imaging system. As shown in Figure 2, the system maps the ray Pi to the pixel i. The path that an incoming ray traverses to the pixel can be arbitrarily complex.

Figure 2: An imaging system directs incoming light rays to its photo-sensitive elements (pixels). Each pixel collects light from a bundle of rays that pass through the finite aperture of the imaging system. However, we will assume there is a correspondence between each individual detector element i and a specific ray of light Pi entering the system.

If the imaging system is perspective, all the incoming light rays are projected directly onto the detector plane through a single point, namely, the effective pinhole of the perspective system. This is not true in an arbitrary system. For instance, it is clear from Figure 2 that the captured rays do not meet at a single effective viewpoint. The goal of this section is to present a geometrical model that can represent such imaging systems.
2.1 Raxels

It is convenient to represent the mapping from scene rays to pixels in a form that easily lends itself to manipulation and analysis. We can replace our physical pixels with an abstract mathematical equivalent we refer to as a ray pixel or a "raxel". A raxel is a virtual photo-sensitive element that measures the light energy of a compact bundle of rays which can be represented as a single principal incoming ray. A similar abstraction, the pencigraph, was proposed by Mann [16] for the perspective case. The abstract optical model of our virtual raxel is shown in Figure 3. Each raxel includes a pixel that measures light energy and imaging optics (a lens) that collects the bundle of rays around an incoming ray. In this section, we will focus on the geometric properties (locations and orientations) of raxels. However, each raxel can possess its own radiometric (brightness and wavelength) response as well as optical (point spread) properties. These non-geometric properties will be discussed in subsequent sections.

Figure 3: (a) A raxel is a virtual replacement for a real photo-sensitive element. It may be placed along the line of a principal ray of light entering the imaging system. In addition to location and orientation, a raxel may have radiometric and optical parameters. (b) The notation for a raxel used in this paper.
2.2 Plenoptic Function

What does an imaging system see? The input to the system is the plenoptic function [1]. The plenoptic function Φ(p, q, t, λ) gives the intensity of light at each point p in space, from direction q, at an instant of time t and wavelength λ. We specify position by (px, py, pz) and direction by two angles (qφ, qθ). Still images represent an integration of light energy over a short time period, given by the effective shutter speed. Further, each photo-sensitive element will average the plenoptic function across a range of wavelengths. Thus, we set aside time and wavelength by considering monochromatic still imaging. We will only consider the plenoptic function as a function of position and direction: Φ(p, q).

We may view a raxel as a delta function δ(p0, q0) over (p, q) space, as it measures the value of the plenoptic function Φ(p, q) at (p0, q0). (If the raxel has a non-linear radiometric response, the response must be linearized for the raxel to act as a delta function.) Hence the parameters for a raxel are just the position p0 and the direction q0.
2.3 Pencils of Rays and Ray Surfaces

From where does the system (with its raxels) see the plenoptic function? Each point in the image corresponds to a ray. Thus, the set of positions and directions determined by the set of rays is the part of the domain of the plenoptic function relevant to our system. (The related problem of representing the positions and directions corresponding to a light source was explored in [13].)

The most general imaging system is described by a list of these rays. For clarity, we assume our image is two-dimensional. (Many of the arrays of photo-sensitive elements in the imaging devices described in section 1 are one- or two-dimensional; multi-camera systems can be represented by two-dimensional arrays parameterized by an extra parameter.) An image point is specified by (x, y). A scene point p imaged at (x, y) can be anywhere along the corresponding ray. To specify the point in space we define a parameter r along the ray. In the perspective case, r may be chosen as scene depth. A point p(x, y, r) imaged at (x, y) at depth r is imaged along a ray in the direction q(x, y, r). Thus, we see the plenoptic function only from those points in the range of p and q.

We may place a raxel anywhere along a ray, since intensities usually do not change much along a ray (particularly when the medium is air) provided the displacement is small with respect to the total length of the ray. It will be more convenient for representing the model to arrange the raxels on a surface we call a ray surface. For example, consider a sphere enclosing our imaging system, as shown in Figure 4. For each photo-sensitive element i there is some point pi on the sphere that received a ray in the direction qi. Thus we can place our raxels on the sphere by assigning them the positions and directions (pi, qi). It is important to note that there could be several rays that enter the sphere at the same point but with different directions (see q1, q2 and q3 in Figure 4). Thus the direction q is not, in general, a function of p.

Figure 4: An imaging system may be modeled as a set of raxels on a sphere surrounding the imaging system. Each raxel i has a position pi on the sphere, and an orientation qi aligned with an incoming ray. Multiple raxels may be located at the same point (p1 = p2 = p3), but have different directions.

The choice of intersecting the incoming rays with a reference sphere is arbitrary. In [9] and [14], it was suggested that the plenoptic function could also be restricted to a plane. The important thing is to choose some reference surface so that each incoming ray intersects this surface at only one point. If the incoming rays are parameterized by image coordinates (x, y), each ray will intersect a reference surface at one point p(x, y). We can write the ray surface as a function of (x, y) as:

s(x, y) = (p(x, y), q(x, y)).   (1)

We can express the position of a point along the ray as p(x, y, r) = p(x, y) + r q(x, y). This allows us to express the relevant subset of the domain of the plenoptic function as the range of

C(x, y, r) = (p(x, y) + r q(x, y), q(x, y)).   (2)

In the case of an unknown imaging system we may measure s(x, y) along some ray surface. In the case of a known imaging system we are able to compute s(x, y) a priori. Using equation (2) we may express one ray surface in terms of another ray surface such as a sphere or a plane.
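To make equation (2) concrete, the sketch below slides each raxel along its ray, p(x, y) + r q(x, y), until it reaches a new reference surface, here a plane. It is illustrative only; the array shapes and the plane target are assumptions, not details from the paper.

```python
# Minimal sketch (assumed arrays, not the paper's code): given a measured
# ray surface s(x, y) = (p(x, y), q(x, y)), re-express it on another reference
# surface, here the plane z = z0, using C(x, y, r) = (p + r q, q) from eq. (2).
import numpy as np

def point_along_ray(p, q, r):
    """Point at parameter r along the ray through p with unit direction q."""
    return p + r * q

def reproject_to_plane(p, q, z0):
    """Slide each raxel along its ray until it meets the plane z = z0.

    p, q: (H, W, 3) arrays of ray-surface positions and unit directions.
    Returns the new positions; rays parallel to the plane give NaN.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (z0 - p[..., 2]) / q[..., 2]          # solve p_z + r q_z = z0
    return point_along_ray(p, q, r[..., None])

if __name__ == "__main__":
    # Toy perspective-like system: all rays leave the origin.
    H, W = 2, 3
    p = np.zeros((H, W, 3))
    q = np.dstack([np.linspace(-0.2, 0.2, W)[None, :].repeat(H, 0),
                   np.linspace(-0.1, 0.1, H)[:, None].repeat(W, 1),
                   np.ones((H, W))])
    q /= np.linalg.norm(q, axis=-1, keepdims=True)
    print(reproject_to_plane(p, q, z0=1.0)[..., 2])   # all z-values equal 1.0
```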
3 Caustics

While convenient, the choices of a plane or sphere as the reference surface come with their drawbacks. The direction q(x, y) of a raxel need not have any relation to the position p(x, y) on a general reference surface. There may be points on the surface that do not have incoming rays. At other points, several rays may pass through the same point.

For many imaging systems, there is a distinguished ray surface which is defined solely by the geometry of the rays. In Figure 5, we see the envelope of the incoming rays form a curve. On such surfaces, the direction q really is a function of the position p, and the incoming rays are tangent to the surface. This is a special case of a ray surface called a caustic. We argue that caustics are the logical place to locate our raxels.

Caustics have a number of different characterizations. The caustic often forms a surface to which all incoming rays are tangent. This does not happen in the perspective case, where the caustic is a point. Caustics can also be viewed as geometrical entities where there occurs a singularity (bunching) of the light rays. As a result, a caustic formed by rays of illumination generates a very bright region when it intersects a surface. (For example, when light refracts through the shallow water of a pool, bright curves can be seen where the caustics intersect the bottom.) In our context of imaging systems, a caustic is the locus of points where incoming rays most converge. Points on the caustic can therefore be viewed as the "locus of viewpoints" of the system. It is thus a natural place to locate the raxels of our abstract imaging model.

Figure 5: The caustic is a good candidate for the ray surface of an imaging system as it is closely related to the geometry of the incoming rays; the incoming ray directions are tangent to the caustic.
3.1 Definition of a Caustic Surface

In section 2.3, we described the set of points and directions that can be detected by the imaging system. This set is described by the range of C(x, y, r) from equation (2). The position component functions are X = px(x, y, r), Y = py(x, y, r), and Z = pz(x, y, r). The map from (x, y, r) to (X, Y, Z) can be viewed as a change of coordinates. The caustic surface is defined as the locus of points in (X, Y, Z) space where this change of coordinates is singular.
3.2 Computing Caustics

The ray-to-image mapping of an imaging system may be obtained in two ways. One way is to derive the mapping analytically, or compute it numerically, from the optical components or parameters of an imaging system which is known a priori. Alternatively, a calibration method can be used to obtain a discrete form of the mapping (see section 5). In either case, our goal is to compute the caustic surface from the given mapping. When this mapping is known in closed form, analytic methods can be used to derive the caustic surface [4]. When a discrete approximation is given, a host of numerical methods [12], [18], [26] may be used. The method we use computes the caustic by finding all the points where the change in coordinates described above is singular [2], [3].

Equation (2) expresses C in terms of a known or measured ray surface s(x, y). The caustic is defined as the singularities in the change from (x, y, r) coordinates to (X, Y, Z) coordinates given by p. Singularities arise at those points (X, Y, Z) where the Jacobian matrix J of the transformation does not have full rank. We can find these points by computing the determinant of the Jacobian and setting it to zero:

det J(x, y, r) = 0.   (3)

Since the determinant is quadratic in r, we can solve for r explicitly in terms of p, q, and their first derivatives with respect to x and y. Plugging this back into C gives us an expression for the caustic ray surface parameterized by (x, y) as in equation (1). If the optical system has translational or rotational symmetry then we need only consider one parameter, for example x, in the image plane. In this case the Jacobian becomes linear in r and the solution simplifies to:

r = (qx dpy/dx - qy dpx/dx) / (qy dqx/dx - qx dqy/dx).   (4)
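As an illustration of this computation, the following sketch (not the paper's implementation) evaluates equation (4) on a discretely sampled ray surface of a rotationally symmetric system, using central finite differences for the derivatives:

```python
# Minimal sketch (not the paper's code): locate the caustic of a rotationally
# symmetric system from a discretely sampled ray surface, using eq. (4)
#   r = (qx dpy/dx - qy dpx/dx) / (qy dqx/dx - qx dqy/dx)
# with derivatives taken by central finite differences along the image axis x.
import numpy as np

def caustic_1d(p, q):
    """p, q: (N, 2) arrays of ray positions and unit directions in the symmetry
    plane, sampled along the image coordinate x. Returns the (N, 2) caustic
    points p + r q."""
    dp = np.gradient(p, axis=0)          # dp/dx (per unit image-coordinate step)
    dq = np.gradient(q, axis=0)          # dq/dx
    num = q[:, 0] * dp[:, 1] - q[:, 1] * dp[:, 0]
    den = q[:, 1] * dq[:, 0] - q[:, 0] * dq[:, 1]
    r = np.where(np.abs(den) > 1e-12, num / np.where(den == 0, 1, den), np.inf)
    return p + r[:, None] * q

if __name__ == "__main__":
    # Perspective sanity check: rays through a pinhole at the origin should
    # collapse to a point caustic at (0, 0).
    x = np.linspace(-1, 1, 11)
    q = np.stack([np.sin(np.arctan(x)), np.cos(np.arctan(x))], axis=1)
    p = 2.0 * q                          # ray surface sampled 2 units out
    print(np.allclose(caustic_1d(p, q), 0.0, atol=1e-6))   # True
```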
3.3 Field of View

Some parameters used to specify the general perspective camera model are derived from the ray surface representation in our general imaging model. Other parameters depend on the perspective assumption and are ill-defined in our model. For example, field of view presents an ambiguity, since in the non-perspective case the rays may no longer form a simple rectangular cone. One candidate for a field of view is the range of q(x, y) over the image. This is the same as the Gauss map [11]. The Gauss map is a good approximation to the field of view when the scene points are distant relative to the size of the imaging system.

Other geometric parameters of a perspective imaging system, such as aspect ratio and spherical aberration [25], may no longer be separable in the general imaging model from the parameterized ray surface.
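A crude way to summarize such a field of view numerically is the largest angle between any two raxel directions. The sketch below makes that assumption for illustration; it is not a construction taken from the paper.

```python
# Minimal sketch: approximate the "field of view" of a general imaging system
# as the largest angle between any pair of raxel directions q(x, y) (the
# angular extent of the Gauss map). Assumes q is an (N, 3) array of unit
# vectors; the pairwise comparison is O(N^2), fine for a small sketch.
import numpy as np

def angular_extent_deg(q: np.ndarray) -> float:
    cosines = np.clip(q @ q.T, -1.0, 1.0)      # pairwise dot products
    return float(np.degrees(np.arccos(cosines.min())))

if __name__ == "__main__":
    # Directions fanning over +/- 30 degrees -> extent of about 60 degrees.
    angles = np.radians(np.linspace(-30, 30, 61))
    q = np.stack([np.sin(angles), np.zeros_like(angles), np.cos(angles)], axis=1)
    print(round(angular_extent_deg(q), 1))     # 60.0
```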
4 Non-Geometric Raxel Parameters

Each raxel is treated as a very narrow field of view, perspective imaging system. Many of the conventional parameters associated with a perspective or near-perspective system may therefore be attributed to a raxel.
4.1 Local Focal Length and Point Spread

An arbitrary imaging system cannot be expected to have a single global focal length. However, each raxel may be modeled to have its own focal length. We can compute each raxel's focal length by measuring its point spread function for several depths. A flexible approach models the point spread as an elliptical Gaussian. Each ellipse has a major and a minor axis; the major axis makes an angle ψ with the x-axis in the image. Each raxel thus has two focal lengths, fa and fb, as well as a focal orientation ψ. (The angle ψ is only defined if the major and minor axes have different lengths.)
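One simple way to obtain the ellipse parameters of a measured point-spread patch is from its second moments; the sketch below is illustrative only (it is not the authors' estimation procedure), and the per-depth measurements from which a focal length would be derived are assumed to happen elsewhere.

```python
# Minimal sketch: summarize a measured point-spread patch as an elliptical
# Gaussian via its second moments, giving major/minor spreads and the
# orientation psi. `patch` is a small non-negative, background-subtracted image.
import numpy as np

def psf_ellipse(patch: np.ndarray):
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    w = patch / patch.sum()
    cx, cy = (w * xs).sum(), (w * ys).sum()
    cxx = (w * (xs - cx) ** 2).sum()
    cyy = (w * (ys - cy) ** 2).sum()
    cxy = (w * (xs - cx) * (ys - cy)).sum()
    cov = np.array([[cxx, cxy], [cxy, cyy]])
    evals, evecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    sigma_minor, sigma_major = np.sqrt(evals)
    psi = np.arctan2(evecs[1, 1], evecs[0, 1])      # major-axis angle vs. x-axis
    psi = (psi + np.pi / 2) % np.pi - np.pi / 2     # axis orientation is modulo pi
    return sigma_major, sigma_minor, psi

if __name__ == "__main__":
    # Synthetic elliptical Gaussian with sigmas (4, 2), major axis along x.
    ys, xs = np.mgrid[0:41, 0:41] - 20.0
    patch = np.exp(-(xs ** 2 / (2 * 4.0 ** 2) + ys ** 2 / (2 * 2.0 ** 2)))
    print([float(round(v, 2)) for v in psf_ellipse(patch)])   # ~[4.0, 2.0, 0.0]
```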
4.2 Radiometry

The radiometric response, g, is expected to be smooth and monotonic and can be modeled as a polynomial [19]. If one can compute the radiometric response function of each raxel, one can linearize the response with respect to scene radiance, assuming the response is invertible.

We model the raxel irradiance E as the scene radiance L times a spatially varying attenuation factor, h(x, y), corresponding to the image point (x, y). We call h(x, y) the fall-off function. This factor takes into account the finite size of the aperture and vignetting effects, which are linear in L.
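Once g and h(x, y) are known, a measured image can be mapped back to relative scene radiance by inverting g and dividing by the fall-off. Below is a minimal sketch with a hypothetical polynomial response; the specific model and names are assumptions for illustration, not details from the paper.

```python
# Minimal sketch: recover relative scene radiance from a measured image given
# a (hypothetical) polynomial radiometric response g and a fall-off map h(x, y).
# Model: measured = g(E), with raxel irradiance E = h(x, y) * L.
import numpy as np

def linearize(measured, g_coeffs, h, n_samples=4096):
    """Invert the monotonic response g by table lookup, then undo the fall-off."""
    E_grid = np.linspace(0.0, 1.0, n_samples)
    m_grid = np.polyval(g_coeffs, E_grid)          # g evaluated on a dense grid
    E = np.interp(measured, m_grid, E_grid)        # g^{-1}(measured)
    return E / np.clip(h, 1e-6, None)              # L = E / h(x, y)

if __name__ == "__main__":
    g_coeffs = [-0.3, 1.3, 0.0]                    # toy monotonic response on [0, 1]
    h = np.array([[1.0, 0.8], [0.9, 0.7]])         # toy fall-off (vignetting)
    L_true = np.array([[0.2, 0.5], [0.4, 0.9]])
    measured = np.polyval(g_coeffs, h * L_true)
    print(np.allclose(linearize(measured, g_coeffs, h), L_true, atol=1e-3))  # True
```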
4.3 Complete Imaging Model

The general imaging model consists of a set of raxels parameterized by x and y in pixel coordinates. The parameters associated with these raxels (see Figure 6) are (a) a position and direction, which describe the ray surface of the caustic, (b) major and minor focal lengths as well as a focal orientation, (c) a radiometric response function, and (d) a fall-off constant for each pixel.

Figure 6: Each raxel has the above parameters of image coordinates, position and direction in space, major and minor focal lengths, focal orientation, radiometric response, and fall-off factor. These parameters are measured with respect to a coordinate frame fixed to the imaging system.
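In software, the complete model is essentially a per-pixel table of these parameters. The record below is an illustrative sketch; the field names are ours, not the paper's.

```python
# Minimal sketch: one record per raxel, mirroring the parameter list above.
# Positions and directions are expressed in a coordinate frame fixed to the
# imaging system, as in the paper. Field names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class Raxel:
    x: int                      # image coordinates (pixel)
    y: int
    position: np.ndarray        # p, point on the caustic ray surface, shape (3,)
    direction: np.ndarray       # q, unit direction of the incoming ray, shape (3,)
    f_major: float              # major focal length fa
    f_minor: float              # minor focal length fb
    psi: float                  # focal (point-spread) orientation in the image
    response: np.ndarray        # polynomial coefficients of radiometric response g
    fall_off: float             # fall-off factor h(x, y)

# A general imaging model is then simply a list (or H x W array) of Raxel records.
```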
Camera parameters are separated into external and internal parameters. The external parameters are specified by a coordinate frame. The internal parameters (Figure 6) are specified with respect to that coordinate frame. In particular, for each raxel i, the parameters pi, qi are measured with respect to a single coordinate frame fixed to the imaging system. If the system is rotated or translated, these parameters will not change but the coordinate frame will.

In the case of perspective projection, the essential [5] or fundamental [10] matrix provides the relationship between points in one image and lines in another image (of the same scene). In the general imaging model, this correspondence need no longer be projective. Nevertheless, a point in one image still corresponds to a curve in the other, which may be computed.
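A brute-force way to trace out that curve, assuming both systems are available as per-pixel ray tables in a common coordinate frame (an assumption made here for illustration; this is not the paper's algorithm), is to sample points along the first system's ray and, for each sample, pick the second system's raxel whose ray passes closest.

```python
# Minimal brute-force sketch: the curve in image B corresponding to a pixel in
# image A under the general imaging model. Assumes each system is given as a
# per-pixel ray table (positions P and unit directions Q in a common frame).
import numpy as np

def point_to_ray_distance(points, P, Q):
    """Distance from each point (M, 3) to each ray (P, Q of shape (N, 3)); (M, N)."""
    d = points[:, None, :] - P[None, :, :]
    along = np.einsum('mnk,nk->mn', d, Q)
    return np.linalg.norm(d - along[..., None] * Q[None, :, :], axis=-1)

def corresponding_curve(pA, qA, P_B, Q_B, shape_B, r_values):
    """For samples r along ray A, return the pixel in B whose ray passes closest."""
    points = pA[None, :] + r_values[:, None] * qA[None, :]   # (M, 3) scene points
    idx = point_to_ray_distance(points, P_B, Q_B).argmin(axis=1)
    return np.stack(np.unravel_index(idx, shape_B), axis=1)  # (M, 2) pixel coords

if __name__ == "__main__":
    # Toy second system: a 5x5 grid of parallel rays looking down the z-axis.
    ys, xs = np.mgrid[0:5, 0:5].astype(float)
    P_B = np.stack([xs.ravel(), ys.ravel(), np.zeros(25)], axis=1)
    Q_B = np.tile([0.0, 0.0, 1.0], (25, 1))
    # A ray from system A that passes diagonally over the grid.
    curve = corresponding_curve(np.array([0.0, 0.0, 5.0]),
                                np.array([1.0, 1.0, 0.0]) / np.sqrt(2),
                                P_B, Q_B, (5, 5), np.linspace(0, 5, 6) * np.sqrt(2))
    print(curve)   # follows the diagonal: (0,0), (1,1), ..., (4,4)
```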
5 Finding the Model Parameters

In the preceding sections we described how to compute our model for a known optical system. In contrast, our goal in this section is to find the parameters of the model for an unknown imaging system.

Citations
More filters
Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations

Proceedings Article
01 Jan 1989
TL;DR: A scheme is developed for classifying the types of motion perceived by a humanlike robot and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.
Abstract: A scheme is developed for classifying the types of motion perceived by a humanlike robot. It is assumed that the robot receives visual images of the scene using a perspective system model. Equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented. >

2,000 citations

Book
Richard Szeliski
31 Dec 2006
TL;DR: In this article, the basic motion models underlying alignment and stitching algorithms are described, and effective direct (pixel-based) and feature-based alignment algorithms, and blending algorithms used to produce seamless mosaics.
Abstract: This tutorial reviews image alignment and image stitching algorithms. Image alignment algorithms can discover the correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of panoramic mosaics. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to produce seamless mosaics. It ends with a discussion of open research problems in the area.

1,226 citations

Journal ArticleDOI
TL;DR: MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework is presented.
Abstract: We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance with Iterative Clustering Estimation (ICE), a novel algorithm that iteratively combines feature clustering with robust pose estimation. Feature clustering quickly partitions the scene and produces object hypotheses. The hypotheses are used to further refine the feature clusters, and the two steps iterate until convergence. ICE is easy to parallelize, and easily integrates single- and multi-camera object recognition and pose estimation. We also introduce a novel object hypothesis scoring function based on M-estimator theory, and a novel pose clustering algorithm that robustly handles recognition outliers. We achieve scalability and low latency with an improved feature matching algorithm for large databases, a GPU/CPU hybrid architecture that exploits parallelism at all levels, and an optimized resource scheduler. We provide extensive experimental results demonstrating state-of-the-art performance in terms of recognition, scalability, and latency in real-world robotic applications.

455 citations


Cites methods from "A general imaging model and a metho..."

  • ...The techniques we consider are the generalized camera (Grossberg and Nayar 2001) and the pose averaging (Viksten et al....

    [...]

Proceedings ArticleDOI
18 Jun 2003
TL;DR: The discrete structure from motion equations for generalized cameras is derived, and the corollaries to epipolar geometry are illustrated, which gives constraints on the optimal design of panoramic imaging systems constructed from multiple cameras.
Abstract: We illustrate how to consider a network of cameras as a single generalized camera in a framework proposed by Nayar (2001). We derive the discrete structure from motion equations for generalized cameras, and illustrate the corollaries to epipolar geometry. This formal mechanism allows one to use a network of cameras as if they were a single imaging device, even when they do not share a common center of projection. Furthermore, an analysis of structure from motion algorithms for this imaging model gives constraints on the optimal design of panoramic imaging systems constructed from multiple cameras.

323 citations


Cites background from "A general imaging model and a metho..."

  • ...Many natural camera systems, including catadioptric systems made with conical mirrors and incorrectly aligned lenses in standard cameras, have a set of viewpoints well characterized by a caustic [25]....

    [...]

  • ...The main contribution of this paper is to express a multi-camera system in this framework and then to derive the structure from motion constraint equations for this model....

    [...]

References
More filters
Proceedings ArticleDOI
01 Aug 1996
TL;DR: A new method for capturing the complete appearance of both synthetic and real world objects and scenes, representing this information, and then using this representation to render images of the object from new camera positions.
Abstract: This paper discusses a new method for capturing the complete appearance of both synthetic and real world objects and scenes, representing this information, and then using this representation to render images of the object from new camera positions. Unlike the shape capture process traditionally used in computer vision and the rendering process traditionally used in computer graphics, our approach does not rely on geometric representations. Instead we sample and reconstruct a 4D function, which we call a Lumigraph. The Lumigraph is a subset of the complete plenoptic function that describes the flow of light at all positions in all directions. With the Lumigraph, new images of the object can be generated very quickly, independent of the geometric or illumination complexity of the scene or object. The paper discusses a complete working system including the capture of samples, the construction of the Lumigraph, and the subsequent rendering of images from this new representation.

2,986 citations


"A general imaging model and a metho..." refers background in this paper

  • ...In this case the Jacobian becomes linear in r and the solution simplifies to: r = (qx dpy/dx - qy dpx/dx) / (qy dqx/dx - qx dqy/dx). (4)...

    [...]

Journal ArticleDOI
01 Jan 1987-Nature
TL;DR: A simple algorithm for computing the three-dimensional structure of a scene from a correlated pair of perspective projections is described here, when the spatial relationship between the two projections is unknown.
Abstract: A simple algorithm for computing the three-dimensional structure of a scene from a correlated pair of perspective projections is described here, when the spatial relationship between the two projections is unknown. This problem is relevant not only to photographic surveying1 but also to binocular vision2, where the non-visual information available to the observer about the orientation and focal length of each eye is much less accurate than the optical information supplied by the retinal images themselves. The problem also arises in monocular perception of motion3, where the two projections represent views which are separated in time as well as space. As Marr and Poggio4 have noted, the fusing of two images to produce a three-dimensional percept involves two distinct processes: the establishment of a 1:1 correspondence between image points in the two views—the ‘correspondence problem’—and the use of the associated disparities for determining the distances of visible elements in the scene. I shall assume that the correspondence problem has been solved; the problem of reconstructing the scene then reduces to that of finding the relative orientation of the two viewpoints.

2,671 citations


"A general imaging model and a metho..." refers background in this paper

  • ...These non-geometric properties will be discussed in subsequent sections....

    [...]

Proceedings Article
01 Jan 1989
TL;DR: A scheme is developed for classifying the types of motion perceived by a humanlike robot and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.
Abstract: A scheme is developed for classifying the types of motion perceived by a humanlike robot. It is assumed that the robot receives visual images of the scene using a perspective system model. Equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented. >

2,000 citations

01 Jan 1991
TL;DR: Early vision as discussed by the authors is defined as measuring the amounts of various kinds of visual substances present in the image (e.g., redness or rightward motion energy) rather than in how it labels "things".
Abstract: What are the elements of early vision? This question might be taken to mean: What are the fundamental atoms of vision? It might then be variously answered in terms of such candidate structures as edges, peaks, corners, and so on. In this chapter we adopt a rather different point of view and ask the question, What are the fundamental substances of vision? This distinction is important because we wish to focus on the first steps in extraction of visual information. At this level it is premature to talk about discrete objects, even such simple ones as edges and corners. There is general agreement that early vision involves measurements of a number of basic image properties including orientation, color, motion, and so on. Figure 1.1 shows a caricature (in the style of Neisser, 1976) of the sort of architecture that has become quite popular as a model for both human and machine vision. The first stage of processing involves a set of parallel pathways, each devoted to one particular visual property. We propose that the measurements of these basic properties be considered as the elements of early vision. We think of early vision as measuring the amounts of various kinds of visual "substances" present in the image (e.g., redness or rightward motion energy). In other words, we are interested in how early vision measures "stuff" rather than in how it labels "things." What, then, are these elementary visual substances? Various lists have been compiled using a mixture of intuition and experiment. Electrophysiologists have described neurons in striate cortex that are selectively sensitive to certain visual properties; for reviews, see Hubel (1988) and DeValois and DeValois (1988). Psychophysicists have inferred the existence of channels that are tuned for certain visual properties; for reviews, see Graham (1989), Olzak and Thomas (1986), Pokorny and Smith (1986), and Watson (1986). Researchers in perception have found aspects of visual stimuli that are processed pre-attentively (Beck, 1966; Bergen & Julesz, 1983; Julesz & Bergen,

1,576 citations


"A general imaging model and a metho..." refers background in this paper

  • ...In section 2.3, we described the set of points and directions that can be detected by the imaging system....

    [...]

  • ...A variety of devices have been developed that sample the light field [8] or the plenoptic function [1] associated with a scene in interesting and useful non-perspective ways....

    [...]

Book ChapterDOI
19 May 1992
TL;DR: This paper addresses the problem of determining the kind of three- dimensional reconstructions that can be obtained from a binocular stereo rig for which no three-dimensional metric calibration data is available, and shows that even in this case some very rich non-metric reconstructions of the environment can nonetheless be obtained.
Abstract: This paper addresses the problem of determining the kind of three-dimensional reconstructions that can be obtained from a binocular stereo rig for which no three-dimensional metric calibration data is available. The only information at our disposal is a set of pixel correspondences between the two retinas which we assume are obtained by some correlation technique or any other means. We show that even in this case some very rich non-metric reconstructions of the environment can nonetheless be obtained.

998 citations


"A general imaging model and a metho..." refers background in this paper

  • ...In the case of perspective projection, the essential [5] or fundamental [10] matrix provides the relationship between points in one image and lines in another image (of the same scene)....

    [...]

  • ...It is important to note that, given the non-perspective nature of a general device, conventional calibration methods based on known scene points [25] or self-calibration techniques that use unknown scene points [5], [10], [15], cannot be directly applied....

    [...]

  • ...It is important to note that, given the non-perspective nature of a general device, conventional calibration [5], [10], [15], [25], cannot be applied here....

    [...]