A general imaging model and a method for finding its parameters
07 Jul 2001 · Vol. 2, pp. 108-115
TL;DR: A novel calibration method is presented that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system; experimental results for perspective as well as non-perspective imaging systems are included.
Abstract: Linear perspective projection has served as the dominant imaging model in computer vision. Recent developments in image sensing make the perspective model highly restrictive. This paper presents a general imaging model that can be used to represent an arbitrary imaging system. It is observed that all imaging systems perform a mapping from incoming scene rays to photosensitive elements on the image detector. This mapping can be conveniently described using a set of virtual sensing elements called raxels. Raxels include geometric, radiometric and optical properties. We present a novel calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. Experimental results for perspective as well as non-perspective imaging systems are included.
Summary (2 min read)
1 Introduction
 After describing the general imaging model and its properties, the authors present a simple method for finding the parameters of the model for any arbitrary imaging system.
 It is important to note that, given the non-perspective nature of a general device, conventional calibration methods based on known scene points [25] or self-calibration techniques that use unknown scene points [5], [10], [15], cannot be directly applied.
 Since the authors are interested in the mapping from rays to image points, they need a ray-based calibration method.
 The authors describe a simple and effective raybased approach that uses structured light patterns.
 This method allows a user to obtain the geometric, radiometric, and optical parameters of an arbitrarily complex imaging system in a matter of minutes.
2 General Imaging Model: Geometry
 If the imaging system is perspective, all the incoming light rays are projected directly onto the detector plane through a single point, namely, the effective pinhole of the perspective system.
 The goal of this section is to present a geometrical model that can represent such imaging systems.
2.1 Raxels
 Each raxel includes a pixel that measures light energy and imaging optics (a lens) that collects the bundle of rays around an incoming ray.
 The authors will focus on the geometric properties (locations and orientations) of raxels.
 Each raxel can possess its own radiometric (brightness and wavelength) response as well as optical (point spread) properties.
 These nongeometric properties will be discussed in subsequent sections.
Figure 3: (a)
 A raxel is a virtual replacement for a real photosensitive element. It may be placed along the line of a principal ray of light entering the imaging system.
 In addition to location and orientation, a raxel may have radiometric and optical parameters.
 (b) The notation for a raxel used in this paper.
Figure 4
 Multiple raxels may be located at the same point (p1 = p2 = p3), but have different directions.
 The choice of intersecting the incoming rays with a reference sphere is arbitrary. Many of the arrays of photosensitive elements in the imaging devices described in section 1 are one- or two-dimensional.
 Intensities usually do not change much along a ray (particularly when the medium is air) provided the displacement is small with respect to the total length of the ray.
 In [9] and [14], it was suggested that the plenoptic function could also be restricted to a plane.
 The important thing is to choose some reference surface so that each incoming ray intersects this surface at only one point.
3 Caustics
 For example, when light refracts through shallow water of a pool, bright curves can be seen where the caustics intersect the bottom (Figure 5).
 The caustic is a good candidate for the ray surface of an imaging system as it is closely related to the geometry of the incoming rays; the incoming ray directions are tangent to the caustic.
4.1 Local Focal Length and Point Spread
 An arbitrary imaging system cannot be expected to have a single global focal length.
 Each raxel may be modeled to have its own focal length.
 The authors can compute each raxel's focal length by measuring its point spread function for several depths.
 A flexible approach models the point spread as an elliptical Gaussian.
4.3 Complete Imaging Model
 In the case of perspective projection, the essential [5] or fundamental [10] matrix provides the relationship between points in one image and lines in another image (of the same scene).
 In the general imaging model, this correspondence need no longer be projective.
5 Finding the Model Parameters
 The major axis makes an angle ψ with the x-axis in the image. Each raxel has two focal lengths, fa and fb; the angle ψ is only defined if the major and minor axes have different lengths.
 In earlier sections the authors described how to compute their model for a known optical system; in this section their goal is to find the model for an unknown system.
Figure 7: (a)
 If these positions are known, the direction of the ray qf may be determined for each pixel.
 Now, the authors construct a calibration environment where the geometric and radiometric parameters can be efficiently estimated.
 If a display has N locations, the authors can make each point distinct in log N images using simple gray coding or bit coding.
 The authors may then compute the fall-off function across all the points.
 The authors compute both the radiometric response function and the fall-off from seventeen uniform brightness levels.
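The log N claim above is easy to sketch in code: with Gray-coded binary patterns, ceil(log2 N) on/off frames give every display location a unique bit sequence, so each pixel's observed on/off sequence identifies the display point it sees. This is an illustrative sketch under our own naming, not the authors' implementation.

```python
import math

def gray_patterns(n_locations):
    """One on/off pattern per bit: patterns[k][loc] says whether display
    location `loc` is lit in frame k. Gray coding keeps adjacent
    locations differing in a single bit, which is robust to blur."""
    n_bits = max(1, math.ceil(math.log2(n_locations)))
    gray = [i ^ (i >> 1) for i in range(n_locations)]  # binary -> Gray
    return [[(g >> k) & 1 == 1 for g in gray] for k in range(n_bits)]

def decode_location(bits):
    """Recover a location index from the on/off sequence a pixel observed."""
    g = sum(int(b) << k for k, b in enumerate(bits))
    mask = g >> 1
    while mask:          # Gray -> binary conversion
        g ^= mask
        mask >>= 1
    return g
```

For N = 1024 display points, ten frames suffice; each camera pixel is then matched to the display point (and hence the incoming ray) it observes.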
5.1 Experimental Apparatus
 The laptop was oriented so as to give the maximum screen resolution along the axis of symmetry.
 Figure 8 (b) shows a sample binary pattern as seen from the parabolic catadioptric system.
 The perspective imaging system, consisting of just the camera itself, can be seen in Figure 11(a).
A General Imaging Model and a Method for Finding its Parameters
Michael D. Grossberg and Shree K. Nayar
Department of Computer Science, Columbia University
New York, New York 10027
{mdog, nayar}@cs.columbia.edu
Abstract
Linear perspective projection has served as the dominant imaging model in computer vision. Recent developments in image sensing make the perspective model highly restrictive. This paper presents a general imaging model that can be used to represent an arbitrary imaging system. It is observed that all imaging systems perform a mapping from incoming scene rays to photosensitive elements on the image detector. This mapping can be conveniently described using a set of virtual sensing elements called raxels. Raxels include geometric, radiometric and optical properties. We present a novel calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. Experimental results for perspective as well as non-perspective imaging systems are included.
1 Introduction
Since the Renaissance, artists have been fascinated by the visual manifestations of perspective projection. Geometers have studied the properties of the pinhole imaging model and derived a large suite of projective invariants that provide insights into the relation between a scene and its perspective image. The field of optics has developed high quality imaging lenses that closely adhere to the perspective model. Today, perspective projection serves as the dominant imaging model in computer vision and computer graphics.
Despite its great relevance, there are several reasons that make the perspective model far too restrictive. In recent years, the notion of a "vision sensor" has taken on a much broader meaning. A variety of devices have been developed that sample the light field [8] or the plenoptic function [1] associated with a scene in interesting and useful non-perspective ways. Figure 1 shows some examples of such imaging systems that are widely used today. Figure 1(a) shows a catadioptric sensor that uses a combination of lenses and mirrors. Even when such a sensor has a single effective viewpoint [21], its projection model can include a variety of non-perspective distortions (barrel, pincushion, or more complex) [2]. More interesting is the fact that certain applications (see [22] for example) require the system to not have a single viewpoint but rather a locus of viewpoints (catacaustic [4]). Similarly, wide-angle lens systems [20],
Figure 1: Examples of non-perspective imaging systems: (a) a catadioptric system, (b) a dioptric wide-angle system, (c) an imaging system made of a camera cluster, and (d) a compound camera made of individual sensing elements, each including a receptor and a lens. In all of these cases, the imaging model of the system deviates from perspective projection.
like the one shown in Figure 1(b), include severe projective distortions and often have a locus of viewpoints (called a diacaustic). Recently, clusters of cameras, like the one shown in Figure 1(c), have become popular [17], [24]. It is clear that such a system includes multiple viewpoints or loci of viewpoints, each one associated with one of the cameras in the cluster. Finally, in the case of insects, nature has evolved eyes that have compound lenses [7], [6], such as the one shown in Figure 1(d). These eyes are composed of thousands of "ommatidia", each ommatidium including a receptor and lens. It is only a matter of time before we see solid state cameras with flexible imaging surfaces that include a large number of such ommatidia.
In this paper we address two questions that we believe are fundamental to imaging:
• Is there an imaging model that is general enough to represent any arbitrary imaging system? Note that we are not placing any restrictions on the properties of the imaging system. It could be perspective or non-perspective.
• Given an unknown imaging system (a black box), is there a simple calibration method that can compute the parameters of the imaging model? Note that for the types of systems we wish to subsume in our imaging model, conventional camera calibration techniques will not suffice.
© 2001 IEEE
Such a general imaging model must be flexible enough to cover the wide range of devices that are of interest to us. Yet, it should be specific enough in terms of its parameters that it is useful in practice. Our approach is to exploit the fact that all imaging systems perform a mapping from incoming scene rays to photosensitive elements on the image detector. This mapping can be conveniently described by a ray surface, which is a surface in three-dimensional space from which the rays are measured in various directions. The smallest element of our imaging model is a virtual photosensitive element that measures light in essentially a single direction. We refer to these virtual elements as ray pixels, or raxels. It turns out that a convenient way to represent the ray surface on which the raxels reside is the caustic of the imaging system. In addition to its geometric parameters, each raxel has its own radiometric response function and local point spread function.
After describing the general imaging model and its properties, we present a simple method for finding the parameters of the model for any arbitrary imaging system. It is important to note that, given the non-perspective nature of a general device, conventional calibration methods based on known scene points [25] or self-calibration techniques that use unknown scene points [5], [10], [15], cannot be directly applied. Since we are interested in the mapping from rays to image points, we need a ray-based calibration method. We describe a simple and effective ray-based approach that uses structured light patterns. This method allows a user to obtain the geometric, radiometric, and optical parameters of an arbitrarily complex imaging system in a matter of minutes.
2 General Imaging Model: Geometry
We will consider the imaging system shown in Figure 2 when formulating our mathematical model of the imaging sensors mentioned in section 1. The system includes a detector with a large number of photosensitive elements (pixels). The detector could be an electronic chip, film, or any other light sensitive device. The technology used to implement it is not important. The imaging optics typically include several elements. Even a relatively simple optical component has about five individual lenses within it. In our arbitrary system, there may be additional optical elements such as mirrors, prisms, or beam-splitters. In fact, the system could be comprised of multiple individual imaging systems, each with its own imaging optics and image detector.
Irrespective of its specific design, the purpose of an imaging system is to map incoming rays of light from the scene onto pixels on the detector. Each pixel collects light energy from a bundle of closely packed rays in any system that has a nonzero aperture size. However, the bundle can be represented by a single chief (or principal) ray when studying the geometric properties of the imaging system. As shown in Figure 2, the system maps the ray Pi to the pixel i. The path that an incoming ray traverses to the pixel can be arbitrarily complex.
Figure 2: An imaging system directs incoming light rays to its photosensitive elements (pixels). Each pixel collects light from a bundle of rays that pass through the finite aperture of the imaging system. However, we will assume there is a correspondence between each individual detector element i and a specific ray of light Pi entering the system.
If the imaging system is perspective, all the incoming light rays are projected directly onto the detector plane through a single point, namely, the effective pinhole of the perspective system. This is not true in an arbitrary system. For instance, it is clear from Figure 2 that the captured rays do not meet at a single effective viewpoint. The goal of this section is to present a geometrical model that can represent such imaging systems.
2.1 Raxels
It is convenient to represent the mapping from scene rays to pixels in a form that easily lends itself to manipulation and analysis. We can replace our physical pixels with an abstract mathematical equivalent we refer to as a ray pixel or a "raxel". A raxel is a virtual photosensitive element that measures the light energy of a compact bundle of rays which can be represented as a single principal incoming ray. A similar abstraction, the pencigraph, was proposed by Mann [16] for the perspective case. The abstract optical model of our virtual raxel is shown in Figure 3. Each raxel includes a pixel that measures light energy and imaging optics (a lens) that collects the bundle of rays around an incoming ray. In this section, we will focus on the geometric properties (locations and orientations) of raxels. However, each raxel can possess its own radiometric (brightness and wavelength) response as well as optical (point spread) properties. These non-geometric properties will be discussed in subsequent sections.
Figure 3: (a) A raxel is a virtual replacement for a real photosensitive element. It may be placed along the line of a principal ray of light entering the imaging system. In addition to location and orientation, a raxel may have radiometric and optical parameters. (b) The notation for a raxel used in this paper.
2.2 Plenoptic Function
What does an imaging system see? The input to the system is the plenoptic function [1]. The plenoptic function Φ(p, q, t, λ) gives the intensity of light at each point p in space, from direction q, at an instant of time t and wavelength λ. We specify position by (px, py, pz) and direction by two angles (qφ, qθ). Still images represent an integration of light energy over a short time period, given by the effective shutter speed. Further, each photosensitive element will average the plenoptic function across a range of wavelengths. Thus, we set aside time and wavelength by considering monochromatic still imaging. We will only consider the plenoptic function as a function of position and direction: Φ(p, q).
We may view a raxel as a delta function¹ centered at (p0, q0) over (p, q) space, as it measures the value of the plenoptic function Φ(p, q) at (p0, q0). Hence the parameters for a raxel are just position p0 and direction q0.
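As a concrete sketch, a raxel can be carried around as nothing more than a position, a direction, and a sampler that evaluates the plenoptic function at that single (p0, q0). The field and function names below are ours, chosen for illustration, not notation from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Raxel:
    """One virtual sensing element: a point p0 on the ray surface and
    the direction q0 of the incoming ray it measures."""
    position: Tuple[float, float, float]   # p0 = (px, py, pz)
    direction: Tuple[float, float]         # q0 = (q_phi, q_theta)

    def measure(self, phi: Callable) -> float:
        """Acts as a delta function on the plenoptic function:
        reports the value of Phi at exactly (p0, q0)."""
        return phi(self.position, self.direction)
```

A full model is then just an array of such elements indexed by image coordinates (x, y).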
2.3 Pencils of Rays and Ray Surfaces
From where does the system (with its raxels) see the plenoptic function? Each point in the image corresponds to a ray. Thus, the set of positions and directions determined by the set of rays is the part of the domain of the plenoptic function relevant to our system.²
¹If the raxel has a nonlinear radiometric response then the response must be linearized for the raxel to be a delta function.
²The related problem of representing the positions and directions corresponding to a light source was explored in [13].
The most general imaging system is described by a list of these rays. For clarity, we assume our image is two-dimensional.³ An image point is specified by (x, y). A scene point p imaged at (x, y) can be anywhere along the corresponding ray. To specify the point in space we define a parameter r along the ray. In the perspective case, r may be chosen as scene depth. A point p(x, y, r) imaged at (x, y) at depth r is imaged along a ray in the direction q(x, y, r). Thus, we see the plenoptic function only from those points in the range of p and q.
We may place a raxel anywhere along a ray.⁴ It will be more convenient for representing the model to arrange the raxels on a surface we call a ray surface. For example, consider a sphere enclosing our imaging system, as shown in Figure 4. For each photosensitive element i there is some point pi on the sphere that received a ray in the direction qi. Thus we can place our raxels on the sphere by assigning them the positions and directions (pi, qi). It is important to note that there could be several rays that enter the sphere at the same point but with different directions (see q1, q2 and q3 in Figure 4). Thus the direction q is not, in general, a function of p.
Figure 4: An imaging system may be modeled as a set of raxels on a sphere surrounding the imaging system. Each raxel i has a position pi on the sphere, and an orientation qi aligned with an incoming ray. Multiple raxels may be located at the same point (p1 = p2 = p3), but have different directions.
The choice of intersecting the incoming rays with a reference sphere is arbitrary.
³Many of the arrays of photosensitive elements in the imaging devices described in section 1 are one- or two-dimensional. Multi-camera systems can be represented by two-dimensional arrays parameterized by an extra parameter.
⁴Intensities usually do not change much along a ray (particularly when the medium is air) provided the displacement is small with respect to the total length of the ray.
In [9] and [14], it was suggested that the plenoptic function could also be restricted to a plane. The important thing is to choose some reference surface so that each incoming ray intersects this surface at only one point. If the incoming rays are parameterized by image coordinates (x, y), each ray will intersect a reference surface at one point p(x, y). We can write the ray surface as a function of (x, y) as
s(x, y) = (p(x, y), q(x, y)).   (1)
We can express the position of a point along the ray as p(x, y, r) = p(x, y) + r q(x, y). This allows us to express the relevant subset of the domain of the plenoptic function as the range of
C(x, y, r) = (p(x, y) + r q(x, y), q(x, y)).   (2)
In the case of an unknown imaging system we may measure s(x, y) along some ray surface. In the case of a known imaging system we are able to compute s(x, y) a priori. Using equation (2) we may express one ray surface in terms of another ray surface such as a sphere or a plane.
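This parameterization is straightforward numerically. The sketch below (our notation) evaluates a point along a raxel's ray and re-anchors the raxel at a new parameter value, which is exactly how one ray surface can be traded for another along the same rays.

```python
import numpy as np

def point_along_ray(p, q, r):
    """Scene point at parameter r along the ray through p with direction q:
    p(x, y, r) = p(x, y) + r * q(x, y)."""
    return np.asarray(p, float) + r * np.asarray(q, float)

def reanchor(p, q, r):
    """Slide a raxel along its own ray: the ray-surface sample
    (p + r q, q) describes exactly the same incoming ray as (p, q)."""
    return point_along_ray(p, q, r), np.asarray(q, float)
```

For instance, raxels sampled on an enclosing sphere can be re-anchored onto a plane, or (as the next section argues) onto the caustic.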
3 Caustics
While convenient, the choices of a plane or sphere as the reference surface come with their drawbacks. The direction q(x, y) of a raxel need not have any relation to the position p(x, y) on a general reference surface. There may be points on the surface that do not have incoming rays. At other points, several rays may pass through the same point.
For many imaging systems, there is a distinguished ray surface which is defined solely by the geometry of the rays. In Figure 5, we see the envelope of the incoming rays form a curve. On such surfaces, direction q is really a function of position p, and the incoming rays are tangent to the surface. This is a special case of a ray surface called a caustic. We argue that caustics are the logical place to locate our raxels.
Caustics have a number of different characterizations. The caustic often forms a surface to which all incoming rays are tangent. This does not happen in the perspective case, where the caustic is a point. Caustics can also be viewed as geometrical entities where there occurs a singularity (bunching) of the light rays. As a result, a caustic formed by rays of illumination generates a very bright region when it intersects a surface.⁵ In our context of imaging systems, a caustic is the locus of points where incoming rays most converge. Points on the caustic can therefore be viewed as the "locus of viewpoints" of the system. It is thus a natural place to locate the raxels of our abstract imaging model.
⁵For example, when light refracts through shallow water of a pool, bright curves can be seen where the caustics intersect the bottom.
Figure 5: The caustic is a good candidate for the ray surface of an imaging system as it is closely related to the geometry of the incoming rays; the incoming ray directions are tangent to the caustic.
3.1 Definition of a Caustic Surface
In section 2.3, we described the set of points and directions that can be detected by the imaging system. This set is described by the range of C(x, y, r) from equation (2). The position component functions are X = px(x, y, r), Y = py(x, y, r), and Z = pz(x, y, r). The map from (x, y, r) to (X, Y, Z) can be viewed as a change of coordinates. The caustic surface is defined as the locus of points in (X, Y, Z) space where this change of coordinates is singular.
3.2 Computing Caustics
The ray-to-image mapping of an imaging system may be obtained in two ways. One way is to derive the mapping or compute it numerically from optical components or parameters of an imaging system which is known a priori. Alternatively, a calibration method can be used to obtain a discrete form of the mapping (see section 5). In either case, our goal is to compute the caustic surface from the given mapping. When this mapping is known in closed form, analytic methods can be used to derive the caustic surface [4]. When a discrete approximation is given, a host of numerical methods [12], [18], [26] may be used. The method we use computes the caustic by finding all the points where the change in coordinates described above is singular [2], [3].
Equation (2) expresses C in terms of a known or measured ray surface s(x, y). The caustic is defined as the singularities in the change from (x, y, r) coordinates to (X, Y, Z) coordinates given by p. Singularities arise at those points (X, Y, Z) where the Jacobian matrix J of the transformation does not have full rank. We can find these points by computing the determinant of the Jacobian,
det J(x, y, r),   (3)
and setting it to zero. Since this is quadratic in r we can solve for r explicitly in terms of p, q, and their first derivatives with respect to x and y. Plugging this back into C gives us an expression for the caustic ray surface parameterized by (x, y) as in equation (1). If the optical system has translational or rotational symmetry then we may only consider one parameter, for example x, in the image plane. In this case the Jacobian becomes linear in r and the solution simplifies to
r(x) = -det(p'(x), q(x)) / det(q'(x), q(x)).   (4)
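For the symmetric, one-image-parameter case this singularity condition is easy to evaluate numerically. In the 2-D sketch below (our code, with derivatives approximated by central differences), the determinant det(p' + r q', q) of the Jacobian of C(x, r) = p(x) + r q(x) is linear in r, giving one caustic point per ray; as a sanity check, for a perspective bundle the recovered caustic collapses to the pinhole.

```python
import numpy as np

def caustic_point(p, q, x, dx=1e-5):
    """Caustic point of the 2-D ray family C(x, r) = p(x) + r q(x).
    Singular where det(p'(x) + r q'(x), q(x)) = 0, which is linear
    in r, so r = -det(p', q) / det(q', q)."""
    det = lambda a, b: a[0] * b[1] - a[1] * b[0]
    dp = (p(x + dx) - p(x - dx)) / (2 * dx)   # p'(x), central difference
    dq = (q(x + dx) - q(x - dx)) / (2 * dx)   # q'(x)
    r = -det(dp, q(x)) / det(dq, q(x))
    return p(x) + r * q(x)

# Sanity check: rays all passing through the origin (a perspective bundle).
p = lambda x: np.array([x, 1.0])                      # reference line y = 1
q = lambda x: np.array([x, 1.0]) / np.hypot(x, 1.0)   # ray directions
```

Calling `caustic_point(p, q, x)` for any x returns (numerically) the origin, the single effective viewpoint of this bundle.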
3.3 Field of View
Some parameters used to specify the general perspective camera model are derived from the ray surface representation in our general imaging model. Other parameters depend on the perspective assumption and are ill defined in our model. For example, field of view presents an ambiguity, since in the non-perspective case the rays may no longer form a simple rectangular cone. One candidate for a field of view is the range of q(x, y) over the image. This is the same as the Gauss map [11]. The Gauss map is a good approximation to the field of view when the scene points are distant relative to the size of the imaging system.
Other geometric parameters of a perspective imaging system, such as aspect ratio, and spherical aberration [25], may no longer be separable in the general imaging model from the parameterized ray surface.
4 Non-Geometric Raxel Parameters
Each raxel is treated as a very narrow field of view, perspective imaging system. Many of the conventional parameters associated with perspective or near-perspective systems may be attributed to a raxel.
4.1 Local Focal Length and Point Spread
An arbitrary imaging system cannot be expected to have a single global focal length. However, each raxel may be modeled to have its own focal length. We can compute each raxel's focal length by measuring its point spread function for several depths. A flexible approach models the point spread as an elliptical Gaussian. Each ellipse has a major and a minor axis; the major axis makes an angle ψ with the x-axis in the image (the angle ψ is only defined when the major and minor axes have different lengths). Each raxel thus has two focal lengths, fa and fb, as well as a focal orientation ψ.
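A minimal sketch of that point-spread model, with our own parameter names: an elliptical Gaussian with spreads sigma_a and sigma_b along axes rotated by psi. Fitting sigma_a and sigma_b to measured blur at several depths is what would yield the two local focal lengths.

```python
import numpy as np

def elliptical_gaussian_psf(x, y, sigma_a, sigma_b, psi):
    """Point spread at offset (x, y) from the raxel's image point:
    width sigma_a along the major axis, sigma_b along the minor axis,
    with the major axis at angle psi to the image x-axis."""
    u = x * np.cos(psi) + y * np.sin(psi)    # coordinate along major axis
    v = -x * np.sin(psi) + y * np.cos(psi)   # coordinate along minor axis
    return np.exp(-0.5 * ((u / sigma_a) ** 2 + (v / sigma_b) ** 2))
```

The peak sits at the raxel's image point, and the falloff is slower along the major axis than the minor one, which is what makes psi observable.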
4.2 Radiometry
The radiometric response, g, is expected to be smooth and monotonic and can be modeled as a polynomial [19]. If one can compute the radiometric response function of each raxel, one can linearize the response with respect to scene radiance, assuming the response is invertible.
We model the raxel irradiance E as scene radiance L times a spatially varying attenuation factor, h(x, y), corresponding to the image point (x, y). We call h(x, y) the fall-off function. This factor takes into account the finite size of the aperture and vignetting effects, which are linear in L.
4.3 Complete Imaging Model
The general imaging model consists of a set of raxels parameterized by x and y in pixel coordinates. The parameters associated with these raxels (see Figure 6) are (a) position and direction, that describe the ray surface of the caustic, (b) major and minor focal lengths as well as a focal orientation, (c) a radiometric response function, and (d) a fall-off constant for each pixel.
Figure 6: Each raxel has the above parameters of image coordinates, position and direction in space, major and minor focal lengths, focal orientation, radiometric response, and fall-off factor. These parameters are measured with respect to a coordinate frame fixed to the imaging system.
Camera parameters are separated into external and internal parameters. The external parameters are specified by a coordinate frame. The internal parameters (Figure 6) are specified with respect to that coordinate frame. In particular, for each raxel i, the parameters pi, qi are measured with respect to a single coordinate frame fixed to the imaging system. If the system is rotated or translated, these parameters will not change, but the coordinate frame will.
In the case of perspective projection, the essential [5] or fundamental [10] matrix provides the relationship between points in one image and lines in another image (of the same scene). In the general imaging model, this correspondence need no longer be projective. Nevertheless, a point in one image still corresponds to a curve in the other, which may be computed.
5 Finding the Model Parameters
In the preceding sections we described how to compute our model for a known optical system. In contrast, our goal in this section is to find the model parameters for an unknown system.
Citations
•
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year-old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques. Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small midterm projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each subfield, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.
4,146 citations
•
TL;DR: A scheme is developed for classifying the types of motion perceived by a human-like robot, and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.
Abstract: A scheme is developed for classifying the types of motion perceived by a human-like robot. It is assumed that the robot receives visual images of the scene using a perspective system model. Equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.
2,000 citations
•
Microsoft
TL;DR: In this article, the basic motion models underlying alignment and stitching algorithms are described, along with effective direct (pixel-based) and feature-based alignment algorithms and the blending algorithms used to produce seamless mosaics.
Abstract: This tutorial reviews image alignment and image stitching algorithms. Image alignment algorithms can discover the correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of panoramic mosaics. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to produce seamless mosaics. It ends with a discussion of open research problems in the area.
1,226 citations
••
TL;DR: MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework is presented.
Abstract: We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance with Iterative Clustering Estimation (ICE), a novel algorithm that iteratively combines feature clustering with robust pose estimation. Feature clustering quickly partitions the scene and produces object hypotheses. The hypotheses are used to further refine the feature clusters, and the two steps iterate until convergence. ICE is easy to parallelize, and easily integrates single and multi-camera object recognition and pose estimation. We also introduce a novel object hypothesis scoring function based on M-estimator theory, and a novel pose clustering algorithm that robustly handles recognition outliers. We achieve scalability and low latency with an improved feature matching algorithm for large databases, a GPU/CPU hybrid architecture that exploits parallelism at all levels, and an optimized resource scheduler. We provide extensive experimental results demonstrating state-of-the-art performance in terms of recognition, scalability, and latency in real-world robotic applications.
455 citations
Cites methods from "A general imaging model and a method for finding its parameters"
...The techniques we consider are the generalized camera (Grossberg and Nayar 2001) and the pose averaging (Viksten et al....
••
18 Jun 2003
TL;DR: The discrete structure from motion equations for generalized cameras are derived, and the corollaries to epipolar geometry are illustrated, which gives constraints on the optimal design of panoramic imaging systems constructed from multiple cameras.
Abstract: We illustrate how to consider a network of cameras as a single generalized camera in a framework proposed by Nayar (2001). We derive the discrete structure from motion equations for generalized cameras, and illustrate the corollaries to epipolar geometry. This formal mechanism allows one to use a network of cameras as if they were a single imaging device, even when they do not share a common center of projection. Furthermore, an analysis of structure from motion algorithms for this imaging model gives constraints on the optimal design of panoramic imaging systems constructed from multiple cameras.
323 citations
Cites background from "A general imaging model and a method for finding its parameters"
...Many natural camera systems, including catadioptric systems made with conical mirrors and incorrectly aligned lenses in standard cameras, have a set of viewpoints well characterized by a caustic [25]....
...The main contribution of this paper is to express a multi-camera system in this framework and then to derive the structure from motion constraint equations for this model....
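The generalized-camera framework described in the entry above treats every pixel of every camera in the network as an independent ray in a common world frame, with no shared center of projection. A minimal sketch of that idea in Python (the pinhole-plus-yaw parameterization and all names here are our own illustrative assumptions, not from the paper):

```python
import math

def pixel_ray(pixel, f, cam_center, cam_yaw):
    """Map a pixel of one camera in a network to a world ray (origin, direction).

    In a generalized camera the set of all such rays *is* the imaging model;
    the rays of different cameras have different origins, so no common
    center of projection is assumed.
    """
    x, y = pixel
    # Direction in the camera frame for a pinhole with focal length f.
    d = (x, y, f)
    n = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    d = tuple(c / n for c in d)
    # Rotate the direction about the vertical axis by the camera's yaw,
    # then attach the camera's own center as the ray origin.
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    d_world = (c * d[0] + s * d[2], d[1], -s * d[0] + c * d[2])
    return cam_center, d_world

# Two cameras with different centers: even for the same pixel, their
# rays have distinct origins, which is exactly what the generalized
# (non-central) model is designed to represent.
o1, d1 = pixel_ray((0.0, 0.0), 1.0, (0.0, 0.0, 0.0), 0.0)
o2, d2 = pixel_ray((0.0, 0.0), 1.0, (1.0, 0.0, 0.0), 0.0)
print(o1, o2)
```

Structure-from-motion for this model then constrains these rays directly, rather than image points under a single perspective projection.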
References
•
01 Jan 1959
TL;DR: In this book, the authors discuss various topics in optics, such as geometrical theories, image-forming instruments, and the optics of metals and crystals, including interference, interferometers, and diffraction.
Abstract: The book is comprised of 15 chapters that discuss various topics about optics, such as geometrical theories, image forming instruments, and optics of metals and crystals. The text covers the elements of the theories of interference, interferometers, and diffraction. The book tackles several behaviors of light, including its diffraction when exposed to ultrasonic waves.
19,815 citations
••
IBM
TL;DR: In this paper, a two-stage technique for 3D camera calibration using TV cameras and lenses is described, aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters.
Abstract: A new technique for three-dimensional (3D) camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses is described. The two-stage technique is aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters. The two-stage technique has advantage in terms of accuracy, speed, and versatility over existing state of the art. A critical review of the state of the art is given in the beginning. A theoretical framework is established, supported by comprehensive proof in five appendixes, and may pave the way for future research on 3D robotics vision. Test results using real data are described. Both accuracy and speed are reported. The experimental results are analyzed and compared with theoretical prediction. Recent effort indicates that with slight modification, the two-stage calibration can be done in real time.
5,940 citations
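The calibration technique above estimates, among other parameters, the effective focal length and radial lens distortion of a perspective camera. A minimal sketch of the kind of projection model such a calibration fits (a pinhole with a single radial-distortion term; the function and parameter names are our own, not from the paper):

```python
def project(point3d, f, k1, cx=0.0, cy=0.0):
    """Project a camera-frame 3D point through a pinhole model with
    one-term radial distortion.

    f  : effective focal length (one of the calibrated intrinsics)
    k1 : first radial-distortion coefficient
    cx, cy : principal point offset
    This is an illustrative model, not the paper's full two-stage method,
    which also recovers external position/orientation and scan parameters.
    """
    X, Y, Z = point3d
    # Ideal (undistorted) image coordinates via perspective projection.
    x, y = f * X / Z, f * Y / Z
    # Radial distortion displaces points along the radius by 1 + k1 * r^2.
    r2 = x * x + y * y
    d = 1.0 + k1 * r2
    return (cx + x * d, cy + y * d)

# A point on the optical axis is unaffected by radial distortion;
# off-axis points are pushed outward for positive k1.
print(project((0.0, 0.0, 2.0), f=35.0, k1=1e-4))
print(project((1.0, 0.0, 2.0), f=35.0, k1=1e-4))
```

Calibration in this setting amounts to choosing f, k1, and the pose so that projected known 3D points best match their observed image coordinates.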
•
03 Jan 1992
TL;DR: A new technique for three-dimensional camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses using two-stage technique has advantage in terms of accuracy, speed, and versatility over existing state of the art.
Abstract: A new technique for three-dimensional (3D) camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses is described. The two-stage technique is aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters. The two-stage technique has advantage in terms of accuracy, speed, and versatility over existing state of the art. A critical review of the state of the art is given in the beginning. A theoretical framework is established, supported by comprehensive proof in five appendixes, and may pave the way for future research on 3D robotics vision. Test results using real data are described. Both accuracy and speed are reported. The experimental results are analyzed and compared with theoretical prediction. Recent effort indicates that with slight modification, the two-stage calibration can be done in real time.
5,816 citations
••
01 Aug 1996
TL;DR: This paper describes a sampled representation for light fields that allows for both efficient creation and display of inward and outward looking views, and describes a compression system that is able to compress the light fields generated by more than a factor of 100:1 with very little loss of fidelity.
Abstract: A number of techniques have been proposed for flying through scenes by redisplaying previously rendered or digitized views. Techniques have also been proposed for interpolating between views by warping input images, using depth information or correspondences between multiple images. In this paper, we describe a simple and robust method for generating new views from arbitrary camera positions without depth information or feature matching, simply by combining and resampling the available images. The key to this technique lies in interpreting the input images as 2D slices of a 4D function: the light field. This function completely characterizes the flow of light through unobstructed space in a static scene with fixed illumination. We describe a sampled representation for light fields that allows for both efficient creation and display of inward and outward looking views. We have created light fields from large arrays of both rendered and digitized images. The latter are acquired using a video camera mounted on a computer-controlled gantry. Once a light field has been created, new views may be constructed in real time by extracting slices in appropriate directions. Since the success of the method depends on having a high sample rate, we describe a compression system that is able to compress the light fields we have generated by more than a factor of 100:1 with very little loss of fidelity. We also address the issues of antialiasing during creation, and resampling during slice extraction. CR Categories: I.3.2 [Computer Graphics]: Picture/Image Generation — Digitizing and scanning, Viewing algorithms; I.4.2 [Computer Graphics]: Compression — Approximate methods Additional keywords: image-based rendering, light field, holographic stereogram, vector quantization, epipolar analysis
4,426 citations
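The light-field abstract above rests on one idea: store radiance as a 4D function L(u, v, s, t) over two planes, and synthesize a new view by extracting a 2D slice of it, with no depth or correspondence information. A minimal sketch of that idea (the synthetic radiance table and nearest-sample lookup are our simplifying assumptions):

```python
def make_light_field(n):
    """Synthetic 4D table: L[u][v][s][t] holds a scalar radiance for the
    ray through point (u, v) on the camera plane and (s, t) on the focal
    plane. Here radiance is just u + v + s + t so results are checkable."""
    return [[[[u + v + s + t for t in range(n)] for s in range(n)]
             for v in range(n)] for u in range(n)]

def render_view(L, u, v):
    """A 'view' from camera-plane position (u, v) is simply the 2D slice
    of the 4D function over (s, t) -- no depth or matching is needed,
    which is the core of light-field rendering."""
    return L[u][v]

L = make_light_field(4)
view = render_view(L, 1, 2)
print(view[0][0])  # radiance of the ray (u=1, v=2, s=0, t=0)
```

Real systems interpolate between neighbouring (u, v, s, t) samples rather than indexing exactly, which is why the paper stresses a high sample rate and compression.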
•
TL;DR: Robot Vision is a broad overview of the field of computer vision, using a consistent notation based on a detailed understanding of the image-formation process, and provides a useful and current reference for professionals working in the fields of machine vision, image processing, and pattern recognition.
Abstract: From the Publisher:
This book presents a coherent approach to the fast-moving field of computer vision, using a consistent notation based on a detailed understanding of the image formation process. It covers even the most recent research and will provide a useful and current reference for professionals working in the fields of machine vision, image processing, and pattern recognition.
An outgrowth of the author's course at MIT, Robot Vision presents a solid framework for understanding existing work and planning future research. Its coverage includes a great deal of material that is important to engineers applying machine vision methods in the real world. The chapters on binary image processing, for example, help explain and suggest how to improve the many commercial devices now available. And the material on photometric stereo and the extended Gaussian image points the way to what may be the next thrust in commercialization of the results in this area.
Chapters in the first part of the book emphasize the development of simple symbolic descriptions from images, while the remaining chapters deal with methods that exploit these descriptions. The final chapter offers a detailed description of how to integrate a vision system into an overall robotics system, in this case one designed to pick parts out of a bin.
The many exercises complement and extend the material in the text, and an extensive bibliography will serve as a useful guide to current research.
3,783 citations