
Factorization methods for projective structure and motion

18 Jun 1996, pp. 845-851

Summary

1 Introduction

  • There has been considerable progress on scene reconstruction from multiple images in the last few years, aimed at applications ranging from very precise industrial measurement systems with several fixed cameras, to approximate structure and motion from real time video for active robot navigation.
  • The key result is that projective reconstruction is the best that can be done without calibration or metric information about the scene, and that it is possible from at least two views of point-scenes or three views of line-scenes [2, 3, 8, 6].
  • These are exactly the missing factorization scales mentioned above.
  • However they also require a depth recovery phase that is not present in the affine case.
  • For matrices of fixed low rank r (as here, where the rank is 3 for the affine method or 4 for the projective one), approximate factorizations can be computed in time O(mnr), i.e. directly proportional to the size of the input data.

2 Point Reconstruction

  • Modulo some scale factors λ_ip, the image points are projected from the world points: λ_ip x_ip = P_i X_p.
  • The λ's 'cancel out' the arbitrary scales of the image points, but there is still the freedom to: (i) arbitrarily rescale each world point X_p and each projection P_i; (ii) apply an arbitrary nonsingular 4×4 projective deformation T: X_p → T X_p, P_i → P_i T^{-1}.
  • Modulo changes of the λ_ip, the image projections are invariant under both of these transformations.
  • The scale factors λ_ip will be called projective depths.
  • In fact, [18, 19] argues that just as the key to calibrated stereo reconstruction is the recovery of Euclidean depth, the essence of projective reconstruction is precisely the recovery of a coherent set of projective depths modulo overall projection and world point rescalings.

2.1 Factorization

  • One practical method of factorizing W is the Singular Value Decomposition [12].
  • The decomposition is unique when the singular values are distinct, and can be computed stably and reliably in time O(kl·min(k, l)).
  • Ideally, one would like to find reconstructions in time O(mn) (the size of the input data).
  • Rank r matrices can be factorized in 'output sensitive' time O(mnr).
  • The method repeatedly sweeps the matrix, at each sweep guessing and subtracting a column-vector that ‘explains’ as much as possible of the residual error in the matrix columns.

2.2 Projective Depth Recovery

  • The key technical advance that makes this work possible is a practical method for estimating these using fundamental matrices and epipoles obtained from the image data.
  • These turn out to be the expressions for the fundamental matrix F_ij and epipole e_ji of camera j in image i in terms of projection matrix components [19, 4].
  • The two methods give similar results except when there are many (>40) images, when the shorter chains of the parallel system become more robust.
  • Theoretically this is not a problem as the overall scales are arbitrary, but it could easily make the factorization phase numerically ill-conditioned.
  • For each point the depths are estimated as above, and then: (i) each row of the estimated depth matrix is rescaled to have length √n; (ii) each column of the resulting matrix is rescaled to length √m.

3 Line Reconstruction

  • 3D lines can also be reconstructed using the above techniques.
  • In fact, epipolar transfer and depth recovery can be done in one step.
  • Let y_i stand for the rescaled via-points P_i Y.
  • The required fundamental matrices can not be found directly from line matches, but they can be estimated from point matches, or from the trilinear line matching constraints (trivalent tensor) [6, 14, 4, 19, 18].
  • This works with the 3m × 2n_lines 'W' matrix of via-points, iteratively rescaling all coordinates of each image (triple of rows) and all coordinates of each line (pair of columns) until an approximate equilibrium is reached, where the overall mean square size of each coordinate is O(1) in each case.

4 Implementation

  • This section summarizes the complete algorithm for factorization-based 3D projective reconstruction from image points and lines, and discusses a few important implementation details and variants.
  • Build and balance the depth matrix λ_ip, and use it to build the rescaled point measurement matrix W.
  • 4) For each line choose two via-points and transfer them to the other images using the transfer equations (2).
  • 6) Un-standardize the projection matrices (see below).
  • The basic idea is to choose working coordinates that reflect the least squares trade-offs implicit in the factorization algorithm.

4.1 Generalizations & Variants

  • I have implemented and experimented with a number of variants of the above algorithm, the more promising of which are featured in the experiments described below.
  • The projective depths depend on the 3D structure, which in turn derives from the depths.
  • With SVD-based factorization and standardized image coordinates the iteration turns out to be extremely stable, and always improves the recovered structure slightly (often significantly for lines).
  • The ‘linear’ factorization-based projective reconstruction methods described above are a suitable starting point for more refined nonlinear least-squares estimation.
  • This can take account of image point error models, camera calibrations, or Euclidean constraints, as in the work of Szeliski and Kang [16], Hartley [5] and Mohr, Boufama and Brand [10].

5 Experiments

  • To quantify the performance of the various algorithms, I have run a large number of simulations using synthetic data, and also tested the algorithms on manually matched primitives derived from real images.
  • Reconstruction error is measured over 50 trials, after least-squares projective alignment with the true 3D structure.
  • Figure 1: Mean 3D reconstruction error for points and lines, vs. noise, number of views and number of primitives. (Methods compared: parallel trilinear SVD, serial bilinear SVD, parallel bilinear SVD, iterative bilinear SVD, and bilinear SVD + Levenberg-Marquardt.)
  • Iterating the SVD makes a small improvement, and nonlinear least-squares is slightly more accurate again.
  • The rapid increase in error at scales below 0.1 is caused by floating-point truncation error.

6 Discussion & Conclusions

  • Within the limitations of the factorization paradigm, factorization-based projective reconstruction seems quite successful.
  • For points, the methods studied have proved simple, stable, and surprisingly accurate.
  • Fixed-rank factorization works well, although (as might be expected) SVD always produces slightly more accurate results.
  • All of these allow various trade-offs between redundancy, computation and implementation effort.
  • Projective structure and motion can be recovered from multiple perspective images of a scene consisting of points and lines, by estimating fundamental matrices and epipoles from the image data, using these to rescale the image measurements, and then factorizing the resulting rescaled measurement matrix using either SVD or a fast approximate factorization algorithm.


HAL Id: inria-00548364
https://hal.inria.fr/inria-00548364
Submitted on 20 Dec 2010
Factorization Methods for Projective Structure and Motion
Bill Triggs
To cite this version:
Bill Triggs. Factorization Methods for Projective Structure and Motion. International Conference on Computer Vision & Pattern Recognition (CVPR '96), Jun 1996, San Francisco, United States. pp. 845-851, 10.1109/CVPR.1996.517170. inria-00548364

Factorization Methods for Projective Structure and Motion
Bill Triggs
INRIA Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot Saint-Martin, France.
Bill.Triggs@inrialpes.fr http://www.inrialpes.fr/MOVI/Triggs
Abstract
This paper describes a family of factorization-based algorithms that recover 3D projective structure and motion from multiple uncalibrated perspective images of 3D points and lines. They can be viewed as generalizations of the Tomasi-Kanade algorithm from affine to fully perspective cameras, and from points to lines. They make no restrictive assumptions about scene or camera geometry, and unlike most existing reconstruction methods they do not rely on 'privileged' points or images. All of the available image data is used, and each feature in each image is treated uniformly. The key to projective factorization is the recovery of a consistent set of projective depths (scale factors) for the image points: this is done using fundamental matrices and epipoles estimated from the image data. We compare the performance of the new techniques with several existing ones, and also describe an approximate factorization method that gives similar results to SVD-based factorization, but runs much more quickly for large problems.

Keywords: Multi-image Structure, Projective Reconstruction, Matrix Factorization.
1 Introduction
There has been considerable progress on scene reconstruction from multiple images in the last few years, aimed at applications ranging from very precise industrial measurement systems with several fixed cameras, to approximate structure and motion from real time video for active robot navigation. One can usefully begin by ignoring the issues of camera calibration and metric structure, initially recovering the scene up to an overall projective transformation and only later adding metric information if needed [5, 10, 1]. The key result is that projective reconstruction is the best that can be done without calibration or metric information about the scene, and that it is possible from at least two views of point-scenes or three views of line-scenes [2, 3, 8, 6].
Most current reconstruction methods either work only for the minimal number of views (typically two), or single out a few 'privileged' views for initialization before bootstrapping themselves to the multi-view case [5, 10, 9]. For robustness and accuracy, there is a need for methods that uniformly take account of all the data in all the images, without making restrictive special assumptions or relying on privileged features or images for initialization. The orthographic and paraperspective structure/motion factorization methods of Tomasi, Kanade and Poelman [17, 11] partially fulfill these requirements, but they only apply when the camera projections are well approximated by affine mappings. This happens only for cameras viewing small, distant scenes, which is seldom the case in practice. Factorization methods for perspective images are needed, however it has not been clear how to find the unknown projective scale factors of the image measurements that are required for this. (In the affine case the scales are constant and can be eliminated).

[Footnote: To appear in CVPR'96. This work was supported by an EC HCM grant and INRIA Rhône-Alpes. I would like to thank Peter Sturm and Richard Hartley for enlightening discussions.]
As part of the current blossoming of interest in multi-image reconstruction, Shashua [14] recently extended the well-known two-image epipolar constraint to a trilinear constraint between matching points in three images. Hartley [6] showed that this constraint also applies to lines in three images, and Faugeras & Mourrain [4] and I [18, 19] completed that corner of the puzzle by systematically studying the constraints for lines and points in any number of images. A key aspect of the viewpoint presented in [18, 19] is that projective reconstruction is essentially a matter of recovering a coherent set of projective depths, projective scale factors that represent the depth information lost during image projection. These are exactly the missing factorization scales mentioned above. They satisfy a set of consistency conditions called 'joint image reconstruction equations' [18], that link them together via the corresponding image point coordinates and the various inter-image matching tensors.

In the MOVI group, we have recently been developing projective structure and motion algorithms based on this 'projective depth' picture. Several of these methods use the factorization paradigm, and so can be viewed as generalizations of the Tomasi-Kanade method from affine to fully perspective projections. However they also require a depth recovery phase that is not present in the affine case. The basic reconstruction method for point images was introduced in [15]. The current paper extends this in several directions, and presents a detailed assessment of the performance of the new methods in comparison to existing techniques such as Tomasi-Kanade factorization and Levenberg-Marquardt nonlinear least squares. Perhaps the most significant result in the paper is the extension of the method to work for lines as well as points, but I will also show how the factorization can be iteratively 'polished' (with results similar to nonlinear least squares iteration), and how any factorization-based method can be speeded up significantly for large problems, by using an approximate fixed-rank factorization technique in place of the Singular Value Decomposition.
The factorization paradigm has two key attractions that are only enhanced by moving from the affine to the projective case: (i) All of the data in all of the images is treated uniformly; there is no need to single out 'privileged' features or images for special treatment; (ii) No initialization is required and convergence is virtually guaranteed by the nature of the numerical methods used. Factorization also has some well known disadvantages:
1) Every primitive must be visible in every image. This is unrealistic in practice given occlusion and extraction and tracking failures.
2) It is not possible to incorporate a full statistical error model for the image data, although some sort of implicit least-squares trade-off is made.
3) It is not clear how to incorporate additional points or images incrementally: the whole calculation must be redone.
4) SVD-based factorization is slow for large problems.

Only the speed problem will be considered here. SVD is slow because it was designed for general, full rank matrices. For matrices of fixed low rank r (as here, where the rank is 3 for the affine method or 4 for the projective one), approximate factorizations can be computed in time O(mnr), i.e. directly proportional to the size of the input data.

The Tomasi-Kanade 'hallucination' process can be used to work around missing data [17], as in the affine case. However this greatly complicates the method and dilutes some of its principal benefits. There is no obvious solution to the error modelling problem, beyond using the factorization to initialize a nonlinear least squares routine (as is done in some of the experiments below). It would probably be possible to develop incremental factorization update methods, although there do not seem to be any in the standard numerical algebra literature.

The rest of the paper outlines the theory of projective factorization for points and lines, describes the final algorithms and implementation, reports on experimental results using synthetic and real data, and concludes with a discussion. The full theory of projective depth recovery applies equally to two, three and four image matching tensors, but throughout this paper I will concentrate on the two-image (fundamental matrix) case for simplicity. The underlying theory for the higher valency cases can be found in [18].
2 Point Reconstruction
We need to recover 3D structure (point locations) and motion (camera calibrations and locations) from m uncalibrated perspective images of a scene containing n 3D points. Without further information it is only possible to reconstruct the scene up to an overall projective transformation [2, 8], so we will work in homogeneous coordinates with respect to arbitrary projective coordinate frames. Let X_p (p = 1, ..., n) be the unknown homogeneous 3D point vectors, P_i (i = 1, ..., m) the unknown 3×4 image projections, and x_ip the measured homogeneous image point vectors. Modulo some scale factors λ_ip, the image points are projected from the world points: λ_ip x_ip = P_i X_p. Each object is defined only up to rescaling. The λ's 'cancel out' the arbitrary scales of the image points, but there is still the freedom to: (i) arbitrarily rescale each world point X_p and each projection P_i; (ii) apply an arbitrary nonsingular 4×4 projective deformation T: X_p → T X_p, P_i → P_i T^{-1}. Modulo changes of the λ_ip, the image projections are invariant under both of these transformations.

The scale factors λ_ip will be called projective depths. With correctly normalized points and projections they become true optical depths, i.e. orthogonal distances from the focal planes of the cameras. (NB: this is not the same as Shashua's 'projective depth' [13]). In general, m + n − 1 projective depths can be set arbitrarily by choosing appropriate scales for the X_p and P_i. However, once this is done the remaining (m − 1)(n − 1) degrees of freedom contain real information that can be used for 3D reconstruction: taken as a whole the projective depths have a strong internal coherence. In fact, [18, 19] argues that just as the key to calibrated stereo reconstruction is the recovery of Euclidean depth, the essence of projective reconstruction is precisely the recovery of a coherent set of projective depths modulo overall projection and world point rescalings. Once this is done, reconstruction reduces to choosing a projective basis for a certain abstract three dimensional 'joint image' subspace, and reading off point coordinates with respect to it.
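The depth-counting above is easy to sanity-check numerically. The following sketch (purely illustrative) verifies that the m·n projective depths split exactly into the m + n − 1 gauge freedoms plus the (m − 1)(n − 1) informative degrees of freedom:

```python
# Check the identity m*n = (m + n - 1) + (m - 1)*(n - 1): every projective
# depth is either fixed by a gauge choice (rescaling some X_p or P_i) or
# carries real reconstruction information.
for m in range(2, 20):          # number of images
    for n in range(2, 20):      # number of points
        gauge = m + n - 1
        informative = (m - 1) * (n - 1)
        assert m * n == gauge + informative
print("depth-counting identity holds")
```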
2.1 Factorization
Gather the point projections into a single 3m × n matrix equation:

$$
W \;\equiv\;
\begin{pmatrix}
\lambda_{11} x_{11} & \lambda_{12} x_{12} & \cdots & \lambda_{1n} x_{1n} \\
\lambda_{21} x_{21} & \lambda_{22} x_{22} & \cdots & \lambda_{2n} x_{2n} \\
\vdots & \vdots & & \vdots \\
\lambda_{m1} x_{m1} & \lambda_{m2} x_{m2} & \cdots & \lambda_{mn} x_{mn}
\end{pmatrix}
=
\begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_m \end{pmatrix}
\begin{pmatrix} X_1 & X_2 & \cdots & X_n \end{pmatrix}
$$

Hence, with a consistent set of projective depths the rescaled measurement matrix W has rank at most 4. Any rank 4 matrix can be factorized into some 3m×4 matrix of 'projections' multiplying a 4×n matrix of 'points' as shown, and any such factorization corresponds to a valid projective reconstruction: the freedom in factorization is exactly a 4×4 nonsingular linear transformation P → P T^{-1}, X → T X, which can be regarded as a projective transformation of the reconstructed 3D space.
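The rank-4 property and the factorization can be sketched concretely in a few lines of NumPy on synthetic data (all names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 20                                  # images, points (illustrative sizes)

# Synthetic ground truth: 3x4 projections P_i and homogeneous 3D points X_p.
P = rng.standard_normal((m, 3, 4))
X = rng.standard_normal((4, n))

# With consistent depths, lambda_ip * x_ip = P_i X_p, so stacking the
# depth-rescaled image points gives a 3m x n matrix W of rank at most 4.
W = np.vstack([P[i] @ X for i in range(m)])

U, s, Vt = np.linalg.svd(W, full_matrices=False)
assert s[4] < 1e-10 * s[0]                    # singular values beyond the 4th vanish

# Absorb the singular values into V (as in the text): W = P_hat X_hat.
P_hat = U[:, :4]                              # 3m x 4 stacked 'projections'
X_hat = np.diag(s[:4]) @ Vt[:4]               # 4 x n reconstructed 'points'
assert np.allclose(P_hat @ X_hat, W)
```

Any invertible 4×4 T applied as (P_hat T^{-1}, T X_hat) gives an equally valid factorization, which is exactly the projective gauge freedom described above.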
One practical method of factorizing W is the Singular Value Decomposition [12]. This decomposes an arbitrary k×l matrix W_{k×l} of rank r into a product W_{k×l} = U_{k×r} D_{r×r} V_{l×r}^T, where the columns of V_{l×r} and U_{k×r} are orthonormal bases for the input (co-kernel) and output (range) spaces of W_{k×l}, and D_{r×r} is a diagonal matrix of positive decreasing 'singular values'. The decomposition is unique when the singular values are distinct, and can be computed stably and reliably in time O(kl·min(k, l)). The matrix D of singular values can be absorbed into either U or V to give a decomposition of the projection/point form P X. (I absorb it into V to form X).

The SVD has been used by Tomasi, Kanade and Poelman [17, 11] for their affine (orthographic and paraperspective) reconstruction techniques. The current application can be viewed as a generalization of these methods to projective reconstruction. The projective case leads to slightly larger matrices (3m × n rank 4 as opposed to 2m × n rank 3), but is actually simpler than the affine case as there is no need to subtract translation terms or apply nonlinear constraints to guarantee the orthogonality of the projection matrices.

Ideally, one would like to find reconstructions in time O(mn) (the size of the input data). SVD is a factor of O(min(3m, n)) slower than this, which can be significant if there are many points and images. Although SVD is probably near-optimal for full-rank matrices, rank r matrices can be factorized in 'output sensitive' time O(mnr). I have experimented with one such 'fixed rank' method, and find it to be almost as accurate as SVD and significantly faster for large problems. The method repeatedly sweeps the matrix, at each sweep guessing and subtracting a column-vector that 'explains' as much as possible of the residual error in the matrix columns. A rank r matrix is factorized in r sweeps. When the matrix is not exactly of rank r the guesses are not quite optimal and it is useful to include further sweeps (say 2r in total) and then SVD the matrix of extracted columns to estimate the best r combinations of them.
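The kind of greedy sweep method described here can be sketched as follows (an interpretation of the description above, not the author's actual code; the column-selection rule is one plausible choice of 'guess'):

```python
import numpy as np

def fixed_rank_factorize(W, r):
    """Approximate fixed-rank factorization by greedy column sweeps.

    Each sweep extracts the residual column of largest norm and subtracts
    the component it 'explains' from every column; after up to 2r sweeps,
    a small SVD of the extracted columns picks the best rank-r basis.
    Each sweep costs O(mn), giving O(mnr) overall.
    """
    R = W.astype(float).copy()
    cols = []
    for _ in range(2 * r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))   # largest residual column
        u = R[:, j].copy()
        norm = np.linalg.norm(u)
        if norm < 1e-12:
            break                                        # residual already explained
        u /= norm
        cols.append(u)
        R -= np.outer(u, u @ R)                          # remove what u explains
    B = np.stack(cols, axis=1)                           # extracted orthonormal columns
    U, _, _ = np.linalg.svd(B @ (B.T @ W), full_matrices=False)
    U = U[:, :r]                                         # best rank-r column basis
    return U, U.T @ W                                    # W is approximately U (U^T W)
```

On an exactly rank-r matrix the sweeps terminate after r extractions and the factorization is exact; on noisy data the extra sweeps plus the final small SVD recover the dominant rank-r part.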
2.2 Projective Depth Recovery
The above factorization techniques can only be used if a self-consistent set of projective depths λ_ip can be found. The key technical advance that makes this work possible is a practical method for estimating these using fundamental matrices and epipoles obtained from the image data. The full theory can be found in [18], which also describes how to use trivalent and quadrivalent matching tensors for depth recovery. Here we briefly sketch the fundamental matrix case. The image projections λ_ip x_ip = P_i X_p imply that the 6×5 matrix

$$
\begin{pmatrix} P_i & \lambda_{ip} x_{ip} \\ P_j & \lambda_{jp} x_{jp} \end{pmatrix}
=
\begin{pmatrix} P_i \\ P_j \end{pmatrix}
\begin{pmatrix} I_{4\times 4} & X_p \end{pmatrix}
$$

has rank at most 4, so all of its 5×5 minors vanish. Expanding by cofactors in the last column gives homogeneous linear equations in the components of λ_ip x_ip and λ_jp x_jp, with coefficients that are 4×4 determinants of projection matrix rows. These turn out to be the expressions for the fundamental matrix F_ij and epipole e_ji of camera j in image i in terms of projection matrix components [19, 4]. The result is the projective depth recovery equation:

$$ (F_{ij}\, x_{jp})\; \lambda_{jp} \;=\; (e_{ji} \wedge x_{ip})\; \lambda_{ip} \qquad (1) $$

This says two things: (i) The epipolar line of x_jp in image i is the same as the line through the corresponding point x_ip and epipole e_ji (as is well known); (ii) With the correct projective depths and scalings for F_ij and e_ji, the two terms have exactly the same size. The equality is exact, not just up to scale. This is the new result that allows us to recover projective depths using fundamental matrices and epipoles. Analogous results based on higher order matching tensors can be found in [18].
It is straightforward to recover projective depths using (1). Each instance of it linearly relates the depths of a single 3D point in two images. By estimating a sufficient number of fundamental matrices and epipoles, we can amass a system of homogeneous linear equations that allows the complete set of depths for a given point to be found, up to an arbitrary overall scale factor. At a minimum, this can be done by selecting any set of m − 1 equations that link the m images into a single connected graph. With such a non-redundant set of equations the depths for each point p can be found trivially by chaining together the solutions for each image, starting from some arbitrary initial value such as λ_1p = 1. Solving the depth recovery equation in least squares gives a simple recursion relation for λ_ip in terms of λ_jp:

$$ \lambda_{ip} \;:=\; \frac{(e_{ji} \wedge x_{ip}) \cdot (F_{ij}\, x_{jp})}{\| e_{ji} \wedge x_{ip} \|^2}\; \lambda_{jp} $$

If additional depth recovery equations are used, this simple recursion must be replaced by a redundant (and hence potentially more robust) homogeneous linear system. However, care is needed. The depth recovery equations are sensitive to the scale factors chosen for the F's and e's, and these can not be recovered directly from the image data. This is irrelevant when a single chain of equations is used, as rescalings of F and e affect all points equally and hence amount to rescalings of the corresponding projection matrices. However with redundant equations it is essential to choose a mutually self-consistent set of scales for the F's and e's. I will not describe this process here, except to note that the consistency condition is the Grassmann identity F_kj e_ij = e_ik ∧ e_jk [18].

It is still unclear what the best trade-off between economy and robustness is for depth recovery. This paper considers only two simple non-redundant choices: either the images are taken pairwise in sequence, F_21, F_32, ..., F_{m,m−1}, or all subsequent images are scaled in parallel from the first, F_21, F_31, ..., F_m1. It might seem that long chains of rescalings would prove numerically unstable, but in practice depth recovery is surprisingly well conditioned. Both serial and parallel chains work very well despite their non-redundancy and chain length or reliance on a 'key' image. The two methods give similar results except when there are many (>40) images, when the shorter chains of the parallel system become more robust. Both are stable even when epipolar point transfer is ill-conditioned (e.g. for a camera moving in a straight line, when the epipolar lines of different images coincide): the image observations act as stable 'anchors' for the transfer process.
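A serial-chain implementation of the recursion is easy to sketch (illustrative NumPy, not the paper's code; the fundamental matrices and epipoles are assumed already estimated, with the mutually consistent scaling that equation (1) requires):

```python
import numpy as np

def chain_depths(x, F, e):
    """Recover projective depths lambda_ip by chaining the recursion
        lambda_ip = ((e_ji ^ x_ip) . (F_ij x_jp) / ||e_ji ^ x_ip||^2) lambda_jp
    serially through the images (j = i-1), starting from lambda_1p = 1.

    x    : (m, n, 3) array of homogeneous image points x_ip
    F[i] : fundamental matrix mapping points of image i to epipolar
           lines in image i+1
    e[i] : epipole of camera i in image i+1
    """
    m, n, _ = x.shape
    lam = np.ones((m, n))
    for p in range(n):
        for i in range(1, m):
            lhs = np.cross(e[i - 1], x[i, p])    # e_ji ^ x_ip
            rhs = F[i - 1] @ x[i - 1, p]         # F_ij x_jp
            lam[i, p] = (lhs @ rhs) / (lhs @ lhs) * lam[i - 1, p]
    return lam
```

With exact, consistently scaled F's and e's this reproduces the true depths of each point up to the arbitrary per-point scale fixed by λ_1p = 1; with redundant equations the recursion would be replaced by a least-squares solve over all pair constraints, as noted above.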
Balancing: A further point is that with arbitrary choices of scale for the fundamental matrices and epipoles, the average size of the recovered depths might tend to increase or decrease exponentially during the solution-chaining process. Theoretically this is not a problem as the overall scales are arbitrary, but it could easily make the factorization phase numerically ill-conditioned. To counter this the recovered matrix of projective depths must be balanced after it has been built, by judicious overall row and column rescalings. The process is very simple. The image points are normalized on input, so ideally all of the scale factors λ_ip should have roughly the same order of magnitude, O(1) say. For each point the depths are estimated as above, and then: (i) each row (image) of the estimated depth matrix is rescaled to have length √n; (ii) each column (point) of the resulting matrix is rescaled to length √m. This process is repeated until it roughly converges, which happens very quickly (within 2–3 iterations).
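The balancing step is a short alternating rescaling loop; a minimal sketch (illustrative names) is:

```python
import numpy as np

def balance(lam, iters=3):
    """Balance the m x n projective-depth matrix by alternately rescaling
    each row (image) to length sqrt(n) and each column (point) to length
    sqrt(m); a few iterations suffice, as the text notes."""
    m, n = lam.shape
    lam = lam.astype(float).copy()
    for _ in range(iters):
        lam *= np.sqrt(n) / np.linalg.norm(lam, axis=1, keepdims=True)   # rows
        lam *= np.sqrt(m) / np.linalg.norm(lam, axis=0, keepdims=True)   # columns
    return lam
```

After balancing, all entries are O(1) even if the chaining produced exponentially growing scales, which is what keeps the subsequent factorization well conditioned.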
3 Line Reconstruction
3D lines can also be reconstructed using the above techniques. A line L can be represented by any two 3D points lying on it, say Y and Z. In image i, L projects to some image line l_i, and Y and Z project to image points y_i and z_i lying on l_i. The points {y_i | i = 1, ..., m} are in epipolar correspondence, so they can be used in the depth recovery equation (1) to reconstruct Y, and similarly for Z. The representatives Y and Z can be fixed implicitly by choosing y_1 and z_1 arbitrarily on l_1 in the first image, and using the epipolar constraint to transfer these to the corresponding points in the remaining images: y_i lies on both l_i and the epipolar line of y_1, so is located at their intersection.
In fact, epipolar transfer and depth recovery can be done in one step. Let y_i stand for the rescaled via-points P_i Y. Substitute these into equation (1), cross-product with l_i, expand, and simplify using l_i · y_i = 0:

$$
l_i \wedge (F_{ij}\, y_j) \;=\; l_i \wedge (e_{ji} \wedge y_i)
\;=\; -(l_i \cdot e_{ji})\, y_i + (l_i \cdot y_i)\, e_{ji}
\;=\; -(l_i \cdot e_{ji})\, y_i
\qquad (2)
$$

Up to a factor of l_i · e_ji, the intersection l_i ∧ (F_ij y_j) of l_i with the epipolar line of y_j automatically gives the correct projective depth for reconstruction. Hence, factorization-based line reconstruction can be implemented by choosing a suitable (widely spaced) pair of via-points on each line in the first image, and then chaining together instances of equation (2) to find the corresponding, correctly scaled via-points in the other images. The required fundamental matrices can not be found directly from line matches, but they can be estimated from point matches, or from the trilinear line matching constraints (trivalent tensor) [6, 14, 4, 19, 18]. Alternatively, the trivalent tensor can be used directly: in tensorial notation [18], the trivalent via-point transfer equation is

$$ l_{B_k}\, G^{A_i B_k}_{C_j}\, y^{C_j} \;=\; (l_{B_k}\, e^{B_k}_{j})\, y^{A_i} $$
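The one-step transfer of equation (2) reduces to a single cross product once F_ij is available; a minimal sketch (illustrative, with F and e assumed consistently scaled as before):

```python
import numpy as np

def transfer_via_point(l_i, F_ij, y_j):
    """Equation (2): intersect the image line l_i with the epipolar line
    of the via-point y_j from image j. With consistent scalings the result
    is y_i scaled by -(l_i . e_ji), i.e. it already carries the correct
    projective depth for the factorization."""
    return np.cross(l_i, F_ij @ y_j)
```

The returned point lies on l_i by construction, so no separate intersection or depth recovery step is needed per image.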
As with points, redundant equations may be included if and only if a self-consistent normalization is chosen for the fundamental matrices and epipoles. For numerical stability, it is essential to balance the resulting via-points (i.e. depth estimates). This works with the 3m × 2n_lines 'W' matrix of via-points, iteratively rescaling all coordinates of each image (triple of rows) and all coordinates of each line (pair of columns) until an approximate equilibrium is reached, where the overall mean square size of each coordinate is O(1) in each case. To ensure that the via-points representing each line are on average well separated, I also orthonormalize the two 3m-component column vectors for each line with respect to one another. The via-point equations (2) are linear and hence invariant with respect to this, but it does of course change the 3D representatives Y and Z recovered for each line.
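The via-point separation step amounts to a Gram-Schmidt pass over each line's pair of stacked 3m-vectors; a minimal sketch (illustrative names):

```python
import numpy as np

def orthonormalize_pair(wY, wZ):
    """Orthonormalize a line's two 3m-component via-point columns against
    each other so its representatives stay well separated. Equation (2) is
    linear, so this only changes the 3D representatives Y and Z, not the
    reconstructed line itself."""
    wY = wY / np.linalg.norm(wY)
    wZ = wZ - (wY @ wZ) * wY
    return wY, wZ / np.linalg.norm(wZ)
```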
4 Implementation
This section summarizes the complete algorithm
for factorization-based 3D projective reconstruction from im-
age points and lines, and discusses a few important implemen-
tation details and variants. The algorithm goes as follows:
0) Extract and match points and lines across all images.
1) Standardize all image coordinates (see below).
2) Estimate a set of fundamental matrices and epipoles suffi-
cienttochainalltheimagestogether(e.g. using pointmatches).
3) For each point, estimate the projective depths using equa-
tion (1). Build and balance the depth matrix
ip
, and use it to
build the rescaled point measurement matrix
W
.
4) For each line choose two via-pointsandtransferthemto the
other images using the transfer equations (2). Build and bal-
ance the rescaled line via-point matrix.
5) Combine the line and point measurement matrices into a
3
m
(
n
points
+ 2
n
lines
)
data matrix and factorize it using either
SVD or the fixed-rank method. Recover 3D projective struc-
ture (point and via-point coordinates) and motion (projection
matrices) from the factorization.
6) Un-standardize the projection matrices (see below).

References

• Numerical Recipes: The Art of Scientific Computing (book, 1986; 2nd ed. 1992). A complete text and reference on scientific computing, covering among much else the SVD, QR and Cholesky decompositions, and nonlinear methods.

• Book (1989). Cited in this paper for the factorization method: "One practical method of factorizing W is the Singular Value Decomposition [12]."

• Journal article. Cited in this paper for nonlinear least squares: "The standard workhorse for such problems is Levenberg-Marquardt iteration [12], so for comparison with the linear methods I have implemented simple L-M based projective reconstruction algorithms."

• C. Tomasi and T. Kanade (journal article, 1992). Develops the factorization method for shape and motion under orthography: the 2F×P measurement matrix of P points tracked through F frames has rank 3 under orthographic projection, and the SVD factors it into object shape and camera rotation. Two of the three translation components are computed in a preprocessing stage, and partially filled-in measurement matrices arising from occlusions or tracking failures can be handled.

• Camera self-calibration from point matches in image sequences (book chapter, 19 May 1992). Shows that a camera can be calibrated without a known 3D calibration object, using only points of interest tracked through the images as the camera moves. Cited in this paper as background: "The key result is that projective reconstruction is the best that can be done without calibration or metric information about the scene, and that it is possible from at least two views of point-scenes or three views of line-scenes [2, 3, 8, 6]."

Frequently Asked Questions (13)
Q1. What contributions have the authors mentioned in the paper "Factorization methods for projective structure and motion" ?

This paper describes a family of factorization-based algorithms that recover 3D projective structure and motion from multiple uncalibrated perspective images of 3D points and lines. The key to projective factorization is the recovery of a consistent set of projective depths (scale factors) for the image points: this is done using fundamental matrices and epipoles estimated from the image data. The authors compare the performance of the new techniques with several existing ones, and also describe an approximate factorization method that gives similar results to SVD-based factorization, but runs much more quickly for large problems.

Future work will expand on this. Summary: Projective structure and motion can be recovered from multiple perspective images of a scene consisting of points and lines, by estimating fundamental matrices and epipoles from the image data, using these to rescale the image measurements, and then factorizing the resulting rescaled measurement matrix using either SVD or a fast approximate factorization algorithm. 

Fundamental matrices and epipoles are estimated using the linear least squares method with all the available point matches, followed by a supplementary SVD to project the fundamental matrices to rank 2 and find the epipoles. 
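The estimation step just described can be sketched as the classical linear method over all matches, followed by the supplementary SVD. This is a minimal illustration under my own naming; coordinate standardization and outlier handling are omitted:

```python
import numpy as np

def linear_fundamental(x1, x2):
    """Estimate F from n >= 8 homogeneous point matches (two 3 x n
    arrays): solve x2^T F x1 = 0 in linear least squares, project F to
    rank 2 with a second SVD, and read off the epipoles as the null
    vectors of F and F^T."""
    n = x1.shape[1]
    A = np.empty((n, 9))
    for k in range(n):
        # Row k holds the coefficients of vec(F) in x2_k^T F x1_k = 0.
        A[k] = np.outer(x2[:, k], x1[:, k]).ravel()
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                  # least-squares solution
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt   # enforce rank 2
    e1 = Vt[-1]                               # right epipole: F e1 = 0
    e2 = U[:, -1]                             # left epipole: e2^T F = 0
    return F, e1, e2
```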

As part of the current blossoming of interest in multi-image reconstruction, Shashua [14] recently extended the well-known two-image epipolar constraint to a trilinear constraint between matching points in three images.

The key technical advance that makes this work possible is a practical method for estimating these using fundamental matrices and epipoles obtained from the image data. 

The authors need to recover 3D structure (point locations) and motion (camera calibrations and locations) from m uncalibrated perspective images of a scene containing n 3D points. 

The factorization paradigm has two key attractions that are only enhanced by moving from the affine to the projective case: (i) All of the data in all of the images is treated uniformly — there is no need to single out ‘privileged’ features or images for special treatment; (ii) No initialization is required and convergence is virtually guaranteed by the nature of the numerical methods used. 

With such a non-redundant set of equations the depths for each point p can be found trivially by chaining together the solutions for each image, starting from some arbitrary initial value such as λ_1p = 1.

When the matrix is not exactly of rank r the guesses are not quite optimal and it is useful to include further sweeps (say 2r in total) and then SVD the matrix of extracted columns to estimate the best r combinations of them. 

Although SVD is probably near-optimal for full-rank matrices, rank r matrices can be factorized in ‘output sensitive’ time O(mnr). 
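The fixed-rank idea described above can be sketched as pivoted column extraction with deflation, followed by a small SVD of the extracted columns. This is a minimal illustration of the general approach under my own naming, not the paper's exact sweep algorithm:

```python
import numpy as np

def fixed_rank_factor(W, r, sweeps=None):
    """Approximate rank-r factorization W ~ U @ V without a full SVD:
    repeatedly extract the largest remaining column, deflate the
    residual against it, then SVD only the small matrix of extracted
    columns to get the best r combinations of them."""
    if sweeps is None:
        sweeps = 2 * r                    # extra sweeps help when rank(W) > r
    R = W.astype(float).copy()            # residual matrix
    cols = []
    for _ in range(sweeps):
        j = int(np.argmax(np.sum(R ** 2, axis=0)))   # pivot: biggest column
        nj = np.linalg.norm(R[:, j])
        if nj < 1e-12 * np.linalg.norm(W):
            break                         # residual exhausted
        q = R[:, j] / nj
        cols.append(W[:, j])
        R -= np.outer(q, q @ R)           # remove q's span from the residual
    C = np.stack(cols, axis=1)
    Uc, _, _ = np.linalg.svd(C, full_matrices=False)
    U = Uc[:, :r]                         # orthonormal rank-r column basis
    V = U.T @ W                           # least-squares coefficients
    return U, V
```

Each sweep touches every entry of W once, so for a 3m × n matrix the cost is O(mnr) rather than the full SVD's cubic dependence, which is where the 'output sensitive' speedup for large low-rank problems comes from.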

There is no obvious solution to the error modelling problem, beyond using the factorization to initialize a nonlinear least squares routine (as is done in some of the experiments below). 

The full theory of projective depth recovery applies equally to two, three and four image matching tensors, but throughout this paper the author will concentrate on the two-image (fundamental matrix) case for simplicity.