Factorization methods for projective structure and motion

Bill Triggs
- pp 845-851

HAL Id: inria-00548364
https://hal.inria.fr/inria-00548364
Submitted on 20 Dec 2010
To cite this version:
Bill Triggs. Factorization Methods for Projective Structure and Motion. International Conference on Computer Vision & Pattern Recognition (CVPR '96), Jun 1996, San Francisco, United States. pp. 845–851, doi:10.1109/CVPR.1996.517170. hal: inria-00548364.

Factorization Methods for Projective Structure and Motion
Bill Triggs
INRIA Rhône-Alpes,
655 avenue de l'Europe, 38330 Montbonnot Saint-Martin, France.
Bill.Triggs@inrialpes.fr http://www.inrialpes.fr/MOVI/Triggs
Abstract
This paper describes a family of factorization-based algorithms that recover 3D projective structure and motion from multiple uncalibrated perspective images of 3D points and lines. They can be viewed as generalizations of the Tomasi-Kanade algorithm from affine to fully perspective cameras, and from points to lines. They make no restrictive assumptions about scene or camera geometry, and unlike most existing reconstruction methods they do not rely on 'privileged' points or images. All of the available image data is used, and each feature in each image is treated uniformly. The key to projective factorization is the recovery of a consistent set of projective depths (scale factors) for the image points: this is done using fundamental matrices and epipoles estimated from the image data. We compare the performance of the new techniques with several existing ones, and also describe an approximate factorization method that gives similar results to SVD-based factorization, but runs much more quickly for large problems.
Keywords: Multi-image Structure, Projective Reconstruction, Matrix Factorization.
1 Introduction
There has been considerable progress on scene reconstruction from multiple images in the last few years, aimed at applications ranging from very precise industrial measurement systems with several fixed cameras, to approximate structure and motion from real-time video for active robot navigation. One can usefully begin by ignoring the issues of camera calibration and metric structure, initially recovering the scene up to an overall projective transformation and only later adding metric information if needed [5, 10, 1]. The key result is that projective reconstruction is the best that can be done without calibration or metric information about the scene, and that it is possible from at least two views of point-scenes or three views of line-scenes [2, 3, 8, 6].
Most current reconstruction methods either work only for the minimal number of views (typically two), or single out a few 'privileged' views for initialization before bootstrapping themselves to the multi-view case [5, 10, 9]. For robustness and accuracy, there is a need for methods that uniformly take account of all the data in all the images, without making restrictive special assumptions or relying on privileged features or images for initialization. The orthographic and paraperspective structure/motion factorization methods of Tomasi, Kanade and Poelman [17, 11] partially fulfill these requirements, but they only apply when the camera projections are well approximated by affine mappings. This happens only for cameras viewing small, distant scenes, which is seldom the case in practice. Factorization methods for perspective images are needed; however, it has not been clear how to find the unknown projective scale factors of the image measurements that are required for this. (In the affine case the scales are constant and can be eliminated.)

[To appear in CVPR'96. This work was supported by an EC HCM grant and INRIA Rhône-Alpes. I would like to thank Peter Sturm and Richard Hartley for enlightening discussions.]
As part of the current blossoming of interest in multi-image reconstruction, Shashua [14] recently extended the well-known two-image epipolar constraint to a trilinear constraint between matching points in three images. Hartley [6] showed that this constraint also applies to lines in three images, and Faugeras & Mourrain [4] and I [18, 19] completed that corner of the puzzle by systematically studying the constraints for lines and points in any number of images. A key aspect of the viewpoint presented in [18, 19] is that projective reconstruction is essentially a matter of recovering a coherent set of projective depths: projective scale factors that represent the depth information lost during image projection. These are exactly the missing factorization scales mentioned above. They satisfy a set of consistency conditions called 'joint image reconstruction equations' [18], that link them together via the corresponding image point coordinates and the various inter-image matching tensors.
In the MOVI group, we have recently been developing projective structure and motion algorithms based on this 'projective depth' picture. Several of these methods use the factorization paradigm, and so can be viewed as generalizations of the Tomasi-Kanade method from affine to fully perspective projections. However, they also require a depth recovery phase that is not present in the affine case. The basic reconstruction method for point images was introduced in [15]. The current paper extends this in several directions, and presents a detailed assessment of the performance of the new methods in comparison to existing techniques such as Tomasi-Kanade factorization and Levenberg-Marquardt nonlinear least squares. Perhaps the most significant result in the paper is the extension of the method to work for lines as well as points, but I will also show how the factorization can be iteratively 'polished' (with results similar to nonlinear least squares iteration), and how any factorization-based method can be speeded up significantly for large problems, by using an approximate fixed-rank factorization technique in place of the Singular Value Decomposition.
The factorization paradigm has two key attractions that are only enhanced by moving from the affine to the projective case: (i) all of the data in all of the images is treated uniformly, with no need to single out 'privileged' features or images for special treatment; (ii) no initialization is required and convergence is virtually guaranteed by the nature of the numerical methods used. Factorization also has some well known disadvantages:
1) Every primitive must be visible in every image. This is unrealistic in practice given occlusion and extraction and tracking failures.
2) It is not possible to incorporate a full statistical error model for the image data, although some sort of implicit least-squares trade-off is made.
3) It is not clear how to incorporate additional points or images incrementally: the whole calculation must be redone.
4) SVD-based factorization is slow for large problems.
Only the speed problem will be considered here. SVD is slow because it was designed for general, full rank matrices. For matrices of fixed low rank r (as here, where the rank is 3 for the affine method or 4 for the projective one), approximate factorizations can be computed in time O(mnr), i.e. directly proportional to the size of the input data.
The Tomasi-Kanade 'hallucination' process can be used to work around missing data [17], as in the affine case. However this greatly complicates the method and dilutes some of its principal benefits. There is no obvious solution to the error modelling problem, beyond using the factorization to initialize a nonlinear least squares routine (as is done in some of the experiments below). It would probably be possible to develop incremental factorization update methods, although there do not seem to be any in the standard numerical algebra literature.
The rest of the paper outlines the theory of projective factorization for points and lines, describes the final algorithms and implementation, reports on experimental results using synthetic and real data, and concludes with a discussion. The full theory of projective depth recovery applies equally to two, three and four image matching tensors, but throughout this paper I will concentrate on the two-image (fundamental matrix) case for simplicity. The underlying theory for the higher valency cases can be found in [18].
2 Point Reconstruction
We need to recover 3D structure (point locations) and motion (camera calibrations and locations) from m uncalibrated perspective images of a scene containing n 3D points. Without further information it is only possible to reconstruct the scene up to an overall projective transformation [2, 8], so we will work in homogeneous coordinates with respect to arbitrary projective coordinate frames. Let X_p (p = 1, ..., n) be the unknown homogeneous 3D point vectors, P_i (i = 1, ..., m) the unknown 3×4 image projections, and x_ip the measured homogeneous image point vectors. Modulo some scale factors λ_ip, the image points are projected from the world points: λ_ip x_ip = P_i X_p. Each object is defined only up to rescaling. The λ's 'cancel out' the arbitrary scales of the image points, but there is still the freedom to: (i) arbitrarily rescale each world point X_p and each projection P_i; (ii) apply an arbitrary nonsingular 4×4 projective deformation T: X_p → T X_p, P_i → P_i T^-1. Modulo changes of the λ_ip, the image projections are invariant under both of these transformations.
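As a quick concrete check of the projection equation (a numpy sketch, not from the paper; all variable names are mine), the depths λ_ip are simply whatever scale factors make the normalized homogeneous image measurements match the projections exactly:

```python
import numpy as np

# Synthetic check of lambda_ip * x_ip = P_i X_p for one camera i.
rng = np.random.default_rng(0)
n = 5
# Homogeneous 3D points X_p (positive entries so all depths are positive).
X = np.vstack([rng.uniform(0.5, 1.5, (3, n)), np.ones((1, n))])
P = rng.uniform(0.5, 1.5, (3, 4))   # one 3x4 projection matrix
w = P @ X                           # projected points, still carrying their scales
lam = w[2]                          # choose scales so image points end in a 1
x = w / lam                         # measured homogeneous image points x_ip
# lam are exactly the projective depths for this normalization:
assert np.allclose(lam * x, P @ X)
```

Rescaling any column of X or the matrix P changes lam but leaves the observed x untouched, which is the gauge freedom described above.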
The scale factors λ_ip will be called projective depths. With correctly normalized points and projections they become true optical depths, i.e. orthogonal distances from the focal planes of the cameras. (NB: this is not the same as Shashua's 'projective depth' [13]). In general, m + n − 1 projective depths can be set arbitrarily by choosing appropriate scales for the X_p and P_i. However, once this is done the remaining (m − 1)(n − 1) degrees of freedom contain real information that can be used for 3D reconstruction: taken as a whole, the projective depths have a strong internal coherence. In fact, [18, 19] argues that just as the key to calibrated stereo reconstruction is the recovery of Euclidean depth, the essence of projective reconstruction is precisely the recovery of a coherent set of projective depths, modulo overall projection and world point rescalings. Once this is done, reconstruction reduces to choosing a projective basis for a certain abstract three dimensional 'joint image' subspace, and reading off point coordinates with respect to it.
2.1 Factorization
Gather the point projections into a single 3m × n matrix equation:

    W  ≡  ( λ_11 x_11   λ_12 x_12   ···   λ_1n x_1n )      ( P_1 )
          ( λ_21 x_21   λ_22 x_22   ···   λ_2n x_2n )   =  ( P_2 )  ( X_1  X_2  ···  X_n )
          (     ⋮           ⋮                 ⋮     )      (  ⋮  )
          ( λ_m1 x_m1   λ_m2 x_m2   ···   λ_mn x_mn )      ( P_m )

Hence, with a consistent set of projective depths the rescaled measurement matrix W has rank at most 4. Any rank 4 matrix can be factorized into some 3m × 4 matrix of 'projections' multiplying a 4 × n matrix of 'points' as shown, and any such factorization corresponds to a valid projective reconstruction: the freedom in factorization is exactly a 4 × 4 nonsingular linear transformation P → P T^-1, X → T X, which can be regarded as a projective transformation of the reconstructed 3D space.
One practical method of factorizing W is the Singular Value Decomposition [12]. This decomposes an arbitrary k × l matrix W_{k×l} of rank r into a product W_{k×l} = U_{k×r} D_{r×r} V_{l×r}^T, where the columns of V_{l×r} and U_{k×r} are orthonormal bases for the input (co-kernel) and output (range) spaces of W_{k×l}, and D_{r×r} is a diagonal matrix of positive decreasing 'singular values'. The decomposition is unique when the singular values are distinct, and can be computed stably and reliably in time O(k l min(k, l)). The matrix D of singular values can be absorbed into either U or V to give a decomposition of the projection/point form P X. (I absorb it into V to form X.)
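For concreteness, the rank-4 SVD factorization step might look as follows in numpy (a minimal sketch under the absorb-D-into-V convention of the text; the function name and test data are my own):

```python
import numpy as np

def factorize_rank4(W):
    """Factor the rescaled measurement matrix W (3m x n) into
    'projections' P (3m x 4) and 'points' X (4 x n) via truncated SVD,
    absorbing the singular values D into V to form X."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    P = U[:, :4]                    # orthonormal basis for the range of W
    X = np.diag(s[:4]) @ Vt[:4, :]  # D absorbed into V
    return P, X

# Any nonsingular 4x4 T gives an equally valid factorization:
# (P @ T_inv) @ (T @ X) reproduces the same W, i.e. reconstruction is
# only defined up to a 3D projective transformation.
```

On a noiseless rank-4 W the product P @ X reproduces W exactly; on noisy data it is the closest rank-4 approximation in the least-squares sense.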
The SVD has been used by Tomasi, Kanade and Poelman [17, 11] for their affine (orthographic and paraperspective) reconstruction techniques. The current application can be viewed as a generalization of these methods to projective reconstruction. The projective case leads to slightly larger matrices (3m × n rank 4 as opposed to 2m × n rank 3), but is actually simpler than the affine case as there is no need to subtract translation terms or apply nonlinear constraints to guarantee the orthogonality of the projection matrices.

Ideally, one would like to find reconstructions in time O(mn) (the size of the input data). SVD is a factor of O(min(3m, n)) slower than this, which can be significant if there are many points and images. Although SVD is probably near-optimal for full-rank matrices, rank r matrices can be factorized in 'output sensitive' time O(mnr). I have experimented with one such 'fixed rank' method, and find it to be almost as accurate as SVD and significantly faster for large problems. The method repeatedly sweeps the matrix, at each sweep guessing and subtracting a column-vector that 'explains' as much as possible of the residual error in the matrix columns. A rank r matrix is factorized in r sweeps. When the matrix is not exactly of rank r the guesses are not quite optimal and it is useful to include further sweeps (say 2r in total) and then SVD the matrix of extracted columns to estimate the best r combinations of them.
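The paper does not spell out its sweep rule, so the numpy sketch below uses one plausible greedy variant: each sweep extracts the largest remaining residual column as the next basis vector, deflates it from the residual, and after 2r sweeps the extracted columns are SVD'd to keep the best r combinations. Function and variable names are mine, not the paper's.

```python
import numpy as np

def fixed_rank_factorize(W, r, sweeps=None):
    """Approximate fixed-rank factorization W ~ U @ V in O(mnr)-style
    time. Greedy sketch: each sweep takes the largest residual column
    as a new basis direction and subtracts its contribution; the small
    matrix of extracted columns is then SVD'd to keep the best r
    combinations, as suggested in the text."""
    R = W.astype(float).copy()          # residual
    cols = []
    for _ in range(sweeps or 2 * r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))  # best residual column
        u = R[:, j]
        nrm = np.linalg.norm(u)
        if nrm < 1e-12:                  # residual already explained
            break
        u = u / nrm
        cols.append(u)
        R -= np.outer(u, u @ R)          # deflate this direction
    C = np.stack(cols, axis=1)
    Uc, _, _ = np.linalg.svd(C, full_matrices=False)
    U = Uc[:, :r]                        # rank-r column basis
    return U, U.T @ W                    # W ~ U @ (U^T W)
```

Each sweep costs one pass over the matrix, so the total cost scales with the data size times the rank, unlike a full SVD.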
2.2 Projective Depth Recovery
The above factorization techniques can only be used if a self-consistent set of projective depths λ_ip can be found. The key technical advance that makes this work possible is a practical method for estimating these using fundamental matrices and epipoles obtained from the image data. The full theory can be found in [18], which also describes how to use trivalent and quadrivalent matching tensors for depth recovery. Here we briefly sketch the fundamental matrix case. The image projections λ_ip x_ip = P_i X_p imply that the 6 × 5 matrix

    ( P_i   λ_ip x_ip )      ( P_i )
    ( P_j   λ_jp x_jp )   =  ( P_j )  ( I_{4×4}   X_p )

has rank at most 4, so all of its 5 × 5 minors vanish. Expanding by cofactors in the last column gives homogeneous linear equations in the components of λ_ip x_ip and λ_jp x_jp, with coefficients that are 4 × 4 determinants of projection matrix rows. These turn out to be the expressions for the fundamental matrix F_ij and epipole e_ji of camera j in image i in terms of projection matrix components [19, 4]. The result is the projective depth recovery equation:

    (F_ij x_jp) λ_jp  =  (e_ji ∧ x_ip) λ_ip        (1)
This says two things: (i) the epipolar line of x_jp in image i is the same as the line through the corresponding point x_ip and epipole e_ji (as is well known); (ii) with the correct projective depths and scalings for F_ij and e_ji, the two terms have exactly the same size. The equality is exact, not just up to scale. This is the new result that allows us to recover projective depths using fundamental matrices and epipoles. Analogous results based on higher order matching tensors can be found in [18].

It is straightforward to recover projective depths using (1). Each instance of it linearly relates the depths of a single 3D point in two images. By estimating a sufficient number of fundamental matrices and epipoles, we can amass a system of homogeneous linear equations that allows the complete set of depths for a given point to be found, up to an arbitrary overall scale factor. At a minimum, this can be done by selecting any set of m − 1 equations that link the m images into a single connected graph. With such a non-redundant set of equations the depths for each point p can be found trivially by chaining together the solutions for each image, starting from some arbitrary initial value such as λ_1p = 1. Solving the depth recovery equation in least squares gives a simple recursion relation for λ_ip in terms of λ_jp:

    λ_ip  :=  [ (e_ji ∧ x_ip) · (F_ij x_jp) / ‖e_ji ∧ x_ip‖² ] λ_jp
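The serial chain can be sketched as follows (a numpy sketch; the function name, the storage layout, and the index conventions chosen for `F` and `e` are my assumptions, not the paper's notation):

```python
import numpy as np

def chain_depths(x, F, e):
    """Serial-chain projective depth recovery via the least-squares
    recursion lambda_ip = [(e ^ x_ip).(F x_jp) / |e ^ x_ip|^2] lambda_jp
    with j = i-1.

    x : (m, n, 3) array of homogeneous image points x_ip
    F : list of m-1 fundamental matrices; F[i-1] plays the role of
        F_{i,i-1} (maps points of image i-1 to epipolar lines in image i)
    e : list of m-1 epipoles; e[i-1] plays the role of e_{i-1,i}
        (epipole of camera i-1 in image i)
    Returns the (m, n) depth matrix, normalized so lambda_1p = 1.
    """
    m, n, _ = x.shape
    lam = np.ones((m, n))
    for p in range(n):
        for i in range(1, m):
            cross = np.cross(e[i - 1], x[i, p])        # e_ji ^ x_ip
            num = cross @ (F[i - 1] @ x[i - 1, p])     # . (F_ij x_jp)
            lam[i, p] = (num / (cross @ cross)) * lam[i - 1, p]
    return lam
```

With exactly consistent F's and e's the two sides of equation (1) match exactly, so the recursion propagates the true depths; with noisy estimates it is the per-equation least-squares solution.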
If additional depth recovery equations are used, this simple recursion must be replaced by a redundant (and hence potentially more robust) homogeneous linear system. However, care is needed. The depth recovery equations are sensitive to the scale factors chosen for the F's and e's, and these can not be recovered directly from the image data. This is irrelevant when a single chain of equations is used, as rescalings of F and e affect all points equally and hence amount to rescalings of the corresponding projection matrices. However with redundant equations it is essential to choose a mutually self-consistent set of scales for the F's and e's. I will not describe this process here, except to note that the consistency condition is the Grassmann identity F_kj e_ij = e_ik ∧ e_jk [18].
It is still unclear what the best trade-off between economy and robustness is for depth recovery. This paper considers only two simple non-redundant choices: either the images are taken pairwise in sequence, F_21, F_32, ..., F_m,m−1, or all subsequent images are scaled in parallel from the first, F_21, F_31, ..., F_m1.
It might seem that long chains of rescalings would prove numerically unstable, but in practice depth recovery is surprisingly well conditioned. Both serial and parallel chains work very well despite their non-redundancy and chain length or reliance on a 'key' image. The two methods give similar results except when there are many (> 40) images, when the shorter chains of the parallel system become more robust. Both are stable even when epipolar point transfer is ill-conditioned (e.g. for a camera moving in a straight line, when the epipolar lines of different images coincide): the image observations act as stable 'anchors' for the transfer process.

Balancing: A further point is that with arbitrary choices of scale for the fundamental matrices and epipoles, the average size of the recovered depths might tend to increase or decrease exponentially during the solution-chaining process. Theoretically this is not a problem as the overall scales are arbitrary, but it could easily make the factorization phase numerically ill-conditioned. To counter this the recovered matrix of projective depths must be balanced after it has been built, by judicious overall row and column rescalings. The process is very simple. The image points are normalized on input, so ideally all of the scale factors λ_ip should have roughly the same order of magnitude, O(1) say. For each point the depths are estimated as above, and then: (i) each row (image) of the estimated depth matrix is rescaled to have length √n; (ii) each column (point) of the resulting matrix is rescaled to length √m. This process is repeated until it roughly converges, which happens very quickly (within 2–3 iterations).
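The balancing loop is simple enough to state directly (a numpy sketch; the function name and iteration count are mine):

```python
import numpy as np

def balance(lam, iters=3):
    """Alternately rescale rows (images) of the depth matrix to length
    sqrt(n) and columns (points) to length sqrt(m), so every entry ends
    up O(1). A few iterations suffice in practice."""
    m, n = lam.shape
    lam = lam.astype(float).copy()
    for _ in range(iters):
        lam *= (np.sqrt(n) / np.linalg.norm(lam, axis=1))[:, None]  # rows
        lam *= (np.sqrt(m) / np.linalg.norm(lam, axis=0))[None, :]  # columns
    return lam
```

Since each row or column is only multiplied by an overall scalar, the balanced matrix represents exactly the same projective reconstruction as the original depths.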
3 Line Reconstruction
3D lines can also be reconstructed using the above techniques. A line L can be represented by any two 3D points lying on it, say Y and Z. In image i, L projects to some image line l_i, and Y and Z project to image points y_i and z_i lying on l_i. The points { y_i | i = 1, ..., m } are in epipolar correspondence, so they can be used in the depth recovery equation (1) to reconstruct Y, and similarly for Z. The representatives Y and Z can be fixed implicitly by choosing y_1 and z_1 arbitrarily on l_1 in the first image, and using the epipolar constraint to transfer these to the corresponding points in the remaining images: y_i lies on both l_i and the epipolar line of y_1, so is located at their intersection.
In fact, epipolar transfer and depth recovery can be done in one step. Let y_i stand for the rescaled via-points P_i Y. Substitute these into equation (1), cross-product with l_i, expand, and simplify using l_i · y_i = 0:

    l_i ∧ (F_ij y_j)  =  l_i ∧ (e_ji ∧ y_i)
                      =  −(l_i · e_ji) y_i + (l_i · y_i) e_ji
                      =  −(l_i · e_ji) y_i        (2)

Up to a factor of −(l_i · e_ji), the intersection l_i ∧ (F_ij y_j) of l_i with the epipolar line of y_j automatically gives the correct projective depth for reconstruction. Hence, factorization-based line reconstruction can be implemented by choosing a suitable (widely spaced) pair of via-points on each line in the first image, and then chaining together instances of equation (2) to find the corresponding, correctly scaled via-points in the other images. The required fundamental matrices can not be found directly from line matches, but they can be estimated from point matches, or from the trilinear line matching constraints (trivalent tensor) [6, 14, 4, 19, 18]. Alternatively, the trivalent tensor can be used directly: in tensorial notation [18], the trivalent via-point transfer equation is l_{B_k} G_{C_j}^{A_i B_k} y^{C_j} = (l_{B_k} e_j^{B_k}) y^{A_i}.
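Equation (2) reduces via-point transfer to a single cross product; a numpy sketch (the function name is mine, and F_ij, l_i are assumed to be given):

```python
import numpy as np

def transfer_via_point(l_i, F_ij, y_j):
    """Transfer a via-point y_j into image i using equation (2):
    intersect line l_i with the epipolar line F_ij y_j. The result is
    the via-point in image i, already carrying its projective depth
    (up to the fixed overall factor -(l_i . e_ji))."""
    return np.cross(l_i, F_ij @ y_j)
```

By the properties of the cross product, the transferred point automatically lies on both l_i and the epipolar line of y_j, which is exactly the intersection construction described in the text.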
As with points, redundant equations may be included if and only if a self-consistent normalization is chosen for the fundamental matrices and epipoles. For numerical stability, it is essential to balance the resulting via-points (i.e. depth estimates). This works with the 3m × 2n_lines matrix W of via-points, iteratively rescaling all coordinates of each image (triple of rows) and all coordinates of each line (pair of columns) until an approximate equilibrium is reached, where the overall mean square size of each coordinate is O(1) in each case. To ensure that the via-points representing each line are on average well separated, I also orthonormalize the two 3m-component column vectors for each line with respect to one another. The via-point equations (2) are linear and hence invariant with respect to this, but it does of course change the 3D representatives Y and Z recovered for each line.
4 Implementation
This section summarizes the complete algorithm for factorization-based 3D projective reconstruction from image points and lines, and discusses a few important implementation details and variants. The algorithm goes as follows:
0) Extract and match points and lines across all images.
1) Standardize all image coordinates (see below).
2) Estimate a set of fundamental matrices and epipoles sufficient to chain all the images together (e.g. using point matches).
3) For each point, estimate the projective depths using equation (1). Build and balance the depth matrix λ_ip, and use it to build the rescaled point measurement matrix W.
4) For each line choose two via-points and transfer them to the other images using the transfer equations (2). Build and balance the rescaled line via-point matrix.
5) Combine the line and point measurement matrices into a 3m × (n_points + 2 n_lines) data matrix and factorize it using either SVD or the fixed-rank method. Recover 3D projective structure (point and via-point coordinates) and motion (projection matrices) from the factorization.
6) Un-standardize the projection matrices (see below).
