What have the authors stated for future work in "Robust Global Translations with 1DSfM"?

In the future the authors hope to explore further ways of aggregating 1DSfM subproblems than simple summation, which could shed light on more complicated outliers, such as those arising from ambiguous scene structures.

What is the strength of their method?

The authors believe a strength of their method is its simplicity—it relies on a well-studied combinatorial optimization problem, and a simple non-linear solver.

What is the effect of a Huber loss on the solution quality?

Their results show that a Huber loss can improve solution quality while retaining good convergence, and that the benefit is largely orthogonal to 1DSfM.

Why does the method give similar results to a sequential SfM system?

Because ground truth positions are usually unavailable for large-scale SfM problems, the authors show their method gives similar results to a sequential SfM system based on Bundler [20], but in much less time.

What is the importance of the robust function in the SfM problem?

In their case, within a continuous optimization framework, the authors have found that the choice of robust function is very important. (Footnote 1: Formally, Eq. 5 is undefined if ever $t_j = t_i$ for any edge.)

What is the way to compute global rotations?

To compute global rotations, the authors run Chatterjee and Govindu’s rotations averaging method [5], with the parameters suggested in their paper.

How did the authors find that 1DSfM classified edges?

At their threshold of τ = 0.1, the authors found that 1DSfM classified edges with a precision of 0.96 and a recall of 0.92 (averaged across the four datasets).

What is the x component of each translations measurement?

Only the $x$ component of each translations measurement is now relevant to the problem: $\hat{t}_{ij} \cdot \langle 1, 0, 0 \rangle = x_{ij}$, and the authors need to assign an $x$-coordinate to each vertex.

(Open Access) Robust Global Translations with 1DSfM (2014) | Kyle Wilson

Q: What have the authors contributed in "Robust Global Translations with 1DSfM"?

The authors present a simple, effective method for solving structure from motion problems by averaging epipolar geometries. First, the authors propose a method for removing outliers from problem instances by solving simpler low-dimensional subproblems, which they refer to as 1DSfM problems. Second, the authors present a simple, principled averaging scheme. The authors demonstrate this new method in the wild on Internet photo collections.

Q: Why has there been significant recent interest in revisiting global methods?

There has been significant recent interest in revisiting global methods because of their potential for improved speed and decreased dependence on local decisions or image ordering.

Robust Global Translations with 1DSfM

Kyle Wilson and Noah Snavely

Cornell University, Ithaca NY

{wilsonkl,snavely}@cs.cornell.edu

Abstract.

We present a simple, effective method for solving structure from motion problems by averaging epipolar geometries. Based on recent successes in solving for global camera rotations using averaging schemes, we focus on the problem of solving for 3D camera translations given a network of noisy pairwise camera translation directions (or 3D point observations). To do this well, we have two main insights. First, we propose a method for removing outliers from problem instances by solving simpler low-dimensional subproblems, which we refer to as 1DSfM problems. Second, we present a simple, principled averaging scheme. We demonstrate this new method in the wild on Internet photo collections.

Keywords: Structure from Motion, translations problem, robust estimation

1 Introduction

Recent work on the unstructured Structure from Motion (SfM) problem has had renewed interest in global methods. Unlike sequential approaches, which build 3D models from photo collections by iteratively growing a small seed model, global (or batch) methods for SfM consider the entire problem at once. By doing this they avoid several disadvantages of sequential methods, which have tended to be costly, requiring a repeated nonlinear model refinement (bundle adjustment) to avoid errors. Also, unlike global methods, sequential SfM necessarily treats images unequally, where those considered first can have a disproportionate effect on the final model. In practice, this behavior can sometimes lead to cascading mistakes and can exacerbate the problem of drift.

However, global methods have difficulties of their own. A key problem is that reasoning about outliers is challenging. Techniques from sequential methods, such as filtering out measurements inconsistent with the current model at each step, are not directly applicable in a global setting. It is harder to reason a priori about which measurements are unreliable.

In this work, we present a new global SfM method; like other methods, we solve first for global camera rotations, then translations, given a set of pairwise epipolar geometries. As there has been significant progress on the rotations problem, we focus on translations, and offer two key insights. The first, which we call 1DSfM, is a simple way to preprocess a problem instance to remove outlier measurements. 1DSfM is based on reducing a difficult problem to single-dimensional subproblems where inference becomes a more straightforward combinatorial computation. Under this 1D projection, a translations problem becomes an instance of MINIMUM FEEDBACK ARC SET, a well studied graph problem. By solving for a 1D ordering, we recover information about which 3D measurements are likely inconsistent. Second, we describe a new, very simple solver for the translations problem. Surprisingly, we find that non-linear optimization with this solver (even with random initialization) works remarkably well, especially once outliers have been removed. Hence, our 1DSfM-based outlier removal technique goes hand in hand with our simple translations solver to achieve high-quality results.

We show the effectiveness of our two methods on a variety of landmark-scale Internet community photo collections, covering a range of sizes and scene types. Our code and data are available at http://www.cs.cornell.edu/projects/1dsfm.

2 Related Work

While some earlier SfM methods were global, such as factorization [21], most current large-scale SfM systems involve sequential reconstruction [20, 2, 9]. Sequential methods build models a few images at a time, often with bundle adjustment in between steps. However, there has been significant recent interest in revisiting global methods because of their potential for improved speed and decreased dependence on local decisions or image ordering. These methods often work by first estimating an initial set of camera poses (typically through use of estimated relative poses between pairs or triplets), followed by a global bundle adjustment to refine this initial solution. With a few exceptions (e.g. [12]), these methods first solve for camera rotations, and then camera translations.

Rotations. A number of methods have been proposed for solving for global rotations from pairwise estimates of relative rotations. Some methods formulate the problem as a linear system by relaxing constraints on rotation parameterizations [11, 17, 3, 18]. Enqvist et al. [8] look for a best spanning tree of pairwise rotations to filter outliers in advance. Sinha et al. [19] use vanishing point estimates as an additional cue. More recently, Hartley et al. [13] as well as Chatterjee and Govindu [5] have presented robust methods based on the Lie algebraic structure of the manifold of rotations. Finally, Fredriksson and Olsson [10] present an approach based on primal and dual problems which can certify if a solution is globally optimal. We have found the method of Chatterjee and Govindu [5] particularly effective, and use it to produce input for our method.

Translations. Like the rotations problem, the translations problem is often formulated as computing global camera translations from pairwise ones. Some approaches are based on a linear system of cross product constraints [11, 3]. Others use Second Order Cone Programming, based on the $\ell_\infty$ norm [16–18]. These require very careful attention to outliers. Brand et al. [4] use a spectral approach, but do not address outliers. Sinha et al. [19] robustly compute similarity transformations that align pairs of reconstructions, and then average over these transformations. Recently Jiang et al. [14] have formulated a linear constraint with geometric, rather than algebraic meaning, based on co-planarity in triplets of cameras. Finally, Crandall et al. [6] take a different approach to optimization, using a complex scheme involving a discrete Markov Random Field search and a continuous Levenberg-Marquardt refinement to robustly explore the solution space. Our translations solver optimizes an objective function that depends only on comparing measurement directions to model directions, as opposed to other methods [11, 3] where the objective function is also a function of the distance between images. To avoid the resulting bias, Govindu proposes an iterative reweighting scheme [11], which is unnecessary in our approach. Jiang et al. discuss the importance of geometric vs. algebraic cost functions, as they minimize a value that has physical significance. In this sense our cost function is also geometric (but in the space of measurements, rather than in the solution space).

Handling outliers. A key contribution of our work is a simple algorithm for removing outliers in a translations problem. Zach et al. [24] detect outlier epipolar geometries by looking at loop closure in graph cycles. Moulon et al. improve on this approach, and also robustly fit trifocal tensors to find less noisy translations. Our method for outlier removal is similar in motivation to [18], but by projecting into a single dimension we solve tractable subproblems that reduce to a simple combinatorial graph problem.

3 Problem Formulation

The gold standard method for structure from motion is bundle adjustment: the joint nonlinear refinement of camera and structure parameters [22]. However, bundle adjustment is a largely local search, and its success depends critically on initialization. Given a good initial guess, bundle adjustment can produce high quality solutions, but if the guess is bad, the optimization may fall into local minima far from the optimal solution. For this reason most SfM methods focus on creating a close-enough initialization which can then be refined with bundle adjustment; sequential (or incremental) SfM methods are one such approach that use repeated bundle adjustment on increasingly large problems to reach a good solution.

Initializing bundle adjustment involves estimating a rotation matrix and a position for each camera. In our notation, a rotation matrix $R_i$ represents a mapping from world coordinates to camera coordinates, and a translation $t_i$ represents a location in the world coordinate frame (in our work, we use "location" and "translation" interchangeably, in a slight abuse of terminology). As with other recent global methods, our input is a set of images $V$, and a network of computed epipolar geometries $(\hat{R}_{ij}, \hat{t}^{\,i}_{ij})$ between pairs $(i, j)$ of overlapping images. (We will use a hat for epipolar geometries, to emphasize that they are our input measurements. We use a superscript $i$ for relative translations between two cameras, which are defined in a local coordinate system.) These epipolar geometries are not available for all camera pairs, because not all pairs of images visually overlap. These inputs define a graph we call the epipolar geometry (EG) graph $G = (V, E)$ on a set of images $V$, where for every edge $(i, j) \in E$ we have a measurement $(\hat{R}_{ij}, \hat{t}^{\,i}_{ij})$. Given perfect measurements, global camera poses $(R_i, t_i)$ would satisfy

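$$\hat{R}_{ij} = R_i^\top R_j \qquad (1)$$

$$\lambda_{ij}\, \hat{t}^{\,i}_{ij} = R_i^\top (t_j - t_i) \qquad (2)$$

where the $\lambda_{ij}$'s are unknown scaling factors (unique up to global gauge ambiguity).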

Following a now-common approach [11, 16, 17, 6, 3, 14], we separate the initialization into two stages: a rotations problem and a translations problem. These two together produce an initialization to a final bundle adjustment. Recent work has been successful in solving the rotations problem robustly [13, 10, 5]; we build on this work and focus on the translations problem. Given estimates $\hat{R}_i$ of camera rotation matrices, we can write our measurement of the direction from camera $i$ to camera $j$ as $\hat{t}_{ij} = \hat{R}_i \hat{t}^{\,i}_{ij}$, where $\hat{t}_{ij}$ is a unit 3-vector (i.e., a point on the unit sphere) in the global coordinate system. Hence, the translations problem reduces to the following graph embedding problem:

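Given: Graph $G = (V, E)$
Measurements $\hat{t} : E \to S^2$
Metric $d : S^2 \times S^2 \to \mathbb{R}$
Minimize: $\sum_{(i,j) \in E} d\left(\hat{t}_{ij},\ \frac{t_j - t_i}{\lVert t_j - t_i \rVert}\right)$
over embeddings: $T : V \to \mathbb{R}^3$, i.e. $T = \{t_i \mid i \in V\}$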

Note that in this framework, the second endpoint $j$ of an edge may be a point or a camera. Camera-point constraints can be important for achieving full scene coverage, and for avoiding degeneracies arising from collinear motion, an issue discussed in [14].

The formulation above does not specify the exact form of our objective function. It also excludes objective functions that depend on the distance between $t_i$ and $t_j$, rather than only the direction. These issues will both be discussed in Section 5.
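To make the embedding objective concrete, the following is a minimal Python sketch that evaluates the sum in the problem statement for a candidate embedding. The metric is left open at this point in the text, so the squared chordal distance used here is only an assumed placeholder for $d$; the function names (`unit`, `chordal_sq`, `embedding_cost`) and the toy data are hypothetical and are not taken from the authors' code.

```python
import math


def unit(v):
    """Return v scaled to unit length; the direction is undefined for a zero vector."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("direction undefined when two endpoints coincide")
    return tuple(c / norm for c in v)


def chordal_sq(a, b):
    """Squared Euclidean (chordal) distance between unit vectors (an assumed metric d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def embedding_cost(measurements, positions, d=chordal_sq):
    """Sum d(t_hat_ij, (t_j - t_i) / ||t_j - t_i||) over all measured edges (i, j)."""
    total = 0.0
    for (i, j), t_hat in measurements.items():
        diff = tuple(b - a for a, b in zip(positions[i], positions[j]))
        total += d(t_hat, unit(diff))
    return total


# Hypothetical toy instance: three vertices, with edge (0, 2) measured badly.
positions = {0: (0.0, 0.0, 0.0), 1: (2.0, 0.0, 0.0), 2: (2.0, 3.0, 0.0)}
measurements = {
    (0, 1): (1.0, 0.0, 0.0),
    (1, 2): (0.0, 1.0, 0.0),
    (0, 2): (0.0, 0.0, 1.0),
}
cost = embedding_cost(measurements, positions)

# Scaling every position by the same factor leaves all directions unchanged.
scaled = {v: tuple(5.0 * c for c in t) for v, t in positions.items()}
cost_scaled = embedding_cost(measurements, scaled)
```

Because only directions enter the sum, the cost is unchanged when the whole embedding is scaled or shifted, which is one way to see the gauge ambiguity mentioned earlier.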

Finally, this problem is made greatly more difficult by noise. We hope that most EGs will be approximately correct, but sometimes calculating EGs returns a wildly incorrect solution. For the translations problem we assume a mixed model of small variance inlier noise with a smaller fraction of outlier noise distributed uniformly over $S^2$.
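The mixed noise model above can be illustrated with a small synthetic generator. This is only a sketch under stated assumptions: the text does not give the inlier distribution or the outlier rate, so the Gaussian perturbation, `sigma`, and `outlier_prob` below are hypothetical choices made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a nonzero 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def random_unit_vector(rng):
    """Uniform sample on S^2: normalize an isotropic Gaussian draw."""
    while True:
        v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        if sum(c * c for c in v) > 1e-12:  # guard against a degenerate draw
            return normalize(v)


def noisy_measurement(t_i, t_j, rng, sigma=0.05, outlier_prob=0.1):
    """Return (t_hat_ij, is_outlier) for the true positions t_i and t_j.

    Assumed model: with probability outlier_prob the direction is uniform on
    the sphere; otherwise the true direction is perturbed by small Gaussian
    noise (standard deviation sigma per coordinate) and renormalized.
    """
    if rng.random() < outlier_prob:
        return random_unit_vector(rng), True
    true_dir = normalize(tuple(b - a for a, b in zip(t_i, t_j)))
    perturbed = tuple(c + rng.gauss(0.0, sigma) for c in true_dir)
    return normalize(perturbed), False
```

A call such as `noisy_measurement((0, 0, 0), (1, 2, 2), random.Random(0))` yields one simulated unit direction together with a flag recording whether it was drawn as an outlier, which is convenient for measuring precision and recall of an outlier filter on synthetic data.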

4 Outlier Removal using 1DSfM

By removing bad measurements in advance we can solve problems more accurately and reliably. In this section, we present a new method for identifying outlier measurements by projecting translations problems to 1-dimensional subproblems which we can solve more easily. Our approach is related to previous work [24] which detects outliers as measurements that cannot be consistently chained along cycles. However, there are usually many cycles to enumerate, and inferring erroneous measurements from bad cycles is difficult for large problems. Our method is based on many smaller, simpler inferences that are then aggregated. This makes outlier detection tractable even for large problems where [24] has difficulty.

The translations problem described above is a 3-dimensional embedding. One way to approach outlier detection is to try to first simplify this underlying problem. For instance, we could project the 3D problem onto a ground plane, resulting in a 2D graph embedding problem. In other words, we could ignore the $z$ component of each measurement, and consider only the 2D projections: $\hat{t}_{ij} \mapsto \hat{t}_{ij} - \mathrm{proj}_k\, \hat{t}_{ij}$, where $k = \langle 0, 0, 1 \rangle$. In this projected problem, we would need to assign an $(x, y)$ pair to each vertex.

Fig. 1. A toy illustration of 1DSfM. Panel (a) is a good solution to a translations problem for reference. Panel (b) shows the translations problem input: a set of edges with orientations. One outlier edge has been added in red. We also show two directions for projection. Panel (c) shows the corresponding 1D problems, which are instances of MINIMUM FEEDBACK ARC SET. Finally, (d) contains good solutions to the 1D problems in (c). In the lower case, not all ordering constraints can be satisfied, due to the outlier edge. Note that outlier edges may be consistent in some subproblems but not in others.

In our work, we take this idea a step further and project onto a single dimensional subspace. Consider projecting a translations problem onto the $x$-axis, as in the blue problem in Figure 1. Only the $x$ component of each translations measurement is now relevant to the problem: $\hat{t}_{ij} \cdot \langle 1, 0, 0 \rangle = x_{ij}$, and we need to assign an $x$-coordinate to each vertex. Recall that our pairwise translation measurements represent directions, but not distances. On the $x$-axis there are only two directions: left and right. Hence, an embedding is consistent with edge $(i, j)$ if $x_{ij} > 0$ and $i$ embeds to the left of $j$, and vice versa for $x_{ij} < 0$. Figure 1 panel (d) shows such an embedding. Note that in 1D, edge directions have become ordering constraints: all embeddings with the same ordering are equally consistent with our problem. Hence, this 1D problem is a combinatorial ordering problem, rather than a continuous optimization problem: we want to find a global ordering of the vertices that satisfies the pairwise orderings as well as possible. We can formulate this problem on a directed version of our graph $G$, as described below.
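The 1D consistency rule can be checked mechanically. The sketch below, in Python, projects each measured direction onto a chosen axis and lists the edges whose sign disagrees with a given vertex ordering. The helper names and the three-vertex example are hypothetical, and treating a zero projection as imposing no constraint is an assumption made here for simplicity.

```python
def project(measurements, p):
    """Map each unit direction t_hat_ij to its signed component along p."""
    return {
        edge: sum(a * b for a, b in zip(p, t_hat))
        for edge, t_hat in measurements.items()
    }


def inconsistent_edges(weights, order):
    """Edges (i, j) whose projected sign contradicts the left-to-right order."""
    rank = {v: k for k, v in enumerate(order)}
    bad = []
    for (i, j), w in weights.items():
        if w == 0:
            continue  # assumption: a zero projection gives no ordering constraint
        i_left_of_j = rank[i] < rank[j]
        if (w > 0) != i_left_of_j:
            bad.append((i, j))
    return bad


# Hypothetical toy input: the measurement on edge (0, 2) is an outlier.
measurements = {
    (0, 1): (1.0, 0.0, 0.0),
    (1, 2): (0.7071, 0.7071, 0.0),
    (0, 2): (-0.6, 0.8, 0.0),
}
order = [0, 1, 2]
bad_x = inconsistent_edges(project(measurements, (1.0, 0.0, 0.0)), order)
bad_y = inconsistent_edges(project(measurements, (0.0, 1.0, 0.0)), order)
```

With this input the outlier edge contradicts the ordering when projected onto the $x$-axis but not when projected onto the $y$-axis, mirroring the observation in the Figure 1 caption that an outlier may be consistent in some subproblems and not in others.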

Figure 1 also illustrates projecting the same problem in a different direction (in green). Notice that the outlier shown in red is inconsistent in one projection direction, but not in another. To catch as many outliers as possible, we embed a graph in many 1D subspaces, each defined by a unit vector $\hat{p}$. For each subproblem only the component of translations measurements $\hat{t}_{ij}$ in the direction of $\hat{p}$ is relevant to the optimization: $\hat{t}_{ij} \mapsto \hat{p} \cdot \hat{t}_{ij} = w_{ij}$. By regarding the pair $((i, j), w_{ij})$ as equivalent to $((j, i), -w_{ij})$, we can form a problem with directed edges with positive edge weights. Given a directed graph formed in this way, we try to find an ordering that satisfies as many of these pairwise constraints as possible; the edges that are inconsistent with this ordering are potential outliers. This is a well-studied problem in optimization called MINIMUM FEEDBACK ARC SET (MFAS). Unfortunately it is NP-complete, but there is a rich literature of approximation algorithms. We found that a variant of [7], as detailed in Algorithm 1, worked very well on our problems. This algorithm greedily builds an order from left to right. It always selects a next node that breaks no order constraints if possible. If not, it selects the next node to maximize a heuristic: $(1 + \deg_{\text{out}}(v)) / (1 + \deg_{\text{in}}(v))$, where
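The greedy ordering described above can be sketched in Python as follows. This is not the authors' Algorithm 1, which is not reproduced in this excerpt. It is a minimal illustration that assumes the degrees in the heuristic are sums of edge weights over the nodes not yet placed, and that ties are broken by the smallest node label. All function names are hypothetical.

```python
def to_directed(weights):
    """Flip negative-weight pairs so every constraint is a positive edge u -> v."""
    directed = {}
    for (i, j), w in weights.items():
        if w == 0:
            continue  # assumption: a zero projection imposes no ordering
        edge = (i, j) if w > 0 else (j, i)
        directed[edge] = directed.get(edge, 0.0) + abs(w)
    return directed


def greedy_order(nodes, edges):
    """Build a left-to-right order; edges maps (u, v) -> weight, meaning u before v."""
    remaining = set(nodes)
    order = []
    while remaining:
        # Weighted degrees restricted to nodes that are not yet placed (assumed).
        deg_in = {v: 0.0 for v in remaining}
        deg_out = {v: 0.0 for v in remaining}
        for (u, v), w in edges.items():
            if u in remaining and v in remaining:
                deg_out[u] += w
                deg_in[v] += w
        candidates = sorted(remaining)
        # A node with no incoming constraint can be placed without breaking any.
        sources = [v for v in candidates if deg_in[v] == 0.0]
        if sources:
            chosen = sources[0]
        else:
            chosen = max(candidates, key=lambda v: (1 + deg_out[v]) / (1 + deg_in[v]))
        order.append(chosen)
        remaining.remove(chosen)
    return order


def violated_edges(edges, order):
    """Directed edges u -> v that the ordering places the wrong way round."""
    rank = {v: k for k, v in enumerate(order)}
    return [(u, v) for (u, v) in edges if rank[u] > rank[v]]


# Hypothetical 1D weights w_ij for one projection direction; together they form a cycle.
weights = {(0, 1): 1.0, (1, 2): 0.7, (0, 2): -0.6}
directed = to_directed(weights)
order = greedy_order([0, 1, 2], directed)
flagged = violated_edges(directed, order)
```

In this toy cycle no node is free of incoming constraints at the first step, so the heuristic picks the node with the largest ratio; the remaining nodes then follow without further violations, and the single violated edge is the candidate outlier for this projection. Recomputing the degrees at every step keeps the sketch short at the cost of efficiency.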


References

Bundle Adjustment - A Modern Synthesis

Photo tourism: exploring photo collections in 3D

Shape and motion from image streams under orthography: a factorization method

Building Rome in a day
