Robust Global Translations with 1DSfM

pp. 61-75

Kyle Wilson and Noah Snavely
Cornell University, Ithaca NY
{wilsonkl,snavely}@cs.cornell.edu
Abstract.
We present a simple, effective method for solving structure from motion problems by averaging epipolar geometries. Based on recent successes in solving for global camera rotations using averaging schemes, we focus on the problem of solving for 3D camera translations given a network of noisy pairwise camera translation directions (or 3D point observations). To do this well, we have two main insights. First, we propose a method for removing outliers from problem instances by solving simpler low-dimensional subproblems, which we refer to as 1DSfM problems. Second, we present a simple, principled averaging scheme. We demonstrate this new method in the wild on Internet photo collections.
Keywords: Structure from Motion, translations problem, robust estimation
1 Introduction
Recent work on the unstructured Structure from Motion (SfM) problem has seen renewed interest in global methods. Unlike sequential approaches, which build 3D models from photo collections by iteratively growing a small seed model, global (or batch) methods for SfM consider the entire problem at once. By doing this they avoid several disadvantages of sequential methods, which tend to be costly because they require repeated nonlinear model refinement (bundle adjustment) to avoid errors. Also, unlike global methods, sequential SfM necessarily treats images unequally: those considered first can have a disproportionate effect on the final model. In practice, this behavior can sometimes lead to cascading mistakes and can exacerbate the problem of drift.
However, global methods have difficulties of their own. A key problem is that
reasoning about outliers is challenging. Techniques from sequential methods, such
as filtering out measurements inconsistent with the current model at each step, are
not directly applicable in a global setting. It is harder to reason a priori about which
measurements are unreliable.
In this work, we present a new global SfM method; like other methods, we solve first for global camera rotations, then translations, given a set of pairwise epipolar geometries. As there has been significant progress on the rotations problem, we focus on translations, and offer two key insights. The first, which we call 1DSfM, is a simple way to preprocess a problem instance to remove outlier measurements. 1DSfM is based on reducing a difficult problem to single-dimensional subproblems where inference becomes a more straightforward combinatorial computation. Under this 1D projection, a translations problem becomes an instance of MINIMUM FEEDBACK ARC SET, a well-studied graph problem. By solving for a 1D ordering, we recover information about which 3D measurements are likely inconsistent. Second, we describe a new, very simple solver for the translations problem. Surprisingly, we find that non-linear optimization with this solver, even with random initialization, works remarkably well, especially once outliers have been removed. Hence, our 1DSfM-based outlier removal technique goes hand in hand with our simple translations solver to achieve high-quality results.
We show the effectiveness of our two methods on a variety of landmark-scale Internet community photo collections, covering a range of sizes and scene types. Our code and data are available at http://www.cs.cornell.edu/projects/1dsfm.
2 Related Work
While some earlier SfM methods were global, such as factorization [21], most current
large-scale SfM systems involve sequential reconstruction [20, 2, 9]. Sequential methods
build models a few images at a time, often with bundle adjustment in between steps.
However, there has been significant recent interest in revisiting global methods because of
their potential for improved speed and decreased dependence on local decisions or image
ordering. These methods often work by first estimating an initial set of camera poses
(typically through use of estimated relative poses between pairs or triplets), followed
by a global bundle adjustment to refine this initial solution. With a few exceptions (e.g.
[12]), these methods first solve for camera rotations, and then camera translations.
Rotations.
A number of methods have been proposed for solving for global rotations
from pairwise estimates of relative rotations. Some methods formulate the problem as a
linear system by relaxing constraints on rotation parameterizations [11, 17, 3, 18]. Enqvist
et al. [8] look for a best spanning tree of pairwise rotations to filter outliers in advance.
Sinha et al. [19] use vanishing point estimates as an additional cue. More recently, Hartley
et al. [13] as well as Chatterjee and Govindu [5] have presented robust $\ell_1$ methods based on the Lie algebraic structure of the manifold of rotations. Finally, Fredriksson and Olsson [10] present an approach based on primal and dual problems which can certify whether a solution is globally optimal. We have found the method of Chatterjee and Govindu [5] particularly effective, and use it to produce input for our method.
Translations.
Like the rotations problem, the translations problem is often formulated
as computing global camera translations from pairwise ones. Some approaches are based
on a linear system of cross product constraints [11, 3]. Others use Second Order Cone
Programming, based on the $\ell_\infty$ norm [16–18]. These require very careful attention to outliers. Brand et al. [4] use a spectral approach, but do not address outliers. Sinha et al. [19] robustly compute similarity transformations that align pairs of reconstructions, and then average over these transformations. Recently Jiang et al. [14] have formulated a linear constraint with geometric, rather than algebraic, meaning, based on co-planarity in triplets of cameras. Finally, Crandall et al. [6] take a different approach to optimization, using a complex scheme involving a discrete Markov Random Field search and a continuous Levenberg-Marquardt refinement to robustly explore the solution space. Our translations solver optimizes an objective function that depends only on comparing measurement directions to model directions, as opposed to other methods [11, 3] where the objective function also depends on the distance between images. To avoid the resulting bias, Govindu proposes an iterative reweighting scheme [11], which is unnecessary in our approach. Jiang et al. discuss the importance of geometric vs. algebraic cost functions, since their method minimizes a value that has physical significance. In this sense our cost function is also geometric (but in the space of measurements, rather than in the solution space).
Handling outliers.
A key contribution of our work is a simple algorithm for removing
outliers in a translations problem. Zach et al. [24] detect outlier epipolar geometries by
looking at loop closure in graph cycles. Moulon et al. improve on this approach, and
also robustly fit trifocal tensors to find less noisy translations. Our method for outlier
removal is similar in motivation to [18], but by projecting into a single dimension we
solve tractable subproblems that reduce to a simple combinatorial graph problem.
3 Problem Formulation
The gold standard method for structure from motion is bundle adjustment: the joint nonlinear refinement of camera and structure parameters [22]. However, bundle adjustment is a largely local search, and its success depends critically on initialization. Given a good initial guess, bundle adjustment can produce high-quality solutions, but if the guess is bad, the optimization may fall into local minima far from the optimal solution. For this reason most SfM methods focus on creating a close-enough initialization which can then be refined with bundle adjustment; sequential (or incremental) SfM is one such approach, using repeated bundle adjustment on increasingly large problems to reach a good solution.
Initializing bundle adjustment involves estimating a rotation matrix and a position for each camera. In our notation, a rotation matrix $R_i$ represents a mapping from world coordinates to camera coordinates, and a translation $t_i$ represents a location in the world coordinate frame (in our work, we use “location” and “translation” interchangeably, in a slight abuse of terminology). As with other recent global methods, our input is a set of images $V$ and a network of computed epipolar geometries $(\hat{R}_{ij}, \hat{t}^l_{ij})$ between pairs $(i, j)$ of overlapping images. (We use a hat for epipolar geometries to emphasize that they are our input measurements, and a superscript $l$ for relative translations between two cameras, which are defined in a local coordinate system.) These epipolar geometries are not available for all camera pairs, because not all pairs of images visually overlap. These inputs define a graph we call the epipolar geometry (EG) graph $G = (V, E)$ on a set of images $V$, where for every edge $(i, j) \in E$ we have a measurement $(\hat{R}_{ij}, \hat{t}^l_{ij})$.
Given perfect measurements, global camera poses $(R_i, t_i)$ would satisfy
$$\hat{R}_{ij} = R_i^\top R_j \tag{1}$$
$$\lambda_{ij}\,\hat{t}^l_{ij} = R_i^\top (t_j - t_i) \tag{2}$$
where the $\lambda_{ij}$ are unknown scaling factors (unique up to a global gauge ambiguity).
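To make Eqs. (1) and (2) concrete, the sketch below synthesizes noise-free measurements from known poses and checks that rotating $\hat{t}^l_{ij}$ by $R_i$ recovers the unit direction $(t_j - t_i)/\|t_j - t_i\|$, which follows from multiplying Eq. (2) by $R_i$. It is a minimal pure-Python illustration, not the authors' code: the helper names (`random_rotation`, `ideal_measurement`, `to_global`) are hypothetical, and $\lambda_{ij}$ is taken to be $\|t_j - t_i\|$ so that $\hat{t}^l_{ij}$ is a unit vector.

```python
import math
import random


def transpose(A):
    return [[A[r][c] for r in range(3)] for c in range(3)]


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def matvec(A, v):
    return [sum(A[r][k] * v[k] for k in range(3)) for r in range(3)]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def random_rotation(rng):
    """Rotation matrix built from a random unit quaternion (w, x, y, z)."""
    w, x, y, z = normalize([rng.gauss(0.0, 1.0) for _ in range(4)])
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def ideal_measurement(Ri, ti, Rj, tj):
    """Noise-free epipolar geometry (R_hat_ij, t_hat^l_ij) from true poses."""
    R_hat = matmul(transpose(Ri), Rj)              # Eq. (1)
    d = [b - a for a, b in zip(ti, tj)]            # t_j - t_i
    t_local = normalize(matvec(transpose(Ri), d))  # Eq. (2), lambda_ij = ||t_j - t_i||
    return R_hat, t_local


def to_global(Ri, t_local):
    """t_hat_ij = R_i t_hat^l_ij: rotate a local direction into the world frame."""
    return matvec(Ri, t_local)


if __name__ == "__main__":
    rng = random.Random(0)
    Ri, Rj = random_rotation(rng), random_rotation(rng)
    R_hat, t_local = ideal_measurement(Ri, [0.0, 0.0, 0.0], Rj, [1.0, 2.0, 2.0])
    print(to_global(Ri, t_local))  # approximately [1/3, 2/3, 2/3]
```

Because only the direction of $t_j - t_i$ survives normalization, the measurement carries no information about the distance between cameras, which is why the translations problem below is posed purely in terms of directions.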
Following a now-common approach [11, 16, 17, 6, 3, 14], we separate the initialization into two stages: a rotations problem and a translations problem. These two together produce an initialization for a final bundle adjustment. Recent work has been successful in solving the rotations problem robustly [13, 10, 5]; we build on this work and focus on the translations problem. Given estimates $R_i$ of camera rotation matrices, we can write our measurement of the direction from camera $i$ to camera $j$ as $\hat{t}_{ij} = R_i \hat{t}^l_{ij}$, where $\hat{t}_{ij}$ is a unit 3-vector (i.e., a point on the unit sphere) in the global coordinate system. Hence, the translations problem reduces to the following graph embedding problem:
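Given: Graph $G = (V, E)$
Measurements $\hat{t}_{ij} : E \to S^2$
Metric $d : S^2 \times S^2 \to \mathbb{R}$
Minimize: $\sum_{(i,j) \in E} d\left(\hat{t}_{ij}, \dfrac{t_j - t_i}{\|t_j - t_i\|}\right)$
over embeddings: $T : V \to \mathbb{R}^3$, i.e. $T = \{t_i \mid i \in V\}$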
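The objective can be evaluated for a candidate embedding with a few lines of code. This is a minimal sketch: the text defers the exact form of $d$ to Section 5, so the chordal distance $\|a - b\|$ between unit vectors is used here purely as a stand-in assumption, and `translations_objective`, `unit`, and `chordal` are hypothetical names. Vertices may be cameras or 3D points.

```python
import math


def unit(v):
    """Normalize a 3-vector; the direction is undefined if v is zero."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("t_j == t_i: direction undefined")
    return [x / n for x in v]


def chordal(a, b):
    """Chordal (Euclidean) distance between two unit vectors: a stand-in for d."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def translations_objective(edges, T, d=chordal):
    """Sum over (i, j) in E of d(t_hat_ij, (t_j - t_i) / ||t_j - t_i||).

    edges: dict mapping (i, j) -> unit 3-vector t_hat_ij in the global frame
    T:     dict mapping each vertex (camera or 3D point) -> position t_i
    """
    total = 0.0
    for (i, j), t_hat in edges.items():
        model_dir = unit([b - a for a, b in zip(T[i], T[j])])
        total += d(t_hat, model_dir)
    return total


if __name__ == "__main__":
    T = {"a": [0.0, 0.0, 0.0], "b": [2.0, 0.0, 0.0], "p": [0.0, 3.0, 0.0]}
    edges = {("a", "b"): [1.0, 0.0, 0.0], ("a", "p"): [0.0, 1.0, 0.0]}
    print(translations_objective(edges, T))  # 0.0: every measurement matches
```

Note that scaling every position by the same positive factor leaves the objective unchanged, since only normalized differences enter it; this is the gauge freedom mentioned after Eq. (2).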
Note that in this framework, the second endpoint $j$ of an edge may be a point or a camera.
Camera-point constraints can be important for achieving full scene coverage, and for
avoiding degeneracies arising from collinear motion, an issue discussed in [14].
The formulation above does not specify the exact form of our objective function. It also excludes objective functions that depend on the distance between $t_i$ and $t_j$, rather than only the direction. These issues will both be discussed in Section 5.
Finally, this problem is made much more difficult by noise. We hope that most EGs will be approximately correct, but sometimes EG estimation returns a wildly incorrect solution. For the translations problem we assume a mixed model: small-variance inlier noise, plus a smaller fraction of outlier noise distributed uniformly over $S^2$.
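The mixed noise model can be simulated as follows, which is handy for generating synthetic test instances. This is a sketch under stated assumptions: inlier noise is modeled as an isotropic Gaussian perturbation followed by renormalization onto $S^2$, the `sigma` and `outlier_rate` defaults are illustrative values rather than values from the paper, and the helper names are hypothetical.

```python
import math
import random


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def uniform_on_sphere(rng):
    """Normalized isotropic Gaussian samples are uniformly distributed on S^2."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(3)])


def noisy_direction(true_dir, rng, sigma=0.02, outlier_rate=0.1):
    """Draw one measurement t_hat_ij from an inlier/outlier mixture.

    sigma and outlier_rate are illustrative defaults, not values from the paper.
    """
    if rng.random() < outlier_rate:
        return uniform_on_sphere(rng)  # outlier: uniform over S^2
    # Inlier: small Gaussian perturbation, projected back onto the sphere.
    return unit([x + rng.gauss(0.0, sigma) for x in true_dir])


if __name__ == "__main__":
    rng = random.Random(0)
    true_dir = unit([1.0, 2.0, 2.0])
    samples = [noisy_direction(true_dir, rng) for _ in range(1000)]
    far = sum(1 for s in samples if sum(a * b for a, b in zip(s, true_dir)) < 0.9)
    print(f"{far} of {len(samples)} samples deviate by more than ~26 degrees")
```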
4 Outlier Removal using 1DSfM
By removing bad measurements in advance we can solve problems more accurately and
reliably. In this section, we present a new method for identifying outlier measurements
by projecting translations problems to 1-dimensional subproblems which we can solve
more easily. Our approach is related to previous work [24] which detects outliers as
measurements that cannot be consistently chained along cycles. However, there are
usually many cycles to enumerate, and inferring erroneous measurements from bad
cycles is difficult for large problems. Our method is based on many smaller, simpler
inferences that are then aggregated. This makes outlier detection tractable even for large
problems where [24] has difficulty.
The translations problem described above is a 3-dimensional embedding. One way to
approach outlier detection is to try to first simplify this underlying problem. For instance,
we could project the 3D problem onto a ground plane, resulting in a 2D graph embedding
problem. In other words, we could ignore the $z$ component of each measurement, and consider only the 2D projections: $\hat{t}_{ij} \mapsto \hat{t}_{ij} - \mathrm{proj}_{\hat{k}}\,\hat{t}_{ij}$, where $\hat{k} = \langle 0, 0, 1 \rangle$. In this projected problem, we would need to assign an $(x, y)$ pair to each vertex.
Fig. 1. A toy illustration of 1DSfM. Panel (a) is a good solution to a translations problem for reference. Panel (b) shows the translations problem input: a set of edges with orientations. One outlier edge has been added in red. We also show two directions for projection: $\hat{i}$ and $p$. Panel (c) contains only the projected translations problems, one for each projection direction. These problems are instances of MINIMUM FEEDBACK ARC SET. Finally, (d) contains good solutions to the 1D problems in (c). In the lower case, not all ordering constraints can be satisfied, due to the outlier edge. Note that outlier edges may be consistent in some subproblems but not in others.
In our work, we take this idea a step further and project onto a single-dimensional subspace. Consider projecting a translations problem onto the $x$-axis, as in the blue problem in Figure 1. Only the $x$ component of each translations measurement is now relevant to the problem: $\hat{t}_{ij} \cdot \langle 1, 0, 0 \rangle = x_{ij}$, and we need to assign an $x$-coordinate to each vertex. Recall that our pairwise translation measurements represent directions, but not distances. On the $x$-axis there are only two directions: left and right. Hence, an embedding is consistent with edge $(i, j)$ if $x_{ij} > 0$ and $i$ embeds to the left of $j$, and vice versa for $x_{ij} < 0$. Figure 1 panel (d) shows such an embedding. Note that in 1D, edge directions have become ordering constraints: all embeddings with the same ordering are equally consistent with our problem. Hence, this 1D problem is a combinatorial ordering problem, rather than a continuous optimization problem: we want to find a global ordering of the vertices that satisfies the pairwise orderings as well as possible. We can formulate this problem on a directed version of our graph $G$, as described below.
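A minimal sketch of the 1D consistency rule just described: project each measurement onto a direction $p$ (the $x$-axis by default, giving $x_{ij}$) and list the edges whose sign disagrees with a candidate left-to-right ordering. Only positions in the ordering are used, reflecting that all embeddings with the same ordering are equally consistent. The names `project` and `violated_edges` are hypothetical, and treating $x_{ij} = 0$ as unconstrained is an assumption.

```python
def project(t_hat, p=(1.0, 0.0, 0.0)):
    """Component of a unit direction t_hat along the 1D subspace spanned by p."""
    return sum(a * b for a, b in zip(t_hat, p))


def violated_edges(edges, order, p=(1.0, 0.0, 0.0)):
    """Edges (i, j) whose projected direction disagrees with a 1D ordering.

    edges: dict mapping (i, j) -> unit 3-vector t_hat_ij (global frame)
    order: list of all vertices, left to right along p; only the order
           matters, not the actual coordinates.
    """
    pos = {v: k for k, v in enumerate(order)}
    bad = []
    for (i, j), t_hat in edges.items():
        x_ij = project(t_hat, p)
        if x_ij > 0 and pos[i] > pos[j]:
            bad.append((i, j))  # j should be to the right of i
        elif x_ij < 0 and pos[i] < pos[j]:
            bad.append((i, j))  # j should be to the left of i
        # x_ij == 0 imposes no constraint (an assumption; the text is silent)
    return bad


if __name__ == "__main__":
    toy = {
        ("a", "b"): [1.0, 0.0, 0.0],
        ("b", "c"): [0.6, 0.8, 0.0],
        ("a", "c"): [0.8, 0.6, 0.0],
        ("c", "a"): [0.6, 0.0, 0.8],  # outlier: claims c -> a points toward +x
    }
    print(violated_edges(toy, ["a", "b", "c"]))  # [('c', 'a')]
    print(violated_edges(toy, ["c", "a", "b"], p=(0.0, 0.0, 1.0)))  # []
```

In the toy example, the edge ("c", "a") plays the role of the red outlier in Figure 1: it cannot be reconciled with the other edges along the $x$-axis, but along $z$ it is the only constraint and can be satisfied.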
Figure 1 also illustrates projecting the same problem in a different direction (in green). Notice that the outlier shown in red is inconsistent in one projection direction, but not in another. To catch as many outliers as possible, we embed a graph in many 1D subspaces, each defined by a unit vector $p$. For each subproblem, only the component of translations measurements $\hat{t}_{ij}$ in the direction of $p$ is relevant to the optimization: $\hat{t}_{ij} \mapsto p \cdot \hat{t}_{ij} = w_{ij}$. By regarding the pair $((i, j), w_{ij})$ as equivalent to $((j, i), -w_{ij})$, we can form a problem with directed edges with positive edge weights. Given a directed graph formed in this way, we try to find an ordering that satisfies as many of these pairwise constraints as possible; the edges that are inconsistent with this ordering are potential outliers. This is a well-studied problem in optimization called MINIMUM FEEDBACK ARC SET (MFAS). Unfortunately it is NP-complete, but there is a rich literature of approximation algorithms. We found that a variant of [7], as detailed in Algorithm 1, worked very well on our problems. This algorithm greedily builds an order from left to right. At each step it selects, if possible, a next node that breaks no ordering constraints. If no such node exists, it selects the next node to maximize a heuristic: $(1 + \deg_{\mathrm{out}}(v))/(1 + \deg_{\mathrm{in}}(v))$, where
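The directed formulation and greedy heuristic described above can be sketched as follows. This is an illustrative reconstruction, not the paper's Algorithm 1 or the exact variant of [7]: all names are hypothetical, degrees are assumed to be weighted and restricted to arcs between not-yet-placed nodes, ties are broken by sorted vertex name, and zero-weight projections are dropped. `outlier_scores` sums the weight of inconsistent edges over several directions $p$ as one simple way to aggregate the subproblems; how the paper aggregates and thresholds these scores is not covered in this section.

```python
from collections import defaultdict


def directed_problem(edges, p):
    """Project each measurement onto p and orient it to a positive weight.

    Returns arcs (u, v, w, edge) meaning "u should lie left of v along p",
    where edge is the original key (i, j). Uses ((i, j), w) ~ ((j, i), -w).
    """
    arcs = []
    for (i, j), t_hat in edges.items():
        w = sum(a * b for a, b in zip(t_hat, p))
        if w > 0:
            arcs.append((i, j, w, (i, j)))
        elif w < 0:
            arcs.append((j, i, -w, (i, j)))
        # w == 0: no ordering information along p (assumption: drop it)
    return arcs


def greedy_mfas_order(nodes, arcs):
    """Greedy left-to-right ordering for MFAS (sketch, not Algorithm 1)."""
    remaining = set(nodes)
    order = []
    while remaining:
        # Weighted degrees over arcs between unplaced nodes (assumption).
        out_w, in_w = defaultdict(float), defaultdict(float)
        has_in = set()
        for u, v, w, _ in arcs:
            if u in remaining and v in remaining:
                out_w[u] += w
                in_w[v] += w
                has_in.add(v)
        candidates = sorted(remaining, key=str)  # deterministic tie-breaking
        sources = [v for v in candidates if v not in has_in]
        if sources:
            nxt = sources[0]  # no unplaced node must precede it: breaks nothing
        else:
            nxt = max(candidates, key=lambda v: (1 + out_w[v]) / (1 + in_w[v]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def inconsistent_arcs(arcs, order):
    """Arcs whose head was placed before their tail, as (edge, weight) pairs."""
    pos = {v: k for k, v in enumerate(order)}
    return [(edge, w) for u, v, w, edge in arcs if pos[u] > pos[v]]


def outlier_scores(edges, directions):
    """Sum inconsistent weight per edge over several projection directions."""
    nodes = sorted({v for e in edges for v in e}, key=str)
    score = defaultdict(float)
    for p in directions:
        arcs = directed_problem(edges, p)
        for edge, w in inconsistent_arcs(arcs, greedy_mfas_order(nodes, arcs)):
            score[edge] += w
    return dict(score)


if __name__ == "__main__":
    toy = {
        ("a", "b"): [1.0, 0.0, 0.0],
        ("b", "c"): [0.6, 0.8, 0.0],
        ("a", "c"): [0.8, 0.6, 0.0],
        ("c", "a"): [0.6, 0.0, 0.8],  # outlier edge
    }
    print(outlier_scores(toy, [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]))  # {('c', 'a'): 0.6}
```

Recomputing degrees from scratch at every step keeps the sketch short at a cost of $O(|V|\,(|E| + |V|\log|V|))$ time per direction; an incremental degree update would be the natural optimization for large graphs.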

References
Bundle Adjustment - A Modern Synthesis (book chapter)
Photo tourism: exploring photo collections in 3D (journal article)
Shape and motion from image streams under orthography: a factorization method (journal article)
Building Rome in a day (proceedings article)
Building Rome in a day (journal article)
Frequently Asked Questions
Q1. What have the authors contributed in "Robust Global Translations with 1DSfM"?

The authors present a simple, effective method for solving structure from motion problems by averaging epipolar geometries. First, the authors propose a method for removing outliers from problem instances by solving simpler low-dimensional subproblems, which they refer to as 1DSfM problems. Second, the authors present a simple, principled averaging scheme. The authors demonstrate this new method in the wild on Internet photo collections.

In the future the authors hope to explore further ways of aggregating 1DSfM subproblems than simple summation, which could shed light on more complicated outliers, such as those arising from ambiguous scene structures.

The authors believe a strength of their method is its simplicity: it relies on a well-studied combinatorial optimization problem and a simple non-linear solver.

Their results show that a Huber loss can improve solution quality while retaining good convergence, and that the benefit is largely orthogonal to 1DSfM.

Because ground truth positions are usually unavailable for large-scale SfM problems, the authors show their method gives similar results to a sequential SfM system based on Bundler [20], but in much less time.

In their case, within a continuous optimization framework, the authors have found that the choice of robust function is very important: Cauchy
Footnote 1: Formally, Eq. 5 is undefined if ever $t_j = t_i$ for any edge.

To compute global rotations, the authors run Chatterjee and Govindu's rotations averaging method [5], with the parameters suggested in their paper.

At their threshold of $\tau = 0.1$, the authors found that 1DSfM classified edges with a precision of 0.96 and a recall of 0.92 (averaged across the four datasets).