Proceedings ArticleDOI

PL-SLAM: Real-time monocular visual SLAM with points and lines

TL;DR: This paper builds upon ORB-SLAM, presumably the current state-of-the-art solution in terms of both accuracy and efficiency, and extends its formulation to simultaneously handle both point and line correspondences. It demonstrates that the use of lines not only improves the performance of the original ORB-SLAM solution in poorly textured frames, but also systematically improves it in sequences combining points and lines, without compromising the efficiency.



PL-SLAM: Real-Time Monocular Visual SLAM with Points and Lines
Albert Pumarola¹, Alexander Vakhitov², Antonio Agudo¹, Alberto Sanfeliu¹, Francesc Moreno-Noguer¹
Abstract— Low textured scenes are well known to be one of the main Achilles heels of geometric computer vision algorithms relying on point correspondences, and in particular of visual SLAM. Yet, there are many environments in which, despite being low textured, one can still reliably estimate line-based geometric primitives, for instance in city and indoor scenes, or in the so-called "Manhattan worlds", where structured edges are predominant. In this paper we propose a solution to handle these situations. Specifically, we build upon ORB-SLAM, presumably the current state-of-the-art solution in terms of both accuracy and efficiency, and extend its formulation to simultaneously handle both point and line correspondences. We propose a solution that can work even when most of the points have vanished from the input images and, interestingly, it can be initialized solely from the detection of line correspondences in three consecutive frames. We thoroughly evaluate our approach and the new initialization strategy on the TUM RGB-D benchmark and demonstrate that the use of lines not only improves the performance of the original ORB-SLAM solution in poorly textured frames, but also systematically improves it in sequences combining points and lines, without compromising the efficiency.
I. INTRODUCTION
Recent years have witnessed a surge in autonomous cars and aerial vehicles able to navigate for hundreds of miles without human intervention [10], [16], [32]. Among other technologies, at the core of these systems lie sophisticated Simultaneous Localization And Mapping (SLAM) algorithms, which have proven effective at accurately estimating trajectories while geometrically reconstructing the unknown environment.
Since the groundbreaking Parallel Tracking And Mapping (PTAM) [13] algorithm was introduced by Klein and Murray in 2007, many other real-time visual SLAM approaches have been proposed, including the feature point-based ORB-SLAM [18], and the direct methods LSD-SLAM [7] and RGBD-SLAM [6] that optimize directly over image pixels. Among them, ORB-SLAM [18] seems to be the current state of the art, yielding better accuracy than its direct-method counterparts.
While the performance of ORB-SLAM [18] in well textured sequences is impressive, it is prone to fail when dealing with poorly textured videos or when feature points temporarily vanish due to, e.g., motion blur. These kinds of situations are often encountered in man-made scenarios. However, despite the lack of reliable feature points, these environments may still contain a number of lines that can be used in a similar way.
¹ A. Pumarola, A. Agudo, A. Sanfeliu and F. Moreno-Noguer are with the Institut de Robòtica i Informàtica Industrial (UPC-CSIC), Barcelona, Spain.
² A. Vakhitov is with Skolkovo Institute of Science and Technology, Moscow, Russia.
[Figure 1 plots omitted: x-y [m] trajectory comparisons with an error color scale.]
Fig. 1. ORB-SLAM [18] vs. PL-SLAM. Top: The proposed PL-SLAM simultaneously handles point and line features. This is especially advantageous in situations with a small number of points, such as that shown in the second image. Bottom-Left: Comparison of the trajectories obtained using the state-of-the-art point-based method ORB-SLAM [18] and our PL-SLAM on a TUM RGB-D sequence. The black dotted line shows the ground truth, the blue dashed line is the trajectory obtained with ORB-SLAM [18], and the green solid line is the trajectory obtained with PL-SLAM. Bottom-Right: Close-up of part of the map, color-coded with the amount of error. Red corresponds to higher error levels, and green to lower ones. Note how the use of lines consistently improves the accuracy of the estimated trajectory.
Exploiting lines, though, is not a trivial task. First, existing line detectors and parameterizations are not as well established in the literature as those for feature points. And secondly, the algorithms to compute pose from line correspondences are less reliable than those based on points and are very sensitive to the partial occlusions that lines may undergo. These reasons explain why current SLAM approaches making use of lines rely on range cameras or laser scanners [2], [12], [20], [25].
In this work, we tackle all these issues using a purely visual-based approach. Building upon the ORB-SLAM [18] framework, we propose PL-SLAM (Point and Line SLAM), a solution that can simultaneously leverage point and line information. As recently suggested by [30], lines are parameterized by their endpoints, whose exact location in the image plane is estimated following a two-step optimization process. This representation, besides yielding robustness to occlusions and mis-detections, allows integrating lines within the SLAM machinery as if they were points, and hence re-using most of the ORB-SLAM [18] architecture. The resulting approach is shown to be very accurate in poorly textured environments, and also improves the performance of the original ORB-SLAM [18] in highly textured sequences (see Fig. 1).

An additional contribution of this paper is a new initialization approach that allows estimating an approximate initial map from only line correspondences between three consecutive images. Previous solutions were based on homography [8] or essential matrix estimation [29], and required point correspondences; to the best of our knowledge, there are no equivalent techniques based on lines. The solution we propose relies on the assumption that the rotation between three consecutive frames is constant and relatively small. In the experimental section, we will show that despite these approximations the initial map we estimate highly resembles those obtained by point-based solutions, and is therefore a very good alternative when feature points are not available.
II. RELATED WORK
Building the 3D rigid structure of an unknown environment while recovering the camera trajectory from a monocular image sequence has been an extremely important research area in robotics and computer vision for decades, with many real applications in autonomous robot navigation and augmented reality. This problem is known as SLAM, and at its core it is roughly the same as structure-from-motion.
Early filtering approaches applied the Extended Kalman Filter (EKF) [5] to process every frame in the video for small maps, providing the first real-time solutions. Subsequent works based on Bundle Adjustment (BA) handled denser maps by using only keyframes to estimate the map [13], [17], obtaining more accurate solutions [27] than filtering techniques. Most approaches rely on the PTAM algorithm [13], which represented a breakthrough in visual-based SLAM. This method approximately decouples localization and mapping into two threads that run in parallel, relying on FAST corner points [23]. In [14] the accuracy was improved with edge features together with a rotation estimation step during tracking that provided better relocalization results, while even reducing the computational cost [24]. More recently, the ORB-SLAM system was proposed in [18], providing a more robust camera tracking and mapping estimator. A multi-threaded CPU approach was presented in [7] for real-time dense structure estimation.
However, all previous feature-based methods fail in environments with poor texture or in situations with defocus and motion blur. To solve this, dense and direct methods can be applied, even though they are likely to be computationally expensive [19], [21], and require dedicated GPU implementations to achieve real-time performance. Other semi-direct methods such as [9] overcome the high computational requirements of dense methods by exploiting only pixels with strong gradients, providing an intermediate level of accuracy, density and complexity. Scene prior information has also been exploited to provide a significant boost to SLAM systems [3], [4].
Motivated by the need for efficient and accurate scene representations even for poorly textured environments, in tasks such as visual inspection from aerial vehicles or hand-held devices (i.e., with limited computational resources), we here propose a novel visual-based SLAM system that can combine point and line information in a unified framework while keeping the computational cost low. Note that several parametrizations combining points and lines were used in EKF-SLAM [26]. However, as noted above, filtering-based approaches have been outperformed in rigid SLAM by optimization-based approaches such as the one used in this work. We validate our method on a wide variety of scenarios, outperforming state-of-the-art solutions on highly textured sequences and showing very accurate solutions in low-textured scenarios where standard feature-based methods fail.

Fig. 2. PL-SLAM pipeline, an extension of the ORB-SLAM [18] pipeline. The system is composed of three main threads: Tracking, Local Mapping and Loop Closing. The Tracking thread estimates the camera position and decides when to add new keyframes. Then, Local Mapping adds the new keyframe information into the map and optimizes it with BA. The Loop Closing thread constantly checks for loops and corrects them.
III. SYSTEM OVERVIEW
The pipeline of our approach highly resembles that of ORB-SLAM [18], into which we have integrated the information provided by line features (see Fig. 2). We next briefly review the main building blocks in which line operations are performed. For a description of the operations involving point features, the reader is referred to [18].
One of the main issues to address in SLAM algorithms is computational complexity. In order to preserve the real-time characteristics of ORB-SLAM [18], we have carefully chosen, used and implemented fast methods for operating with lines in all stages of the pipeline: detection, triangulation, matching, culling, relocalization and optimization. Line segments in an input frame are detected by means of LSD [31], an O(n) line segment detector, where n is the number of pixels in the image. Then, lines are pairwise matched with lines already present in the map using a relational graph strategy [33]. This approach relies on the lines' local appearance (Line Band Descriptors) and geometric constraints, and is shown to be quite robust against image artifacts while preserving computational efficiency.
As is done with point features, after having obtained an initial set of map-to-image line feature pairs, all lines of the local map are projected onto the image to find further correspondences. Then, if the image contains sufficient new information about the environment, it is flagged as a keyframe and its corresponding lines are triangulated and added to the map. To discard possible outliers, lines seen from fewer than three viewpoints, or in less than 25% of the frames from which they were expected to be seen, are discarded (culling); a sketch of this test is given below. Line positions in the map are optimized with a local BA. Note in Fig. 2 that we do not use lines for loop closing: matching lines across the whole map is too computationally expensive. Hence, only point features are used for loop detection.
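To make the culling rule concrete, here is a minimal sketch of the survival test described above; the exact bookkeeping around it is our assumption, not the authors' published implementation:

```python
def keep_map_line(n_observing_keyframes: int,
                  n_frames_seen: int,
                  n_frames_expected: int) -> bool:
    """Culling test sketched from the text: a map line survives only if
    it has been observed from at least 3 viewpoints and in at least 25%
    of the frames from which it was expected to be visible."""
    if n_observing_keyframes < 3:
        return False
    if n_frames_expected > 0 and n_frames_seen / n_frames_expected < 0.25:
        return False
    return True

print(keep_map_line(4, 10, 20))  # True: 4 viewpoints, seen in 50% of frames
print(keep_map_line(2, 10, 12))  # False: too few viewpoints
```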
IV. LINE-BASED SLAM
We next describe the line parameterization and error function we use, and how these are integrated within the main building blocks of the SLAM pipeline, namely bundle adjustment, global relocalization and feature matching.

A. Line-based Reprojection Error

In order to extend ORB-SLAM [18] to lines, we need a proper definition of the reprojection error and the line parameterization.
Following [30], let $\mathbf{P}, \mathbf{Q} \in \mathbb{R}^3$ be the 3D endpoints of a line, $\mathbf{p}_d, \mathbf{q}_d \in \mathbb{R}^2$ their 2D detections in the image plane, and $\mathbf{p}^h_d, \mathbf{q}^h_d \in \mathbb{R}^3$ their corresponding homogeneous coordinates. From the latter we can obtain the normalized line coefficients as:

$$\mathbf{l} = \frac{\mathbf{p}^h_d \times \mathbf{q}^h_d}{\|\mathbf{p}^h_d \times \mathbf{q}^h_d\|}. \tag{1}$$
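As a quick illustration of Eq. (1), a numpy sketch (our own, not from the paper's code) that builds normalized line coefficients from two detected endpoints:

```python
import numpy as np

def line_coefficients(p_d: np.ndarray, q_d: np.ndarray) -> np.ndarray:
    """Normalized line coefficients l of Eq. (1): the cross product of
    the homogeneous endpoints, scaled to unit norm, so that l^T x = 0
    for any homogeneous point x on the line."""
    p_h = np.append(p_d, 1.0)   # homogeneous coordinates of p_d
    q_h = np.append(q_d, 1.0)   # homogeneous coordinates of q_d
    l = np.cross(p_h, q_h)      # line through the two points
    return l / np.linalg.norm(l)

l = line_coefficients(np.array([100.0, 200.0]), np.array([300.0, 240.0]))
print(l @ np.array([100.0, 200.0, 1.0]))  # ~0: the endpoint lies on the line
```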
The line reprojection error $E_{line}$ is then defined as the sum of squared point-to-line distances $E_{pl}$ between the projected line segment endpoints and the detected line in the image plane (see Fig. 3-right). That is:

$$E_{line}(\mathbf{P}, \mathbf{Q}, \mathbf{l}, \theta, \mathbf{K}) = E^2_{pl}(\mathbf{P}, \mathbf{l}, \theta, \mathbf{K}) + E^2_{pl}(\mathbf{Q}, \mathbf{l}, \theta, \mathbf{K}), \tag{2}$$

with:

$$E_{pl}(\mathbf{P}, \mathbf{l}, \theta, \mathbf{K}) = \mathbf{l}^\top \pi(\mathbf{P}, \theta, \mathbf{K}), \tag{3}$$

where $\mathbf{l}$ are the detected line coefficients, and $\pi(\mathbf{P}, \theta, \mathbf{K})$ represents the projection of the endpoint $\mathbf{P}$ onto the image plane, given the internal camera calibration matrix $\mathbf{K}$ and the camera parameters $\theta = \{\mathbf{R}, \mathbf{t}\}$, which comprise the rotation and translation parameters, respectively.
Note that in practice, due to real conditions such as line occlusions or mis-detections, the detected image endpoints $\mathbf{p}_d$ and $\mathbf{q}_d$ will not match the projections of the endpoints $\mathbf{P}$ and $\mathbf{Q}$ (see Fig. 3-left). Therefore, we define the detected line reprojection error as:

$$E_{line,d}(\mathbf{p}_d, \mathbf{q}_d, \mathbf{l}) = E^2_{pl,d}(\mathbf{p}_d, \mathbf{l}) + E^2_{pl,d}(\mathbf{q}_d, \mathbf{l}), \tag{4}$$

where $\mathbf{l}$ are the projected 3D line coefficients and the detected point-to-line error is $E_{pl,d}(\mathbf{p}_d, \mathbf{l}) = \mathbf{l}^\top \mathbf{p}^h_d$.
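The following numpy sketch ties Eqs. (2)-(3) together, assuming $\pi$ is the standard pinhole projection (our assumption; the paper does not spell out $\pi$):

```python
import numpy as np

def project(P: np.ndarray, R: np.ndarray, t: np.ndarray,
            K: np.ndarray) -> np.ndarray:
    """pi(P, theta, K): pinhole projection of a 3D point to homogeneous
    pixel coordinates (assumed form of the paper's projection)."""
    x = K @ (R @ P + t)
    return x / x[2]

def e_line(P, Q, l, R, t, K) -> float:
    """Line reprojection error of Eq. (2): sum of squared point-to-line
    distances (Eq. (3)) of the projected endpoints against the detected
    line coefficients l."""
    return float((l @ project(P, R, t, K)) ** 2
                 + (l @ project(Q, R, t, K)) ** 2)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
P, Q = np.array([0.0, 0.0, 4.0]), np.array([1.0, 0.0, 4.0])
l = np.cross(project(P, R, t, K), project(Q, R, t, K))
l /= np.linalg.norm(l)
print(e_line(P, Q, l, R, t, K))  # ~0: projections lie exactly on the line
```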
Fig. 3. Left: Notation. Let $\mathbf{P}, \mathbf{Q} \in \mathbb{R}^3$ be the 3D endpoints of a 3D line, $\tilde{\mathbf{p}}, \tilde{\mathbf{q}} \in \mathbb{R}^2$ their projected 2D endpoints in the image plane, and $\tilde{\mathbf{l}}$ the projected line coefficients. $\mathbf{p}_d, \mathbf{q}_d \in \mathbb{R}^2$ are the 2D endpoints of a detected line, $\mathbf{P}_d, \mathbf{Q}_d \in \mathbb{R}^3$ their real 3D endpoints, and $\mathbf{l}$ the detected line coefficients. $\mathbf{X} \in \mathbb{R}^3$ is a 3D point and $\tilde{\mathbf{x}} \in \mathbb{R}^2$ its corresponding 2D projection. Right: Line-based reprojection error. $d_1$ and $d_2$ represent the line reprojection error, and $d'_1$ and $d'_2$ the detected line reprojection error, between a detected 2D line (blue solid) and the corresponding projected 3D line (green dashed).
Based on the methodology proposed in [30], a recursion over the detected line reprojection error is applied in order to optimize the pose parameters $\theta$ while bringing $E_{line,d}$ close to the line error $E_{line}$ defined in Eq. (2).
B. Bundle Adjustment with Points and Lines
The camera pose parameters $\theta = \{\mathbf{R}, \mathbf{t}\}$ are optimized at each frame with a BA strategy that constrains $\theta$ to lie in the SE(3) group. To do this, we build upon the framework of ORB-SLAM [18] but, besides feature point observations, we include the lines as defined in the previous subsection. We next define the specific cost function we propose to be optimized by the BA, which combines the two types of geometric entities.
Let $\mathbf{X}_j \in \mathbb{R}^3$ be the generic $j$-th point of the map. For the $i$-th keyframe, this point can be projected onto the image plane as:

$$\tilde{\mathbf{x}}_{i,j} = \pi(\mathbf{X}_j, \theta_i, \mathbf{K}), \tag{5}$$

where $\theta_i = \{\mathbf{R}_i, \mathbf{t}_i\}$ denotes the specific pose of the $i$-th keyframe. Given an observation $\mathbf{x}_{i,j}$ of this point, we define the following error:

$$\mathbf{e}_{i,j} = \mathbf{x}_{i,j} - \tilde{\mathbf{x}}_{i,j}. \tag{6}$$
Similarly, let us denote by $\mathbf{P}_j$ and $\mathbf{Q}_j$ the endpoints of the $j$-th map line segment. The corresponding image projections (expressed in homogeneous coordinates) onto the same keyframe can be written as:

$$\tilde{\mathbf{p}}^h_{i,j} = \pi(\mathbf{P}_j, \theta_i, \mathbf{K}), \tag{7}$$
$$\tilde{\mathbf{q}}^h_{i,j} = \pi(\mathbf{Q}_j, \theta_i, \mathbf{K}). \tag{8}$$

Then, given the image observations $\mathbf{p}_{i,j}$ and $\mathbf{q}_{i,j}$ of the $j$-th line endpoints, we use Eq. (1) to estimate the coefficients of the observed line $\tilde{\mathbf{l}}_{i,j}$. We define the following error vectors for the line:

$$e'_{i,j} = (\tilde{\mathbf{l}}_{i,j})^\top (\mathbf{K}^{-1} \tilde{\mathbf{p}}^h_{i,j}), \tag{9}$$
$$e''_{i,j} = (\tilde{\mathbf{l}}_{i,j})^\top (\mathbf{K}^{-1} \tilde{\mathbf{q}}^h_{i,j}). \tag{10}$$

Fig. 4. Estimating camera rotation from line correspondences. $\mathbf{P}, \mathbf{Q} \in \mathbb{R}^3$ are the 3D line endpoints, and $i = \{1, 2, 3\}$ indexes its detections in three consecutive frames, with endpoints $\mathbf{p}_i, \mathbf{q}_i$ and coefficients $\mathbf{l}_i$.
The errors (9), (10) are in fact instances of the point-to-line error (3). As explained in [30], they are not constant w.r.t. a shift of the endpoints $\mathbf{P}_j$, $\mathbf{Q}_j$ along the corresponding 3D line, which serves as an implicit regularization allowing us to use such a non-minimal line parametrization in the BA.

Observe that by representing lines using their endpoints we obtain comparable error representations for points and lines. We can therefore build a unified cost function that integrates each of the error terms as:

$$C = \sum_{i,j} \rho\left( \mathbf{e}_{i,j}^\top \Sigma_{i,j}^{-1} \mathbf{e}_{i,j} + {e'}_{i,j}^\top {\Sigma'}_{i,j}^{-1} e'_{i,j} + {e''}_{i,j}^\top {\Sigma''}_{i,j}^{-1} e''_{i,j} \right),$$

where $\rho$ is the Huber robust cost function and $\Sigma_{i,j}$, $\Sigma'_{i,j}$, $\Sigma''_{i,j}$ are the covariance matrices associated with the scale at which the keypoints and line endpoints were detected, respectively.
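A compact numpy sketch of this combined cost, assuming a single Huber threshold and treating the line-term covariances as scalars (both are our assumptions; the paper only states that $\rho$ is the Huber cost and that the covariances are scale-dependent):

```python
import numpy as np

def huber(s: float, delta: float = 1.0) -> float:
    """Huber robust cost applied to a squared, covariance-weighted error s."""
    return s if s <= delta ** 2 else 2.0 * delta * np.sqrt(s) - delta ** 2

def ba_cost(point_terms, line_terms) -> float:
    """Unified BA cost combining point and line residuals.

    point_terms: iterable of (e, Sigma) with e in R^2 (Eq. (6)) and
                 Sigma its 2x2 covariance.
    line_terms:  iterable of (e1, e2, s1, s2): the scalar endpoint
                 errors of Eqs. (9)-(10) with their scalar covariances."""
    C = 0.0
    for e, Sigma in point_terms:
        C += huber(float(e @ np.linalg.solve(Sigma, e)))  # e^T Sigma^-1 e
    for e1, e2, s1, s2 in line_terms:
        C += huber(e1 ** 2 / s1 + e2 ** 2 / s2)
    return C

print(ba_cost([(np.array([0.5, -0.2]), np.eye(2))], [(0.3, -0.1, 1.0, 1.0)]))
```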
C. Global Relocalization
An important component of any SLAM method is an approach to relocalize the camera when the tracker is lost. This is typically achieved by means of a PnP algorithm that estimates the pose of the current (lost) frame given correspondences with 3D map points appearing in previous keyframes. On top of the PnP method, a RANSAC strategy is used to reject outlier correspondences.

In ORB-SLAM [18], the specific PnP method used is EPnP [1], which, however, only accepts point correspondences as input. In order to make our approach appropriate for handling lines during relocalization, we have replaced EPnP with the recently published EPnPL [30], which minimizes the detected line reprojection error of Eq. (4). Furthermore, EPnPL [30] is robust to partial line occlusions and mis-detections. This is achieved by means of a two-step procedure which first minimizes the reprojection error of the detected lines and estimates the line endpoints $\mathbf{p}_d, \mathbf{q}_d$. These points are then shifted along the line in order to match the projections $\tilde{\mathbf{p}}_d, \tilde{\mathbf{q}}_d$ of the 3D model endpoints $\mathbf{P}, \mathbf{Q}$ (see Fig. 3). Once these matches are established, the camera pose can be reliably estimated.
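One plausible reading of that endpoint-shifting step (ours, not EPnPL's exact algorithm [30]) is to slide each detected endpoint along its own infinite line to the point closest to the corresponding model-endpoint projection:

```python
import numpy as np

def shift_along_line(p_d: np.ndarray, q_d: np.ndarray,
                     p_proj: np.ndarray) -> np.ndarray:
    """Shift the detected endpoint p_d along the detected line (p_d, q_d)
    so it corresponds to the projection p_proj of the 3D model endpoint:
    here, the orthogonal projection of p_proj onto that line."""
    d = (q_d - p_d) / np.linalg.norm(q_d - p_d)  # unit line direction
    return p_d + d * ((p_proj - p_d) @ d)        # closest point on the line

p_new = shift_along_line(np.array([0.0, 0.0]), np.array([10.0, 0.0]),
                         np.array([4.0, 3.0]))
print(p_new)  # [4. 0.]: moved along the detected line toward the projection
```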
V. MAP INITIALIZATION WITH LINES
Another contribution of this paper is an algorithm to estimate an initial map using only line correspondences. Current optimization-based SLAM approaches are initialized with maps built from point correspondences between at least two frames. Homography [8] or essential matrix [29] estimation algorithms are then used to compute the initial map and pose parameters. We next describe our line-based solution for map initialization, which can be a good alternative in low-textured scenes lacking feature points.
Let us consider the setup of Fig. 4, where a line defined by endpoints $\mathbf{P}, \mathbf{Q}$ is projected onto three camera views. Let $\{\mathbf{p}_1, \mathbf{q}_1\}$, $\{\mathbf{p}_2, \mathbf{q}_2\}$ and $\{\mathbf{p}_3, \mathbf{q}_3\}$ be the endpoint projections in each of the views, and $\mathbf{l}_1, \mathbf{l}_2, \mathbf{l}_3 \in \mathbb{R}^3$ the corresponding line coefficients computed from the projected endpoints.

We will make the assumption of small and continuous rotation between consecutive camera poses, such that the rotation from the first to the second camera view is the same as the rotation from the second to the third one¹. Under this assumption we can represent the three camera rotations by $\mathbf{R}_1 = \mathbf{R}^\top$, $\mathbf{R}_2 = \mathbf{I}$, and $\mathbf{R}_3 = \mathbf{R}$, with $\mathbf{I}$ being the $3 \times 3$ identity matrix.

¹ In the experimental section we will evaluate the consequences of this assumption, and show that in practice it is a good approximation.
Note that the line coefficients $\mathbf{l}_i$, $i = \{1, 2, 3\}$, also represent the parameters of a vector normal to the plane formed by the center of projection $\mathbf{O}_i$ and the projections $\mathbf{p}_i, \mathbf{q}_i$. The cross product of two such vectors $\mathbf{l}_i$ will be parallel to the line $\mathbf{P}, \mathbf{Q}$ and at the same time orthogonal to the third vector, once all of them are appropriately rotated and put in a common reference. This constraint can be written as:

$$\mathbf{l}_2^\top \left( (\mathbf{R}^\top \mathbf{l}_1) \times (\mathbf{R}\, \mathbf{l}_3) \right) = 0. \tag{11}$$
Additionally, for small rotations we can approximate $\mathbf{R}$ as:

$$\mathbf{R} = \begin{pmatrix} 1 & -r_3 & r_2 \\ r_3 & 1 & -r_1 \\ -r_2 & r_1 & 1 \end{pmatrix}. \tag{12}$$
For this parametrization, having three matched lines, we obtain three quadratic equations of the form of Eq. (11) in the three unknowns $r_1$, $r_2$ and $r_3$. We adapt the polynomial solver of [15], which yields up to eight solutions. For each possible rotation matrix we can then obtain $\mathbf{t}_1, \mathbf{t}_3$ by using the trifocal tensor equations [11], which are linear in $\mathbf{t}_1, \mathbf{t}_3$. We assume $\mathbf{t}_2 = \mathbf{0}$. We evaluate the eight possible solutions and keep the one that minimizes Eq. (11); a sketch of this selection step is given below.
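A small numpy sketch of this candidate-selection step under the small-rotation parametrization of Eq. (12); the polynomial solver itself (adapted from [15]) is not reproduced here, so the candidate list is assumed given:

```python
import numpy as np

def small_rotation(r: np.ndarray) -> np.ndarray:
    """First-order rotation approximation of Eq. (12), R ~ I + [r]_x."""
    r1, r2, r3 = r
    return np.array([[1.0, -r3,  r2],
                     [ r3, 1.0, -r1],
                     [-r2,  r1, 1.0]])

def residual(R: np.ndarray, l1: np.ndarray, l2: np.ndarray,
             l3: np.ndarray) -> float:
    """Residual of the coplanarity constraint of Eq. (11)."""
    return float(l2 @ np.cross(R.T @ l1, R @ l3))

def pick_rotation(candidates, line_triplets) -> np.ndarray:
    """Among the (up to eight) solver candidates, keep the rotation that
    minimizes the summed squared residuals of Eq. (11) over all matched
    line triplets (l1, l2, l3)."""
    return min(candidates,
               key=lambda r: sum(residual(small_rotation(r), *t) ** 2
                                 for t in line_triplets))
```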
It is worth pointing out that, in order to get enough independent constraints when solving for the translation components using the trifocal tensor equations, we need two additional line correspondences; hence, the total number of line matches required by our algorithm is five.
VI. EXPERIMENTAL RESULTS
We have compared our system with current state-of-the-art visual SLAM methods using the TUM RGB-D benchmark [28]. We also evaluate the proposed initialization approach with synthetic and real data, and compare the computation times of our PL-SLAM algorithm and ORB-SLAM [18]. All experiments were carried out on an Intel Core i7-4790 (4 cores @ 3.6 GHz) with 8 GB of RAM and ROS Hydro [22]. Due to the randomness of some stages of the pipeline, e.g., initialization, position optimization or global relocalization, all experiments were run five times and we report the median of all executions. Supplementary material can be found at http://www.albertpumarola.com/research/pl-slam/.

TABLE I
LOCALIZATION ACCURACY IN THE TUM RGB-D BENCHMARK [28]
Absolute KeyFrame Trajectory RMSE [cm]

| TUM RGB-D Sequence | PL-SLAM (Classic Init) | PL-SLAM (Line Init) | ORB-SLAM | PTAM | LSD-SLAM | RGBD-SLAM |
| f1 xyz             | 1.21               | 1.46               | 1.38               | 1.15  | 9.00  | 1.34  |
| f2 xyz             | 0.43               | 1.49               | 0.54               | 0.2   | 2.15  | 2.61  |
| f1 floor           | 7.59               | 9.42               | 8.71               | -     | 38.07 | 3.51  |
| f2 360 kidnap      | 3.92               | 60.11              | 4.99               | 2.63  | -     | 393.3 |
| f3 long office     | 1.97               | 5.33               | 4.05               | -     | 38.53 | -     |
| f3 nstr tex far    | ambiguity detected | 37.60              | ambiguity detected | 34.74 | 18.31 | -     |
| f3 nstr tex near   | 2.06               | 1.58               | 2.88               | 2.74  | 7.54  | -     |
| f3 str tex far     | 0.89               | 1.25               | 0.98               | 0.93  | 7.95  | -     |
| f3 str tex near    | 1.25               | 7.47               | 1.5451             | 1.04  | -     | -     |
| f2 desk person     | 1.99               | 6.34               | 5.95               | -     | 31.73 | 6.97  |
| f3 sit xyz         | 0.066              | 9.03               | 0.08               | 0.83  | 7.73  | -     |
| f3 sit halfsph     | 1.31               | 9.05               | 1.48               | -     | 5.87  | -     |
| f3 walk xyz        | 1.54               | ambiguity detected | 1.64               | -     | 12.44 | -     |
| f3 walk halfsph    | 1.60               | ambiguity detected | 2.09               | -     | -     | -     |

Median over 5 executions for each sequence. All trajectories were aligned in 7DoF with the ground truth before computing the ATE error with the script provided by the benchmark [28]. Both ORB-SLAM and PL-SLAM were executed with the parametrization of the online open-source ORB-SLAM package. Results for PTAM, LSD-SLAM and RGBD-SLAM were extracted from [18].
TABLE II
TRACKING AND MAPPING TIMES
Mean execution time [ms]

| Thread        | Operation               | PL-SLAM | ORB-SLAM |
| Local Mapping | KeyFrame Insertion      | 17.08   | 9.86     |
| Local Mapping | Map Feature Culling     | 1.18    | 1        |
| Local Mapping | Map Features Creation   | 74.64   | 8.39     |
| Local Mapping | Local BA                | 218.25  | 118.5    |
| Local Mapping | KeyFrame Culling        | 12.7    | 2.86     |
| Local Mapping | Total                   | 3 Hz    | 7 Hz     |
| Tracking      | Features Extraction     | 31.32   | 10.76    |
| Tracking      | Initial Pose Estimation | 7.16    | 7.16     |
| Tracking      | Track Local Map         | 12.58   | 3.18     |
| Tracking      | Total                   | 20 Hz   | 50 Hz    |

Mean execution time over 5 different sequences of the TUM RGB-D benchmark [28].
A. Localization Accuracy in the TUM RGB-D Benchmark
To evaluate the localization accuracy we compare our PL-SLAM method against current state-of-the-art visual SLAM methods, including ORB-SLAM [18], PTAM [13], LSD-SLAM [7] and RGBD-SLAM [6]. The metric used for the comparison is the Absolute Trajectory Error (ATE), provided by the evaluation script of the benchmark. Before computing the error, all trajectories are aligned using a similarity warp, except for RGBD-SLAM [6], which is aligned by a rigid body transformation. The results are summarized in Table I.

Note that our PL-SLAM consistently improves the trajectory accuracy of ORB-SLAM [18] in all sequences. Indeed, it yields the best result in all but two sequences, for which PTAM [13] performs slightly better. Nevertheless, PTAM [13] turned out not to be so reliable, as it lost track in 5 out of the 12 sequences. LSD-SLAM [7] and RGBD-SLAM [6] also lost track, in 3 and 7 sequences, respectively.
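For reference, the ATE RMSE itself reduces to the following once corresponding poses have been matched in time (the benchmark script also performs the 7DoF spatial alignment, which this sketch omits):

```python
import numpy as np

def ate_rmse(gt: np.ndarray, est: np.ndarray) -> float:
    """Absolute Trajectory Error (RMSE) between two time-aligned,
    already spatially aligned trajectories of shape (N, 3)."""
    return float(np.sqrt(np.mean(np.sum((gt - est) ** 2, axis=1))))

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
est = np.array([[0.0, 0.01, 0.0], [1.0, 0.0, 0.02]])
print(ate_rmse(gt, est))  # RMSE in the same units as the input trajectories
```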
B. Map Initialization - Synthetic Experiments
In order to evaluate the map initialization algorithm described in Sect. V, we perform several synthetic and real experiments.

In the synthetic tests we first evaluate the stability of the polynomial solver we built by modifying the toolbox of Kukelova et al. [15]. Fig. 5-left shows the distribution of errors in the parameter estimation for ideal solutions. Note that the average error is around 1e-15, indicating that our modified solver is very stable.

[Figure 5 plots omitted.] Fig. 5. Map Initialization - Synthetic experiments. Left: Numerical stability of the polynomial system solver. Right: Median rotation and relative translation error w.r.t. the inter-frame rotation angle [deg].
Additionally, we have assessed the consequences of as-
suming small and constant rotations between three consecu-
tive frames. Fig. 5-right displays the rotation and translation
errors produced for increasing inter-frame rotations. While
the estimated rotation error remains within relatively small
bounds, the translation error is more severely affected by
the small rotation assumption. In any event, when this initial
map is fed into the BA optimizer, the translation error is
drastically reduced.
C. Map Initialization - Real Experiments
We also evaluate our PL-SLAM method using the classic initialization (based on homography or essential matrix computation), and with the proposed map initialization based only on lines (see again Table I). As expected, the accuracy with the line-based map initialization drops due to the small rotation assumption.
Citations
Journal ArticleDOI
TL;DR: PL-SLAM is proposed, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image.
Abstract: Traditional approaches to stereo visual simultaneous localization and mapping (SLAM) rely on point features to estimate the camera trajectory and build a map of the environment. In low-textured environments, though, it is often difficult to find a sufficient number of reliable point features and, as a consequence, the performance of such algorithms degrades. This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image. PL-SLAM leverages both points and line segments at all the instances of the process: visual odometry, keyframe selection, bundle adjustment, etc. We contribute also with a loop-closure procedure through a novel bag-of-words approach that exploits the combined descriptive power of the two kinds of features. Additionally, the resulting map is richer and more diverse in three-dimensional elements, which can be exploited to infer valuable, high-level scene structures, such as planes, empty spaces, ground plane, etc. (not addressed in this paper). Our proposal has been tested with several popular datasets (such as EuRoC or KITTI), and is compared with state-of-the-art methods such as ORB-SLAM2, revealing a more robust performance in most of the experiments while still running in real time. An open-source version of the PL-SLAM C++ code has been released for the benefit of the community.

329 citations


Cites methods from "PL-SLAM: Real-time monocular visual..."

  • ...Finally, by the time of the first submission of this paper, a work with the same name (PL-SLAM, [36]) was published extending the monocular algorithm ORB-SLAM to the case of including line segment features computed through the line segment detector (LSD) detector [37]....

    [...]

Journal ArticleDOI
TL;DR: This article summarizes new aerial robotic manipulation technologies and methods-aerial robotic manipulators with dual arms and multidirectional thrusters-developed in the AEROARMS project for outdoor industrial inspection and maintenance (I&M).
Abstract: This article summarizes new aerial robotic manipulation technologies and methods (aerial robotic manipulators with dual arms and multidirectional thrusters) developed in the AEROARMS project for outdoor industrial inspection and maintenance (I&M).

167 citations

Journal ArticleDOI
Yijia He, Ji Zhao, Yue Guo, Wenhao He, Kui Yuan
10 Apr 2018-Sensors
TL;DR: The experiments evaluated on public datasets demonstrate that the PL-VIO method that combines point and line features outperforms several state-of-the-art VIO systems which use point features only.
Abstract: To address the problem of estimating camera trajectory and to build a structural three-dimensional (3D) map based on inertial measurements and visual observations, this paper proposes point-line visual-inertial odometry (PL-VIO), a tightly-coupled monocular visual-inertial odometry system exploiting both point and line features. Compared with point features, lines provide significantly more geometrical structure information on the environment. To obtain both computation simplicity and representational compactness of a 3D spatial line, Plucker coordinates and orthonormal representation for the line are employed. To tightly and efficiently fuse the information from inertial measurement units (IMUs) and visual sensors, we optimize the states by minimizing a cost function which combines the pre-integrated IMU error term together with the point and line re-projection error terms in a sliding window optimization framework. The experiments evaluated on public datasets demonstrate that the PL-VIO method that combines point and line features outperforms several state-of-the-art VIO systems which use point features only.

165 citations


Cites background from "PL-SLAM: Real-time monocular visual..."

  • ...For visual-only SLAM, there are several works combining point and line features to estimate camera motion [28,29]....

    [...]

  • ...Furthermore, we found that these sequences with rapid rotation caused large changes in the viewing direction, and the lighting conditions are especially challenging for tracking point features [25,26,28]....

    [...]

Journal ArticleDOI
TL;DR: This paper reviews the state-of-the-art techniques for the three-dimensional (3D) reconstruction of indoor environments and concludes that most of the existing indoor environment reconstruction methods are based on the strong Manhattan assumption, which may not be true in a real indoor environment, hence limiting the effectiveness and robustness of existing indoor environments reconstruction methods.
Abstract: Indoor environment model reconstruction has emerged as a significant and challenging task in terms of the provision of a semantically rich and geometrically accurate indoor model. Recently, there has been an increasing amount of research related to indoor environment reconstruction. Therefore, this paper reviews the state-of-the-art techniques for the three-dimensional (3D) reconstruction of indoor environments. First, some of the available benchmark datasets for 3D reconstruction of indoor environments are described and discussed. Then, data collection of 3D indoor spaces is briefly summarized. Furthermore, an overview of the geometric, semantic, and topological reconstruction of the indoor environment is presented, where the existing methodologies, advantages, and disadvantages of these three reconstruction types are analyzed and summarized. Finally, future research directions, including technique challenges and trends, are discussed for the purpose of promoting future research interest. It can be concluded that most of the existing indoor environment reconstruction methods are based on the strong Manhattan assumption, which may not be true in a real indoor environment, hence limiting the effectiveness and robustness of existing indoor environment reconstruction methods. Moreover, based on the hierarchical pyramid structures and the learnable parameters of deep-learning architectures, multi-task collaborative schemes to share parameters and to jointly optimize each other using redundant and complementary information from different perspectives show their potential for the 3D reconstruction of indoor environments. Furthermore, indoor–outdoor space seamless integration to achieve a full representation of both interior and exterior buildings is also heavily in demand.

88 citations


Cites methods from "PL-SLAM: Real-time monocular visual..."

  • ...Based on the ORB-SLAM system, Pumarola [63] incorporated line features for designing a monocular point and line SLAM that builds a tracking model....

    [...]


Journal ArticleDOI
Danping Zou, Yuanxin Wu, Ling Pei, Haibin Ling, Wenxian Yu
TL;DR: A novel visual-inertial odometry (VIO) approach that adopts structural regularity in man-made environments; instead of the Manhattan world assumption, it uses the Atlanta world model, which contains multiple local Manhattan worlds with different heading directions.
Abstract: In this paper, we propose a novel visual-inertial odometry (VIO) approach that adopts structural regularity in man-made environments. Instead of using Manhattan world assumption, we use Atlanta world model to describe such regularity. An Atlanta world is a world that contains multiple local Manhattan worlds with different heading directions. Each local Manhattan world is detected on the fly, and their headings are gradually refined by the state estimator when new observations are received. With full exploration of structural lines that aligned with each local Manhattan worlds, our VIO method becomes more accurate and robust, as well as more flexible to different kinds of complex man-made environments. Through benchmark tests and real-world tests, the results show that the proposed approach outperforms existing visual-inertial systems in large-scale man-made environments.

68 citations

References
Book
01 Jan 2000
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
Multiple View Geometry in Computer Vision.

14,282 citations


"PL-SLAM: Real-time monocular visual..." refers methods in this paper

  • ...For each possible rotation matrix we can get t1, t3 by using the trifocal tensor equations [11] which will be linear in t1, t3....

    [...]

Proceedings ArticleDOI
16 Jun 2012
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
Abstract: Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti

11,283 citations


"PL-SLAM: Real-time monocular visual..." refers background in this paper

  • ...The last years have witnessed a surge in autonomous cars and aerial vehicles able to navigate for hundreds of miles without human intervention [10], [16], [32]....

    [...]

Proceedings Article
01 Jan 2009
TL;DR: This paper discusses how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.
Abstract: This paper gives an overview of ROS, an opensource robot operating system. ROS is not an operating system in the traditional sense of process management and scheduling; rather, it provides a structured communications layer above the host operating systems of a heterogenous compute cluster. In this paper, we discuss how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.

8,387 citations

Journal ArticleDOI
TL;DR: ORB-SLAM is a feature-based monocular SLAM system that operates in real time, in small and large indoor and outdoor environments, using a survival-of-the-fittest strategy that selects the points and keyframes of the reconstruction.
Abstract: This paper presents ORB-SLAM, a feature-based monocular simultaneous localization and mapping (SLAM) system that operates in real time, in small and large indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.

4,522 citations

Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper "PL-SLAM: Real-time monocular visual SLAM with points and lines"?

In this paper the authors propose a solution to handle these situations: a method that can work even when most of the points have vanished from the input images and that, interestingly, can be initialized solely from the detection of line correspondences in three consecutive frames. The authors thoroughly evaluate their approach and the new initialization strategy on the TUM RGB-D benchmark and demonstrate that the use of lines not only improves the performance of the original ORB-SLAM solution in poorly textured frames, but also systematically improves it in sequences combining points and lines, without compromising the efficiency.

In future work, the authors plan to further exploit line features and incorporate other geometric primitives like planes, which can be built from lines in a similar manner as they have built lines from point features. 

In order to preserve the real-time characteristics of ORB-SLAM [18], the authors have carefully chosen, used and implemented fast methods for operating with lines in all stages of the pipeline: detection, triangulation, matching, culling, relocalization and optimization.

It is worth pointing out that, in order to get enough independent constraints when solving for the translation components using the trifocal tensor equations, two additional line correspondences are needed; hence, the total number of line matches required by the algorithm is five.

Motivated by the need for efficient and accurate scene representations even for poorly textured environments, in tasks such as visual inspection from aerial vehicles or hand-held devices (i.e., with limited computational resources), the authors here propose a novel visual-based SLAM system that can combine point and line information in a unified framework while keeping the computational cost low.

Due to the randomness of some stages of the pipeline, e.g., initialization, position optimization or global relocalization, all experiments were run five times and the authors report the median of all executions.

Line segments in an input frame are detected by means of LSD [31], an O(n) line segment detector, where n is the number of pixels in the image.

The authors next describe the line parameterization and error function the authors use and how this is integrated within the main building blocks of the SLAM pipeline, namely bundle adjustment, global relocalization and feature matching. 

This is achieved by means of a two-step procedure which first minimizes the reprojection error of the detected lines and estimates the line endpoints pd, qd.

The authors next describe their line-based solution for map initialization, which can be a good alternative in low-textured scenes lacking feature points.

The line reprojection error Eline is then defined as the sum of squared point-to-line distances Epl between the projected line segment endpoints and the detected line in the image plane (see Fig. 3-right).

In order to make their approach appropriate for handling lines during relocalization, the authors have replaced EPnP with the recently published EPnPL [30], which minimizes the detected line reprojection error of Eq. (4).

To solve this, dense and direct methods can be applied, even though they are likely to be computationally expensive [19], [21], and require dedicated GPU implementations to achieve real-time performance.

To evaluate the localization accuracy the authors compare their PL-SLAM method against current state-of-the-art visual SLAM methods, including ORB-SLAM [18], PTAM [13], LSD-SLAM [7] and RGBD-SLAM [6].