Proceedings ArticleDOI

PL-SLAM: Real-time monocular visual SLAM with points and lines

TL;DR: This paper builds upon ORB-SLAM, presumably the current state-of-the-art solution in terms of both accuracy and efficiency, and extends its formulation to simultaneously handle both point and line correspondences. It demonstrates that the use of lines not only improves the performance of the original ORB-SLAM solution in poorly textured frames, but also systematically improves it in sequences combining points and lines, without compromising the efficiency.



PL-SLAM: Real-Time Monocular Visual SLAM with Points and Lines
Albert Pumarola¹, Alexander Vakhitov², Antonio Agudo¹, Alberto Sanfeliu¹, Francesc Moreno-Noguer¹
Abstract— Low textured scenes are well known to be one of the main Achilles heels of geometric computer vision algorithms relying on point correspondences, and in particular of visual SLAM. Yet, there are many environments in which, despite being low textured, one can still reliably estimate line-based geometric primitives, for instance in city and indoor scenes, or in the so-called "Manhattan worlds", where structured edges are predominant. In this paper we propose a solution to handle these situations. Specifically, we build upon ORB-SLAM, presumably the current state-of-the-art solution in terms of both accuracy and efficiency, and extend its formulation to simultaneously handle both point and line correspondences. We propose a solution that can work even when most of the points have vanished from the input images and, interestingly, it can be initialized solely from the detection of line correspondences in three consecutive frames. We thoroughly evaluate our approach and the new initialization strategy on the TUM RGB-D benchmark and demonstrate that the use of lines not only improves the performance of the original ORB-SLAM solution in poorly textured frames, but also systematically improves it in sequences combining points and lines, without compromising the efficiency.
I. INTRODUCTION
Recent years have witnessed a surge in autonomous cars and aerial vehicles able to navigate for hundreds of miles without human intervention [10], [16], [32]. Among other technologies, at the core of these systems lie sophisticated Simultaneous Localization And Mapping (SLAM) algorithms, which have proven effective at accurately estimating trajectories while geometrically reconstructing the unknown environment.
Since the groundbreaking Parallel Tracking And Mapping (PTAM) [13] algorithm was introduced by Klein and Murray in 2007, many other real-time visual SLAM approaches have been proposed, including the feature point-based ORB-SLAM [18], and the direct methods LSD-SLAM [7] and RGBD-SLAM [6] that optimize directly over image pixels. Among them, ORB-SLAM [18] seems to be the current state of the art, yielding better accuracy than its direct-method counterparts.
While the performance of ORB-SLAM [18] in well textured sequences is impressive, it is prone to fail when dealing with poorly textured videos or when feature points temporarily vanish due to, e.g., motion blur. These kinds of situations are often encountered in man-made scenarios. However, despite the lack of reliable feature points, these environments may still contain a number of lines that can be used in a similar way.
¹ A. Pumarola, A. Agudo, A. Sanfeliu and F. Moreno-Noguer are with the Institut de Robòtica i Informàtica Industrial (UPC-CSIC), Barcelona, Spain.
² A. Vakhitov is with Skolkovo Institute of Science and Technology, Moscow, Russia.
[Figure 1 plots omitted: x-y [m] trajectory comparisons with an error color scale.]
Fig. 1. ORB-SLAM [18] vs. PL-SLAM. Top: The proposed PL-SLAM simultaneously handles point and line features. This is especially advantageous in situations with a small number of points, such as that shown in the second image. Bottom-Left: Comparison of the trajectories obtained using the state-of-the-art point-based method ORB-SLAM [18] and our PL-SLAM on a TUM RGB-D sequence. The black dotted line shows the ground truth, the blue dashed line is the trajectory obtained with ORB-SLAM [18], and the green solid line is the trajectory obtained with PL-SLAM. Bottom-Right: Close-up of part of the map, color-coded with the amount of error. Red corresponds to higher error levels, and green to lower ones. Note how the use of lines consistently improves the accuracy of the estimated trajectory.
Exploiting lines, though, is not a trivial task. First, existing line detectors and parameterizations are not as well established in the literature as those for feature points. And secondly, the algorithms to compute pose from line correspondences are less reliable than those based on points and are very sensitive to the partial occlusions that lines may undergo. These reasons explain why current SLAM approaches making use of lines rely on range cameras or laser scanners [2], [12], [20], [25].
In this work, we tackle all these issues using a purely visual-based approach. Building upon the ORB-SLAM [18] framework, we propose PL-SLAM (Point and Line SLAM), a solution that can simultaneously leverage point and line information. As recently suggested by [30], lines are parameterized by their endpoints, whose exact location in the image plane is estimated following a two-step optimization process. This representation, besides yielding robustness to occlusions and mis-detections, allows integrating lines within the SLAM machinery as if they were points, and hence re-using most of the ORB-SLAM [18] architecture. The resulting approach is shown to be very accurate in poorly textured environments, and also improves the performance of the original ORB-SLAM [18] in highly textured sequences (see Fig. 1).

An additional contribution of this paper is a new initialization approach that allows estimating an approximate initial map from only line correspondences between three consecutive images. Previous solutions were based on homography [8] or essential matrix estimation [29], and required point correspondences; to the best of our knowledge, there are no equivalent techniques based on lines. The solution we propose relies on the assumption that the rotation between three consecutive frames is constant and relatively small. In the experimental section, we will show that despite these approximations the initial map we estimate highly resembles those obtained by point-based solutions, and is therefore a very good alternative when feature points are not available.
II. RELATED WORK
Building the 3D rigid structure of an unknown environment while recovering the camera trajectory from a monocular image sequence has been an extremely important research area in robotics and computer vision for decades, with many real applications in autonomous robot navigation and augmented reality. This problem is known as SLAM, and at its core it is roughly the same as structure-from-motion.
Early filtering approaches applied the Extended Kalman Filter (EKF) [5] to process every frame in the video for small maps, providing the first real-time solutions. Subsequent works based on Bundle Adjustment (BA) handled denser maps by using only keyframes to estimate the map [13], [17], obtaining more accurate solutions [27] than filtering techniques. Most approaches rely on the PTAM algorithm [13], which represented a breakthrough in visual-based SLAM. This method approximately decouples localization and mapping into two threads that run in parallel, relying on FAST corner points [23]. In [14] the accuracy was improved with edge features together with a rotation estimation step during tracking that provided better relocalization results, while even reducing the computational cost [24]. More recently, the ORB-SLAM system was proposed in [18], providing a more robust camera tracking and mapping estimator. A multi-threaded CPU approach was presented in [7] for real-time dense structure estimation.
However, all previous feature-based methods fail in environments with poor texture or in situations with defocus and motion blur. To solve this, dense and direct methods can be applied, even though they are likely to be computationally expensive [19], [21], and require dedicated GPU implementations to achieve real-time performance. Other semi-direct methods such as [9] overcome the high computational requirements of dense methods by exploiting only pixels with strong gradients, providing an intermediate level of accuracy, density and complexity. Scene prior information has also been exploited to provide a significant boost to SLAM systems [3], [4].
Motivated by the need for efficient and accurate scene representations even for poorly textured environments, in tasks such as visual inspection from aerial vehicles or hand-held devices (i.e., with limited computational resources), we here propose a novel visual-based SLAM system that can combine point and line information in a unified framework while keeping the computational cost low. Note that several parametrizations combining points and lines were used in EKF-SLAM [26]. However, as noted above, filtering-based approaches have been outperformed in rigid SLAM by optimization-based approaches such as the one used in this work. We validate our method on a wide variety of scenarios, outperforming state-of-the-art solutions on highly textured sequences and showing very accurate solutions in low-textured scenarios where standard feature-based methods fail.

Fig. 2. PL-SLAM pipeline, an extension of the ORB-SLAM [18] pipeline. The system is composed of three main threads: Tracking, Local Mapping and Loop Closing. The Tracking thread estimates the camera position and decides when to add new keyframes. Then, Local Mapping adds the new keyframe information into the map and optimizes it with BA. The Loop Closing thread constantly checks for loops and corrects them.
III. SYSTEM OVERVIEW
The pipeline of our approach highly resembles that of ORB-SLAM [18], into which we have integrated the information provided by line features (see Fig. 2). We next briefly review the main building blocks in which line operations are performed. For a description of the operations involving point features, the reader is referred to [18].
One of the main issues to address in SLAM algorithms is computational complexity. In order to preserve the real-time characteristics of ORB-SLAM [18], we have carefully chosen, used and implemented fast methods for operating with lines in all stages of the pipeline: detection, triangulation, matching, culling, relocalization and optimization. Line segments in an input frame are detected by means of LSD [31], an O(n) line segment detector, where n is the number of pixels in the image. Then, lines are pairwise matched with lines already present in the map using a relational graph strategy [33]. This approach relies on the lines' local appearance (Line Band Descriptors) and geometric constraints, and is shown to be quite robust against image artifacts while preserving computational efficiency.
As is done with point features, after having obtained an initial set of map-to-image line feature pairs, all lines of the local map are projected onto the image to find further correspondences. Then, if the image contains sufficient new information about the environment, it is flagged as a keyframe and its corresponding lines are triangulated and added to the map. To discard possible outliers, lines seen from fewer than three viewpoints, or in less than 25% of the frames from which they were expected to be seen, are discarded (culling); a sketch of this test is given below. Line positions in the map are optimized with a local BA. Note in Fig. 2 that we do not use lines for loop closing: matching lines across the whole map is too computationally expensive. Hence, only point features are used for loop detection.
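To make the culling rule concrete, here is a minimal sketch of the survival test described above; the exact bookkeeping around it is our assumption, not the authors' published implementation:

```python
def keep_map_line(n_observing_keyframes: int,
                  n_frames_seen: int,
                  n_frames_expected: int) -> bool:
    """Culling test sketched from the text: a map line survives only if
    it has been observed from at least 3 viewpoints and in at least 25%
    of the frames from which it was expected to be visible."""
    if n_observing_keyframes < 3:
        return False
    if n_frames_expected > 0 and n_frames_seen / n_frames_expected < 0.25:
        return False
    return True

print(keep_map_line(4, 10, 20))  # True: 4 viewpoints, seen in 50% of frames
print(keep_map_line(2, 10, 12))  # False: too few viewpoints
```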
IV. LINE-BASED SLAM
We next describe the line parameterization and error function we use, and how these are integrated within the main building blocks of the SLAM pipeline, namely bundle adjustment, global relocalization and feature matching.

A. Line-based Reprojection Error

In order to extend ORB-SLAM [18] to lines, we need a proper definition of the reprojection error and the line parameterization.
Following [30], let $\mathbf{P}, \mathbf{Q} \in \mathbb{R}^3$ be the 3D endpoints of a line, $\mathbf{p}_d, \mathbf{q}_d \in \mathbb{R}^2$ their 2D detections in the image plane, and $\mathbf{p}^h_d, \mathbf{q}^h_d \in \mathbb{R}^3$ their corresponding homogeneous coordinates. From the latter we can obtain the normalized line coefficients as:

$$\mathbf{l} = \frac{\mathbf{p}^h_d \times \mathbf{q}^h_d}{\|\mathbf{p}^h_d \times \mathbf{q}^h_d\|}. \tag{1}$$
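As a quick illustration of Eq. (1), a numpy sketch (our own, not from the paper's code) that builds normalized line coefficients from two detected endpoints:

```python
import numpy as np

def line_coefficients(p_d: np.ndarray, q_d: np.ndarray) -> np.ndarray:
    """Normalized line coefficients l of Eq. (1): the cross product of
    the homogeneous endpoints, scaled to unit norm, so that l^T x = 0
    for any homogeneous point x on the line."""
    p_h = np.append(p_d, 1.0)   # homogeneous coordinates of p_d
    q_h = np.append(q_d, 1.0)   # homogeneous coordinates of q_d
    l = np.cross(p_h, q_h)      # line through the two points
    return l / np.linalg.norm(l)

l = line_coefficients(np.array([100.0, 200.0]), np.array([300.0, 240.0]))
print(l @ np.array([100.0, 200.0, 1.0]))  # ~0: the endpoint lies on the line
```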
The line reprojection error $E_{line}$ is then defined as the sum of squared point-to-line distances $E_{pl}$ between the projected line segment endpoints and the detected line in the image plane (see Fig. 3-right). That is:

$$E_{line}(\mathbf{P}, \mathbf{Q}, \mathbf{l}, \theta, \mathbf{K}) = E^2_{pl}(\mathbf{P}, \mathbf{l}, \theta, \mathbf{K}) + E^2_{pl}(\mathbf{Q}, \mathbf{l}, \theta, \mathbf{K}), \tag{2}$$

with:

$$E_{pl}(\mathbf{P}, \mathbf{l}, \theta, \mathbf{K}) = \mathbf{l}^\top \pi(\mathbf{P}, \theta, \mathbf{K}), \tag{3}$$

where $\mathbf{l}$ are the detected line coefficients, and $\pi(\mathbf{P}, \theta, \mathbf{K})$ represents the projection of the endpoint $\mathbf{P}$ onto the image plane, given the internal camera calibration matrix $\mathbf{K}$ and the camera parameters $\theta = \{\mathbf{R}, \mathbf{t}\}$, which comprise the rotation and translation parameters, respectively.
Note that in practice, due to real conditions such as line occlusions or mis-detections, the detected image endpoints $\mathbf{p}_d$ and $\mathbf{q}_d$ will not match the projections of the endpoints $\mathbf{P}$ and $\mathbf{Q}$ (see Fig. 3-left). Therefore, we define the detected line reprojection error as:

$$E_{line,d}(\mathbf{p}_d, \mathbf{q}_d, \mathbf{l}) = E^2_{pl,d}(\mathbf{p}_d, \mathbf{l}) + E^2_{pl,d}(\mathbf{q}_d, \mathbf{l}), \tag{4}$$

where $\mathbf{l}$ are the projected 3D line coefficients and the detected point-to-line error is $E_{pl,d}(\mathbf{p}_d, \mathbf{l}) = \mathbf{l}^\top \mathbf{p}^h_d$.
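The following numpy sketch ties Eqs. (2)-(3) together, assuming $\pi$ is the standard pinhole projection (our assumption; the paper does not spell out $\pi$):

```python
import numpy as np

def project(P: np.ndarray, R: np.ndarray, t: np.ndarray,
            K: np.ndarray) -> np.ndarray:
    """pi(P, theta, K): pinhole projection of a 3D point to homogeneous
    pixel coordinates (assumed form of the paper's projection)."""
    x = K @ (R @ P + t)
    return x / x[2]

def e_line(P, Q, l, R, t, K) -> float:
    """Line reprojection error of Eq. (2): sum of squared point-to-line
    distances (Eq. (3)) of the projected endpoints against the detected
    line coefficients l."""
    return float((l @ project(P, R, t, K)) ** 2
                 + (l @ project(Q, R, t, K)) ** 2)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
P, Q = np.array([0.0, 0.0, 4.0]), np.array([1.0, 0.0, 4.0])
l = np.cross(project(P, R, t, K), project(Q, R, t, K))
l /= np.linalg.norm(l)
print(e_line(P, Q, l, R, t, K))  # ~0: projections lie exactly on the line
```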
Fig. 3. Left: Notation. Let $\mathbf{P}, \mathbf{Q} \in \mathbb{R}^3$ be the 3D endpoints of a 3D line, $\tilde{\mathbf{p}}, \tilde{\mathbf{q}} \in \mathbb{R}^2$ their projected 2D endpoints in the image plane, and $\tilde{\mathbf{l}}$ the projected line coefficients. $\mathbf{p}_d, \mathbf{q}_d \in \mathbb{R}^2$ are the 2D endpoints of a detected line, $\mathbf{P}_d, \mathbf{Q}_d \in \mathbb{R}^3$ their real 3D endpoints, and $\mathbf{l}$ the detected line coefficients. $\mathbf{X} \in \mathbb{R}^3$ is a 3D point and $\tilde{\mathbf{x}} \in \mathbb{R}^2$ its corresponding 2D projection. Right: Line-based reprojection error. $d_1$ and $d_2$ represent the line reprojection error, and $d'_1$ and $d'_2$ the detected line reprojection error, between a detected 2D line (blue solid) and the corresponding projected 3D line (green dashed).
Based on the methodology proposed in [30], a recursion over the detected line reprojection error is applied in order to optimize the pose parameters $\theta$ while bringing $E_{line,d}$ close to the line error $E_{line}$ defined in Eq. (2).
B. Bundle Adjustment with Points and Lines
The camera pose parameters $\theta = \{\mathbf{R}, \mathbf{t}\}$ are optimized at each frame with a BA strategy that constrains $\theta$ to lie in the SE(3) group. To do this, we build upon the framework of ORB-SLAM [18] but, besides feature point observations, we include the lines as defined in the previous subsection. We next define the specific cost function we propose to be optimized by the BA, which combines the two types of geometric entities.
Let $\mathbf{X}_j \in \mathbb{R}^3$ be the generic $j$-th point of the map. For the $i$-th keyframe, this point can be projected onto the image plane as:

$$\tilde{\mathbf{x}}_{i,j} = \pi(\mathbf{X}_j, \theta_i, \mathbf{K}), \tag{5}$$

where $\theta_i = \{\mathbf{R}_i, \mathbf{t}_i\}$ denotes the specific pose of the $i$-th keyframe. Given an observation $\mathbf{x}_{i,j}$ of this point, we define the following error:

$$\mathbf{e}_{i,j} = \mathbf{x}_{i,j} - \tilde{\mathbf{x}}_{i,j}. \tag{6}$$
Similarly, let us denote by $\mathbf{P}_j$ and $\mathbf{Q}_j$ the endpoints of the $j$-th map line segment. The corresponding image projections (expressed in homogeneous coordinates) onto the same keyframe can be written as:

$$\tilde{\mathbf{p}}^h_{i,j} = \pi(\mathbf{P}_j, \theta_i, \mathbf{K}), \tag{7}$$
$$\tilde{\mathbf{q}}^h_{i,j} = \pi(\mathbf{Q}_j, \theta_i, \mathbf{K}). \tag{8}$$

Then, given the image observations $\mathbf{p}_{i,j}$ and $\mathbf{q}_{i,j}$ of the $j$-th line endpoints, we use Eq. (1) to estimate the coefficients of the observed line $\tilde{\mathbf{l}}_{i,j}$. We define the following error vectors for the line:

$$e'_{i,j} = (\tilde{\mathbf{l}}_{i,j})^\top (\mathbf{K}^{-1} \tilde{\mathbf{p}}^h_{i,j}), \tag{9}$$
$$e''_{i,j} = (\tilde{\mathbf{l}}_{i,j})^\top (\mathbf{K}^{-1} \tilde{\mathbf{q}}^h_{i,j}). \tag{10}$$

Fig. 4. Estimating camera rotation from line correspondences. $\mathbf{P}, \mathbf{Q} \in \mathbb{R}^3$ are the 3D line endpoints, and $i = \{1, 2, 3\}$ indexes its detections in three consecutive frames, with endpoints $\mathbf{p}_i, \mathbf{q}_i$ and coefficients $\mathbf{l}_i$.
The errors (9), (10) are in fact instances of the point-to-line error (3). As explained in [30], they are not constant w.r.t. a shift of the endpoints $\mathbf{P}_j$, $\mathbf{Q}_j$ along the corresponding 3D line, which serves as an implicit regularization allowing us to use such a non-minimal line parametrization in the BA.

Observe that by representing lines using their endpoints we obtain comparable error representations for points and lines. We can therefore build a unified cost function that integrates each of the error terms as:

$$C = \sum_{i,j} \rho\left( \mathbf{e}_{i,j}^\top \Sigma_{i,j}^{-1} \mathbf{e}_{i,j} + {e'}_{i,j}^\top {\Sigma'}_{i,j}^{-1} e'_{i,j} + {e''}_{i,j}^\top {\Sigma''}_{i,j}^{-1} e''_{i,j} \right),$$

where $\rho$ is the Huber robust cost function and $\Sigma_{i,j}$, $\Sigma'_{i,j}$, $\Sigma''_{i,j}$ are the covariance matrices associated with the scale at which the keypoints and line endpoints were detected, respectively.
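A compact numpy sketch of this combined cost, assuming a single Huber threshold and treating the line-term covariances as scalars (both are our assumptions; the paper only states that $\rho$ is the Huber cost and that the covariances are scale-dependent):

```python
import numpy as np

def huber(s: float, delta: float = 1.0) -> float:
    """Huber robust cost applied to a squared, covariance-weighted error s."""
    return s if s <= delta ** 2 else 2.0 * delta * np.sqrt(s) - delta ** 2

def ba_cost(point_terms, line_terms) -> float:
    """Unified BA cost combining point and line residuals.

    point_terms: iterable of (e, Sigma) with e in R^2 (Eq. (6)) and
                 Sigma its 2x2 covariance.
    line_terms:  iterable of (e1, e2, s1, s2): the scalar endpoint
                 errors of Eqs. (9)-(10) with their scalar covariances."""
    C = 0.0
    for e, Sigma in point_terms:
        C += huber(float(e @ np.linalg.solve(Sigma, e)))  # e^T Sigma^-1 e
    for e1, e2, s1, s2 in line_terms:
        C += huber(e1 ** 2 / s1 + e2 ** 2 / s2)
    return C

print(ba_cost([(np.array([0.5, -0.2]), np.eye(2))], [(0.3, -0.1, 1.0, 1.0)]))
```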
C. Global Relocalization
An important component of any SLAM method is an approach to relocalize the camera when the tracker is lost. This is typically achieved by means of a PnP algorithm that estimates the pose of the current (lost) frame given correspondences with 3D map points appearing in previous keyframes. On top of the PnP method, a RANSAC strategy is used to reject outlier correspondences.

In ORB-SLAM [18], the specific PnP method used is EPnP [1], which, however, only accepts point correspondences as input. In order to make our approach appropriate for handling lines during relocalization, we have replaced EPnP with the recently published EPnPL [30], which minimizes the detected line reprojection error of Eq. (4). Furthermore, EPnPL [30] is robust to partial line occlusions and mis-detections. This is achieved by means of a two-step procedure which first minimizes the reprojection error of the detected lines and estimates the line endpoints $\mathbf{p}_d, \mathbf{q}_d$. These points are then shifted along the line in order to match the projections $\tilde{\mathbf{p}}_d, \tilde{\mathbf{q}}_d$ of the 3D model endpoints $\mathbf{P}, \mathbf{Q}$ (see Fig. 3). Once these matches are established, the camera pose can be reliably estimated.
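One plausible reading of that endpoint-shifting step (ours, not EPnPL's exact algorithm [30]) is to slide each detected endpoint along its own infinite line to the point closest to the corresponding model-endpoint projection:

```python
import numpy as np

def shift_along_line(p_d: np.ndarray, q_d: np.ndarray,
                     p_proj: np.ndarray) -> np.ndarray:
    """Shift the detected endpoint p_d along the detected line (p_d, q_d)
    so it corresponds to the projection p_proj of the 3D model endpoint:
    here, the orthogonal projection of p_proj onto that line."""
    d = (q_d - p_d) / np.linalg.norm(q_d - p_d)  # unit line direction
    return p_d + d * ((p_proj - p_d) @ d)        # closest point on the line

p_new = shift_along_line(np.array([0.0, 0.0]), np.array([10.0, 0.0]),
                         np.array([4.0, 3.0]))
print(p_new)  # [4. 0.]: moved along the detected line toward the projection
```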
V. MAP INITIALIZATION WITH LINES
Another contribution of this paper is an algorithm to estimate an initial map using only line correspondences. Current optimization-based SLAM approaches are initialized with maps built from point correspondences between at least two frames. Homography [8] or essential matrix [29] estimation algorithms are then used to compute the initial map and pose parameters. We next describe our line-based solution for map initialization, which can be a good alternative in low-textured scenes lacking feature points.
Let us consider the setup of Fig. 4, where a line defined by endpoints $\mathbf{P}, \mathbf{Q}$ is projected onto three camera views. Let $\{\mathbf{p}_1, \mathbf{q}_1\}$, $\{\mathbf{p}_2, \mathbf{q}_2\}$ and $\{\mathbf{p}_3, \mathbf{q}_3\}$ be the endpoint projections in each of the views, and $\mathbf{l}_1, \mathbf{l}_2, \mathbf{l}_3 \in \mathbb{R}^3$ the corresponding line coefficients computed from the projected endpoints.

We will make the assumption of small and continuous rotation between consecutive camera poses, such that the rotation from the first to the second camera view is the same as the rotation from the second to the third one¹. Under this assumption we can represent the three camera rotations by $\mathbf{R}_1 = \mathbf{R}^\top$, $\mathbf{R}_2 = \mathbf{I}$, and $\mathbf{R}_3 = \mathbf{R}$, with $\mathbf{I}$ being the $3 \times 3$ identity matrix.

¹ In the experimental section we will evaluate the consequences of this assumption, and show that in practice it is a good approximation.
Note that the line coefficients $\mathbf{l}_i$, $i = \{1, 2, 3\}$, also represent the parameters of a vector normal to the plane formed by the center of projection $\mathbf{O}_i$ and the projections $\mathbf{p}_i, \mathbf{q}_i$. The cross product of two such vectors $\mathbf{l}_i$ will be parallel to the line $\mathbf{P}, \mathbf{Q}$ and at the same time orthogonal to the third vector, once all of them are appropriately rotated and put in a common reference. This constraint can be written as:

$$\mathbf{l}_2^\top \left( (\mathbf{R}^\top \mathbf{l}_1) \times (\mathbf{R}\, \mathbf{l}_3) \right) = 0. \tag{11}$$
Additionally, for small rotations we can approximate $\mathbf{R}$ as:

$$\mathbf{R} = \begin{pmatrix} 1 & -r_3 & r_2 \\ r_3 & 1 & -r_1 \\ -r_2 & r_1 & 1 \end{pmatrix}. \tag{12}$$
For this parametrization, having three matched lines, we obtain three quadratic equations of the form of Eq. (11) in the three unknowns $r_1$, $r_2$ and $r_3$. We adapt the polynomial solver of [15], which yields up to eight solutions. For each possible rotation matrix we can then obtain $\mathbf{t}_1, \mathbf{t}_3$ by using the trifocal tensor equations [11], which are linear in $\mathbf{t}_1, \mathbf{t}_3$. We assume $\mathbf{t}_2 = \mathbf{0}$. We evaluate the eight possible solutions and keep the one that minimizes Eq. (11); a sketch of this selection step is given below.
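A small numpy sketch of this candidate-selection step under the small-rotation parametrization of Eq. (12); the polynomial solver itself (adapted from [15]) is not reproduced here, so the candidate list is assumed given:

```python
import numpy as np

def small_rotation(r: np.ndarray) -> np.ndarray:
    """First-order rotation approximation of Eq. (12), R ~ I + [r]_x."""
    r1, r2, r3 = r
    return np.array([[1.0, -r3,  r2],
                     [ r3, 1.0, -r1],
                     [-r2,  r1, 1.0]])

def residual(R: np.ndarray, l1: np.ndarray, l2: np.ndarray,
             l3: np.ndarray) -> float:
    """Residual of the coplanarity constraint of Eq. (11)."""
    return float(l2 @ np.cross(R.T @ l1, R @ l3))

def pick_rotation(candidates, line_triplets) -> np.ndarray:
    """Among the (up to eight) solver candidates, keep the rotation that
    minimizes the summed squared residuals of Eq. (11) over all matched
    line triplets (l1, l2, l3)."""
    return min(candidates,
               key=lambda r: sum(residual(small_rotation(r), *t) ** 2
                                 for t in line_triplets))
```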
It is worth pointing out that, in order to get enough independent constraints when solving for the translation components using the trifocal tensor equations, we need two additional line correspondences; hence, the total number of line matches required by our algorithm is five.
VI. EXPERIMENTAL RESULTS
We have compared our system with current state-of-the-art visual SLAM methods using the TUM RGB-D benchmark [28]. We also evaluate the proposed initialization approach with synthetic and real data, and compare the computation times of our PL-SLAM algorithm and ORB-SLAM [18]. All experiments were carried out on an Intel Core i7-4790 (4 cores @ 3.6 GHz) with 8 GB of RAM and ROS Hydro [22]. Due to the randomness of some stages of the pipeline, e.g., initialization, position optimization or global relocalization, all experiments were run five times and we report the median of all executions. Supplementary material can be found at http://www.albertpumarola.com/research/pl-slam/.

TABLE I
LOCALIZATION ACCURACY IN THE TUM RGB-D BENCHMARK [28]
Absolute KeyFrame Trajectory RMSE [cm]

| TUM RGB-D Sequence | PL-SLAM (Classic Init) | PL-SLAM (Line Init) | ORB-SLAM | PTAM | LSD-SLAM | RGBD-SLAM |
| f1 xyz             | 1.21               | 1.46               | 1.38               | 1.15  | 9.00  | 1.34  |
| f2 xyz             | 0.43               | 1.49               | 0.54               | 0.2   | 2.15  | 2.61  |
| f1 floor           | 7.59               | 9.42               | 8.71               | -     | 38.07 | 3.51  |
| f2 360 kidnap      | 3.92               | 60.11              | 4.99               | 2.63  | -     | 393.3 |
| f3 long office     | 1.97               | 5.33               | 4.05               | -     | 38.53 | -     |
| f3 nstr tex far    | ambiguity detected | 37.60              | ambiguity detected | 34.74 | 18.31 | -     |
| f3 nstr tex near   | 2.06               | 1.58               | 2.88               | 2.74  | 7.54  | -     |
| f3 str tex far     | 0.89               | 1.25               | 0.98               | 0.93  | 7.95  | -     |
| f3 str tex near    | 1.25               | 7.47               | 1.5451             | 1.04  | -     | -     |
| f2 desk person     | 1.99               | 6.34               | 5.95               | -     | 31.73 | 6.97  |
| f3 sit xyz         | 0.066              | 9.03               | 0.08               | 0.83  | 7.73  | -     |
| f3 sit halfsph     | 1.31               | 9.05               | 1.48               | -     | 5.87  | -     |
| f3 walk xyz        | 1.54               | ambiguity detected | 1.64               | -     | 12.44 | -     |
| f3 walk halfsph    | 1.60               | ambiguity detected | 2.09               | -     | -     | -     |

Median over 5 executions for each sequence. All trajectories were aligned in 7DoF with the ground truth before computing the ATE error with the script provided by the benchmark [28]. Both ORB-SLAM and PL-SLAM were executed with the parametrization of the online open-source ORB-SLAM package. Results for PTAM, LSD-SLAM and RGBD-SLAM were extracted from [18].
TABLE II
TRACKING AND MAPPING TIMES
Mean execution time [ms]

| Thread        | Operation               | PL-SLAM | ORB-SLAM |
| Local Mapping | KeyFrame Insertion      | 17.08   | 9.86     |
| Local Mapping | Map Feature Culling     | 1.18    | 1        |
| Local Mapping | Map Features Creation   | 74.64   | 8.39     |
| Local Mapping | Local BA                | 218.25  | 118.5    |
| Local Mapping | KeyFrame Culling        | 12.7    | 2.86     |
| Local Mapping | Total                   | 3 Hz    | 7 Hz     |
| Tracking      | Features Extraction     | 31.32   | 10.76    |
| Tracking      | Initial Pose Estimation | 7.16    | 7.16     |
| Tracking      | Track Local Map         | 12.58   | 3.18     |
| Tracking      | Total                   | 20 Hz   | 50 Hz    |

Mean execution time over 5 different sequences of the TUM RGB-D benchmark [28].
A. Localization Accuracy in the TUM RGB-D Benchmark
To evaluate the localization accuracy we compare our PL-SLAM method against current state-of-the-art visual SLAM methods, including ORB-SLAM [18], PTAM [13], LSD-SLAM [7] and RGBD-SLAM [6]. The metric used for the comparison is the Absolute Trajectory Error (ATE), provided by the evaluation script of the benchmark. Before computing the error, all trajectories are aligned using a similarity warp, except for RGBD-SLAM [6], which is aligned by a rigid body transformation. The results are summarized in Table I.

Note that our PL-SLAM consistently improves the trajectory accuracy of ORB-SLAM [18] in all sequences. Indeed, it yields the best result in all but two sequences, for which PTAM [13] performs slightly better. Nevertheless, PTAM [13] turned out not to be so reliable, as it lost track in 5 out of the 12 sequences. LSD-SLAM [7] and RGBD-SLAM [6] also lost track, in 3 and 7 sequences, respectively.
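For reference, the ATE RMSE itself reduces to the following once corresponding poses have been matched in time (the benchmark script also performs the 7DoF spatial alignment, which this sketch omits):

```python
import numpy as np

def ate_rmse(gt: np.ndarray, est: np.ndarray) -> float:
    """Absolute Trajectory Error (RMSE) between two time-aligned,
    already spatially aligned trajectories of shape (N, 3)."""
    return float(np.sqrt(np.mean(np.sum((gt - est) ** 2, axis=1))))

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
est = np.array([[0.0, 0.01, 0.0], [1.0, 0.0, 0.02]])
print(ate_rmse(gt, est))  # RMSE in the same units as the input trajectories
```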
B. Map Initialization - Synthetic Experiments
In order to evaluate the map initialization algorithm described in Sect. V, we perform several synthetic and real experiments.

In the synthetic tests we first evaluate the stability of the polynomial solver we built by modifying the toolbox of Kukelova et al. [15]. Fig. 5-left shows the distribution of errors in the parameter estimation for ideal solutions. Note that the average error is around 1e-15, indicating that our modified solver is very stable.

[Figure 5 plots omitted.] Fig. 5. Map Initialization - Synthetic experiments. Left: Numerical stability of the polynomial system solver. Right: Median rotation and relative translation error w.r.t. the inter-frame rotation angle [deg].
Additionally, we have assessed the consequences of as-
suming small and constant rotations between three consecu-
tive frames. Fig. 5-right displays the rotation and translation
errors produced for increasing inter-frame rotations. While
the estimated rotation error remains within relatively small
bounds, the translation error is more severely affected by
the small rotation assumption. In any event, when this initial
map is fed into the BA optimizer, the translation error is
drastically reduced.
C. Map Initialization - Real Experiments
We also evaluate our PL-SLAM method using the classic initialization (based on homography or essential matrix computation), and with the proposed map initialization based only on lines (see again Table I). As expected, the accuracy with the line-based map initialization drops due to the small rotation assumption.
Citations
Journal ArticleDOI
TL;DR: PL-SLAM is proposed, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image.
Abstract: Traditional approaches to stereo visual simultaneous localization and mapping (SLAM) rely on point features to estimate the camera trajectory and build a map of the environment. In low-textured environments, though, it is often difficult to find a sufficient number of reliable point features and, as a consequence, the performance of such algorithms degrades. This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image. PL-SLAM leverages both points and line segments at all the instances of the process: visual odometry, keyframe selection, bundle adjustment, etc. We contribute also with a loop-closure procedure through a novel bag-of-words approach that exploits the combined descriptive power of the two kinds of features. Additionally, the resulting map is richer and more diverse in three-dimensional elements, which can be exploited to infer valuable, high-level scene structures, such as planes, empty spaces, ground plane, etc. (not addressed in this paper). Our proposal has been tested with several popular datasets (such as EuRoC or KITTI), and is compared with state-of-the-art methods such as ORB-SLAM2, revealing a more robust performance in most of the experiments while still running in real time. An open-source version of the PL-SLAM C++ code has been released for the benefit of the community.

329 citations


Cites methods from "PL-SLAM: Real-time monocular visual..."

  • ...Finally, by the time of the first submission of this paper, a work with the same name (PL-SLAM, [36]) was published extending the monocular algorithm ORB-SLAM to the case of including line segment features computed through the line segment detector (LSD) detector [37]....

    [...]

Journal ArticleDOI
TL;DR: This article summarizes new aerial robotic manipulation technologies and methods-aerial robotic manipulators with dual arms and multidirectional thrusters-developed in the AEROARMS project for outdoor industrial inspection and maintenance (I&M).
Abstract: This article summarizes new aerial robotic manipulation technologies and methods (aerial robotic manipulators with dual arms and multidirectional thrusters) developed in the AEROARMS project for outdoor industrial inspection and maintenance (I&M).

167 citations

Journal ArticleDOI
Yijia He, Ji Zhao, Yue Guo, Wenhao He, Kui Yuan
10 Apr 2018-Sensors
TL;DR: The experiments evaluated on public datasets demonstrate that the PL-VIO method that combines point and line features outperforms several state-of-the-art VIO systems which use point features only.
Abstract: To address the problem of estimating camera trajectory and to build a structural three-dimensional (3D) map based on inertial measurements and visual observations, this paper proposes point-line visual-inertial odometry (PL-VIO), a tightly-coupled monocular visual-inertial odometry system exploiting both point and line features. Compared with point features, lines provide significantly more geometrical structure information on the environment. To obtain both computation simplicity and representational compactness of a 3D spatial line, Plucker coordinates and orthonormal representation for the line are employed. To tightly and efficiently fuse the information from inertial measurement units (IMUs) and visual sensors, we optimize the states by minimizing a cost function which combines the pre-integrated IMU error term together with the point and line re-projection error terms in a sliding window optimization framework. The experiments evaluated on public datasets demonstrate that the PL-VIO method that combines point and line features outperforms several state-of-the-art VIO systems which use point features only.

165 citations


Cites background from "PL-SLAM: Real-time monocular visual..."

  • ...For visual-only SLAM, there are several works combining point and line features to estimate camera motion [28,29]....

    [...]

  • ...Furthermore, we found that these sequences with rapid rotation caused large changes in the viewing direction, and the lighting conditions are especially challenging for tracking point features [25,26,28]....

    [...]

Journal ArticleDOI
TL;DR: This paper reviews the state-of-the-art techniques for the three-dimensional (3D) reconstruction of indoor environments and concludes that most of the existing indoor environment reconstruction methods are based on the strong Manhattan assumption, which may not be true in a real indoor environment, hence limiting the effectiveness and robustness of existing indoor environments reconstruction methods.
Abstract: Indoor environment model reconstruction has emerged as a significant and challenging task in terms of the provision of a semantically rich and geometrically accurate indoor model. Recently, there has been an increasing amount of research related to indoor environment reconstruction. Therefore, this paper reviews the state-of-the-art techniques for the three-dimensional (3D) reconstruction of indoor environments. First, some of the available benchmark datasets for 3D reconstruction of indoor environments are described and discussed. Then, data collection of 3D indoor spaces is briefly summarized. Furthermore, an overview of the geometric, semantic, and topological reconstruction of the indoor environment is presented, where the existing methodologies, advantages, and disadvantages of these three reconstruction types are analyzed and summarized. Finally, future research directions, including technique challenges and trends, are discussed for the purpose of promoting future research interest. It can be concluded that most of the existing indoor environment reconstruction methods are based on the strong Manhattan assumption, which may not be true in a real indoor environment, hence limiting the effectiveness and robustness of existing indoor environment reconstruction methods. Moreover, based on the hierarchical pyramid structures and the learnable parameters of deep-learning architectures, multi-task collaborative schemes to share parameters and to jointly optimize each other using redundant and complementary information from different perspectives show their potential for the 3D reconstruction of indoor environments. Furthermore, indoor–outdoor space seamless integration to achieve a full representation of both interior and exterior buildings is also heavily in demand.

88 citations


Cites methods from "PL-SLAM: Real-time monocular visual..."

  • ...Based on the ORB-SLAM system, Pumarola [63] incorporated line features for designing a monocular point and line SLAM that builds a tracking model....

    [...]


Journal ArticleDOI
Danping Zou, Yuanxin Wu, Ling Pei, Haibin Ling, Wenxian Yu
TL;DR: A novel visual-inertial odometry (VIO) approach that adopts structural regularity in man-made environments; instead of the Manhattan world assumption, it uses the Atlanta world model, which contains multiple local Manhattan worlds with different heading directions.
Abstract: In this paper, we propose a novel visual-inertial odometry (VIO) approach that adopts structural regularity in man-made environments. Instead of using Manhattan world assumption, we use Atlanta world model to describe such regularity. An Atlanta world is a world that contains multiple local Manhattan worlds with different heading directions. Each local Manhattan world is detected on the fly, and their headings are gradually refined by the state estimator when new observations are received. With full exploration of structural lines that aligned with each local Manhattan worlds, our VIO method becomes more accurate and robust, as well as more flexible to different kinds of complex man-made environments. Through benchmark tests and real-world tests, the results show that the proposed approach outperforms existing visual-inertial systems in large-scale man-made environments.

68 citations

References
Book
01 Jan 2000
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
Multiple View Geometry in Computer Vision.

14,282 citations


"PL-SLAM: Real-time monocular visual..." refers methods in this paper

  • ...For each possible rotation matrix we can get t1, t3 by using the trifocal tensor equations [11] which will be linear in t1, t3....

    [...]

Proceedings ArticleDOI
16 Jun 2012
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
Abstract: Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti

11,283 citations


"PL-SLAM: Real-time monocular visual..." refers background in this paper

  • ...The last years have witnessed a surge in autonomous cars and aerial vehicles able to navigate for hundreds of miles without human intervention [10], [16], [32]....

    [...]

Proceedings Article
01 Jan 2009
TL;DR: This paper discusses how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.
Abstract: This paper gives an overview of ROS, an opensource robot operating system. ROS is not an operating system in the traditional sense of process management and scheduling; rather, it provides a structured communications layer above the host operating systems of a heterogenous compute cluster. In this paper, we discuss how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.

8,387 citations

Journal ArticleDOI
TL;DR: ORB-SLAM is a feature-based monocular SLAM system that operates in real time, in small and large indoor and outdoor environments, using a survival-of-the-fittest strategy that selects the points and keyframes of the reconstruction.
Abstract: This paper presents ORB-SLAM, a feature-based monocular simultaneous localization and mapping (SLAM) system that operates in real time, in small and large indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.

4,522 citations

Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper "PL-SLAM: Real-time monocular visual SLAM with points and lines"?

In this paper the authors propose a solution to handle these situations: a method that can work even when most of the points have vanished from the input images and that, interestingly, can be initialized solely from the detection of line correspondences in three consecutive frames. The authors thoroughly evaluate their approach and the new initialization strategy on the TUM RGB-D benchmark and demonstrate that the use of lines not only improves the performance of the original ORB-SLAM solution in poorly textured frames, but also systematically improves it in sequences combining points and lines, without compromising the efficiency.

In future work, the authors plan to further exploit line features and incorporate other geometric primitives like planes, which can be built from lines in a similar manner as they have built lines from point features. 

In order to preserve the real-time characteristics of ORB-SLAM [18], the authors have carefully chosen, used and implemented fast methods for operating with lines in all stages of the pipeline: detection, triangulation, matching, culling, relocalization and optimization.

It is worth pointing out that, in order to get enough independent constraints when solving for the translation components using the trifocal tensor equations, two additional line correspondences are needed; hence, the total number of line matches required by the algorithm is five.

Motivated by the need for efficient and accurate scene representations even for poorly textured environments, in tasks such as visual inspection from aerial vehicles or hand-held devices (i.e., with limited computational resources), the authors here propose a novel visual-based SLAM system that can combine point and line information in a unified framework while keeping the computational cost low.

Due to the randomness of some stages of the pipeline, e.g., initialization, position optimization or global relocalization, all experiments were run five times and the authors report the median of all executions.

Line segments in an input frame are detected by means of LSD [31], an O(n) line segment detector, where n is the number of pixels in the image.

The authors next describe the line parameterization and error function the authors use and how this is integrated within the main building blocks of the SLAM pipeline, namely bundle adjustment, global relocalization and feature matching. 

This is achieved by means of a two-step procedure which first minimizes the reprojection error of the detected lines and estimates the line endpoints pd, qd.

The authors next describe their line-based solution for map initialization, which can be a good alternative in low-textured scenes lacking feature points.

The line reprojection error Eline is then defined as the sum of squared point-to-line distances Epl between the projected line segment endpoints and the detected line in the image plane (see Fig. 3-right).

In order to make their approach appropriate for handling lines during relocalization, the authors have replaced EPnP with the recently published EPnPL [30], which minimizes the detected line reprojection error of Eq. (4).

To solve this, dense and direct methods can be applied, even though they are likely to be computationally expensive [19], [21], and require dedicated GPU implementations to achieve real-time performance.

To evaluate the localization accuracy the authors compare their PL-SLAM method against current state-of-the-art visual SLAM methods, including ORB-SLAM [18], PTAM [13], LSD-SLAM [7] and RGBD-SLAM [6].