Proceedings ArticleDOI

Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems

TL;DR: A machine learning framework centered around Control Lyapunov Functions adapts to parametric uncertainty and unmodeled dynamics in general robotic systems and yields a stabilizing quadratic program model-based controller.
Abstract: Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller.

Summary (2 min read)

Introduction

  • The authors instead constructively prescribe a CLF, and focus on learning only the necessary information to choose control inputs that achieve the associated stability guarantees, which can be much lower-dimensional.
  • In particular, exhaustive data collection typically scales exponentially with dimensionality of the joint state and control output space, and so should be avoided.
  • The authors also provide a Python software package implementing their experiments and learning framework.

II. PRELIMINARIES ON CLFS

  • This section provides a brief review of input-output feedback linearization, a control technique which can be used to synthesize a CLF.
  • The resulting CLF will be used to quantify the impact of model uncertainty and specify the learning problem outlined in Section III.
  • Input-Output Linearization is a nonlinear control method that creates stable linear dynamics for a selected set of outputs of a system [21].
  • This implies the desired output trajectory yd is exponentially stable.
  • This conclusion allows us to construct a Lyapunov function for the system using converse theorems found in [21].

B. Control Lyapunov Functions

  • The preceding formulation of a Lyapunov function required the choice of the specific control law given in (6).
  • For optimality purposes, it may be desirable to choose a different control input for the system, thus motivating the following definition.
  • The authors see that the previously constructed Lyapunov function satisfying (10) satisfies (11) by choosing the control input specified in (6).
  • Information about the dynamics is encoded within the scalar function V̇, offering a reduction in dimensionality which will become relevant later in learning.
  • Here S^m_+ denotes the set of m × m symmetric positive semi-definite matrices.

A. Uncertainty Modeling Assumptions

  • As defined in Section II, the authors consider affine robotic control systems that evolve under dynamics described by (1).
  • The authors assume the estimated model (14) satisfies the relative degree condition on the domain R, and thus may use the method of feedback linearization to produce a Control Lyapunov Function (CLF), V , for the system.
  • This holds since the true values of f̃ and g̃, if known, enable choosing control inputs as in (6) that respect the same linear output dynamics (8).
  • Instead of learning the unknown dynamics terms A and b, which scale with both the dimension of the configuration space and the number of inputs, the authors will learn the terms a and b, which scale only with the number of inputs.

B. Motivating a Data-Driven Learning Approach

  • The formulation from (15) and (16) defines a general class of dynamics uncertainty.
  • To motivate their learning-based framework, first consider a simple approach of learning a and b via supervised regression [19]: the authors operate the system using some given state-feedback controller to gather data points along the system’s evolution and learn a function that approximates a and b via supervised learning.
  • An experiment is defined as the evolution of the system over a finite time interval from the initial condition (q0,0) using a discrete-time implementation of the given controller.
  • As a consequence, standard supervised learning with sequential, non-i.i.d data collection often leads to error cascades [24].

A. Episodic Learning Framework

  • Episodic learning refers to learning procedures that iteratively alternate between executing an intermediate controller (also known as a roll-out in reinforcement learning [22]), collecting data from that roll-out, and designing a new controller using the newly collected data.
  • The data set is aggregated and a new ERM problem is solved after each episode.
  • Such exploration can be achieved by randomly perturbing the controller used in an experiment at each time step.
  • Algorithm 1 specifies a method of computing a sequence of Lyapunov function derivative estimates and augmenting controllers.
  • The trust coefficients form a monotonically nondecreasing sequence on the interval [0, 1].

B. Additional Controller Details

  • This is done to avoid chatter that may arise from the optimization based nature of the CLF-QP formulation [27].
  • Note that for this choice of Lyapunov function, the gradient ∂V/∂η, and therefore a, approach 0 as η approaches 0, which occurs close to the desired trajectory.
  • Such relative error causes the optimization problem in (20) to be poorly conditioned near the desired trajectory.
  • As states approach the trajectory, the coefficient of the quadratic term decreases and enables relaxation of the exponential stability inequality constraint.
  • The exploratory control during experiments is naively chosen as additive noise from a centered uniform distribution, with each coordinate drawn i.i.d.

V. APPLICATION ON SEGWAY PLATFORM

  • In this section the authors apply the episodic learning algorithm constructed in Section IV to the Segway platform.
  • The authors seek to track a pitch angle trajectory2 generated for the estimated model.
  • The baseline PD controller and the augmented controller after 20 experiments can be seen in the right portion of Fig. 3.
  • The mean trajectory consistently improves in these later episodes as the trust factor increases.
  • The variation increases but remains small, indicating that the learning problem is robust to randomness in the initialization of the neural networks, in the network training algorithm, and in the noise added during the experiments. (The trajectory was generated using the GPOPS-II optimal control software; models were implemented in Keras.)


Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems*

Andrew J. Taylor¹, Victor D. Dorobantu¹, Hoang M. Le, Yisong Yue, Aaron D. Ames
Abstract— Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller.
I. INTRODUCTION
The use of Control Lyapunov Functions (CLFs) [5], [38]
for nonlinear control of robotic systems is becoming increasingly popular [26], [17], [29], often utilizing quadratic
program controllers [3], [2], [17]. While effective, one major
challenge is the need for extensive tuning, which is largely
due to modeling deficiencies such as parametric error and
unmodeled dynamics (cf. [26]). While there has been much
research in developing robust control methods that maintain
stability under uncertainty (e.g., via input-to-state stability
[39]) or in adapting to limited forms of uncertainty (e.g.,
adaptive control [23],[20]), relatively little work has been
done on systematically reducing uncertainty while maintaining stability for general function classes of models.
We take a machine learning approach to address the above
limitations. Learning-based approaches have already shown
great promise for controlling imperfectly modeled robotic
platforms [22], [35]. Successful learning-based approaches
have typically focused on learning model-based uncertainty
[6], [9], [8], [37], or direct model-free controller design [25],
[36], [14], [42], [24].
We are particularly interested in learning-based approaches
that guarantee Lyapunov stability [21]. From that perspective,
the bulk of previous work has focused on using learning to
construct a Lyapunov function [31], [12], [30], or to assess
the region of attraction for a Lyapunov function [10], [7].
*This work was supported by Google Brain Robotics and DARPA Award HR00111890035.
¹Both authors contributed equally.
All authors are with the Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA. ajtaylor@caltech.edu, vdoroban@caltech.edu, hmle@caltech.edu, yyue@caltech.edu, ames@caltech.edu
Fig. 1. CAD model & physical system, a modified Ninebot Segway.
One limitation of previous work is that the learning is conducted
over the full-dimensional state space, which can be data
inefficient. We instead constructively prescribe a CLF, and
focus on learning only the necessary information to choose
control inputs that achieve the associated stability guarantees,
which can be much lower-dimensional.
One challenge in developing learning-based methods for
controller improvement is how best to collect training data
that accurately reflects the desired operating environment
and control goals. In particular, exhaustive data collection
typically scales exponentially with dimensionality of the joint
state and control output space, and so should be avoided. But
pre-collecting data upfront can lead to poor performance
as downstream control behavior may enter states that are not
present in the pre-collected training data. We will leverage
episodic learning approaches such as Dataset Aggregation
(DAgger) [33] to address these challenges in a data-efficient
manner, leading to iteratively refined controllers.
In this paper we present a novel episodic learning approach
that utilizes CLFs to iteratively improve controller design
while maintaining stability. To the best of our knowledge,
our approach is the first that integrates CLFs and general
supervised learning (e.g., including deep learning) in a
mathematically integrated way. Another distinctive aspect is
that our approach performs learning on the projection of state
dynamics onto the CLF time derivative, which can be much
lower dimensional than learning the full state dynamics or
the region of attraction.
Our paper is organized as follows. Section II presents a
review of input-output feedback linearization with a focus
on constructing CLFs for unconstrained robotic systems.
Section III discusses model uncertainty of a general robotic

system and establishes assumptions on the structure of this
uncertainty. These assumptions allow us to prescribe a CLF
for the true system, but leave open the question of how to
model its time derivative. Section IV provides an episodic
learning approach to iteratively improving a model of the
time derivative of the CLF. We also present a variant of
optimal CLF-based control that integrates the learned representation. Finally, Section V provides simulation results
on a model of a modified Ninebot by Segway E+, seen in
Fig. 1. We also provide a Python software package (LyaPy) implementing our experiments and learning framework (https://github.com/vdorobantu/lyapy).
II. PRELIMINARIES ON CLFS
This section provides a brief review of input-output feedback linearization, a control technique which can be used to synthesize a CLF. The resulting CLF will be used to quantify the impact of model uncertainty and specify the learning problem outlined in Section III.
A. Input-Output Linearization
Input-Output Linearization is a nonlinear control method that creates stable linear dynamics for a selected set of outputs of a system [21]. The relevance of Input-Output Linearization is that it provides a constructive way to generate Lyapunov functions for the class of affine robotic control systems. Consider a configuration space Q ⊆ R^n and an input space U ⊆ R^m. Assume Q is path-connected and non-empty. Consider a control system specified by:

$$ D(q)\ddot{q} + \underbrace{C(q,\dot{q})\dot{q} + G(q)}_{H(q,\dot{q})} = Bu, \qquad (1) $$

with generalized coordinates q ∈ Q, coordinate rates q̇ ∈ R^n, input u ∈ U, inertia matrix D : Q → S^n_{++}, centrifugal and Coriolis terms C : Q × R^n → R^{n×n}, gravitational forces G : Q → R^n, and static actuation matrix B ∈ R^{n×m}. Here S^n_{++} denotes the set of n × n symmetric positive definite matrices. Define twice-differentiable outputs y : Q → R^k, with k ≤ m, and assume each output has relative degree 2 on some domain R ⊆ Q (see [34] for more details).
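To make the structure of (1) concrete, the following minimal Python sketch encodes a single-link pendulum driven by a torque input in the manipulator form D(q)q̈ + C(q,q̇)q̇ + G(q) = Bu. The specific system, parameter values, and function names are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Illustrative single-link pendulum in manipulator form (assumed example):
#   D(q) q_dd + C(q, q_dot) q_dot + G(q) = B u,  with n = 1, m = 1.
m, l, grav = 1.0, 0.5, 9.81  # mass [kg], length [m], gravity [m/s^2]

def D(q):            # inertia matrix, 1x1 and positive definite
    return np.array([[m * l**2]])

def C(q, q_dot):     # centrifugal/Coriolis terms (zero for a single link)
    return np.zeros((1, 1))

def G(q):            # gravitational forcing
    return np.array([m * grav * l * np.sin(q[0])])

B = np.array([[1.0]])  # static actuation matrix

def q_ddot(q, q_dot, u):
    """Solve (1) for the acceleration: q_dd = D^{-1}(B u - C q_dot - G)."""
    H = C(q, q_dot) @ q_dot + G(q)          # H(q, q_dot) from (1)
    return np.linalg.solve(D(q), B @ u - H)

# Example: acceleration near the upright configuration with zero input.
print(q_ddot(np.array([np.pi]), np.array([0.0]), np.array([0.0])))
```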
Consider the time interval I = [t₀, t_f] for initial and final times t₀, t_f satisfying t_f > t₀ and define twice-differentiable time-dependent desired outputs y_d : I → R^k with r(t) = [y_d(t)^⊤  ẏ_d(t)^⊤]^⊤. The error between the outputs and the desired outputs (commonly referred to as virtual constraints [44]) yields the dynamic system:

$$ \frac{d}{dt}\begin{bmatrix} y(q) - y_d(t) \\ \dot{y}(q,\dot{q}) - \dot{y}_d(t) \end{bmatrix} = \underbrace{\begin{bmatrix} \frac{\partial y}{\partial q}\dot{q} \\ \frac{\partial \dot{y}}{\partial q}\dot{q} - \frac{\partial y}{\partial q}D(q)^{-1}H(q,\dot{q}) \end{bmatrix}}_{f(q,\dot{q})} - \underbrace{\begin{bmatrix} \dot{y}_d(t) \\ \ddot{y}_d(t) \end{bmatrix}}_{\dot{r}(t)} + \underbrace{\begin{bmatrix} 0_{k\times m} \\ \frac{\partial y}{\partial q}D(q)^{-1}B \end{bmatrix}}_{g(q)} u, \qquad (2) $$
noting that ∂ẏ/∂q̇ = ∂y/∂q. For all q ∈ R, g(q) is full rank by the relative degree assumption. Define η : Q × R^n × I → R^{2k}, f̃ : Q × R^n → R^k, and g̃ : Q → R^{k×m} as:

$$ \eta(q,\dot{q},t) = \begin{bmatrix} y(q) - y_d(t) \\ \dot{y}(q,\dot{q}) - \dot{y}_d(t) \end{bmatrix} \qquad (3) $$

$$ \tilde{f}(q,\dot{q}) = \frac{\partial \dot{y}}{\partial q}\dot{q} - \frac{\partial y}{\partial q}D(q)^{-1}H(q,\dot{q}) \qquad (4) $$

$$ \tilde{g}(q) = \frac{\partial y}{\partial q}D(q)^{-1}B, \qquad (5) $$

and assume U = R^m. The input-output linearizing control input is specified by:

$$ u(q,\dot{q},t) = \tilde{g}(q)^{\dagger}\left(-\tilde{f}(q,\dot{q}) + \ddot{y}_d(t) + \nu(q,\dot{q},t)\right), \qquad (6) $$

with auxiliary input ν(q,q̇,t) ∈ R^k for all q ∈ Q, q̇ ∈ R^n, and t ∈ I, where † denotes the Moore-Penrose pseudoinverse. This controller used in (2) generates linear output dynamics of the form:

$$ \dot{\eta}(q,\dot{q},t) = \underbrace{\begin{bmatrix} 0_{k\times k} & I_{k\times k} \\ 0_{k\times k} & 0_{k\times k} \end{bmatrix}}_{F} \eta(q,\dot{q},t) + \underbrace{\begin{bmatrix} 0_{k\times k} \\ I_{k\times k} \end{bmatrix}}_{G} \nu(q,\dot{q},t), \qquad (7) $$

where (F, G) are a controllable pair. Defining K = [K_p  K_d] where K_p, K_d ∈ S^k_{++}, the auxiliary control input ν(q,q̇,t) = −Kη(q,q̇,t) induces output dynamics:

$$ \dot{\eta}(q,\dot{q},t) = A_{cl}\,\eta(q,\dot{q},t), \qquad (8) $$

where A_{cl} = F − GK is Hurwitz. This implies the desired output trajectory y_d is exponentially stable. This conclusion allows us to construct a Lyapunov function for the system using converse theorems found in [21]. With A_{cl} Hurwitz, for any Q ∈ S^{2k}_{++}, there exists a unique P ∈ S^{2k}_{++} such that the Continuous Time Lyapunov Equation (CTLE):

$$ A_{cl}^{\top}P + PA_{cl} = -Q, \qquad (9) $$

is satisfied. Let C = {η(q,q̇,t) : (q,q̇) ∈ R × R^n, t ∈ I}. Then V(η) = η^⊤Pη, implicitly a function of q, q̇, and t, is a Lyapunov function certifying exponential stability of (8) on C satisfying:

$$ \lambda_{\min}(P)\|\eta\|_2^2 \le V(\eta) \le \lambda_{\max}(P)\|\eta\|_2^2, \qquad \dot{V}(\eta) \le -\lambda_{\min}(Q)\|\eta\|_2^2, \qquad (10) $$

for all η ∈ C. Here λ_min(·) and λ_max(·) denote the minimum and maximum eigenvalues of a symmetric matrix, respectively. Alternatively, a Lyapunov function of the same form can be constructed directly from (7) using the Continuous Algebraic Riccati Equation (CARE) [21].
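As a minimal illustration of this construction, the sketch below forms A_cl for k = 1, solves the CTLE (9) for P with SciPy, and evaluates V(η) = η^⊤Pη. The gain values and the use of scipy.linalg.solve_continuous_lyapunov are assumptions made for the example, not prescribed by the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# k = 1 output: F, G from (7); gains K_p, K_d > 0 chosen arbitrarily here.
k = 1
F = np.block([[np.zeros((k, k)), np.eye(k)],
              [np.zeros((k, k)), np.zeros((k, k))]])
G = np.vstack([np.zeros((k, k)), np.eye(k)])
K = np.hstack([5.0 * np.eye(k), 3.0 * np.eye(k)])   # K = [K_p  K_d]
A_cl = F - G @ K                                     # closed-loop matrix from (8)

# Solve the CTLE (9): A_cl^T P + P A_cl = -Q for a chosen Q > 0.
Q = np.eye(2 * k)
P = solve_continuous_lyapunov(A_cl.T, -Q)            # solves A_cl^T P + P A_cl = -Q

def V(eta):
    """Converse Lyapunov function V(eta) = eta^T P eta satisfying (10)."""
    return float(eta @ P @ eta)

eta = np.array([0.1, -0.2])
print(V(eta), np.linalg.eigvalsh(P))  # P should be symmetric positive definite
```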
B. Control Lyapunov Functions
The preceding formulation of a Lyapunov function required the choice of the specific control law given in (6). For optimality purposes, it may be desirable to choose a different control input for the system, thus motivating the following definition. Let C ⊆ R^{2k}. A function V : R^{2k} → R_+ is a Control Lyapunov Function (CLF) for (1) on C certifying exponential stability if there exist constants c₁, c₂, c₃ > 0 such that:

$$ c_1\|\eta\|_2^2 \le V(\eta) \le c_2\|\eta\|_2^2, \qquad \inf_{u\in U}\dot{V}(\eta,u) \le -c_3\|\eta\|_2^2, \qquad (11) $$

for all η ∈ C. We see that the previously constructed Lyapunov function satisfying (10) satisfies (11) by choosing the control input specified in (6). In the absence of a specific control input, we may write the Lyapunov function time derivative as:

$$ \dot{V}(\eta,u) = \frac{\partial V}{\partial \eta}\dot{\eta} = \frac{\partial V}{\partial \eta}\left(f(q,\dot{q}) - \dot{r}(t) + g(q)u\right). \qquad (12) $$

Information about the dynamics is encoded within the scalar function V̇, offering a reduction in dimensionality which will become relevant later in learning. Also note that V̇ is affine in u. This leads to the class of quadratic program based controllers given by:

$$ u(q,\dot{q},t) = \underset{u\in U}{\arg\min}\ \tfrac{1}{2}u^{\top}Mu + s^{\top}u + r \qquad \text{s.t.}\quad \dot{V}(\eta,u) \le -c_3\|\eta\|_2^2, \qquad (13) $$

for M ∈ S^m_+, s ∈ R^m, and r ∈ R, provided U is a polyhedron. Here S^m_+ denotes the set of m × m symmetric positive semi-definite matrices.
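A minimal sketch of the CLF-QP (13) using CVXPY is given below, assuming U = R^m so the only constraint is the exponential stability inequality. The quantities a_dyn and b_dyn stand for the known coefficients of the affine expression V̇(η,u) = a_dyn^⊤u + b_dyn obtained from (12); all names and numerical values are illustrative.

```python
import numpy as np
import cvxpy as cp

def clf_qp_control(a_dyn, b_dyn, eta, c3=1.0, M=None, s=None):
    """Solve the CLF-QP (13) for U = R^m, where Vdot(eta, u) = a_dyn @ u + b_dyn."""
    m = a_dyn.shape[0]
    M = np.eye(m) if M is None else M        # quadratic cost weight (PSD)
    s = np.zeros(m) if s is None else s      # linear cost weight
    u = cp.Variable(m)
    objective = cp.Minimize(0.5 * cp.quad_form(u, M) + s @ u)
    constraints = [a_dyn @ u + b_dyn <= -c3 * float(eta @ eta)]  # Vdot <= -c3 ||eta||^2
    cp.Problem(objective, constraints).solve()
    return u.value

# Illustrative numbers only (not from the paper).
print(clf_qp_control(a_dyn=np.array([2.0]), b_dyn=0.5, eta=np.array([0.3, -0.1])))
```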
III. UNCERTAINTY MODELS & LEARNING
This section defines the class of model uncertainty we
consider in this work and investigates its impact on the
control system, and concludes with motivation for a data-
driven approach to mitigate this impact.
A. Uncertainty Modeling Assumptions
As defined in Section II, we consider affine robotic control systems that evolve under dynamics described by (1). In practice, we do not know the dynamics of the system exactly, and instead develop our control systems using the estimated model:

$$ \hat{D}(q)\ddot{q} + \underbrace{\hat{C}(q,\dot{q})\dot{q} + \hat{G}(q)}_{\hat{H}(q,\dot{q})} = \hat{B}u. \qquad (14) $$

We assume the estimated model (14) satisfies the relative degree condition on the domain R, and thus may use the method of feedback linearization to produce a Control Lyapunov Function (CLF), V, for the system. Using the definitions established in (2) in conjunction with the estimated model, we see that the true system evolves as:

$$ \dot{\eta} = \hat{f}(q,\dot{q}) - \dot{r}(t) + \hat{g}(q)u + \underbrace{\left(g(q) - \hat{g}(q)\right)}_{A(q)}u + \underbrace{f(q,\dot{q}) - \hat{f}(q,\dot{q})}_{b(q,\dot{q})}. \qquad (15) $$
We note the following features of modeling uncertainty in this fashion:

  • Uncertainty is allowed to enter the system dynamics via parametric error as well as through completely unmodeled dynamics. In particular, the function H can capture a wide variety of nonlinear behavior and only needs to be Lipschitz continuous.
  • This formulation explicitly allows uncertainty in how the input is introduced into the dynamics via uncertainty in the inertia matrix D and static actuation matrix B.
  • This definition of uncertainty is also compatible with a dynamic actuation matrix B : Q × R^n → R^{n×m} given proper assumptions on the relative degree of the system.

Given this formulation of our uncertainty, we make the following assumptions of the true dynamics:

Assumption 1. The true system is assumed to be deterministic, time invariant, and affine in the control input.

Assumption 2. The CLF V, formulated for the estimated model, is a CLF for the true system.
It is sufficient to assume that the true system has relative degree 2 on the domain R to satisfy Assumption 2. This holds since the true values of f̃ and g̃, if known, enable choosing control inputs as in (6) that respect the same linear output dynamics (8). Given that V is a CLF for the true system, its time derivative under uncertainty is given by:

$$ \dot{V}(\eta,u) = \underbrace{\frac{\partial V}{\partial \eta}\left(\hat{f}(q,\dot{q}) - \dot{r}(t) + \hat{g}(q)u\right)}_{\widehat{\dot{V}}(\eta,u)} + \underbrace{\left(\frac{\partial V}{\partial \eta}A(q)\right)^{\top}}_{a(\eta,q)^{\top}}u + \underbrace{\frac{\partial V}{\partial \eta}b(q,\dot{q})}_{b(\eta,q,\dot{q})}, \qquad (16) $$

for all η ∈ R^{2k} and u ∈ U. While V is a CLF for the true system, it is no longer possible to determine if a specific control value will satisfy the derivative condition in (11) due to the unknown components a and b. Rather than form a new Lyapunov function, we seek to better estimate the Lyapunov function derivative V̇ to enable control selection that satisfies the exponential stability requirement. This estimate should be affine in the control input, enabling its use in the controller described in (13). Instead of learning the unknown dynamics terms A and b, which scale with both the dimension of the configuration space and the number of inputs, we will learn the terms a and b, which scale only with the number of inputs. In the case of the planar Segway model we simulate, we reduce the number of learned components from 4 to 2 (assuming kinematics are known). These learned representations need to accurately capture the uncertainty over the domain in which the system is desired to evolve to ensure stability during operation.
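The practical upshot of (16) is that the uncertainty enters V̇ only through an m-dimensional vector a(η,q) and a scalar b(η,q,q̇). A small sketch of how learned estimates of these terms would be combined with the nominal model term V̂̇ is shown below; the callables a_hat, b_hat, and vdot_hat are hypothetical placeholders for whatever regressors are fit, not objects defined in the paper.

```python
import numpy as np

def vdot_estimate(eta, q, q_dot, u, vdot_hat, a_hat, b_hat):
    """Estimate of (16): Vdot ≈ Vdot_hat(eta, u) + a_hat(eta, q)^T u + b_hat(eta, q, q_dot).

    The estimate stays affine in u, so it can replace Vdot in the QP constraint of (13).
    """
    return vdot_hat(eta, u) + a_hat(eta, q) @ u + b_hat(eta, q, q_dot)

# For m inputs, only m + 1 scalar quantities (a_hat and b_hat) must be learned,
# rather than the 2k x m entries of A(q) and the 2k entries of b(q, q_dot).
```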
B. Motivating a Data-Driven Learning Approach
The formulation from (15) and (16) defines a general class
of dynamics uncertainty. It is natural to consider a data-
driven method to estimate the unknown quantities a and b
over the domain of the system. To motivate our learning-
based framework, first consider a simple approach of learning
a and b via supervised regression [19]: we operate the system using some given state-feedback controller to gather data
points along the system’s evolution and learn a function that
approximates a and b via supervised learning.
Concretely, let q₀ ∈ Q be an initial configuration. An experiment is defined as the evolution of the system over a finite time interval from the initial condition (q₀, 0) using a discrete-time implementation of the given controller. A resulting discrete-time state history is obtained, which is then transformed with the Lyapunov function V and finally differentiated numerically to estimate V̇ throughout the experiment. This yields a data set comprised of input-output pairs:

$$ D = \{((q_i, \dot{q}_i, \eta_i, u_i), \dot{V}_i)\}_{i=1}^{N} \subset (Q \times R^n \times R^{2k} \times U) \times R. \qquad (17) $$
Consider a class H_a of nonlinear functions mapping from R^{2k} × Q to R^m and a class H_b of nonlinear functions mapping from R^{2k} × Q × R^n to R. For a given â ∈ H_a and b̂ ∈ H_b, define Ŵ̇ as:

$$ \widehat{\dot{W}}(\eta,q,\dot{q},u) = \widehat{\dot{V}}(\eta,u) + \hat{a}(\eta,q)^{\top}u + \hat{b}(\eta,q,\dot{q}), \qquad (18) $$

and let H be the class of all such estimators mapping R^{2k} × Q × R^n × U to R. Defining a loss function L : R × R → R_+, the supervised regression task is then to find a function in H via empirical risk minimization (ERM):

$$ \inf_{\hat{a}\in H_a,\ \hat{b}\in H_b}\ \frac{1}{N}\sum_{i=1}^{N} L\left(\widehat{\dot{W}}(\eta_i, q_i, \dot{q}_i, u_i), \dot{V}_i\right). \qquad (19) $$
This experiment protocol can be executed either in simulation
or directly on hardware. While being simple to implement,
supervised learning critically assumes independently and
identically distributed (i.i.d) training data. Each experiment
violates this assumption, as the regression target of each data
point is coupled with the input data of the next time step. As
a consequence, standard supervised learning with sequential,
non-i.i.d data collection often leads to error cascades [24].
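As a concrete (and deliberately simplified) sketch of this supervised regression step, the snippet below builds the pairs in (17) by finite-differencing V along a logged trajectory and then fits â and b̂ by least squares over fixed feature maps, so that the estimator (18) stays affine in u. Using linear-in-features regressors (rather than the neural networks used in the paper's experiments) and the specific feature maps are assumptions made to keep the example short.

```python
import numpy as np

def build_dataset(etas, qs, dqs, us, V, dt):
    """Form the pairs in (17): numerically differentiate V along the state history."""
    Vs = np.array([V(eta) for eta in etas])
    Vdots = np.gradient(Vs, dt)                       # estimate of Vdot_i at each sample
    return list(zip(zip(qs, dqs, etas, us), Vdots))

def fit_a_b_least_squares(data, vdot_hat, phi_a, phi_b):
    """Solve the ERM problem (19) with squared loss and linear-in-features a_hat, b_hat.

    a_hat(eta, q) = W_a @ phi_a(eta, q) and b_hat(eta, q, dq) = w_b @ phi_b(eta, q, dq),
    so Wdot_hat is linear in the stacked parameters and (19) reduces to least squares.
    """
    rows, targets = [], []
    for (q, dq, eta, u), vdot in data:
        # Wdot_hat = vdot_hat + u^T W_a phi_a + w_b^T phi_b is linear in (W_a, w_b):
        # u^T W_a phi_a = flatten(W_a) . kron(u, phi_a).
        rows.append(np.concatenate([np.kron(u, phi_a(eta, q)), phi_b(eta, q, dq)]))
        targets.append(vdot - vdot_hat(eta, u))
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta  # stacked parameters of a_hat (row-flattened W_a) and b_hat (w_b)
```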
IV. INTEGRATING EPISODIC LEARNING & CLFS
In this section we present the main contribution of this
work: an episodic learning algorithm that captures the uncertainty present in the Lyapunov function derivative in a
learned model and utilizes it in a quadratic program based
controller.
A. Episodic Learning Framework
Episodic learning refers to learning procedures that iteratively alternate between executing an intermediate controller (also known as a roll-out in reinforcement learning [22]), collecting data from that roll-out, and designing a new controller using the newly collected data. Our approach integrates learning a and b with improving the performance and stability of the control policy u in such an iterative fashion. First, assume we are given a nominal state-feedback controller u : Q × R^n × I → U.
Algorithm 1 Dataset Aggregation for Control Lyapunov Functions (DaCLyF)

Require: Control Lyapunov Function V, derivative estimate V̂̇₀, model classes H_a and H_b, loss function L, set of initial configurations Q₀, nominal state-feedback controller u₀, number of experiments T, sequence of trust coefficients 0 ≤ w₁ ≤ ··· ≤ w_T ≤ 1

  D = ∅                                     ▷ Initialize data set
  for k = 1, . . . , T do
      (q₀, 0) ← sample(Q₀ × {0})            ▷ Get initial condition
      D_k ← experiment((q₀, 0), u_{k−1})     ▷ Run experiment
      D ← D ∪ D_k                            ▷ Aggregate data set
      â, b̂ ← ERM(H_a, H_b, L, D, V̂̇₀)        ▷ Fit estimators
      V̂̇_k ← V̂̇₀ + â^⊤u + b̂                   ▷ Update derivative estimator
      u_k ← u₀ + w_k · augment(u₀, V̂̇_k)      ▷ Update controller
  end for
  return V̂̇_T, u_T

With an estimator Ŵ̇ ∈ H as defined in (18), we specify an augmenting controller as:

$$ u'(q,\dot{q},t) = \underset{u'\in R^m}{\arg\min}\ J(u') \qquad \text{s.t.}\quad \widehat{\dot{W}}(\eta, q, \dot{q}, u(q,\dot{q},t) + u') \le -c_3\|\eta\|_2^2, \quad u(q,\dot{q},t) + u' \in U, \qquad (20) $$

where J : R^m → R is any positive semi-definite quadratic cost function.
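A sketch of the augmenting controller (20), again with CVXPY and assuming U = R^m, is shown below. It uses the min-norm cost later introduced in (21), and the learned terms enter only through the affine coefficients of Ŵ̇, so the problem remains a small QP; the function and variable names are illustrative.

```python
import numpy as np
import cvxpy as cp

def augmenting_control(u_nom, a_hat_val, b_hat_val, vdot_hat_coeffs, eta, c3=1.0):
    """Solve (20) with the min-norm cost (21) for U = R^m.

    Wdot_hat at (eta, q, dq) is (vdot_lin + a_hat_val) @ u + vdot_const + b_hat_val,
    evaluated at the total input u = u_nom + u_aug.
    """
    vdot_lin, vdot_const = vdot_hat_coeffs            # affine pieces of Vdot_hat(eta, .)
    m = u_nom.shape[0]
    u_aug = cp.Variable(m)
    total_u = u_nom + u_aug
    wdot_hat = (vdot_lin + a_hat_val) @ total_u + vdot_const + b_hat_val
    objective = cp.Minimize(0.5 * cp.sum_squares(total_u))   # J from (21)
    constraints = [wdot_hat <= -c3 * float(eta @ eta)]
    cp.Problem(objective, constraints).solve()
    return u_aug.value
```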
Our goal is to use this new controller to obtain better
estimates of a and b. One option, as seen in Section III-B,
is to perform experiments and use conventional supervised
regression to update â and b̂. To overcome the limitations
of conventional supervised learning, we leverage reduction
techniques: a sequential prediction problem is reduced to
a sequence of supervised learning problems over multiple
episodes [15], [32]. In particular, in each episode, an experiment generates data using a different controller. The data set
is aggregated and a new ERM problem is solved after each
episode. Our episodic learning implementation is inspired by
the Data Aggregation algorithm (DAgger) [32], with some
key differences:
  • DAgger is a reinforcement learning algorithm, which trains a policy directly in each episode using optimal computational oracles. Our algorithm defines a controller indirectly via a CLF to ensure stability.
  • The ERM problem is underdetermined, i.e., different approximations (â, b̂) may achieve similar loss for a given data set while failing to accurately model a and b. This potentially introduces error in estimating V̇ for control inputs not reflected in the training data, and necessitates the use of exploratory control action to constrain the estimators â and b̂. Such exploration can be achieved by randomly perturbing the controller used in an experiment at each time step. This need for exploration is an analog to the notion of persistent excitation from adaptive systems [28].

Fig. 2. (Left) Model based QP controller fails to track trajectory. (Right) Improvement in angle tracking of system with augmented controller over nominal PD controller. (Bottom) Corresponding visualizations of state data at t = 0 through t = 5. Note that Segway is tilted in the incorrect direction at the end of the QP controller simulation, but is correctly aligned during the augmented controller simulation.
Algorithm 1 specifies a method of computing a sequence
of Lyapunov function derivative estimates and augmenting
controllers. During each episode, the augmenting controller
associated with the estimate of the Lyapunov function derivative is scaled by a factor reflecting trust in the estimate and
added to the nominal controller for use in the subsequent
experiment. The trust coefficients form a monotonically non-
decreasing sequence on the interval [0, 1]. Importantly, this
experiment need not take place in simulation; the same
procedure may be executed directly on hardware. It may be
infeasible to choose a specific configuration for an initial
condition on a hardware platform; therefore we specify a
set of initial configurations Q₀ ⊆ Q from which an initial
condition may be sampled, potentially randomly.
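The sketch below mirrors Algorithm 1 as a plain Python loop; experiment, fit_estimators, and augmenting_control stand in for the roll-out, the ERM step (19), and the QP (20) respectively, and are assumed interfaces rather than functions provided by the paper or by LyaPy.

```python
import numpy as np

def daclyf(vdot_hat_0, experiment, fit_estimators, augmenting_control,
           sample_q0, u_nominal, T, trust_weights):
    """Episodic loop of Algorithm 1 (DaCLyF), written against assumed helper interfaces."""
    dataset = []
    u_k = u_nominal
    a_hat = b_hat = None
    for k in range(T):
        q0 = sample_q0()                                      # initial configuration, zero velocity
        dataset += experiment((q0, np.zeros_like(q0)), u_k)   # roll out current controller
        a_hat, b_hat = fit_estimators(dataset, vdot_hat_0)    # ERM over aggregated data, (19)
        w_k = trust_weights[k]                                # trust in the current estimate

        def u_k(q, dq, t, a_hat=a_hat, b_hat=b_hat, w_k=w_k):
            u0 = u_nominal(q, dq, t)
            u_aug = augmenting_control(u0, a_hat, b_hat, q, dq, t)   # solve (20)
            return u0 + w_k * u_aug                                  # scaled augmentation
    return a_hat, b_hat, u_k
```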
B. Additional Controller Details
During augmentation, we specify the controller in (20) by selecting the minimum-norm cost function:

$$ J(u') = \tfrac{1}{2}\|u(q,\dot{q},t) + u'\|_2^2, \qquad (21) $$

for all u' ∈ R^m, q ∈ Q, q̇ ∈ R^n, and t ∈ I. We additionally incorporate a smoothing regularizer into the cost function of the form:

$$ R(u') = R\,\|u' - u_{prev}\|_2^2, $$

for all u' ∈ R^m, where u_prev ∈ R^m is the previously computed augmenting controller and R > 0. This is done to avoid chatter that may arise from the optimization based nature of the CLF-QP formulation [27].
Note that for this choice of Lyapunov function, the gradient ∂V/∂η, and therefore a, approach 0 as η approaches 0, which occurs close to the desired trajectory. While the estimated Lyapunov function derivative may be fit with low absolute error on the data set, the relative error may still be high for states near the desired trajectory. Such relative error causes the optimization problem in (20) to be poorly conditioned near the desired trajectory. We therefore add a slack term δ ∈ R_+ to the decision variables, which appears in the inequality constraint [3]. The slack term is additionally incorporated into the cost function as:

$$ C(\delta) = \tfrac{1}{2}\,C\,\left\|\left(\frac{\partial V}{\partial \eta}\hat{g}(q)\right)^{\top} + \hat{a}(\eta,q)\right\|_2^2\,\delta^2, \qquad (22) $$

for all δ ∈ R_+, where C > 0. As states approach the
trajectory, the coefficient of the quadratic term decreases
and enables relaxation of the exponential stability inequality
constraint. In practice this leads to input-to-state stable
behavior, described in [40], around the trajectory.
The exploratory control during experiments is naively chosen as additive noise from a centered uniform distribution,
with each coordinate drawn i.i.d. The variance is scaled by
the norm of the underlying controller to introduce exploration
while maintaining a high signal-to-noise ratio.
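A minimal sketch of this exploratory perturbation is given below; the scaling constant and the exact noise law beyond "centered uniform, i.i.d. per coordinate, amplitude scaled by the controller norm" are assumptions for illustration.

```python
import numpy as np

def explore(u, scale=0.1, rng=None):
    """Perturb a control input with centered uniform noise, i.i.d. per coordinate.

    The noise amplitude is proportional to ||u||, keeping a roughly constant
    signal-to-noise ratio as the underlying controller grows or shrinks.
    """
    rng = np.random.default_rng() if rng is None else rng
    amplitude = scale * np.linalg.norm(u)
    return u + rng.uniform(-amplitude, amplitude, size=u.shape)
```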

Citations
Proceedings ArticleDOI
12 Jul 2020
TL;DR: In this article, a reinforcement learning framework was proposed to learn the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program.
Abstract: In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-ouput linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty.

132 citations

Posted Content
20 Dec 2019
TL;DR: A machine learning framework utilizing Control Barrier Functions (CBFs) to reduce model uncertainty as it impacts the safe behavior of a system, ultimately achieving safe behavior.
Abstract: Modern nonlinear control theory seeks to endow systems with properties of stability and safety, and have been deployed successfully in multiple domains. Despite this success, model uncertainty remains a significant challenge in synthesizing safe controllers, leading to degradation in the properties provided by the controllers. This paper develops a machine learning framework utilizing Control Barrier Functions (CBFs) to reduce model uncertainty as it impact the safe behavior of a system. This approach iteratively collects data and updates a controller, ultimately achieving safe behavior. We validate this method in simulation and experimentally on a Segway platform.

90 citations


Cites background or methods from "Episodic Learning with Control Lyap..."

  • ...Learning-based approaches have already shown great promise for controlling systems with uncertain models (Schaal and Atkeson (2010); Kober et al. (2013); Khansari-Zadeh and Billard (2014); Cheng et al. (2019); Taylor et al. (2019b); Shi et al. (2019))....


  • ...Future work will seek to investigate the impact of residual error on safe behavior through the analysis established in Taylor et al. (2019a)....


  • ...Furthermore, we build upon recent work utilizing learning in the context of Control Lyapunov Functions (CLFs) (Taylor et al. (2019b)) to construct an approach for learning model uncertainty....


  • ...Additional details on related work are provided in the extended version of this paper (Taylor et al. (2019c))....


  • ...Instead, we take a data-driven approach similar to (Taylor et al. (2019b)) to learn uncertainty as it appears in the time derivative of the CBF, ḣ, given in (6)....


Posted Content
TL;DR: A review of the recent advances made in using machine learning to achieve safe decision making under uncertainties, with a focus on unifying the language and frameworks used in control theory and reinforcement learning research can be found in this article.
Abstract: The last half-decade has seen a steep rise in the number of contributions on safe learning methods for real-world robotic deployments from both the control and reinforcement learning communities. This article provides a concise but holistic review of the recent advances made in using machine learning to achieve safe decision making under uncertainties, with a focus on unifying the language and frameworks used in control theory and reinforcement learning research. Our review includes: learning-based control approaches that safely improve performance by learning the uncertain dynamics, reinforcement learning approaches that encourage safety or robustness, and methods that can formally certify the safety of a learned control policy. As data- and learning-based robot control methods continue to gain traction, researchers must understand when and how to best leverage them in real-world scenarios where safety is imperative, such as when operating in close proximity to humans. We highlight some of the open challenges that will drive the field of robot learning in the coming years, and emphasize the need for realistic physics-based benchmarks to facilitate fair comparisons between control and reinforcement learning approaches.

53 citations

Proceedings ArticleDOI
01 May 2020
TL;DR: Experimental results demonstrate that the proposed controller significantly outperforms a baseline nonlinear tracking controller with up to four times smaller worst-case height tracking errors, and empirically demonstrate the ability of the learned model to generalize to larger swarm sizes.
Abstract: In this paper, we present Neural-Swarm, a nonlinear decentralized stable controller for close-proximity flight of multirotor swarms. Close-proximity control is challenging due to the complex aerodynamic interaction effects between multirotors, such as downwash from higher vehicles to lower ones. Conventional methods often fail to properly capture these interaction effects, resulting in controllers that must maintain large safety distances between vehicles, and thus are not capable of close-proximity flight. Our approach combines a nominal dynamics model with a regularized permutation-invariant Deep Neural Network (DNN) that accurately learns the high-order multi-vehicle interactions. We design a stable nonlinear tracking controller using the learned model. Experimental results demonstrate that the proposed controller significantly outperforms a baseline nonlinear tracking controller with up to four times smaller worst-case height tracking errors. We also empirically demonstrate the ability of our learned model to generalize to larger swarm sizes.

52 citations

Posted Content
TL;DR: This work shows that under suitable smoothness assumptions on the perception map and generative model relating state to high-dimensional data, an affine error model is sufficiently rich to capture all possible error profiles, and can be learned via a robust regression problem.
Abstract: Motivated by vision-based control of autonomous vehicles, we consider the problem of controlling a known linear dynamical system for which partial state information, such as vehicle position, is extracted from complex and nonlinear data, such as a camera image. Our approach is to use a learned perception map that predicts some linear function of the state and to design a corresponding safe set and robust controller for the closed loop system with this sensing scheme. We show that under suitable smoothness assumptions on both the perception map and the generative model relating state to complex and nonlinear data, parameters of the safe set can be learned via appropriately dense sampling of the state space. We then prove that the resulting perception-control loop has favorable generalization properties. We illustrate the usefulness of our approach on a synthetic example and on the self-driving car simulation platform CARLA.

42 citations

References
Posted Content
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.

9,020 citations


Additional excerpts

  • ...Successful learning-based approaches have typically focused on learning model-based uncertainty [5], [8], [7], [37], or direct model-free controller design [25], [36], [14], [42], [24]....


Posted Content
TL;DR: This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

4,225 citations


Additional excerpts

  • ...Successful learning-based approaches have typically focused on learning model-based uncertainty [5], [8], [7], [37], or direct model-free controller design [25], [36], [14], [42], [24]....


Journal ArticleDOI
TL;DR: This article attempts to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots by highlighting both key challenges in robot reinforcement learning as well as notable successes.
Abstract: Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide both inspiration, impact, and validation for developments in reinforcement learning. The relationship between disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.

2,391 citations


"Episodic Learning with Control Lyap..." refers background or methods in this paper

  • ...Episodic learning refers to learning procedures that iteratively alternates between executing an intermediate controller (also known as a roll-out in reinforcement learning [22]), collecting data from that roll-out, and designing a new controller using the newly collected data....


  • ...Learning-based approaches have already shown great promise for controlling imperfectly modeled robotic platforms [22], [35]....


Book
16 Apr 2013
TL;DR: How to Construct Nonparametric Regression Estimates * Lower Bounds * Partitioning Estimates * Kernel Estimates * k-NN Estimates * Splitting the Sample * Cross Validation * Uniform Laws of Large Numbers
Abstract: Why is Nonparametric Regression Important? * How to Construct Nonparametric Regression Estimates * Lower Bounds * Partitioning Estimates * Kernel Estimates * k-NN Estimates * Splitting the Sample * Cross Validation * Uniform Laws of Large Numbers * Least Squares Estimates I: Consistency * Least Squares Estimates II: Rate of Convergence * Least Squares Estimates III: Complexity Regularization * Consistency of Data-Dependent Partitioning Estimates * Univariate Least Squares Spline Estimates * Multivariate Least Squares Spline Estimates * Neural Networks Estimates * Radial Basis Function Networks * Orthogonal Series Estimates * Advanced Techniques from Empirical Process Theory * Penalized Least Squares Estimates I: Consistency * Penalized Least Squares Estimates II: Rate of Convergence * Dimension Reduction Techniques * Strong Consistency of Local Averaging Estimates * Semi-Recursive Estimates * Recursive Estimates * Censored Observations * Dependent Observations

1,931 citations


"Episodic Learning with Control Lyap..." refers methods in this paper

  • ...To motivate our learningbased framework, first consider a simple approach of learning a and b via supervised regression [19]: we operate the system using some given state-feedback controller to gather data points along the system’s evolution and learn a function that approximates a and b via supervised learning....


Book
22 Jun 1999
TL;DR: In this article, the authors compare Linear vs. Nonlinear Control of Differential Geometry with Linearization by State Feedback (LSF) by using Linearization and Geometric Non-linear Control (GNC).
Abstract: 1 Linear vs. Nonlinear.- 2 Planar Dynamical Systems.- 3 Mathematical Background.- 4 Input-Output Analysis.- 5 Lyapunov Stability Theory.- 6 Applications of Lyapunov Theory.- 7 Dynamical Systems and Bifurcations.- 8 Basics of Differential Geometry.- 9 Linearization by State Feedback.- 10 Design Examples Using Linearization.- 11 Geometric Nonlinear Control.- 12 Exterior Differential Systems in Control.- 13 New Vistas: Multi-Agent Hybrid Systems.- References.

1,925 citations


"Episodic Learning with Control Lyap..." refers background in this paper

  • ...Input-Output (IO) Linearization is a nonlinear control method that creates stable linear dynamics for a selected set of outputs of a system [34]....


  • ...Define twice-differentiable outputs y : Q → R, with k ≤ m, and assume each output has relative degree 2 on some domain R ⊆ Q (see [34] for details)....


Frequently Asked Questions (12)
Q1. What have the authors contributed in "Episodic learning with control lyapunov functions for uncertain robotic systems*" ?

This paper develops a machine learning framework centered around Control Lyapunov Functions ( CLFs ) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. The authors validate their approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller. 

There are two main interesting directions for future work. 

The parameters of the model (including mass, inertias, and motor parameters but excluding gravity) are randomly modified by up to 10% of their nominal values and are fixed for the simulations. 

An experiment is defined as the evolution of the system over a finite time interval from the initial condition (q0,0) using a discrete-time implementation of the given controller. 

Episodic learning refers to learning procedures that iteratively alternates between executing an intermediate controller (also known as a roll-out in reinforcement learning [22]), collecting data from that roll-out, and designing a new controller using the newly collected data. 

During augmentation, the authors specify the controller in (20) by selecting the minimum-norm cost function J(u′) = ½‖u(q, q̇, t) + u′‖₂² (21), for all u′ ∈ R^m, q ∈ Q, q̇ ∈ R^n, and t ∈ I.

Given that V is a CLF for the true system, its time derivative under uncertainty is given by V̇(η,u) = ∂V/∂η (f̂(q,q̇) − ṙ(t) + ĝ(q)u) + a(η,q)^⊤u + b(η,q,q̇) (16), for all η ∈ R^{2k} and u ∈ U, where a(η,q)^⊤ = ∂V/∂η A(q) and b(η,q,q̇) = ∂V/∂η b(q,q̇).

During each episode, the augmenting controller associated with the estimate of the Lyapunov function derivative is scaled by a factor reflecting trust in the estimate and added to the nominal controller for use in the subsequent experiment. 

The exploratory control during experiments is naively chosen as additive noise from a centered uniform distribution, with each coordinate drawn i.i.d. 

define Ŵ̇ as Ŵ̇(η,q,q̇,u) = V̂̇(η,u) + â(η,q)^⊤u + b̂(η,q,q̇) (18), and let H be the class of all such estimators mapping R^{2k} × Q × R^n × U to R. Defining a loss function L : R × R → R_+, the supervised regression task is then to find a function in H via empirical risk minimization (ERM): the infimum over â ∈ H_a and b̂ ∈ H_b of (1/N) Σᵢ L(Ŵ̇(ηᵢ, qᵢ, q̇ᵢ, uᵢ), V̇ᵢ) (19).

The slack term is additionally incorporated into the cost function as C(δ) = ½ C ‖(∂V/∂η ĝ(q))^⊤ + â(η,q)‖₂² δ² (22), for all δ ∈ R_+, where C > 0.

In practice, the authors do not know the dynamics of the system exactly, and instead develop their control systems using the estimated model D̂(q)q̈ + Ĉ(q,q̇)q̇ + Ĝ(q) = B̂u (14). The authors assume the estimated model (14) satisfies the relative degree condition on the domain R, and thus may use the method of feedback linearization to produce a Control Lyapunov Function (CLF), V, for the system.