Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems
Summary
Introduction
- The authors instead constructively prescribe a CLF, and focus on learning only the necessary information to choose control inputs that achieve the associated stability guarantees, which can be much lower-dimensional.
- In particular, exhaustive data collection typically scales exponentially with the dimensionality of the joint state and control input space, and so should be avoided.
- The authors also provide a Python software package implementing their experiments and learning framework.
II. PRELIMINARIES ON CLFS
- This section provides a brief review of input-output feedback linearization, a control technique which can be used to synthesize a CLF.
- The resulting CLF will be used to quantify the impact of model uncertainty and specify the learning problem outlined in Section III.
A. Input-Output Linearization
- Input-Output Linearization is a nonlinear control method that creates stable linear dynamics for a selected set of outputs of a system [21].
- This implies the desired output trajectory yd is exponentially stable.
- This conclusion allows us to construct a Lyapunov function for the system using converse theorems found in [21].
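As a concrete sketch of this converse-theorem construction (with illustrative PD gains, not values from the paper): for exponentially stable linear output dynamics η̇ = A_cl η, a quadratic Lyapunov function V(η) = η⊤Pη can be obtained by solving the Lyapunov equation A_cl⊤P + P·A_cl = −Q for some Q ≻ 0.

```python
import numpy as np

# Illustrative gains; in the paper A_cl comes from the PD-stabilized
# output dynamics produced by input-output linearization.
Kp, Kd = 4.0, 4.0
A_cl = np.array([[0.0, 1.0],
                 [-Kp, -Kd]])          # closed-loop output dynamics, Hurwitz
Q = np.eye(2)

# Solve A_cl^T P + P A_cl = -Q by vectorization:
# (I (x) A^T + A^T (x) I) vec(P) = -vec(Q).
n = A_cl.shape[0]
M = np.kron(np.eye(n), A_cl.T) + np.kron(A_cl.T, np.eye(n))
P = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)

# V(eta) = eta^T P eta then satisfies Vdot = -eta^T Q eta < 0 on trajectories.
```

The solution P is symmetric positive definite whenever A_cl is Hurwitz, which is exactly what the exponential stability of the output dynamics guarantees.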
B. Control Lyapunov Functions
- The preceding formulation of a Lyapunov function required the choice of the specific control law given in (6).
- For optimality purposes, it may be desirable to choose a different control input for the system, thus motivating the following definition.
- The authors see that the previously constructed Lyapunov function satisfying (10) satisfies (11) by choosing the control input specified in (6).
- Information about the dynamics is encoded within the scalar function V̇ (12), offering a reduction in dimensionality which will become relevant later in learning.
- Here Sᵐ₊ denotes the set of m × m symmetric positive semi-definite matrices.
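The dimensionality reduction above can be made concrete with a minimal sketch (hypothetical values, not the authors' implementation) of the pointwise min-norm controller implied by the CLF inequality: the input enters V̇ only through a scalar drift term and one vector, so the minimum-norm input satisfying V̇ ≤ −λV has a closed form.

```python
import numpy as np

def min_norm_clf(LfV, LgV, V, lam):
    """Minimum-norm u with LfV + LgV @ u <= -lam * V (one affine constraint).

    LfV: scalar drift part of Vdot; LgV: vector multiplying u; lam > 0.
    """
    psi = LfV + lam * V               # constraint violation at u = 0
    if psi <= 0:
        return np.zeros_like(LgV)     # u = 0 already certifies decrease
    return -psi * LgV / (LgV @ LgV)   # push along LgV just enough

# Hypothetical values: the controller needs only these low-dimensional terms.
u = min_norm_clf(1.0, np.array([1.0, 1.0]), 0.5, 2.0)
```

Note the controller never touches the full dynamics, only the scalars and vectors entering V̇, which is the quantity the learning problem later targets.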
A. Uncertainty Modeling Assumptions
- As defined in Section II, the authors consider affine robotic control systems that evolve under dynamics described by (1).
- The authors assume the estimated model (14) satisfies the relative degree condition on the domain R, and thus may use the method of feedback linearization to produce a Control Lyapunov Function (CLF), V , for the system.
- This holds since the true values of f̃ and g̃, if known, enable choosing control inputs as in (6) that respect the same linear output dynamics (8).
- Instead of learning the unknown dynamics terms A and b, which scale with both the dimension of the configuration space and the number of inputs, the authors will learn the terms a and b, which scale only with the number of inputs.
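A quick count makes the savings concrete. The dimensions here are illustrative (taking A ∈ ℝⁿˣᵐ and b ∈ ℝⁿ for the dynamics-level uncertainty, versus a ∈ ℝᵐ and scalar b in V̇), not the Segway's actual sizes:

```python
# Regression-target counts: full dynamics uncertainty vs. CLF-derivative terms.
n, m = 7, 2                      # illustrative configuration and input dims
full_targets = n * m + n         # A (n x m) and b (n): 21 outputs to learn
clf_targets = m + 1              # a (m) and scalar b:   3 outputs to learn
```

The CLF-side target count is independent of n, which is the point of learning the uncertainty as it appears in V̇ rather than in the equations of motion.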
B. Motivating a Data-Driven Learning Approach
- The formulation from (15) and (16) defines a general class of dynamics uncertainty.
- To motivate their learning-based framework, first consider a simple approach of learning a and b via supervised regression [19]: the authors operate the system using some given state-feedback controller to gather data points along the system’s evolution and learn a function that approximates a and b via supervised learning.
- An experiment is defined as the evolution of the system over a finite time interval from the initial condition (q0,0) using a discrete-time implementation of the given controller.
- As a consequence, standard supervised learning with sequential, non-i.i.d. data collection often leads to error cascades [24].
A. Episodic Learning Framework
- Episodic learning refers to learning procedures that iteratively alternate between executing an intermediate controller (also known as a roll-out in reinforcement learning [22]), collecting data from that roll-out, and designing a new controller using the newly collected data.
- The data set is aggregated and a new ERM problem is solved after each episode.
- Such exploration can be achieved by randomly perturbing the controller used in an experiment at each time step.
- Algorithm 1 specifies a method of computing a sequence of Lyapunov function derivative estimates and augmenting controllers.
- The trust coefficients form a monotonically nondecreasing sequence on the interval [0, 1].
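The loop above can be sketched on a toy system. Everything here is hypothetical (scalar dynamics, drift, gains, and a one-feature least-squares fit standing in for the ERM step and neural estimators of the paper): each episode rolls out the nominal controller plus a trust-weighted augmentation and exploratory noise, aggregates the data, and refits the estimator.

```python
import numpy as np

def drift(x):                      # true model error, unknown to the controller
    return 0.5 * np.sin(x)

def run_experiment(d_hat, trust, rng, T=200, dt=0.01):
    """One roll-out: nominal controller + trust-scaled augmentation + noise."""
    x, data = 1.0, []
    for _ in range(T):
        u_nom = -2.0 * x                      # nominal stabilizing controller
        u_aug = -trust * d_hat(x)             # cancel the learned drift
        u_exp = rng.uniform(-0.1, 0.1)        # exploratory noise, i.i.d.
        u = u_nom + u_aug + u_exp
        xdot = u + drift(x)
        data.append((x, u, xdot))             # label xdot - u recovers drift(x)
        x = x + dt * xdot
    return data

rng = np.random.default_rng(0)
dataset, coef = [], 0.0
trusts = np.linspace(0.0, 1.0, 5)             # monotone trust sequence in [0, 1]
for trust in trusts:
    d_hat = lambda x, c=coef: c * np.sin(x)   # current one-feature estimator
    dataset += run_experiment(d_hat, trust, rng)
    X = np.sin([d[0] for d in dataset])       # aggregate ALL episodes' data
    y = [d[2] - d[1] for d in dataset]        # observed drift labels
    coef = float(np.dot(X, y) / np.dot(X, X)) # least-squares refit (ERM stand-in)
```

The structure mirrors Algorithm 1: aggregation across episodes, an ERM refit after each one, and a trust coefficient that scales the augmenting controller.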
B. Additional Controller Details
- This is done to avoid chatter that may arise from the optimization based nature of the CLF-QP formulation [27].
- Note that for this choice of Lyapunov function, the gradient ∂V/∂η, and therefore a, approach 0 as η approaches 0, which occurs close to the desired trajectory.
- Such relative error causes the optimization problem in (20) to be poorly conditioned near the desired trajectory.
- As states approach the trajectory, the coefficient of the quadratic term decreases and enables relaxation of the exponential stability inequality constraint.
- The exploratory control during experiments is naively chosen as additive noise from a centered uniform distribution, with each coordinate drawn i.i.d.
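The slack mechanics can be sketched for a single affine CLF constraint (values hypothetical; the paper solves the full CLF-QP, whose slack weight corresponds to the state-dependent coefficient in (22)): with cost ½‖u‖² + ½C_w·δ² and constraint a⊤u + b ≤ −λV + δ, the KKT conditions give a closed form, and a larger C_w drives the slack toward zero.

```python
import numpy as np

def relaxed_clf_qp(a, b, V, lam, Cw):
    """Closed-form solution of  min 0.5*||u||^2 + 0.5*Cw*d^2
                               s.t. a @ u + b <= -lam * V + d  (d = slack)."""
    viol = b + lam * V                  # constraint violation at u = 0, d = 0
    if viol <= 0:
        return np.zeros_like(a), 0.0    # constraint inactive: no control needed
    mu = viol / (a @ a + 1.0 / Cw)      # KKT multiplier of the active constraint
    return -mu * a, mu / Cw             # stationarity: u = -mu*a, d = mu/Cw

# Hypothetical problem data.
a_vec = np.array([1.0, 2.0])
u, d = relaxed_clf_qp(a_vec, b=1.0, V=1.0, lam=1.0, Cw=10.0)
```

As C_w → ∞ the slack d → 0 and the strict exponential stability constraint is recovered; as C_w shrinks (near the trajectory, per (22)), the constraint relaxes, which is the conditioning fix described above.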
V. APPLICATION ON SEGWAY PLATFORM
- In this section the authors apply the episodic learning algorithm constructed in Section IV to the Segway platform.
- The authors seek to track a pitch angle trajectory generated for the estimated model.
- The baseline PD controller and the augmented controller after 20 experiments can be seen in the right portion of Fig. 3.
- The mean trajectory consistently improves in these later episodes as the trust factor increases.
- The variation increases but remains small, indicating that the learning problem is robust to randomness in the initialization of the neural networks, in the network training algorithm, and in the noise added during the experiments. (The trajectory was generated using the GPOPS-II Optimal Control Software; models were implemented in Keras.)
Frequently Asked Questions (12)
Q2. What are the directions for future work in "Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems"?
There are two main interesting directions for future work.
Q3. How are the parameters of the model modified?
The parameters of the model (including mass, inertias, and motor parameters but excluding gravity) are randomly modified by up to 10% of their nominal values and are fixed for the simulations.
Q4. What is the definition of an experiment?
An experiment is defined as the evolution of the system over a finite time interval from the initial condition (q0,0) using a discrete-time implementation of the given controller.
Q5. What is the definition of episodic learning?
Episodic learning refers to learning procedures that iteratively alternate between executing an intermediate controller (also known as a roll-out in reinforcement learning [22]), collecting data from that roll-out, and designing a new controller using the newly collected data.
Q6. How do the authors specify the controller in the experiment?
During augmentation, the authors specify the controller in (20) by selecting the minimum-norm cost function:
J(u′) = (1/2)‖u(q, q̇, t) + u′‖₂², (21)
for all u′ ∈ ℝᵐ, q ∈ Q, q̇ ∈ ℝⁿ, and t ∈ I.
Q7. What is the true system's time derivative?
Given that V is a CLF for the true system, its time derivative under uncertainty is given by:
V̇(η, u) = ∂V/∂η (f̂(q, q̇) − ṙ(t) + ĝ(q)u) + (∂V/∂η A(q)) u + ∂V/∂η b(q, q̇), (16)
for all η ∈ ℝ²ᵏ and u ∈ U, where the first term is the estimated derivative V̂̇(η, u), the second defines a(η, q)⊤u, and the third defines b(η, q, q̇).
Q8. What is the use of augmenting controllers?
During each episode, the augmenting controller associated with the estimate of the Lyapunov function derivative is scaled by a factor reflecting trust in the estimate and added to the nominal controller for use in the subsequent experiment.
Q9. What is the slack term used in the exploratory control?
The exploratory control during experiments is naively chosen as additive noise from a centered uniform distribution, with each coordinate drawn i.i.d.
Q10. What is the function of the supervised learning?
The authors define the estimator Ŵ of V̇ as:
Ŵ(η, q, q̇, u) = V̂̇(η, u) + â(η, q)⊤u + b̂(η, q, q̇), (18)
and let H be the class of all such estimators mapping ℝ²ᵏ × Q × ℝⁿ × U to ℝ. Defining a loss function L : ℝ × ℝ → ℝ₊, the supervised regression task is then to find a function in H via empirical risk minimization (ERM):
inf over â ∈ H_a, b̂ ∈ H_b of (1/N) Σᵢ₌₁ᴺ L(Ŵ(ηᵢ, qᵢ, q̇ᵢ, uᵢ), V̇ᵢ). (19)
Q11. What is the slack term in the inequality constraint?
The slack term is additionally incorporated into the cost function as:
C(δ) = (1/2) C ‖(∂V/∂η ĝ(q))⊤ + â(η, q)‖₂² δ², (22)
for all δ ∈ ℝ₊, where C > 0.
Q12. What is the definition of a control system?
In practice, the authors do not know the dynamics of the system exactly, and instead develop their control systems using the estimated model:
D̂(q)q̈ + Ĉ(q, q̇)q̇ + Ĝ(q) = B̂u, (14)
where Ĥ(q, q̇) denotes Ĉ(q, q̇)q̇ + Ĝ(q). The authors assume the estimated model (14) satisfies the relative degree condition on the domain R, and thus may use the method of feedback linearization to produce a Control Lyapunov Function (CLF), V, for the system.