A Control Lyapunov Perspective on Episodic Learning via Projection to State Stability
Summary
Introduction
- Properly characterizing uncertainty is a key aspect of robust control [35].
- This low-dimensional form is also appealing from a learning perspective, as learning is typically more tractable in lower-dimensional spaces [32], [34], [31].
- Other approaches, such as those based on adaptive control [18], can adaptively learn a CLF but are restricted to learning over specific classes of model uncertainty.
- Section III defines Projection to State Stability (PSS), and how PSS enables constructing bounds on the state of a system that depend on a projected disturbance.
II. PRELIMINARIES
- This section provides a review of Control Lyapunov Functions (CLFs) and Input to State Stability (ISS).
- The following definitions, taken from [17], are useful in analyzing stability of (1).
- The authors note that the strictly increasing nature of Class K (K∞) functions permits an inverse Class K (K∞) function α⁻¹ : [0, α(a)) → R+.
- Definition 3 (Control Lyapunov Function).
- The disturbance may be time-varying, state-dependent, and/or input-dependent.
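
The invertibility of Class K∞ functions noted above can be checked numerically. The following is a minimal sketch, not taken from the paper: `alpha` is a hypothetical Class K∞ function chosen only for illustration, and `inverse` is an assumed helper that recovers α⁻¹ by bisection, relying only on continuity, strict monotonicity, and α(0) = 0.

```python
def alpha(r):
    """Hypothetical class K-infinity function: alpha(0) = 0, strictly increasing, unbounded."""
    return r * r + r


def inverse(fn, y, iterations=200):
    """Invert a continuous, strictly increasing fn with fn(0) = 0 at y >= 0 by bisection."""
    hi = 1.0
    while fn(hi) < y:  # grow the bracket until it contains the preimage of y
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if fn(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r = 1.7
recovered = inverse(alpha, alpha(r))  # should return r up to numerical tolerance
```

The bracket-growing loop terminates because the example function is unbounded; for a Class K function defined only on [0, a), the search would have to be restricted to that interval.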
III. PROJECTION TO STATE STABILITY
- This requirement does not easily permit analysis of Input to State behavior when the disturbance is more easily described by its impact in a Lyapunov function derivative.
- This limitation motivates Projection to State Stability (PSS), which instead relies on a bound on the state in terms of a projection of the disturbance.
- The authors are now ready to state their main definition.
- Definition 9 (Projection to State Stable Control Lyapunov Function).
- If the system governed by (7) has a PSS-CLF, then the system governed by (3) is PSS with respect to the projection Π.
A. Uncertain Affine Systems
- Note that the disturbance d = A(x)u + b(x) is explicitly characterized as time-invariant, state-dependent, and input-dependent, with potentially unknown A(x) and b(x) for all x ∈ X.
- As discussed in [2], [31], CLFs may be constructively formed for affine systems under proper assumptions regarding relative degree and unbounded control.
- Furthermore, if the true system satisfies the relative degree properties of the estimated model, then the CLF found for the estimated system can be used for the true system.
- In (26) the residual terms a and b capture the effect of the unmodeled dynamics on the Lyapunov function derivative.
- In (27) the residual terms reflect the error in estimating this effect.
B. Projection to State Stability via Uncertainty Functions
- From this point forward the authors limit their attention to a subset of the state space and make a critical assumption regarding the estimate \hat{\dot{V}} of the time derivative of a CLF V.
- If the estimated and true system satisfy the same relative degree property, then this assumption amounts to the addition of estimates â and b̂ not violating the relative degree property.
- Theorem 2 (Sufficient Conditions for PSS in Affine Control Systems).
- The authors state this formally in the next result.
C. Uncertainty Function Construction
- Assume A and b are Lipschitz continuous with constants LA and Lb, respectively.
- By including such estimators, the observed loss term may be reduced, but the bound must be modified with the following additional continuous function, which accounts for potential error in the estimation at the test point:
  H(x, x', u') = |(\hat{a}(x) - \hat{a}(x'))^\top u' + \hat{b}(x) - \hat{b}(x')|. (52)
- The authors now explore the practical interplay between learning and systematic improvement of PSS properties, in particular by decreasing the upper bound in (51).
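
The function H in (52) can be evaluated directly once estimators are available. The sketch below assumes the estimators are plain Python callables; `a_hat` and `b_hat` are hypothetical linear examples, not learned models from the paper.

```python
def correction_term(a_hat, b_hat, x, x_prime, u_prime):
    """H(x, x', u') = |(a_hat(x) - a_hat(x'))^T u' + b_hat(x) - b_hat(x')|."""
    a_diff = [p - q for p, q in zip(a_hat(x), a_hat(x_prime))]
    inner = sum(d * u for d, u in zip(a_diff, u_prime))
    return abs(inner + b_hat(x) - b_hat(x_prime))


# Hypothetical estimators for a system with two states and two inputs.
def a_hat(x):
    return [x[0], 2.0 * x[1]]


def b_hat(x):
    return x[0] + x[1]


# Test point x, data point x', and the input u' recorded at the data point.
h_value = correction_term(a_hat, b_hat, [1.0, 2.0], [0.0, 1.0], [3.0, -1.0])
h_same = correction_term(a_hat, b_hat, [1.0, 2.0], [1.0, 2.0], [3.0, -1.0])
```

The term vanishes when the test point coincides with a data point, which is consistent with the bound being tightest along observed trajectories.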
A. Episodic Learning Framework
- The authors demonstrate the practicality of PSS by incorporating it into an episodic learning framework based on learning CLF time derivatives [31].
- Controller improvement is achieved by alternating between executing a controller to gather data and refining estimates of residual uncertainty.
- Here S^{2m}_+ denotes the set of positive semidefinite matrices of size 2m × 2m.
- Algorithm 1 Dataset Aggregation for Control Lyapunov Functions [31].
- Should H_a and H_b be classes of Lipschitz continuous estimators, the upper bound (52) can be weakened further using the associated Lipschitz constants to permit further analysis of the uncertainty function specified in (43).
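
The alternation between data collection and estimator refinement can be sketched on a scalar toy problem. This is a heavily simplified illustration, not Algorithm 1 itself: the system ẋ = θx + u, the estimator class b̂(x) = c·x², the closed-form least-squares fit, and the trust-factor values are all assumptions made for the example.

```python
def run_episodes(theta_true=1.0, theta_hat=0.5, gain=1.0,
                 trust=(0.0, 0.5, 1.0), dt=1e-3, steps=2000):
    """Alternate rollouts with refitting b_hat(x) = c * x**2 on all data gathered so far."""
    def V(x):
        return 0.5 * x * x  # hypothetical CLF for the scalar system

    data = []  # aggregated (state, residual) pairs across episodes
    c = 0.0    # current residual estimate, zero before any data is seen
    for w in trust:
        x = 1.0
        for _ in range(steps):
            # Nominal model-based controller plus trusted fraction of the learned correction.
            u = -(theta_hat + gain) * x - w * c * x
            x_next = x + dt * (theta_true * x + u)    # true dynamics, Euler step
            v_dot_measured = (V(x_next) - V(x)) / dt  # numerical differentiation of V
            v_dot_model = x * (theta_hat * x + u)     # model-based estimate of V_dot
            data.append((x, v_dot_measured - v_dot_model))
            x = x_next
        # Least-squares fit of c over the aggregated dataset.
        num = sum(ri * xi ** 2 for xi, ri in data)
        den = sum(xi ** 4 for xi, _ in data)
        c = num / den
    return c


c_final = run_episodes()
```

In this toy problem the true residual is (θ − θ̂)x², so the fitted coefficient should approach θ − θ̂ = 0.5 up to the discretization error of the finite-difference derivative.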
B. Simulation Results
- The true mass and the length are perturbed by up to 30% of their estimated values.
- The estimators are chosen from the class of two-layer neural networks with 200 hidden units and ReLU nonlinearities, mapping concatenated state and Lyapunov function gradients to R^m and R.
- The trust factors are chosen in a sigmoid fashion.
- A comparison of the baseline controller and final augmented controller demonstrating improved tracking performance is shown in Fig.
- The bounds are small along the observed trajectory, in comparison.
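
The estimator architecture and trust-factor schedule described above might look roughly as follows. This is a minimal sketch under assumptions: the initialization, the untrained random weights, and the specific sigmoid parameterization in `trust_factor` are hypothetical, since the summary does not specify them.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for one affine layer (assumed initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(weights, biases, v):
    return [sum(w * vi for w, vi in zip(row, v)) + b for row, b in zip(weights, biases)]


def make_estimator(rng, state_dim, out_dim, hidden=200):
    """Two-layer ReLU network acting on the concatenation of state and gradient of V."""
    w1, b1 = init_layer(rng, 2 * state_dim, hidden)
    w2, b2 = init_layer(rng, hidden, out_dim)

    def forward(x, grad_v):
        z = list(x) + list(grad_v)
        h = [max(0.0, s) for s in affine(w1, b1, z)]  # ReLU nonlinearity
        return affine(w2, b2, h)

    return forward


def trust_factor(k, num_episodes, steepness=1.0):
    """Hypothetical sigmoid schedule rising from near 0 to near 1 over the episodes."""
    return 1.0 / (1.0 + math.exp(-steepness * (k - num_episodes / 2)))


rng = random.Random(0)
a_hat = make_estimator(rng, state_dim=2, out_dim=1)  # maps to R^m with m = 1
b_hat = make_estimator(rng, state_dim=2, out_dim=1)  # maps to R
x, grad_v = [0.4, -0.2], [1.4, 0.0]
a_out, b_out = a_hat(x, grad_v), b_hat(x, grad_v)
weights = [trust_factor(k, 10) for k in range(10)]
```

A schedule of this shape places little weight on the learned correction in early episodes, when the estimators have seen little data, and more weight later.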
VI. CONCLUSION
- The authors presented a novel low-dimensional view of stability for uncertain systems and a method of evaluating PSS behavior using experimental data.
- Quantifying the impact of learning on PSS provides an objective for deciding how to collect data, also known as the exploration problem in learning literature [22], [6], [10], [9], [30].
- In particular, reductions of the uncertainty bound may be used to formulate regret in online learning settings or reward in imitation and reinforcement learning settings.