Asymptotic Optimization of a Nonlinear Hybrid System Governed by a Markov Decision Process
Citations
Towards a Theory of Stochastic Hybrid Systems
A compositional modelling and analysis framework for stochastic hybrid systems
Measurability and safety verification for stochastic hybrid systems
Safety verification for probabilistic hybrid systems
Optimal and Hierarchical Controls in Dynamic Stochastic Manufacturing Systems: A Survey
References
Optimization and nonsmooth analysis
Controlled Markov processes and viscosity solutions
Finite Markov chains
Deterministic and stochastic optimal control
Frequently Asked Questions (9)
Q2. What is the class of stationary policies?
The class of stationary policies is compact; i.e., for any sequence $u(i) \in S$, there exists a subsequence $u(i_j)$ such that the policy $u^* = \lim_{j \to \infty} u(i_j)$ (i.e., the policy for which $u^*(a|x) = \lim_{j \to \infty} u(i_j)(a|x)$ for all $a$ and $x$) is stationary.
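The compactness statement above can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical two-state, two-action sequence of policies `u(i)` (the function `u`, the limit `u_star`, and the helper `sup_distance` are illustrative names, not objects from the paper). The full sequence oscillates and does not converge, yet the even-indexed subsequence converges entrywise to a valid stationary policy.

```python
def u(i):
    """Hypothetical policy u(i): rows are states x, columns are actions a."""
    # p oscillates with the parity of i, so the full sequence has no limit.
    p = 0.5 + 0.25 * (-1) ** i + 0.25 / (i + 1)
    return [[p, 1.0 - p], [1.0 - p, p]]

# Entrywise limit of the even-indexed subsequence u(2j) as j grows.
u_star = [[0.75, 0.25], [0.25, 0.75]]

def sup_distance(p, q):
    """Largest entrywise gap between two policies given as nested lists."""
    return max(abs(a - b) for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q))

# Gaps along the subsequence i_j = 2j shrink toward zero.
gaps = [sup_distance(u(2 * j), u_star) for j in (1, 10, 100, 1000)]

# Odd-indexed policies stay far from u_star, so only a subsequence converges.
odd_gap = sup_distance(u(2001), u_star)
```

Each row of `u_star` is still a probability distribution over actions, which is the property the compactness claim guarantees for the limit.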
Q3. What is the proof of theorem 2.1?
In the linear case studied in [2], $\gamma$ can be chosen such that $\lim_{\epsilon \to 0} \epsilon^{-1/2} \gamma(\epsilon) = 0$. Hence, for the linear case, simple bounds on the rate of convergence are available for Lemmas 2.1 and 2.2 as well as for Theorem 2.1.
Q4. What is the solution of Lemma 4.1?
LEMMA 4.1. Let $y^i_t(h)$, $i = 1, 2$, be functions of time $t$ and state-action histories $h$. Let $z^i_t(h)$ be the solution of (9) obtained with $y^i_t(h)$ ($h$ is fixed), $i = 1, 2$.
Q5. What is the simplest way to solve the lemma?
Substituting the last inequality in (42), one obtains
$$\max_{t \in [0,1]} E^{u(y)}_x \| Z_t - z^y_t \|_1 \le L \left[ \Delta(\epsilon) + (L_1 + L_2)\Delta(\epsilon) + L_3 \mu(K(\epsilon)) \right],$$
which, by (43), completes the proof of the lemma.
Q6. What is the simplest way to explain the concept of stationary policy?
Since for any initial distribution $\xi$ and for any stationary policy $s(i)$, the authors have
$$\psi_0 = \eta(s(i)), \quad P^{s(i)}_\xi \text{ a.s.}, \qquad (38)$$
it follows, by choosing the sequence of times $t(i)$ so that the intervals $t(i+1) - t(i)$ are sufficiently large, that (36) implies that $\lim_{i \to \infty}$
Q7. What is the proof of the Pontryagin maximum principle?
As in this theorem, one can also establish that
$$\lim_{\epsilon \to 0} B^{\epsilon}(z, x, s) = B^{0}(z, s),$$
with the convergence being uniform with respect to $s \in [0, 1]$, $x \in X$, and $z \in Z$, where $Z$ is a compact subset of $\mathbb{R}^n$. Notice that the described approach has a decomposition structure.
Q8. What is the proof of a stationary policy?
It follows by arguments as in the first part of the proof that there exist sequences of times $t(i)$ and of stationary policies $s(i)$, and a constant $\alpha_4 > 0$ such that for all $i$,
$$E^{s(i)}_\xi \, d^2_{t(i)} \ge \alpha_4 \qquad (36)$$
for any initial distribution $\xi$.
Q9. What is the optimal value of the above problem?
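The optimal value of the above problem does not depend on the initial distribution $\xi$, and it is equal to the optimal value of the following linear programming problem:
$$
\begin{aligned}
J_\xi(z, \lambda) = J(z, \lambda) &\stackrel{\text{def}}{=} \min_{\eta} \Big\{ \sum_{v,a} r(z, \lambda; v, a)\, \eta(v, a) \;\Big|\; \eta = \{\eta(v, a)\} \in W \Big\} \qquad (19) \\
&= \lambda^T f_1(z) + \min_{\eta} \Big\{ \lambda^T f_2(z) \sum_{v,a} y(v, a)\, \eta(v, a) \;\Big|\; \eta = \{\eta(v, a)\} \in W \Big\}.
\end{aligned}
$$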
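To make the linear program concrete, here is a minimal sketch on a hypothetical toy problem; the transition table `P`, the cost table `r`, and all function names are illustrative and not taken from the paper. It assumes that $W$ is the set of stationary state-action frequencies of an ergodic chain, so that the minimum is attained at a deterministic stationary policy (a standard fact in the unichain setting, used here as an assumption). Instead of calling an LP solver, the sketch enumerates those policies, with $z$ and $\lambda$ held fixed so that `r[v][a]` stands in for $r(z, \lambda; v, a)$.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[v][a][w] = probability of moving from state v to state w under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]
# r[v][a] plays the role of r(z, lambda; v, a) with z and lambda held fixed.
r = [
    [1.0, 3.0],
    [2.0, 0.5],
]

def stationary_distribution(chain, init, iters=2000):
    """Approximate the stationary distribution by power iteration from `init`."""
    pi = list(init)
    n = len(chain)
    for _ in range(iters):
        pi = [sum(pi[v] * chain[v][w] for v in range(n)) for w in range(n)]
    return pi

def average_cost(policy, init):
    """Long-run average cost of a deterministic stationary policy."""
    chain = [P[v][policy[v]] for v in range(len(P))]
    pi = stationary_distribution(chain, init)
    # eta(v, a) equals pi[v] when a = policy[v] and 0 otherwise.
    return sum(pi[v] * r[v][policy[v]] for v in range(len(P)))

def best_policy(init):
    """Minimize the average cost over all deterministic stationary policies."""
    policies = product(range(2), repeat=len(P))
    return min((average_cost(d, init), d) for d in policies)

# Two different initial distributions xi give the same optimal value.
cost_a, policy_a = best_policy([1.0, 0.0])
cost_b, policy_b = best_policy([0.0, 1.0])
print(policy_a, round(cost_a, 6))
```

Starting the power iteration from two different initial distributions yields the same minimizer and value, which mirrors the claim that the optimal value does not depend on $\xi$. For larger problems a general-purpose LP solver over the constraints defining $W$ would be the more natural choice than enumeration.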