Q2. What is the NP membership for acyclic simple pMCs M?
The authors focus on graph-preserving instantiations, as the analysis of a pMDP $M$ and $P^{wd}_M$ corresponds to analysing constantly many pMDPs $M'$ on $P^{gp}_{M'}$, cf. Rem.
Q3. What is the transition probability function of a stationary strategy?
Its transition probability function $P_M$ is obtained by letting $P_M(s, b, s') = \sum_{a \in A_s} \sigma(a \mid s)\, P(s' \mid s, a, b)$ (6) for all $s, s' \in S$ and actions $b \in B_s$ of player II.
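Equation (6) can be illustrated with a minimal sketch, assuming a dictionary encoding of the game. The names `induced_transition`, `sigma` and `P`, as well as the toy game itself, are hypothetical and not taken from the source.

```python
from fractions import Fraction

def induced_transition(sigma, P, s, b, s_next):
    """Compute P_M(s, b, s') = sum over a of sigma(a|s) * P(s'|s, a, b).

    sigma: dict state -> {action a of player I: probability} (stationary strategy)
    P:     dict (s, a, b) -> {successor state: probability}
    """
    return sum(
        prob_a * P[(s, a, b)].get(s_next, Fraction(0))
        for a, prob_a in sigma[s].items()
    )

# Hypothetical toy game: player I mixes "a1"/"a2" in "s0", player II plays "b".
sigma = {"s0": {"a1": Fraction(1, 4), "a2": Fraction(3, 4)}}
P = {
    ("s0", "a1", "b"): {"s0": Fraction(1, 2), "s1": Fraction(1, 2)},
    ("s0", "a2", "b"): {"s1": Fraction(1)},
}
p_s1 = induced_transition(sigma, P, "s0", "b", "s1")  # 1/4 * 1/2 + 3/4 * 1
p_s0 = induced_transition(sigma, P, "s0", "b", "s0")  # 1/4 * 1/2 + 3/4 * 0
```

Exact fractions are used here so that the induced probabilities for a fixed action `b` can be checked to sum to one without rounding issues.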
Q4. What is the minimum criterion for a scheduler?
In the non-parametric case, a scheduler $\sigma$ of an MDP is called minimal if it minimises $\Pr^{\sigma}(\lozenge T)$, i.e. if $\sigma \in \operatorname{argmin}_{\sigma' \in \Sigma} \Pr^{\sigma'}(\lozenge T)$.
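The argmin condition can be made concrete with a brute-force sketch. It assumes a small acyclic MDP and restricts attention to memoryless deterministic schedulers; both are simplifying assumptions for illustration, and the function names and example MDP are hypothetical.

```python
from fractions import Fraction
from itertools import product

def reach_prob(mdp, scheduler, state, targets):
    """Probability of eventually reaching `targets` from `state` (acyclic MDP assumed)."""
    if state in targets:
        return Fraction(1)
    if state not in mdp:  # sink state without outgoing transitions
        return Fraction(0)
    dist = mdp[state][scheduler[state]]
    return sum(p * reach_prob(mdp, scheduler, succ, targets) for succ, p in dist.items())

def minimal_schedulers(mdp, init, targets):
    """Enumerate all memoryless deterministic schedulers and keep the argmin set."""
    states = sorted(mdp)
    choices = [sorted(mdp[s]) for s in states]
    scored = []
    for combo in product(*choices):
        sched = dict(zip(states, combo))
        scored.append((reach_prob(mdp, sched, init, targets), sched))
    best = min(value for value, _ in scored)
    return best, [sched for value, sched in scored if value == best]

# Hypothetical MDP: state -> action -> {successor: probability}; "t" is the target.
mdp = {
    "s0": {"a": {"s1": Fraction(1, 2), "t": Fraction(1, 2)},
           "b": {"s1": Fraction(1)}},
    "s1": {"a": {"t": Fraction(1, 3), "sink": Fraction(2, 3)},
           "b": {"t": Fraction(1)}},
}
best_value, best_schedulers = minimal_schedulers(mdp, "s0", {"t"})
```

The enumeration is exponential in the number of states and is only meant to spell out the definition, not to suggest an efficient procedure.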
Q5. What is the simplest way to construct a polynomial?
Essentially, the polynomial f in mb4FEAS is constructed by taking the sum of squares of the quadratic polynomials; the further operations adequately shift the polynomial.
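The sum-of-squares step can be sketched as follows. The sketch covers only the squaring-and-summing part, not the shifting operations, and the two quadratic polynomials are hypothetical examples rather than ones from the source.

```python
from fractions import Fraction

def sum_of_squares(polys):
    """Return f with f(x) = sum of q(x)**2 over q in polys.

    Over the reals, f(x) == 0 holds exactly when every q(x) == 0,
    and f has degree 4 when the q are quadratic.
    """
    return lambda *x: sum(q(*x) ** 2 for q in polys)

# Hypothetical quadratic system: x*y - 1 = 0 and x**2 - 4 = 0.
q1 = lambda x, y: x * y - 1
q2 = lambda x, y: x * x - 4
f = sum_of_squares([q1, q2])

at_root = f(Fraction(2), Fraction(1, 2))  # common root of q1 and q2
off_root = f(Fraction(1), Fraction(1))    # q1 vanishes here, q2 does not
```

A single polynomial `f` thus encodes the whole system: it is non-negative everywhere and vanishes only at common roots.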
Q6. What is the main focus of reinforcement learning?
Robust strategies have been widely studied in the field of operations research (see, e.g., [40, 57]) and are the main focus of reinforcement learning [55].
Q7. How does the game pick a successor state?
The game then picks a successor state $s'$ according to a fixed probability distribution $P(\cdot \mid s, a, b)$ over $S$, and the play continues in $s'$.
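One step of such a play can be simulated with a short sketch, assuming the distribution is stored as a dictionary keyed by `(s, a, b)`. The function name `step` and the example distribution are hypothetical.

```python
import random

def step(P, s, a, b, rng):
    """Sample a successor state s' from the fixed distribution P(.|s, a, b)."""
    dist = P[(s, a, b)]
    successors = list(dist)
    weights = [dist[t] for t in successors]
    return rng.choices(successors, weights=weights, k=1)[0]

# Hypothetical distributions for two action pairs in state "s0".
P = {
    ("s0", "a", "b"): {"s1": 0.25, "s2": 0.75},
    ("s0", "a", "c"): {"s1": 1.0},
}
rng = random.Random(0)  # seeded only to make the simulation reproducible
sampled = [step(P, "s0", "a", "b", rng) for _ in range(100)]
forced = step(P, "s0", "a", "c", rng)
```

Passing the random generator explicitly keeps the sampling reproducible and separate from the game description.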
Q8. What is the complexity of deciding membership for sentences with an a-priori fixed upper bound on the number of variables?
In particular, deciding membership for sentences with an a-priori fixed upper bound on the number of variables is in polynomial time.
Q9. What is the definition of a deterministic scheduler?
The authors denote the set of randomised schedulers with $R\Sigma$, and (deterministic) schedulers with $\Sigma$. For a pMDP $M = (S, X, Act, s_\iota, P)$ and $\sigma \in R\Sigma^{M}$, the induced pMC $M^{\sigma}$ is defined as $(S, X, s_\iota, P')$ with $P'(s, s') = \sum_{a \in Act} \sigma(s)(a) \cdot P(s, a, s')$.
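The induced pMC can be sketched in code, assuming each parametric transition probability is represented as a Python function of a parameter valuation. This encoding, the name `induced_pmc` and the example pMDP with parameter `x` are hypothetical choices for illustration.

```python
from fractions import Fraction

def induced_pmc(P, sigma):
    """Build P'(s, s') = sum over a of sigma(s)(a) * P(s, a, s').

    P:     dict (s, a) -> {successor: function from valuation to probability}
    sigma: dict s -> {a: probability} (randomised scheduler)
    Returns a dict (s, s') -> function from valuation to probability.
    """
    induced = {}
    for (s, a), dist in P.items():
        weight = sigma[s].get(a, Fraction(0))
        for succ, poly in dist.items():
            prev = induced.get((s, succ), lambda val: Fraction(0))
            # Default arguments bind the current values of prev, poly and weight.
            induced[(s, succ)] = (
                lambda val, prev=prev, poly=poly, weight=weight:
                    prev(val) + weight * poly(val)
            )
    return induced

# Hypothetical pMDP fragment with a single parameter "x".
P = {
    ("s0", "a"): {"s1": lambda val: val["x"], "s2": lambda val: 1 - val["x"]},
    ("s0", "b"): {"s1": lambda val: Fraction(1)},
}
sigma = {"s0": {"a": Fraction(1, 2), "b": Fraction(1, 2)}}
induced = induced_pmc(P, sigma)

val = {"x": Fraction(1, 3)}
p_s1 = induced[("s0", "s1")](val)  # 1/2 * x + 1/2 * 1
p_s2 = induced[("s0", "s2")](val)  # 1/2 * (1 - x)
```

Keeping the transitions as functions of the valuation means the induced chain stays parametric; a concrete Markov chain is obtained only once a valuation such as `val` is supplied.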
Q10. What is the definition of a complexity class?
ETR denotes the complexity class [48] of problems with a polynomial-time many-one reduction to deciding membership in the existential theory of the reals.
Q11. What is the probability to reach the target?
The gadget in Fig. 3 ensures that for any graph non-preserving instantiation, the probability to reach the target is 0, while it does not affect reachability probabilities for graph-preserving instantiations.