The complexity of reachability in parametric Markov decision processes

Question

Q1. What are the contributions in "On the Complexity of Reachability in Parametric Markov Decision Processes"?

Q2. What is the NP membership for acyclic simple pMCs?

Q3. What is the transition probability function of a stationary strategy?

Q4. What is the minimum criterion for a scheduler?

Q5. What is the simplest way to construct a polynomial?

Q6. What is the main focus of reinforcement learning?

Q7. How does the game pick a successor state?

Q8. What is the complexity of deciding membership for sentences with an a-priori fixed upper bound on the number of variables?

Q9. What is the definition of a deterministic scheduler?

Q10. What is the definition of a complexity class?

Q11. What is the probability to reach the target?

Accepted Answer

This paper studies parametric Markov decision processes (pMDPs), an extension of Markov decision processes (MDPs) in which transition probabilities are described by polynomials over a finite set of parameters. In particular, it studies the complexity of finding values for these parameters such that the induced MDP satisfies some reachability constraints. The authors discuss different variants depending on the comparison operator in the constraints and the domain of the parameter values. They improve all known lower bounds for this problem and, notably, provide ETR-completeness results for distinct variants of it. Furthermore, they provide insights into the functions describing the induced reachability probabilities and into how pMDPs generalise concurrent stochastic reachability games.

Accepted Answer

The authors focus on graph-preserving instantiations, as the analysis of a pMDP $M$ and $P^{wd}_M$ corresponds to analysing constantly many pMDPs $M'$ on $P^{gp}_{M'}$, cf. Rem.

Accepted Answer

Its transition probability function $P_M$ is obtained by letting $P_M(s, b, s') = \sum_{a \in A_s} \sigma(a \mid s)\, P(s' \mid s, a, b)$ (6) for all $s, s' \in S$ and actions $b \in B_s$ of player II.
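As a minimal sketch of equation (6), the following Python snippet computes the induced transition probabilities for a small concurrent game. The state names, action names, probabilities, and the helper `induced_transition` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from fractions import Fraction

def induced_transition(P, sigma, A, s, b, s_next):
    """P_M(s, b, s') = sum over a in A[s] of sigma(a|s) * P(s'|s, a, b)."""
    return sum(
        sigma[s][a] * P[(s, a, b)].get(s_next, Fraction(0))
        for a in A[s]
    )

# Hypothetical game: player I has actions a1, a2 in s0, player II has b1, b2.
A = {"s0": ["a1", "a2"]}
P = {
    ("s0", "a1", "b1"): {"goal": Fraction(1)},
    ("s0", "a2", "b1"): {"sink": Fraction(1)},
    ("s0", "a1", "b2"): {"sink": Fraction(1)},
    ("s0", "a2", "b2"): {"goal": Fraction(1)},
}

# Stationary strategy of player I: a distribution over A[s] for each state s.
sigma = {"s0": {"a1": Fraction(1, 3), "a2": Fraction(2, 3)}}

# After fixing sigma, only player II's action b remains as a choice.
p_b1_goal = induced_transition(P, sigma, A, "s0", "b1", "goal")
p_b1_sink = induced_transition(P, sigma, A, "s0", "b1", "sink")
p_b2_goal = induced_transition(P, sigma, A, "s0", "b2", "goal")
```

Fixing the stationary strategy removes player I's choice, so the remaining object has one decision maker (player II) and can be treated as an MDP.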

Accepted Answer

In the non-parametric case, a scheduler $\sigma$ of an MDP is called minimal if it minimises $\Pr^{\sigma}(\lozenge T)$, i.e. if $\sigma \in \arg\min_{\sigma' \in \Sigma} \Pr^{\sigma'}(\lozenge T)$.
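The definition can be illustrated by brute force on a tiny example. The sketch below assumes an acyclic MDP and memoryless deterministic schedulers (one action per state), and simply enumerates all of them; the MDP, the names `mdp`, `reach_prob`, and `minimal_scheduler` are hypothetical, and this enumeration is not the method used in the paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical acyclic MDP: mdp[state][action] = {successor: probability}.
mdp = {
    "s0": {"a": {"s1": Fraction(1, 2), "t": Fraction(1, 2)},
           "b": {"s1": Fraction(1)}},
    "s1": {"c": {"t": Fraction(1, 4), "sink": Fraction(3, 4)},
           "d": {"t": Fraction(1, 2), "sink": Fraction(1, 2)}},
}
TARGET = {"t"}

def reach_prob(state, sched):
    """Probability to eventually reach TARGET from state under sched."""
    if state in TARGET:
        return Fraction(1)
    if state not in mdp:  # absorbing non-target state
        return Fraction(0)
    dist = mdp[state][sched[state]]
    return sum(p * reach_prob(nxt, sched) for nxt, p in dist.items())

def minimal_scheduler(init):
    """Enumerate all deterministic schedulers and keep one with minimal value."""
    states = sorted(mdp)
    best = None
    for choice in product(*(sorted(mdp[s]) for s in states)):
        sched = dict(zip(states, choice))
        value = reach_prob(init, sched)
        if best is None or value < best[1]:
            best = (sched, value)
    return best

best_sched, best_value = minimal_scheduler("s0")
```

The recursion in `reach_prob` terminates only because the example is acyclic; a cyclic MDP would require solving a linear equation system instead.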

Accepted Answer

Essentially, the polynomial f in mb4FEAS is constructed by taking the sum of squares of the quadratic polynomials; the further operations shift the polynomial adequately.
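The sum-of-squares step can be sketched as follows. The two quadratic polynomials are hypothetical, and the subsequent shifting step mentioned in the answer is not reproduced here; the snippet only illustrates that the sum of squares is zero exactly at the common roots of the quadratics, and that it has degree at most 4.

```python
from fractions import Fraction

# Hypothetical quadratic polynomials in two variables x and y.
quadratics = [
    lambda x, y: x * x - y,                # q1(x, y) = x^2 - y
    lambda x, y: x * y - Fraction(1, 8),   # q2(x, y) = x*y - 1/8
]

def f(x, y):
    """Sum of squares of the quadratics: f = q1^2 + q2^2 (degree at most 4)."""
    return sum(q(x, y) ** 2 for q in quadratics)

# x = 1/2, y = 1/4 is a common root of q1 and q2, so f vanishes there.
at_root = f(Fraction(1, 2), Fraction(1, 4))
# At a point that is not a common root, f is strictly positive.
off_root = f(Fraction(1), Fraction(1))
```

Since each square is non-negative over the reals, f is non-negative everywhere and f(x, y) = 0 holds if and only if every quadratic is zero at (x, y).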

Accepted Answer

Robust strategies have been widely studied in the field of operations research (see, e.g., [40, 57]) and are the main focus of reinforcement learning [55].

Accepted Answer

The game then picks a successor state s′ according to a fixed probability distribution P(·|s, a, b) over S, and the play continues in s′.
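One step of such a play can be sketched in Python as sampling from the distribution attached to the current state and the two chosen actions. The dictionary layout and the function name `pick_successor` are assumptions made for this illustration.

```python
import random

def pick_successor(P, s, a, b, rng):
    """Sample s' from the fixed distribution P(.|s, a, b)."""
    dist = P[(s, a, b)]
    states = list(dist)
    weights = [dist[t] for t in states]
    return rng.choices(states, weights=weights, k=1)[0]

# Hypothetical game fragment: P[(s, a, b)] = {successor: probability}.
P = {
    ("s0", "a", "b"): {"s1": 0.5, "s2": 0.5},
    ("s1", "a", "b"): {"goal": 1.0},
}

rng = random.Random(0)  # seeded generator for reproducible sampling
first = pick_successor(P, "s0", "a", "b", rng)
second = pick_successor(P, "s1", "a", "b", rng)
```

The distribution depends only on the current state and the pair of actions, which is why a dictionary keyed by `(s, a, b)` is sufficient here.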

Accepted Answer

In particular, deciding membership for sentences with an a-priori fixed upper bound on the number of variables is in polynomial time.

Accepted Answer

The authors denote the set of randomised schedulers with $R\Sigma$, and (deterministic) schedulers with $\Sigma$. For pMDP $M = (S, X, Act, s_\iota, P)$ and $\sigma \in R\Sigma^M$, the induced pMC $M^\sigma$ is defined as $(S, X, s_\iota, P')$ with $P'(s, s') = \sum_{a \in Act} \sigma(s)(a) \cdot P(s, a, s')$.
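A minimal sketch of the induced pMC construction follows. It assumes that each transition polynomial is represented as a Python callable that takes a parameter valuation (a dict) and returns a number; this representation, the one-parameter example, and the name `induced_pmc` are hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical pMDP over one parameter p:
# P[(s, a)] = {s': polynomial}, each polynomial a callable of the valuation.
P = {
    ("s0", "a"): {"s1": lambda v: v["p"], "s2": lambda v: 1 - v["p"]},
    ("s0", "b"): {"s1": lambda v: Fraction(1)},
}

# Randomised scheduler: sigma[s][a] is the probability of choosing a in s.
sigma = {"s0": {"a": Fraction(1, 2), "b": Fraction(1, 2)}}

def induced_pmc(P, sigma):
    """Build P'(s, s') = sum over a of sigma(s)(a) * P(s, a, s')."""
    P_ind = {}
    for (s, a), dist in P.items():
        weight = sigma[s].get(a, Fraction(0))
        for s_next, poly in dist.items():
            prev = P_ind.get((s, s_next), lambda v: Fraction(0))
            # Default arguments bind the current values inside the closure.
            P_ind[(s, s_next)] = (
                lambda v, prev=prev, w=weight, poly=poly: prev(v) + w * poly(v)
            )
    return P_ind

P_ind = induced_pmc(P, sigma)
valuation = {"p": Fraction(1, 4)}
to_s1 = P_ind[("s0", "s1")](valuation)
to_s2 = P_ind[("s0", "s2")](valuation)
```

The induced transition functions remain parametric: `P_ind` maps each pair of states to a function of the valuation, and numbers appear only once a concrete valuation is supplied.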

Accepted Answer

ETR denotes the complexity class [48] of problems with a polynomial-time many-one reduction to deciding membership in the existential theory of the reals.

Accepted Answer

The gadget in Fig. 3 ensures that for any instantiation that is not graph-preserving, the probability to reach the target is 0, while it does not affect reachability probabilities for graph-preserving instantiations.
