Non-parametric Approximate Dynamic Programming via the Kernel Method
Citations
Cites background from "Non-parametric Approximate Dynamic ..."
...Following a slightly different line of work, Bhat et al. (2012) propose to kernelize the linear programming formulation of dynamic programming....
Cites methods from "Non-parametric Approximate Dynamic ..."
...Farias and Van Roy, 2000; Tsitsiklis and Roy, 1996; Tsitsiklis and Van Roy, 1999; Geramifard et al., 2013), approximate linear programming (De Farias and Van Roy, 2003; De Farias and Van Roy, 2004; Desai et al., 2012a), and nonparametric methods are used (Ormoneit and Sen, 2002; Bhat et al., 2012)....
References
"Non-parametric Approximate Dynamic ..." refers background in this paper
...This case is known as the approximate linear program (ALP), and was first proposed by Schweitzer and Seidman (1985). de Farias and Van Roy (2003) provided a pioneering analysis that, stated loosely, showed $\|J^* - z^{*\top}\Phi\|_{1,\nu} \le \frac{2}{1-\alpha} \inf_z \|J^* - z^{\top}\Phi\|_\infty$ for an optimal solution $z^*$ to the ALP....
...Consider a discrete time Markov decision process with finite state space S and finite action space A....
...A policy is a map $\mu : S \to A$, so that $J_\mu(x) \triangleq \mathbb{E}_{x,\mu}\left[\sum_{t=0}^{\infty} \alpha^t g_{x_t,a_t}\right]$ represents the expected (discounted, infinite-horizon) cost-to-go under policy $\mu$ starting at state $x$, with the discount factor $\alpha \in (0, 1)$....
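As an illustration of this definition, the following minimal Python sketch approximates $J_\mu$ for a small finite MDP by repeatedly applying $J \leftarrow g_\mu + \alpha P_\mu J$, which converges because that map is an $\alpha$-contraction. The list-based encoding (`g[x][a]` for $g_{x,a}$, `P[x][a][y]` for transition probabilities, `mu[x]` for the chosen action) and the name `evaluate_policy` are assumptions made for this example, not taken from the paper.

```python
# Minimal sketch: iterative policy evaluation for a finite, discounted MDP.
# Assumed (hypothetical) encoding, not from the paper:
#   g[x][a]    -> one-step cost g_{x,a} of taking action a in state x
#   P[x][a][y] -> probability of moving from state x to state y under action a
#   mu[x]      -> action chosen by the policy in state x


def evaluate_policy(mu, g, P, alpha, tol=1e-10, max_iter=100_000):
    """Approximate J_mu(x) = E_{x,mu}[sum_t alpha^t g_{x_t,a_t}].

    Repeats J <- g_mu + alpha * P_mu J; this map is an alpha-contraction in the
    sup norm, so the iterates converge for any alpha in (0, 1).
    """
    n = len(g)
    J = [0.0] * n
    for _ in range(max_iter):
        J_new = [
            g[x][mu[x]] + alpha * sum(p * J[y] for y, p in enumerate(P[x][mu[x]]))
            for x in range(n)
        ]
        # Stop once successive iterates agree to within tol in the sup norm.
        if max(abs(u - v) for u, v in zip(J_new, J)) < tol:
            return J_new
        J = J_new
    return J
```

For instance, with a single action, costs `g = [[1.0], [0.0]]`, both states moving to state 0, and `alpha = 0.9`, the sketch returns approximately `[10.0, 9.0]`, matching $1/(1-\alpha)$ for state 0 and $0 + \alpha \cdot 10$ for state 1.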
...$J(x) \le g_{x,a} + \alpha\,\mathbb{E}_{x,a}[J(X')],\ \forall\, x \in S,\ a \in A, \quad J \in \mathbb{R}^S$, for any strictly positive state-relevance weight vector $\nu \in \mathbb{R}^S_{+}$. Motivated by this, a series of ADP algorithms (Schweitzer and Seidman, 1985; de Farias and Van Roy, 2003; Desai et al., 2011) have been proposed that compute a weight vector $z$ by...
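The constraints above can be checked numerically. The sketch below computes $J^*$ by value iteration on a small MDP and tests whether a candidate $J$ satisfies $J(x) \le g_{x,a} + \alpha\,\mathbb{E}_{x,a}[J(X')]$ for every state-action pair. It only illustrates feasibility and does not solve the linear program; in the standard exact LP, maximizing $\nu^\top J$ over these constraints with strictly positive $\nu$ recovers $J^*$. The list encoding (`g[x][a]`, `P[x][a][y]`) and the helper names are hypothetical.

```python
# Minimal sketch: value iteration plus a feasibility check for the LP constraints
#   J(x) <= g_{x,a} + alpha * E_{x,a}[J(X')]  for all x in S, a in A.
# Hypothetical encoding: g[x][a] is the cost g_{x,a}; P[x][a][y] is the
# probability of moving from x to y under action a.


def expected_next(J, P, x, a):
    # E_{x,a}[J(X')] for the next state X'.
    return sum(p * J[y] for y, p in enumerate(P[x][a]))


def bellman_min(J, g, P, alpha):
    # (TJ)(x) = min_a [g_{x,a} + alpha * E_{x,a}[J(X')]]
    return [
        min(g[x][a] + alpha * expected_next(J, P, x, a) for a in range(len(g[x])))
        for x in range(len(g))
    ]


def value_iteration(g, P, alpha, tol=1e-12, max_iter=100_000):
    # Iterate J <- TJ until successive iterates agree to within tol.
    J = [0.0] * len(g)
    for _ in range(max_iter):
        J_new = bellman_min(J, g, P, alpha)
        if max(abs(u - v) for u, v in zip(J_new, J)) < tol:
            return J_new
        J = J_new
    return J


def is_lp_feasible(J, g, P, alpha, slack=1e-9):
    # True if every constraint holds, up to a small numerical slack.
    return all(
        J[x] <= g[x][a] + alpha * expected_next(J, P, x, a) + slack
        for x in range(len(g))
        for a in range(len(g[x]))
    )
```

Because $T$ is monotone, any $J$ with $J \le TJ$ satisfies $J \le T^k J \to J^*$, so shifting $J^*$ down by a constant keeps it feasible while shifting it up by a positive constant violates some constraint.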
"Non-parametric Approximate Dynamic ..." refers background in this paper
...This specific network has been studied by de Farias and Van Roy (2003); Chen and Meyn (1998); Kumar and Seidman (1990), for example, and closely related networks have been studied by Harrison and Wein (1989); Kushner and Martins (1996); Martins et al. (1996); Kumar and Muthuraman (2004)....
"Non-parametric Approximate Dynamic ..." refers methods in this paper
...By substituting this parametric regression step with a suitable non-parametric regression procedure, Bethke et al. (2008), Engel et al. (2003), and Xu et al. (2007) come up with corresponding non-parametric algorithms....
...Via a computational study on a controlled queueing network, we show that our non-parametric procedure outperforms the state of the art parametric ADP approaches and established heuristics....
"Non-parametric Approximate Dynamic ..." refers background or methods in this paper
...Similarly, Ernst et al. (2005) replace the local averaging procedure used for regression by Ormoneit and Sen (2002) with non-parametric regression procedures such as the tree-based learning methods....
...One then employs a policy that is greedy with respect to the corresponding approximation J̃ ....
...Another idea has been to use kernel-based local averaging ideas to approximate the solution of an MDP with that of a simpler variation on a sampled state space (e.g., Ormoneit and Sen, 2002; Ormoneit and Glynn, 2002; Barreto et al., 2011)....
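To make "kernel-based local averaging" concrete, the sketch below forms a Nadaraya-Watson estimate: the value at a query state is a kernel-weighted average of values observed at sampled states. This is a generic illustration assuming one-dimensional states and a Gaussian kernel with a hand-picked bandwidth; it is not a reproduction of the specific schemes of Ormoneit and Sen (2002) or the other cited works, and the function names are hypothetical.

```python
import math

# Generic Nadaraya-Watson local averaging over sampled states (illustrative only).
# Assumptions: states are real numbers; the kernel is Gaussian with a fixed bandwidth.


def gaussian_kernel(u, v, bandwidth):
    # Similarity between states u and v; decays with squared distance.
    return math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2))


def kernel_average(x, samples, values, bandwidth=1.0):
    """Estimate the value at x as sum_i k(x, x_i) v_i / sum_i k(x, x_i)."""
    weights = [gaussian_kernel(x, xi, bandwidth) for xi in samples]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * v for w, v in zip(weights, values)) / total
```

Because the weights are nonnegative and normalized, the estimate always lies between the smallest and largest sampled values; the bandwidth controls how local the averaging is.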
"Non-parametric Approximate Dynamic ..." refers background in this paper
...This policy has been extensively studied and shown to have a number of good properties, for example, being throughput optimal (Dai and Lin, 2005) and offering good performance for critically loaded settings (Stolyar, 2004)....