Human-level control through deep reinforcement learning
Citations
2,170 citations
Cites background from "Human-level control through deep reinforcement learning"
...domains, from image classification (Krizhevsky et al., 2012; Szegedy et al., 2017), to natural language processing (Sutskever et al., 2014; Bahdanau et al., 2015), to game play (Mnih et al., 2015; Silver et al., 2016; Moravčík et al., 2017), are a testament to this minimalist principle....
[...]
2,079 citations
Cites background or result from "Human-level control through deep reinforcement learning"
...Learning in simulation is especially promising for building on recent results using deep reinforcement learning to achieve human-level performance on tasks like Atari [27] and robotic control [21], [38]....
[...]
...It often requires hundreds of thousands or millions of samples [29], which could take thousands of hours to collect, making it impractical for many applications....
[...]
2,010 citations
Cites background or methods or result from "Human-level control through deep reinforcement learning"
...(6) DDQN is the same as for DQN (see Mnih et al. (2015)), but with the target y_i^DQN replaced by y_i^DDQN....
[...]
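The excerpt above contrasts the DQN and Double DQN targets. As a minimal sketch of the difference (assuming PyTorch; the batch tensors and network handles are illustrative, not the cited papers' code): DQN both selects and evaluates the next action with the frozen target network, while DDQN selects the action with the online network and evaluates it with the target network.

```python
import torch

def dqn_target(reward, next_state, done, gamma, target_q):
    # y_i^DQN = r + gamma * max_a' Q(s', a'; theta^-)
    # The target network both selects and evaluates the next action.
    with torch.no_grad():
        return reward + gamma * (1.0 - done) * target_q(next_state).max(dim=1).values

def ddqn_target(reward, next_state, done, gamma, online_q, target_q):
    # y_i^DDQN = r + gamma * Q(s', argmax_a' Q(s', a'; theta_i); theta^-)
    # The online network selects the action; the frozen target network
    # evaluates it (van Hasselt et al., 2015).
    with torch.no_grad():
        best = online_q(next_state).argmax(dim=1, keepdim=True)
        return reward + gamma * (1.0 - done) * target_q(next_state).gather(1, best).squeeze(1)
```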
...The results illustrate vast improvements over the single-stream baselines of Mnih et al. (2015) and van Hasselt et al. (2015)....
[...]
...Training of the dueling architectures, as with standard Q networks (e.g. the deep Q-network of Mnih et al. (2015)), requires only back-propagation....
[...]
...A key innovation in (Mnih et al., 2015) was to freeze the parameters of the target network Q(s′, a′; θ−) for a fixed number of iterations while updating the online network Q(s, a; θᵢ) by gradient descent....
[...]
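The freeze-and-copy scheme described in the excerpt above can be sketched as follows (a minimal illustration, assuming PyTorch; the network shape, optimizer settings, and sync period are assumptions, not the paper's code): the online network Q(s, a; θᵢ) is trained at every step, while the target network's parameters θ− are overwritten from the online network only every C steps and held fixed in between.

```python
import copy
import torch
import torch.nn as nn

# Illustrative network and hyperparameters, not the paper's.
online_q = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # Q(s, a; theta_i)
target_q = copy.deepcopy(online_q)  # Q(s', a'; theta^-), frozen between syncs
for p in target_q.parameters():
    p.requires_grad_(False)  # the target network is never trained directly

optimizer = torch.optim.RMSprop(online_q.parameters(), lr=2.5e-4)
gamma, C = 0.99, 10_000  # C = number of steps between target-network syncs

def update(step, s, a, r, s_next, done):
    # One gradient step on the online network against a fixed target y.
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * target_q(s_next).max(dim=1).values
    q_sa = online_q(s).gather(1, a.unsqueeze(1)).squeeze(1)  # a: int64 action indices
    loss = nn.functional.smooth_l1_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % C == 0:
        # Copy theta_i into theta^-; theta^- then stays fixed for the next C steps.
        target_q.load_state_dict(online_q.state_dict())
```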
...architecture is complementary to algorithmic innovations, we show that it improves performance for both the uniform and the prioritized replay baselines (for which we picked the easier to implement rank-based variant), with the resulting prioritized dueling variant holding the new state-of-the-art....
[...]
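For the rank-based prioritized replay variant mentioned in the last excerpt (Schaul et al., 2015), here is a rough sketch of the sampling distribution (NumPy; the function name and the α value are illustrative): transitions are ranked by the magnitude of their TD error, and transition i is drawn with probability proportional to (1/rank(i))^α.

```python
import numpy as np

def rank_based_probabilities(td_errors, alpha=0.7):
    # P(i) = (1/rank(i))^alpha / sum_k (1/rank(k))^alpha,
    # where rank 1 is the transition with the largest |TD error|.
    ranks = np.empty(len(td_errors), dtype=np.int64)
    order = np.argsort(-np.abs(td_errors))           # descending by |delta|
    ranks[order] = np.arange(1, len(td_errors) + 1)
    priorities = (1.0 / ranks) ** alpha
    return priorities / priorities.sum()

# Usage: sample a minibatch of replay indices under these priorities.
td_errors = np.array([0.1, -2.0, 0.5, 0.05])
probs = rank_based_probabilities(td_errors)
batch = np.random.choice(len(td_errors), size=2, p=probs, replace=False)
```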
1,968 citations
Cites methods from "Human-level control through deep reinforcement learning"
...In deep Q-learning (Mnih et al., 2015), the network is updated by using temporal difference learning with a secondary frozen target network Qθ′(s, a) to maintain a fixed objective y over multiple updates: y = r + γQθ′(s′, a′), a′ ∼ πφ′(s′), (3) where the actions are selected from a target actor...
[...]
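The target in equation (3) of the excerpt combines a frozen target critic with a target actor. A minimal sketch of computing it (assuming PyTorch; the module names are illustrative, not the citing paper's code):

```python
import torch

def td_target(reward, next_state, done, gamma, target_critic, target_actor):
    # y = r + gamma * Q_theta'(s', a'),  with  a' = pi_phi'(s').
    # Both target networks are held fixed so that y remains a stable
    # objective over multiple updates of the online critic.
    with torch.no_grad():
        next_action = target_actor(next_state)  # a' from the target actor
        return reward + gamma * (1.0 - done) * target_critic(next_state, next_action)
```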