Author

Iain Dunning

Other affiliations: Google

Bio: Iain Dunning is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research on topics including reinforcement learning and optimization problems. The author has an h-index of 18 and has co-authored 31 publications receiving 3700 citations. Previous affiliations of Iain Dunning include Google.

Papers

Posted Content•

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Lasse Espeholt¹, Hubert Soyer², Rémi Munos¹, Karen Simonyan¹, Volodymyr Mnih¹, Tom Ward¹, Yotam Doron³, Vlad Firoiu¹, Tim Harley¹, Iain Dunning¹, Shane Legg¹, Koray Kavukcuoglu¹•Institutions (3)

Google¹, National Institute of Informatics², University College London³

05 Feb 2018-arXiv: Learning

TL;DR: A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.

Abstract: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.

1,088 citations

Journal Article•DOI•

JuMP: A Modeling Language for Mathematical Optimization

Iain Dunning, Joey Huchette, Miles Lubin

05 May 2017-SIAM Review

TL;DR: JuMP is an open-source modeling language that allows users to express a wide range of optimization problems (linear, mixed-integer, quadratic, conic-quadratic, semidefinite, and nonlinear) in a high-level, algebraic syntax.

Abstract: JuMP is an open-source modeling language that allows users to express a wide range of optimization problems (linear, mixed-integer, quadratic, conic-quadratic, semidefinite, and nonlinear) in a high-level, algebraic syntax. JuMP takes advantage of advanced features of the Julia programming language to offer unique functionality while achieving performance on par with commercial modeling tools for standard tasks. In this work we will provide benchmarks, present the novel aspects of the implementation, and discuss how JuMP can be extended to new problem classes and composed with state-of-the-art tools for visualization and interactivity.

1,056 citations

Journal Article•DOI•

JuMP: A Modeling Language for Mathematical Optimization

Iain Dunning, Joey Huchette, Miles Lubin

09 Aug 2015-arXiv: Optimization and Control

907 citations

Journal Article•DOI•

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning.

Max Jaderberg, Wojciech Marian Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio García Castañeda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

03 Jul 2018-arXiv: Learning

TL;DR: In this article, the authors demonstrate that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input.

Abstract: Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents are trained concurrently from thousands of parallel matches with agents playing in teams together and against each other on randomly generated environments. Each agent in the population learns its own internal reward signal to complement the sparse delayed reward from winning, and selects actions using a novel temporally hierarchical representation that enables the agent to reason at multiple timescales. During game-play, these agents display human-like behaviours such as navigating, following, and defending based on a rich learned representation that is shown to encode high-level game knowledge. In an extensive tournament-style evaluation the trained agents exceeded the win-rate of strong human players both as teammates and opponents, and proved far stronger than existing state-of-the-art agents. These results demonstrate a significant jump in the capabilities of artificial agents, bringing us closer to the goal of human-level intelligence.

427 citations

Journal Article•DOI•

Computing in Operations Research using Julia

Miles Lubin¹, Iain Dunning¹•Institutions (1)

Massachusetts Institute of Technology¹

16 Mar 2015-INFORMS Journal on Computing

TL;DR: This paper explores how Julia, a modern programming language for numerical computing that claims to bridge the divide between efficient low-level languages and expressive high-level languages by incorporating recent advances in language and compiler design, can be used for implementing software and algorithms fundamental to the field of operations research, with a focus on mathematical optimization.

Abstract: The state of numerical computing is currently characterized by a divide between highly efficient yet typically cumbersome low-level languages such as C, C++, and Fortran and highly expressive yet typically slow high-level languages such as Python and MATLAB. This paper explores how Julia, a modern programming language for numerical computing that claims to bridge this divide by incorporating recent advances in language and compiler design (such as just-in-time compilation), can be used for implementing software and algorithms fundamental to the field of operations research, with a focus on mathematical optimization. In particular, we demonstrate algebraic modeling for linear and nonlinear optimization and a partial implementation of a practical simplex code. Extensive cross-language benchmarks suggest that Julia is capable of obtaining state-of-the-art performance. Data, as supplemental material, are available at http://dx.doi.org/10.1287/ijoc.2014.0623.

340 citations

Cited by

Posted Content•

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord¹, Yazhe Li¹, Oriol Vinyals¹•Institutions (1)

Google¹

10 Jul 2018-arXiv: Learning

TL;DR: This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.

Abstract: While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
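The contrastive loss with negative sampling described in this abstract can be illustrated with a minimal sketch. It assumes the similarity scores for one positive sample and a set of negative samples are already given as plain numbers; the function name `info_nce_loss` and its arguments are hypothetical, and the sketch does not reproduce the paper's encoders or autoregressive model.

```python
import math


def info_nce_loss(positive_score, negative_scores):
    """Contrastive loss for one positive score against sampled negatives.

    Computes -log(exp(pos) / sum(exp(all scores))), i.e. the cross-entropy
    of picking the positive sample out of the candidate set.
    Assumption: scores are unnormalised log-similarities supplied by the caller.
    """
    scores = [positive_score] + list(negative_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    max_score = max(scores)
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - positive_score


if __name__ == "__main__":
    # A positive that scores well above the negatives gives a small loss.
    print(info_nce_loss(5.0, [0.0, -1.0, 0.5]))
    # A positive that is indistinguishable from the negatives gives log(N).
    print(info_nce_loss(0.0, [0.0, 0.0, 0.0]))
```

The loss shrinks as the positive score rises relative to the negatives, which is the sense in which minimising it pushes the representation to keep information useful for identifying the true future sample.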

5,444 citations

Journal Article•DOI•

Julia: A Fresh Approach to Numerical Computing

Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral B. Shah

07 Feb 2017-SIAM Review

TL;DR: The Julia programming language combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. It is designed to be easy and fast, and it questions notions generally held to be "laws of nature" by practitioners of numerical computing.

Abstract: Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast and questions notions generally held to be "laws of nature" by practitioners of numerical computing: (1) High-level dynamic programs have to be slow. (2) One must prototype in one language and then rewrite in another language for speed or deployment. (3) There are parts of a system appropriate for the programmer, and other parts that are best left untouched as they have been built by the experts. We introduce the Julia programming language and its design: a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, which is what good computation is really about, recognizes what remains the same after dif...

3,348 citations

Journal Article•DOI•

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Oriol Vinyals, Igor Babuschkin, Wojciech Marian Czarnecki, Michael Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard E. Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom Le Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy P. Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver

30 Oct 2019-Nature

TL;DR: The agent, AlphaStar, is evaluated, which uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.

Abstract: Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions, the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.

2,595 citations

Journal Article•DOI•

CasADi: a software framework for nonlinear optimization and optimal control

Joel Andersson¹, Joris Gillis², Greg Horn, James B. Rawlings¹, Moritz Diehl³•Institutions (3)

University of Wisconsin-Madison¹, Katholieke Universiteit Leuven², University of Freiburg³

20 Mar 2019-Mathematical Programming Computation

TL;DR: This article gives an up-to-date and accessible introduction to the CasADi framework, which has undergone numerous design improvements over the last 7 years.

Abstract: We present CasADi, an open-source software framework for numerical optimization. CasADi is a general-purpose tool that can be used to model and solve optimization problems with a large degree of flexibility, larger than what is associated with popular algebraic modeling languages such as AMPL, GAMS, JuMP or Pyomo. Of special interest are problems constrained by differential equations, i.e. optimal control problems. CasADi is written in self-contained C++, but is most conveniently used via full-featured interfaces to Python, MATLAB or Octave. Since its inception in late 2009, it has been used successfully for academic teaching as well as in applications from multiple fields, including process control, robotics and aerospace. This article gives an up-to-date and accessible introduction to the CasADi framework, which has undergone numerous design improvements over the last 7 years.

2,056 citations

Posted Content•

Addressing Function Approximation Error in Actor-Critic Methods

Scott Fujimoto¹, Herke van Hoof², David Meger¹•Institutions (2)

McGill University¹, University of Amsterdam²

26 Feb 2018-arXiv: Artificial Intelligence

TL;DR: This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.

Abstract: In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
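The idea of taking the minimum value between a pair of critics can be sketched as a one-step bootstrapped target. This is an illustrative sketch only: the function name `clipped_double_q_target`, its arguments, and the default discount factor are hypothetical, and the critic estimates are passed in as plain numbers rather than produced by networks.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """One-step TD target that bootstraps from the smaller of two critic estimates.

    reward:  immediate reward for the transition
    done:    True if the next state is terminal (no bootstrapping)
    q1_next: first critic's value estimate at the next state-action pair
    q2_next: second critic's value estimate at the next state-action pair
    gamma:   discount factor (assumed default, not taken from the source)
    """
    # Using the minimum limits the upward bias that a single critic can introduce.
    min_q = min(q1_next, q2_next)
    bootstrap = 0.0 if done else gamma * min_q
    return reward + bootstrap


if __name__ == "__main__":
    # The critics disagree (5.0 vs 3.0); the target uses the smaller estimate.
    print(clipped_double_q_target(1.0, False, 5.0, 3.0, gamma=0.5))
    # On a terminal transition the target is just the reward.
    print(clipped_double_q_target(1.0, True, 5.0, 3.0, gamma=0.5))
```

Both critics would be regressed toward this shared target, so an overestimate in one critic is not propagated unless the other critic agrees with it.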

1,968 citations
