Author

John N. Tsitsiklis

Bio: John N. Tsitsiklis is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research in topics including queueing theory and Markov decision processes. The author has an h-index of 90, has co-authored 412 publications, and has received 49,845 citations. Previous affiliations of John N. Tsitsiklis include Stanford University and SiRF Technology Holdings, Inc.


Papers
Book
01 Jan 1989
TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
Abstract: The book is aimed at readers in engineering, computer science, operations research, and applied mathematics. It is essentially a self-contained work, with the development of the material occurring in the main body of the text and excellent appendices on linear algebra and analysis, graph theory, duality theory, and probability theory and Markov chains supporting it. The introduction discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as reference algorithms for many of the computational approaches addressed later. After the introduction, the text is organized in two parts: synchronous algorithms and asynchronous algorithms. The discussion of synchronous algorithms comprises four chapters, with Chapter 2 presenting both direct methods (converging to the exact solution within a finite number of steps) and iterative methods for systems of linear equations.
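As a rough illustration of the two reference iterations named above, here is a minimal sketch, not taken from the book, contrasting Jacobi and Gauss-Seidel sweeps on a small diagonally dominant system; the matrix, iteration counts, and function names are illustrative assumptions.

```python
# Minimal sketch contrasting Jacobi and Gauss-Seidel iterations for A x = b.
import numpy as np

def jacobi(A, b, x0, iters=50):
    """Jacobi: every component is updated from the *previous* iterate,
    so all component updates can run in parallel."""
    D = np.diag(A)
    R = A - np.diagflat(D)
    x = x0.copy()
    for _ in range(iters):
        x = (b - R @ x) / D
    return x

def gauss_seidel(A, b, x0, iters=50):
    """Gauss-Seidel: each component update immediately uses the freshest
    values of the components updated earlier in the same sweep."""
    n = len(b)
    x = x0.copy()
    for _ in range(iters):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

if __name__ == "__main__":
    # Diagonally dominant system, so both iterations converge.
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 4.0, 1.0],
                  [0.0, 1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])
    x0 = np.zeros(3)
    print("Jacobi:      ", jacobi(A, b, x0))
    print("Gauss-Seidel:", gauss_seidel(A, b, x0))
    print("Direct solve:", np.linalg.solve(A, b))
```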

5,597 citations

Book
01 Jan 1996
TL;DR: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
Abstract: From the Publisher: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control

3,665 citations

Book
01 Jan 1997
TL;DR: A list of errata for the book, giving page- and line-level corrections.
Abstract: Errata:
p. 27, l. −11: replace “Schwartz” by “Schwarz”
p. 69, l. −13: “ai∗x = bi” should be “ai∗x = bi∗”
p. 126, l. 16: replace “inequality constraints” by “linear inequality constraints”
p. 153, l. −8: replace aix ≠ bi by aix ≠ bi
p. 163, Example 4.9, first line: replace “from” with “form”
p. 165, l. 11: replace p′Ax ≥ 0 by p′Ax ≥ 0
p. 175, l. 1: replace “To this see” by “To see this”
p. 203, l. 12: replace x ≥ 0 by x ≥ 0, xn+1 ≥ 0
p. 216, l. −6: replace “≤ c}” by “≤ c′}”
p. 216, l. −3: replace c′ by (c1)′
p. 216, l. −2: replace c′ by (c2)′
p. 216, l. −1: right-hand side should be λ(c1)′ + (1 − λ)(c2)′
p. 220, l. −12: replace “added to the pivot row” by “added to the zeroth row”

2,780 citations

Book
01 Jan 2002
TL;DR: This thesis proposes and studies actor-critic algorithms which combine the above two approaches with simulation to find the best policy among a parameterized class of policies, and proves convergence of the algorithms for problems with general state and decision spaces.
Abstract: Many complex decision making problems, such as scheduling in manufacturing systems, portfolio management in finance, and admission control in communication networks, with clear and precise objectives, can be formulated as stochastic dynamic programming problems in which the objective of decision making is to maximize a single “overall” reward. In these formulations, finding an optimal decision policy involves computing a certain “value function” which assigns to each state the optimal reward one would obtain if the system were started from that state. This function then naturally prescribes the optimal policy, which is to take decisions that drive the system to states with maximum value. For many practical problems, the computation of the exact value function is intractable, analytically and numerically, due to the enormous size of the state space. Therefore one has to resort to one of the following approximation methods to find a good sub-optimal policy: (1) approximate the value function; (2) restrict the search for a good policy to a smaller family of policies. In this thesis, we propose and study actor-critic algorithms which combine the above two approaches with simulation to find the best policy among a parameterized class of policies. Actor-critic algorithms have two learning units: an actor and a critic. An actor is a decision maker with a tunable parameter. A critic is a function approximator. The critic tries to approximate the value function of the policy used by the actor, and the actor in turn tries to improve its policy based on the current approximation provided by the critic. Furthermore, the critic evolves on a faster time-scale than the actor. We propose several variants of actor-critic algorithms. In all the variants, the critic uses Temporal Difference (TD) learning with linear function approximation. Some of the variants are inspired by a new geometric interpretation of the formula for the gradient of the overall reward with respect to the actor parameters. This interpretation suggests a natural set of basis functions for the critic, determined by the family of policies parameterized by the actor's parameters. We concentrate on the average expected reward criterion, but we also show how the algorithms can be modified for other objective criteria. We prove convergence of the algorithms for problems with general (finite, countable, or continuous) state and decision spaces. To compute the rate of convergence (ROC) of our algorithms, we develop a general theory of the ROC of two-time-scale algorithms and we apply it to study our algorithms. In the process, we study the ROC of TD learning and compare it with related methods such as Least Squares TD (LSTD). We study the effect of the basis functions used for linear function approximation on the ROC of TD. We also show that the ROC of actor-critic algorithms does not depend on the actual basis functions used in the critic but depends only on the subspace spanned by them, and we study this dependence. Finally, we compare the performance of our algorithms with other algorithms that optimize over a parameterized family of policies. We show that when only the “natural” basis functions are used for the critic, the rate of convergence of the actor-critic algorithms is the same as that of certain stochastic gradient descent algorithms. However, with appropriate additional basis functions for the critic, we show that our algorithms outperform the existing ones in terms of ROC.
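The following is a minimal, simplified sketch in the spirit of the actor-critic scheme described above: a linear TD(0) critic updated with a larger step size (the faster time-scale) and a softmax policy-gradient actor updated more slowly. The toy two-state MDP, the step sizes, and the use of a discounted rather than average-reward objective are illustrative assumptions, not the algorithms analyzed in the thesis.

```python
# Simplified actor-critic sketch: fast linear TD(0) critic, slow softmax actor.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9

# Toy MDP: P[s, a] is the next-state distribution, R[s, a] the mean reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

theta = np.zeros((n_states, n_actions))  # actor parameters (softmax logits)
w = np.zeros(n_states)                   # critic weights (one-hot features)

def policy(s):
    logits = theta[s] - theta[s].max()
    p = np.exp(logits)
    return p / p.sum()

s = 0
for t in range(20000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) with linear (here one-hot) features, faster step size.
    delta = r + gamma * w[s_next] - w[s]
    w[s] += 0.05 * delta

    # Actor: policy-gradient step using the TD error as the advantage
    # signal, with a slower step size than the critic.
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += 0.005 * delta * grad_log

    s = s_next

print("critic values:", w)
print("greedy actions:", [int(np.argmax(policy(s))) for s in range(n_states)])
```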

1,766 citations

Journal ArticleDOI
TL;DR: A model for asynchronous distributed computation is presented and it is shown that natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms retain the desirable convergence properties of their centralized counterparts.
Abstract: We present a model for asynchronous distributed computation and then proceed to analyze the convergence of natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms. We show that such algorithms retain the desirable convergence properties of their centralized counterparts, provided that the time between consecutive interprocessor communications and the communication delays are not too large.
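To make the flavor of the result concrete, here is a small simulation sketch, not the paper's model, of gradient descent on a convex quadratic in which each update is computed from a stale copy of the iterate subject to a bounded delay; the objective, delay bound, and step size are illustrative assumptions.

```python
# Gradient descent with bounded staleness: updates use outdated iterates.
import numpy as np

rng = np.random.default_rng(1)

# Quadratic objective f(x) = 0.5 * x' A x - b' x, with gradient A x - b.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)

x = np.zeros(2)
step = 0.05
max_delay = 5                      # bound on the communication delay
history = [x.copy()]

for t in range(500):
    # A worker reads a stale iterate from up to `max_delay` steps ago.
    delay = rng.integers(0, min(max_delay, len(history)))
    x_stale = history[-(delay + 1)]
    grad = A @ x_stale - b         # gradient evaluated at the stale copy
    x = x - step * grad            # update applied to the current iterate
    history.append(x.copy())

print("async iterate:", x)
print("true minimum :", x_star)
```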

1,761 citations


Cited by
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
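As a small concrete instance of the temporal-difference methods covered in Part II, here is a sketch of tabular TD(0) policy evaluation on a short random walk; the chain, step size, and episode count are illustrative assumptions rather than an example reproduced from the book.

```python
# Tabular TD(0) policy evaluation on a 5-state random walk.
import numpy as np

rng = np.random.default_rng(2)
n_states = 5            # non-terminal states of the random walk
alpha, gamma = 0.1, 1.0
V = np.zeros(n_states)  # value estimates under the equiprobable policy

for episode in range(5000):
    s = n_states // 2                 # start in the middle
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:                # left terminal, reward 0
            V[s] += alpha * (0.0 - V[s])
            break
        if s_next >= n_states:        # right terminal, reward 1
            V[s] += alpha * (1.0 - V[s])
            break
        V[s] += alpha * (gamma * V[s_next] - V[s])   # TD(0) update, reward 0
        s = s_next

# True values for this walk are (s + 1) / (n_states + 1).
print("TD(0) estimates:", np.round(V, 3))
print("true values    :", np.round((np.arange(n_states) + 1) / (n_states + 1), 3))
```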

37,989 citations

Journal ArticleDOI
26 Feb 2015-Nature
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Abstract: The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
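The following schematic sketch shows the core update pattern described above: sample a minibatch from a replay buffer and regress Q(s, a) toward the target r + γ max_a′ Q_target(s′, a′), with a periodically refreshed target network. A linear Q-function over random features stands in for the deep convolutional network, and the fake environment, sizes, and step sizes are illustrative assumptions, not the Nature agent.

```python
# Schematic Q-learning sketch with replay buffer and target network.
import numpy as np

rng = np.random.default_rng(3)
n_features, n_actions, gamma = 8, 3, 0.99

W = rng.normal(scale=0.1, size=(n_actions, n_features))   # online "network"
W_target = W.copy()                                        # target network
replay = []

def q_values(weights, s):
    return weights @ s

for step in range(2000):
    # Fake environment interaction: random features and an arbitrary reward.
    s = rng.normal(size=n_features)
    a = int(np.argmax(q_values(W, s))) if rng.random() > 0.1 else int(rng.integers(n_actions))
    r = float(s[a])
    s_next = rng.normal(size=n_features)
    replay.append((s, a, r, s_next))
    replay = replay[-1000:]                 # bounded replay buffer

    # Minibatch update toward the target-network bootstrap.
    batch = [replay[i] for i in rng.integers(len(replay), size=32)]
    for s_b, a_b, r_b, s_n in batch:
        target = r_b + gamma * np.max(q_values(W_target, s_n))
        td_error = target - q_values(W, s_b)[a_b]
        W[a_b] += 0.001 * td_error * s_b    # gradient step on the squared error

    if step % 100 == 0:
        W_target = W.copy()                 # refresh the target network

print("sample Q-values:", np.round(q_values(W, rng.normal(size=n_features)), 3))
```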

23,074 citations

Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
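As one concrete instance of the splitting described above, here is a minimal ADMM sketch for the lasso, one of the applications surveyed in the review: minimize 0.5·||Ax − b||² + λ||x||₁ via the constraint x − z = 0, alternating a quadratic x-update, a soft-thresholding z-update, and a scaled dual update. The problem sizes, ρ, and iteration count are illustrative assumptions.

```python
# ADMM for the lasso with the splitting x - z = 0.
import numpy as np

rng = np.random.default_rng(4)
m, n, lam, rho = 50, 20, 0.5, 1.0

A = rng.normal(size=(m, n))
x_true = np.zeros(n); x_true[:3] = [2.0, -1.5, 1.0]   # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=m)

def soft_threshold(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
AtA_rhoI_inv = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached once
Atb = A.T @ b

for _ in range(200):
    x = AtA_rhoI_inv @ (Atb + rho * (z - u))   # quadratic x-update
    z = soft_threshold(x + u, lam / rho)        # prox of the l1 term
    u = u + x - z                               # scaled dual update

print("nonzeros recovered:", np.nonzero(np.round(z, 2))[0])
print("leading estimates :", np.round(z[:5], 3))
```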

17,433 citations

Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, as well as indirect search for short programs encoding deep and large networks.

14,635 citations

Book
01 Apr 2003
TL;DR: This book surveys iterative methods for sparse linear systems, ranging from basic iterations and Krylov subspace methods to methods based on the normal equations, preconditioning techniques, and parallel, multigrid, and domain decomposition approaches.
Abstract: Preface 1. Background in linear algebra 2. Discretization of partial differential equations 3. Sparse matrices 4. Basic iterative methods 5. Projection methods 6. Krylov subspace methods Part I 7. Krylov subspace methods Part II 8. Methods related to the normal equations 9. Preconditioned iterations 10. Preconditioning techniques 11. Parallel implementations 12. Parallel preconditioners 13. Multigrid methods 14. Domain decomposition methods Bibliography Index.
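As a small illustration of the "methods related to the normal equations" covered in the book, here is a sketch of CGNR, conjugate gradient applied to AᵀAx = Aᵀb without forming AᵀA explicitly; the test matrix, tolerance, and iteration cap are illustrative assumptions, not code from the book.

```python
# CGNR: conjugate gradient on the normal equations, applied implicitly.
import numpy as np

def cgnr(A, b, iters=100, tol=1e-10):
    m, n = A.shape
    x = np.zeros(n)
    r = b - A @ x              # residual of the original system
    z = A.T @ r                # residual of the normal equations
    p = z.copy()
    for _ in range(iters):
        w = A @ p
        alpha = (z @ z) / (w @ w)
        x += alpha * p
        r -= alpha * w
        z_new = A.T @ r
        if np.linalg.norm(z_new) < tol:
            break
        beta = (z_new @ z_new) / (z @ z)
        p = z_new + beta * p
        z = z_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    A = rng.normal(size=(30, 10))
    b = rng.normal(size=30)
    x = cgnr(A, b)
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
    print("max difference vs. lstsq:", np.max(np.abs(x - x_ls)))
```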

13,484 citations