Book Chapter

The Games Computers (and People) Play

30 Jul 2000, Vol. 52, pp. 1179
TL;DR: The past, present, and future of the development of game-playing programs are discussed, along with some surprising changes of direction that will make games more of an experimental testbed for mainstream AI research, with less emphasis on building world-championship-caliber programs.
Abstract: The development of high-performance game-playing programs has been one of the major successes of artificial intelligence research. The results have been outstanding but, with one notable exception (Deep Blue), they have not been widely disseminated. This talk will discuss the past, present, and future of the development of game-playing programs. Case studies for backgammon, bridge, checkers, chess, go, hex, Othello, poker, and Scrabble will be used. The research emphasis of the past has been on high performance (synonymous with brute-force search) for two-player perfect-information games. The research emphasis of the present encompasses multi-player imperfect-information and nondeterministic games. And what of the future? There are some surprising changes of direction occurring that will result in games being more of an experimental testbed for mainstream AI research, with less emphasis on building world-championship-caliber programs.

One of the most profound contributions to mankind's knowledge has been made by the artificial intelligence (AI) research community: the realization that intelligence is not uniquely human. Using computers, it is possible to achieve human-like behavior in nonhumans. In other words, the illusion of human intelligence can be created in a computer. This idea has been vividly illustrated throughout the history of computer-games research. Unlike most of the early work in AI, games researchers were interested in developing high-performance, real-time solutions to challenging problems. This led to an ends-justify-the-means attitude: the result, a strong chess program, was all that mattered, not the means by which it was achieved. In contrast, much of the mainstream AI work used simplified domains, while eschewing real-time performance objectives. This research typically used human intelligence as a model: one only had to emulate the human example to achieve intelligent behavior. The battle (and philosophical) lines were drawn.

The difference in philosophy can be easily illustrated. The human brain and the computer are different machines, each with its own set of strengths and weaknesses. Humans are good at, for example, learning, reasoning by analogy, and
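
The "brute-force search" the abstract associates with two-player perfect-information games is, at its core, depth-limited minimax with alpha-beta pruning. The sketch below is a generic illustration rather than anything taken from the talk; `evaluate`, `legal_moves`, and `apply_move` are hypothetical placeholders for a game-specific evaluation function and move generator.

```python
# Generic sketch of depth-limited alpha-beta search, the "brute-force"
# technique referred to for two-player perfect-information games.
# `evaluate`, `legal_moves`, and `apply_move` are hypothetical callables
# that a real program would supply for its particular game.

def alpha_beta(state, depth, alpha, beta, maximizing,
               evaluate, legal_moves, apply_move):
    """Return a heuristic minimax value of `state`, searching `depth` plies."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)              # heuristic leaf evaluation
    if maximizing:
        value = float("-inf")
        for move in moves:
            value = max(value, alpha_beta(apply_move(state, move), depth - 1,
                                          alpha, beta, False,
                                          evaluate, legal_moves, apply_move))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cutoff: opponent avoids this line
                break
        return value
    else:
        value = float("inf")
        for move in moves:
            value = min(value, alpha_beta(apply_move(state, move), depth - 1,
                                          alpha, beta, True,
                                          evaluate, legal_moves, apply_move))
            beta = min(beta, value)
            if beta <= alpha:               # alpha cutoff
                break
        return value
```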


Citations
Journal Article
28 Jan 2016, Nature
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Abstract: The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

14,377 citations
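
The abstract above describes combining Monte Carlo tree search with policy and value networks. The fragment below is only a rough sketch of a PUCT-style child-selection rule in the spirit of AlphaGo-like search; the node fields, the constant `c_puct`, and the way priors are obtained are illustrative assumptions, not the published implementation.

```python
import math

# Rough sketch of PUCT-style child selection in an AlphaGo-like MCTS.
# Node fields and the constant c_puct are illustrative assumptions,
# not the published AlphaGo implementation.

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a): move probability from the policy network
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a): accumulated value/rollout results
        self.children = {}        # action -> Node

    def q(self):
        # Mean action value Q(s, a)
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the child maximizing Q(s,a) + U(s,a), where U is an exploration
    bonus proportional to the policy prior and shrinking with visit count."""
    total = sum(child.visits for child in node.children.values())
    best_action, best_child, best_score = None, None, float("-inf")
    for action, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        score = child.q() + u
        if score > best_score:
            best_action, best_child, best_score = action, child, score
    return best_action, best_child
```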

Journal Article
TL;DR: The Monte-Carlo revolution in computer Go is surveyed, the key ideas that led to the success of MoGo and subsequent Go programs are outlined, and for the first time a comprehensive description, in theory and in practice, of this extended framework for Monte-Carlo tree search is provided.

375 citations

Proceedings Article
04 Aug 2001
TL;DR: This paper introduces two properties, rationality and convergence, as desirable for a learning agent in the presence of other learning agents, and contributes a new learning algorithm, WoLF policy hill-climbing, which is proven to be rational and shown empirically to converge.
Abstract: This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hill-climbing, that is based on a simple principle: “learn quickly while losing, slowly while winning.” The algorithm is proven to be rational and we present empirical results for a number of stochastic games showing the algorithm converges.

333 citations
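
The "learn quickly while losing, slowly while winning" principle can be made concrete as a variable learning rate on the policy update. The sketch below is a simplified, assumed rendering of the WoLF idea for a single state with tabular values; the rate constants and the renormalization step are placeholders rather than the paper's exact algorithm.

```python
# Simplified sketch of the WoLF ("Win or Learn Fast") idea behind
# WoLF policy hill-climbing, for a single state with tabular values.
# delta_win/delta_lose and the update rule are illustrative assumptions,
# not a verbatim transcription of the paper's algorithm.

def wolf_policy_step(q, policy, avg_policy, delta_win=0.01, delta_lose=0.04):
    """Move `policy` a small step toward the greedy action w.r.t. `q`,
    stepping faster when the current policy is 'losing' (doing worse
    than the historical average policy `avg_policy`)."""
    actions = list(q.keys())
    expected_current = sum(policy[a] * q[a] for a in actions)
    expected_average = sum(avg_policy[a] * q[a] for a in actions)

    # Learn fast while losing, slowly while winning.
    delta = delta_win if expected_current > expected_average else delta_lose

    best = max(actions, key=lambda a: q[a])
    for a in actions:
        target = 1.0 if a == best else 0.0
        policy[a] += delta * (target - policy[a])

    # Renormalize so the policy remains a probability distribution.
    total = sum(policy.values())
    for a in actions:
        policy[a] /= total
    return policy
```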

Proceedings Article
04 Aug 2001
TL;DR: Software bidding agents consistently obtain significantly larger gains from trade than their human counterparts, and trading persists far from equilibrium, in sharp contrast to the robust convergence observed in previous all-human or all-agent CDA experiments.
Abstract: The Continuous Double Auction (CDA) is the dominant market institution for real-world trading of equities, commodities, derivatives, etc. We describe a series of laboratory experiments that, for the first time, allow human subjects to interact with software bidding agents in a CDA. Our bidding agents use strategies based on extensions of the Gjerstad-Dickhaut and Zero-Intelligence-Plus algorithms. We find that agents consistently obtain significantly larger gains from trade than their human counterparts. This was unexpected because both humans and agents have approached theoretically perfect efficiency in prior all-human or all-agent CDA experiments. Another unexpected finding is persistent far-from-equilibrium trading, in sharp contrast to the robust convergence observed in previous all-human or all-agent experiments. We consider possible explanations for our empirical findings, and speculate on the implications for future agent-human interactions in electronic markets.

270 citations
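
The Zero-Intelligence-Plus strategy mentioned in the abstract adjusts a trader's profit margin with a simple delta rule. The sketch below is a heavily simplified, assumed version for a seller; the learning rate `beta` and the target-price handling are placeholders, not the exact rule used in the cited experiments.

```python
# Heavily simplified, assumed sketch of a ZIP-style (Zero-Intelligence-Plus)
# profit-margin update for a seller. The learning rate `beta` and the choice
# of target price are illustrative placeholders, not the exact algorithm
# used in the cited experiments.

class ZipSeller:
    def __init__(self, limit_price, margin=0.2, beta=0.3):
        self.limit = limit_price   # lowest acceptable sale price
        self.margin = margin       # profit margin on top of the limit price
        self.beta = beta           # learning rate of the delta rule

    def quote(self):
        # Ask price: limit price marked up by the current margin.
        return self.limit * (1.0 + self.margin)

    def observe_trade(self, trade_price):
        """Nudge the quoted price toward the last observed trade price
        (Widrow-Hoff style), never dropping below the limit price."""
        target = max(trade_price, self.limit)
        new_quote = self.quote() + self.beta * (target - self.quote())
        self.margin = new_quote / self.limit - 1.0
```

For example, a seller created as `ZipSeller(limit_price=100.0)` initially asks 120.0 and, after `observe_trade(105.0)`, lowers its ask part of the way toward the observed price.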


Journal Article
TL;DR: It is concluded that, in the domain of two-person zero-sum games with perfect information, decision complexity is more important than state-space complexity as a determining factor for which games can be solved, and that there is a trade-off between knowledge-based and brute-force methods.

258 citations

References
Book
01 Jan 1998
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations
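
As a concrete illustration of the tabular solution methods the book covers, below is a minimal one-step Q-learning loop. The environment interface (`reset`/`step` returning state, reward, done, plus a `num_actions` attribute) and the parameter values are assumptions in the style of common RL toolkits, not code from the book.

```python
import random
from collections import defaultdict

# Minimal tabular one-step Q-learning, illustrating the kind of solution
# method covered in the book. The env interface (reset/step returning
# (state, reward, done)) and the parameter values are assumptions.

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)                     # (state, action) -> value estimate
    actions = list(range(env.num_actions))     # assumed discrete action set

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(action)

            # One-step temporal-difference backup toward the greedy target.
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward + (0.0 if done else gamma * best_next)
            q[(state, action)] += alpha * (target - q[(state, action)])

            state = next_state
    return q
```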

Journal Article
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – proves their convergence and optimality for special cases, and relates them to supervised-learning methods.
Abstract: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.

4,803 citations
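
The core idea of the article, assigning credit from the difference between temporally successive predictions, is captured by the TD(0) value update. Below is a minimal sketch under the assumption of a tabular value function and an episode given as a list of (state, reward, next_state) transitions; the function name and defaults are illustrative.

```python
# Minimal sketch of the TD(0) prediction update described in the abstract:
# credit is assigned from the difference between temporally successive
# predictions rather than from the final outcome. A tabular value function
# and a pre-collected list of transitions are assumed for illustration.

def td0_update(values, transitions, alpha=0.1, gamma=0.95):
    """values: dict mapping state -> predicted return.
    transitions: iterable of (state, reward, next_state); next_state may be
    None at the end of an episode."""
    for state, reward, next_state in transitions:
        next_value = 0.0 if next_state is None else values.get(next_state, 0.0)
        td_error = reward + gamma * next_value - values.get(state, 0.0)
        values[state] = values.get(state, 0.0) + alpha * td_error
    return values
```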

Journal Article
TL;DR: In this article, two machine learning procedures have been investigated in some detail using the game of checkers, and enough work has been done to verify the fact that a computer can be programmed so that it will lear...
Abstract: Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will lear...

2,845 citations

Journal Article
TL;DR: This heuristic depth-first iterative-deepening algorithm is the only known algorithm that is capable of finding optimal solutions to randomly generated instances of the Fifteen Puzzle within practical resource limits.

1,698 citations
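
The depth-first iterative-deepening idea behind that result (IDA*) re-searches the tree with an increasing cost threshold instead of keeping a frontier in memory. The sketch below is a generic version, not the cited paper's code; `neighbors` and `heuristic` are hypothetical stand-ins for a puzzle-specific implementation such as the Fifteen Puzzle with the Manhattan-distance heuristic.

```python
# Generic sketch of iterative-deepening A* (IDA*): repeated depth-first
# searches bounded by an increasing f = g + h threshold, so memory stays
# linear in the solution depth. `neighbors(state)` yielding (next_state, cost)
# pairs and `heuristic(state)` are hypothetical, problem-specific callables.

def ida_star(start, is_goal, neighbors, heuristic):
    def search(path, g, bound):
        state = path[-1]
        f = g + heuristic(state)
        if f > bound:
            return f, None                     # threshold exceeded: report new bound
        if is_goal(state):
            return f, list(path)               # solution found within this bound
        minimum = float("inf")
        for nxt, cost in neighbors(state):
            if nxt in path:                    # avoid trivial cycles on the current path
                continue
            path.append(nxt)
            t, found = search(path, g + cost, bound)
            path.pop()
            if found is not None:
                return t, found
            minimum = min(minimum, t)
        return minimum, None

    bound = heuristic(start)
    path = [start]
    while True:
        t, found = search(path, 0, bound)
        if found is not None:
            return found                       # list of states from start to goal
        if t == float("inf"):
            return None                        # no solution exists
        bound = t                              # raise threshold to smallest overflow
```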

Journal Article
Gerald Tesauro
TL;DR: The domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning.
Abstract: Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10] the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of tremendous complexity and sophistication required to play at expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated in that it is easy to simulate the board, the rules of legal play, and the rules regarding when the game is over and determining the outcome.

1,515 citations