
Showing papers presented at "Advances in Computer Games in 2015"


Book ChapterDOI
01 Jul 2015
TL;DR: Adaptive playout policies that improve the playout policy during tree search are explored; the resulting program was able to deal with difficult test cases known to pose a problem for Monte-Carlo Tree Search.
Abstract: Monte-Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy misevaluates a position, there are cases where the tree search has difficulty finding the correct move owing to the large search space. This paper explores adaptive playout policies that improve the playout policy during a tree search. With the help of policy-gradient reinforcement learning techniques we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to deal with difficult test cases which are known to pose a problem for Monte-Carlo Tree Search.

16 citations
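The policy-gradient idea in the abstract above can be sketched in miniature. This is a hedged illustration, not the paper's implementation: moves carry a single table weight instead of Go pattern features, and the function names and learning rate are invented for the example.

```python
import math

def softmax_probs(weights, moves):
    """Softmax playout policy over the weights of the legal moves."""
    m = max(weights[a] for a in moves)
    exps = {a: math.exp(weights[a] - m) for a in moves}
    z = sum(exps.values())
    return {a: exps[a] / z for a in moves}

def policy_gradient_update(weights, played, moves, reward, alpha=0.1):
    """REINFORCE-style update: for a softmax policy with one-hot move
    features, grad log pi(played) at move a is 1[a == played] - pi(a)."""
    probs = softmax_probs(weights, moves)
    for a in moves:
        grad = (1.0 if a == played else 0.0) - probs[a]
        weights[a] += alpha * reward * grad
```

After each playout, every move played would be credited with the playout's reward, nudging the policy toward moves that led to wins.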


Book ChapterDOI
01 Jul 2015
TL;DR: The resulting algorithm, named Playout Policy Adaptation (PPA), is better than UCT with a uniform random playout policy for most games tested, with the notable exceptions of Go and Nogo.
Abstract: Monte-Carlo Tree Search (MCTS) is the state of the art algorithm for General Game Playing (GGP). We propose to learn a playout policy online so as to improve MCTS for GGP. We test the resulting algorithm named Playout Policy Adaptation (PPA) on Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Go, Knightthrough, Misere Knightthrough, Nogo and Misere Nogo. For most of these games, PPA is better than UCT with a uniform random playout policy, with the notable exceptions of Go and Nogo.

10 citations
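The online adaptation in PPA can be sketched as follows. This is a hedged reading of the abstract, not the paper's exact code: the weight table, `alpha`, and the snapshot bookkeeping are assumptions; only the winner's moves reinforce the policy.

```python
import math

def ppa_adapt(weights, winner_moves, alpha=0.32):
    """Sketch of a PPA-style update. Each (played, legal) pair comes from
    one state the winner moved in: the played move's weight goes up by
    alpha, and every legal move is pulled down in proportion to its
    current softmax probability, shifting the policy toward the
    winner's choices."""
    for played, legal in winner_moves:
        snap = {m: weights.get(m, 0.0) for m in legal}  # pre-update policy
        z = sum(math.exp(v) for v in snap.values())
        weights[played] = snap[played] + alpha
        for m in legal:
            weights[m] = weights.get(m, 0.0) - alpha * math.exp(snap[m]) / z
```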


Book ChapterDOI
01 Jul 2015
TL;DR: A hybrid technique is possible that has features of both mini-max and MCTS; it works by stopping the random MCTS playouts early and using an evaluation function to determine the winner of the playout.
Abstract: Many researchers view mini-max and MCTS-based searches as competing and incompatible approaches. For example, it is generally agreed that chess and checkers require a mini-max approach while Go and Havannah require MCTS. However, a hybrid technique is possible that has features of both mini-max and MCTS. It works by stopping the random MCTS playouts early and using an evaluation function to determine the winner of the playout. We call this algorithm MCTS-EPT (MCTS with early playout termination) and study it using MCTS-EPT programs we have written for Amazons, Havannah, and Breakthrough.

9 citations
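The early-termination idea can be sketched generically. The `state` interface (`is_terminal`, `legal_moves`, `play`, `result`) and the confidence threshold are assumptions for illustration; the actual programs use game-specific evaluators and cutoff rules.

```python
import random

def playout_ept(state, eval_fn, max_moves=200, threshold=0.9):
    """Random playout with early termination: once the evaluation
    function is confident enough (|value| >= threshold, with value in
    [-1, 1] from the first player's view), score the position with the
    evaluator instead of playing the game out to the end."""
    for _ in range(max_moves):
        if state.is_terminal():
            return state.result()              # true game result (1 or 0)
        value = eval_fn(state)
        if abs(value) >= threshold:
            return 1.0 if value > 0 else 0.0   # treat as decided
        state = state.play(random.choice(state.legal_moves()))
    return 1.0 if eval_fn(state) > 0 else 0.0  # move budget exhausted
```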


Book ChapterDOI
01 Jul 2015
TL;DR: Deep Proof-Number Search outperforms PN-search considerably in Othello and Hex; a parameter R adjusts between best-first and depth-first behavior.
Abstract: In this paper, DeepPN is introduced. It is a modified version of PN-search. It introduces a procedure to solve the seesaw effect. DeepPN employs two important values associated with each node, viz. the usual proof number and a deep value. The deep value of a node is defined as the depth to which each child node has been searched. So, the deep value of a node shows the progress of the search in the depth direction. By mixing the proof numbers and the deep value, DeepPN works with two characteristics, viz., the best-first manner of search (equal to the original proof-number search) and the depth-first manner. By adjusting a parameter (called R in this paper) we can choose between best-first or depth-first behavior. In our experiments, we tried to find a balance between both manners of searching. As it turned out, the best results were obtained at an R value in between the two extremes of best-first search (original proof-number search) and depth-first search. For random Othello and Hex positions, DeepPN works almost twice as well as PN-search. From the results, we may conclude that Deep Proof-Number Search outperforms PN-search considerably in Othello and Hex.

8 citations
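The R trade-off can be sketched as a child-ordering rule. This illustrates only the mixing idea; the paper's exact DeepPN formula differs, and `max_depth` and the field names are assumptions for the example.

```python
def deeppn_key(pn, deep, r, max_depth=64):
    """Mixed ordering score (smaller is better): r = 1 gives pure
    proof-number (best-first) ordering, while r = 0 prefers the child
    whose subtree has been searched deepest (depth-first flavour)."""
    return r * pn + (1.0 - r) * (max_depth - deep)

def select_child(children, r):
    """children: list of dicts with 'pn' and 'deep' entries."""
    return min(children, key=lambda c: deeppn_key(c["pn"], c["deep"], r))
```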


Book ChapterDOI
01 Jul 2015
TL;DR: Stimulated by FIDE/ACP, the fundamentals of the subject are revisited to advance a framework suitable for improved standards of computational experiment and more precise results.
Abstract: The assessment of chess players is both an increasingly attractive opportunity and an unfortunate necessity. The chess community needs to limit potential reputational damage by inhibiting cheating and unjustified accusations of cheating: there has been a recent rise in both. A number of counter-intuitive discoveries have been made by benchmarking the intrinsic merit of players’ moves: these call for further investigation. Is Capablanca actually, objectively the most accurate World Champion? Has ELO rating inflation not taken place? Stimulated by FIDE/ACP, we revisit the fundamentals of the subject to advance a framework suitable for improved standards of computational experiment and more precise results. Other games and domains look to chess as a demonstrator of good practice, including the rating of professionals making high-value decisions under pressure, personnel evaluation by Multichoice Assessment and the organization of crowd-sourcing in citizen science projects. The ‘3P’ themes of performance, prediction and profiling pervade all these domains.

7 citations


Book ChapterDOI
01 Jul 2015
TL;DR: This paper studies how to improve and analyze the playing strength of an MCTS-based CDC program, named DarkKnight, which won the CDC tournament in the 17th Computer Olympiad, and incorporates three recent techniques (early playout termination, implicit minimax backups, and quality-based rewards) into the program.
Abstract: Monte-Carlo tree search (MCTS) has been successfully applied to Chinese dark chess (CDC). In this paper, we study how to improve and analyze the playing strength of an MCTS-based CDC program, named DarkKnight, which won the CDC tournament in the 17th Computer Olympiad. We incorporate three recent techniques into the program: early playout terminations, implicit minimax backups, and quality-based rewards. For early playout terminations, playouts end when reaching states with likely outcomes. Implicit minimax backups use heuristic evaluations to help guide selections of MCTS. Quality-based rewards adjust rewards based on online collected information. Our experiments showed that the win rates against the original DarkKnight were 60.75 %, 70.90 %, and 59.00 %, respectively, for the three techniques. By incorporating all three together, we obtained a win rate of 76.70 %.

7 citations


Book ChapterDOI
01 Jul 2015
TL;DR: A family of LinUCT algorithms that incorporate LinUCB into MCTS algorithms are proposed that outperform UCT, especially when the branching factor is relatively large.
Abstract: UCT is a de facto standard method for Monte-Carlo tree search (MCTS) algorithms, which have been applied to various domains and have achieved remarkable success. This study proposes a family of LinUCT algorithms that incorporate LinUCB into MCTS algorithms. LinUCB is a recently developed method that generalizes past episodes by ridge regression with feature vectors and rewards. LinUCB outperforms UCB1 in contextual multi-armed bandit problems. We introduce a straightforward application of LinUCB, \(\text {LinUCT}_{\text {PLAIN}}\) by substituting UCB1 with LinUCB in UCT. We show that it does not work well owing to the minimax structure of game trees. To better handle such tree structures, we present \(\text {LinUCT}_{\text {RAVE}}\) and \(\text {LinUCT}_{\text {FP}}\) by further incorporating two existing techniques, rapid action value estimation (RAVE) and feature propagation, which recursively propagates the feature vector of a node to that of its parent. Experiments were conducted with a synthetic model, which is an extension of the standard incremental random tree model in which each node has a feature vector that represents the characteristics of the corresponding position. The experimental results indicate that \(\text {LinUCT}_{\text {RAVE}}\), \(\text {LinUCT}_{\text {FP}}\), and their combination \(\text {LinUCT}_{\text {RAVE-FP}}\) outperform UCT, especially when the branching factor is relatively large.

5 citations
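The LinUCB ingredient can be sketched for two features in plain Python. The class name and parameters are invented for the example; real implementations keep the model per node (or per arm) and work in higher dimensions with incremental matrix updates.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

class LinUCB2:
    """Two-feature LinUCB sketch: a ridge-regression estimate from the
    statistics (A, b) plus an exploration bonus alpha * sqrt(x^T A^-1 x)."""
    def __init__(self, alpha=1.0, lam=1.0):
        self.A = [[lam, 0.0], [0.0, lam]]   # ridge Gram matrix
        self.b = [0.0, 0.0]                 # reward-weighted feature sums
        self.alpha = alpha

    def ucb(self, x):
        Ai = inv2(self.A)
        theta = [Ai[i][0] * self.b[0] + Ai[i][1] * self.b[1] for i in (0, 1)]
        mean = theta[0] * x[0] + theta[1] * x[1]
        Aix = [Ai[i][0] * x[0] + Ai[i][1] * x[1] for i in (0, 1)]
        return mean + self.alpha * math.sqrt(x[0] * Aix[0] + x[1] * Aix[1])

    def update(self, x, reward):
        for i in (0, 1):
            for j in (0, 1):
                self.A[i][j] += x[i] * x[j]
            self.b[i] += reward * x[i]
```

With `alpha = 0` the score reduces to the ridge-regression prediction; increasing `alpha` favours feature directions the model has seen less often.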


Book ChapterDOI
01 Jul 2015
TL;DR: This research uses machine learning to allow a program to label relevant moves with an associated shape name based on local patterns of stones, a first step towards a program that is able to communicate with human players in a game review or match.
Abstract: Computer Go programs with only a 4-stone handicap have recently defeated professional humans. Now that the strength of Go programs is sufficiently close to that of humans, a new target in artificial intelligence is to develop programs able to provide commentary on Go games. A fundamental difficulty in this development is to learn the terminology of Go, which is often not well defined. An example is the problem of naming shapes such as Atari, Attachment or Hane. In this research, our goal is to allow a program to label relevant moves with an associated shape name. We use machine learning to deduce these names based on local patterns of stones. First, strong amateur players recorded for each game move the associated shape name, using a pre-selected list of 71 terms. Next, these records were used to train a supervised machine learning algorithm. The result is a program able to output the shape name from the local patterns of stones. Including other Go features such as change in liberties improved the performance. Humans agreed on a shape name with a rate of about 82 %. Our algorithm achieved a similar performance, picking the name most preferred by the humans with a rate of about 82 %. This performance is a first step towards a program that is able to communicate with human players in a game review or match.

4 citations


Book ChapterDOI
01 Jul 2015
TL;DR: The strength of SCNS features is shown: most critical is initializing leaves via a multi-step process. A simple parallel version scales well for 2 threads but less efficiently for 4 or 8 threads.
Abstract: Recently we introduced Sibling Conspiracy Number Search — an algorithm based not on evaluation of leaf states of the search tree but, for each node, on relative evaluation scores of all children of that node — and implemented an SCNS Hex bot. Here we show the strength of SCNS features: most critical is to initialize leaves via a multi-step process. Also, we show a simple parallel version of SCNS: it scales well for 2 threads but less efficiently for 4 or 8 threads.

3 citations


Book ChapterDOI
01 Jul 2015
TL;DR: It is established that, although there are positions where one player has no legal move, there is no position where both players lack a legal move, and that the game cannot end in a draw; from the standpoint of computational complexity, the game is shown to be PSPACE-complete.
Abstract: Two features set Slither apart from other connection games. Previously played stones can be relocated and some stone configurations are forbidden. We show that the interplay of these peculiar mechanics with the standard goal of connecting opposite edges of a board results in a game with a few properties unexpected among connection games, for instance, the existence of mutual Zugzwangs. We also establish that, although there are positions where one player has no legal move, there is no position where both players lack a legal move and that the game cannot end in a draw. From the standpoint of computational complexity, we show that the game is PSPACE-complete; the relocation rule can indeed be tamed so as to simulate a Hex game on a Slither board.

3 citations


Book ChapterDOI
01 Jul 2015
TL;DR: This work investigates random walks with discrete time steps and discrete state spaces as games with “Diffusion Control”: a player (=controller) with certain intentions influences the random movements of the particle.
Abstract: Random walks with discrete time steps and discrete state spaces have widely been studied for several decades. We investigate such walks as games with “Diffusion Control”: a player (=controller) with certain intentions influences the random movements of the particle. In our models the controller decides only about the step size for a single particle. It turns out that this small amount of control is sufficient to cause the particle to stay in “premium regions” of the state space with surprisingly high probabilities.
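The flavour of diffusion control can be shown with a toy strategy. The parity-based "premium region" and the function below are invented for illustration, not the paper's model; they demonstrate how choosing only the step size can pin the particle to a target set with certainty.

```python
import random

def controlled_walk(steps, start=1, rng=random):
    """Controller picks only the step SIZE (1 or 2); the direction is a
    fair coin. Strategy: from an even cell choose size 2 (parity kept),
    from an odd cell choose size 1 (parity flips to even). The particle
    is then in the 'premium' even cells from the first step onward,
    whatever the coin does. Returns the fraction of steps spent there."""
    pos = start
    hits = 0
    for _ in range(steps):
        size = 2 if pos % 2 == 0 else 1
        pos += size * rng.choice([-1, 1])
        if pos % 2 == 0:
            hits += 1
    return hits / steps
```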

Book ChapterDOI
01 Jul 2015
TL;DR: It is concluded that the random walk of the game is recurrent and the actor has no direct control over this particle but may indirectly influence the movement of the particle by blockades.
Abstract: We consider a single-player game where a particle on a board has to be steered to evacuation cells. The actor has no direct control over this particle but may indirectly influence the movement of the particle by blockades. We examine optimal blocking strategies and the recurrence property experimentally and conclude that the random walk of our game is recurrent. Furthermore, we are interested in the average time in which an evacuation cell is reached.

Book ChapterDOI
01 Jul 2015
TL;DR: A novel evaluation function that uses network characteristics to capture features of the board states from two perspectives is tested against the world-champion Hex program MoHex, showing that the method was superior to the 2011 version of MoHex on an \(11 \times 11\) board.
Abstract: The game of Hex was invented in the 1940s, and many studies have proposed ideas that led to the development of computer Hex players. One of the main approaches to developing computer Hex uses an evaluation function based on the electric circuit model. However, such a function evaluates the board states from only one perspective. Consequently, it has recently been defeated by Monte-Carlo Tree Search approaches. In this paper, we therefore propose a novel evaluation function that uses network characteristics to capture features of the board states from two perspectives. Our proposed evaluation function separately evaluates the board network and the shortest-path network using betweenness centrality, and combines the results of these evaluations. Furthermore, our proposed method involves changing the ratio between global and local evaluations through a support vector machine (SVM). Thus, it yields an improved strategy for Hex. Our method, called Ezo, was tested against the world-champion Hex program MoHex. The results showed that our method was superior to the 2011 version of MoHex on an \(11 \times 11\) board.
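The betweenness-centrality signal can be computed with Brandes' algorithm for unweighted graphs; a self-contained sketch follows (the adjacency-dict interface is an assumption, and applying it to Hex board and shortest-path networks is left to the reader).

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for betweenness centrality on an undirected,
    unweighted graph. adj: dict node -> list of neighbours."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        queue = deque([s])
        while queue:                                # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                # dependency accumulation
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # each unordered pair was counted from both endpoints
    return {v: c / 2.0 for v, c in cb.items()}
```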

Book ChapterDOI
01 Jul 2015
TL;DR: This work presents a new method for constructing an appropriate master tree that provides more important moves with more workers on their sub-trees to improve the playing strength and introduces two advantages: being parameter free in that users do not need to tune parameters through trial and error, and efficiency suitable even for short-time matches.
Abstract: Asynchronous parallel game-tree search methods are effective in improving the playing strength by using many computers connected through relatively slow networks. In game-position parallelization, the master program manages a game tree and distributes positions in the tree to workers. Then, each worker asynchronously searches the best move and the corresponding evaluation for its assigned position. We present a new method for constructing an appropriate master tree that provides more important moves with more workers on their sub-trees to improve the playing strength. Our contribution introduces two advantages: (1) being parameter free in that users do not need to tune parameters through trial and error, and (2) efficiency suitable even for short-time matches, such as one second per move. We implemented our method in chess with a top-level chess program, Stockfish, and evaluated the playing strength through self-play. We confirm that the playing strength improves with up to sixty workers.

Book ChapterDOI
01 Jul 2015
TL;DR: This paper presents the design of a computer program for playing Progressive Chess, and focuses on efficiently searching for checkmates, putting to test various heuristics for guiding the search.
Abstract: We present the design of a computer program for playing Progressive Chess. In this game, players play progressively longer series of moves rather than just making one move per turn. Our program follows the generally recommended strategy for this game, which consists of three phases: looking for possibilities to checkmate the opponent, playing generally good moves when no checkmate can be found, and preventing checkmates from the opponent. In this paper, we focus on efficiently searching for checkmates, putting to test various heuristics for guiding the search. We also present the findings of self-play experiments between different versions of the program.

Book ChapterDOI
01 Jul 2015
TL;DR: The first player’s cannot-lose strategies for widths 2 and 6 are shown in Cylinder-Infinite-Connect-Four.
Abstract: Cylinder-Infinite-Connect-Four is Connect-Four played on a cylindrical square grid board with infinite row height and columns that cycle about its width. In previous work, the first player’s cannot-lose strategies have been discovered for all widths except 2 and 6, and the second player’s cannot-lose strategies have been discovered with all widths except 6 and 11. In this paper, we show the first player’s cannot-lose strategies for widths 2 and 6.

Book ChapterDOI
01 Jul 2015
TL;DR: A combinatorial game-theoretic analysis of special Domineering positions is presented, investigating complex positions that are aggregates of simpler fragments, linked via bridging squares.
Abstract: In this paper we present a combinatorial game-theoretic analysis of special Domineering positions. In particular we investigate complex positions that are aggregates of simpler fragments, linked via bridging squares.

Book ChapterDOI
01 Jul 2015
TL;DR: The working of Transfer Learning is shown by taking knowledge from Tic-tac-toe and transferring it to Connect4 and Connect5; 30 heuristic functions were developed for Connect4 and 20 for Connect5.
Abstract: In this paper, we propose a Transfer Learning method by Inductive Logic Programming for games. We generate general knowledge from a game, and specialize the knowledge so that it is applicable in another game. This is called Transfer Learning. We show the working of Transfer Learning by taking knowledge from Tic-tac-toe and transferring it to Connect4 and Connect5. For Connect4 the number of heuristic functions we developed is 30; for Connect5 it is 20.

Book ChapterDOI
01 Jul 2015
TL;DR: The Mi-UCT algorithm applies the modified UCB algorithm to trees; it has been shown to outperform the plain UCT algorithm when only a small number of playouts is given, and to perform roughly on the same level when more playouts are available.
Abstract: The UCT algorithm, which combines the UCB algorithm and Monte-Carlo Tree Search (MCTS), is currently the most widely used variant of MCTS. Recently, a number of investigations into applying other bandit algorithms to MCTS have produced interesting results. In this research, we will investigate the possibility of combining the improved UCB algorithm, proposed by Auer et al. [2], with MCTS. However, various characteristics and properties of the improved UCB algorithm may not be ideal for a direct application to MCTS. Therefore, some modifications were made to the improved UCB algorithm, making it more suitable for the task of game-tree search. The Mi-UCT algorithm is the application of the modified UCB algorithm to trees. The performance of Mi-UCT is demonstrated on the games of \(9\times 9\) Go and \(9\times 9\) NoGo; it has been shown to outperform the plain UCT algorithm when only a small number of playouts is given, and to be roughly on the same level when more playouts are available.

Book ChapterDOI
01 Jul 2015
TL;DR: The first attempt to augment Palamedes by constructing databases for certain endgame positions for the backgammon variant of Plakoto is described, the first time that an endgame database is created for this game.
Abstract: Palamedes is an ongoing project for building expert playing bots that can play backgammon variants. Until recently the position evaluation relied only on self-trained neural networks. This paper describes the first attempt to augment Palamedes by constructing databases for certain endgame positions for the backgammon variant of Plakoto. The result is 5 databases containing 12,480,720 records in total; they can accurately calculate the best move for roughly \(3.4 \times 10^{15}\) positions. To the best of our knowledge, this is the first time that an endgame database is created for this game.

Book ChapterDOI
01 Jul 2015
TL;DR: An optimization called boundary matching is described that accommodates this complexity to allow move generation for Sprouts games of arbitrary size at interactive speeds.
Abstract: The simplicity of the pen-and-paper game Sprouts hides a surprising combinatorial complexity. We describe an optimization called boundary matching that accommodates this complexity to allow move generation for Sprouts games of arbitrary size at interactive speeds.