
Showing papers on "Game tree" published in 2019


Proceedings Article
24 May 2019
TL;DR: Deep Counterfactual Regret Minimization obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game; it is the first non-tabular variant of CFR to be successful in large games.
Abstract: Counterfactual Regret Minimization (CFR) is the leading framework for solving large imperfect-information games. It converges to an equilibrium by iteratively traversing the game tree. In order to deal with extremely large games, abstraction is typically applied before running CFR. The abstracted game is solved with tabular CFR, and its solution is mapped back to the full game. This process can be problematic because aspects of abstraction are often manual and domain specific, abstraction algorithms may miss important strategic nuances of the game, and there is a chicken-and-egg problem because determining a good abstraction requires knowledge of the equilibrium of the game. This paper introduces Deep Counterfactual Regret Minimization, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game. We show that Deep CFR is principled and achieves strong performance in large poker games. This is the first non-tabular variant of CFR to be successful in large games.

126 citations
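
Since CFR's tabular core underlies this entry and several below, a minimal sketch of the regret-matching step that tabular CFR runs at each information set may help; the dict layout and action names are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of the regret-matching step at the heart of tabular CFR.
# `cum_regret` maps each action at one information set to its cumulative
# counterfactual regret; the data layout here is illustrative.

def regret_matching(cum_regret):
    """Return a strategy proportional to the positive cumulative regrets."""
    positives = {a: max(r, 0.0) for a, r in cum_regret.items()}
    total = sum(positives.values())
    if total > 0:
        return {a: p / total for a, p in positives.items()}
    # No positive regret anywhere: fall back to the uniform strategy.
    n = len(cum_regret)
    return {a: 1.0 / n for a in cum_regret}

# Example regrets accumulated over past game-tree traversals at one infoset:
print(regret_matching({"fold": -1.2, "call": 3.0, "raise": 1.0}))
# {'fold': 0.0, 'call': 0.75, 'raise': 0.25}
```

Deep CFR's contribution is to have a neural network approximate what this table-driven update would produce across the full, unabstracted game.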


Journal ArticleDOI
17 Jul 2019
TL;DR: Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games.
Abstract: Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games. In this paper we introduce novel CFR variants that 1) discount regrets from earlier iterations in various ways (in some cases differently for positive and negative regrets), 2) reweight iterations in various ways to obtain the output strategies, 3) use a non-standard regret minimizer and/or 4) leverage “optimistic regret matching”. They lead to dramatically improved performance in many settings. For one, we introduce a variant that outperforms CFR+, the prior state-of-the-art algorithm, in every game tested, including large-scale realistic settings. CFR+ is a formidable benchmark: no other algorithm has been able to outperform it. Finally, we show that, unlike CFR+, many of the important new variants are compatible with modern imperfect-information game pruning techniques and one is also compatible with sampling in the game tree.

49 citations
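
To make point 1) concrete, here is a sketch of a DCFR-style discounting schedule: after iteration t, accumulated positive regrets, negative regrets, and average-strategy contributions are scaled by separate factors. The commonly cited defaults (roughly alpha=1.5, beta=0, gamma=2) are assumptions for this sketch, not quoted from the paper.

```python
# Hedged sketch of DCFR-style discounting (variant family 1 above). After
# iteration t, positive regrets are scaled by t^a/(t^a+1), negative regrets
# by t^b/(t^b+1), and the average-strategy accumulator by (t/(t+1))^g.
# The defaults a=1.5, b=0, g=2 are assumptions, not the paper's text.

def dcfr_weights(t, a=1.5, b=0.0, g=2.0):
    pos = t**a / (t**a + 1)
    neg = t**b / (t**b + 1)   # with b=0 this halves negative regrets each step
    avg = (t / (t + 1))**g
    return pos, neg, avg

def discount(cum_regret, avg_strategy, t):
    pos_w, neg_w, avg_w = dcfr_weights(t)
    for action in cum_regret:
        cum_regret[action] *= pos_w if cum_regret[action] > 0 else neg_w
    for action in avg_strategy:
        avg_strategy[action] *= avg_w
```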


Posted Content
TL;DR: In this paper, the authors study the performance of optimistic regret minimization algorithms for both minimizing regret in, and computing Nash equilibria of, zero-sum extensive-form games.
Abstract: We study the performance of optimistic regret-minimization algorithms for both minimizing regret in, and computing Nash equilibria of, zero-sum extensive-form games. In order to apply these algorithms to extensive-form games, a distance-generating function is needed. We study the use of the dilated entropy and dilated Euclidean distance functions. For the dilated Euclidean distance function we prove the first explicit bounds on the strong-convexity parameter for general treeplexes. Furthermore, we show that the use of dilated distance-generating functions enables us to decompose the mirror descent algorithm, and its optimistic variant, into local mirror descent algorithms at each information set. This decomposition mirrors the structure of the counterfactual regret minimization framework, and enables important techniques in practice, such as distributed updates and pruning of cold parts of the game tree. Our algorithms provably converge at a rate of $T^{-1}$, which is superior to prior counterfactual regret minimization algorithms. We experimentally compare to the popular algorithm CFR+, which has a theoretical convergence rate of $T^{-0.5}$ but is known to often converge at a rate of $T^{-1}$, or better, in practice. We give an example matrix game where CFR+ experimentally converges at a relatively slow rate of $T^{-0.74}$, whereas our optimistic methods converge faster than $T^{-1}$. We go on to show that our fast rate also holds in the Kuhn poker game, which is an extensive-form game. For games with deeper game trees, however, we find that CFR+ is still faster. Finally we show that when the goal is minimizing regret, rather than computing a Nash equilibrium, our optimistic methods can outperform CFR+, even in deep game trees.

15 citations
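
The local update this decomposition yields at each information set is, in essence, optimistic online mirror descent on a simplex. Below is a minimal sketch with the entropy distance-generating function, predicting each round's loss by the previous one; the step size and loop structure are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hedged sketch of optimistic online mirror descent on one probability
# simplex with the entropy distance-generating function. The paper runs
# such local updates at every information set of a treeplex; here we show
# a single decision point.

def mirror_step(w, eta, loss):
    """Multiplicative-weights update followed by renormalization."""
    x = w * np.exp(-eta * loss)
    return x / x.sum()

def optimistic_omd(loss_stream, n_actions, eta=0.1):
    z = np.full(n_actions, 1.0 / n_actions)  # lagging iterate
    m = np.zeros(n_actions)                  # prediction of the next loss
    for g in loss_stream:
        x = mirror_step(z, eta, m)           # optimistic iterate (played)
        yield x
        z = mirror_step(z, eta, g)           # update with the realized loss
        m = g                                # predict next loss = last loss
```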


Proceedings Article
01 Oct 2019
TL;DR: It is shown that when the goal is minimizing regret, rather than computing a Nash equilibrium, the optimistic methods can outperform CFR+, even in deep game trees; the proposed decomposition into local mirror descent algorithms mirrors the structure of the counterfactual regret minimization framework.
Abstract: We study the performance of optimistic regret-minimization algorithms for both minimizing regret in, and computing Nash equilibria of, zero-sum extensive-form games. In order to apply these algorithms to extensive-form games, a distance-generating function is needed. We study the use of the dilated entropy and dilated Euclidean distance functions. For the dilated Euclidean distance function we prove the first explicit bounds on the strong-convexity parameter for general treeplexes. Furthermore, we show that the use of dilated distance-generating functions enables us to decompose the mirror descent algorithm, and its optimistic variant, into local mirror descent algorithms at each information set. This decomposition mirrors the structure of the counterfactual regret minimization framework, and enables important techniques in practice, such as distributed updates and pruning of cold parts of the game tree. Our algorithms provably converge at a rate of $T^{-1}$, which is superior to prior counterfactual regret minimization algorithms. We experimentally compare to the popular algorithm CFR+, which has a theoretical convergence rate of $T^{-0.5}$ but is known to often converge at a rate of $T^{-1}$, or better, in practice. We give an example matrix game where CFR+ experimentally converges at a relatively slow rate of $T^{-0.74}$, whereas our optimistic methods converge faster than $T^{-1}$. We go on to show that our fast rate also holds in the Kuhn poker game, which is an extensive-form game. For games with deeper game trees, however, we find that CFR+ is still faster. Finally we show that when the goal is minimizing regret, rather than computing a Nash equilibrium, our optimistic methods can outperform CFR+, even in deep game trees.

14 citations


Posted Content
TL;DR: In this article, a framework of baseline-corrected values in EFGs is introduced, which results in significantly reduced variance compared to existing techniques, and one particular choice of such a function, predictive baseline, is provably optimal under certain sampling schemes.
Abstract: Extensive-form games (EFGs) are a common model of multi-agent interactions with imperfect information. State-of-the-art algorithms for solving these games typically perform full walks of the game tree that can prove prohibitively slow in large games. Alternatively, sampling-based methods such as Monte Carlo Counterfactual Regret Minimization walk one or more trajectories through the tree, touching only a fraction of the nodes on each iteration, at the expense of requiring more iterations to converge due to the variance of sampled values. In this paper, we extend recent work that uses baseline estimates to reduce this variance. We introduce a framework of baseline-corrected values in EFGs that generalizes the previous work. Within our framework, we propose new baseline functions that result in significantly reduced variance compared to existing techniques. We show that one particular choice of such a function --- predictive baseline --- is provably optimal under certain sampling schemes. This allows for efficient computation of zero-variance value estimates even along sampled trajectories.

11 citations
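
A short sketch of the control-variate idea behind baseline-corrected values: unsampled actions are scored by the baseline alone, and the sampled action corrects the baseline by an importance-weighted residual, which keeps the estimate unbiased for any baseline. The function below is an illustrative reconstruction, not code from the paper.

```python
# Hedged sketch of a baseline-corrected value estimate at one decision
# point under outcome sampling. For any baseline b the estimator is
# unbiased: E[est[a]] = b[a] + q(a) * (E[v | a] - b[a]) / q(a) = E[v | a].
# The better b tracks the true action values, the lower the variance;
# a perfect ("predictive") baseline drives the variance to zero.

def baseline_corrected_values(actions, sampled, q_sampled, v_sampled, b):
    """actions: iterable of actions; sampled: the action actually taken;
    q_sampled: its sampling probability; v_sampled: its sampled value;
    b: dict mapping each action to a baseline estimate of its value."""
    est = {a: b[a] for a in actions}
    est[sampled] += (v_sampled - b[sampled]) / q_sampled
    return est
```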


Proceedings Article
01 Jan 2019
TL;DR: This paper introduces the first efficient regret minimization algorithm for computing extensive-form correlated equilibria in large two-player general-sum games with no chance moves, and shows that it significantly outperforms prior approaches; for larger problems it is the only viable option.
Abstract: Self-play methods based on regret minimization have become the state of the art for computing Nash equilibria in large two-player zero-sum extensive-form games. These methods fundamentally rely on the hierarchical structure of the players' sequential strategy spaces to construct a regret minimizer that recursively minimizes regret at each decision point in the game tree. In this paper, we introduce the first efficient regret minimization algorithm for computing extensive-form correlated equilibria in large two-player general-sum games with no chance moves. Designing such an algorithm is significantly more challenging than designing one for the Nash equilibrium counterpart, as the constraints that define the space of correlation plans lack the hierarchical structure and might even form cycles. We show that some of the constraints are redundant and can be excluded from consideration, and present an efficient algorithm that generates the space of extensive-form correlation plans incrementally from the remaining constraints. This structural decomposition is achieved via a special convexity-preserving operation that we coin scaled extension. We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer. Our algorithm produces feasible iterates. Experiments show that it significantly outperforms prior approaches, and for larger problems it is the only viable option.

10 citations


Journal ArticleDOI
Vik Pant, Eric Yu
19 Jul 2019
TL;DR: This article demonstrates the activation of one component in this guided approach of systematically searching for alternatives to generate a new win-win strategy, and presents a meta-model for relating i* models and Game Trees.
Abstract: Interorganizational coopetition describes a relationship in which two or more organizations cooperate and compete simultaneously. Actors under coopetition cooperate to achieve collective objectives and compete to maximize their individual benefits. Such relationships are based on the logic of win-win strategies that necessitate decision-makers in coopeting organizations to develop relationships that yield favorable outcomes for each actor. We follow a strategic modeling approach that combines i* goal-modeling to explore strategic alternatives of actors with Game Tree decision-modeling to evaluate the actions and payoffs of those players. In this article, we elaborate on the method, illustrating one particular pathway towards a positive-sum outcome – through the introduction of an intermediary actor. This article demonstrates the activation of one component in this guided approach of systematically searching for alternatives to generate a new win-win strategy. We also present a meta-model for relating i* models and Game Trees. A hypothetical industrial scenario focusing on the Industrial Data Space, which is a platform that can help organizations to overcome obstacles to data sharing in a coopetitive ecosystem, is used to explain this approach.

7 citations


Journal ArticleDOI
TL;DR: It is shown that every simplicial complex encodes a certain type of SP-game (called an "invariant SP-game") whose ruleset is independent of the board it is played on, and that isomorphic simplicial complexes correspond to isomorphic game trees, and hence equal game values.
Abstract: Strong placement games (SP-games) are a class of combinatorial games whose structure allows one to describe the game via simplicial complexes. A natural question is whether well-known invariants of combinatorial games, such as "game value", appear as invariants of the simplicial complexes. This paper is the first step in that direction. We show that every simplicial complex encodes a certain type of SP-game (called an "invariant SP-game") whose ruleset is independent of the board it is played on. We also show that in the class of SP-games isomorphic simplicial complexes correspond to isomorphic game trees, and hence equal game values. We also study a subclass of SP-games corresponding to flag complexes, showing that there is always a game whose corresponding complex is a flag complex no matter which board it is played on.

6 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: A new tree policy is presented that optimally allocates a limited computing budget to maximize a lower bound on the probability of correctly selecting the best action at each node in a tree search problem with an underlying Markov decision process.
Abstract: We analyze a tree search problem with an underlying Markov decision process, in which the goal is to identify the best action at the root that achieves the highest cumulative reward. We present a new tree policy that optimally allocates a limited computing budget to maximize a lower bound on the probability of correctly selecting the best action at each node. Compared to the widely used Upper Confidence Bound (UCB) type of tree policies, the new tree policy presents a more balanced approach to managing the exploration-exploitation trade-off when the sampling budget is limited. Furthermore, UCB assumes that the support of the reward distribution is known, whereas our algorithm relaxes this assumption and can be applied to game trees with mild modifications. A numerical experiment is conducted to demonstrate the efficiency of our algorithm in selecting the best action at the root.

5 citations
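
For contrast with the paper's allocation rule, here is the baseline UCB1 tree policy it compares against, in minimal form; the node structure is an illustrative assumption. Note that the exploration constant presumes rewards scaled to [0, 1], which is exactly the kind of support knowledge the paper's method avoids.

```python
import math
from dataclasses import dataclass

# Hedged sketch of the UCB1 tree policy: pick the child maximizing an
# optimistic upper bound on its mean reward. The Node class is illustrative.

@dataclass
class Node:
    visits: int
    total_reward: float

def ucb1_select(children, c=math.sqrt(2)):
    parent_visits = sum(ch.visits for ch in children)

    def score(ch):
        if ch.visits == 0:
            return float("inf")  # visit every child at least once
        mean = ch.total_reward / ch.visits
        return mean + c * math.sqrt(math.log(parent_visits) / ch.visits)

    return max(children, key=score)

# The less-visited child wins here despite similar mean rewards:
print(ucb1_select([Node(10, 7.0), Node(5, 4.0)]))
```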


Journal ArticleDOI
TL;DR: The self-developed heuristics, combined with the minimax algorithm, perform strongly in the early stages of zero-sum game play, and alpha-beta pruning is used to remove nodes that cannot influence the final decision, which greatly increases minimax efficiency.
Abstract: Minimax algorithms and machine learning technologies have been studied for decades to approach ideal optimization in games such as chess and backgammon. In these fields, several generations of researchers have optimized code for pruning and for the effectiveness of evaluation functions, so there are well-developed algorithms for handling sophisticated game situations. However, as a traditional zero-sum game, Connect-4 has received less attention than the other members of its zero-sum family with respect to the traditional minimax algorithm. In recent years, a new generation of heuristics has been created to address this problem based on research conclusions, expertise and gaming experience. This paper introduces self-developed heuristics, supported by well-demonstrated research results and our own experience, which play against an available online version of a Connect-4 system. While most previous works focused on winning algorithms and knowledge-based approaches, we complement them with an analysis of heuristics. We conducted three experiments on the relationship among functionality, search depth and number of features, running contrastive tests against the online sample. Unlike the sample, which is based on summarized experience and generalized features, our heuristics concentrate on the detailed connections between pieces on the board. By analysing the winning percentages when our version plays against the online sample at different search depths, we find that our heuristics combined with the minimax algorithm perform strongly in the early stages of the zero-sum game. Because some nodes in the game tree have no influence on the final decision of the minimax algorithm, we use alpha-beta pruning to eliminate such meaningless nodes, which greatly increases minimax efficiency. In the contrastive experiment with the online sample, we also examine basic characteristics of the minimax algorithm, namely search depth and the number of features. According to the experiments, both characteristics affect the decision at each step, and neither is absolutely dominant. We also discuss potential future issues in Connect-4 optimization, such as precise adjustment of heuristic values and inefficient pruning of the search tree.

5 citations
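
A minimal, self-contained version of the pruning rule described above: once one side can already guarantee a value at least as good elsewhere, the remaining children of a node cannot affect the root decision and are skipped. The nested-list tree is an illustrative stand-in for Connect-4 positions and heuristic scores.

```python
# Hedged sketch of alpha-beta pruning over an explicit game tree. A tree is
# a nested list whose leaves are heuristic scores; in Connect-4 the children
# would instead be generated from board positions (illustrative encoding).

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):          # leaf: heuristic evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # remaining children cannot matter
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# The rightmost leaf (2) is pruned: once MIN can force <= 1 in the right
# subtree, MAX already has 3 from the left subtree.
print(alphabeta([[3, 5], [1, 2]]))  # 3
```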


Journal ArticleDOI
TL;DR: In this article, security strategies of two-player zero-sum repeated Bayesian games are studied; a player's sufficient statistics consist of the belief over the player's own type, the regret over the other player's type, and the stage, and explicit linear programs whose size is linear in the size of the game tree compute the security strategies.
Abstract: This paper studies two-player zero-sum repeated Bayesian games in which every player has a private type that is unknown to the other player, and the initial probability of the type of every player is publicly known. The types of players are independently chosen according to the initial probabilities, and are kept the same all through the game. At every stage, players simultaneously choose actions, and announce their actions publicly. For finite-horizon cases, an explicit linear program is provided to compute players’ security strategies. Moreover, this paper shows that a player's sufficient statistics, which are independent of the strategy of the other player, consist of the belief over the player's own type, the regret over the other player's type, and the stage. Explicit linear programs, whose size is linear in the size of the game tree, are provided to compute the initial regrets and the security strategies that depend only on the sufficient statistics. For discounted cases, following the same idea as in the finite horizon, this paper shows that a player's sufficient statistics consist of the belief of the player's own type and the antidiscounted regret with respect to the other player's type. In addition, an approximate security strategy depending on the sufficient statistics is provided, together with an explicit linear program to compute it. This paper also obtains a bound on the performance difference between the approximate security strategy and the exact security strategy, and shows that the bound converges to 0 exponentially fast.
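
For readers unfamiliar with security-strategy LPs, here is the textbook single-stage version for a zero-sum matrix game, written with scipy; it illustrates the maximin formulation but not the paper's contribution, which is an LP over the repeated Bayesian game whose size stays linear in the game tree.

```python
import numpy as np
from scipy.optimize import linprog

# Hedged sketch of the standard LP for a security (maximin) strategy of the
# row player in a zero-sum matrix game with payoff matrix A (row maximizes).

def security_strategy(A):
    m, n = A.shape
    # Variables: x_1..x_m (mixed strategy) and v (game value); minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every column j: v - sum_i A[i, j] * x_i <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # sum x = 1
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

x, v = security_strategy(np.array([[1.0, -1.0], [-1.0, 1.0]]))  # matching pennies
print(x, v)  # ~[0.5, 0.5], value ~0.0
```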

Posted Content
TL;DR: In this paper, multiple Markov decision processes (MDPs) are defined as abstractions of Mahjong games to construct effective search trees, and two methods of inferring state values of the original Mahjong game are introduced.
Abstract: We propose a method for constructing an artificial intelligence (AI) player for mahjong, which is a multiplayer imperfect-information game. Since the size of the game tree is huge, constructing an expert-level AI player for mahjong is challenging. We define multiple Markov decision processes (MDPs) as abstractions of mahjong to construct effective search trees. We also introduce two methods of inferring state values of the original mahjong game using these MDPs. We evaluated the effectiveness of our method through gameplay against the current strongest AI player.

Proceedings ArticleDOI
01 Nov 2019
TL;DR: A game-theoretic model of a dynamic optimal assignment problem is presented, using the functioning of the labor market as an example; a deterministic model of the optimal distribution of workers among enterprises is described, taking into account conditions that change over time.
Abstract: The paper considers a game-theoretic model of a dynamic optimal assignment problem, using the functioning of the labor market as an example. A deterministic model of the optimal distribution of workers among enterprises is described, taking into account conditions that change over a period of time. At each moment of time, the states of the employees and the enterprises are determined; these moments correspond to stationary states of the system. In each stationary state, a game in normal form is defined. In the game there is a compromise situation and an optimal policy, and the system's income from assignments is calculated as the sum of the payoff functions of all players. The functioning of the labor market as a system over several periods of time is presented as a multi-step game on a tree. In a one-step game, based on the principle of compromise-set optimality, there is a compromise situation and a corresponding compromise control vector. On the multi-step game tree, there is a compromise income of the system over several steps, when a sequence of games is realized, and a compromise path corresponding to a sequence of compromise control vectors. The compromise system income and the sequence of compromise controls are found using dynamic-programming recurrence relations. Thus, it is possible to indicate the optimal behavior of all participants in the labor market at any given time.

Proceedings ArticleDOI
02 May 2019
TL;DR: The authors reflect on a game-centric approach to teaching artificial intelligence that follows the historical development of algorithms by popping the hood of champion game bots, and they make available a server infrastructure for playing card games in perfect-information and imperfect-information modes.
Abstract: Man vs. machine competitions have always attracted much public attention, and the famous defeats of human champions in chess, Jeopardy!, Go or poker undoubtedly mark important milestones in the history of artificial intelligence. In this article we reflect on our experiences with a game-centric approach to teaching artificial intelligence that follows the historical development of algorithms by popping the hood of these champion bots. Moreover, we made available a server infrastructure for playing card games in perfect-information and imperfect-information modes, where students can evaluate their implementations of increasingly sophisticated game-playing algorithms in weekly online competitions, i.e. from rule-based systems, to exhaustive and heuristic search in game trees, to deep-learning-enhanced Monte Carlo methods and reinforcement learning completely freed of human domain knowledge. The evaluation of this particular course setting revealed enthusiastic feedback not only from students but also from the university authority. What started as an experiment became part of the standard computer science curriculum after just one run.

Posted Content
TL;DR: The nature of the phase transitions that occur for normal play rules and misere rules, as well as for an "escape game" in which one player tries to force the game to end while the other tries to prolong it forever, is studied.
Abstract: We consider two-player combinatorial games in which the graph of positions is random and perhaps infinite, focusing on directed Galton-Watson trees. As the offspring distribution is varied, a game can undergo a phase transition, in which the probability of a draw under optimal play becomes positive. We study the nature of the phase transitions which occur for normal play rules (where a player unable to move loses the game) and misere rules (where a player unable to move wins), as well as for an "escape game" in which one player tries to force the game to end while the other tries to prolong it forever. For instance, for a Poisson$(\lambda)$ offspring distribution, the game tree is infinite with positive probability as soon as $\lambda>1$, but the game with normal play has positive probability of draws if and only if $\lambda>e$. The three games generally have different critical points; under certain assumptions the transitions are continuous for the normal and misere games and discontinuous for the escape game, but we also discuss cases where the opposite possibilities occur. We connect the nature of the phase transitions to the behaviour of quantities such as the expected length of the game under optimal play. We also establish inequalities relating the games to each other; for instance, the probability of a draw is at least as great in the misere game as in the normal game.

Proceedings ArticleDOI
09 May 2019
TL;DR: This paper presents the structure, teaching tools, techniques, performance and findings of Algorithms in Game AI, an undergraduate elective course that focuses on common and state-of-the-art algorithms in the game AI area, including game-tree-based algorithms and reinforcement learning.
Abstract: This paper presents a course design, named Algorithms in Game AI, for an undergraduate elective course. The course mainly focuses on common and state-of-the-art algorithms in the game AI area, including game-tree-based algorithms and reinforcement learning. Powered by Botzone, our game AI platform, we designed different types of assignments for this course to provide a rich and fun learning experience. We chose several games, including two popular Chinese classics, Mahjong and FightTheLandlord, which are both collaborative, stochastic and partially observable. To the best of our knowledge, this is the first time these games have been adopted in AI courses, providing a new benchmark for game AI education. To encourage participation and reduce frustration, milestone-based competitions and bonus tasks were adopted. In this paper, we present the structure, teaching tools, techniques, performance and findings of this course. By reviewing students' performance and feedback, we found that students enjoyed the games we provided. We also found that reinforcement learning algorithms did not perform as well as other algorithms under limited time and resources.

Journal ArticleDOI
TL;DR: This paper demonstrates that human players can be manipulated this way: in the game The Settlers of Catan, the likelihood that a player accepts a trade offer that deviates from their declared preferred strategy is higher if it is accompanied by a description of what that trade offer can lead to.
Abstract: Humans face many game problems that are too large for the whole game tree to be used in their deliberations about action, and very little is understood about how they cope in such scenarios. However, when a human player's chosen strategy is conditioned on her limited perspective of how the game might progress (Degremont et al. 2016), it should be possible to manipulate her into changing her planned move by mentioning a possible outcome of an alternative move. This paper demonstrates that human players can be manipulated this way: in the game The Settlers of Catan, where negotiation is only a small part of what one must do to win the game, thereby generating uncertainty about which outcomes of a negotiation are good and which are bad, the likelihood that a player accepts a trade offer that deviates from their declared preferred strategy is higher if it is accompanied by a description of what that trade offer can lead to.

Posted Content
TL;DR: It is shown that a strategy can rationally be chosen under common belief in future rationality in a minimal compact game if and only if it can rationally be chosen under this concept in every extensive-form game related to it via some compactification process.
Abstract: We introduce an operation, called compactification, to reduce an extensive form to a compact one in which each decision node in the game tree can be assigned to more than one player. Motivated by Thompson's (1952) interchange of decision nodes, we attempt to capture the notion of a faithful representation of the chronological order of the moves in a dynamic game, which plays a vital role in fields like epistemic game theory. The compactification process preserves perfect recall and the unambiguity of the order among information sets. We specify an algorithm, called the leaves-to-root process, which compactifies at least as many information sets as any other compactification process. The compact extensive form provides an approach to avoiding problems in dynamic game theory caused by the vague definition of the chronological order of the moves, for example the sensitivity of belief in the opponents' future rationality (Perea (2014)) to the specific extensive-form representation. We show that a strategy can rationally be chosen under common belief in future rationality in a minimal compact game if and only if it can rationally be chosen under this concept in every extensive-form game related to it via some compactification process.

Proceedings ArticleDOI
01 Aug 2019
TL;DR: This paper presents a simple and cheap ordinal bucketing algorithm that approximately generates q-quantiles from an incremental data stream and shows how this can be used in Ordinal Monte Carlo Tree Search (OMCTS) to yield better bounds on time and space complexity.
Abstract: In this paper, we present a simple and cheap ordinal bucketing algorithm that approximately generates q-quantiles from an incremental data stream. The bucketing is dynamic in the sense that the number of buckets q increases with the number of samples seen. We show how this can be used in Ordinal Monte Carlo Tree Search (OMCTS) to yield better bounds on time and space complexity, especially in the presence of noisy rewards. Besides complexity analysis and quality tests of quantiles, we evaluate our method using OMCTS in the General Video Game Framework (GVGAI). Our results demonstrate its dominance over vanilla Monte Carlo Tree Search in the presence of noise, where OMCTS without bucketing has much worse time and space complexity.
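
To illustrate the idea (though not the paper's cheap algorithm), here is a naive streaming bucketer that keeps all samples sorted and lets the bucket count grow with the stream; the square-root growth schedule is an assumption for this sketch, not the paper's rule.

```python
import bisect

# Naive illustration of ordinal bucketing on an incremental stream: every
# incoming reward is stored in sorted order, and a value is mapped to one
# of q approximate quantile buckets, where q grows with the sample count.
# Keeping all samples is memory-hungry; the paper's algorithm achieves the
# same bucketing far more cheaply.

class QuantileBuckets:
    def __init__(self):
        self.samples = []

    def add(self, x):
        bisect.insort(self.samples, x)

    def bucket(self, x):
        """Bucket index of x among q ~ sqrt(n) buckets (assumed schedule)."""
        n = len(self.samples)
        q = max(1, int(n ** 0.5))
        rank = bisect.bisect_left(self.samples, x)
        return min(q - 1, rank * q // max(n, 1))
```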

Patent
27 Sep 2019
TL;DR: In this article, a prediction device is described that includes a data input assembly, a modelling unit, a resolution unit, an interpretation unit, and an information transmission unit; the modelling unit is configured to generate a game tree, evaluated on the basis of the input data, based on game theory.
Abstract: A prediction device includes a data input assembly, a modelling unit, a resolution unit, an interpretation unit, and an information transmission unit. The data input assembly is configured to enter attacker data relating to attack models, and defender data relating to a ground zone to be defended and to available defense means. The modelling unit is configured to generate a game tree evaluated on the basis of the input data, based on game theory. The resolution unit is configured to define a game balance based on game theory, the game balance defining an attacker-strategy and defender-strategy pair. The interpretation unit is configured to determine, on the basis of the game balance, an optimum attack solution, as well as an optimum defense solution that is best suited to the optimum attack solution.

Posted Content
TL;DR: A range of measures over the induced game trees are presented and compared against benchmark problems in chess, observing a promising level of accuracy in matching up trap states.
Abstract: We study strategic similarity of game positions in two-player extensive games of perfect information, by looking at the structure of their local game trees, with the aim of improving the performance of game playing agents in detecting forcing continuations. We present a range of measures over the induced game trees and compare them against benchmark problems in chess, observing a promising level of accuracy in matching up trap states.

Proceedings ArticleDOI
26 Aug 2019
TL;DR: This paper builds a system that takes basic predefined heuristic evaluation functions as input parameters, generates compositions of these functions according to certain rules, and automatically tests all of them over a specified number of games; compositions of evaluation functions that maximize empty spaces and the monotonicity of tiles on the board perform the best.
Abstract: 2048 is a simple and intriguing sliding block puzzle game that has been studied for several years. Many complex solvers, often developed using neural nets, are available and capable of achieving very high scores. We are, however, interested in using only basic heuristics, the kind that could conceivably be employed by human players without the aid of computation. A common way to implement a 2048 solver involves searching the game tree for the best moves, choosing a move and scoring the game board using some evaluation functions. The choice of heuristic evaluation function can dramatically affect the moves chosen by the solver. Furthermore, two or more possible moves can frequently produce the same score as evaluated by the same heuristic function, requiring either a random choice or the use of a secondary or backup evaluation function, which in turn may itself produce a tie. In this paper, we test the effectiveness of several basic heuristics in a simple 2048 solver. In order to test these, we create a system that takes basic predefined heuristic evaluation functions as input parameters, generates compositions of these functions according to certain rules, and automatically tests all of them over a specified number of games. We find that compositions of evaluation functions that maximize empty spaces and the monotonicity of tiles on the board (especially those that prioritize a high number of empty spaces over higher monotonicity) perform the best out of all compositions we test.
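
A small sketch of the two winning board heuristics named above, composed so that empty spaces dominate monotonicity; the weighting and the exact monotonicity measure are illustrative assumptions, not the paper's definitions.

```python
# Hedged sketch of an empty-spaces + monotonicity evaluation for a 2048
# board (a 4x4 list of lists, with 0 meaning empty). The 100x weight makes
# empty spaces dominate, echoing the paper's best-performing compositions;
# both the weight and the monotonicity measure are assumptions.

def empty_cells(board):
    return sum(cell == 0 for row in board for cell in row)

def monotonicity(board):
    """Count adjacent non-increasing pairs along every row and column."""
    score = 0
    for line in list(board) + list(map(list, zip(*board))):
        score += sum(a >= b for a, b in zip(line, line[1:]))
    return score

def evaluate(board):
    return 100 * empty_cells(board) + monotonicity(board)

board = [[2, 0, 0, 0],
         [4, 2, 0, 0],
         [8, 4, 2, 0],
         [16, 8, 4, 2]]
print(evaluate(board))  # the 6 empty cells dominate the score
```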

Proceedings ArticleDOI
01 Apr 2019
TL;DR: Two algorithms are proposed: one to coordinate the movement of two separately controlled eyes, and one to select the leader eye in their synchronous movement.
Abstract: Robotic eyes imitate the appearance of human eyes, but they lack human-like movements. Also, there are no criteria for selecting one eye as the leader in the eyes' simultaneous movement. To address these challenges, this paper proposes two algorithms: one to coordinate the movement of two separately controlled eyes, and one to select the leader eye in their synchronous movement. In the first algorithm, called Gazist, the eyes apply the concept of bi-matrix games in noncooperative game theory to decide between their vertical and horizontal options. In the second algorithm, called Leaf, we consider game trees for sequential eye movements; the Nash equilibrium of each game tree (found by backward induction) is our proposed criterion for choosing the leader eye. Our algorithms are validated through experiments on a 3D-printed eye robot that tracks the face of a person walking in front of it. To the best of our knowledge, the Gazist algorithm constitutes one of the first examples of coordinated movement for robotic eyes via game theory, and the Leaf algorithm proposes the first criterion for choosing the leader in the eyes' synchronous movement.
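
Since the Leaf criterion rests on backward induction over a finite game tree, a minimal sketch of that procedure may help; the node encoding (dicts for decision nodes, payoff tuples for leaves) is an illustrative assumption, not the paper's representation.

```python
# Hedged sketch of backward induction on a two-player sequential game tree.
# Decision nodes are dicts {'player': i, 'children': [...]} and leaves are
# payoff pairs (u0, u1); the encoding is illustrative.

def backward_induction(node):
    """Return the payoff pair reached under sequentially rational play."""
    if isinstance(node, dict):
        i = node["player"]
        outcomes = [backward_induction(ch) for ch in node["children"]]
        return max(outcomes, key=lambda u: u[i])
    return node  # leaf payoff pair

# Player 0 moves first, player 1 replies; the equilibrium outcome is (2, 2):
tree = {"player": 0, "children": [
    {"player": 1, "children": [(3, 1), (1, 3)]},  # player 1 would pick (1, 3)
    {"player": 1, "children": [(2, 2), (0, 0)]},  # player 1 would pick (2, 2)
]}
print(backward_induction(tree))  # (2, 2)
```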

Proceedings ArticleDOI
27 Jul 2019
TL;DR: A theoretical game-tree model is improved by introducing a parameter called correlated evaluation accuracy to represent the coupling between two players, and it is argued that "knowing more is less (more)" can be found in other real games as well.
Abstract: In the course of playing combinatorial games, players want to improve by learning their opponent's information. However, knowing more does not always lead to a higher probability of winning. In simulations of Five-in-a-Row (FIR), both "knowing more is more" and "knowing more is less" can happen under different conditions: the superior player wins more after learning about the other, while the inferior player may lose more after learning about the other. An earlier theoretical game-tree model can exhibit these phenomena, but the condition that determines them is independent of the opponent, which is inconsistent with our findings in real FIR games. In this paper we improve the model by introducing a parameter called correlated evaluation accuracy to represent the coupling between the two players. We then find that "knowing more is less (more)" happens when a player's correlated evaluation accuracy is greater (lower) than 0.5. This is consistent with our finding in real Five-in-a-Row games that the relative strength of a player and her opponent determines whether "knowing more is less" or "knowing more is more." Since this model is a general representation of many combinatorial games, we believe that "knowing more is less (more)" can be found in other real games as well. This finding displays the complex interaction in real games and offers a method to study the coevolution of players.

Posted Content
TL;DR: In this paper, regret minimization for computing extensive-form correlated equilibria in large two-player general-sum games with no chance moves is studied, and it is shown that a regret minimizer can be designed for a scaled extension of any two convex sets.
Abstract: Self-play methods based on regret minimization have become the state of the art for computing Nash equilibria in large two-player zero-sum extensive-form games. These methods fundamentally rely on the hierarchical structure of the players' sequential strategy spaces to construct a regret minimizer that recursively minimizes regret at each decision point in the game tree. In this paper, we introduce the first efficient regret minimization algorithm for computing extensive-form correlated equilibria in large two-player general-sum games with no chance moves. Designing such an algorithm is significantly more challenging than designing one for the Nash equilibrium counterpart, as the constraints that define the space of correlation plans lack the hierarchical structure and might even form cycles. We show that some of the constraints are redundant and can be excluded from consideration, and present an efficient algorithm that generates the space of extensive-form correlation plans incrementally from the remaining constraints. This structural decomposition is achieved via a special convexity-preserving operation that we coin scaled extension. We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer. Our algorithm produces feasible iterates. Experiments show that it significantly outperforms prior approaches, and for larger problems it is the only viable option.

Posted Content
TL;DR: The SSS* algorithm has long been considered hard to understand and too complex to fathom, visualize and grasp on an intellectual level; this article tries to bridge that gap and provides experimental results comparing it with the most promising alpha-beta enhancements.
Abstract: The alpha-beta pruning algorithms have been popular in game tree searching ever since they were discovered. Numerous enhancements have been proposed in the literature, and it is often overwhelming to decide which would be best to implement: a given enhancement can take far too long to fine-tune its hyperparameters, or to reveal that it will not make much of a difference due to memory limitations. On the other side are the best-first pruning techniques, mostly counterparts of the infamous SSS* algorithm, which proved disruptive at the time of its discovery but gradually became an outcast for being too memory-intensive and having a higher time complexity. Later research does not see the best-first approaches as completely different from the depth-first enhancements; rather, the two are transitional, in the sense that a best-first approach can be viewed as a depth-first approach with a certain set of enhancements, and with the growing power of computers SSS* no longer seems as taxing on memory either. Even so, there is considerable difficulty in understanding the nature of the SSS* algorithm and why it does what it does, and it has been termed too complex to fathom, visualize and understand on an intellectual level. This article tries to bridge this gap and provides some experimental results comparing the two approaches with the most promising advances.
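
The reconciliation mentioned above is usually credited to the MTD(f) framework, in which SSS* behaves like MTD(+infinity): a best-first search expressed as repeated zero-window calls to an alpha-beta routine backed by a transposition table. A hedged sketch follows; `alphabeta_with_memory` is an assumed helper (a standard alpha-beta that caches node bounds), not a function from the article.

```python
# Hedged sketch of MTD(f). Each pass runs a zero-window alpha-beta search
# (assumed helper `alphabeta_with_memory`, which caches bounds in a
# transposition table) and uses the fail direction to tighten the bounds
# around the true minimax value. Starting from f = +infinity recovers
# SSS*-like best-first behavior.

def mtdf(root, depth, f=0):
    g, lower, upper = f, float("-inf"), float("inf")
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = alphabeta_with_memory(root, beta - 1, beta, depth)  # zero window
        if g < beta:
            upper = g   # search failed low: new upper bound
        else:
            lower = g   # search failed high: new lower bound
    return g
```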

Journal Article
TL;DR: By analysing the critical indicators of the incomplete-information game model and the mathematical method for solving Bayes-Nash equilibrium, the risk-averse income function for information assets is reformulated as the problem of maximizing the return at the equilibrium point.
Abstract: Traditional attack-defense game-theory models have shortcomings in accounting for the dynamic change of security risk. By analysing the critical indicators of the incomplete-information game model, the incomplete-information attack-defense game model, and the mathematical method for solving Bayes-Nash equilibrium, the risk-averse income function for information assets is reformulated as the problem of maximizing the return at the equilibrium point, in order to obtain the functional relationship between the optimal attack-defense strategy combination and the security and risk probabilities of the information assets. Attack-defense examples are used to visually analyse and demonstrate the incomplete-information game and the Harsanyi transformation. First, the incomplete-information game and the Harsanyi transformation are discussed through attack-defense examples and the game tree. Then the strategy expression of the incomplete-information static game and the mathematical method for Bayes-Nash equilibrium are given. After that, the paper focuses on the attack-defense game problem of insecure information networks based on risk aversion: the attack-defense problem is formulated as utility maximization, and the Bayes-Nash equilibrium of the attack-defense game is computed around the security risk of the assets. Finally, the application of the model to network-security penetration and defense is analysed by designing a simulated attack-defense penetration example. The analysis results show that the constructed income-function model is feasible and practical.

Proceedings ArticleDOI
03 Jun 2019
TL;DR: Building on a variety of pruning optimizations, the PSO algorithm is applied to Amazons in combination with the UCT algorithm; experiments validate the effectiveness and feasibility of the approach, and the win rate and search efficiency are effectively improved.
Abstract: Amazons is a complete-information game. The win rate of a game-tree search algorithm is closely tied to its evaluation function: accurate evaluation not only greatly improves the win rate, but also improves search efficiency to some extent. Building on a variety of pruning optimization algorithms, the PSO algorithm is applied to Amazons and, combined with the UCT algorithm, an evaluation-optimization strategy for Amazons is proposed. The effectiveness and feasibility of the algorithm are validated by experiments, and the win rate and search efficiency are effectively improved.

Posted Content
Xu Cao, Yanghao Lin
TL;DR: In this paper, the authors combine adaptive dynamic programming (ADP), a reinforcement-learning method, and the UCB applied to trees (UCT) algorithm with a more powerful heuristic function based on the Progressive Bias method and two pruning strategies, for the traditional board game Gomoku.
Abstract: We combine Adaptive Dynamic Programming (ADP), a reinforcement-learning method, and the UCB applied to trees (UCT) algorithm with a more powerful heuristic function based on the Progressive Bias method and two pruning strategies, for the traditional board game Gomoku. For the Adaptive Dynamic Programming part, we train a shallow feedforward neural network to give a quick evaluation of Gomoku board situations. UCT is a general tree-policy approach in MCTS. Our framework uses UCT to balance the exploration and exploitation of Gomoku game trees, while we also apply powerful pruning strategies and the heuristic function to re-select the available 2-adjacent grids of the state, and use ADP instead of simulation to give estimated values of expanded nodes. Experimental results show that this method can eliminate the search-depth defect of the simulation process and converge to the correct value faster than plain UCT. This approach can be applied to design new Gomoku AIs and to solve other Gomoku-like board games.
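
As a rough illustration of how a heuristic such as a trained ADP network can be folded into UCT, here is a Progressive Bias-style selection rule: the heuristic term decays with visits, so it steers early search and fades as statistics accumulate. The node interface and the exact bias form H/(visits+1) are assumptions for this sketch, not the paper's implementation.

```python
import math

# Hedged sketch of UCT selection with a Progressive Bias term: the usual
# UCB1 score plus heuristic(state) / (visits + 1), which dominates while a
# node is fresh and vanishes as it is visited. Node fields (`visits`,
# `total_reward`, `state`) are illustrative assumptions.

def progressive_bias_select(children, heuristic, c=1.4):
    parent_visits = sum(ch.visits for ch in children)

    def score(ch):
        if ch.visits == 0:
            return float("inf")  # try every child once
        exploit = ch.total_reward / ch.visits
        explore = c * math.sqrt(math.log(parent_visits) / ch.visits)
        bias = heuristic(ch.state) / (ch.visits + 1)
        return exploit + explore + bias

    return max(children, key=score)
```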

Patent
16 Oct 2019
TL;DR: In this paper, an autonomous vehicle (AV) planning method comprises: receiving sensor inputs pertaining to an AV; processing the AV sensor inputs to determine an encountered driving scenario; in an AV planner, executing a tree search algorithm to determine a sequence of AV manoeuvres corresponding to a path through a constructed game tree; and generating AV control signals for executing the determined sequence of manoeuvres.
Abstract: An autonomous vehicle (AV) planning method comprises: receiving sensor inputs pertaining to an AV; processing the AV sensor inputs to determine an encountered driving scenario; in an AV planner, executing a tree search algorithm to determine a sequence of AV manoeuvres corresponding to a path through a constructed game tree; and generating AV control signals for executing the determined sequence of AV manoeuvres; wherein the game tree has a plurality of nodes representing anticipated states of the encountered driving scenario, and the anticipated driving scenario state of each child node is determined by updating the driving scenario state of its parent node based on (i) a candidate AV manoeuvre and (ii) an anticipated behaviour of at least one external agent in the encountered driving scenario.