
Showing papers on "Stochastic game published in 2018"


Book Chapter
14 Apr 2018
TL;DR: In this paper, a two-player turn-based stochastic game is formulated to generate adversarial examples, where the first player's objective is to minimize the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random.
Abstract: Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. Most existing approaches for crafting adversarial examples necessitate some knowledge (architecture, parameters, etc.) of the network at hand. In this paper, we focus on image classifiers and propose a feature-guided black-box approach to test the safety of deep neural networks that requires no such knowledge. Our algorithm employs object detection techniques such as SIFT (Scale Invariant Feature Transform) to extract features from an image. These features are converted into a mutable saliency distribution, where high probability is assigned to pixels that affect the composition of the image with respect to the human visual system. We formulate the crafting of adversarial examples as a two-player turn-based stochastic game, where the first player’s objective is to minimise the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random. We show that, theoretically, the two-player game can converge to the optimal strategy, and that the optimal strategy represents a globally minimal adversarial image. For Lipschitz networks, we also identify conditions that provide safety guarantees that no adversarial examples exist. Using Monte Carlo tree search, we gradually explore the game state space to search for adversarial examples. Our experiments show that, despite the black-box setting, manipulations guided by a perception-based saliency distribution are competitive with state-of-the-art methods that rely on white-box saliency matrices or sophisticated optimization procedures. Finally, we show how our method can be used to evaluate the robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.

213 citations
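The saliency step in the abstract above (image features turned into a probability distribution over pixels, from which the first player picks pixels to manipulate) can be sketched as follows. This is a minimal illustration under stated assumptions: keypoints are taken as already extracted (for example by a SIFT detector, not shown) and represented as hypothetical `(x, y, scale, strength)` tuples, and the Gaussian weighting around each keypoint is an illustrative choice rather than the paper's exact construction. The game-playing and Monte Carlo tree search parts are not shown.

```python
import math
import random

def saliency_distribution(keypoints, width, height):
    """Map hypothetical (x, y, scale, strength) keypoints to a pixel distribution.

    Each keypoint adds a Gaussian bump of standard deviation `scale`, weighted
    by `strength`; the weights are then normalised so they sum to 1.
    """
    weights = {}
    for px in range(width):
        for py in range(height):
            w = 0.0
            for kx, ky, scale, strength in keypoints:
                d2 = (px - kx) ** 2 + (py - ky) ** 2
                w += strength * math.exp(-d2 / (2.0 * scale ** 2))
            weights[(px, py)] = w
    total = sum(weights.values())
    return {pixel: w / total for pixel, w in weights.items()}

def sample_pixels(dist, k, rng):
    """Draw k pixels (with replacement) to manipulate, proportionally to saliency."""
    pixels = list(dist)
    return rng.choices(pixels, weights=[dist[p] for p in pixels], k=k)

# Hypothetical keypoints on an 8x8 image: one strong, one weaker and wider.
keypoints = [(2, 2, 1.0, 1.0), (6, 5, 1.5, 0.5)]
dist = saliency_distribution(keypoints, width=8, height=8)
picked = sample_pixels(dist, k=5, rng=random.Random(0))
```

Because the distribution is an ordinary dictionary, it can be reweighted after each move to mimic the "mutable" distribution the abstract describes; how the paper performs that update is not specified here.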


Journal Article
01 Jul 2018-Nature
TL;DR: This framework shows which feedbacks between exploitation and environment—either naturally occurring or designed—help to overcome social dilemmas, and finds that the dependence of the public resource on previous interactions can greatly enhance the propensity for cooperation.
Abstract: Social dilemmas occur when incentives for individuals are misaligned with group interests [1–7]. According to the ‘tragedy of the commons’, these misalignments can lead to overexploitation and collapse of public resources. The resulting behaviours can be analysed with the tools of game theory [8]. The theory of direct reciprocity [9–15] suggests that repeated interactions can alleviate such dilemmas, but previous work has assumed that the public resource remains constant over time. Here we introduce the idea that the public resource is instead changeable and depends on the strategic choices of individuals. An intuitive scenario is that cooperation increases the public resource, whereas defection decreases it. Thus, cooperation allows the possibility of playing a more valuable game with higher payoffs, whereas defection leads to a less valuable game. We analyse this idea using the theory of stochastic games [16–19] and evolutionary game theory. We find that the dependence of the public resource on previous interactions can greatly enhance the propensity for cooperation. For these results, the interaction between reciprocity and payoff feedback is crucial: neither repeated interactions in a constant environment nor single interactions in a changing environment yield similar cooperation rates. Our framework shows which feedbacks between exploitation and environment, either naturally occurring or designed, help to overcome social dilemmas.

159 citations
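The feedback idea (cooperation keeps players in a more valuable game, defection moves them to a less valuable one) can be illustrated with a small two-state repeated donation game. This sketch assumes hypothetical benefits of 3.0 and 1.5 in the two states, a cooperation cost of 1.0, and a deterministic transition rule in which mutual cooperation leads to the valuable state and any defection to the degraded one; it only plays out fixed strategies and does not reproduce the paper's evolutionary analysis.

```python
def play(strategy_a, strategy_b, benefits, cost, rounds):
    """Average per-round payoffs of two players in a two-state donation game.

    State 0 is the valuable game, state 1 the degraded one. Assumed transition
    rule: mutual cooperation leads to state 0, anything else to state 1.
    """
    state, last_a, last_b = 0, None, None  # start in the valuable state, no history
    total_a = total_b = 0.0
    for _ in range(rounds):
        a = strategy_a(state, last_a, last_b)
        b = strategy_b(state, last_b, last_a)
        # A donor pays `cost`; the recipient gets the current state's benefit.
        total_a += (benefits[state] if b else 0.0) - (cost if a else 0.0)
        total_b += (benefits[state] if a else 0.0) - (cost if b else 0.0)
        state = 0 if (a and b) else 1
        last_a, last_b = a, b
    return total_a / rounds, total_b / rounds

# Strategies see (state, own last action, co-player's last action); True = cooperate.
def always_defect(state, mine, theirs):
    return False

def tit_for_tat(state, mine, theirs):
    return True if theirs is None else theirs
```

Two tit-for-tat players stay in the valuable state and earn 2.0 per round, while an unconditional defector facing tit-for-tat collects the high benefit once and then drifts into mutual defection in the degraded state, averaging only 0.003 per round over 1000 rounds.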


Journal Article
TL;DR: Numerical results demonstrate that the proposed algorithm can achieve superior performance in terms of average network delay and content distribution efficiency compared with the other heuristic schemes.
Abstract: Driven by the evolutionary development of the automobile industry and cellular technologies, dependable vehicular connectivity has become essential to realize future intelligent transportation systems (ITS). In this paper, we investigate how to achieve dependable content distribution in device-to-device (D2D)-based cooperative vehicular networks by combining big data-based vehicle trajectory prediction with coalition formation game-based resource allocation. First, vehicle trajectory is predicted based on global positioning system and geographic information system data, which is critical for finding reliable and long-lasting vehicle connections. Then, the determination of content distribution groups with different lifetimes is formulated as a coalition formation game. We model the utility function based on the minimization of average network delay, which is transferable to the individual payoff of each coalition member according to its contribution. The merge and split process is implemented iteratively based on preference relations, and the final partition is proved to converge to a Nash-stable equilibrium. Finally, we evaluate the proposed algorithm based on a real-world map and realistic vehicular traffic. Numerical results demonstrate that the proposed algorithm can achieve superior performance in terms of average network delay and content distribution efficiency compared with the other heuristic schemes.

142 citations
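The merge-and-split process mentioned in the abstract can be sketched generically as follows. The sketch is a simplification: it compares partitions with a utilitarian rule (merge or split only if the total coalition value strictly increases) instead of the paper's preference relations, and the pairwise link-quality value function and its weights are hypothetical stand-ins for the delay-based utility. Each accepted move raises the total value, so the loop terminates on a finite player set.

```python
from itertools import combinations

EPS = 1e-12  # strict-improvement tolerance

def try_merge(partition, value):
    """Merge the first pair of coalitions that is worth strictly more together."""
    for c1, c2 in combinations(partition, 2):
        if value(c1 | c2) > value(c1) + value(c2) + EPS:
            return [c for c in partition if c not in (c1, c2)] + [c1 | c2]
    return None

def try_split(partition, value):
    """Split the first coalition whose two parts are worth strictly more apart."""
    for c in partition:
        members = sorted(c)
        for r in range(1, len(members)):
            for part in combinations(members, r):
                s1 = frozenset(part)
                s2 = c - s1
                if value(s1) + value(s2) > value(c) + EPS:
                    return [d for d in partition if d != c] + [s1, s2]
    return None

def merge_and_split(players, value):
    """Start from singletons and apply merge/split moves until none applies."""
    partition = [frozenset([p]) for p in players]
    while True:
        nxt = try_merge(partition, value) or try_split(partition, value)
        if nxt is None:
            return partition
        partition = nxt

# Hypothetical link qualities: vehicles 0-1 and 2-3 travel together, others do not.
WEIGHTS = {(0, 1): 2.0, (2, 3): 2.0}

def toy_value(coalition):
    return sum(WEIGHTS.get(pair, -1.0) for pair in combinations(sorted(coalition), 2))
```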


Journal Article
TL;DR: The potential game approach enables us to study the existence and uniqueness of the Nash equilibrium and to design an online distributed algorithm to achieve that equilibrium and results show that the proposed algorithm can increase the energy hubs’ average payoff by 18.8%.
Abstract: With the increasing presence of co- and tri-generating units, energy hub operators are encouraged to optimally schedule the available energy resources in an economic way. This scheduling needs to be run in an online manner due to the uncertainties in energy prices and demands. In this paper, the real-time scheduling problem of energy hubs is formulated in a dynamic pricing market. The energy hubs’ interaction is modeled as an exact potential game to optimize each energy hub’s payments to the electricity and gas utilities, as well as the customers’ satisfaction from energy consumption. The potential game approach enables us to study the existence and uniqueness of the Nash equilibrium and to design an online distributed algorithm to achieve that equilibrium. Simulation results show that the proposed algorithm can increase the energy hubs’ average payoff by 18.8%. Furthermore, energy service companies can improve the technical performance of energy networks by reducing the peak-to-average ratio in the electricity and natural gas by 27% and 7%, respectively. When compared with a centralized approach with the objective of social welfare, the proposed algorithm has a significantly lower running time at the cost of lower social welfare.

129 citations
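An exact potential game lets simple best-response updates reach a Nash equilibrium, which is the property the abstract relies on for its distributed algorithm. The sketch below assumes a hypothetical quadratic model (hub i consumes `x_i`, gains `a_i * x_i - (b/2) * x_i**2`, and pays a unit price `c` times total consumption); it is not the paper's energy-hub formulation, but it is an exact potential game, so round-robin best responses converge.

```python
def best_response(i, x, a, b, c, x_max):
    """Maximiser of a_i x_i - (b/2) x_i^2 - c * X * x_i (X = total), clipped to [0, x_max]."""
    others = sum(x) - x[i]
    return min(max((a[i] - c * others) / (b + 2 * c), 0.0), x_max)

def potential(x, a, b, c):
    """Exact potential: a unilateral change moves it by the mover's utility change."""
    own = sum(ai * xi - (b / 2 + c) * xi ** 2 for ai, xi in zip(a, x))
    cross = sum(x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x)))
    return own - c * cross

def best_response_dynamics(a, b, c, x_max, tol=1e-12, max_sweeps=10_000):
    """Players update one at a time; stop once a full sweep changes nothing."""
    x = [0.0] * len(a)
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(len(a)):
            new = best_response(i, x, a, b, c, x_max)
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            break
    return x
```

For `a = [10, 8, 6]` and `b = c = 1`, the interior equilibrium satisfies `2 * x_i + X = a_i`, giving X = 4.8 and x = (2.6, 1.6, 0.6), which the iteration reproduces.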


Journal Article
TL;DR: A stochastic game-theoretic approach is proposed to analyze the optimal strategies that a power grid defender can adopt to protect the grid against coordinated attacks, and an optimal load shedding technique is devised to quantify the physical impacts of coordinated attacks.
Abstract: Due to the global reliance on the power grid, coordinated cyber-physical attacks on its critical infrastructure can lead to disastrous human and economic losses. In this paper, a stochastic game-theoretic approach is proposed to analyze the optimal strategies that a power grid defender can adopt to protect the grid against coordinated attacks. First, an optimal load shedding technique is devised to quantify the physical impacts of coordinated attacks. Taking these quantified impacts as input parameters, the interactions between a malicious attacker and the defender are modeled using a resource allocation stochastic game. The game is shown to admit a Nash equilibrium, and a novel learning algorithm is introduced to enable the two players to reach their equilibrium strategies while maximizing their respective minimum rewards in a sequence of stages. The convergence of the proposed algorithm to a Nash equilibrium point is proved and its properties are studied. Simulation results of the stochastic game model on the WSCC 9-bus system and the IEEE 118-bus system are contrasted with those of static games and show that different amounts of defense resources lead to different defense strategies.

124 citations
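The maximin reasoning behind each stage of such an attacker-defender game can be illustrated with a single 2x2 zero-sum matrix game. In this sketch the defender (rows) chooses which of two hypothetical assets to protect, the attacker (columns) chooses which to attack, and the entries are made-up defender rewards; the closed-form mixed solution applies only to 2x2 games without a pure saddle point, and the paper's multi-stage learning algorithm is not reproduced.

```python
def solve_2x2_zero_sum(m):
    """Row player's maximin for a 2x2 zero-sum game.

    m[i][j] is the row player's reward when row i meets column j.
    Returns (game value, probability of playing row 0).
    """
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # best guaranteed reward with a pure row
    upper = min(max(a, c), max(b, d))  # column player's best pure cap
    if lower == upper:  # pure saddle point
        return lower, (1.0 if min(a, b) >= min(c, d) else 0.0)
    denom = a - b - c + d
    p = (d - c) / denom  # makes the column player indifferent between columns
    return (a * d - b * c) / denom, p

# Rows: protect asset A / protect asset B. Columns: attack A / attack B.
# Hypothetical losses: 2 if A is hit unprotected, 4 if B is hit unprotected.
DEFENDER_REWARD = [[0.0, -4.0],
                   [-2.0, 0.0]]
```

With these numbers the defender protects asset A with probability 1/3 (so the more costly asset B with probability 2/3) and guarantees an expected reward of -4/3 whichever asset is attacked.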


Journal Article
TL;DR: An online load scheduling learning (LSL) algorithm based on the actor-critic method to determine the users’ MPE policy is developed and results show that the LSL algorithm can reduce the expected cost of users and the peak-to-average ratio in the aggregate load by 28% and 13%, respectively.
Abstract: A demand response program with real-time pricing can encourage electricity users to schedule their energy usage in off-peak hours. A user needs to schedule the energy usage of his appliances in an online manner since he may not know the energy prices and the demand of his appliances ahead of time. In this paper, we study the users’ long-term load scheduling problem and model the changes of the price information and load demand as a Markov decision process, which enables us to capture the interactions among users as a partially observable stochastic game. To make the problem tractable, we approximate the users’ optimal scheduling policy by the Markov perfect equilibrium (MPE) of a fully observable stochastic game with incomplete information. We develop an online load scheduling learning (LSL) algorithm based on the actor-critic method to determine the users’ MPE policy. When compared with the benchmark of not performing demand response, simulation results show that the LSL algorithm can reduce the expected cost of users and the peak-to-average ratio in the aggregate load by 28% and 13%, respectively. When compared with short-term scheduling policies, users with long-term policies can reduce their expected cost by 17%.

120 citations
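The performance figures quoted above are ratios that are easy to compute once load profiles are known. The sketch below uses the standard definition of peak-to-average ratio (maximum of the aggregate load divided by its mean) and a plain percentage reduction; the per-slot user profiles in the checks are hypothetical, and the actor-critic LSL algorithm itself is not shown.

```python
def aggregate_load(user_profiles):
    """Sum per-user load profiles slot by slot."""
    return [sum(slot) for slot in zip(*user_profiles)]

def peak_to_average_ratio(load):
    """Peak load divided by mean load over the scheduling horizon."""
    return max(load) / (sum(load) / len(load))

def percent_reduction(before, after):
    return 100.0 * (before - after) / before

# Hypothetical profiles over four time slots, before and after shifting load.
before = aggregate_load([[1, 1, 1, 4], [1, 1, 1, 2]])  # peak in the last slot
after = aggregate_load([[2, 1, 1, 2], [1, 2, 2, 1]])   # same energy, flattened
```

Shifting the same total energy out of the peak slot takes the ratio from 2.0 to 1.0, a 50% reduction in this toy case.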


Journal Article
TL;DR: A coevolutionary model is considered in which, besides the payoff-driven competition of cooperator and defector players, the level of a renewable resource depends sensitively on the fraction of cooperators and the total consumption of all players.
Abstract: Utilizing common resources is always a dilemma for community members. While cooperator players restrain themselves and consider the proper state of resources, defectors demand more than their supposed share for a higher payoff. To avoid the tragedy of the common state, punishing the latter group seems to be an adequate reaction. This conclusion, however, is less straightforward when we acknowledge the fact that resources are finite and even a renewable resource has limited growing capacity. To clarify the possible consequences, we consider a coevolutionary model where, besides the payoff-driven competition of cooperator and defector players, the level of a renewable resource depends sensitively on the fraction of cooperators and the total consumption of all players. The applied feedback-evolving game reveals that, besides a delicately adjusted punishment, it is also fundamental that cooperators pay special attention to the growing capacity of renewable resources. Otherwise, even tough punishment cannot save the community from an undesired end.

116 citations


Journal Article
TL;DR: This paper identifies the sustainability of a supply chain with the equilibrium of the system over a long (but finite) period of time, after integrating the various dimensions.

101 citations


Journal Article
TL;DR: A zero-sum, hybrid-state stochastic game model for designing defense policies for cyber-physical systems against different types of attacks is established; a suboptimal value iteration algorithm for a finite-horizon game is proposed, and it is proved that the algorithm results in an upper bound for the value of the finite-horizon game.

72 citations
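For a finite-horizon zero-sum stochastic game, the basic value-iteration idea is backward induction: at each stage, solve the matrix game whose entries are the immediate reward plus the expected continuation value. The sketch below assumes finitely many discrete states and 2x2 stage games (so each stage can be solved in closed form); the paper's hybrid (continuous and discrete) state space and its suboptimal, upper-bounding variant are not reproduced, and all numbers are hypothetical.

```python
def val_2x2(m):
    """Value of a 2x2 zero-sum game for the row (maximising) player."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:  # pure saddle point
        return lower
    return (a * d - b * c) / (a - b - c + d)

def finite_horizon_values(rewards, transitions, horizon):
    """Backward induction for a zero-sum stochastic game with 2x2 stage games.

    rewards[s][i][j]: immediate reward to the row player in state s;
    transitions[s][i][j][t]: probability of moving from s to state t.
    Returns the stage-0 value of each state (terminal values are 0).
    """
    n = len(rewards)
    v = [0.0] * n
    for _ in range(horizon):
        v = [
            val_2x2([[rewards[s][i][j]
                      + sum(p * v[t] for t, p in enumerate(transitions[s][i][j]))
                      for j in range(2)]
                     for i in range(2)])
            for s in range(n)
        ]
    return v

# One hypothetical self-looping state whose stage game has value -4/3.
REWARDS = [[[0.0, -4.0], [-2.0, 0.0]]]
TRANSITIONS = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
```

With a single self-looping state, adding a constant continuation value simply shifts the stage value, so a three-stage horizon gives -4.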


Journal Article
TL;DR: The idea is to help companies that are considering a collaborative opportunity to evaluate the value of the information that would be shared, so that efforts are expended only on potential collaborations that offer an acceptable reward for the risk.

69 citations


Journal Article
TL;DR: This work investigates a virtualized RAN, where the CNC auctions channels at the beginning of scheduling slots to the mobile terminals (MTs) based on bids from their subscribing WSPs, and decomposes the decision making process of each WSP as a Markov decision process (MDP).
Abstract: How to allocate the limited wireless resource in dense radio access networks (RANs) remains challenging. By leveraging a software-defined control plane, the independent base stations (BSs) are virtualized as a centralized network controller (CNC). Such virtualization decouples the CNC from the wireless service providers (WSPs). We investigate a virtualized RAN, where the CNC auctions channels at the beginning of scheduling slots to the mobile terminals (MTs) based on bids from their subscribing WSPs. Each WSP aims at maximizing the expected long-term payoff from bidding channels to satisfy the MTs for transmitting packets. We formulate the problem as a stochastic game, where the channel auction and packet scheduling decisions of a WSP depend on the state of network and the control policies of its competitors. To approach the equilibrium solution, an abstract stochastic game is proposed with bounded regret. The decision making process of each WSP is modeled as a Markov decision process (MDP). To address the signalling overhead and computational complexity issues, we decompose the MDP into a series of single-agent MDPs with reduced state spaces, and derive an online localized algorithm to learn the state value functions. Our results show significant performance improvements in terms of per-MT average utility.
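Each decomposed single-agent MDP is ultimately characterised by its state value function. As a simpler, model-based stand-in for the paper's online localized learning algorithm, the sketch below runs standard value iteration on a hypothetical two-state WSP model: state 1 means a packet is queued, bidding yields a net utility of 1.0 and clears the queue, waiting costs 0.5, and a new packet arrives with probability 0.5. All names and numbers are illustrative assumptions.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-10):
    """Bellman optimality iteration; transition(s, a) -> [(prob, next_state), ...]."""
    v = {s: 0.0 for s in states}
    while True:
        new_v = {
            s: max(reward(s, a) + gamma * sum(p * v[t] for p, t in transition(s, a))
                   for a in actions(s))
            for s in states
        }
        if max(abs(new_v[s] - v[s]) for s in states) < tol:
            return new_v
        v = new_v

# Hypothetical WSP model: state 0 = queue empty, state 1 = packet queued.
def actions(s):
    return ["idle"] if s == 0 else ["bid", "wait"]

def transition(s, a):
    if s == 0 or a == "bid":
        return [(0.5, 1), (0.5, 0)]  # queue empty or cleared; new packet w.p. 0.5
    return [(1.0, 1)]  # waiting keeps the packet queued

def reward(s, a):
    return {"idle": 0.0, "bid": 1.0, "wait": -0.5}[a]
```

With discount 0.9, the Bellman equations give V(0) = 4.5 and V(1) = 5.5, and bidding is optimal whenever a packet is queued.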

Journal Article
TL;DR: Simulation results justify the convergence of the proposed algorithms and present new insights toward more efficient energy management in the smart grids.
Abstract: In this paper, the problem of smart grid energy management under stochastic dynamics is investigated. In the considered model, at the demand side, it is assumed that customers can act as prosumers who own renewable energy sources and can both produce and consume energy. Due to the coupling between the prosumers’ decisions and the stochastic nature of renewable energy, the interaction among prosumers is formulated as a stochastic game, in which each prosumer seeks to maximize its payoff, in terms of revenues, by controlling its energy consumption and demand. In particular, the subjective behavior of prosumers is explicitly reflected in their payoff functions using prospect theory, a powerful framework that allows modeling real-life human choices rather than the objective, user-agnostic decisions assumed by normative models. For this prospect-based stochastic game, it is shown that there always exists a stationary Nash equilibrium in which the prosumers’ trading policies are independent of time and of their histories of play. Moreover, to obtain one such equilibrium policy, a novel distributed algorithm with no information sharing among prosumers is proposed and shown to converge to an $\epsilon$-Nash equilibrium, in which each prosumer achieves its optimal equilibrium payoff up to a small additive error $\epsilon$. On the other hand, at the supply side, the interaction between the utility company and the prosumers is formulated as an online optimization problem in which the utility company's goal is to learn its optimal energy allocation rules. For this case, it is shown that the optimization problem admits a no-regret algorithm, meaning that regardless of the actual outcome of the game among the prosumers, the utility company can follow a strategy that mitigates its allocation costs as if it knew the entire demand market a priori. Simulation results justify the convergence of the proposed algorithms and present new insights toward more efficient energy management in smart grids.
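Two ingredients of the abstract can be made concrete: a prospect-theoretic value function and the epsilon-Nash condition. The sketch assumes the commonly used Tversky-Kahneman form with curvature 0.88 and loss aversion 2.25 (the paper's exact payoff functions may differ), and a hypothetical one-shot market in which two prosumers each sell 0, 1 or 2 units at price `4 - total` against a reference revenue of 1.0.

```python
def prospect_value(outcome, reference=0.0, curvature=0.88, loss_aversion=2.25):
    """Value relative to a reference point: concave for gains, steeper for losses."""
    x = outcome - reference
    if x >= 0:
        return x ** curvature
    return -loss_aversion * (-x) ** curvature

def is_epsilon_nash(profile, payoff, strategy_sets, eps):
    """True if no player can gain more than eps by a unilateral deviation."""
    for i, options in enumerate(strategy_sets):
        current = payoff(i, profile)
        for alt in options:
            deviated = list(profile)
            deviated[i] = alt
            if payoff(i, tuple(deviated)) > current + eps:
                return False
    return True

# Hypothetical toy market: each prosumer sells 0, 1 or 2 units at price 4 - total.
def market_payoff(i, profile):
    revenue = profile[i] * (4 - sum(profile))
    return prospect_value(revenue, reference=1.0)

SELL_OPTIONS = [(0, 1, 2), (0, 1, 2)]
```

In this toy market, selling one unit each is an equilibrium, whereas both selling two units is not: either prosumer gains 2.25 in prospect value by cutting back, so that profile only becomes epsilon-Nash once epsilon is at least 2.25.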

Journal Article
TL;DR: It is shown that the only strategies that enforce a linear relationship between the two players' payoffs are either the ZD strategies or unconditional strategies, which cooperate with a fixed probability in each round of the game regardless of the history; this proves a conjecture previously made for infinitely repeated games.
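The linear payoff relationship enforced by zero-determinant (ZD) strategies can be checked numerically in the infinitely repeated prisoner's dilemma with memory-one strategies. This sketch assumes standard payoffs (R, S, T, P) = (3, 0, 5, 1) and uses the equalizer strategy p = (2/3, 0, 2/3, 1/3), which satisfies the ZD condition for these payoffs and pins the co-player's long-run score at 2 whatever memory-one strategy q they use; the paper's general result is not reproduced. The stationary distribution is found by power iteration, which assumes the induced Markov chain is ergodic (true for the interior q values in the checks).

```python
R, S, T, P = 3.0, 0.0, 5.0, 1.0  # standard prisoner's dilemma payoffs

def stationary_distribution(p, q, iters=5000):
    """Long-run frequencies of CC, CD, DC, DD (X's view) for memory-one strategies.

    p and q give each player's cooperation probability after CC, CD, DC, DD,
    each written from that player's own point of view.
    """
    qx = (q[0], q[2], q[1], q[3])  # Y sees X's CD as DC and vice versa
    v = [0.25] * 4
    for _ in range(iters):
        nxt = [0.0] * 4
        for s in range(4):
            px, py = p[s], qx[s]
            nxt[0] += v[s] * px * py
            nxt[1] += v[s] * px * (1 - py)
            nxt[2] += v[s] * (1 - px) * py
            nxt[3] += v[s] * (1 - px) * (1 - py)
        v = nxt
    return v

def long_run_scores(p, q):
    cc, cd, dc, dd = stationary_distribution(p, q)
    score_x = cc * R + cd * S + dc * T + dd * P
    score_y = cc * R + cd * T + dc * S + dd * P
    return score_x, score_y

EQUALIZER = (2 / 3, 0.0, 2 / 3, 1 / 3)  # ZD equalizer for these payoffs
```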

Journal Article
TL;DR: This paper explores the coordination between a supplier and a buyer within a decentralized supply chain, through the use of quantity discounts in a game theoretic model, and proposes both cooperative and non-cooperative approaches considering that the product traded experiences a price sensitive demand.

Journal Article
TL;DR: A robust approach to pricing and hedging in mathematical finance is pursued, and a general pricing–hedging duality result is obtained: the infimum over superhedging prices of an exotic option with payoff $G$ is equal to the supremum of expectations of $G$ under calibrated martingale measures.
Abstract: We pursue a robust approach to pricing and hedging in mathematical finance. We consider a continuous-time setting in which some underlying assets and options, with continuous price paths, are available for dynamic trading and a further set of European options, possibly with varying maturities, is available for static trading. Motivated by the notion of prediction set in Mykland (Ann. Stat. 31:1413–1438, 2003), we include modelling beliefs in our setup by allowing one to specify a set of paths to be considered, e.g. superreplication of a contingent claim is required only for paths falling in the given set. Our framework thus interpolates between model-independent and model-specific settings and allows us to quantify the impact of making assumptions or gaining information. We obtain a general pricing–hedging duality result: the infimum over superhedging prices of an exotic option with payoff $G$ is equal to the supremum of expectations of $G$ under calibrated martingale measures. Our results include in particular the martingale optimal transport duality of Dolinsky and Soner (Probab. Theory Relat. Fields 160:391–427, 2014) and extend it to multiple dimensions, multiple maturities and beliefs which are invariant under time-changes. In a general setting with arbitrary beliefs and for a uniformly continuous $G$, the asserted duality holds between limiting values of perturbed problems.
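In the simplest discrete setting (one period, zero interest rate, finitely many possible prices of a single traded asset), the pricing–hedging duality reduces to linear-programming duality and can be checked directly. The sketch below is that toy analogue only, not the paper's continuous-time framework with prediction sets and static option positions: `superhedging_price` finds the cheapest initial capital that, with some hedge ratio, dominates the payoff in every outcome, and `sup_martingale_expectation` maximises the expected payoff over martingale measures, using the fact that the extreme martingale measures charge at most two outcomes. The current price must lie strictly inside the range of outcomes.

```python
def superhedging_price(s0, prices, payoff):
    """Cheapest x such that x + h * (S - s0) >= G(S) in every outcome, for some h."""
    moves = [s - s0 for s in prices]
    # The upper envelope of the lines G_k - h * d_k is convex and piecewise linear,
    # so its minimum sits where two lines with different slopes cross.
    candidates = [0.0]
    for k in range(len(prices)):
        for l in range(k + 1, len(prices)):
            if moves[k] != moves[l]:
                candidates.append((payoff[k] - payoff[l]) / (moves[k] - moves[l]))
    return min(max(g - h * d for g, d in zip(payoff, moves)) for h in candidates)

def sup_martingale_expectation(s0, prices, payoff):
    """Max of E_Q[G] over martingale measures Q (those with E_Q[S] = s0)."""
    values = []
    for si, gi in zip(prices, payoff):
        if si == s0:  # a point mass at s0 is itself a martingale measure
            values.append(gi)
        for sj, gj in zip(prices, payoff):
            if si < s0 < sj:  # two-point measure with mean s0
                qj = (s0 - si) / (sj - si)
                values.append((1 - qj) * gi + qj * gj)
    return max(values)

PRICES = [80.0, 100.0, 120.0]  # hypothetical terminal prices, current price 100
CALL = [0.0, 0.0, 20.0]        # call option struck at 100
```

For the call struck at 100, both quantities equal 10: the superhedge holds half a unit of the asset, and the maximising martingale measure puts probability 1/2 on each of 80 and 120.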

Journal Article
TL;DR: In this article, the authors assume that both payoff-based and conformity-based learning methods are present and compete for space within the framework of a coevolutionary model, and they reveal that the presence of a payoff-driven strategy learning method becomes exclusive for high sucker's payoff and/or high temptation values that represent a snowdrift game dilemma situation.
Abstract: Learning from a partner who collects a higher payoff is a frequently used working hypothesis in evolutionary game theory. One of the alternative dynamical rules is when the focal player prefers to follow the strategy choice of the majority in the local neighborhood, which is often called a conformity-driven strategy update. In this work we assume that both strategy learning methods are present and compete for space within the framework of a coevolutionary model. Our results reveal that the presence of a payoff-driven strategy learning method becomes exclusive for high sucker's payoff and/or high temptation values that represent a snowdrift game dilemma situation. In general, however, the competition of the mentioned strategy learning methods could be useful to enlarge the parameter space where only cooperators prevail. The success of cooperation is based on the enforced coordination of cooperator players which reveals the benefit of the latter strategy. Interestingly, the payoff-based and the conformity-based cooperator players can form an effective alliance against defectors that can also extend the parameter space of full cooperator solution in the stag-hunt game region. Our work highlights that the coevolution of strategies and individual features such as the learning method can provide a novel type of pattern formation mechanism that cannot be observed in a static model, and hence remains hidden in traditional models.
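The two competing update rules can be written as small functions. The pairwise-comparison (Fermi) rule with noise parameter K = 0.1 is a common choice for payoff-driven imitation in this literature, and the majority rule below is one simple reading of conformity-driven updating; both, including the tie-breaking convention, are assumptions for illustration rather than the paper's exact definitions.

```python
import math

def fermi_adopt_probability(own_payoff, neighbour_payoff, noise=0.1):
    """Payoff-driven rule: chance of copying a neighbour under the Fermi function."""
    exponent = (own_payoff - neighbour_payoff) / noise
    if exponent > 700:  # avoid math.exp overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))

def conformity_update(current, neighbour_strategies):
    """Conformity-driven rule: follow the local majority, keep `current` on a tie."""
    cooperators = neighbour_strategies.count("C")
    defectors = len(neighbour_strategies) - cooperators
    if cooperators > defectors:
        return "C"
    if defectors > cooperators:
        return "D"
    return current
```

In a coevolutionary model of the kind described, each player would additionally carry a trait saying which of the two rules it uses, and that trait can itself spread.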

Journal Article
TL;DR: The value function of each of the players can be approximated by the solution of a partial differential equation called the master equation and it is shown that it is governed by a solution to a stochastic differential equation.
Abstract: We consider an $n$-player symmetric stochastic game with weak interactions between the players. Time is continuous, and the horizon and the number of states are finite. We show that the value funct...

Journal Article
TL;DR: In this article, the authors consider a coevolutionary model where the local cooperation level determines the payoff values of the applied prisoner's dilemma game and show that a higher cooperation level may change the environment in a way that is beneficial for all competitors.
Abstract: Exploiting others is beneficial individually but it could also be detrimental globally. The reverse is also true: a higher cooperation level may change the environment in a way that is beneficial for all competitors. To explore the possible consequences of this feedback, we consider a coevolutionary model where the local cooperation level determines the payoff values of the applied prisoner's dilemma game. We observe that the coevolutionary rule provides a significantly higher cooperation level compared to the traditional setup, independently of the topology of the applied interaction graph. Interestingly, this cooperation-supporting mechanism offers lonely defectors a high chance of survival for a long period; hence the relaxation to the final cooperating state happens logarithmically slowly. As a consequence, the extension of the traditional evolutionary game by considering interactions with the environment provides a good opportunity for cooperators, but their reward may arrive with some delay.

Journal Article
TL;DR: New theoretical insights into Nash equilibrium-based asymptotic stability (NEAS) of two-group and three-group asymmetric evolutionary games in typical scenarios of electricity market (EM) are introduced and the complete dynamics behavior and multi-group evolutionary stable strategy (MESS) of the AEG system in 3-D mixed strategy space is demonstrated.
Abstract: This paper introduces new theoretical insights into Nash equilibrium-based asymptotic stability (NEAS) of two-group and three-group asymmetric evolutionary games in typical scenarios of the electricity market (EM). EM competition has become a complex dynamic evolution process accompanied by increasingly complex market-economy behavior. Replicator dynamics from evolutionary game theory, together with Lyapunov stability theory, are employed to address incomplete-information and bounded-rationality game issues in EM, so as to overcome the theoretical shortcomings of classical game theory in solving multi-group games in EM. First, the NEAS of a unilateral two-group asymmetric evolutionary game (AEG) is investigated. This is then extended to a more complicated $2\times 2\times 2$ trilateral multi-group AEG, whose NEAS under different game situations in EM is thoroughly discussed. Finally, a practical case study is conducted for verification. The case illustrates how various factors affect the payoff matrix and thereby change the ultimate evolutionarily stable state of the multi-group AEG in EM. One main finding is the complete dynamic behavior and multi-group evolutionary stable strategy (MESS) of the AEG system in the 3-D mixed-strategy space. The other reveals that EM policies formulated by the government, together with other factors, can gradually influence the MESS by changing the payoff distribution matrix.
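The stability analysis described above can be illustrated for the two-group case with two-population replicator dynamics: each group's share of strategy 0 grows in proportion to its payoff advantage, and a pure-strategy corner is asymptotically stable when both Jacobian eigenvalues there are negative (at the corners the Jacobian is diagonal, so the eigenvalues are its diagonal entries). The payoff matrices below are hypothetical, not taken from the paper's electricity-market case, and the three-group $2\times 2\times 2$ extension is not shown.

```python
def payoff_gap_1(A, y):
    """Group-1 advantage of strategy 0 over 1 when a share y of group 2 plays 0."""
    return y * (A[0][0] - A[1][0]) + (1 - y) * (A[0][1] - A[1][1])

def payoff_gap_2(B, x):
    """Same for group 2; B[j][i] is group 2's payoff for j against group-1 strategy i."""
    return x * (B[0][0] - B[1][0]) + (1 - x) * (B[0][1] - B[1][1])

def simulate(A, B, x, y, dt=0.01, steps=5000):
    """Forward-Euler integration of two-population replicator dynamics."""
    for _ in range(steps):
        dx = x * (1 - x) * payoff_gap_1(A, y)
        dy = y * (1 - y) * payoff_gap_2(B, x)
        x, y = x + dt * dx, y + dt * dy
    return x, y

def stable_corners(A, B):
    """Pure-strategy corners whose Jacobian eigenvalues are both negative."""
    stable = []
    for x in (0, 1):
        for y in (0, 1):
            eig_x = (1 - 2 * x) * payoff_gap_1(A, y)
            eig_y = (1 - 2 * y) * payoff_gap_2(B, x)
            if eig_x < 0 and eig_y < 0:
                stable.append((x, y))
    return stable

# Hypothetical payoffs: strategy 0 dominates for group 1; group 2 wants to match it.
A = [[3, 1], [2, 0]]
B = [[2, 0], [1, 1]]
```

Here the only asymptotically stable corner is (1, 1), where both groups play strategy 0, and trajectories started from the interior point (0.5, 0.5) approach it.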

Journal Article
07 Mar 2018-EPL
TL;DR: This work explores how imitation-based (learning) and innovation-based (best-response) attitudes compete for space in a complex model where both attitudes are available, and finds that a four-state solution can be observed in the stag-hunt parameter space.
Abstract: Evolution is based on the assumption that competing players update their strategies to increase their individual payoffs. However, while the applied updating method can differ, most previous works proposed uniform models in which all players revise their strategies in the same way. In this work we explore how imitation-based (learning) and innovation-based (myopic best-response) attitudes compete for space in a complex model where both attitudes are available. In the absence of additional cost, the best-response trait practically dominates the whole snowdrift game parameter space, which is in agreement with the average payoff difference of the basic models. When an additional cost is involved, the imitation attitude can gradually invade the whole parameter space, but this transition happens in a highly nontrivial way. However, the roles of the competing attitudes are reversed in the stag-hunt parameter space, where imitation is more successful in general. Interestingly, a four-state solution can be observed for the latter game, which is a consequence of an emerging cyclic dominance between possible states. These phenomena can be understood by analyzing the microscopic invasion processes, which reveal the unequal propagation velocities of strategies and attitudes.

Journal Article
TL;DR: In this paper, a stochastic game of mean field type is formulated in which the agents solve optimal stopping problems and interact through the proportion of players that have already stopped.
Abstract: We formulate a stochastic game of mean field type where the agents solve optimal stopping problems and interact through the proportion of players that have already stopped. Working with a continuum...

Journal Article
14 Nov 2018-Energies
TL;DR: Simulation results demonstrate 169% increase in the total payoff compared to the imperialist competition algorithm, which proves the effectiveness, extensibility and flexibility of the presented approach in encouraging participants to join the market and boost their profits.
Abstract: The principal aim of this study is to devise a combined market operator and distribution network operator structure for multiple home-microgrids (MH-MGs) connected to an upstream grid. Here, there are three distinct types of players with opposing intentions that can participate in the market as consumers and/or prosumers (as buyers or sellers). All players that are price makers compete with each other to maximize their profitability, while consumers aim to minimize the market-clearing price. To model the interactions among participants and implement this comprehensive structure, a multi-objective optimization problem is solved using static, non-cooperative game theory. The proposed structure is a hierarchical bi-level controller, and its performance in the optimal control of MH-MGs with distributed energy resources has been evaluated. The outcome of this algorithm provides the most suitable power allocation among the different players in the market while satisfying each player’s goals. Furthermore, the profit gained by each player is determined. Simulation results demonstrate a 169% increase in the total payoff compared to the imperialist competition algorithm. This percentage proves the effectiveness, extensibility and flexibility of the presented approach in encouraging participants to join the market and boost their profits.

Posted Content
TL;DR: In this article, the uniqueness, comparative statics, and the approximation of a Nash equilibrium are determined by a precise relationship between the lowest eigenvalue of the network, a measure of players' payoff concavity, and a parameter capturing the strength of the strategic interaction among players.
Abstract: This paper studies strategic interaction in networks. We focus on games of strategic substitutes and strategic complements, and departing from previous literature, we do not assume particular functional forms on players' payoffs. By exploiting variational methods, we show that the uniqueness, the comparative statics, and the approximation of a Nash equilibrium are determined by a precise relationship between the lowest eigenvalue of the network, a measure of players' payoff concavity, and a parameter capturing the strength of the strategic interaction among players. We apply our framework to the study of aggregative network games, games of mixed interactions, and Bayesian network games.
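A classic special case helps make the eigenvalue condition concrete: in a linear-quadratic game of strategic substitutes, player i's best response is `max(0, 1 - delta * sum_j g_ij x_j)`, and a well-known sufficient condition for a unique equilibrium is `delta * |lambda_min(G)| < 1`. The paper's framework covers far more general payoffs; this sketch (using NumPy) only checks that special condition and finds the equilibrium by sequential best responses on a hypothetical triangle network.

```python
import numpy as np

def lowest_eigenvalue(adjacency):
    """Smallest eigenvalue of a symmetric adjacency matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(np.asarray(adjacency, dtype=float))[0])

def uniqueness_condition_holds(adjacency, delta):
    """Sufficient condition delta * |lambda_min| < 1 for the linear-quadratic case."""
    return delta * abs(lowest_eigenvalue(adjacency)) < 1.0

def best_response_equilibrium(adjacency, delta, tol=1e-12, max_sweeps=10_000):
    """Sequential best responses x_i = max(0, 1 - delta * sum_j g_ij x_j)."""
    g = np.asarray(adjacency, dtype=float)
    x = np.zeros(len(g))
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(len(g)):
            new = max(0.0, 1.0 - delta * float(g[i] @ x))
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            break
    return x

TRIANGLE = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # hypothetical 3-player network
```

On the triangle, the lowest eigenvalue is -1, so any `delta < 1` satisfies the condition; with `delta = 0.5` the symmetric equilibrium is x_i = 1 / (1 + 2 * delta) = 0.5.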

Journal Article
TL;DR: In this article, a game-theoretic perspective on the challenge of climate change mitigation is presented, based on a non-cooperative coordination game related to the stag hunt, with a brown equilibrium yielding lower payoffs and a green equilibrium yielding higher payoffs.

Journal Article
05 Sep 2018
TL;DR: This is the first paper that effectively investigates computation offloading strategy optimization for multiple, heterogeneous, and competitive mobile users and multiple heterogeneous mobile edge clouds by using a non-cooperative game approach and makes noticeable contributions towards the understanding of a competing mobile edge computing environment and its stabilization.
Abstract: Computation offloading from a user equipment (UE, also called mobile user, mobile subscriber, or mobile device) to a mobile edge cloud (MEC) provides an effective way to virtualize an ordinary smart mobile device (e.g., smartphone, tablet, handheld computer, wearable device, and personal digital assistant) into a formidable device, which is able to provide more and stronger functionality than a laptop or a desktop computer. It is conceivable that there can be several MECs with different processing capabilities in a geographic area, and each MEC may serve many UEs with endless sequences of computation tasks, various application characteristics, and diversified communication requirements and bandwidths. Furthermore, the mobile users are competitive and selfish, which means that computation offloading strategy optimization needs to be carried out for each individual mobile user to optimize the performance of only his applications. In this paper, we conduct a mathematical study of computation offloading strategy optimization for non-cooperative users in mobile edge computing by using a game theoretic approach. The main contributions of this paper can be summarized as follows. We establish an M/G/1 queueing model to characterize multiple heterogeneous UEs and MECs, so that the average response time of all offloadable and non-offloadable tasks generated on a UE can be calculated analytically and the optimal computation offloading strategy of a UE can be defined rigorously. We construct a non-cooperative game framework for a mobile edge computing environment, in which each player (i.e., a UE) can selfishly minimize his payoff by choosing an appropriate strategy in his strategy space. We prove the existence of the Nash equilibrium of the above game. We develop algorithms to find the Nash equilibrium, including an algorithm to find the best response of a mobile user and an iterative algorithm to find the Nash equilibrium.
We present numerical examples for our game, including numerical data on the Nash equilibrium and on convergence to it. To the best of the author's knowledge, this is the first paper that effectively investigates computation offloading strategy optimization for multiple, heterogeneous, and competitive mobile users and multiple heterogeneous mobile edge clouds by using a non-cooperative game approach. Hence, the paper makes noticeable contributions towards the understanding of a competing mobile edge computing environment and its stabilization.
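The abstract names an M/G/1 queueing model and an iterative best-response search for the Nash equilibrium but gives no formulas. The following is a heavily simplified, hypothetical sketch of that idea, not the paper's model: each UE randomly splits its Poisson task stream between its own processor and one shared edge server (random splitting keeps both sub-streams Poisson), each server is treated as an M/G/1 queue whose mean response time follows the Pollaczek-Khinchine formula, transmission delay is ignored, and each best response is found by grid search. The names EdgeSystem, user_cost, best_response, and iterate_best_responses are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeSystem:
    # Hypothetical parameters of a toy system (not taken from the paper).
    lam: list        # task arrival rate of each UE (tasks per unit time)
    local_s: list    # mean service time of a task on each UE's own processor
    local_s2: list   # second moment of that local service time
    mec_s: float     # mean service time of a task on the shared edge server
    mec_s2: float    # second moment of the edge service time


def mg1_response_time(lam, mean_s, second_moment_s):
    """Mean response time of an M/G/1 queue (Pollaczek-Khinchine formula)."""
    rho = lam * mean_s
    if rho >= 1.0:
        return math.inf  # unstable queue: the backlog grows without bound
    return mean_s + lam * second_moment_s / (2.0 * (1.0 - rho))


def user_cost(system, i, p_i, p):
    """Average response time of UE i when it offloads fraction p_i of its tasks,
    holding the other UEs' offloading fractions p[j] fixed."""
    cost = 0.0
    if p_i < 1.0:  # the remaining tasks queue on the UE's own processor
        t_local = mg1_response_time((1.0 - p_i) * system.lam[i],
                                    system.local_s[i], system.local_s2[i])
        cost += (1.0 - p_i) * t_local
    if p_i > 0.0:  # offloaded tasks share one edge server with the other UEs
        load = p_i * system.lam[i] + sum(
            p[j] * system.lam[j] for j in range(len(p)) if j != i)
        cost += p_i * mg1_response_time(load, system.mec_s, system.mec_s2)
    return cost


def best_response(system, i, p, steps=100):
    """Grid search over offloading fractions k/steps for k = 0..steps."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda q: user_cost(system, i, q, p))


def iterate_best_responses(system, steps=100, tol=1e-12, max_iter=200):
    """Let each UE best-respond in turn until no UE changes its strategy.
    Returns (strategy profile, converged flag); convergence is not guaranteed."""
    p = [0.0] * len(system.lam)
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(len(p)):
            new = best_response(system, i, p, steps)
            largest_change = max(largest_change, abs(new - p[i]))
            p[i] = new
        if largest_change <= tol:
            return p, True
    return p, False
```

As a sanity check, with a single UE (arrival rate 1) whose local processor and edge server are identical exponential servers (mean 1, second moment 2), the best response is to offload exactly half of the tasks, since the two M/M/1 halves then each have response time 2. With several UEs, a fixed point of iterate_best_responses is a (grid-restricted) Nash equilibrium of this toy game.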

Journal ArticleDOI
TL;DR: This paper analyzes Government-Industry-University-Research (GIUR) intellectual property cooperation behavior and its influencing factors from the perspectives of the market mechanism and the administrative supervision mechanism, and develops game models to study the evolutionarily stable strategies of the multiple stakeholders.

Proceedings Article
09 Jul 2018
TL;DR: It is shown that making a single agent prosocial, that is, making it care about the rewards of its partners, can increase the probability that groups converge to good outcomes, and it is shown experimentally that this result carries over to a variety of more complex environments with Stag Hunt-like dynamics, including ones where agents must learn from raw input pixels.
Abstract: Real-world interactions are full of coordination problems [2, 3, 8, 14, 15], and thus constructing agents that can solve them is an important problem for artificial intelligence research. One of the simplest and most heavily studied coordination problems is the matrix-form, two-player Stag Hunt. In the Stag Hunt, each player chooses between a risky action (hunt the stag) and a safe action (forage for mushrooms). Foraging for mushrooms always yields a safe payoff, while hunting yields a high payoff if the other player also hunts but a very low payoff if one shows up to hunt alone. This game has two important Nash equilibria: either both players show up to hunt (this is called the payoff-dominant equilibrium) or both players stay home and forage (this is called the risk-dominant equilibrium [7]).
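The two equilibria described above can be checked on a concrete payoff matrix. The sketch below uses illustrative numbers that are assumptions, not values from the paper (hunting together pays 4, hunting alone pays 0, foraging always pays 3). It enumerates the pure Nash equilibria and then applies the Harsanyi-Selten criterion for 2x2 games, under which the risk-dominant equilibrium is the one with the larger product of unilateral deviation losses.

```python
from itertools import product

ACTIONS = ("hunt", "forage")
HIGH, LOW, SAFE = 4.0, 0.0, 3.0  # illustrative payoffs with HIGH > SAFE > LOW


def payoff(me, other):
    """Payoff of a player choosing `me` against a partner choosing `other`."""
    if me == "forage":
        return SAFE  # foraging pays the same whatever the partner does
    return HIGH if other == "hunt" else LOW


def pure_nash_equilibria():
    """Profiles in which neither player gains by deviating unilaterally."""
    return [
        (a, b)
        for a, b in product(ACTIONS, repeat=2)
        if all(payoff(a, b) >= payoff(x, b) for x in ACTIONS)
        and all(payoff(b, a) >= payoff(y, a) for y in ACTIONS)
    ]


def deviation_loss_product(profile):
    """Harsanyi-Selten risk-dominance measure for a strict equilibrium of a
    2x2 game: the product of what each player loses by switching actions."""
    a, b = profile
    switch = {"hunt": "forage", "forage": "hunt"}
    loss_a = payoff(a, b) - payoff(switch[a], b)
    loss_b = payoff(b, a) - payoff(switch[b], a)
    return loss_a * loss_b


equilibria = pure_nash_equilibria()
payoff_dominant = max(equilibria, key=lambda e: payoff(*e))
risk_dominant = max(equilibria, key=deviation_loss_product)
```

With these numbers both (hunt, hunt) and (forage, forage) are equilibria; hunting together pays more (4 versus 3), but the deviation-loss products are 1 and 9, so foraging is risk-dominant. Equivalently, hunting is a best response only if the partner hunts with probability at least 3/4 (since 4q ≥ 3).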

Journal ArticleDOI
TL;DR: This paper investigates the evolutionary dynamics and strategy optimisation of a kind of networked evolutionary game whose strategy updating rules incorporate a ‘bankruptcy’ mechanism, considering the situation in which a player goes bankrupt because of continuously low profits in previous rounds of the game.
Abstract: This paper investigates the evolutionary dynamics and strategy optimisation of a kind of networked evolutionary game whose strategy updating rules incorporate a ‘bankruptcy’ mechanism, considering the situation in which a player goes bankrupt because of continuously low profits in previous rounds of the game. First, by using the semi-tensor product of matrices, the evolutionary dynamics of this kind of game are expressed as a higher-order logical dynamic system and then converted into algebraic form, on the basis of which the evolutionary dynamics of the given games can be analysed. Second, the strategy optimisation problem is investigated, and free-type control sequences are designed to maximise the total payoff of the whole game. Finally, an illustrative example is given to demonstrate the effectiveness of the new results.
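The semi-tensor product (STP) is what turns a logical dynamic system into algebraic form. The sketch below shows only its standard definition, A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}) with t = lcm(n, p) for A of size m×n and B of size p×q, and a textbook example: logical values encoded as columns of I_2, and conjunction written as a structure matrix M_AND so that x ∧ y = M_AND ⋉ x ⋉ y. The paper's game-specific matrices are not reproduced; the helper names are illustrative, and plain lists are used to avoid external dependencies (math.lcm needs Python 3.9+).

```python
from math import lcm  # requires Python 3.9+


def eye(k):
    """k x k identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(k)] for r in range(k)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def matmul(a, b):
    """Ordinary matrix product; requires len(a[0]) == len(b)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def stp(a, b):
    """Semi-tensor product A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}), t = lcm(n, p),
    for A of size m x n and B of size p x q."""
    n, p = len(a[0]), len(b)
    t = lcm(n, p)
    return matmul(kron(a, eye(t // n)), kron(b, eye(t // p)))


# Logical values as columns of I_2: True = delta_2^1, False = delta_2^2.
TRUE, FALSE = [[1], [0]], [[0], [1]]

# Structure matrix of conjunction: AND(x, y) = M_AND ⋉ x ⋉ y.
M_AND = [[1, 0, 0, 0],
         [0, 1, 1, 1]]


def logical_and(x, y):
    return stp(stp(M_AND, x), y)
```

When the inner dimensions already match (n = p), the STP reduces to the ordinary matrix product, so it is a strict generalization. Stacking the structure matrices of every player's strategy-update rule in the same way yields a single transition matrix for the whole game.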

Journal ArticleDOI
TL;DR: This work studies the repeated n-person public-goods game and searches for a strategy that forms a cooperative Nash equilibrium in the presence of implementation error, with a guarantee that the resulting payoff will be no less than any of the co-players'.
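For reference, one round of the n-person public-goods game and a simple model of implementation error can be sketched as follows. The multiplication factor r, the contribution c, and the modeling of error as each intended action being flipped independently with probability eps are assumptions for illustration; the strategy found in the paper is not reproduced here.

```python
import random


def public_goods_payoffs(actions, r=2.0, c=1.0):
    """One round of an n-person public-goods game (illustrative parameters).
    actions[i] is True if player i cooperates, i.e. contributes c to the pot."""
    n = len(actions)
    pot = r * c * sum(actions)  # total contributions are multiplied by r ...
    share = pot / n             # ... and split equally among all n players
    return [share - (c if a else 0.0) for a in actions]


def with_implementation_error(intended, eps, rng):
    """Assumed error model: each intended action is flipped independently
    with probability eps before it is played."""
    return [(not a) if rng.random() < eps else a for a in intended]
```

With 1 < r < n, contributing changes a player's own payoff by rc/n - c < 0, so defection dominates in a single round; this is why repeated-game strategies are needed to sustain cooperation, and why errors that turn an intended cooperation into a defection (or vice versa) complicate them.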

Journal ArticleDOI
TL;DR: It is found that, unlike the resonance effect produced by adding noise to the payoff matrix in spatial prisoner's dilemma (SPD) games, adding time-varying noise to both the effectiveness and the cost of the third strategy makes no difference compared with the default, unperturbed setting.
Abstract: Recently, a new vaccination game model was proposed in which an intermediate defense measure was introduced as a third strategy besides the two fundamental ones: committing to vaccination, which leads to perfect immunity, and not committing to vaccination. We explore what happens when both the effectiveness and the cost of this intermediate defense measure are stochastically perturbed, from the viewpoint of whether the third strategy helps to improve the total social payoff. We found that, unlike the resonance effect obtained by adding noise to the payoff matrix in spatial prisoner's dilemma (SPD) games, adding time-varying noise to both effectiveness and cost makes no difference from the default setting without perturbation of the third strategy. However, if the noise initially given to each agent is frozen, the third strategy becomes robust enough to survive. In particular, if the strategy updating rule allows a more advantageous third strategy to be shared more widely among agents through copying, the total social payoff is significantly improved.
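The abstract contrasts two noise regimes: time-varying noise, redrawn at every step, and frozen noise, drawn once per agent and kept. The sketch below generates per-agent (effectiveness, cost) pairs for the intermediate measure under each regime. The uniform perturbation of half-width sigma, the clipping of effectiveness to [0, 1] and of cost to non-negative values, and the function name are assumptions; the paper's actual noise distribution is not stated here.

```python
import random


def perturbed_parameters(base_eff, base_cost, n_agents, n_steps, sigma,
                         frozen, seed=0):
    """Return a list over steps of per-agent (effectiveness, cost) pairs.
    frozen=True: each agent draws its perturbation once and keeps it.
    frozen=False: every agent gets a fresh perturbation at every step."""
    rng = random.Random(seed)

    def draw():
        # Assumed uniform noise; clip so effectiveness stays a probability
        # and cost stays non-negative.
        e = min(1.0, max(0.0, base_eff + rng.uniform(-sigma, sigma)))
        c = max(0.0, base_cost + rng.uniform(-sigma, sigma))
        return e, c

    fixed = [draw() for _ in range(n_agents)] if frozen else None
    history = []
    for _ in range(n_steps):
        history.append(list(fixed) if frozen else [draw() for _ in range(n_agents)])
    return history
```

Under frozen noise, some agents permanently hold a cheaper or more effective version of the third strategy, which imitation-based updating can then spread; under time-varying noise, such advantages average out over time.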