Proceedings Article

Utility Based Q-learning to Maintain Cooperation in Prisoner's Dilemma Games

Koichi Moriyama
pp. 146-152
TLDR
This work derives a theorem on how many times cooperation is needed to make the Q-function of cooperation larger than that of defection, and a corollary on how much utility is necessary to make it larger after one-shot mutual cooperation.
Abstract
This work deals with Q-learning in a multiagent environment. Many multiagent Q-learning methods exist, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the prisoner's dilemma (PD), where the equilibrium is mutual defection. However, normal Q-learning agents that choose actions stochastically to avoid local optima may happen to cooperate mutually in PD. Although such mutual cooperation usually occurs only in isolation, it can be maintained if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many times the cooperation must occur to make the Q-function of cooperation larger than that of defection. In addition, building on the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD, this work derives a corollary on how much utility is necessary to make the Q-function of cooperation larger after one-shot mutual cooperation.
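The condition described in the abstract can be illustrated with a minimal sketch. It is not the paper's theorem or corollary: it assumes a stateless Q-learner with the standard update Q(a) ← Q(a) + α(r + γ max_b Q(b) − Q(a)), and the function names, the starting Q-values, and the payoffs (T=5, R=3, P=1, S=0) are illustrative assumptions rather than values taken from the paper.

```python
def q_update(q, action, reward, alpha, gamma):
    """Apply one stateless Q-learning update to q[action] in place."""
    target = reward + gamma * max(q.values())
    q[action] += alpha * (target - q[action])


def cooperations_needed(q_c, q_d, reward, alpha, gamma, max_steps=10_000):
    """Count consecutive cooperation updates until Q(C) > Q(D).

    Assumption: only Q(C) is updated, always with the same reward
    (or utility). Returns None if Q(C) never exceeds Q(D) within max_steps.
    """
    q = {"C": q_c, "D": q_d}
    for n in range(1, max_steps + 1):
        q_update(q, "C", reward, alpha, gamma)
        if q["C"] > q["D"]:
            return n
    return None


def one_shot_utility_threshold(q_c, q_d, alpha, gamma):
    """Value that a utility u must exceed so that a single update gives Q(C) > Q(D).

    Obtained by solving q_c + alpha * (u + gamma * max(q_c, q_d) - q_c) > q_d
    for u under the same stateless assumption; not the paper's corollary.
    """
    return (q_d - q_c) / alpha + q_c - gamma * max(q_c, q_d)


# Illustrative starting point: Q-values settled under mutual defection with
# P = 1, S = 0 and gamma = 0.5, which gives Q(D) = 2 and Q(C) = 1.
Q_C, Q_D = 1.0, 2.0
```

With these illustrative values, `cooperations_needed(Q_C, Q_D, 3.0, 0.1, 0.5)` returns 4, while `one_shot_utility_threshold(Q_C, Q_D, 0.1, 0.5)` evaluates to 10.0. In this toy setting, the mutual-cooperation reward of 3 needs four repetitions, whereas any utility above 10 would suffice in a single step.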

Citations
Journal Article

Algorithms, machine learning, and collusion

TL;DR: Whether self-learning price-setting algorithms can coordinate their pricing behavior to achieve a collusive outcome that maximizes the joint profits of the firms using them is discussed.
Posted Content

Algorithmic Pricing and Competition: Empirical Evidence from the German Retail Gasoline Market

TL;DR: In this article, the authors investigate the impact of AI adoption on outcomes linked to competition in the German retail gasoline market and show that adoption increases margins by 9%, but only in non-monopoly markets.
Proceedings Article

Learning-Rate Adjusting Q-Learning for Prisoner's Dilemma Games

TL;DR: This paper introduces a new Q-learning method called learning-rate adjusting Q-learning, or LRA-Q, which deals with the learning rate directly and aims to converge to a Nash equilibrium.
Proceedings Article

Maintaining cooperation in homogeneous multi-agent system

TL;DR: It is shown that the system can maintain a certain level of cooperation even though the agents are individually rational, and a mathematical model is developed to analyze the dynamics resulting from the learning framework.

Multi-agent learning in complex environments

TL;DR: Experimental results show that agents using the proposed approaches can learn efficient coordinated behaviors in domains of different sizes, and a collective multi-agent learning framework is proposed to study the impact of agent local collective learning on the emergence of social norms in a number of different settings in terms of agent heterogeneities and topological varieties.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book

The Evolution of Cooperation

TL;DR: In this paper, a model based on the concept of an evolutionarily stable strategy in the context of the Prisoner's Dilemma game was developed for cooperation in organisms, and the results of a computer tournament showed how cooperation based on reciprocity can get started in an asocial world, can thrive while interacting with a wide range of other strategies, and can resist invasion once fully established.
Journal Article

Technical Note: Q-Learning

TL;DR: In this article, it is shown that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action values are represented discretely.
Book Chapter

Markov games as a framework for multi-agent reinforcement learning

TL;DR: A Q-learning-like algorithm for finding optimal policies and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.