Open Access, Posted Content

Automatic Bridge Bidding Using Deep Reinforcement Learning

TLDR
This paper proposes a deep reinforcement learning model that learns to bid automatically in bridge, a zero-sum game, from raw card data without the aid of human domain knowledge.
Abstract
Bridge is among the zero-sum games for which artificial intelligence has not yet outperformed expert human players. The main difficulty lies in the bidding phase of bridge, which requires cooperative decision making under partial information. Existing artificial intelligence systems for bridge bidding rely on and are thus restricted by human-designed bidding systems or features. In this work, we propose a pioneering bridge-bidding system that learns without the aid of human domain knowledge. The system is based on a novel deep reinforcement learning model, which extracts sophisticated features and learns to bid automatically from raw card data. The model includes an upper-confidence-bound algorithm and additional techniques to achieve a balance between exploration and exploitation. Our experiments validate the promising performance of our proposed model. In particular, the model advances from having no knowledge about bidding to achieving superior performance when compared with a champion-winning computer bridge program that implements a human-designed bidding system.
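The abstract mentions an upper-confidence-bound (UCB) algorithm for balancing exploration and exploitation. The sketch below shows the classic UCB1 rule of Auer et al. (2002): try every option once, then pick the option maximizing its mean reward plus a bonus of sqrt(2 ln N / n_i), where N is the total number of trials and n_i the trials of option i. Treating candidate bids as bandit arms, assuming rewards scaled to [0, 1], and the class name `UCB1Selector` are illustrative assumptions; the abstract does not specify how UCB is combined with the network.

```python
import math


class UCB1Selector:
    """Classic UCB1 over a fixed set of arms (hypothetically, candidate bids).

    Assumes rewards lie in [0, 1], the setting in which UCB1's regret bound holds.
    """

    def __init__(self, num_arms: int) -> None:
        self.counts = [0] * num_arms   # times each arm has been chosen
        self.means = [0.0] * num_arms  # running mean reward per arm
        self.total = 0                 # total number of choices made

    def select(self) -> int:
        # Exploration phase: play every arm once before trusting the bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm

        # Exploitation plus an optimism bonus that shrinks as an arm is tried more.
        def upper_bound(arm: int) -> float:
            bonus = math.sqrt(2.0 * math.log(self.total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.total += 1
        # Incremental mean update, so individual rewards need not be stored.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


selector = UCB1Selector(3)
payoffs = [0.2, 0.8, 0.5]  # made-up deterministic rewards for a toy run
for _ in range(1000):
    choice = selector.select()
    selector.update(choice, payoffs[choice])
```

In this toy run the arm with the highest payoff ends up chosen most often, while the shrinking bonus still forces occasional trials of the others; that trade-off is the exploration/exploitation balance the abstract refers to.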

Citations
Journal Article

The Hanabi Challenge: A New Frontier for AI Research

TL;DR: It is argued that Hanabi brings reasoning about the beliefs and intentions of other agents to the foreground, and that developing novel techniques for such theory-of-mind reasoning will be crucial not only for success in Hanabi but also for broader collaborative efforts, especially those with human partners.
Journal Article

Multisource Transfer Double DQN Based on Actor Learning

TL;DR: Experiments show that MTDDQN achieves not only a human-like actor-learning transfer capability but also the desired learning efficiency and testing accuracy on the target task.
Posted Content

Learning to Communicate Implicitly By Actions

TL;DR: This work introduces a novel algorithm, Policy Belief Learning (PBL), which uses a belief module to model the other agent's private information and a policy module to form a distribution over actions informed by the belief module. It also proposes a novel auxiliary reward that incentivizes one agent to help its partner make correct inferences about its private information.
Journal Article

Automatic Bridge Bidding Using Deep Reinforcement Learning

TL;DR: Proposes a flexible and pioneering bridge-bidding system that can learn either with or without the aid of human domain knowledge. It is based on a novel deep reinforcement learning model, which extracts sophisticated features and learns to bid automatically from raw card data.
Posted Content

Joint Policy Search for Multi-agent Collaboration with Imperfect Information

TL;DR: It is shown that global changes in game values can be decomposed into policy changes localized at each information set, via a novel term named policy-change density. Building on this, Joint Policy Search (JPS) is proposed, which iteratively improves the joint policies of collaborative agents in imperfect-information games without re-evaluating the entire game.