Open Access, Journal Article

An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm

TLDR
The paper presents an efficient hardware architecture for the Q-Learning algorithm that targets real-time applications with low power consumption, high throughput and limited hardware resources, together with a technique based on approximated multipliers that reduces the hardware complexity of the algorithm.
Abstract
In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput and limited hardware resources. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit and evaluated the implementation in terms of hardware resources, throughput and power consumption. Compared with state-of-the-art Q-Learning hardware accelerators presented in the literature, the architecture obtains better results in speed, power and hardware resources. We present experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic. With a Q-Matrix of size $8\times4$ (8-bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size $256\times16$ (32-bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Because the accelerator requires only a small amount of hardware resources, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
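For orientation, the standard Q-Learning update is Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)], and SARSA differs only in using Q(s',a') for the action actually taken next instead of the maximum. The sketch below shows this update in integer fixed-point arithmetic on an 8×4 Q-Matrix, with the two multiplications replaced by right shifts. The abstract does not say how the approximated multipliers are built, so the power-of-two approximation here is an assumption chosen for illustration, and the names `approx_mul`, `q_update`, `alpha_shift` and `gamma_shift` are hypothetical.

```python
# Sketch only: integer fixed-point Q-Learning / SARSA update.
# ASSUMPTION: multipliers are approximated by power-of-two shifts;
# the abstract does not specify the paper's actual approximation.

def approx_mul(x: int, shift: int) -> int:
    """Approximate x * 2**-shift with an arithmetic right shift (rounds down)."""
    return x >> shift


def q_update(q, s, a, r, s_next, alpha_shift=1, gamma_shift=3, a_next=None):
    """Update q[s][a] in place and return the new value.

    alpha is approximated as 2**-alpha_shift,
    gamma is approximated as 1 - 2**-gamma_shift.
    a_next=None -> Q-Learning target (max over the next state's actions)
    a_next=int  -> SARSA target (value of the action actually taken next)
    No saturation to a fixed wordlength is modelled here.
    """
    future = max(q[s_next]) if a_next is None else q[s_next][a_next]
    discounted = future - approx_mul(future, gamma_shift)  # ~ gamma * future
    td_error = r + discounted - q[s][a]
    q[s][a] += approx_mul(td_error, alpha_shift)           # ~ alpha * td_error
    return q[s][a]


def make_q_matrix(n_states=8, n_actions=4):
    """Zero-initialised Q-Matrix stored as a list of rows (one row per state)."""
    return [[0] * n_actions for _ in range(n_states)]
```

Passing `a_next` switches the same data path from the Q-Learning target to the SARSA target, which is one way to read the abstract's remark that SARSA needs only minor modifications. A real design would also clamp values to the chosen wordlength.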

