scispace - formally typeset
Author

Marco Matta

Bio: Marco Matta is an academic researcher from the University of Rome Tor Vergata. The author has contributed to research on field-programmable gate arrays (FPGAs) and hardware acceleration. The author has an h-index of 6 and has co-authored 22 publications receiving 99 citations.

Papers
Journal Article
TL;DR: An efficient hardware architecture for the Q-Learning algorithm, suitable for real-time applications, that offers low power, high throughput, and limited hardware resource usage, together with a technique based on approximated multipliers that reduces the hardware complexity of the algorithm.
Abstract: In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput, and limited hardware resources. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit. The implementation results are evaluated in terms of hardware resources, throughput, and power consumption. Compared with the state-of-the-art Q-Learning hardware accelerators presented in the literature, the architecture obtains better results in speed, power, and hardware resources. We present experiments with different Q-Matrix sizes and different word lengths for the fixed-point arithmetic. With a Q-Matrix of size $8\times4$ (8-bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size $256\times16$ (32-bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Because the accelerator requires only a small amount of hardware resources, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.

71 citations
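The Q-Learning abstract above describes a fixed-point datapath that can also run SARSA with minor modifications. As a rough software sketch (not the authors' hardware design), the Python code below applies one temporal-difference update to a Q-Matrix stored as signed fixed-point integers. The 16-bit word length, the 8 fractional bits, the truncating multiply, and the names `saturate`, `to_fixed`, `fx_mul`, and `td_update` are all assumptions for illustration; the paper's approximated multipliers are not modeled. Setting `sarsa=True` replaces the maximum over next actions with the value of the action actually taken, which is the difference between the two update rules.

```python
# Hypothetical fixed-point Q-Learning / SARSA update (illustrative only).
WORD_BITS = 16   # assumed signed word length
FRAC_BITS = 8    # assumed number of fractional bits
Q_MIN = -(1 << (WORD_BITS - 1))
Q_MAX = (1 << (WORD_BITS - 1)) - 1


def saturate(v: int) -> int:
    """Clamp to the signed WORD_BITS range, as a saturating adder would."""
    return max(Q_MIN, min(Q_MAX, v))


def to_fixed(x: float) -> int:
    """Convert a real value to fixed point with FRAC_BITS fractional bits."""
    return saturate(round(x * (1 << FRAC_BITS)))


def fx_mul(a: int, b: int) -> int:
    """Fixed-point product; the arithmetic shift rounds toward minus infinity."""
    return (a * b) >> FRAC_BITS


def td_update(q, s, a, r, s_next, a_next, alpha, gamma, sarsa=False):
    """One Q-Learning (or SARSA) update of q[s][a]; all values are fixed-point ints."""
    # Q-Learning bootstraps on the best next action; SARSA on the action actually taken.
    bootstrap = q[s_next][a_next] if sarsa else max(q[s_next])
    target = saturate(r + fx_mul(gamma, bootstrap))
    q[s][a] = saturate(q[s][a] + fx_mul(alpha, target - q[s][a]))
```

For example, with `alpha = to_fixed(0.5)` and `gamma = to_fixed(0.9)` (stored as 128 and 230), one update of an all-zero $8\times4$ Q-Matrix with reward `to_fixed(1.0)` sets the updated entry to 128, i.e. 0.5.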

Journal Article
TL;DR: A novel approach for swarm reinforcement learning that extends standard Q-learning to multi-agent systems through a Q-learning real-time swarm algorithm (Q-RTS), which is iteration-based and suitable for real-time systems.
Abstract: The authors introduce a novel approach for swarm reinforcement learning that extends standard Q-learning to multi-agent systems. State-of-the-art methods implement a knowledge-sharing mechanism between the agents that is triggered by the succession of episodes, which intrinsically limits the convergence speed of these algorithms. The authors overcome this issue by developing a Q-learning real-time swarm algorithm (Q-RTS), which is iteration-based and suitable for real-time systems. Q-RTS was tested in different environments and compared with related methods in the literature. The results are positive in terms of learning time and scalability, with a speed-up factor of at least 1.49 with respect to standard Q-learning. Moreover, Q-RTS shows enhanced learning performance as the complexity of the environment increases.

22 citations
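The central idea in the Q-RTS abstract above is that agents share knowledge at every iteration rather than at the end of each episode. The sketch below illustrates only that scheduling idea on a hypothetical five-state corridor; it is not the published Q-RTS algorithm. The sharing rule (each agent's Q-Matrix is blended with the swarm mean using a hypothetical weight `beta`), the environment, and every name in the code (`step`, `greedy`, `share`, `train_swarm`) are assumptions.

```python
import random

# Hypothetical 1-D corridor: states 0..4, reaching state 4 pays reward 1.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right


def step(s, a):
    """Move within the corridor (walls at both ends) and return (next state, reward)."""
    s_next = min(max(s + ACTIONS[a], 0), GOAL)
    return s_next, (1.0 if s_next == GOAL else 0.0)


def greedy(q_row, rng):
    """Pick a best action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def share(local_qs, beta):
    """Hypothetical sharing rule: blend every local Q-Matrix with the swarm mean."""
    n = len(local_qs)
    swarm = [[sum(q[s][a] for q in local_qs) / n for a in range(len(ACTIONS))]
             for s in range(N_STATES)]
    for q in local_qs:
        for s in range(N_STATES):
            for a in range(len(ACTIONS)):
                q[s][a] = beta * q[s][a] + (1 - beta) * swarm[s][a]
    return swarm


def train_swarm(n_agents=3, iterations=5000, alpha=0.5, gamma=0.9,
                epsilon=0.3, beta=0.5, seed=0):
    rng = random.Random(seed)
    qs = [[[0.0, 0.0] for _ in range(N_STATES)] for _ in range(n_agents)]
    states = [0] * n_agents
    swarm = None
    for _ in range(iterations):
        for i, q in enumerate(qs):
            s = states[i]
            # Epsilon-greedy action selection on the agent's own Q-Matrix.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q[s], rng)
            s_next, r = step(s, a)
            target = r if s_next == GOAL else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            states[i] = 0 if s_next == GOAL else s_next  # restart the episode at the goal
        # Knowledge is shared after every iteration, not only when an episode ends.
        swarm = share(qs, beta)
    return swarm
```

Calling `train_swarm()` returns the swarm-mean Q-Matrix; after training, moving right should have the higher value in every non-goal state.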

Journal Article
TL;DR: The design of an RL Agent that learns the behavior of a Timing Recovery Loop (TRL) through the Q-Learning algorithm and adapts its behavior to different modulation formats without any tuning of the system parameters.
Abstract: Machine Learning (ML) based on supervised and unsupervised learning models has recently been applied in the telecommunication field. However, such techniques rely on large application-specific datasets, and their performance deteriorates if the statistics of the inference data change over time. Reinforcement Learning (RL) addresses these issues because it can adapt its behavior to the changing statistics of the input data. In this work, we propose the design of an RL Agent that learns the behavior of a Timing Recovery Loop (TRL) through the Q-Learning algorithm. The Agent is compatible with popular PSK and QAM formats. We validated the RL synchronizer by comparing it with the Mueller and Muller TRL in terms of Modulation Error Ratio (MER) in a noisy channel scenario. The results show a good trade-off in terms of MER performance: the RL-based synchronizer loses less than 1 dB of MER with respect to the conventional one, but it adapts its behavior to different modulation formats without any tuning of the system parameters.

18 citations
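The TRL abstract above compares the RL synchronizer with a Mueller and Muller TRL using the Modulation Error Ratio. The sketch below, assuming complex baseband symbols, computes MER in dB as the ratio of reference symbol power to error power, and a Mueller and Muller timing error $e_k = \mathrm{Re}\{y_k a_{k-1}^* - y_{k-1} a_k^*\}$ from symbol-rate samples $y_k$ and decisions $a_k$. Sign conventions for this detector differ between references, the function names `mer_db` and `mm_timing_error` are hypothetical, and neither function is taken from the paper; the loop filter and interpolator of a full TRL are omitted.

```python
import math


def mer_db(received, reference):
    """Modulation Error Ratio in dB: total reference power over total error power."""
    signal_power = sum(abs(r) ** 2 for r in reference)
    error_power = sum(abs(y - r) ** 2 for y, r in zip(received, reference))
    return 10 * math.log10(signal_power / error_power)


def mm_timing_error(samples, decisions):
    """Mueller and Muller timing error for k >= 1 (sign convention assumed)."""
    return [
        (samples[k] * decisions[k - 1].conjugate()
         - samples[k - 1] * decisions[k].conjugate()).real
        for k in range(1, len(samples))
    ]
```

For instance, adding an error of 0.1 to every unit-amplitude QPSK symbol ($\pm1\pm j$, power 2) gives $10\log_{10}(2/0.01) \approx 23.0$ dB, and with ideal timing (samples equal to decisions) the timing error is zero.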

Journal Article
TL;DR: This paper presents an FPGA implementation of a handwritten number recognition system based on a CNN. It uses a 9-bit representation that avoids the use of DSP blocks and reaches a classification accuracy of 90%.
Abstract: Convolutional Neural Networks (CNNs) are the state of the art in computer vision and are used for purposes such as image and video classification, recommender systems, and natural language processing. The connectivity pattern between CNN neurons is inspired by the structure of the animal visual cortex. CNNs are realized with multiple parallel two-dimensional FIR filters that convolve the input signal with the learned feature maps. For this reason, a CNN implementation requires highly parallel computations that traditional general-purpose processors cannot provide, which is why CNNs benefit from a very significant speed-up when mapped and run on Field Programmable Gate Arrays (FPGAs). FPGAs make it possible to design fully customizable hardware architectures, providing high flexibility and hundreds to thousands of on-chip Digital Signal Processing (DSP) blocks. This paper presents an FPGA implementation of a handwritten number recognition system based on a CNN. The system has been characterized in terms of classification accuracy, area, speed, and power consumption. The neural network was implemented on a Xilinx XC7A100T FPGA, and it uses 29.69% of the Slice LUTs, 4.42% of the slice registers, and 52.50% of the block RAMs. We designed the system using a 9-bit representation that avoids the use of DSP blocks; for this reason, the multipliers are implemented using LUTs. Thanks to its regularity, the proposed architecture can easily be scaled to different FPGA devices. The CNN reaches a classification accuracy of 90%.

16 citations
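The CNN abstract above relies on 9-bit data and two-dimensional convolutions whose multipliers are mapped to LUTs instead of DSP blocks. The sketch below models only the arithmetic: it quantizes values to signed 9-bit integers (the choice of 7 fractional bits is an assumption) and computes a "valid" 2-D cross-correlation, the operation a CNN convolutional layer performs, using a wide integer accumulator. The function names `quantize` and `conv2d_valid` are hypothetical, and the code says nothing about how the paper maps this operation to FPGA resources.

```python
BITS = 9   # word length stated in the abstract
FRAC = 7   # assumed number of fractional bits
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1  # -256 .. 255


def quantize(x: float) -> int:
    """Round a real value to a signed 9-bit integer with saturation."""
    return max(LO, min(HI, round(x * (1 << FRAC))))


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of integer inputs, as computed in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0  # wide accumulator: each 9x9-bit product needs up to 18 bits
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out
```

Each product of two values with 7 fractional bits carries 14 fractional bits, so dividing an output by $2^{14}$ recovers its real value; for example, a $2\times2$ kernel of 0.25 over an image of 0.5 yields 8192, i.e. 0.5.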

Book Chapter
26 Sep 2018
TL;DR: An efficient FPGA implementation of an Ensemble based on Long Short-Term Memory Networks (LSTM) using the Partial Reconfiguration function available for FPGAs is presented.
Abstract: Ensemble Machine Learning (EML) combines multiple Artificial Intelligence algorithms. This paper presents an efficient FPGA implementation of an Ensemble based on Long Short-Term Memory Networks (LSTM). For an efficient implementation, the proposed design uses the Partial Reconfiguration feature available on FPGAs. Results are presented in terms of resource utilization, reconfiguration speed, power consumption, and maximum clock frequency.

11 citations


Cited by
01 Jan 1990
TL;DR: An overview of the self-organizing map algorithm, on which the papers in this issue are based.
Abstract: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented in this article.

2,933 citations

01 Jan 2016
Title: Digital Signal Processing: A Computer-Based Approach

343 citations

Journal ArticleDOI
TL;DR: A detailed taxonomy of the main multi-agent reinforcement learning approaches proposed in the literature, focusing on their mathematical models and comparing them in terms of nonstationarity, scalability, and observability.
Abstract: In this review, we present an analysis of the most widely used multi-agent reinforcement learning algorithms. Starting with single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account when extending them to multi-agent scenarios. The analyzed algorithms are grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields and point out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications, namely nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performance of the considered methods.

96 citations

01 Jan 2016
Title: Software Defined Radio: Architectures, Systems and Functions

78 citations
