Author

Mário P. Véstias

Bio: Mário P. Véstias is an academic researcher at Instituto Superior de Engenharia de Lisboa. His research focuses on field-programmable gate arrays and deep learning. He has an h-index of 11 and has co-authored 93 publications receiving 538 citations. Previous affiliations of Mário P. Véstias include INESC-ID and the Polytechnic Institute of Lisbon.


Papers
Proceedings ArticleDOI
20 Oct 2014
TL;DR: In this paper, the authors compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms from different scientific application domains, showing that the gap between FPGAs and GPUs or many-core CPUs is widening, moving FPGAs away from high-performance computing with intensive floating-point calculations.
Abstract: Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of recent technological innovations in integrated circuit (IC) design and have also dramatically improved their peak performances. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performance, for particular applications, show that the gap between FPGAs and GPUs or many-core CPUs is widening, moving FPGAs away from high-performance computing with intensive floating-point calculations. FPGAs remain competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and for parallel map-reduce problems.

77 citations

Journal ArticleDOI
TL;DR: Reconfigurable computing is being considered for inference at the edge due to its high performance and energy efficiency, while keeping a high hardware flexibility that allows the target computing platform to be easily adapted to the CNN model.
Abstract: The convolutional neural network (CNN) is one of the most used deep learning models for image detection and classification, due to its high accuracy when compared to other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements. Inference of convolutional neural networks is therefore usually done on centralized high-performance platforms. However, many applications based on CNNs are migrating to edge devices near the source of data because of the unreliability of the transmission channel when exchanging data with a central server, channel latency that many applications cannot tolerate, and security and data privacy concerns. While advantageous, deep learning at the edge is quite challenging because edge devices are usually limited in terms of performance, cost, and energy. Reconfigurable computing is being considered for inference at the edge due to its high performance and energy efficiency, while keeping a high hardware flexibility that allows for the easy adaptation of the target computing platform to the CNN model. In this paper, we describe the features of the most common CNNs, the capabilities of reconfigurable computing for running CNNs, the state of the art of reconfigurable computing implementations proposed to run CNN models, and the trends and challenges for future edge reconfigurable platforms.

77 citations

Journal ArticleDOI
TL;DR: This paper reviews the main research directions for deep learning algorithms in edge computing and argues for new resource- and energy-oriented deep learning models, as well as new computing platforms.
Abstract: Deep learning is now present in a wide range of services and applications, replacing and complementing other machine learning algorithms. Performing training and inference of deep neural networks using the cloud computing model is not viable for applications where low latency is required. Furthermore, the rapid proliferation of the Internet of Things will generate a large volume of data to be processed, which will soon overload the capacity of cloud servers. One solution is to process the data at the edge devices themselves, in order to alleviate cloud server workloads and improve latency. However, edge devices are less powerful than cloud servers, and many are subject to energy constraints. Hence, new resource- and energy-oriented deep learning models are required, as well as new computing platforms. This paper reviews the main research directions for deep learning algorithms in edge computing.

35 citations

Proceedings ArticleDOI
23 Sep 2008
TL;DR: This paper introduces a novel approach to the design of a decimal multiplier in FPGA using the embedded arithmetic blocks, together with a novel method for binary-to-BCD conversion; the results indicate that the proposed binary-to-BCD converter is more efficient than the traditional shift-and-add-3 algorithm.
Abstract: Decimal arithmetic has become a major necessity in computer arithmetic operations associated with human-centric applications, such as financial and commercial ones, because the results must match exactly those obtained by human calculations. The relevance of decimal arithmetic has become evident with the revision of the IEEE-754 standard to include decimal floating-point support. A variety of IP cores are already available for implementing binary arithmetic accelerators in FPGAs. Thus far, however, little work has been done on cores that work with decimal arithmetic. In this paper, we introduce a novel approach to the design of a decimal multiplier in FPGA using the embedded arithmetic blocks and a novel method for binary-to-BCD conversion. The proposed circuits were implemented in a Xilinx Virtex 4sx35ff877-12 FPGA. The results indicate that the proposed binary-to-BCD converter is more efficient than the traditional shift-and-add-3 algorithm and that the proposed decimal multiplier is very competitive when compared to decimal multipliers implemented with direct manipulation of BCD numbers.
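The shift-and-add-3 ("double dabble") algorithm that the paper's converter is benchmarked against can be sketched in a few lines of Python. This is a hypothetical software model of the classic hardware scheme, not the circuit proposed in the paper:

```python
def binary_to_bcd(value: int, digits: int) -> list[int]:
    """Shift-and-add-3 ('double dabble') binary-to-BCD conversion.

    Models the traditional hardware algorithm: shift the binary input
    into a BCD register one bit at a time, adding 3 to any BCD digit
    that is >= 5 before each shift so carries propagate in decimal.
    Returns the digits little-endian (digits[0] is the units digit).
    """
    bcd = [0] * digits
    for i in range(value.bit_length() - 1, -1, -1):
        # Correction step: a digit >= 5 would overflow its decade
        # after the shift, so pre-add 3 (5+3=8 doubles to 16 = carry).
        for d in range(digits):
            if bcd[d] >= 5:
                bcd[d] += 3
        # Shift the whole BCD register left by one, pulling in the
        # next binary bit and rippling the carry between digits.
        carry = (value >> i) & 1
        for d in range(digits):
            bcd[d] = (bcd[d] << 1) | carry
            carry = bcd[d] >> 4
            bcd[d] &= 0xF
    return bcd
```

For example, `binary_to_bcd(243, 3)` yields the digits `[3, 4, 2]`, i.e. 2-4-3. In hardware the inner correction is a small combinational block per digit, which is why the paper can compare its converter against this scheme in terms of area and delay.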

32 citations

Proceedings ArticleDOI
24 Mar 2010
TL;DR: This paper analyzes the tradeoffs involved in the design of a parallel decimal multiplier, for decimal operands with 8 and 16 digits, using existing coarse-grained embedded binary arithmetic blocks, and indicates that the proposed parallel multipliers are very competitive when compared to decimal multipliers implemented with direct manipulation of BCD numbers.
Abstract: Human-centric applications, such as financial and commercial ones, depend on decimal arithmetic because the results must match exactly those obtained by human calculations. The IEEE-754 2008 standard for floating-point arithmetic has recognized the importance of decimal for computer arithmetic. A number of hardware approaches have already been proposed for decimal arithmetic operations, including addition, subtraction, multiplication and division. However, few efforts have been made to develop decimal IP cores able to take advantage of the binary multipliers available in most reconfigurable computing architectures. In this paper, we analyze the tradeoffs involved in the design of a parallel decimal multiplier, for decimal operands with 8 and 16 digits, using existing coarse-grained embedded binary arithmetic blocks. The proposed circuits were implemented in a Xilinx Virtex 4 FPGA. The results indicate that the proposed parallel multipliers are very competitive when compared to decimal multipliers implemented with direct manipulation of BCD numbers.
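The core idea, reusing embedded binary multipliers for decimal operands, can be illustrated with a software analogue: map the BCD operands to binary, multiply with an ordinary binary multiplier (the role played by the FPGA's embedded arithmetic blocks), and convert the product back to BCD. The function names are illustrative, and this sketch deliberately omits the digit-partitioning and partial-product tradeoffs the paper actually analyzes:

```python
def bcd_to_int(digits: list[int]) -> int:
    """Interpret a little-endian list of BCD digits as an integer."""
    return sum(d * 10**i for i, d in enumerate(digits))

def int_to_bcd(value: int, ndigits: int) -> list[int]:
    """Convert an integer back to a little-endian list of BCD digits."""
    return [(value // 10**i) % 10 for i in range(ndigits)]

def decimal_multiply(a: list[int], b: list[int]) -> list[int]:
    """Multiply two BCD operands via a binary multiplier.

    (1) map each BCD operand to binary, (2) perform one ordinary
    binary multiplication, (3) convert the binary product back to
    BCD. An n-digit by m-digit product needs n+m result digits.
    """
    product = bcd_to_int(a) * bcd_to_int(b)
    return int_to_bcd(product, len(a) + len(b))
```

In hardware, step (2) is decomposed over the fixed-width embedded multipliers and step (3) is where an efficient binary-to-BCD converter pays off, which is the tradeoff space the paper explores for 8- and 16-digit operands.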

31 citations


Cited by

Journal ArticleDOI
TL;DR: This book is an excellent state-of-the-art review of RC and would be a worthwhile acquisition for anyone seriously considering speeding up a specific application.
Abstract: Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays by Maya B. Gokhale and Paul S. Graham, Springer, 2005, 238 pp., ISBN-13 978-0387-26105-8, $87.20.

Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays is an expository and easy-to-digest book. The authors are recognized leaders with many years of experience in the field of reconfigurable computing. The book is written so that non-specialists can understand the principles, techniques and algorithms, and each chapter has many excellent references for interested readers. It surveys methods, algorithms, programming languages and applications targeted at reconfigurable computing.

Automatic generation of parallel code from a sequential program on conventional microprocessor architectures remains an open problem. Nevertheless, a wide range of computationally intensive applications have benefited from the many tools developed to tackle it. For RC the problem is much harder still (perhaps 10x and up), and intense research is being devoted to making RC a commonplace practical tool. The aim of the authors is threefold: first, to guide readers through current issues in high-level languages (HLLs) for RC; second, to help readers understand the intricate process of algorithm-to-hardware compilation; and third, to show that, even though this process is painful, the performance gains are huge if the application is suitable for RC.

The book is divided into two parts. The first part contains four chapters about reconfigurable computing and languages. Chapter 1 presents an introduction to RC, contrasting conventional fixed-instruction microprocessors with RC architectures, and contains comprehensive reference material for further reading. Chapter 2 introduces reconfigurable logic devices by explaining the basic architecture and configuration of FPGAs. Chapter 3 deals with RC systems, discussing how parallel processing is achieved on reconfigurable computers, and gives a survey of today's RC systems. Chapter 4 discusses languages, compilation, debugging and the related manual-versus-automatic issues.

The second part of the book comprises five chapters about applications of RC. Chapters 5 and 6 discuss digital signal and image processing applications. Chapter 7 covers the application of RC to secure network communications. Chapter 8 discusses some important bioinformatics applications for which RC is a good candidate, their algorithmic problems and their hardware implementations. Finally, Chapter 9 covers two applications of reconfigurable supercomputers: a simulation of radiative heat transfer and a model of large urban road traffic.

This book is neither a technical manual nor a textbook but, in the opinion of this reviewer, an excellent state-of-the-art review of RC, and it would be a worthwhile acquisition for anyone seriously considering speeding up a specific application. On the downside, it is somewhat disappointing that the book does not contain more information about HLL tools that could help close the gap between the traditional HPC community and the raw computing power of RC.

Edusmildo Orozco, Department of Computer Science, University of Puerto Rico at Río Piedras.

105 citations

Journal ArticleDOI
TL;DR: In this article, a survey of the state-of-the-art software-defined radio (SDR) platforms in the context of wireless communication protocols is presented, with a focus on programmability, flexibility, portability, and energy efficiency.

91 citations
