
Showing papers on "Reconfigurable computing published in 2018"


Journal ArticleDOI
TL;DR: A survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods, and achieved performance.
Abstract: In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep-learning ecosystem to provide a tunable balance between performance, power consumption, and programmability. In this article, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods, and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete, and in-depth evaluation of CNN-to-FPGA toolflows.

167 citations


Posted Content
TL;DR: The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel future advances in efficient hardware deep learning.
Abstract: Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for dedicated and tailored hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs. The amount and diversity of research on the subject of CNN FPGA acceleration within the last 3 years demonstrates the tremendous industrial and academic interest. This paper presents a state-of-the-art survey of CNN inference accelerators on FPGAs. The computational workloads, their parallelism, and the involved memory accesses are analyzed. At the level of neurons, optimizations of the convolutional and fully connected layers are explained and the performance of the different methods is compared. At the network level, approximate computing and datapath optimization methods are covered and state-of-the-art approaches are compared. The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel future advances in efficient hardware deep learning.

114 citations


Journal ArticleDOI
27 Feb 2018
TL;DR: It is demonstrated how system designers can exploit hybrid and reconfigurable computing on SmallSats to harness these advantages for a variety of purposes, and several recent missions by NASA and industry that feature these principles and technologies are highlighted.
Abstract: Due to the increasing demands of onboard sensor and autonomous processing, one of the principal needs and challenges for future spacecraft is onboard computing. Space computers must provide high performance and reliability (which are often at odds), using limited resources (power, size, weight, and cost), in an extremely harsh environment (due to radiation, temperature, vacuum, and vibration). As spacecraft shrink in size, while assuming a growing role for science and defense missions, the challenges for space computing become particularly acute. For example, processing capabilities on CubeSats (a smaller class of SmallSats) have been extremely limited to date, often featuring microcontrollers with performance and reliability barely sufficient to operate the vehicle, let alone support various sensor and autonomous applications. This article surveys the challenges and opportunities of onboard computers for small satellites (SmallSats) and focuses on new concepts, methods, and technologies that are revolutionizing their capabilities, in terms of two guiding themes: hybrid computing and reconfigurable computing. These innovations are of particular need and value to CubeSats and other SmallSats. With new technologies, such as the CHREC Space Processor (CSP), we demonstrate how system designers can exploit hybrid and reconfigurable computing on SmallSats to harness these advantages for a variety of purposes, and we highlight several recent missions by NASA and industry that feature these principles and technologies.

101 citations


Proceedings ArticleDOI
15 Feb 2018
TL;DR: Rosetta is a realistic benchmark suite for software programmable FPGAs that can be useful for the HLS research community, but can also serve as a set of design tutorials for non-expert HLS users.
Abstract: Modern high-level synthesis (HLS) tools greatly reduce the turn-around time of designing and implementing complex FPGA-based accelerators. They also expose various optimization opportunities, which cannot be easily explored at the register-transfer level. With the increasing adoption of the HLS design methodology and continued advances of synthesis optimization, there is a growing need for realistic benchmarks to (1) facilitate comparisons between tools, (2) evaluate and stress-test new synthesis techniques, and (3) establish meaningful performance baselines to track progress of the HLS technology. While several HLS benchmark suites already exist, they are primarily comprised of small textbook-style function kernels, instead of complete and complex applications. To address this limitation, we introduce Rosetta, a realistic benchmark suite for software programmable FPGAs. Designs in Rosetta are fully-developed applications. They are associated with realistic performance constraints, and optimized with advanced features of modern HLS tools. We believe that Rosetta is not only useful for the HLS research community, but can also serve as a set of design tutorials for non-expert HLS users. In this paper we describe the characteristics of our benchmarks and the optimization techniques applied to them. We further report experimental results on an embedded FPGA device as well as a cloud FPGA platform.

91 citations


Journal ArticleDOI
TL;DR: Recryptor is a reconfigurable cryptographic processor that augments the existing memory of a commercial general-purpose processor with compute capabilities and demonstrates Recryptor’s programmability by implementing the cryptographic primitives of various public/secret key cryptographies and hash functions.
Abstract: Providing security for the Internet of Things (IoT) is increasingly important, but supporting many different cryptographic algorithms and standards within the physical constraints of IoT devices is highly challenging. Software implementations are inefficient due to the high bitwidth cryptographic operations; domain-specific accelerators are often inflexible; and reconfigurable crypto processors generally have large area and power overhead. This paper proposes Recryptor, a reconfigurable cryptographic processor that augments the existing memory of a commercial general-purpose processor with compute capabilities. It supports in-memory bitline computing using a 10-transistor bitcell to support different bitwise operations up to 512 bits wide. Custom-designed shifter, rotator, and S-box modules sit near the memory, providing high-throughput near-memory computing capabilities. We demonstrate Recryptor’s programmability by implementing the cryptographic primitives of various public/secret key cryptographies and hash functions. Recryptor runs at 28.8 MHz at 0.7 V, achieving a 6.8× average speedup and 12.8× average energy improvement over state-of-the-art software- and hardware-accelerated implementations, with only 0.128 mm² area overhead in 40-nm CMOS.

81 citations


Journal ArticleDOI
TL;DR: This research work is the first comprehensive survey on how random number generators are implemented on Field Programmable Gate Arrays (FPGAs), with a rich and up-to-date list of generators specifically mapped to FPGA.

75 citations


Proceedings ArticleDOI
05 Nov 2018
TL;DR: The Tile-Grained Pipeline Architecture (TGPA) is proposed, a heterogeneous design which supports pipelining execution of multiple tiles within a single input image on multiple heterogeneous accelerators.
Abstract: FPGAs are more and more widely used as reconfigurable hardware accelerators for applications leveraging convolutional neural networks (CNNs) in recent years. Previous designs normally adopt a uniform accelerator architecture that processes all layers of a given CNN model one after another. This homogeneous design methodology usually suffers from dynamic resource underutilization due to the tensor shape diversity of different layers. As a result, designs equipped with heterogeneous accelerators specific to different layers were proposed to resolve this issue. However, existing heterogeneous designs sacrifice latency for throughput by concurrent execution of multiple input images on different accelerators. In this paper, we propose an architecture named Tile-Grained Pipeline Architecture (TGPA) for low-latency CNN inference. TGPA adopts a heterogeneous design which supports pipelined execution of multiple tiles within a single input image on multiple heterogeneous accelerators. The accelerators are partitioned onto different FPGA dies to guarantee high frequency. A partition strategy is designed to maximize on-chip resource utilization. Experimental results show that TGPA designs for different CNN models achieve up to 40% performance improvement over homogeneous designs, and 3× latency reduction over state-of-the-art designs.

64 citations


Proceedings ArticleDOI
01 Aug 2018
TL;DR: BISMO is presented, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing that utilizes the excellent binary-operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism.
Abstract: Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. We present BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing. BISMO utilizes the excellent binary-operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism. We characterize the resource usage and performance of BISMO across a range of parameters to build a hardware cost model, and demonstrate a peak performance of 6.5 TOPS on the Xilinx PYNQ-Z1 board.
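The precision-scalable decomposition described in the abstract can be sketched in a few lines: an integer matrix product is rebuilt from binary (bit-plane) matrix products combined with shifts, which is why FPGA binary-operation throughput translates directly into multi-bit performance. This is an illustrative NumPy model of the idea, not BISMO's overlay hardware:

```python
import numpy as np

def bit_serial_matmul(A, B, a_bits, b_bits):
    """Multiply unsigned integer matrices by summing binary matrix products.

    A and B are decomposed into bit planes; each pair of planes needs only
    binary operations (AND + popcount in hardware), and the partial results
    are combined with shifts. Runtime scales with a_bits * b_bits, i.e.
    with the precision actually required.
    """
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i in range(a_bits):
        Ai = (A >> i) & 1          # i-th bit plane of A (0/1 matrix)
        for j in range(b_bits):
            Bj = (B >> j) & 1      # j-th bit plane of B
            acc += (Ai @ Bj) << (i + j)
    return acc
```

Because precision is a runtime parameter (`a_bits`, `b_bits`), the same kernel serves 1-bit through full-precision operands, mirroring how the overlay scales performance with required precision.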

63 citations


Proceedings ArticleDOI
29 May 2018
TL;DR: This paper observes that a "long" routing wire carrying a logical 1 reduces the propagation delay of other adjacent but unconnected long wires in the FPGA interconnect, thereby leaking information about its state, and proposes a communication channel that can be used for both covert transmissions between circuits, and for exfiltration of secrets from the chip.
Abstract: Field-Programmable Gate Arrays (FPGAs) are integrated circuits that implement reconfigurable hardware. They are used in modern systems, creating specialized, highly-optimized integrated circuits without the need to design and manufacture dedicated chips. As the capacity of FPGAs grows, it is increasingly common for designers to incorporate implementations of algorithms and protocols from a range of third-party sources. The monolithic nature of FPGAs means that all on-chip circuits, including third party black-box designs, must share common on-chip infrastructure, such as routing resources. In this paper, we observe that a "long" routing wire carrying a logical 1 reduces the propagation delay of other adjacent but unconnected long wires in the FPGA interconnect, thereby leaking information about its state. We exploit this effect and propose a communication channel that can be used for both covert transmissions between circuits, and for exfiltration of secrets from the chip. We show that the effect is measurable for both static and dynamic signals, and that it can be detected using very small on-board circuits. In our prototype, we are able to correctly infer the logical state of an adjacent long wire over 99% of the time, even without error correction, and for signals that are maintained for as little as 82 µs. Using a Manchester encoding scheme, our channel bandwidth is as high as 6 kbps. We characterize the channel in detail and show that it is measurable even when multiple competing circuits are present and can be replicated on different generations and families of Xilinx devices (Virtex 5, Virtex 6, and Artix 7). Finally, we propose countermeasures that can be deployed by systems and tools designers to reduce the impact of this information leakage.
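The Manchester encoding step the channel relies on is standard and easy to sketch. The polarity convention below is an assumption (IEEE 802.3 and G.E. Thomas use opposite conventions); the relevant property is that every bit carries a mid-bit transition, which lets the receiver stay synchronized over a noisy side channel:

```python
def manchester_encode(bits):
    # Assumed convention: 0 -> high-then-low, 1 -> low-then-high.
    # Each bit becomes two half-bit samples with a guaranteed transition.
    out = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out

def manchester_decode(halves):
    # A rising mid-bit transition (low -> high) decodes as 1, falling as 0.
    return [1 if halves[i] < halves[i + 1] else 0
            for i in range(0, len(halves), 2)]
```

The guaranteed transition halves the raw signaling rate, which is consistent with a channel whose per-symbol detection is slow relative to its bit rate.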

60 citations


Journal ArticleDOI
01 Oct 2018
TL;DR: In this paper, the authors proposed a memory-centric, reconfigurable, general purpose computing platform that is capable of handling the explosive amount of data in a fast and energy-efficient manner.
Abstract: For decades, advances in electronics were directly driven by the scaling of CMOS transistors according to Moore's law. However, both the CMOS scaling and the classical computer architecture are approaching fundamental and practical limits, and new computing architectures based on emerging devices, such as resistive random-access memory (RRAM) devices, are expected to sustain the exponential growth of computing capability. Here, we propose a novel memory-centric, reconfigurable, general purpose computing platform that is capable of handling the explosive amount of data in a fast and energy-efficient manner. The proposed computing architecture is based on a uniform, physical, resistive, memory-centric fabric that can be optimally reconfigured and utilized to perform different computing and data storage tasks in a massively parallel approach. The system can be tailored to achieve maximal energy efficiency based on the data flow by dynamically allocating the basic computing fabric for storage, arithmetic, and analog computing including neuromorphic computing tasks.

49 citations


Proceedings ArticleDOI
24 Jun 2018
TL;DR: This paper presents a novel approximate multiplier architecture customized towards the FPGA-based fabrics, an efficient design methodology, and an open-source library that provides higher area, latency and energy gains along with better output accuracy than those offered by the state-of-the-art ASIC-based approximate multipliers.
Abstract: The architectural differences between ASICs and FPGAs limit the effective performance gains achievable by the application of ASIC-based approximation principles to FPGA-based reconfigurable computing systems. This paper presents a novel approximate multiplier architecture customized to FPGA-based fabrics, an efficient design methodology, and an open-source library. Our designs provide higher area, latency, and energy gains along with better output accuracy than those offered by state-of-the-art ASIC-based approximate multipliers. Moreover, compared to the multiplier IP offered by Xilinx Vivado, our proposed design achieves up to 30%, 53%, and 67% gains in terms of area, latency, and energy, respectively, while incurring an insignificant accuracy loss (below 1% average relative error). Our library of approximate multipliers is open-source and available online at https://cfaed.tudresden.de/pd-downloads to fuel further research and development in this area, thereby enabling a new research direction for the FPGA community.
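As a rough illustration of the accuracy/area trade-off such libraries navigate (this is a generic truncated partial-product multiplier, not the paper's architecture), one can drop all partial-product bits below a weight threshold and measure the resulting relative error:

```python
def truncated_mul(a, b, k):
    """Approximate unsigned multiply that drops every partial-product bit
    of weight below 2**k (a generic approximation scheme, not the paper's
    design). Fewer partial-product bits means less adder logic in hardware."""
    acc = 0
    for i in range(a.bit_length()):
        if (a >> i) & 1:
            for j in range(b.bit_length()):
                if ((b >> j) & 1) and i + j >= k:
                    acc += 1 << (i + j)
    return acc

def mean_relative_error(k, width=8):
    # Exhaustive average relative error over all width-bit operand pairs;
    # truncation only removes bits, so the error is always non-negative.
    errs = []
    for a in range(1, 1 << width):
        for b in range(1, 1 << width):
            errs.append((a * b - truncated_mul(a, b, k)) / (a * b))
    return sum(errs) / len(errs)
```

Sweeping `k` traces out the same kind of area-versus-accuracy curve that an approximate-multiplier library exposes to the designer.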

Journal ArticleDOI
TL;DR: This work introduces a new design paradigm where the analogue and digital worlds are seamlessly fused via memristors, enabling electronics with reconfigurability.
Abstract: As the world enters the age of ubiquitous computing, the need for reconfigurable hardware operating close to the fundamental limits of energy consumption becomes increasingly pressing. Simultaneously, scaling-driven performance improvements within the framework of traditional analogue and digital design become progressively more restricted by fundamental physical constraints. Emerging nanoelectronics technologies bring forth new prospects yet a significant rethink of electronics design is required for realising their full potential. Here we lay the foundations of a design approach that fuses analogue and digital thinking by combining digital electronics with analogue memristive devices for achieving charge-based computation; information processing where every dissipated charge counts. This is realised by introducing memristive devices into standard logic gates, thus rendering them reconfigurable and capable of performing analogue computation at a power cost close to digital. The versatility and benefits of our approach are experimentally showcased through a hardware data clusterer and an analogue NAND gate.

Journal ArticleDOI
TL;DR: A design for a phase measurement logic core with resolution and precision in the range of a few picoseconds is proposed, based on subsample accumulation using systematic sampling of the phase detector signal.
Abstract: Phase measurement is required in electronic applications where a synchronous relationship between signals needs to be preserved. Traditional electronic systems used for time measurement are designed using a classical mixed-signal approach. With the advent of reconfigurable hardware such as field-programmable gate arrays (FPGAs), it is more advantageous for designers to opt for an all-digital architecture. Most high-speed serial transceivers in FPGA circuitry do not ensure the same chip latency after each power cycle, reset cycle, or firmware upgrade, causing uncertainty in the phase relationship between the recovered signals. To address the need to register minute phase shifts inside an FPGA, we propose a design for a phase measurement logic core with resolution and precision in the range of a few picoseconds. The working principle is based on subsample accumulation using systematic sampling of the phase detector signal. The phase measurement logic can operate over a wide range of digital clock frequencies, from a few kilohertz to the maximum frequency supported within the FPGA fabric. A mathematical model is developed to illustrate the operating principle of the design, and the VLSI architecture is designed for the logic core. We also discuss the measurement procedure, the calibration sequence involved, and the performance of the design in terms of accuracy, precision, and resolution.
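The subsampling principle can be modeled in software: an XOR phase detector is high for a fraction 2Δt/T of each period, so accumulating many systematic subsamples of its output recovers Δt far below the sample clock's resolution. The model below is an illustration under simplified assumptions (ideal square waves, a hypothetical `estimate_phase` helper), not the paper's logic core:

```python
import math

def estimate_phase(dt, period=1.0, n=200_000):
    """Estimate the offset dt (0 < dt < period/2) between two square waves
    from the accumulated duty cycle of their XOR phase detector output."""
    # Golden-ratio sample spacing gives low-discrepancy coverage of the
    # period, a stand-in for the paper's systematic sampling scheme.
    step = period * (math.sqrt(5) - 1) / 2
    high = 0
    for k in range(n):
        t = (k * step) % period
        a = t < period / 2                    # reference square wave
        b = ((t - dt) % period) < period / 2  # delayed square wave
        high += a != b                        # XOR detector output
    # XOR is high for a fraction 2*dt/period of the time.
    return (high / n) * period / 2
```

The estimate sharpens as more subsamples are accumulated, which is how a slow sampling clock can resolve picosecond-scale offsets in principle.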

Proceedings ArticleDOI
01 Aug 2018-Rice
TL;DR: This work reviews research articles on neural networks concerned with executing more than one input neuron and multiple layers, with or without linearity, on FPGAs, and employs a Multi Layer Perceptron with a Back Propagation learning algorithm to identify a prototype for diagnosis.
Abstract: Efficient hardware realization of an artificial neural network (ANN) depends to a large extent on the efficient implementation of a single neuron. For hardware execution of NNs, FPGA-based reconfigurable computing systems are generally favorable. FPGA realization of ANNs with a large number of neurons is a challenging task. This work reviews research articles on neural networks concerned with executing more than one input neuron and multiple layers, with or without linearity, on FPGAs. An implementation technique based on reserve substitution is proposed to handle signed decimal data. A detailed review of many research papers was carried out for the proposed work. The proposed paper employs a Multi Layer Perceptron with a Back Propagation learning algorithm to identify a prototype for diagnosis. A brief introduction to artificial neural networks as used today for disease diagnosis is also given.

Journal ArticleDOI
TL;DR: This paper focuses on improving the performance of the data plane from the edge to the core network segment (backhaul) in a 5G multi-tenant network by leveraging and exploring the programmability introduced by software-based networking.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: In this paper, the high-level P4 language is used to implement a packet parser on reconfigurable hardware (i.e., an FPGA), which is then compiled to firmware by Xilinx SDNet.
Abstract: Nowadays, network managers look for ways to change the design and management of networks so that decisions can be made in the control plane. Future switches should support the new features and flexibility required for parsing and processing packets. One of the critical components of a switch is the packet parser, which processes packet headers so that decisions can be made about incoming packets. Here we focus on the data plane, and particularly the packet parser in OpenFlow switches, which should have the flexibility and programmability to support new requirements and multiple OpenFlow versions. We design an architecture that, unlike static network equipment, offers flexibility and programmability in the data plane, especially in SDN networks, and supports the parsing and processing of specific packets. The architecture is described in the high-level P4 language and implemented on reconfigurable hardware (i.e., an FPGA). After automatically generating the protocol-independent packet parser architecture on a Virtex-7, it is compiled to firmware by Xilinx SDNet, and ultimately an FPGA platform is implemented. The design consumes fewer resources and is more efficient in terms of throughput and processing speed than other architectures.

Journal ArticleDOI
01 Feb 2018-EPL
TL;DR: In this article, a quantum interferometer is used as a programmable spin logic device (PSLD) to characterize spin-based logical operations using the spin degree of freedom of the electron.
Abstract: Exploiting the spin degree of freedom of the electron, a new proposal is given to characterize spin-based logical operations using a quantum interferometer that can be utilized as a programmable spin logic device (PSLD). The ON and OFF states of both inputs and outputs are described by spin state only, circumventing spin-to-charge conversion at every stage, as is often required in conventional devices through the inclusion of extra hardware that can eventually diminish efficiency. All possible logic functions can be engineered from a single device without redesigning the circuit, which offers opportunities for designing a new generation of spintronic devices. Moreover, we also discuss the utilization of the present model as a memory device and suitable computing operations with proposed experimental setups.

Proceedings ArticleDOI
20 Oct 2018
TL;DR: The framework introduces a task-based computation model with explicit continuation passing to support dynamic parallelism in addition to static parallelism and introduces a design methodology that includes an architectural template that allows easily creating parallel accelerators from high-level descriptions.
Abstract: In this paper, we propose ParallelXL, an architectural framework for building application-specific parallel accelerators with low manual effort. The framework introduces a task-based computation model with explicit continuation passing to support dynamic parallelism in addition to static parallelism. In contrast, today's high-level design frameworks for accelerators focus on static data-level or thread-level parallelism that can be identified and scheduled at design time. To realize the new computation model, we develop an accelerator architecture that efficiently handles dynamic task generation and scheduling as well as load balancing through work stealing. The architecture is general enough to support many dynamic parallel constructs such as fork-join, data-dependent task spawning, and arbitrary nesting and recursion of tasks, as well as static parallel patterns. We also introduce a design methodology that includes an architectural template that allows easily creating parallel accelerators from high-level descriptions. The proposed framework is studied through an FPGA prototype as well as detailed simulations. Evaluation results show that the framework can generate high-performance accelerators targeting FPGAs for a wide range of parallel algorithms and achieve an average of 4.0x speedup over an eight-core out-of-order processor (24.1x over a single core), while being 11.8x more energy efficient.
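The dynamic load-balancing idea, each worker popping tasks from its own deque and stealing from a victim's opposite end when idle, can be sketched as a software simulation. The scheduler below is a hypothetical illustration of work stealing with dynamic task spawning, not ParallelXL's hardware:

```python
import random
from collections import deque

def run_work_stealing(tasks, n_workers=4, seed=0):
    """Simulate work-stealing load balancing: each worker owns a deque,
    pops work from its own tail (LIFO), and steals from a random victim's
    head (FIFO) when idle. Tasks are thunks that may spawn children by
    returning a list of new thunks (dynamic parallelism).
    Returns the number of tasks executed."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_workers)]
    for i, t in enumerate(tasks):
        deques[i % n_workers].append(t)
    done = 0
    while any(deques):
        for w in range(n_workers):
            if deques[w]:
                task = deques[w].pop()            # own work: LIFO
            else:
                victims = [v for v in range(n_workers) if deques[v]]
                if not victims:
                    continue
                task = deques[rng.choice(victims)].popleft()  # steal: FIFO
            for child in task() or []:
                deques[w].append(child)           # dynamic task spawning
            done += 1
    return done
```

Tasks spawned at runtime (e.g., data-dependent recursion) are balanced across workers without any schedule being fixed at design time, which is the property static HLS parallelism lacks.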

Journal ArticleDOI
TL;DR: To the best of our knowledge, this is the first scalable FPGA implementation of the bilateral filter; it requires just O(1) operations per pixel for any arbitrary filter width and is both scalable and reconfigurable.
Abstract: The bilateral filter is an edge-preserving smoother that has applications in image processing, computer vision, and computational photography. In the past, field-programmable gate array (FPGA) implementations of the filter have been proposed that can achieve high throughput using parallelization and pipelining. An inherent limitation of direct implementations is that their complexity scales as O(ω²) with the filter width ω. In this paper, we propose an FPGA implementation of a fast bilateral filter that requires just O(1) operations for any arbitrary ω. The attractive feature of the FPGA implementation is that it is both scalable and reconfigurable. To the best of our knowledge, this is the first scalable FPGA implementation of the bilateral filter. As an application, we use the FPGA implementation for image denoising.
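The O(1) trick rests on "shiftable" range kernels: a cosine range kernel factors through the angle-sum identity into per-pixel terms times plain box filters, and a box filter costs O(1) per sample via running sums regardless of width. The 1-D sketch below uses a single cosine term for clarity (an assumption for illustration; Gaussian-like kernels are approximated by a sum of such terms):

```python
import numpy as np

def box_sum(x, r):
    # Running-sum box filter: O(1) work per sample regardless of radius r.
    c = np.concatenate(([0.0], np.cumsum(x)))
    i = np.arange(len(x))
    return c[np.minimum(i + r + 1, len(x))] - c[np.maximum(i - r, 0)]

def fast_bilateral_1d(f, r, gamma):
    # Range kernel cos(gamma*(f(x)-f(y))) expands via the angle-sum identity
    # into products of per-pixel terms and plain box filters, so the cost
    # per pixel is independent of the window radius r.
    cf, sf = np.cos(gamma * f), np.sin(gamma * f)
    num = cf * box_sum(f * cf, r) + sf * box_sum(f * sf, r)
    den = cf * box_sum(cf, r) + sf * box_sum(sf, r)
    return num / den
```

For a single cosine term the decomposition is exact, so the fast path matches a direct O(ωn) double loop to floating-point precision; hardware implementations pipeline the few auxiliary box filters instead of an ω-wide window.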

Journal ArticleDOI
TL;DR: A master–slave AMR architecture using the reconfigurability of field-programmable gate arrays (FPGAs) is proposed, and the constraint conditions of AMR on FPGAs are derived from the aspects of computing optimization and memory-access optimization.
Abstract: Intelligent radios collect information by sensing signals within the radio spectrum, and the automatic modulation recognition (AMR) of signals is one of their most challenging tasks. Although modulation classification based on deep neural networks achieves better results, training the neural network requires complicated calculations and expensive hardware. Therefore, in this paper, we propose a master–slave AMR architecture using the reconfigurability of field-programmable gate arrays (FPGAs). First, we discuss the method of building AMR using a stacked convolutional autoencoder (CAE), and analyze the principles of training and classification. Then, on the basis of the radio-frequency network-on-chip architecture, the constraint conditions of AMR on FPGAs are derived from the aspects of computing optimization and memory-access optimization. The experimental results not only demonstrate that CAE-based AMR works correctly, but also show that AMR based on neural networks can be implemented on FPGAs, with potential for dynamic spectrum allocation and cognitive radio systems.

Journal ArticleDOI
TL;DR: This paper presents new challenges for the real-time scheduling of distributed reconfigurable embedded systems powered by a renewable energy and shows the effectiveness of the proposed intelligent multiagent distributed architecture in terms of the number of exchanged messages, deadline success ratio, and the energy consumption.
Abstract: This paper presents new challenges for the real-time scheduling of distributed reconfigurable embedded systems powered by renewable energy. Reconfigurable computing systems have to deal with unpredictable events from the environment, such as the activation of new tasks and hardware or software failures, by adapting task allocation and scheduling in order to maintain system feasibility and performance. The proposed approach is based on an intelligent multiagent distributed architecture composed of: 1) a global agent, the "coordinator," associated with the whole distributed system and 2) four local agents (supervisor, scheduler, battery manager, and reconfiguration manager) belonging to each subsystem. The efficiency and completeness of the adaptive reconfiguration strategy are proved, as all possible reconfiguration forms are considered to guarantee a feasible system with a graceful quality of service. Two communication protocols, an intra-subsystem protocol and an inter-subsystem protocol, are proposed to ensure the effectiveness of the reconfiguration strategy. Extensive simulations show the effectiveness of the proposed intelligent multiagent distributed architecture in terms of the number of exchanged messages, deadline success ratio, and energy consumption.

Journal ArticleDOI
TL;DR: A flexible and scalable hardware accelerator for classification using RBFNNs, which places no limitation on the dimension of the input data, is developed; comparison of results shows that the scalability of the hardware architecture makes it a favorable solution for classifying very large data sets.
Abstract: In this paper we present the design and analysis of scalable hardware architectures for training the learning parameters of RBFNNs to classify large data sets. We design scalable hardware architectures for the K-means clustering algorithm, to train the positions of the hidden nodes in the hidden layer of the RBFNN, and for the pseudoinverse algorithm, for weight adjustment at the output layer. These scalable, parallel, pipelined architectures can handle data sets with no restriction on their dimensions. This paper also presents a flexible and scalable hardware accelerator for classification using the RBFNN that places no limitation on the dimension of the input data. We report FPGA synthesis results of our implementations and compare our hardware accelerator with CPU and GPU implementations of the same algorithms, as well as with other existing approaches. Analysis of these results shows that the scalability of our hardware architecture makes it a favorable solution for classifying very large data sets.
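The two training stages the paper accelerates, K-means for the hidden-layer centers and a pseudoinverse solve for the output weights, can be written as a compact NumPy reference (function names here are illustrative, not the paper's):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Lloyd's algorithm: the clustering stage mapped to a parallel pipeline.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(0)
    return C

def rbf_design_matrix(X, C, sigma):
    # Gaussian activations of each sample at each hidden-node center.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_rbfnn(X, Y, k, sigma):
    C = kmeans(X, k)                                        # hidden centers
    W = np.linalg.pinv(rbf_design_matrix(X, C, sigma)) @ Y  # output weights
    return C, W
```

Both stages are dominated by dense distance and matrix computations with no dimension-dependent control flow, which is what makes them amenable to the dimension-agnostic pipelines the paper describes.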

Journal ArticleDOI
TL;DR: This work reviews research articles on neural networks concerned with executing more than one input neuron and multiple layers, with or without linearity, on FPGAs; an implementation technique based on reserve substitution is proposed to handle signed decimal data.
Abstract: Efficient hardware realization of an artificial neural network (ANN) depends to a large extent on the efficient implementation of a single neuron. For hardware execution of NNs, FPGA-based reconfigurable computing systems are generally favorable. FPGA realization of ANNs with a large number of neurons is a challenging task. This work reviews research articles on neural networks concerned with executing more than one input neuron and multiple layers, with or without linearity, on FPGAs. An implementation technique based on reserve substitution is proposed to handle signed decimal data. A detailed review of many research papers was carried out for the proposed work.

Journal ArticleDOI
TL;DR: This paper proposes a reliable yet efficient FPGA-based security system via crypto engines and Physical Unclonable Functions (PUFs) for big data applications for cloud computing.
Abstract: Editor’s note: In cloud computing framework, the data security and protection is one of the most important aspects for optimization and concrete implementation. This paper proposes a reliable yet efficient FPGA-based security system via crypto engines and Physical Unclonable Functions (PUFs) for big data applications. Considering that FPGA or GPU-based accelerators are popular in data centers, we believe the proposed approach is very practical and effective method for data security in cloud computing. —Gi-Joon Nam, IBM Research

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the hardware version of the HFC-VD algorithm can significantly outperform an equivalent software version, which makes the reconfigurable system appealing for onboard hyperspectral data processing.
Abstract: A challenging problem in spectral unmixing is how to determine the number of endmembers in a given scene. One of the most popular ways to determine the number of endmembers is by estimating the virtual dimensionality (VD) of the hyperspectral image using the well-known Harsanyi–Farrand–Chang (HFC) method. Due to the complexity and high dimensionality of hyperspectral scenes, this task is computationally expensive. Reconfigurable field-programmable gate arrays (FPGAs) are promising platforms that allow hardware/software codesign and the potential to provide powerful onboard computing capabilities and flexibility at the same time. In this paper, we present the first FPGA design for the HFC-VD algorithm. The proposed method has been implemented on a Virtex-7 XC7VX690T FPGA and tested using real hyperspectral data collected by NASA’s Airborne Visible Infra-Red Imaging Spectrometer over the Cuprite mining district in Nevada and the World Trade Center in New York. Experimental results demonstrate that our hardware version of the HFC-VD algorithm can significantly outperform an equivalent software version, which makes our reconfigurable system appealing for onboard hyperspectral data processing. Most important, our implementation exhibits real-time performance with regard to the time that the hyperspectral instrument takes to collect the image data.
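For reference, the HFC test itself is compact in software: it compares the sorted eigenvalues of the sample correlation and covariance matrices against a per-band Neyman-Pearson threshold at a chosen false-alarm probability, and counts the bands where the correlation eigenvalue is significantly larger. The sketch below uses a common asymptotic variance approximation for the eigenvalue difference; function names and defaults are illustrative, not the paper's FPGA formulation:

```python
import numpy as np
from statistics import NormalDist

def hfc_vd(pixels, pf=1e-3):
    """HFC virtual-dimensionality estimate.
    pixels: (N, L) array of N pixel vectors over L spectral bands.
    Counts bands whose correlation-matrix eigenvalue exceeds the
    covariance-matrix eigenvalue by a Neyman-Pearson threshold at
    false-alarm probability pf."""
    N, L = pixels.shape
    R = pixels.T @ pixels / N                    # sample correlation matrix
    K = np.cov(pixels, rowvar=False, bias=True)  # sample covariance matrix
    lr = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues, descending
    lk = np.sort(np.linalg.eigvalsh(K))[::-1]
    # Asymptotic std of the eigenvalue difference under the null
    sigma = np.sqrt(2.0 * (lr ** 2 + lk ** 2) / N)
    tau = -NormalDist().inv_cdf(pf) * sigma      # per-band threshold
    return int(np.sum(lr - lk > tau))
```

The dominant cost is forming the two L-by-L Gram matrices over all pixels, which is exactly the streaming, highly parallel workload an FPGA implementation accelerates.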

Proceedings ArticleDOI
07 Aug 2018
TL;DR: A new network function is described that relies on in-network computing to limit the erratic effect of failing network links, to enable the continued use of those links until they can be repaired.
Abstract: Failing network links are usually disabled, and packets are routed around them until the links are repaired. While it is often possible to utilize some of a failing link's capacity, losing what remains of a link's capacity is typically deemed preferable to the erratic effect that unreliable links can have on application-level behavior. We describe a new network function that relies on in-network computing to limit the erratic effect of failing network links, to enable the continued use of those links until they can be repaired. We explore the design space using ns-3, and evaluate our implementation on a physical test-bed that includes programmable switches and reconfigurable hardware. Our current hardware prototype can almost saturate a 10GbE link while using around 10% of our FPGA's resources.

Journal ArticleDOI
TL;DR: In this paper, a quantum interferometer is operated as a programmable spin logic device (PSLD) to realize spin-based logical operations; the same device can also be utilized as a memory device.
Abstract: Exploiting the spin degree of freedom of the electron, a new proposal is given to characterize spin-based logical operations using a quantum interferometer that can be utilized as a programmable spin logic device (PSLD). The ON and OFF states of both inputs and outputs are described by the spin state only, circumventing the spin-to-charge conversion that conventional devices often require at every stage, with the extra hardware that can eventually diminish efficiency. All possible logic functions can be engineered from a single device without redesigning the circuit, which certainly offers opportunities for designing a new generation of spintronic devices. Moreover, we discuss the utilization of the present model as a memory device and suitable computing operations, with proposed experimental setups.

Posted Content
TL;DR: BISMO as discussed by the authors is a vectorized bit-serial matrix multiplication overlay for reconfigurable computing, which utilizes the excellent binary operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism.
Abstract: Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. We present BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing. BISMO utilizes the excellent binary-operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism. We characterize the resource usage and performance of BISMO across a range of parameters to build a hardware cost model, and demonstrate a peak performance of 6.5 TOPS on the Xilinx PYNQ-Z1 board.
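The bit-serial scheme can be mirrored in software: each operand is decomposed into bit-planes, the binary matrix products (an AND plus popcount in hardware) are computed per pair of planes, and the partial results are accumulated with shifts that weight each bit's significance. A minimal NumPy sketch for unsigned integer operands follows; the function name and bit-width parameters are illustrative, not BISMO's actual interface:

```python
import numpy as np

def bitserial_matmul(A, B, bits_a, bits_b):
    """Multiply unsigned-integer matrices by summing shifted products
    of their bit-planes, in the spirit of bit-serial accelerators."""
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i in range(bits_a):
        Ai = (A >> i) & 1                  # i-th bit-plane of A
        for j in range(bits_b):
            Bj = (B >> j) & 1              # j-th bit-plane of B
            # Binary matmul: AND + popcount in hardware terms
            acc += (Ai @ Bj).astype(np.int64) << (i + j)
    return acc
```

The loop structure makes the scaling explicit: runtime grows with the product of the operand bit widths, so halving the precision of both operands quarters the work, which is the performance-precision trade-off the overlay exploits.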

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A novel framework for virtualizing FPGA resources in the cloud that prevents the overhead of context switches between the virtual machine and host address spaces by using the in-kernel network stack for transferring packets to FPGAs.
Abstract: In this paper, we introduce a novel framework for virtualizing FPGA resources in the cloud. The proposed framework targets hardware/software architectures that leverage the Virtio paradigm for efficient communication between virtual machines (VMs) and the FPGAs. Furthermore, we present an FPGA overlay that uses reconfigurable hardware tiles and a flexible network-on-chip (NoC) architecture for transparent and optimized allocation of FPGA resources to VMs. The proposed overlay makes it possible to merge several FPGA regions allocated to a VM into a larger area, thus allowing FPGA resources to be resized on demand. Hardware sandboxes are then provided as a means to enforce domain separation between hardware tasks belonging to different VMs. The introduced framework avoids the overhead of context switches between the virtual machine and host address spaces by using the in-kernel network stack for transferring packets to FPGAs. Experimental results show a 2x to 35x performance increase compared to current state-of-the-art virtualization approaches.

Journal ArticleDOI
TL;DR: An FPGA-oriented baseband processing architecture suitable for communication scenarios such as non-contiguous carrier aggregation, centralized Cloud Radio Access Network (C-RAN) processing, and 4G/5G waveform coexistence is proposed and evaluated.
Abstract: The next evolution in cellular communications will not only improve upon the performance of previous generations, but also represent an unparalleled expansion in the number of services and use cases. One of the foundations for this evolution is the design of highly flexible, versatile, and resource-/power-efficient hardware components. This paper proposes and evaluates an FPGA-oriented baseband processing architecture suitable for communication scenarios such as non-contiguous carrier aggregation, centralized Cloud Radio Access Network (C-RAN) processing, and 4G/5G waveform coexistence. Our system is upgradeable, resource-efficient, cost-effective, and provides support for three 5G waveform candidates. Exploring Dynamic Partial Reconfiguration (DPR), the proposed architecture expands the design space exploration beyond the available hardware resources on the Zynq xc7z020 through hardware virtualization. Additionally, Dynamic Frequency Scaling (DFS) allows for run-time adjustment of processing throughput and reduces power consumption up to 88%. The resource overhead for DPR and DFS is residual, and the reconfiguration latency is two orders of magnitude below the control plane latency requirements proposed for 5G communications.