Journal Article

Efficient Use of On-Chip Memories and Scheduling Techniques to Eliminate the Reconfiguration Overheads in Reconfigurable Systems

I. Hariharan, M. Kannan
15 Mar 2019 - Journal of Circuits, Systems, and Computers (World Scientific Publishing Company) - Vol. 28, Iss. 14, pp. 1950246
TL;DR: The proposed methodology focuses on a prefetch heuristic, a reuse technique, and the available memory hierarchy to map tasks efficiently onto the available memories and so reduce reconfiguration overheads for static systems in their subsequent iterations.
Abstract: Modern embedded systems are packed with dedicated Field Programmable Gate Arrays (FPGAs) to accelerate the overall system performance. However, the FPGAs are susceptible to reconfiguration overhead...
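
The interplay of reuse and prefetch mentioned in the TL;DR can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Task` fields, the LRU replacement policy, the one-task lookahead, and all timing values are assumptions chosen for illustration. A configuration that is still resident costs nothing to run again (reuse), and a load that overlaps the previous task's execution is partly or fully hidden (prefetch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str        # identifies the configuration the task needs (hypothetical)
    exec_time: int   # execution time, arbitrary time units (hypothetical)
    load_time: int   # time to load the configuration (hypothetical)


def reconfiguration_stall(schedule, num_units, prefetch=True, reuse=True):
    """Return the total time the schedule stalls waiting for configurations.

    Assumptions of this sketch: tasks run one after another, resident
    configurations are replaced in LRU order, and a prefetch can only
    overlap the execution of the immediately preceding task.
    """
    resident = []        # loaded configuration names, least recently used first
    stall = 0
    hidden_budget = 0    # execution time of the previous task
    # Prefetching needs a second unit to load into while the first one runs.
    can_prefetch = prefetch and num_units >= 2
    for task in schedule:
        hit = reuse and task.name in resident
        if hit:
            resident.remove(task.name)          # refreshed below as most recent
        else:
            visible = task.load_time
            if can_prefetch:
                visible = max(0, task.load_time - hidden_budget)
            stall += visible
            if task.name in resident:
                resident.remove(task.name)      # reloaded in place (reuse disabled)
            elif len(resident) >= num_units:
                resident.pop(0)                 # evict the least recently used
        resident.append(task.name)
        hidden_budget = task.exec_time
    return stall


# Two iterations of a three-task application (illustrative numbers only).
TASKS = [Task("A", 4, 3), Task("B", 2, 3), Task("C", 5, 3)]
TWO_ITERATIONS = TASKS * 2
```

Calling `reconfiguration_stall(TWO_ITERATIONS, num_units=3)` with the two flags toggled shows how each technique contributes on its own. With only two units, LRU replacement evicts every configuration before it is needed again in this example, which is one reason a mapping that accounts for the memory hierarchy matters.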
Citations
Journal Article
TL;DR: An Application Specific Inflexible FPGA (ASIF) is a tailored design, for a given group of known circuits, which is generated by extensively reducing the routing resources of an FPGA.
Abstract: An Application Specific Inflexible FPGA (ASIF) is a tailored design, for a given group of known circuits, which is generated by extensively reducing the routing resources of an FPGA. In an ASIF, di...

3 citations

References
Journal Article
TL;DR: Experimental measurements of the differences between a 90-nm CMOS field programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic are presented.
Abstract: This paper presents experimental measurements of the differences between a 90-nm CMOS field programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and, thereby, improve FPGAs. We describe the methodology by which the measurements were obtained and show that, for circuits containing only look-up table-based logic and flip-flops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 35. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories. We find that these blocks reduce this average area gap significantly to as little as 18 for our benchmarks, and we estimate that extensive use of these hard blocks could potentially lower the gap to below five. The ratio of critical-path delay, from FPGA to ASIC, is roughly three to four with less influence from block memory and hard multipliers. The dynamic power consumption ratio is approximately 14 and, with hard blocks, this gap generally becomes smaller.

1,078 citations
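
As a rough worked example of the ratios quoted in the abstract above, the sketch below scales a hypothetical ASIC estimate into an FPGA estimate. Only the ratios (35, 18, three to four, 14) come from the abstract; the function name, its parameters, and the input values are assumptions. The ratios are benchmark averages for a 90-nm process, so the result is an order-of-magnitude guide rather than a prediction.

```python
# Ratios quoted in the abstract (FPGA relative to ASIC, 90-nm CMOS).
AREA_RATIO_LOGIC_ONLY = 35.0    # average for LUT-and-flip-flop-only circuits
AREA_RATIO_HARD_BLOCKS = 18.0   # "as little as 18" when hard blocks are used
DELAY_RATIO_RANGE = (3.0, 4.0)  # critical-path delay, "roughly three to four"
DYNAMIC_POWER_RATIO = 14.0      # approximate; generally smaller with hard blocks


def estimate_fpga_cost(asic_area, asic_delay, asic_power, uses_hard_blocks=False):
    """Scale hypothetical ASIC figures by the abstract's average gap ratios.

    Units are whatever the caller uses (for example mm^2, ns, mW); the
    function only multiplies, so the outputs keep the same units.
    """
    area_ratio = AREA_RATIO_HARD_BLOCKS if uses_hard_blocks else AREA_RATIO_LOGIC_ONLY
    low, high = DELAY_RATIO_RANGE
    return {
        "area": asic_area * area_ratio,
        "delay_range": (asic_delay * low, asic_delay * high),
        # The abstract gives no hard-block power figure, so 14 is kept as an upper guide.
        "dynamic_power": asic_power * DYNAMIC_POWER_RATIO,
    }
```

For instance, `estimate_fpga_cost(2.0, 5.0, 10.0)` treats a 2 mm², 5 ns, 10 mW ASIC block (made-up figures) as needing about 70 mm², 15 to 20 ns, and 140 mW on the FPGA.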

Journal Article
Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, D. Glasco
TL;DR: The capabilities of state-of-the-art GPU-based high-throughput computing systems are discussed and the challenges to scaling single-chip parallel-computing systems are considered, highlighting high-impact areas that the computing research community can address.
Abstract: This article discusses the capabilities of state-of-the-art GPU-based high-throughput computing systems and considers the challenges to scaling single-chip parallel-computing systems, highlighting high-impact areas that the computing research community can address. Nvidia Research is investigating an architecture for a heterogeneous high-performance computing system that seeks to address these challenges.

626 citations

Journal Article
01 May 2001
TL;DR: A survey of academic research and commercial development in reconfigurable computing for DSP systems over the past fifteen years is presented in this article, with a focus on the application domain of digital signal processing.
Abstract: Steady advances in VLSI technology and design tools have extensively expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, increasingly new system implementations based on reconfigurable computing are being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems over the past fifteen years. This work is placed in the context of other available DSP implementation media including ASICs and PDSPs to fully document the range of design choices available to system engineers. It is shown that while contemporary reconfigurable computing can be applied to a variety of DSP applications including video, audio, speech, and control, much work remains to realize its full potential. While individual implementations of PDSP, ASIC, and reconfigurable resources each offer distinct advantages, it is likely that integrated combinations of these technologies will provide more complete solutions.

390 citations

Proceedings Article
01 Feb 1999
TL;DR: A reconfigurable architecture optimised for media processing, and based on 4-bit ALUs and interconnect, is described.
Abstract: In this paper we describe a reconfigurable architecture optimised for media processing, and based on 4-bit ALUs and interconnect.

224 citations

Proceedings Article
29 Sep 2009
TL;DR: This paper proposes to use Direct Memory Access (DMA), Master (MST) burst, and a dedicated Block RAM (BRAM) cache respectively to reduce the reconfiguration time by one order of magnitude.
Abstract: Run-time Partial Reconfiguration (PR) speed is significant in applications, especially when fast IP core switching is required. In this paper, we propose to use Direct Memory Access (DMA), Master (MST) burst, and a dedicated Block RAM (BRAM) cache respectively to reduce the reconfiguration time. Based on the Xilinx PR technology and the Internal Configuration Access Port (ICAP) primitive in the FPGA fabric, we discuss multiple design architectures and thoroughly investigate their performance with measurements for different partial bitstream sizes. Compared to the reference OPB HWICAP and XPS HWICAP designs, experimental results show that DMA HWICAP and MST HWICAP reduce the reconfiguration time by one order of magnitude, with little resource consumption overhead. The BRAM HWICAP design can even approach the reconfiguration speed limit of the ICAP primitive at the cost of large Block RAM utilization.

171 citations