Using C-to-gates to program streaming image processing kernels efficiently on FPGAs

doi:10.1109/FPL.2009.5272373

Proceedings Article•DOI•

Kristof Denolf¹, Stephen Neuendorffer, Kees Vissers•Institutions (1)

Katholieke Universiteit Leuven¹

29 Sep 2009-pp 626-630

TL;DR: This paper applies algorithmic C-to-FPGA synthesis technology in a structured design approach and demonstrates its added value on two relevant vision processing kernels: optical flow and debayering.

Abstract: Effectively exploiting the variety of computational and storage resources available in common FPGA architectures for complex applications, such as the real-time implementation of vision algorithms, is often difficult in standard HDL design methodologies. Higher-level design tools can enable a design to more quickly explore a range of different architectures. In this paper we apply algorithmic C-to-FPGA synthesis technology in a structured design approach and demonstrate its added value on two relevant vision processing kernels: optical flow and debayering. The impact of the proposed approach on the design time, the FPGA resource consumption and the throughput is measured.
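The abstract names debayering as one of the two streaming kernels but does not give the algorithm. As a rough software model of what a streaming kernel of this kind might look like, the sketch below consumes pixels one at a time in raster order, keeps only two line buffers plus a 3x3 window, and interpolates the green channel bilinearly. The RGGB layout, the bilinear interpolation, and the names `stream_windows` and `interpolate_green` are illustrative assumptions, not the paper's implementation.

```python
def stream_windows(pixels, width):
    """Yield (row, col, window) for every interior pixel.

    Pixels arrive one at a time in raster order. Only two line buffers and a
    3x3 window are stored, mirroring how a hardware streaming kernel avoids
    buffering the whole frame. (Hypothetical helper, for illustration only.)
    """
    older = [0] * width                       # line buffer holding row r-2
    newer = [0] * width                       # line buffer holding row r-1
    window = [[0] * 3 for _ in range(3)]      # 3x3 shift-register window
    for i, pix in enumerate(pixels):
        r, c = divmod(i, width)
        column = (older[c], newer[c], pix)    # new rightmost window column
        for k in range(3):
            window[k] = [window[k][1], window[k][2], column[k]]
        older[c] = newer[c]                   # rotate the line buffers
        newer[c] = pix
        if r >= 2 and c >= 2:                 # window is fully valid here
            yield r - 1, c - 1, [row[:] for row in window]


def interpolate_green(pixels, width):
    """Bilinear green-channel estimate, assuming an RGGB Bayer layout
    in which green samples sit where (row + col) is odd."""
    green = {}
    for r, c, w in stream_windows(pixels, width):
        if (r + c) % 2 == 1:
            green[(r, c)] = w[1][1]           # green is sampled directly
        else:
            # average the four green neighbours (up, left, right, down)
            green[(r, c)] = (w[0][1] + w[1][0] + w[1][2] + w[2][1]) // 4
    return green


# Usage: a 4x4 test frame whose pixel values equal their raster index.
result = interpolate_green(list(range(16)), width=4)
# result == {(1, 1): 5, (1, 2): 6, (2, 1): 9, (2, 2): 10}
```

Border pixels are skipped here for brevity; a real design would need an explicit border policy.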

Citations

Journal Article•DOI•

High-Level Synthesis for FPGAs: From Prototyping to Deployment

[...]

Jason Cong, Bin Liu, Stephen Neuendorffer¹, Juanjo Noguera¹, Kees Vissers¹, Zhiru Zhang•Institutions (1)

Xilinx¹

01 Apr 2011-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx are used as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains.

Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.

728 citations

Cites methods from "Using C-to-gates to program streami..."

...The path-based scheduling algorithm in the Yorktown Silicon Compiler is useful to optimize performance with conditional branches [12]....
[...]
...Industry efforts include Cathedral and its successors [27], Yorktown Silicon Compiler [11], and BSSC [93], among many others....
[...]

Proceedings Article•DOI•

Implementing high-performance, low-power FPGA-based optical flow accelerators in C

[...]

Josh Monson¹, Michael Wirthlin¹, Brad Hutchings¹•Institutions (1)

Brigham Young University¹

05 Jun 2013

TL;DR: C code for a complex optical-flow algorithm is optimized for both a desktop PC and for an FPGA-based system, the Xilinx Zynq-7000, a device containing both a programmable fabric and two ARM cores.

Abstract: Recent developments in High-Level Synthesis (HLS) for FPGAs are making it possible to “run” C code on FPGAs thereby making modern programming environments available to FPGA developers. In this paper, C code for a complex optical-flow algorithm is optimized for both a desktop PC and for an FPGA-based system, the Xilinx Zynq-7000, a device containing both a programmable fabric and two ARM cores. The paper discusses how the code is optimized and restructured to execute effectively on the programmable fabric and the ARM cores. The resulting Zynq version of the C code is competitive with the desktop PC but only consumes 1/7th as much energy.

27 citations

Cites methods from "Using C-to-gates to program streami..."

...The Synfora PICO Extreme HLS tool was used to demonstrate the productivity of HLS for two machine vision algorithms including optical flow [9]....
[...]

Proceedings Article•DOI•

OpenSoC Fabric: On-chip network generator

[...]

Farzad Fatollahi-Fard¹, David Donofrio¹, George Michelogiannakis¹, John Shalf¹•Institutions (1)

Lawrence Berkeley National Laboratory¹

17 Apr 2016

TL;DR: The internal architecture of OpenSoC Fabric and its list of configuration parameters are described, and OpenSoC Fabric is compared against pre-validated state-of-the-art simulators using both the generated C++ and Verilog models using FPGAs.

Abstract: As technology scaling continues, on-chip networks are expected to remain important in future many-core chips due to the increased parallelism and, therefore, communication. However, designing and evaluating large-scale on-chip networks is a nontrivial task given the poor scalability of software simulation for thousands of cores and the intense development effort to develop hardware RTL. In this paper, we describe OpenSoC Fabric. OpenSoC Fabric is a comprehensive on-chip network generator written in Chisel. Chisel generates both C++ and Verilog models from a single code base and has a development effort comparable to functional programming. We describe the internal architecture of OpenSoC Fabric and its powerful list of configuration parameters. We then compare OpenSoC Fabric against pre-validated state-of-the-art simulators using both the generated C++ and Verilog models using FPGAs.

24 citations

Cites background from "Using C-to-gates to program streami..."

...For instance, C to Gates [27] and SystemC [28] are well-known alternatives....
[...]

Proceedings Article•DOI•

High level synthesis using Vivado HLS for Zynq SoC: Image processing case studies

[...]

A. Cortes¹, Igone Velez¹, A. Irizar¹•Institutions (1)

University of Navarra¹

01 Nov 2016

TL;DR: In this paper, image processing algorithms designed in Zynq SoC using the Vivado HLS tool are presented and compared with hand-coded designs. The tool reduces time-to-market, and image processing libraries similar to OpenCV reduce development time further, at the cost of significantly more FPGA resources.

Abstract: In this paper, image processing algorithms designed in Zynq SoC using the Vivado HLS tool are presented and compared with hand-coded designs. In Vivado HLS, the designer has the opportunity to employ libraries similar to OpenCV, a library that is well-known and widely used by software designers. The algorithms are compared in terms of area resources in two conditions: using the libraries and not using the libraries. The case studies are Data Binning, a Step Row Filter and a Sobel Filter. These algorithms have been selected because they are very common in the field of image processing and they have high computational complexity. The main benefit of the Vivado HLS tool is the reduction in time-to-market. On the other hand, when a software designer hand-codes the design, the use of image processing libraries similar to OpenCV helps to reduce development time even further because software designers are familiar with them. However, using these kinds of libraries significantly increases the necessary FPGA resources.

23 citations

Cites background from "Using C-to-gates to program streami..."

...Moreover, we can find examples in different fields of application such as 3G/4G wireless systems [11], aerospace applications [12] and image processing [13] in real time environments....
[...]

Proceedings Article•DOI•

High level synthesis: Where are we? A case study on matrix multiplication

[...]

Sam Skalicky¹, Christopher A. Wood¹, Marcin Lukowiak¹, Matthew Ryan¹•Institutions (1)

Rochester Institute of Technology¹

01 Dec 2013

TL;DR: This paper investigates matrix multiplication using a standard algorithm, Strassen algorithm, and a sparse algorithm to provide a comprehensive analysis of the capabilities and usability of the Xilinx Vivado HLS tool.

Abstract: One of the pitfalls of FPGA design is the relatively long implementation time when compared to alternative architectures, such as CPU, GPU or DSP. This time can be greatly reduced however by using tools that can generate hardware systems in the form of a hardware description language (HDL) from high-level languages such as C, C++, or Python. Such implementations can be optimized by applying special directives that focus the high-level synthesis (HLS) effort on particular objectives, such as performance, area, throughput, or power consumption. In this paper we examine the benefits of this approach by comparing the performance and design times of HLS generated systems versus custom systems for matrix multiplication. We investigate matrix multiplication using a standard algorithm, Strassen algorithm, and a sparse algorithm to provide a comprehensive analysis of the capabilities and usability of the Xilinx Vivado HLS tool. In our experience, a hardware-oriented electrical engineering student can achieve up to 61% of the performance of custom designs with 1/3 the effort, thus enabling faster hardware acceleration of many compute-bound algorithms.

20 citations

Cites methods from "Using C-to-gates to program streami..."

...utilized the Synfora PICO Extreme HLS tool to investigate the usefulness of HLS in lowering design time and increasing performance for vision processing kernels [7]....
[...]

References

Journal Article•DOI•

Demosaicking: color filter array interpolation

[...]

Bahadir K. Gunturk¹, J. Glotzbach, Yucel Altunbasak², R.W. Schafer³, Russell M. Mersereau•Institutions (3)

Louisiana State University¹, Georgia Institute of Technology², Hewlett-Packard³

21 Mar 2005-IEEE Signal Processing Magazine

TL;DR: The author begins by discussing the image formation process and examines the demosaicking methods in three groups: the first group consists of heuristic approaches, the second group formulates demosaicking as a restoration problem, and the third group is a generalization that uses the spectral filtering model given in Wandell.

Abstract: The author begins by discussing the image formation process. The demosaicking methods are examined in three groups: the first group consists of heuristic approaches. The second group formulates demosaicking as a restoration problem. The third group is a generalization that uses the spectral filtering model given in Wandell.

616 citations

Book•DOI•

Handbook of Mathematical Models in Computer Vision

[...]

Nikos Paragios, Yunmei Chen, Olivier Faugeras

01 Nov 2005

TL;DR: This comprehensive volume provides a detailed discourse on the mathematical models used in computational vision from leading educators and active research experts in this field and serves as a complete reference work for professionals.

Abstract: This comprehensive volume is an essential reference tool for professional and academic researchers in the field of computer vision, image processing, and applied mathematics. Continuing rapid advances in image processing have been enhanced by the theoretical efforts of mathematicians and engineers. This marriage of mathematics and computer vision - computational vision - has resulted in a discrete approach to image processing that is more reliable when leveraged in practical tasks. This comprehensive volume provides a detailed discourse on the mathematical models used in computational vision from leading educators and active research experts in this field. Topical areas include: image reconstruction, segmentation and object extraction, shape modeling and registration, motion analysis and tracking, and 3D from images, geometry and reconstruction. The book also includes a study of applications in medical image analysis. Handbook of Mathematical Models in Computer Vision provides a graduate-level treatment of this subject as well as serving as a complete reference work for professionals.

601 citations

"Using C-to-gates to program streami..." refers background in this paper

...This estimation remains one of the key problems in computer vision [5]....
[...]

Book Chapter•DOI•

A Methodology to Design Programmable Embedded Systems

[...]

Bart Kienhuis, Ed F. Deprettere, Pieter van der Wolf¹, Kees Vissers²•Institutions (2)

Philips¹, University of California, Berkeley²

16 Jul 2001

TL;DR: In this article, the Y-chart approach is used to design a set of programmable architectures, where the same resources can be reused for another application by reprogramming the system.

Abstract: Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfy a set of constraints while establishing enough flexibility to support a given set of applications.

177 citations

Proceedings Article•

A methodology to design programmable embedded systems: the Y-chart approach

[...]

Bart Kienhuis, Ed F. Deprettere, Pieter van der Wolf¹, Kees Vissers²•Institutions (2)

Philips¹, University of California, Berkeley²

01 Jan 2002

TL;DR: The concepts presented in this article provide a methodology in which architectures can be obtained that satisfy a set of constraints while establishing enough flexibility to support a given set of applications.

111 citations

"Using C-to-gates to program streami..." refers methods in this paper

...Figure 3 illustrates this by adding a design space pyramid next to the design flow[8]....
[...]

Proceedings Article•DOI•

Performance-constrained pipelining of software loops onto reconfigurable hardware

[...]

Greg Snider¹•Institutions (1)

Hewlett-Packard¹

24 Feb 2002

TL;DR: Integrating the best features of both retiming and slowdown yields a pipelining algorithm, retimed modulo scheduling, that can more effectively exploit the idiosyncrasies of reconfigurable hardware.

Abstract: Retiming and slowdown are algorithms that can be used to pipeline synchronous circuits. Iterative modulo scheduling is an algorithm for software pipelining in the presence of resource constraints. Integrating the best features of both yields a pipelining algorithm, retimed modulo scheduling, that can more effectively exploit the idiosyncrasies of reconfigurable hardware. It also fits naturally into a design space exploration process to trade-off speed for power, energy or area.
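For readers unfamiliar with modulo scheduling, the sketch below computes the usual lower bound on the initiation interval (the number of cycles between successive loop iterations) that such schedulers start from: the larger of a resource bound and a recurrence bound. This is the textbook bound only, not the retimed modulo scheduling algorithm of the paper, and the function names and example numbers are hypothetical.

```python
from math import ceil


def resource_mii(op_counts, unit_counts):
    """Resource bound: each unit type must host all of its operations
    once per iteration, so II >= ceil(operations / available units)."""
    return max(ceil(op_counts[kind] / unit_counts[kind]) for kind in op_counts)


def recurrence_mii(cycles):
    """Recurrence bound: a dependence cycle with total latency L that spans
    D iterations forces II >= ceil(L / D). `cycles` is a list of (L, D)."""
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)


def minimum_ii(op_counts, unit_counts, cycles):
    """Lower bound on the initiation interval of a software-pipelined loop."""
    return max(resource_mii(op_counts, unit_counts), recurrence_mii(cycles))


# Usage (illustrative numbers): 4 multiplies on 2 multipliers, 6 adds on
# 4 adders, and one loop-carried dependence of latency 3 over 1 iteration.
ii = minimum_ii({"mul": 4, "add": 6}, {"mul": 2, "add": 4}, [(3, 1)])
# ii == 3: the recurrence, not the resources, limits the pipeline here.
```

Trading speed for area, as the abstract mentions, corresponds to lowering the unit counts and accepting a larger interval.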

86 citations