Proceedings Article

Using C-to-gates to program streaming image processing kernels efficiently on FPGAs

TL;DR: This paper applies algorithmic C-to-FPGA synthesis technology in a structured design approach and demonstrates its added value on two relevant vision processing kernels: optical flow and debayering.
Abstract: Effectively exploiting the variety of computational and storage resources available in common FPGA architectures for complex applications, such as the real-time implementation of vision algorithms, is often difficult in standard HDL design methodologies. Higher-level design tools can enable a designer to explore a range of different architectures more quickly. In this paper we apply algorithmic C-to-FPGA synthesis technology in a structured design approach and demonstrate its added value on two relevant vision processing kernels: optical flow and debayering. The impact of the proposed approach on the design time, the FPGA resource consumption and the throughput is measured.
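The "streaming" structure named in the title can be sketched in plain C. This is a minimal sketch, not the authors' code: it assumes a hypothetical compile-time frame width `WIDTH`, 8-bit pixels arriving in raster order, and a 3x3 box sum standing in for a real kernel such as a debayering interpolator. Two line buffers plus a 3x3 register window let each pixel be read from the input stream exactly once, which is the access pattern C-to-gates tools generally map to block RAM and registers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame width; a real design would size this to the sensor. */
#define WIDTH 8

/* Kernel state: two line buffers (typically mapped to block RAM by an HLS
   tool) and a 3x3 register window. */
typedef struct {
    uint8_t line[2][WIDTH]; /* line[0]: row - 2, line[1]: row - 1 */
    uint8_t win[3][3];      /* win[0] oldest row ... win[2] current row */
    int col, row;           /* raster position of the next incoming pixel */
} stream3x3_t;

void stream3x3_init(stream3x3_t *s) { memset(s, 0, sizeof *s); }

/* Consumes one pixel in raster order. Returns 1 and writes the 3x3 box sum
   centred on (row - 1, col - 1) once a full window is available, else 0. */
int stream3x3_push(stream3x3_t *s, uint8_t px, uint16_t *out) {
    int c = s->col;

    /* Shift the window one column to the left. */
    for (int r = 0; r < 3; r++) {
        s->win[r][0] = s->win[r][1];
        s->win[r][1] = s->win[r][2];
    }
    /* New right column: the two buffered rows plus the incoming pixel. */
    s->win[0][2] = s->line[0][c];
    s->win[1][2] = s->line[1][c];
    s->win[2][2] = px;

    /* Age the line buffers for this column. */
    s->line[0][c] = s->line[1][c];
    s->line[1][c] = px;

    int valid = (s->row >= 2 && c >= 2);
    if (valid) {
        uint16_t sum = 0;
        for (int r = 0; r < 3; r++)
            for (int k = 0; k < 3; k++)
                sum += s->win[r][k];
        *out = sum;
    }

    if (++s->col == WIDTH) { /* advance the raster position */
        s->col = 0;
        s->row++;
    }
    return valid;
}
```

Only the arithmetic on `win` would change for a different 3x3 kernel; the buffering is what makes the function consume one pixel per call with no random access to the frame.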
Citations
Journal Article
Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, Zhiru Zhang
TL;DR: AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx are used as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains.
Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.

728 citations


Cites methods from "Using C-to-gates to program streami..."

  • ...The path-based scheduling algorithm in the Yorktown Silicon Compiler is useful to optimize performance with conditional branches [12]....

  • ...Industry efforts include Cathedral and its successors [27], Yorktown Silicon Compiler [11], and BSSC [93], among many others....

Proceedings Article
05 Jun 2013
TL;DR: C code for a complex optical-flow algorithm is optimized for both a desktop PC and for an FPGA-based system, the Xilinx Zynq-7000, a device containing both a programmable fabric and two ARM cores.
Abstract: Recent developments in High-Level Synthesis (HLS) for FPGAs are making it possible to “run” C code on FPGAs thereby making modern programming environments available to FPGA developers. In this paper, C code for a complex optical-flow algorithm is optimized for both a desktop PC and for an FPGA-based system, the Xilinx Zynq-7000, a device containing both a programmable fabric and two ARM cores. The paper discusses how the code is optimized and restructured to execute effectively on the programmable fabric and the ARM cores. The resulting Zynq version of the C code is competitive with the desktop PC but only consumes 1/7th as much energy.
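The abstract does not say which optical-flow algorithm was used. As a generic illustration only, the classic Lucas-Kanade step (a swapped-in technique, not necessarily the one in the paper) shows the kind of per-window arithmetic such C code contains: assuming brightness constancy, Ix·u + Iy·v + It = 0 at every pixel, the flow (u, v) of a window is the least-squares solution of a 2x2 system built from gradient sums. The function name `lk_window` and its array interface are hypothetical.

```c
/* Lucas-Kanade for one window of n pixels. Solves
     [Sxx Sxy] [u]     [Sxt]
     [Sxy Syy] [v] = - [Syt]
   where Sab is the sum over the window of Ia * Ib.
   Returns 0 (leaving u, v untouched) when the system is near-singular. */
int lk_window(const float *ix, const float *iy, const float *it, int n,
              float *u, float *v) {
    float sxx = 0.0f, sxy = 0.0f, syy = 0.0f, sxt = 0.0f, syt = 0.0f;
    for (int i = 0; i < n; i++) {
        sxx += ix[i] * ix[i];
        sxy += ix[i] * iy[i];
        syy += iy[i] * iy[i];
        sxt += ix[i] * it[i];
        syt += iy[i] * it[i];
    }
    float det = sxx * syy - sxy * sxy;
    if (det > -1e-6f && det < 1e-6f)
        return 0; /* flat or single-orientation window (aperture problem) */
    /* Closed-form inverse of the symmetric 2x2 matrix. */
    *u = (-syy * sxt + sxy * syt) / det;
    *v = ( sxy * sxt - sxx * syt) / det;
    return 1;
}
```

The loop is a plain accumulation over the window, which is the part an HLS tool would pipeline; the division appears once per window, not once per pixel.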

27 citations


Cites methods from "Using C-to-gates to program streami..."

  • ...The Synfora PICO Extreme HLS tool was used to demonstrate the productivity of HLS for two machine vision algorithms including optical flow [9]....

Proceedings Article
17 Apr 2016
TL;DR: The internal architecture of OpenSoC Fabric and its extensive set of configuration parameters are described, and OpenSoC Fabric is compared against pre-validated state-of-the-art simulators using both the generated C++ models and the generated Verilog models running on FPGAs.
Abstract: As technology scaling continues, on-chip networks are expected to remain important in future many-core chips due to the increased parallelism and, therefore, communication. However, designing and evaluating large-scale on-chip networks is a nontrivial task given the poor scalability of software simulation for thousands of cores and the intense development effort to develop hardware RTL. In this paper, we describe OpenSoC Fabric. OpenSoC Fabric is a comprehensive on-chip network generator written in Chisel. Chisel generates both C++ and Verilog models from a single code base and has a development effort comparable to functional programming. We describe the internal architecture of OpenSoC Fabric and its powerful list of configuration parameters. We then compare OpenSoC Fabric against pre-validated state-of-the-art simulators using both the generated C++ and Verilog models using FPGAs.

24 citations


Cites background from "Using C-to-gates to program streami..."

  • ...For instance, C to Gates [27] and SystemC [28] are well-known alternatives....

Proceedings Article
01 Nov 2016
TL;DR: Image processing algorithms designed for the Zynq SoC with the Vivado HLS tool are presented and compared with hand-coded designs; the main benefit of the tool is a shorter time-to-market, and using image processing libraries similar to OpenCV reduces development time further.
Abstract: In this paper, image processing algorithms designed for the Zynq SoC using the Vivado HLS tool are presented and compared with hand-coded designs. In Vivado HLS, the designer has the opportunity to employ libraries similar to OpenCV, a library that is well-known and widely used by software designers. The algorithms are compared in terms of area resources in two conditions: using the libraries and not using the libraries. The case studies are Data Binning, a Step Row Filter and a Sobel Filter. These algorithms have been selected because they are very common in the field of image processing and they have high computational complexity. The main benefit of the Vivado HLS tool is the reduction in time-to-market. On the other hand, when a software designer hand-codes the design, the use of image processing libraries similar to OpenCV helps to reduce development time even further because software designers are familiar with them. However, using these kinds of libraries significantly increases the necessary FPGA resources.
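The Sobel filter is one of the case studies named above. As a reference point, here is a plain-C, frame-based version of the kind typically written before any HLS restructuring. It is a minimal sketch, not the code from the cited paper, and it assumes a hypothetical 4x5 frame (`SOBEL_H`, `SOBEL_W`), zeroed border pixels, and the common |Gx| + |Gy| magnitude approximation clamped to 8 bits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical frame size. */
#define SOBEL_H 4
#define SOBEL_W 5

/* Writes |Gx| + |Gy| (clamped to 255) for interior pixels, 0 on the border. */
void sobel(const uint8_t in[SOBEL_H][SOBEL_W], uint8_t out[SOBEL_H][SOBEL_W]) {
    for (int r = 0; r < SOBEL_H; r++)
        for (int c = 0; c < SOBEL_W; c++) {
            if (r == 0 || c == 0 || r == SOBEL_H - 1 || c == SOBEL_W - 1) {
                out[r][c] = 0; /* no full 3x3 neighbourhood */
                continue;
            }
            /* Horizontal gradient: right column minus left column, weights 1-2-1. */
            int gx = (in[r - 1][c + 1] + 2 * in[r][c + 1] + in[r + 1][c + 1])
                   - (in[r - 1][c - 1] + 2 * in[r][c - 1] + in[r + 1][c - 1]);
            /* Vertical gradient: bottom row minus top row, weights 1-2-1. */
            int gy = (in[r + 1][c - 1] + 2 * in[r + 1][c] + in[r + 1][c + 1])
                   - (in[r - 1][c - 1] + 2 * in[r - 1][c] + in[r - 1][c + 1]);
            int mag = abs(gx) + abs(gy);
            out[r][c] = (uint8_t)(mag > 255 ? 255 : mag);
        }
}
```

Random access to `in` is convenient in software, but an FPGA implementation would typically feed the 3x3 neighbourhood from line buffers so the frame is read once as a stream.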

23 citations


Cites background from "Using C-to-gates to program streami..."

  • ...Moreover, we can find examples in different fields of application such as 3G/4G wireless systems [11], aerospace applications [12] and image processing [13] in real time environments....

Proceedings Article
01 Dec 2013
TL;DR: This paper investigates matrix multiplication using a standard algorithm, Strassen algorithm, and a sparse algorithm to provide a comprehensive analysis of the capabilities and usability of the Xilinx Vivado HLS tool.
Abstract: One of the pitfalls of FPGA design is the relatively long implementation time when compared to alternative architectures, such as CPU, GPU or DSP. This time can be greatly reduced however by using tools that can generate hardware systems in the form of a hardware description language (HDL) from high-level languages such as C, C++, or Python. Such implementations can be optimized by applying special directives that focus the high-level synthesis (HLS) effort on particular objectives, such as performance, area, throughput, or power consumption. In this paper we examine the benefits of this approach by comparing the performance and design times of HLS generated systems versus custom systems for matrix multiplication. We investigate matrix multiplication using a standard algorithm, Strassen algorithm, and a sparse algorithm to provide a comprehensive analysis of the capabilities and usability of the Xilinx Vivado HLS tool. In our experience, a hardware-oriented electrical engineering student can achieve up to 61% of the performance of custom designs with 1/3 the effort, thus enabling faster hardware acceleration of many compute-bound algorithms.
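The standard matrix-multiplication algorithm mentioned above maps directly to C that an HLS tool can accept. The sketch below assumes a hypothetical fixed size `MAT_N` and integer data; the `#pragma HLS PIPELINE II=1` line is an example of the Vivado HLS directives the abstract refers to, and an ordinary C compiler ignores it (possibly with a warning). It is not the design evaluated in the paper.

```c
/* Hypothetical fixed matrix size; HLS tools need static loop bounds. */
#define MAT_N 4

typedef int data_t;

/* c = a * b using the standard triple-loop algorithm. */
void matmul(const data_t a[MAT_N][MAT_N], const data_t b[MAT_N][MAT_N],
            data_t c[MAT_N][MAT_N]) {
    for (int i = 0; i < MAT_N; i++)
        for (int j = 0; j < MAT_N; j++) {
#pragma HLS PIPELINE II=1
            data_t acc = 0; /* local accumulator instead of repeated writes to c */
            for (int k = 0; k < MAT_N; k++)
                acc += a[i][k] * b[k][j];
            c[i][j] = acc;
        }
}
```

Pipelining the `j` loop makes Vivado HLS unroll the inner `k` loop; actually reaching an initiation interval of 1 usually also requires partitioning `a` and `b` so that a full row and column can be read in one cycle.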

20 citations


Cites methods from "Using C-to-gates to program streami..."

  • ...utilized the Synfora PICO Extreme HLS tool to investigate the usefulness of HLS in lowering design time and increasing performance for vision processing kernels [7]....

References
Journal Article
TL;DR: The author begins by discussing the image formation process and examines the demosaicking methods in three groups: the first group consists of heuristic approaches, the second group formulates demosaicking as a restoration problem, and the third group is a generalization that uses the spectral filtering model given in Wandell.
Abstract: The author begins by discussing the image formation process. The demosaicking methods are examined in three groups: the first group consists of heuristic approaches. The second group formulates demosaicking as a restoration problem. The third group is a generalization that uses the spectral filtering model given in Wandell.
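The heuristic group is easiest to picture through its simplest member, bilinear interpolation. The sketch below is illustrative only and is not taken from the cited survey: it assumes an RGGB Bayer layout and a hypothetical 4x4 raw frame (`BH`, `BW`), and reconstructs only the green channel, averaging the available 4-neighbours (which are all green sites in a Bayer mosaic) at each red or blue site.

```c
#include <stdint.h>

/* Hypothetical raw frame size. */
#define BH 4
#define BW 4

/* RGGB layout: row 0 is R G R G ..., row 1 is G B G B ...,
   so green sites are exactly those where (r + c) is odd. */
static int is_green(int r, int c) { return ((r + c) & 1) == 1; }

/* Bilinear green reconstruction: keep measured green, otherwise average
   the in-bounds up/down/left/right neighbours (rounded). */
void bilinear_green(const uint8_t raw[BH][BW], uint8_t g[BH][BW]) {
    for (int r = 0; r < BH; r++)
        for (int c = 0; c < BW; c++) {
            if (is_green(r, c)) {
                g[r][c] = raw[r][c];
                continue;
            }
            int sum = 0, n = 0;
            if (r > 0)      { sum += raw[r - 1][c]; n++; }
            if (r < BH - 1) { sum += raw[r + 1][c]; n++; }
            if (c > 0)      { sum += raw[r][c - 1]; n++; }
            if (c < BW - 1) { sum += raw[r][c + 1]; n++; }
            g[r][c] = (uint8_t)((sum + n / 2) / n);
        }
}
```

Red and blue are interpolated the same way from their own, sparser sites; the restoration-based and spectral methods in the other two groups replace this fixed averaging with model-driven estimates.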

616 citations

Book
01 Nov 2005
TL;DR: This comprehensive volume provides a detailed discourse on the mathematical models used in computational vision from leading educators and active research experts in this field and serves as a complete reference work for professionals.
Abstract: This comprehensive volume is an essential reference tool for professional and academic researchers in the field of computer vision, image processing, and applied mathematics. Continuing rapid advances in image processing have been enhanced by the theoretical efforts of mathematicians and engineers. This marriage of mathematics and computer vision - computational vision - has resulted in a discrete approach to image processing that is more reliable when applied to practical tasks. This comprehensive volume provides a detailed discourse on the mathematical models used in computational vision from leading educators and active research experts in this field. Topical areas include: image reconstruction, segmentation and object extraction, shape modeling and registration, motion analysis and tracking, and 3D from images, geometry and reconstruction. The book also includes a study of applications in medical image analysis. Handbook of Mathematical Models in Computer Vision provides a graduate-level treatment of this subject as well as serving as a complete reference work for professionals.

601 citations


"Using C-to-gates to program streami..." refers background in this paper

  • ...This estimation remains one of the key problems in computer vision [5]....

Book Chapter
16 Jul 2001
TL;DR: The Y-chart approach is presented as one of several concepts for designing programmable embedded architectures, whose resources can be reused for another application by reprogramming the system.
Abstract: Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfy a set of constraints while establishing enough flexibility to support a given set of applications.

177 citations

Proceedings Article
01 Jan 2002
TL;DR: The concepts presented in this article provide a methodology in which architectures can be obtained that satisfy a set of constraints while establishing enough flexibility to support a given set of applications.
Abstract: Embedded systems architectures are increasingly becoming programmable, which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfy a set of constraints while establishing enough flexibility to support a given set of applications.

111 citations


"Using C-to-gates to program streami..." refers methods in this paper

  • ...Figure 3 illustrates this by adding a design space pyramid next to the design flow [8]....

Proceedings Article
Greg Snider
24 Feb 2002
TL;DR: Integrating the best features of both retiming and slowdown yields a pipelining algorithm, retimed modulo scheduling, that can more effectively exploit the idiosyncrasies of reconfigurable hardware.
Abstract: Retiming and slowdown are algorithms that can be used to pipeline synchronous circuits. Iterative modulo scheduling is an algorithm for software pipelining in the presence of resource constraints. Integrating the best features of both yields a pipelining algorithm, retimed modulo scheduling, that can more effectively exploit the idiosyncrasies of reconfigurable hardware. It also fits naturally into a design space exploration process to trade-off speed for power, energy or area.
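Iterative modulo scheduling, which the abstract builds on, starts its search from a lower bound on the initiation interval (II): MII = max(ResMII, RecMII), where ResMII comes from resource usage and RecMII from loop-carried dependence cycles. The sketch below computes only that bound and assumes a hypothetical interface in which per-iteration resource usage and the already-enumerated recurrence cycles are passed in as plain arrays. It does not implement retiming or the retimed modulo scheduling algorithm itself.

```c
/* ceil(a / b) for positive integers. */
static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Resource bound: for each resource class, operations per iteration
   divided by the number of available units, rounded up. */
int res_mii(const int *uses, const int *units, int nres) {
    int mii = 1;
    for (int i = 0; i < nres; i++) {
        int b = ceil_div(uses[i], units[i]);
        if (b > mii) mii = b;
    }
    return mii;
}

/* Recurrence bound: for each dependence cycle, total latency around the
   cycle divided by its iteration distance, rounded up. */
int rec_mii(const int *delay, const int *distance, int ncyc) {
    int mii = 1;
    for (int i = 0; i < ncyc; i++) {
        int b = ceil_div(delay[i], distance[i]);
        if (b > mii) mii = b;
    }
    return mii;
}

/* Lower bound on II: the first value the scheduler would try. */
int min_ii(const int *uses, const int *units, int nres,
           const int *delay, const int *distance, int ncyc) {
    int r = res_mii(uses, units, nres);
    int c = rec_mii(delay, distance, ncyc);
    return r > c ? r : c;
}
```

For example, 6 additions on 2 adders give ResMII = 3, while a cycle with latency 7 spanning 2 iterations gives RecMII = 4, so scheduling starts at II = 4 and increases II only if no valid schedule is found.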

86 citations


"Using C-to-gates to program streami..." refers methods in this paper

  • ...The insertion of pipelining [2] is guided by characterization of the underlying FPGA architecture....
