Programming Heterogeneous Systems from an Image Processing DSL
Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, Mark Horowitz
TL;DR: The image processing language Halide is extended so users can specify which portions of their applications should become hardware accelerators, and a compiler is provided that uses this code to automatically create the accelerator along with the “glue” code needed for the user’s application to access this hardware.

Abstract:
Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, “programming,” and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the “glue” code needed for the user’s application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware but also allows our compiler to generate a complete software application, which accesses the hardware for acceleration when appropriate. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, including the flexibility of being able to change the throughput rate of the generated hardware. We demonstrate our approach by mapping applications to a commercial Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6× higher performance and 38× lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5× higher performance with 12× lower energy compared to the K1’s 192-core GPU.
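The abstract's central idea is deriving both the software application and the accelerator from a single functional description of the pipeline. As a rough illustration of what such a functional description looks like, here is a toy pure-Python sketch of a stencil stage (this is not the authors' Halide extension; in their system the user would mark a stage like this in the Halide schedule, and the compiler would emit the FPGA accelerator and the CPU "glue" code):

```python
def box3x3(img, w, h):
    """Functional description of a 3x3 box blur with clamped borders.

    In Halide this stage would be a pure Func; per the paper, the user
    marks such a stage in the schedule so the compiler generates a
    hardware accelerator for it, while the rest of the application
    runs as ordinary software that invokes the accelerator.
    """
    def px(i, j):
        # Clamp coordinates to the image bounds (border handling).
        i = min(max(i, 0), h - 1)
        j = min(max(j, 0), w - 1)
        return img[i * w + j]

    out = []
    for i in range(h):
        for j in range(w):
            s = sum(px(i + di, j + dj)
                    for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out.append(s // 9)
    return out

# A uniform image stays uniform under a box blur.
flat = box3x3([9] * 16, 4, 4)
```

Because the description is purely functional (output pixels depend only on input pixels, with no side effects), the same code can be lowered either to a software loop nest or to a line-buffered hardware pipeline, which is what makes the Halide starting point attractive.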
Citations
Proceedings Article
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Setter, Jing Pu, Ankita Nayak, Steven Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, Mark Horowitz
TL;DR: In this paper, the authors present a taxonomy of DNN accelerator micro-architectures and their program mappings, which represent specific choices of loop order and hardware parallelism for computing the seven nested loops.
Proceedings Article
Spatial: a language and compiler for application accelerators
David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun
TL;DR: This work describes Spatial, a new domain-specific language and compiler for higher-level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Proceedings Article
HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing
TL;DR: Experimental results show that HeteroCL allows programmers to explore the design space efficiently in both performance and accuracy by combining different types of hardware customization and targeting spatial architectures, while keeping the algorithm code intact.
Journal Article
Transformations of High-Level Synthesis Codes for High-Performance Computing
TL;DR: A collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications, is presented, aiming to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.
References
Journal Article
Deep learning
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Book
Deep Learning
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Proceedings Article
Scalable parallel programming with CUDA
TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.
Proceedings Article
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
TL;DR: This work implements a CNN accelerator on a VC707 FPGA board, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
Journal Article
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
TL;DR: The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices, such as multicore CPUs, GPUs, and other accelerators.