Open Access Journal ArticleDOI

Programming Heterogeneous Systems from an Image Processing DSL

TLDR
The image processing language Halide is extended so users can specify which portions of their applications should become hardware accelerators, and a compiler is provided that uses this code to automatically create the accelerator along with the “glue” code needed for the user’s application to access this hardware.
Abstract
Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, “programming,” and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the “glue” code needed for the user’s application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware but also allows our compiler to generate a complete software application, which accesses the hardware for acceleration when appropriate. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, including the flexibility of being able to change the throughput rate of the generated hardware. We demonstrate our approach by mapping applications to a commercial Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6× higher performance and 38× lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 35× higher performance with 12× lower energy compared to the K1’s 192-core GPU.
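
The abstract describes a flow in which the algorithm is written once in Halide and a scheduling decision marks part of the pipeline for hardware. The sketch below is a minimal illustration of that flow in standard Halide C++, not the paper's code: the pipeline and tiling use the public Halide API, while the comment marks where the paper's accelerator directive (whose exact name is not reproduced here) would be applied, after which their compiler emits the FPGA accelerator and the CPU-side glue code.

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam in(UInt(8), 2);                 // 8-bit grayscale input image
    Var x("x"), y("y");
    Func input("input"), blur_x("blur_x"), blur_y("blur_y");

    // Algorithm: a separable 3x3 box blur, written functionally once.
    input(x, y) = cast<uint16_t>(in(x, y));
    blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
    blur_y(x, y) = cast<uint8_t>((blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3);

    // Schedule: tile the output so the pipeline works on fixed-size blocks.
    Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
    blur_y.tile(x, y, xo, yo, xi, yi, 64, 64);
    blur_x.compute_at(blur_y, xo);
    // In the paper's system, a directive applied around this point (name
    // assumed, not shown here) marks this tiled region as the hardware
    // accelerator; the extended compiler then generates the accelerator
    // plus the host-side glue code automatically.

    blur_y.compile_jit();  // software-only build of the same pipeline
    return 0;
}
```

In the software-only build shown here the schedule is honored on the CPU; the paper's extended compiler instead turns the marked region into an FPGA accelerator while the rest of the application keeps running on the ARM cores.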


Citations
Proceedings ArticleDOI

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

TL;DR: In this paper, the authors present a taxonomy of DNN accelerator micro-architectures and their program mappings, each representing a specific choice of loop order and hardware parallelism for computing the seven nested loops of a convolutional layer.
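
As a concrete reference for that terminology, the sketch below (illustrative, not taken from the cited paper) writes out the seven nested loops of a single convolutional layer in plain C++; a mapping in the taxonomy's sense corresponds to choosing an order, tiling, and degree of hardware parallelism for these loops.

```cpp
#include <vector>
#include <cstddef>

// Seven-loop convolutional layer over a batch of images.
// in:  [N][C][H+R-1][W+S-1]   wgt: [K][C][R][S]   out: [N][K][H][W], zero-initialized
void conv_layer(const std::vector<float>& in, const std::vector<float>& wgt,
                std::vector<float>& out,
                std::size_t N, std::size_t K, std::size_t C,
                std::size_t H, std::size_t W, std::size_t R, std::size_t S) {
    const std::size_t IH = H + R - 1, IW = W + S - 1;       // padded input size
    for (std::size_t n = 0; n < N; ++n)                     // 1: batch
      for (std::size_t k = 0; k < K; ++k)                   // 2: output channels
        for (std::size_t c = 0; c < C; ++c)                 // 3: input channels
          for (std::size_t y = 0; y < H; ++y)               // 4: output rows
            for (std::size_t x = 0; x < W; ++x)             // 5: output columns
              for (std::size_t r = 0; r < R; ++r)           // 6: filter rows
                for (std::size_t s = 0; s < S; ++s)         // 7: filter columns
                  out[((n * K + k) * H + y) * W + x] +=
                      in[((n * C + c) * IH + (y + r)) * IW + (x + s)] *
                      wgt[((k * C + c) * R + r) * S + s];
}
```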
Proceedings ArticleDOI

Spatial: a language and compiler for application accelerators

TL;DR: This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Proceedings ArticleDOI

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing

TL;DR: Experimental results show that HeteroCL allows programmers to explore the design space efficiently in both performance and accuracy by combining different types of hardware customization and targeting spatial architectures, while keeping the algorithm code intact.
Journal ArticleDOI

Transformations of High-Level Synthesis Codes for High-Performance Computing

TL;DR: A collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications, is presented, aiming to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.
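
As one small example of the kind of transformation the article catalogs, the sketch below pipelines and partially unrolls a dot-product loop and partitions its inputs, assuming a Xilinx Vivado/Vitis HLS toolchain; the pragmas are standard Xilinx HLS directives, and the specific factors are illustrative rather than tuned.

```cpp
// Transformed dot-product kernel for HLS: the loop is split into chunks of 8,
// the inner chunk is fully unrolled into parallel multiplies, the outer loop
// requests an initiation interval of 1, and the input arrays are cyclically
// partitioned so the unrolled multiplies can read in the same cycle.
float dot_product(const float a[1024], const float b[1024]) {
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=8
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=8
    float acc = 0.0f;
dot_loop:
    for (int i = 0; i < 1024; i += 8) {
#pragma HLS PIPELINE II=1
        float partial = 0.0f;
        for (int j = 0; j < 8; ++j) {
#pragma HLS UNROLL
            partial += a[i + j] * b[i + j];
        }
        acc += partial;
    }
    return acc;
}
```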
References
Journal ArticleDOI

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Book

Deep Learning

TL;DR: This book presents deep learning as a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it underlies many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Proceedings ArticleDOI

Scalable parallel programming with CUDA

TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.
Proceedings ArticleDOI

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

TL;DR: This work implements a CNN accelerator on a VC707 FPGA board, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
Journal ArticleDOI

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

TL;DR: The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices, such as multicore CPUs, GPUs, and other accelerators.
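
To make the "common API across device types" point concrete, here is a minimal host-side sketch using the standard OpenCL C API (clGetPlatformIDs, clGetDeviceIDs, clGetDeviceInfo) to list every available device, whether CPU, GPU, or other accelerator, through the same interface; the header path and link flag may differ by platform.

```cpp
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>      // on macOS: <OpenCL/cl.h>; link with -lOpenCL
#include <cstdio>
#include <vector>

int main() {
    // Enumerate platforms (vendor OpenCL implementations).
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, nullptr, &num_platforms);
    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        // Enumerate every device type the platform exposes.
        cl_uint num_devices = 0;
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &num_devices);
        std::vector<cl_device_id> devices(num_devices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, num_devices, devices.data(), nullptr);

        for (cl_device_id d : devices) {
            char name[256] = {0};
            clGetDeviceInfo(d, CL_DEVICE_NAME, sizeof(name), name, nullptr);
            std::printf("device: %s\n", name);
        }
    }
    return 0;
}
```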