Programming Heterogeneous Systems from an Image Processing DSL
Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, Mark Horowitz
TL;DR: The image processing language Halide is extended so users can specify which portions of their applications should become hardware accelerators, and a compiler is provided that uses this code to automatically create the accelerator along with the “glue” code needed for the user’s application to access this hardware.

Abstract:
Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, “programming,” and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the “glue” code needed for the user’s application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware but also allows our compiler to generate a complete software application, which accesses the hardware for acceleration when appropriate. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, including the flexibility of being able to change the throughput rate of the generated hardware. We demonstrate our approach by mapping applications to a commercial Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6× higher performance and 38× lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5× higher performance with 12× lower energy compared to the K1’s 192-core GPU.
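The abstract's central idea is deriving both the software application and the accelerator from a single functional description of the pipeline. As a rough illustration of what such a functional description looks like, here is a toy pure-Python sketch of a stencil stage (this is not the authors' Halide extension; in their system the user would mark a stage like this in the Halide schedule, and the compiler would emit the FPGA accelerator and the CPU "glue" code):

```python
def box3x3(img, w, h):
    """Functional description of a 3x3 box blur with clamped borders.

    In Halide this stage would be a pure Func; per the paper, the user
    marks such a stage in the schedule so the compiler generates a
    hardware accelerator for it, while the rest of the application
    runs as ordinary software that invokes the accelerator.
    """
    def px(i, j):
        # Clamp coordinates to the image bounds (border handling).
        i = min(max(i, 0), h - 1)
        j = min(max(j, 0), w - 1)
        return img[i * w + j]

    out = []
    for i in range(h):
        for j in range(w):
            s = sum(px(i + di, j + dj)
                    for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out.append(s // 9)
    return out

# A uniform image stays uniform under a box blur.
flat = box3x3([9] * 16, 4, 4)
```

Because the description is purely functional (output pixels depend only on input pixels, with no side effects), the same code can be lowered either to a software loop nest or to a line-buffered hardware pipeline, which is what makes the Halide starting point attractive.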
Citations
Proceedings Article
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Setter, Jing Pu, Ankita Nayak, Steven Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, Mark Horowitz
TL;DR: In this paper, the authors present a taxonomy of DNN accelerator micro-architectures and their program mappings, which represent specific choices of loop order and hardware parallelism for computing the seven nested loops.
Proceedings Article
Spatial: a language and compiler for application accelerators
David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun
TL;DR: This work describes Spatial, a new domain-specific language and compiler for higher-level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Proceedings Article
HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing
TL;DR: Experimental results show that HeteroCL allows programmers to explore the design space efficiently in both performance and accuracy by combining different types of hardware customization and targeting spatial architectures, while keeping the algorithm code intact.
Journal Article
Transformations of High-Level Synthesis Codes for High-Performance Computing
TL;DR: A collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications, is presented, aiming to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.
References
Journal Article
Deep learning
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Book
Deep Learning
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Proceedings Article
Scalable parallel programming with CUDA
TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.
Proceedings Article
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
TL;DR: This work implements a CNN accelerator on a VC707 FPGA board, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
Journal Article
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
TL;DR: The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices, such as multicore CPUs, GPUs, and other accelerators.