Multithreaded pipeline synthesis for data-parallel kernels
Mingxing Tan, Bin Liu, Steve Dai, Zhiru Zhang +3 more
pp. 718–725
TL;DR: Experimental results show that the proposed techniques can significantly improve the effective pipeline throughput over conventional approaches while conserving hardware resources.

Abstract:
Pipelining is an important technique in high-level synthesis, which overlaps the execution of successive loop iterations or threads to achieve high throughput for loop/function kernels. Since existing pipelining techniques typically enforce in-order thread execution, a variable-latency operation in one thread would block all subsequent threads, resulting in considerable performance degradation. In this paper, we propose a multithreaded pipelining approach that enables context switching to allow out-of-order thread execution for data-parallel kernels. To ensure that the synthesized pipeline is complexity effective, we further propose efficient scheduling algorithms for minimizing the hardware overhead associated with context management. Experimental results show that our proposed techniques can significantly improve the effective pipeline throughput over conventional approaches while conserving hardware resources.
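The throughput gap the abstract describes can be illustrated with a toy cycle-count model (this is an illustrative sketch, not the paper's actual scheduler or hardware model; the stall values and function names are made up for the example). One variable-latency stage takes `1 + stall[i]` cycles for thread `i`. With in-order execution, a stalled thread occupies the stage and blocks everything behind it; with context switching, a stalled thread is swapped out so a new thread can enter the stage every cycle.

```python
# Toy model of the in-order vs. out-of-order pipeline throughput gap.
# Hypothetical example: stall values and function names are illustrative,
# not taken from the paper.

def in_order_cycles(stalls):
    # In-order: the variable-latency stage is busy for the full
    # 1 + stall cycles before the next thread may enter, so every
    # stall is serialized into the total.
    return sum(1 + s for s in stalls)

def out_of_order_cycles(stalls):
    # With context switching: a stalled thread yields the stage, and a
    # new thread issues every cycle. Thread i issues at cycle i and
    # finishes at cycle i + 1 + stall[i]; the pipeline drains when the
    # last outstanding thread retires.
    return max(i + 1 + s for i, s in enumerate(stalls))

stalls = [0, 0, 7, 0, 0, 0, 9, 0]   # two cache-miss-like long-latency events
print(in_order_cycles(stalls))      # 24: both stalls fully serialized
print(out_of_order_cycles(stalls))  # 16: stalls overlap with other threads
```

In this model the in-order pipeline pays for every stall in full (8 + 7 + 9 = 24 cycles), while the context-switched pipeline hides most of the stall latency behind the issue of other threads, which is the effective-throughput improvement the paper targets.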
Citations
Proceedings ArticleDOI
Co-designing accelerators and SoC interfaces using gem5-aladdin
TL;DR: It is shown that the optimal energy-delay-product of an accelerator microarchitecture can improve by up to 7.4× when system-level effects are considered compared to optimizing accelerators in isolation.
Proceedings ArticleDOI
ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests
TL;DR: ElasticFlow is proposed, a novel architectural synthesis approach capable of dynamically distributing inner loops to an array of loop processing units (LPUs) in a complexity-effective manner, and it demonstrates significant performance improvements over a widely used commercial HLS tool for Xilinx FPGAs.
Proceedings ArticleDOI
Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis
TL;DR: This work proposes to generate an aggressive pipeline at compile-time while resolving hazards with memory port arbitration and squash-and-replay at run-time to enable high-throughput pipelining of irregular loops.
Proceedings ArticleDOI
Efficient data supply for hardware accelerators with prefetching and access/execute decoupling
Tao Chen, G. Edward Suh +1 more
TL;DR: An architecture framework to easily design hardware accelerators that can effectively tolerate long and variable memory latency using prefetching and access/execute decoupling with minimal manual efforts is presented.
Journal ArticleDOI
High-level Synthesis for Low-power Design
TL;DR: The recent research development of using HLS to effectively explore a multi-dimensional design space and derive low-power implementations is discussed and potential opportunities in tackling these challenges are outlined.
References
Book
Introduction to Algorithms
TL;DR: The updated edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. It presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Book ChapterDOI
Introduction to Algorithms
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
Proceedings ArticleDOI
LLVM: a compilation framework for lifelong program analysis & transformation
Chris Lattner, Vikram Adve +1 more
TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.
GPU Computing
TL;DR: The background, hardware, and programming model for GPU computing is described, the state of the art in tools and techniques are summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.
Journal ArticleDOI
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
TL;DR: The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices, such as multicore CPUs, GPUs, or other accelerators.