Multithreaded pipeline synthesis for data-parallel kernels
Mingxing Tan, Bin Liu, Steve Dai, Zhiru Zhang +3 more
pp. 718–725
TL;DR: Experimental results show that the proposed techniques can significantly improve the effective pipeline throughput over conventional approaches while conserving hardware resources.

Abstract:
Pipelining is an important technique in high-level synthesis, which overlaps the execution of successive loop iterations or threads to achieve high throughput for loop/function kernels. Since existing pipelining techniques typically enforce in-order thread execution, a variable-latency operation in one thread would block all subsequent threads, resulting in considerable performance degradation. In this paper, we propose a multithreaded pipelining approach that enables context switching to allow out-of-order thread execution for data-parallel kernels. To ensure that the synthesized pipeline is complexity effective, we further propose efficient scheduling algorithms for minimizing the hardware overhead associated with context management. Experimental results show that our proposed techniques can significantly improve the effective pipeline throughput over conventional approaches while conserving hardware resources.
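The throughput gap the abstract describes can be illustrated with a toy cycle-count model (this is an illustrative sketch, not the paper's actual scheduler or hardware model; the stall values and function names are made up for the example). One variable-latency stage takes `1 + stall[i]` cycles for thread `i`. With in-order execution, a stalled thread occupies the stage and blocks everything behind it; with context switching, a stalled thread is swapped out so a new thread can enter the stage every cycle.

```python
# Toy model of the in-order vs. out-of-order pipeline throughput gap.
# Hypothetical example: stall values and function names are illustrative,
# not taken from the paper.

def in_order_cycles(stalls):
    # In-order: the variable-latency stage is busy for the full
    # 1 + stall cycles before the next thread may enter, so every
    # stall is serialized into the total.
    return sum(1 + s for s in stalls)

def out_of_order_cycles(stalls):
    # With context switching: a stalled thread yields the stage, and a
    # new thread issues every cycle. Thread i issues at cycle i and
    # finishes at cycle i + 1 + stall[i]; the pipeline drains when the
    # last outstanding thread retires.
    return max(i + 1 + s for i, s in enumerate(stalls))

stalls = [0, 0, 7, 0, 0, 0, 9, 0]   # two cache-miss-like long-latency events
print(in_order_cycles(stalls))      # 24: both stalls fully serialized
print(out_of_order_cycles(stalls))  # 16: stalls overlap with other threads
```

In this model the in-order pipeline pays for every stall in full (8 + 7 + 9 = 24 cycles), while the context-switched pipeline hides most of the stall latency behind the issue of other threads, which is the effective-throughput improvement the paper targets.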
Citations
Proceedings ArticleDOI
Co-designing accelerators and SoC interfaces using gem5-aladdin
TL;DR: It is shown that the optimal energy-delay-product of an accelerator microarchitecture can improve by up to 7.4× when system-level effects are considered compared to optimizing accelerators in isolation.
Proceedings ArticleDOI
ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests
TL;DR: ElasticFlow is proposed, a novel architectural synthesis approach capable of dynamically distributing inner loops to an array of loop processing units (LPUs) in a complexity-effective manner, and it demonstrates significant performance improvements over a widely used commercial HLS tool for Xilinx FPGAs.
Proceedings ArticleDOI
Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis
TL;DR: This work proposes to generate an aggressive pipeline at compile-time while resolving hazards with memory port arbitration and squash-and-replay at run-time to enable high-throughput pipelining of irregular loops.
Proceedings ArticleDOI
Efficient data supply for hardware accelerators with prefetching and access/execute decoupling
Tao Chen, G. Edward Suh +1 more
TL;DR: An architecture framework to easily design hardware accelerators that can effectively tolerate long and variable memory latency using prefetching and access/execute decoupling with minimal manual efforts is presented.
Journal ArticleDOI
High-level Synthesis for Low-power Design
TL;DR: The recent research development of using HLS to effectively explore a multi-dimensional design space and derive low-power implementations is discussed and potential opportunities in tackling these challenges are outlined.
References
Book
Introduction to Algorithms
TL;DR: The updated edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. It presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Book ChapterDOI
Introduction to Algorithms
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
Proceedings ArticleDOI
LLVM: a compilation framework for lifelong program analysis & transformation
Chris Lattner, Vikram Adve +1 more
TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.
GPU Computing
TL;DR: The background, hardware, and programming model for GPU computing is described, the state of the art in tools and techniques are summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.
Journal ArticleDOI
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
TL;DR: The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices, such as multicore CPUs, GPUs, or other accelerators.