Open Access Proceedings ArticleDOI

Multithreaded pipeline synthesis for data-parallel kernels

TLDR
Experimental results show that the proposed techniques can significantly improve the effective pipeline throughput over conventional approaches while conserving hardware resources.
Abstract
Pipelining is an important technique in high-level synthesis, which overlaps the execution of successive loop iterations or threads to achieve high throughput for loop/function kernels. Since existing pipelining techniques typically enforce in-order thread execution, a variable-latency operation in one thread would block all subsequent threads, resulting in considerable performance degradation. In this paper, we propose a multithreaded pipelining approach that enables context switching to allow out-of-order thread execution for data-parallel kernels. To ensure that the synthesized pipeline is complexity effective, we further propose efficient scheduling algorithms for minimizing the hardware overhead associated with context management. Experimental results show that our proposed techniques can significantly improve the effective pipeline throughput over conventional approaches while conserving hardware resources.
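To make the motivation concrete, the toy simulation below (not taken from the paper; a minimal sketch written purely for illustration) compares an in-order pipeline, where one long-latency operation blocks every subsequent thread, against an idealized out-of-order pipeline in which a stalled thread is parked via context switching so later threads keep issuing. The 1-cycle/20-cycle latencies, the 90% hit rate, and the unlimited-context assumption are all hypothetical parameters, not values from the paper.

```python
import random

def pipeline_cycles(latencies, allow_out_of_order, initiation_interval=1):
    """Toy model: cycles to finish all threads of a pipelined kernel whose
    body contains one variable-latency operation (e.g., a memory access).

    In-order: a thread's operation cannot start until the previous thread's
    operation has finished, so one long-latency access blocks all later threads.
    Out-of-order (context switching): a stalled thread is parked, so the next
    thread still issues one initiation interval later and latencies overlap
    (unlimited thread contexts assumed for simplicity).
    """
    prev_finish = 0
    completion_times = []
    for i, lat in enumerate(latencies):
        issue = i * initiation_interval
        if not allow_out_of_order:
            issue = max(issue, prev_finish)  # blocked behind the stalled thread
        prev_finish = issue + lat
        completion_times.append(prev_finish)
    return max(completion_times)

random.seed(0)
# Illustrative assumption: 90% of accesses take 1 cycle, 10% take 20 cycles.
lats = [1 if random.random() < 0.9 else 20 for _ in range(1000)]
print("in-order pipeline    :", pipeline_cycles(lats, allow_out_of_order=False), "cycles")
print("out-of-order pipeline:", pipeline_cycles(lats, allow_out_of_order=True), "cycles")
```

Under these assumptions the in-order pipeline accumulates roughly 19 extra cycles per long-latency access, while the out-of-order variant hides nearly all of that latency; the paper's contribution is achieving this effect in synthesized hardware while minimizing the context-management overhead.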



Citations
Proceedings ArticleDOI

Co-designing accelerators and SoC interfaces using gem5-aladdin

TL;DR: It is shown that the optimal energy-delay product of an accelerator microarchitecture can improve by up to 7.4× when system-level effects are considered, compared to optimizing accelerators in isolation.
Proceedings ArticleDOI

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

TL;DR: ElasticFlow, a novel architectural synthesis approach, dynamically distributes inner loops to an array of loop processing units (LPUs) in a complexity-effective manner and demonstrates significant performance improvements over a widely used commercial HLS tool for Xilinx FPGAs.
Proceedings ArticleDOI

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis

TL;DR: This work proposes to generate an aggressive pipeline at compile-time while resolving hazards with memory port arbitration and squash-and-replay at run-time to enable high-throughput pipelining of irregular loops.
Proceedings ArticleDOI

Efficient data supply for hardware accelerators with prefetching and access/execute decoupling

TL;DR: An architecture framework is presented for designing hardware accelerators that can effectively tolerate long and variable memory latency through prefetching and access/execute decoupling, with minimal manual effort.
Journal ArticleDOI

High-level Synthesis for Low-power Design

TL;DR: Recent research on using HLS to effectively explore a multi-dimensional design space and derive low-power implementations is discussed, and potential opportunities for tackling the remaining challenges are outlined.
References
Book

Introduction to Algorithms

TL;DR: The updated edition of the classic Introduction to Algorithms is intended primarily for undergraduate or graduate courses in algorithms or data structures; it presents a rich variety of algorithms and covers them in considerable depth while keeping their design and analysis accessible to readers at all levels.
Book ChapterDOI

Introduction to Algorithms

Xin-She Yang
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
Proceedings ArticleDOI

LLVM: a compilation framework for lifelong program analysis & transformation

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

GPU Computing

TL;DR: The background, hardware, and programming model for GPU computing are described, the state of the art in tools and techniques is summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.
Journal ArticleDOI

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

TL;DR: The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices, such as multicore CPUs, GPUs, and other accelerators.