Proceedings Article

Optimising multi-loop programs for heterogeneous computing systems

TL;DR
A performance-driven strategy is proposed to find the best unrolling factor for each loop; the closer the match between run-time conditions and compile-time parameters, the higher the performance.
Abstract
This paper presents a method for optimising the parallelisation and scheduling of task graphs that contain representations of loops, targeting heterogeneous computing systems with both software and hardware processors. The method integrates loop unrolling with task scheduling and determines how far each loop should be unrolled to maximise performance while meeting size constraints. A performance-driven strategy is proposed to find the best unrolling factor for each loop; the closer the match between run-time conditions and compile-time parameters, the higher the performance. Experimental results obtained using a speech recognition system show that the proposed method outperforms an approach without unrolling by a factor of 2.1. Using the processing time of a 2.6 GHz microprocessor as a reference, a speedup of 10 times can be achieved when compile-time and run-time parameters match, and performance drops gradually as they diverge.
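
The idea of choosing an unrolling factor per loop under a size constraint can be illustrated with a minimal sketch. This is not the paper's algorithm: the cost model (an unrolled loop runs `unroll` copies of its body in parallel, and area grows linearly with the factor), the exhaustive search, and all names such as `best_unroll_factors` and `area_budget` are assumptions made purely for illustration.

```python
from itertools import product
from math import ceil


def loop_time(iterations, body_time, unroll):
    """Toy model (assumed): `unroll` body copies execute in parallel per pass."""
    return ceil(iterations / unroll) * body_time


def best_unroll_factors(loops, area_budget, max_unroll=8):
    """Exhaustively search unrolling factors under an area budget.

    loops: list of (iterations, body_time, body_area) tuples (hypothetical units).
    Returns (total_time, factors) for the fastest feasible choice,
    or None if no combination fits the budget.
    """
    best = None
    for factors in product(range(1, max_unroll + 1), repeat=len(loops)):
        # Assumed size model: each unrolled copy costs one body_area.
        area = sum(u * a for u, (_, _, a) in zip(factors, loops))
        if area > area_budget:
            continue
        # Loops are assumed to run one after another, so times add up.
        total = sum(loop_time(n, t, u) for u, (n, t, _) in zip(factors, loops))
        if best is None or total < best[0]:
            best = (total, factors)
    return best


# Two hypothetical loops: (iterations, body_time, body_area).
loops = [(16, 4, 10), (8, 2, 5)]
result = best_unroll_factors(loops, area_budget=60)
baseline = best_unroll_factors(loops, area_budget=15)  # only room for no unrolling
print(result, baseline)
```

In this toy model the iteration counts play the role of compile-time parameters: if the counts seen at run time differ from those used in the search, the chosen factors are no longer the best fit, which mirrors the gradual performance drop described in the abstract. A real flow would also have to schedule the unrolled tasks across software and hardware processors, which this sketch leaves out.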


Citations
Journal Article

Ant Colony Heuristic for Mapping and Scheduling Tasks and Communications on Heterogeneous Embedded Systems

TL;DR: This paper proposes an ant colony optimization (ACO) heuristic that, given a model of the target architecture and the application, efficiently executes both scheduling and mapping to optimize the application performance.
Proceedings Article

Static Prediction of Loop Iteration Counts Using Machine Learning to Enable Hot Spot Optimizations

TL;DR: This paper presents a sophisticated approach using machine learning techniques to automatically generate heuristics that provide the compiler with knowledge of this run-time behavior, hence yielding more precise heuristics than those generated by pure static analyses.
Journal Article

Parallel partitioning for distributed systems using sequential assignment

TL;DR: A novel mixed integer linear programming formalisation is used to assign code sections from parallel tasks to share computational components with the optimal trade-off between acceleration from component specialism and serialisation delay to achieve faster execution times.
Journal Article

Improving communication latency with the write-only architecture

TL;DR: This paper provides formal assignment results for software benchmarks partitioned using the Write-Only Architecture and previous execution paradigms for distributed heterogeneous architectures along with bounds and complexity information to demonstrate the robust performance improvements possible with the WOA.
Journal Article

Optimizing Hardware Design by Composing Utility-Directed Transformations

TL;DR: This work presents a systematic approach composing multiple utility-directed transformations for optimizing and mapping a sequential design onto a customizable parallel computing platform such as a Field-Programmable Gate Array (FPGA) to enable automatic design optimization at compile time.
References
Book

Fundamentals of speech recognition

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Journal Article

Path-based scheduling for synthesis

TL;DR: A novel path-based scheduling algorithm that yields solutions with the minimum number of control steps, taking into account arbitrary constraints that limit the amount of operations in each control step, is presented.
Journal Article

Pipeline vectorization

TL;DR: This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers that improves efficiency and ease of development of hardware designs, particularly for users with little electronics design experience.
Proceedings Article

Path-based scheduling for synthesis

TL;DR: A path-based scheduling algorithm for synchronous digital systems is presented, which yields solutions with the minimum number of control steps, taking into account arbitrary constraints that limit the amount of operations in each control step.
Proceedings Article

Formulation and evaluation of scheduling techniques for control flow graphs

TL;DR: A probabilistic finite state machine is introduced to model the resulting schedule and evaluate the effectiveness of the scheduling approaches for control flow graphs.