Proceedings Article
Optimising multi-loop programs for heterogeneous computing systems
Y. M. Lam, Jose G. F. Coutinho, Wayne Luk, Philip H. W. Leong
- pp 129-134
TLDR
A performance-driven strategy is proposed to find the best unrolling factor for each loop; the closer the run-time conditions match the compile-time parameters, the higher the performance.
Abstract:
This paper presents a method for optimising the parallelisation and scheduling of task graphs that contain representations of loops, targeting heterogeneous computing systems with both software and hardware processors. The method integrates loop unrolling with task scheduling and determines how far each loop should be unrolled to maximise performance while meeting size constraints. A performance-driven strategy is proposed to find the best unrolling factor for each loop; the closer the run-time conditions match the compile-time parameters, the higher the performance. Experimental results obtained using a speech recognition system show that the proposed method outperforms an approach without unrolling by 2.1 times. Using the processing time of a 2.6 GHz microprocessor as a reference, a speedup of 10 times can be achieved when compile-time and run-time parameters are matched, while performance drops gradually when they differ.
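The idea of choosing one unrolling factor per loop under a size constraint can be illustrated with a small sketch. This is not the paper's algorithm: it is a brute-force search over a toy cost model. The names `schedule_time` and `best_unrolling`, the assumption that unrolling by a factor f runs f copies of the loop body in parallel and uses f times the body area, and all numbers in the usage note are hypothetical.

```python
from itertools import product
from math import ceil


def schedule_time(loops, factors):
    """Total time of loops run one after another under a toy cost model.

    loops   -- list of (iterations, body_time, body_area) tuples (hypothetical units)
    factors -- one unrolling factor per loop

    Assumption: unrolling by f executes f body copies in parallel, so a loop
    with n iterations needs ceil(n / f) rounds of body_time each.
    """
    return sum(ceil(n / f) * t for f, (n, t, _) in zip(factors, loops))


def best_unrolling(loops, area_budget, max_factor=8):
    """Exhaustively search unrolling factors that fit within area_budget.

    Returns (time, factors) for the fastest feasible choice, or None if no
    combination fits. Assumption: area grows linearly with the factor.
    """
    best = None
    for factors in product(range(1, max_factor + 1), repeat=len(loops)):
        area = sum(f * a for f, (_, _, a) in zip(factors, loops))
        if area > area_budget:
            continue  # violates the size constraint
        time = schedule_time(loops, factors)
        if best is None or time < best[0]:
            best = (time, factors)
    return best
```

For example, with `loops = [(8, 4, 10), (4, 2, 5)]` and `area_budget = 50`, `best_unrolling` selects factors `(4, 2)` with a total time of 12, compared with 40 for the unrolled-by-one baseline `schedule_time(loops, (1, 1))`. Calling `schedule_time` with the chosen factors but different iteration counts gives a rough feel for how a mismatch between compile-time and run-time parameters can reduce the benefit.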
Citations
Journal Article
Ant Colony Heuristic for Mapping and Scheduling Tasks and Communications on Heterogeneous Embedded Systems
TL;DR: This paper proposes an ant colony optimization (ACO) heuristic that, given a model of the target architecture and the application, efficiently executes both scheduling and mapping to optimize the application performance.
Proceedings Article
Static Prediction of Loop Iteration Counts Using Machine Learning to Enable Hot Spot Optimizations
Dirk Tetzlaff, Sabine Glesner
TL;DR: This paper presents a sophisticated approach using machine learning techniques to automatically generate heuristics that provide the compiler with knowledge of this run-time behavior, hence yielding more precise heuristics than those generated by pure static analyses.
Journal Article
Parallel partitioning for distributed systems using sequential assignment
TL;DR: A novel mixed integer linear programming formalisation is used to assign code sections from parallel tasks to share computational components with the optimal trade-off between acceleration from component specialism and serialisation delay to achieve faster execution times.
Journal Article
Improving communication latency with the write-only architecture
TL;DR: This paper provides formal assignment results for software benchmarks partitioned using the Write-Only Architecture and previous execution paradigms for distributed heterogeneous architectures along with bounds and complexity information to demonstrate the robust performance improvements possible with the WOA.
Journal Article
Optimizing Hardware Design by Composing Utility-Directed Transformations
TL;DR: This work presents a systematic approach composing multiple utility-directed transformations for optimizing and mapping a sequential design onto a customizable parallel computing platform such as a Field-Programmable Gate Array (FPGA) to enable automatic design optimization at compile time.
References
Book
Fundamentals of speech recognition
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Journal Article
Path-based scheduling for synthesis
TL;DR: A novel path-based scheduling algorithm that yields solutions with the minimum number of control steps, taking into account arbitrary constraints that limit the amount of operations in each control step, is presented.
Journal Article
Pipeline vectorization
Markus Weinhardt, Wayne Luk
TL;DR: This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers that improves efficiency and ease of development of hardware designs, particularly for users with little electronics design experience.
Proceedings Article
Path-based scheduling for synthesis
TL;DR: A path-based scheduling algorithm for synchronous digital systems is presented, which yields solutions with the minimum number of control steps, taking into account arbitrary constraints that limit the amount of operations in each control step.
Proceedings Article
Formulation and evaluation of scheduling techniques for control flow graphs
TL;DR: A probabilistic finite state machine is introduced to model the resulting schedule and evaluate the effectiveness of the scheduling approaches for control flow graphs.