Proceedings ArticleDOI
Portable mapping of data parallel programs to OpenCL for heterogeneous systems
Dominik Grewe, Zheng Wang, Michael O'Boyle
pp. 1–10
TL;DR: A compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores.
Abstract:
General purpose GPU-based systems are highly attractive as they offer potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine whether it is worthwhile running the OpenCL code on the GPU or the OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU-based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) respectively over a sequential baseline. This is, on average, 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.
Citations
Proceedings ArticleDOI
End-to-End Deep Learning of Optimization Heuristics
TL;DR: A deep neural network is developed that learns heuristics over raw code, entirely without using code features, and it is shown that the neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models without the help of human experts.
Proceedings ArticleDOI
Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms
TL;DR: An efficient OpenCL task-scheduling scheme that schedules multiple kernels from multiple programs on CPU/GPU heterogeneous platforms by determining at runtime which kernels are likely to best utilize a device; it develops a novel model that predicts a kernel's speedup based on its static code structure.
Journal ArticleDOI
Machine Learning in Compiler Optimization
Zheng Wang, Michael O'Boyle
TL;DR: In the last decade, machine-learning-based compilation has moved from an obscure research niche to a mainstream activity; this article introduces the main concepts of features, models, training, and deployment.
Posted Content
Machine Learning in Compiler Optimisation
Zheng Wang, Michael O'Boyle
TL;DR: The relationship between machine learning and compiler optimization is described, the main concepts of features, models, training, and deployment are introduced, and a road map for the wide variety of different research areas is provided.
Proceedings ArticleDOI
Adaptive deep learning model selection on embedded systems
TL;DR: This paper presents an adaptive scheme to determine which DNN model to use for a given input by considering the desired accuracy and inference time; it evaluates the scheme over a range of influential DNN models.
References
Book
C4.5: Programs for Machine Learning
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book
Optimizing Compilers for Modern Architectures: A Dependence-based Approach
Ken Kennedy, John R. Allen
TL;DR: A broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling is provided.
Proceedings ArticleDOI
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael E. Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, Pradeep Dubey
TL;DR: This paper discusses optimization techniques for both CPU and GPU, analyzes what architecture features contributed to performance differences between the two architectures, and recommends a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.
Proceedings ArticleDOI
The Scalable Heterogeneous Computing (SHOC) benchmark suite
Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, Jeffrey S. Vetter
TL;DR: The Scalable HeterOgeneous Computing benchmark suite (SHOC) is a spectrum of programs that test the performance and stability of scalable heterogeneous computing systems and includes benchmark implementations in both OpenCL and CUDA in order to provide a comparison of these programming models.
Proceedings ArticleDOI
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
TL;DR: Adaptive mapping is proposed, a fully automatic technique to map computations to processing elements on a CPU+GPU machine; it is shown that, by judiciously distributing work over the CPU and GPU, automatic adaptive mapping achieves a 25% reduction in execution time and a 20% reduction in energy consumption compared with static mappings, on average, for a set of important computation benchmarks.
Related Papers (5)
Partitioning streaming parallelism for multi-cores: a machine learning based approach
Zheng Wang, Michael O'Boyle