Proceedings ArticleDOI

Portable mapping of data parallel programs to OpenCL for heterogeneous systems

TL;DR: A compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores.
Abstract
General-purpose GPU-based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures, and uses predictive modeling to automatically determine whether it is worthwhile running the OpenCL code on the GPU or the OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU-based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) respectively over a sequential baseline. This is, on average, a factor of 1.63 and 1.56 faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.


Citations
Proceedings ArticleDOI

End-to-End Deep Learning of Optimization Heuristics

TL;DR: A deep neural network is developed that learns heuristics over raw code, entirely without using code features, and it is shown that the neural networks can transfer learning from one optimization problem to another, improving the accuracy of new models without the help of human experts.
Proceedings ArticleDOI

Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms

TL;DR: An efficient OpenCL task scheduling scheme that schedules multiple kernels from multiple programs on CPU/GPU heterogeneous platforms. It determines at runtime which kernels are likely to best utilize a device, using a novel model that predicts a kernel's speedup based on its static code structure.
Journal ArticleDOI

Machine Learning in Compiler Optimization

TL;DR: In the last decade, machine-learning-based compilation has moved from an obscure research niche to a mainstream activity; the main concepts of features, models, training, and deployment are introduced.
Posted Content

Machine Learning in Compiler Optimisation

TL;DR: The relationship between machine learning and compiler optimization is described and the main concepts of features, models, training, and deployment are introduced and a road map for the wide variety of different research areas is provided.
Proceedings ArticleDOI

Adaptive deep learning model selection on embedded systems

TL;DR: This paper presents an adaptive scheme to determine which DNN model to use for a given input, by considering the desired accuracy and inference time, and considers a range of influential DNN models.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book

Optimizing Compilers for Modern Architectures: A Dependence-based Approach

Ken Kennedy, +1 more
TL;DR: A broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling.
Proceedings ArticleDOI

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

TL;DR: This paper discusses optimization techniques for both CPU and GPU, analyzes what architecture features contributed to performance differences between the two architectures, and recommends a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.
Proceedings ArticleDOI

The Scalable Heterogeneous Computing (SHOC) benchmark suite

TL;DR: The Scalable HeterOgeneous Computing benchmark suite (SHOC) is a spectrum of programs that test the performance and stability of scalable heterogeneous computing systems and includes benchmark implementations in both OpenCL and CUDA in order to provide a comparison of these programming models.
Proceedings ArticleDOI

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

TL;DR: Adaptive mapping is proposed, a fully automatic technique to map computations to processing elements on a CPU+GPU machine. It is shown that, by judiciously distributing work over the CPU and GPU, automatic adaptive mapping achieves a 25% reduction in execution time and a 20% reduction in energy consumption, on average, compared to static mappings for a set of important computation benchmarks.