Proceedings ArticleDOI
Portable mapping of data parallel programs to OpenCL for heterogeneous systems
Dominik Grewe, Zheng Wang, Michael O'Boyle
pp. 1–10
TL;DR: A compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores.
Abstract:
General purpose GPU-based systems are highly attractive as they offer potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine whether it is worthwhile running the OpenCL code on the GPU or the OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU-based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) respectively over a sequential baseline. This is, on average, 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.
Citations
Proceedings ArticleDOI
End-to-End Deep Learning of Optimization Heuristics
TL;DR: A deep neural network is developed that learns heuristics over raw code, entirely without using code features, and it is shown that the neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models without the help of human experts.
Proceedings ArticleDOI
Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms
TL;DR: An efficient OpenCL task-scheduling scheme that schedules multiple kernels from multiple programs on CPU/GPU heterogeneous platforms by determining at runtime which kernels are likely to best utilize a device; it develops a novel model that predicts a kernel's speedup based on its static code structure.
Journal ArticleDOI
Machine Learning in Compiler Optimization
Zheng Wang, Michael O'Boyle
TL;DR: In the last decade, machine-learning-based compilation has moved from an obscure research niche to a mainstream activity; this article introduces the main concepts of features, models, training, and deployment.
Posted Content
Machine Learning in Compiler Optimisation
Zheng Wang, Michael O'Boyle
TL;DR: The relationship between machine learning and compiler optimization is described, the main concepts of features, models, training, and deployment are introduced, and a road map for the wide variety of different research areas is provided.
Proceedings ArticleDOI
Adaptive deep learning model selection on embedded systems
TL;DR: This paper presents an adaptive scheme to determine which DNN model to use for a given input by considering the desired accuracy and inference time; it evaluates the scheme over a range of influential DNN models.
References
Book
C4.5: Programs for Machine Learning
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book
Optimizing Compilers for Modern Architectures: A Dependence-based Approach
Ken Kennedy, John R. Allen
TL;DR: A broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling is provided.
Proceedings ArticleDOI
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael E. Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, Pradeep Dubey
TL;DR: This paper discusses optimization techniques for both CPU and GPU, analyzes what architecture features contributed to performance differences between the two architectures, and recommends a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.
Proceedings ArticleDOI
The Scalable Heterogeneous Computing (SHOC) benchmark suite
Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, Jeffrey S. Vetter
TL;DR: The Scalable HeterOgeneous Computing benchmark suite (SHOC) is a spectrum of programs that test the performance and stability of scalable heterogeneous computing systems and includes benchmark implementations in both OpenCL and CUDA in order to provide a comparison of these programming models.
Proceedings ArticleDOI
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
TL;DR: Adaptive mapping is proposed, a fully automatic technique to map computations to processing elements on a CPU+GPU machine; it is shown that, by judiciously distributing work over the CPU and GPU, automatic adaptive mapping achieves a 25% reduction in execution time and a 20% reduction in energy consumption compared with static mappings, on average, for a set of important computation benchmarks.
Related Papers (5)
Partitioning streaming parallelism for multi-cores: a machine learning based approach
Zheng Wang, Michael O'Boyle