Proceedings ArticleDOI
GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
Kai Ma, Xue Li, Wei Chen, Chi Zhang, Xiaorui Wang
- pp 48-57
TLDR
This paper proposes GreenGPU, a holistic energy management framework for GPU-CPU heterogeneous architectures that dynamically throttles the frequencies of GPU cores and memory in a coordinated manner, based on their utilizations, for maximized energy savings with only marginal performance degradation.
Abstract:
In recent years, GPU-CPU heterogeneous architectures have been increasingly adopted in high performance computing because they provide high computational throughput. However, energy consumption is a major concern due to the large scale of such systems. A few existing efforts try to lower the energy consumption of GPU-CPU architectures, but they address either the GPU or the CPU in isolation and thus cannot achieve maximized energy savings. In this paper, we propose GreenGPU, a holistic energy management framework for GPU-CPU heterogeneous architectures. Our solution features a two-tier design. In the first tier, GreenGPU dynamically splits and distributes workloads to the GPU and CPU based on the workload characteristics, such that both sides finish at approximately the same time. As a result, the energy wasted on idling and waiting for the slower side to finish is minimized. In the second tier, GreenGPU dynamically throttles the frequencies of GPU cores and memory in a coordinated manner, based on their utilizations, for maximized energy savings with only marginal performance degradation. The frequency and voltage of the CPU are scaled in a similar way. We implement GreenGPU using the CUDA framework on a real physical test bed with Nvidia GeForce GPUs and AMD Phenom II CPUs. Experimental results show that GreenGPU achieves 21.04% average energy savings and outperforms several well-designed baselines.
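The two tiers described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not give the update rules, so both functions below are assumptions chosen for illustration. `rebalance_split` assumes each side's execution time scales linearly with its share of the work, and `pick_frequency` assumes utilization scales inversely with clock frequency. The function names, the `target` threshold, and the frequency levels are all hypothetical.

```python
def rebalance_split(gpu_share, t_gpu, t_cpu):
    """Tier 1 (illustrative): pick the next GPU work share so that both
    sides would finish together, assuming time scales linearly with share.

    gpu_share: fraction of work given to the GPU last iteration (0 < x < 1)
    t_gpu, t_cpu: measured execution times of each side last iteration
    """
    if not 0.0 < gpu_share < 1.0:
        raise ValueError("gpu_share must be strictly between 0 and 1")
    if t_gpu <= 0 or t_cpu <= 0:
        raise ValueError("measured times must be positive")
    # Estimated throughput of each side: share of work completed per second.
    rate_gpu = gpu_share / t_gpu
    rate_cpu = (1.0 - gpu_share) / t_cpu
    # Equal finish times require shares proportional to throughput.
    return rate_gpu / (rate_gpu + rate_cpu)


def pick_frequency(utilization, current_freq, levels, target=0.9):
    """Tier 2 (illustrative): choose the lowest frequency whose projected
    utilization stays at or below `target` (a hypothetical threshold).

    Assumes utilization scales inversely with frequency, which is only a
    rough approximation for real GPU cores or memory.
    """
    for freq in sorted(levels):
        projected = utilization * current_freq / freq
        if projected <= target:
            return freq
    # No level meets the target: fall back to the highest frequency.
    return max(levels)


if __name__ == "__main__":
    # The GPU finished in 1 s and the CPU in 4 s with an even split,
    # so more work moves to the GPU in the next iteration.
    print(rebalance_split(0.5, 1.0, 4.0))
    # A lightly utilized component can drop to a lower clock level.
    print(pick_frequency(0.4, 800, [400, 600, 800]))
```

In this sketch the split is recomputed from measured times after every iteration, so it converges in one step when the linear-scaling assumption holds and adapts gradually when it does not. A coordinated scheme, as the abstract describes, would apply the frequency choice to GPU cores and GPU memory together rather than to one component at a time.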
Citations
Journal ArticleDOI
A Survey of CPU-GPU Heterogeneous Computing Techniques
Sparsh Mittal, Jeffrey S. Vetter
TL;DR: This article surveys Heterogeneous Computing Techniques (HCTs) such as workload partitioning that enable utilizing both CPUs and GPUs to improve performance and/or energy efficiency and reviews both discrete and fused CPU-GPU systems.
Proceedings ArticleDOI
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities
Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut Kandemir, Onur Mutlu, Chita R. Das
TL;DR: Two new runtime techniques are developed: a regression-based affinity prediction model and mechanism that accurately identifies which kernels would benefit from PIM and offloads them to GPU cores in memory, and a concurrent kernel management mechanism that uses the affinity prediction model, a new kernel execution time prediction model, and kernel dependency information to decide which kernels to schedule concurrently on main GPU cores and the GPU cores in memory.
Proceedings ArticleDOI
GPGPU performance and power estimation using machine learning
TL;DR: A GPU performance and power estimation model is presented that applies machine learning techniques to measurements from real GPU hardware; after an initial training phase, the model runs as fast as, or faster than, the program running natively on real hardware.
Journal ArticleDOI
A Survey of Methods for Analyzing and Improving GPU Energy Efficiency
Sparsh Mittal, Jeffrey S. Vetter
TL;DR: The aim of this survey is to provide researchers with knowledge of the state of the art in GPU power management and motivate them to architect highly energy-efficient GPUs of tomorrow.
Posted Content
A Survey of Methods For Analyzing and Improving GPU Energy Efficiency
Sparsh Mittal, Jeffrey S. Vetter
TL;DR: In this paper, the authors present a survey of GPU power management techniques and compare them with other computing systems, e.g. FPGAs and CPUs, and provide a classification of these techniques on the basis of their main research idea.
References
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal ArticleDOI
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Yoav Freund, Robert E. Schapire
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Proceedings ArticleDOI
Rodinia: A benchmark suite for heterogeneous computing
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, Kevin Skadron
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Journal ArticleDOI
The weighted majority algorithm
TL;DR: A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm, which is robust in the presence of errors in the data, and is called the Weighted Majority Algorithm.
Journal ArticleDOI
Temperature-aware microarchitecture: Modeling and implementation
Kevin Skadron, Mircea R. Stan, Karthik Sankaranarayanan, Wei Huang, Sivakumar Velusamy, David Tarjan
TL;DR: HotSpot is described, an accurate yet fast and practical model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package. The study shows that power metrics are poor predictors of temperature, that sensor imprecision has a substantial impact on the performance of DTM, and that the inclusion of lateral resistances for thermal diffusion is important for accuracy.