Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5,778 citations.


Papers
Book Chapter
16 Jun 2013
TL;DR: This work examines a total cost of ownership (TCO) that includes the costs of administration and programming effort, and from it computes the cost per program run, which can be used as a comparison metric for a hardware purchase decision.
Abstract: Nowadays, HPC systems emerge in a great variety including commodity processors with attached accelerators which promise to improve the performance per watt ratio. These heterogeneous architectures often get far more complex to employ. Therefore, a hardware purchase decision should not only take capital expenses and operational costs such as power consumption into account, but also manpower. In this work, we take a look at the total cost of ownership (TCO) that includes costs for administration and programming effort. From that, we compute the costs per program run which can be used as a comparison metric for a purchase decision. In a case study, we evaluate our approach on two real-world simulation applications on Intel Xeon architectures, NVIDIA GPUs and Intel Xeon Phis by using different programming models: OpenCL, OpenACC, OpenMP and Intel’s Language Extensions for Offload.

15 citations
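
The cost-per-program-run metric described in the entry above can be sketched in a few lines. This is only an illustrative decomposition, not the cost model from the chapter itself: the split into one-time and annual costs, and every field name and figure below, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SystemCosts:
    """Hypothetical cost inputs for one candidate system (all names are assumptions)."""
    hardware_cost: float       # one-time capital expense
    programming_cost: float    # one-time manpower to port and tune the application
    annual_admin_cost: float   # recurring administration manpower
    annual_energy_cost: float  # recurring power and cooling cost
    lifetime_years: float      # planned service life of the system
    runs_per_year: float       # program runs the system completes per year


def total_cost_of_ownership(c: SystemCosts) -> float:
    """One-time costs plus recurring costs over the whole lifetime."""
    one_time = c.hardware_cost + c.programming_cost
    recurring = (c.annual_admin_cost + c.annual_energy_cost) * c.lifetime_years
    return one_time + recurring


def cost_per_program_run(c: SystemCosts) -> float:
    """TCO divided by the number of runs completed over the lifetime."""
    total_runs = c.runs_per_year * c.lifetime_years
    if total_runs <= 0:
        raise ValueError("the system must complete at least one run")
    return total_cost_of_ownership(c) / total_runs


# Made-up figures, purely to show the arithmetic.
example = SystemCosts(
    hardware_cost=10_000,
    programming_cost=5_000,
    annual_admin_cost=1_000,
    annual_energy_cost=2_000,
    lifetime_years=5,
    runs_per_year=1_000,
)
print(cost_per_program_run(example))
```

Under this sketch, a faster accelerator lowers the cost per run only if the extra runs it completes outweigh its added programming and energy costs.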

Proceedings Article
18 Aug 2010
TL;DR: This paper proposes and implements mechanisms and policies for a commercial OS scheduler and load balancer which incorporates thread characteristics, and shows that it results in improvements of up to 30% in performance per watt.
Abstract: Runtime characteristics of individual threads (such as IPC, cache usage, etc.) are a critical factor in making efficient scheduling decisions in modern chip-multiprocessor systems. They provide key insights into how threads interact when they share processor resources, and affect the overall system power and performance efficiency. In this paper, we propose and implement mechanisms and policies for a commercial OS scheduler and load balancer which incorporates thread characteristics, and show that it results in improvements of up to 30% in performance per watt.

14 citations

Journal Article
TL;DR: A new metric, ASTPI (Average Stall Time Per Instruction), is proposed, and a new online monitoring approach based on it, called ESHMP, is designed, implemented and evaluated. The results show that among HMP systems that adopt heterogeneity-aware schedulers and have more than one LLC, architectures in which heterogeneous cores share LLCs achieve better performance than those in which homogeneous cores share LLCs.

14 citations

Proceedings Article
18 Mar 2013
TL;DR: CReAMS is composed of multiple adaptive reconfigurable processors that simultaneously exploit Instruction and Thread Level Parallelism, and works in a transparent fashion, so binary compatibility is maintained, with no need to change the software development process or environment.
Abstract: As the number of embedded applications increases, companies are launching new platforms within short periods of time to efficiently execute software with the lowest possible energy consumption. However, for each new platform deployment, new tool chains, with additional libraries, debuggers and compilers must come along, breaking binary compatibility. This strategy implies high hardware and software redesign costs. In this scenario, we propose the exploitation of Custom Reconfigurable Arrays for Multiprocessor Systems (CReAMS). CReAMS is composed of multiple adaptive reconfigurable processors that simultaneously exploit Instruction and Thread Level Parallelism. It works in a transparent fashion, so binary compatibility is maintained, with no need to change the software development process or environment. We also show that CReAMS delivers higher performance per watt in comparison to a 4-issue Superscalar processor, when the same power budget is considered for both designs.

14 citations

Journal Article
TL;DR: In this paper, the authors investigated how GPU power consumption increases non-linearly with both temperature and supply voltage, as predicted by physical transistor models, and showed that lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU.
Abstract: The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates use of high performance computing (HPC) systems. The need for high power efficiency at remote observatory sites parallels that in HPC broadly, where efficiency is a critical metric. We investigate how the performance-per-watt of graphics processing units (GPUs) is affected by temperature, core clock frequency and voltage. Our results highlight how the underlying physical processes that govern transistor operation affect power efficiency. In particular, we show experimentally that GPU power consumption increases non-linearly (quadratically) with both temperature and supply voltage, as predicted by physical transistor models. We show that lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU by up to 37-48% over default settings when running xGPU, a compute-bound code used in radio astronomy. We discuss how automatic temperature-aware and application-dependent voltage and frequency scaling (T-DVFS and A-DVFS) may provide a mechanism to achieve better power efficiency for a wider range of compute codes running on GPUs.

14 citations
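
As a rough illustration of how a performance-per-watt improvement like the one reported above is calculated, the sketch below divides throughput by power draw and compares two operating points. The function names and all numbers are hypothetical and are not measurements from the paper.

```python
def performance_per_watt(throughput: float, power_watts: float) -> float:
    """Throughput (e.g. GFLOP/s) delivered per watt of power drawn."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput / power_watts


def efficiency_gain_percent(baseline: float, tuned: float) -> float:
    """Relative improvement of the tuned efficiency over the baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline efficiency must be positive")
    return (tuned / baseline - 1.0) * 100.0


# Hypothetical operating points: default settings versus a configuration
# with lower supply voltage and a higher clock frequency.
default_eff = performance_per_watt(throughput=100.0, power_watts=50.0)
tuned_eff = performance_per_watt(throughput=110.0, power_watts=44.0)

print(default_eff, tuned_eff, efficiency_gain_percent(default_eff, tuned_eff))
```

Because the metric is a ratio, it improves both when throughput rises at fixed power and when power falls at fixed throughput, which is why voltage and frequency can be tuned together.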

Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 81% related
Benchmark (computing): 19.6K papers, 419.1K citations, 80% related
Programming paradigm: 18.7K papers, 467.9K citations, 77% related
Compiler: 26.3K papers, 578.5K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 76% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2021    14
2020    15
2019    15
2018    36
2017    25
2016    31