Topic
Performance per watt
About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5,778 citations.
Papers published on a yearly basis
Papers
16 Jun 2013
TL;DR: This work considers the total cost of ownership (TCO), including costs for administration and programming effort, and computes the cost per program run, which can be used as a comparison metric for a hardware purchase decision.
Abstract: HPC systems now come in a great variety, including commodity processors with attached accelerators that promise to improve the performance-per-watt ratio. These heterogeneous architectures are often far more complex to employ. Therefore, a hardware purchase decision should take into account not only capital expenses and operational costs such as power consumption, but also manpower. In this work, we take a look at the total cost of ownership (TCO), which includes costs for administration and programming effort. From that, we compute the cost per program run, which can be used as a comparison metric for a purchase decision. In a case study, we evaluate our approach on two real-world simulation applications on Intel Xeon architectures, NVIDIA GPUs and Intel Xeon Phis by using different programming models: OpenCL, OpenACC, OpenMP and Intel’s Language Extensions for Offload.
15 citations
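The TCO-based comparison metric described above can be sketched as follows. All cost figures and the breakdown into capital, operational, administration, and programming costs below are illustrative assumptions, not numbers from the study:

```python
# Hypothetical sketch of a cost-per-run metric built from TCO components.
# Every number here is made up for illustration.

def cost_per_run(capex, opex_per_year, admin_per_year,
                 programming_effort, lifetime_years, runs_per_year):
    """Total cost of ownership divided by the total number of program runs."""
    tco = (capex
           + programming_effort  # one-time porting/tuning effort
           + lifetime_years * (opex_per_year + admin_per_year))
    return tco / (lifetime_years * runs_per_year)

# Compare a CPU-only node with a GPU-accelerated one (assumed values):
cpu = cost_per_run(capex=10_000, opex_per_year=2_000, admin_per_year=1_000,
                   programming_effort=5_000, lifetime_years=5, runs_per_year=4_000)
gpu = cost_per_run(capex=18_000, opex_per_year=1_500, admin_per_year=1_000,
                   programming_effort=15_000, lifetime_years=5, runs_per_year=12_000)
print(f"CPU: {cpu:.2f} per run, GPU: {gpu:.2f} per run")
```

With these assumed figures, the accelerated node wins on cost per run despite its higher capital and programming costs, because it completes more runs over the same lifetime; different assumptions can flip the outcome, which is the paper's point about including manpower in the decision.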
18 Aug 2010
TL;DR: This paper proposes and implements mechanisms and policies for a commercial OS scheduler and load balancer that incorporate thread characteristics, and shows improvements of up to 30% in performance per watt.
Abstract: Runtime characteristics of individual threads (such as IPC, cache usage, etc.) are a critical factor in making efficient scheduling decisions in modern chip-multiprocessor systems. They provide key insights into how threads interact when they share processor resources, and affect overall system power and performance efficiency. In this paper, we propose and implement mechanisms and policies for a commercial OS scheduler and load balancer that incorporate thread characteristics, and show that they result in improvements of up to 30% in performance per watt.
14 citations
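The idea of characteristic-aware placement can be illustrated with a toy heuristic: pair a cache-heavy thread with a compute-heavy one on each two-way core so they contend less for the shared cache. The thread data and the pairing rule below are assumptions for illustration, not the paper's actual OS-level policy:

```python
# Toy characteristic-aware load balancing. Values are made up.
threads = {  # thread -> cache misses per 1k instructions (assumed measurements)
    "A": 40, "B": 2, "C": 35, "D": 5,
}

# Sort by cache intensity, then co-schedule the most and least
# intensive remaining threads so shared-cache pressure is balanced.
by_intensity = sorted(threads, key=threads.get, reverse=True)
pairs = [(by_intensity[i], by_intensity[-1 - i])
         for i in range(len(by_intensity) // 2)]
print(pairs)  # -> [('A', 'B'), ('C', 'D')]
```

A real scheduler would refresh these measurements from hardware performance counters and rebalance periodically rather than computing a one-shot assignment.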
TL;DR: A new metric, ASTPI (Average Stall Time Per Instruction), is proposed, and a new online monitoring approach called ESHMP, based on this metric, is designed, implemented, and evaluated. The results show that among HMP systems that adopt heterogeneity-aware schedulers and have more than one LLC, architectures where heterogeneous cores share LLCs achieve better performance than those where homogeneous cores share LLCs.
14 citations
18 Mar 2013
TL;DR: CReAMS is composed of multiple adaptive reconfigurable processors that simultaneously exploit instruction- and thread-level parallelism; it works transparently, so binary compatibility is maintained with no need to change the software development process or environment.
Abstract: As the number of embedded applications increases, companies are launching new platforms within short periods of time to efficiently execute software with the lowest possible energy consumption. However, with each new platform deployment, new tool chains with additional libraries, debuggers and compilers must come along, breaking binary compatibility. This strategy implies high hardware and software redesign costs. In this scenario, we propose the exploitation of Custom Reconfigurable Arrays for Multiprocessor Systems (CReAMS). CReAMS is composed of multiple adaptive reconfigurable processors that simultaneously exploit instruction- and thread-level parallelism. It works in a transparent fashion, so binary compatibility is maintained, with no need to change the software development process or environment. We also show that CReAMS delivers higher performance per watt than a 4-issue superscalar processor when the same power budget is considered for both designs.
14 citations
TL;DR: The authors investigate how GPU power consumption increases non-linearly with both temperature and supply voltage, as predicted by physical transistor models, and show that lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU.
Abstract: The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates use of high performance computing (HPC) systems. The need for high power efficiency at remote observatory sites parallels that in HPC broadly, where efficiency is a critical metric. We investigate how the performance-per-watt of graphics processing units (GPUs) is affected by temperature, core clock frequency and voltage. Our results highlight how the underlying physical processes that govern transistor operation affect power efficiency. In particular, we show experimentally that GPU power consumption increases non-linearly (quadratically) with both temperature and supply voltage, as predicted by physical transistor models. We show that lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU by up to 37-48% over default settings when running xGPU, a compute-bound code used in radio astronomy. We discuss how automatic temperature-aware and application-dependent voltage and frequency scaling (T-DVFS and A-DVFS) may provide a mechanism to achieve better power efficiency for a wider range of compute codes running on GPUs.
14 citations
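The relation the paper measures follows the standard dynamic-power model, P ≈ C·V²·f, so throughput per watt can improve when supply voltage is lowered while the clock is raised (within stability limits). The constants and operating points below are illustrative assumptions, not K20 measurements:

```python
# Sketch of the quadratic voltage dependence of GPU power.
# c_eff and static_w are made-up constants for illustration.

def perf_per_watt(voltage, freq_ghz, static_w=20.0, c_eff=30.0):
    """Throughput proxy (ops scale with frequency) over total power."""
    dynamic_w = c_eff * voltage**2 * freq_ghz  # dynamic power ~ C * V^2 * f
    return freq_ghz / (dynamic_w + static_w)

default = perf_per_watt(voltage=1.00, freq_ghz=0.705)  # stock-like settings
tuned = perf_per_watt(voltage=0.90, freq_ghz=0.758)    # undervolted, overclocked
print(f"improvement: {100 * (tuned / default - 1):.1f}%")
```

Because voltage enters the dynamic term quadratically, even a modest undervolt buys a disproportionate power saving, which is why the combined undervolt-plus-overclock point comes out ahead in this sketch just as it does in the paper's measurements.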