Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5,778 citations.

Papers

Proceedings Article•DOI•

EnergySmart: Toward energy-efficient manycores for Near-Threshold Computing

Ulya R. Karpuzcu¹, Abhishek A. Sinkar², Nam Sung Kim², Josep Torrellas¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, University of Wisconsin-Madison²

23 Feb 2013

TL;DR: It is shown that, at NTC, a simple chip with a single Vdd domain can deliver a higher performance per watt than one with multiple Vdd domains.

Abstract: While Near-Threshold Voltage Computing (NTC) is a promising approach to push back the manycore power wall, it suffers from a high sensitivity to parameter variations. One possible way to cope with variations is to use multiple on-chip voltage (Vdd) domains. However, this paper finds that such an approach is energy inefficient. Consequently, for NTC, we propose a manycore organization that has a single Vdd domain and relies on multiple frequency domains to tackle variation. We call it EnergySmart. For this approach to be competitive, it has to be paired with effective core assignment strategies and also support fine-grain (i.e., short-interval) DVFS. This paper shows that, at NTC, a simple chip with a single Vdd domain can deliver a higher performance per watt than one with multiple Vdd domains.

70 citations

Proceedings Article•DOI•

Invited paper: Using OpenCL to evaluate the efficiency of CPUs, GPUs and FPGAs for information filtering

Doris Chen, Deshanand Singh

25 Oct 2012

TL;DR: This paper explores techniques that allow programmers to efficiently use FPGAs at a level of abstraction that is closer to traditional software-centric approaches by using the emerging parallel language, OpenCL.

Abstract: The FPGA can be a tremendously efficient computational fabric for many applications. In particular, the performance to power ratios of FPGA make them attractive solutions to solve the problem of data centers that are constrained largely by power and cooling costs. However, the complexity of the FPGA design flow requires the programmer to understand cycle-accurate details of how data is moved and transformed through the fabric. In this paper, we explore techniques that allow programmers to efficiently use FPGAs at a level of abstraction that is closer to traditional software-centric approaches by using the emerging parallel language, OpenCL. Although the field of high level synthesis has evolved greatly in the last few decades, several fundamental parts were missing from the complete software abstraction of the FPGA. These include standard and portable methods of describing HW/SW codesign, memory hierarchy, data movement and control of parallelism. We believe that OpenCL addresses all of these issues and allows for highly efficient description of FPGA designs with a higher level of abstraction. We demonstrate this premise by examining the performance of a document filtering algorithm, implemented in OpenCL and automatically compiled to a Stratix IV 530 FPGA. We show that our implementation achieves 5.5× and 5.25× better performance per watt ratios than GPU and CPU implementations, respectively.

63 citations

Proceedings Article•DOI•

Hybrid dynamic energy and thermal management in heterogeneous embedded multiprocessor SoCs

Shervin Sharifi¹, Ayse K. Coskun², Tajana Rosing¹•Institutions (2)

University of California, San Diego¹, Boston University²

18 Jan 2010

TL;DR: This work proposes a joint thermal and energy management technique specifically designed for heterogeneous MPSoCs that simultaneously reduces the thermal hot spots, temperature gradients, and energy consumption significantly.

Abstract: Heterogeneous multiprocessor system-on-chips (MPSoCs) which consist of cores with various power and performance characteristics can customize their configuration to achieve higher performance per Watt. On the other hand, inherent imbalance in power densities across MPSoCs leads to non-uniform temperature distributions, which affect performance and reliability adversely. In addition, managing temperature might result in conflicting decisions with achieving higher energy efficiency. In this work, we propose a joint thermal and energy management technique specifically designed for heterogeneous MPSoCs. Our technique identifies the performance demands of the current workload. By utilizing job scheduling and voltage/frequency scaling dynamically, we meet the desired performance while minimizing the energy consumption and the thermal imbalance. In comparison to performance-aware policies such as load balancing, our technique simultaneously reduces the thermal hot spots, temperature gradients, and energy consumption significantly.

61 citations

Proceedings Article•DOI•

A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU

Tingxing Dong¹, Veselin Dobrev², Tzanio V. Kolev², R. Rieben², Stanimire Tomov¹, Jack Dongarra¹•Institutions (2)

University of Tennessee¹, Lawrence Livermore National Laboratory²

19 May 2014

TL;DR: This paper discusses efforts to redesign the most computation-intensive parts of BLAST, an application that solves the equations for compressible hydrodynamics with high-order finite elements, using GPUs, and proposes an auto-tuning technique to adapt the CUDA kernels to the orders of the finite element method.

Abstract: Power and energy consumption are becoming an increasing concern in high performance computing. Compared to multi-core CPUs, GPUs have a much better performance per watt. In this paper we discuss efforts to redesign the most computation intensive parts of BLAST, an application that solves the equations for compressible hydrodynamics with high order finite elements, using GPUs [BLAST, Dobrev]. In order to exploit the hardware parallelism of GPUs and achieve high performance, we implemented custom linear algebra kernels. We intensively optimized our CUDA kernels by exploiting the memory hierarchy; the resulting kernels substantially exceed the vendor's library routines in performance. We proposed an auto tuning technique to adapt our CUDA kernels to the orders of the finite element method. Compared to a previous base implementation, our redesign and optimization lowered the energy consumption of the GPU in two aspects: 60% less time to solution and 10% less power required. Compared to the CPU-only solution, our GPU accelerated BLAST obtained a 2.5× overall speedup and 1.42× energy efficiency (green up) using 4th order (Q4) finite elements, and a 1.9× speedup and 1.27× green up using 2nd order (Q2) finite elements.

59 citations
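
The energy figures quoted in the BLAST abstract above can be sanity-checked with a few lines of arithmetic. The sketch below is not from the paper: it assumes that energy is average power multiplied by time to solution, and that "green up" means the ratio of CPU-only energy to GPU-accelerated energy. The function names are hypothetical.

```python
def energy_ratio(time_ratio: float, power_ratio: float) -> float:
    """Energy = average power x time, so the two ratios multiply."""
    return time_ratio * power_ratio


def implied_power_ratio(speedup: float, greenup: float) -> float:
    """Assuming greenup = E_cpu / E_gpu and speedup = T_cpu / T_gpu,
    the power ratio P_gpu / P_cpu equals speedup / greenup."""
    return speedup / greenup


# "60% less time to solution and 10% less power" versus the base GPU code.
gpu_energy = energy_ratio(time_ratio=0.40, power_ratio=0.90)
print(f"GPU energy vs. base implementation: {gpu_energy:.2f}x")

# Power draw implied by the reported speedup and green-up figures.
q4_power = implied_power_ratio(speedup=2.5, greenup=1.42)
q2_power = implied_power_ratio(speedup=1.9, greenup=1.27)
print(f"Q4 implied power ratio: {q4_power:.2f}x")
print(f"Q2 implied power ratio: {q2_power:.2f}x")
```

Under these assumptions, the redesigned GPU code uses about 36% of the base implementation's GPU energy, and the GPU-accelerated runs draw more power than the CPU-only runs while still finishing with less total energy.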

Proceedings Article•DOI•

Power-Performance Modelling of Mobile Gaming Workloads on Heterogeneous MPSoCs

Anuj Pathania¹, Alexandru Eugen Irimiea², Alok Prakash², Tulika Mitra²•Institutions (2)

Karlsruhe Institute of Technology¹, National University of Singapore²

07 Jun 2015

TL;DR: Based on the power-performance models developed, an efficient power management strategy is proposed and implemented on an Odroid-XU+E mobile platform; measurements show that it provides, on average, a 20% increase in performance per watt compared to the state of the art.

Abstract: Games have emerged as one of the most popular applications on mobile platforms. Recent platforms are now equipped with Heterogeneous Multiprocessor System-on-Chips (HMPSoCs) tightly integrating CPUs and GPUs on the same chip. This configuration enables high-end gaming on the platform but at the cost of high power consumption rapidly draining the underlying limited-capacity battery. The HMPSoCs are capable of independent Dynamic Voltage and Frequency Scaling (DVFS) for CPUs and GPUs for reduction in platform's power consumption. State-of-the-art power manager for mobile games on HMPSoCs oversimplifies the complex CPU-GPU interplay. In this paper, we develop power-performance models predicting the impact of DVFS on mobile gaming workloads. Based on our models, we propose an efficient power management strategy and implement it on an Odroid-XU+E mobile platform. Measurements on the platform show that our power manager provides on average 20% increase in performance per watt when compared to the state-of-the-art.

58 citations
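
The abstract above does not give the paper's models or its policy, so the following is only a minimal sketch of the general idea of model-driven DVFS: given predicted frame rate and power for each CPU/GPU frequency pair, pick the pair that meets a frame-rate target at the best predicted performance per watt. The `OperatingPoint` type, the `pick_operating_point` function, and every number in the table are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class OperatingPoint:
    cpu_mhz: int
    gpu_mhz: int
    predicted_fps: float    # output of a (hypothetical) performance model
    predicted_watts: float  # output of a (hypothetical) power model

    @property
    def perf_per_watt(self) -> float:
        return self.predicted_fps / self.predicted_watts


def pick_operating_point(
    points: Iterable[OperatingPoint], target_fps: float
) -> Optional[OperatingPoint]:
    """Return the feasible point with the highest predicted FPS per watt,
    or None if no point meets the frame-rate target."""
    feasible = [p for p in points if p.predicted_fps >= target_fps]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.perf_per_watt)


# Illustrative values only; not measurements from any real platform.
points = [
    OperatingPoint(800, 177, 24.0, 1.6),
    OperatingPoint(1200, 350, 41.0, 2.9),
    OperatingPoint(1600, 350, 45.0, 3.6),
    OperatingPoint(1600, 533, 58.0, 4.9),
]

best = pick_operating_point(points, target_fps=40.0)
print(best)
```

With these made-up numbers the lowest frequency pair has the best raw efficiency but misses the 40 FPS target, so the selection falls to the most efficient pair among those that meet it.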

Performance Metrics

Papers: 315
Citations: 6,353

No. of papers in the topic in previous years
Year	Papers
2021	14
2020	15
2019	15
2018	36
2017	25
2016	31
