Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5,778 citations.


Papers
Journal ArticleDOI
TL;DR: The paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm.
Abstract: This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
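The ranking above rests on two derived metrics: speed-up per watt and speed-up per dollar, both measured against the general-purpose-processor baseline. A minimal sketch of how such a comparison can be tabulated is given below; the runtime, power, and cost figures are illustrative placeholders, not the paper's measurements.

```python
# Hypothetical figures for illustration only -- not the paper's measurements.
# Each platform gets (runtime, average power, purchase cost) for the same
# Smith-Waterman workload; lower runtime is better.
platforms = {
    "CPU (baseline)": {"runtime_s": 1000.0, "power_w": 90.0,  "cost_usd": 1500.0},
    "FPGA":           {"runtime_s": 10.0,   "power_w": 25.0,  "cost_usd": 8000.0},
    "GPU":            {"runtime_s": 40.0,   "power_w": 200.0, "cost_usd": 2500.0},
    "Cell BE":        {"runtime_s": 60.0,   "power_w": 90.0,  "cost_usd": 3000.0},
}

baseline = platforms["CPU (baseline)"]["runtime_s"]

for name, p in platforms.items():
    speedup = baseline / p["runtime_s"]        # speed-up over the CPU reference
    perf_per_watt = speedup / p["power_w"]     # speed-up delivered per watt drawn
    perf_per_dollar = speedup / p["cost_usd"]  # speed-up delivered per dollar spent
    print(f"{name:15s} speedup={speedup:7.1f}  "
          f"perf/W={perf_per_watt:.3f}  perf/$={perf_per_dollar:.5f}")
```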

58 citations

Book ChapterDOI
04 Dec 2006
TL;DR: This paper proposes a workload concentration strategy to reduce grid power consumption using the Xen virtual machine migration technology, and shows that this policy decreases the overall power consumption of the grid significantly.
Abstract: While chip vendors still stick to Moore's law and performance per dollar keeps going up, performance per watt has been stagnant for the last few years. Moreover, energy prices continue to rise worldwide. This poses a major challenge to organisations running grids, since such architectures require large cooling systems; indeed, the one-year cost of cooling and power consumption may exceed the grid's initial investment. We observe, however, that a grid does not constantly run at peak performance. In this paper, we propose a workload concentration strategy to reduce grid power consumption. Using the Xen virtual machine migration technology, our power management policy can transparently and dynamically dispatch any application of the grid. Our policy concentrates the workload in order to shut down unused nodes, with a negligible impact on performance. We show through evaluations that this policy decreases the overall power consumption of the grid significantly.
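The core idea is a consolidation policy: migrate virtual machines off lightly loaded nodes onto busier ones and power down the nodes that end up empty. The sketch below illustrates one greedy way to do this; the node capacities, load model, and migration bookkeeping are assumptions for illustration, not the authors' Xen-based implementation.

```python
# A minimal workload-concentration sketch: greedily pack VMs onto as few
# nodes as possible, then report the nodes that can be powered off.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    capacity: float                            # normalized CPU capacity
    vms: dict = field(default_factory=dict)    # vm name -> normalized load

    @property
    def load(self) -> float:
        return sum(self.vms.values())

def concentrate(nodes: list[Node]) -> list[str]:
    """Migrate VMs from lightly loaded nodes to busier ones; return nodes to shut down."""
    to_shutdown = []
    active = sorted(nodes, key=lambda n: n.load)   # try to empty the least-loaded first
    for i, donor in enumerate(active):
        targets = active[i + 1:]                   # only busier nodes receive VMs
        if not targets:
            break                                  # the last node stays up to host the load
        for vm, load in list(donor.vms.items()):
            for target in sorted(targets, key=lambda n: n.load, reverse=True):
                if target.load + load <= target.capacity:
                    target.vms[vm] = donor.vms.pop(vm)   # "migrate" the VM
                    break
        if not donor.vms:
            to_shutdown.append(donor.name)         # node is now idle: candidate for power-off
    return to_shutdown

nodes = [Node("n1", 1.0, {"a": 0.2}), Node("n2", 1.0, {"b": 0.5}), Node("n3", 1.0, {"c": 0.1})]
print(concentrate(nodes))   # ['n3', 'n1']: their VMs fit on n2, so both can be shut down
```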

57 citations

Proceedings ArticleDOI
05 Nov 2006
TL;DR: This paper presents two mechanisms to perform frequency scaling as part of dynamic frequency and voltage scaling (DVFS) to assist dynamic thermal management (DTM) and shows that their technique is extremely fast and is suited for real time thermal management.
Abstract: In order to maintain performance per watt in microprocessors, there is a shift towards the chip-level multiprocessing paradigm. Microprocessor manufacturers are experimenting with tens of cores, forecasting the arrival of hundreds of cores per single processor die in the near future. With such large-scale integration and increasing power densities, thermal management continues to be a significant design effort to maintain performance and reliability in modern process technologies. In this paper, we present two mechanisms to perform frequency scaling as part of Dynamic Frequency and Voltage Scaling (DVFS) to assist Dynamic Thermal Management (DTM). Our frequency selection algorithms incorporate the physical interaction of the cores on a large-scale system into the emergency intervention mechanisms for temperature reduction of the hotspot, while aiming to minimize the performance impact of frequency scaling on the core that is in thermal emergency. Our results show that our algorithm consistently succeeds in maximizing the operating frequency of the most critical core while successfully relieving the thermal emergency of the core. A comparison of our two alternative techniques reveals that our physically aware, criticality-based algorithm results in 117% faster clock frequencies compared to our aggressive scaling algorithm. We also show that our technique is extremely fast and is suited for real-time thermal management.
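To make the frequency-selection idea concrete, the sketch below picks a per-core frequency vector that keeps the hotspot core below a thermal threshold while keeping that critical core running as fast as possible, preferring to slow its physical neighbours instead. The linear thermal-coupling matrix, cubic power model, and frequency steps are illustrative assumptions, not the algorithms evaluated in the paper.

```python
# Illustrative DVFS selection for thermal management on a small multi-core.
import itertools

FREQ_STEPS = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]   # available core frequencies, GHz
T_AMBIENT = 45.0                               # deg C
T_LIMIT = 85.0                                 # thermal-emergency threshold, deg C

def core_power(freq_ghz: float) -> float:
    # Dynamic power roughly scales with f * V^2; with voltage tracking frequency
    # this behaves close to f**3 (normalized to ~20 W at 2 GHz).
    return 20.0 * (freq_ghz / 2.0) ** 3

def temperature(freqs, coupling, core):
    """Steady-state temperature of `core`: its own heat plus neighbours' contributions."""
    return T_AMBIENT + sum(coupling[core][j] * core_power(f) for j, f in enumerate(freqs))

def select_frequencies(coupling, hot_core):
    """Keep `hot_core` below T_LIMIT while running it as fast as possible."""
    best = None
    for freqs in itertools.product(FREQ_STEPS, repeat=len(coupling)):
        if temperature(freqs, coupling, hot_core) > T_LIMIT:
            continue
        # Rank feasible settings: the hot core's frequency first, then total throughput.
        key = (freqs[hot_core], sum(freqs))
        if best is None or key > best[0]:
            best = (key, freqs)
    return best[1] if best else tuple(min(FREQ_STEPS) for _ in coupling)

# Four cores in a row: each core heats itself strongly and its neighbours weakly.
coupling = [
    [1.5, 0.4, 0.1, 0.0],
    [0.4, 1.5, 0.4, 0.1],
    [0.1, 0.4, 1.5, 0.4],
    [0.0, 0.1, 0.4, 1.5],
]
# Expected to keep core 1 at full speed while slowing its neighbours.
print(select_frequencies(coupling, hot_core=1))
```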

57 citations

Journal ArticleDOI
TL;DR: The Sparc64 VIIIfx eight-core processor, developed for use in petascale computing systems, runs at speeds of up to 2 GHz and achieves a peak performance of 128 gigaflops while consuming as little as 58 watts of power.
Abstract: The Sparc64 VIIIfx eight-core processor, developed for use in petascale computing systems, runs at speeds of up to 2 GHz and achieves a peak performance of 128 gigaflops while consuming as little as 58 watts of power. Sparc64 VIIIfx realizes a six-fold improvement in performance per watt over previous generation Sparc64 processors.
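A quick back-of-the-envelope check of the quoted figures; the flops-per-cycle-per-core value is inferred from the stated peak, not taken from the article.

```python
# Peak throughput and performance per watt implied by the quoted numbers.
cores = 8
freq_hz = 2.0e9
flops_per_cycle_per_core = 8          # assumed SIMD/FMA width consistent with the stated peak
watts = 58.0

peak_flops = cores * freq_hz * flops_per_cycle_per_core
print(peak_flops / 1e9, "GFLOPS peak")        # 128.0
print(peak_flops / 1e9 / watts, "GFLOPS/W")   # ~2.2 GFLOPS per watt
```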

56 citations

Proceedings ArticleDOI
07 Feb 2015
TL;DR: This paper designs and experimentally validates power-performance models to carefully select the appropriate kernel combinations to be executed concurrently, the relative contributions of the kernels to the thread mix, and the frequency choices for the cores and the memory, in order to achieve a high performance-per-watt metric.
Abstract: Current generation GPUs can accelerate high-performance, compute-intensive applications by exploiting massive thread-level parallelism. The high performance, however, comes at the cost of increased power consumption. Recently, commercial GPGPU architectures have introduced support for concurrent kernel execution to better utilize the computational/memory resources and thereby improve overall throughput. In this paper, we argue for and experimentally validate the benefits of concurrent kernels towards energy-efficient execution. We design power-performance models to carefully select the appropriate kernel combinations to be executed concurrently, the relative contributions of the kernels to the thread mix, and the frequency choices for the cores and the memory, to achieve a high performance-per-watt metric. Our experimental evaluation shows that concurrent kernel execution in combination with DVFS can improve energy efficiency by up to 34.5% compared to the most energy-efficient sequential execution.
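The selection step can be pictured as a search over configurations scored by a power-performance model. The sketch below picks the (kernel pair, core frequency, memory frequency) combination with the best predicted performance per watt; the model and its constants are toy placeholders, not the models built in the paper.

```python
# Toy configuration search: maximize predicted performance per watt.
from itertools import product

CORE_FREQS_MHZ = [600, 750, 900]
MEM_FREQS_MHZ = [1600, 2000]
KERNEL_PAIRS = [("compute_bound", "memory_bound"), ("compute_bound", "compute_bound")]

def predict(pair, f_core, f_mem):
    """Toy power-performance model: throughput in work-items/s and power in watts."""
    compute = sum(k == "compute_bound" for k in pair)
    memory = len(pair) - compute
    throughput = compute * f_core * 1e3 + memory * f_mem * 0.6e3
    power = 40.0 + 0.05 * f_core * compute + 0.02 * f_mem * memory
    return throughput, power

def perf_per_watt(cfg):
    throughput, power = predict(*cfg)
    return throughput / power

best = max(product(KERNEL_PAIRS, CORE_FREQS_MHZ, MEM_FREQS_MHZ), key=perf_per_watt)
throughput, power = predict(*best)
print(best, f"-> {throughput / power:.1f} work-items per joule")
```

With these placeholder constants the search favours pairing a compute-bound kernel with a memory-bound one, which mirrors the intuition that mixed kernels use the core and memory domains more evenly and therefore waste less power.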

52 citations

Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (81% related)
Benchmark (computing): 19.6K papers, 419.1K citations (80% related)
Programming paradigm: 18.7K papers, 467.9K citations (77% related)
Compiler: 26.3K papers, 578.5K citations (77% related)
Scalability: 50.9K papers, 931.6K citations (76% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2021    14
2020    15
2019    15
2018    36
2017    25
2016    31