Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5,778 citations.


Papers
Journal ArticleDOI
TL;DR: The paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm.
Abstract: This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
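The ranking above rests on two derived metrics: speed-up per watt and speed-up per dollar, both measured against the general-purpose-processor baseline. A minimal sketch of how such a comparison can be tabulated is given below; the runtime, power, and cost figures are illustrative placeholders, not the paper's measurements.

```python
# Hypothetical figures for illustration only -- not the paper's measurements.
# Each platform gets (runtime, average power, purchase cost) for the same
# Smith-Waterman workload; lower runtime is better.
platforms = {
    "CPU (baseline)": {"runtime_s": 1000.0, "power_w": 90.0,  "cost_usd": 1500.0},
    "FPGA":           {"runtime_s": 10.0,   "power_w": 25.0,  "cost_usd": 8000.0},
    "GPU":            {"runtime_s": 40.0,   "power_w": 200.0, "cost_usd": 2500.0},
    "Cell BE":        {"runtime_s": 60.0,   "power_w": 90.0,  "cost_usd": 3000.0},
}

baseline = platforms["CPU (baseline)"]["runtime_s"]

for name, p in platforms.items():
    speedup = baseline / p["runtime_s"]        # speed-up over the CPU reference
    perf_per_watt = speedup / p["power_w"]     # speed-up delivered per watt drawn
    perf_per_dollar = speedup / p["cost_usd"]  # speed-up delivered per dollar spent
    print(f"{name:15s} speedup={speedup:7.1f}  "
          f"perf/W={perf_per_watt:.3f}  perf/$={perf_per_dollar:.5f}")
```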

58 citations

Book ChapterDOI
04 Dec 2006
TL;DR: This paper proposes a workload concentration strategy to reduce grid power consumption using the Xen virtual machine migration technology, and shows that this policy decreases the overall power consumption of the grid significantly.
Abstract: While chip vendors still stick to Moore's law and performance per dollar keeps going up, performance per watt has been stagnant for the last few years. Moreover, energy prices continue to rise worldwide. This poses a major challenge to organisations running grids, since such architectures require large cooling systems; indeed, the one-year cost of cooling and power consumption may exceed the grid's initial investment. We observe, however, that a grid does not constantly run at peak performance. In this paper, we propose a workload concentration strategy to reduce grid power consumption. Using the Xen virtual machine migration technology, our power management policy can transparently and dynamically dispatch any application of the grid. Our policy concentrates the workload in order to shut down unused nodes, with a negligible impact on performance. We show through evaluations that this policy decreases the overall power consumption of the grid significantly.
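The core idea is a consolidation policy: migrate virtual machines off lightly loaded nodes onto busier ones and power down the nodes that end up empty. The sketch below illustrates one greedy way to do this; the node capacities, load model, and migration bookkeeping are assumptions for illustration, not the authors' Xen-based implementation.

```python
# A minimal workload-concentration sketch: greedily pack VMs onto as few
# nodes as possible, then report the nodes that can be powered off.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    capacity: float                            # normalized CPU capacity
    vms: dict = field(default_factory=dict)    # vm name -> normalized load

    @property
    def load(self) -> float:
        return sum(self.vms.values())

def concentrate(nodes: list[Node]) -> list[str]:
    """Migrate VMs from lightly loaded nodes to busier ones; return nodes to shut down."""
    to_shutdown = []
    active = sorted(nodes, key=lambda n: n.load)   # try to empty the least-loaded first
    for i, donor in enumerate(active):
        targets = active[i + 1:]                   # only busier nodes receive VMs
        if not targets:
            break                                  # the last node stays up to host the load
        for vm, load in list(donor.vms.items()):
            for target in sorted(targets, key=lambda n: n.load, reverse=True):
                if target.load + load <= target.capacity:
                    target.vms[vm] = donor.vms.pop(vm)   # "migrate" the VM
                    break
        if not donor.vms:
            to_shutdown.append(donor.name)         # node is now idle: candidate for power-off
    return to_shutdown

nodes = [Node("n1", 1.0, {"a": 0.2}), Node("n2", 1.0, {"b": 0.5}), Node("n3", 1.0, {"c": 0.1})]
print(concentrate(nodes))   # ['n3', 'n1']: their VMs fit on n2, so both can be shut down
```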

57 citations

Proceedings ArticleDOI
05 Nov 2006
TL;DR: This paper presents two mechanisms to perform frequency scaling as part of dynamic frequency and voltage scaling (DVFS) to assist dynamic thermal management (DTM) and shows that their technique is extremely fast and is suited for real time thermal management.
Abstract: In order to maintain performance per watt in microprocessors, there is a shift towards the chip-level multiprocessing paradigm. Microprocessor manufacturers are experimenting with tens of cores, forecasting the arrival of hundreds of cores per single processor die in the near future. With such large-scale integration and increasing power densities, thermal management continues to be a significant design effort to maintain performance and reliability in modern process technologies. In this paper, we present two mechanisms to perform frequency scaling as part of Dynamic Frequency and Voltage Scaling (DVFS) to assist Dynamic Thermal Management (DTM). Our frequency selection algorithms incorporate the physical interaction of the cores on a large-scale system into the emergency intervention mechanisms for temperature reduction of the hotspot, while aiming to minimize the performance impact of frequency scaling on the core that is in thermal emergency. Our results show that our algorithm consistently succeeds in maximizing the operating frequency of the most critical core while successfully relieving the thermal emergency of the core. A comparison of our two alternative techniques reveals that our physically aware, criticality-based algorithm results in 117% faster clock frequencies compared to our aggressive scaling algorithm. We also show that our technique is extremely fast and is suited for real-time thermal management.
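To make the frequency-selection idea concrete, the sketch below picks a per-core frequency vector that keeps the hotspot core below a thermal threshold while keeping that critical core running as fast as possible, preferring to slow its physical neighbours instead. The linear thermal-coupling matrix, cubic power model, and frequency steps are illustrative assumptions, not the algorithms evaluated in the paper.

```python
# Illustrative DVFS selection for thermal management on a small multi-core.
import itertools

FREQ_STEPS = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]   # available core frequencies, GHz
T_AMBIENT = 45.0                               # deg C
T_LIMIT = 85.0                                 # thermal-emergency threshold, deg C

def core_power(freq_ghz: float) -> float:
    # Dynamic power roughly scales with f * V^2; with voltage tracking frequency
    # this behaves close to f**3 (normalized to ~20 W at 2 GHz).
    return 20.0 * (freq_ghz / 2.0) ** 3

def temperature(freqs, coupling, core):
    """Steady-state temperature of `core`: its own heat plus neighbours' contributions."""
    return T_AMBIENT + sum(coupling[core][j] * core_power(f) for j, f in enumerate(freqs))

def select_frequencies(coupling, hot_core):
    """Keep `hot_core` below T_LIMIT while running it as fast as possible."""
    best = None
    for freqs in itertools.product(FREQ_STEPS, repeat=len(coupling)):
        if temperature(freqs, coupling, hot_core) > T_LIMIT:
            continue
        # Rank feasible settings: the hot core's frequency first, then total throughput.
        key = (freqs[hot_core], sum(freqs))
        if best is None or key > best[0]:
            best = (key, freqs)
    return best[1] if best else tuple(min(FREQ_STEPS) for _ in coupling)

# Four cores in a row: each core heats itself strongly and its neighbours weakly.
coupling = [
    [1.5, 0.4, 0.1, 0.0],
    [0.4, 1.5, 0.4, 0.1],
    [0.1, 0.4, 1.5, 0.4],
    [0.0, 0.1, 0.4, 1.5],
]
# Expected to keep core 1 at full speed while slowing its neighbours.
print(select_frequencies(coupling, hot_core=1))
```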

57 citations

Journal ArticleDOI
TL;DR: The Sparc64 VIIIfx eight-core processor, developed for use in petascale computing systems, runs at speeds of up to 2 GHz and achieves a peak performance of 128 gigaflops while consuming as little as 58 watts of power.
Abstract: The Sparc64 VIIIfx eight-core processor, developed for use in petascale computing systems, runs at speeds of up to 2 GHz and achieves a peak performance of 128 gigaflops while consuming as little as 58 watts of power. Sparc64 VIIIfx realizes a six-fold improvement in performance per watt over previous generation Sparc64 processors.
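A quick back-of-the-envelope check of the quoted figures; the flops-per-cycle-per-core value is inferred from the stated peak, not taken from the article.

```python
# Peak throughput and performance per watt implied by the quoted numbers.
cores = 8
freq_hz = 2.0e9
flops_per_cycle_per_core = 8          # assumed SIMD/FMA width consistent with the stated peak
watts = 58.0

peak_flops = cores * freq_hz * flops_per_cycle_per_core
print(peak_flops / 1e9, "GFLOPS peak")        # 128.0
print(peak_flops / 1e9 / watts, "GFLOPS/W")   # ~2.2 GFLOPS per watt
```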

56 citations

Proceedings ArticleDOI
07 Feb 2015
TL;DR: This paper designs and experimentally validates power-performance models to carefully select the appropriate kernel combinations to be executed concurrently, the relative contributions of the kernels to the thread mix, and the frequency choices for the cores and the memory, in order to achieve a high performance-per-watt metric.
Abstract: Current generation GPUs can accelerate high-performance, compute-intensive applications by exploiting massive thread-level parallelism. The high performance, however, comes at the cost of increased power consumption. Recently, commercial GPGPU architectures have introduced support for concurrent kernel execution to better utilize the computational/memory resources and thereby improve overall throughput. In this paper, we argue for and experimentally validate the benefits of concurrent kernels towards energy-efficient execution. We design power-performance models to carefully select the appropriate kernel combinations to be executed concurrently, the relative contributions of the kernels to the thread mix, and the frequency choices for the cores and the memory, to achieve a high performance-per-watt metric. Our experimental evaluation shows that concurrent kernel execution in combination with DVFS can improve energy efficiency by up to 34.5% compared to the most energy-efficient sequential execution.
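The selection step can be pictured as a search over configurations scored by a power-performance model. The sketch below picks the (kernel pair, core frequency, memory frequency) combination with the best predicted performance per watt; the model and its constants are toy placeholders, not the models built in the paper.

```python
# Toy configuration search: maximize predicted performance per watt.
from itertools import product

CORE_FREQS_MHZ = [600, 750, 900]
MEM_FREQS_MHZ = [1600, 2000]
KERNEL_PAIRS = [("compute_bound", "memory_bound"), ("compute_bound", "compute_bound")]

def predict(pair, f_core, f_mem):
    """Toy power-performance model: throughput in work-items/s and power in watts."""
    compute = sum(k == "compute_bound" for k in pair)
    memory = len(pair) - compute
    throughput = compute * f_core * 1e3 + memory * f_mem * 0.6e3
    power = 40.0 + 0.05 * f_core * compute + 0.02 * f_mem * memory
    return throughput, power

def perf_per_watt(cfg):
    throughput, power = predict(*cfg)
    return throughput / power

best = max(product(KERNEL_PAIRS, CORE_FREQS_MHZ, MEM_FREQS_MHZ), key=perf_per_watt)
throughput, power = predict(*best)
print(best, f"-> {throughput / power:.1f} work-items per joule")
```

With these placeholder constants the search favours pairing a compute-bound kernel with a memory-bound one, which mirrors the intuition that mixed kernels use the core and memory domains more evenly and therefore waste less power.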

52 citations

Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (81% related)
Benchmark (computing): 19.6K papers, 419.1K citations (80% related)
Programming paradigm: 18.7K papers, 467.9K citations (77% related)
Compiler: 26.3K papers, 578.5K citations (77% related)
Scalability: 50.9K papers, 931.6K citations (76% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2021    14
2020    15
2019    15
2018    36
2017    25
2016    31