Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have appeared within this topic, receiving 5778 citations.

Papers published on a yearly basis

Papers

Proceedings Article

APOGEE: adaptive prefetching on GPUs for energy efficiency

Ankit Sethia¹, Ganesh Dasika, Mehrzad Samadi¹, Scott Mahlke¹

University of Michigan¹

07 Oct 2013

TL;DR: APOGEE uses adjacent threads to identify address patterns more efficiently and dynamically adapts the timeliness of prefetching, so that fewer thread contexts are needed to hide memory latency and sustain performance.

Abstract: Modern graphics processing units (GPUs) combine large amounts of parallel hardware with fast context switching among thousands of active threads to achieve high performance. However, such designs do not translate well to mobile environments where power constraints often limit the amount of hardware. In this work, we investigate the use of prefetching as a means to increase the energy efficiency of GPUs. Classically, CPU prefetching results in higher performance but worse energy efficiency due to unnecessary data being brought on chip. Our approach, called APOGEE, uses an adaptive mechanism to dynamically detect and adapt to the memory access patterns found in both graphics and scientific applications that are run on modern GPUs to achieve prefetching efficiencies of over 90%. Rather than examining threads in isolation, APOGEE uses adjacent threads to more efficiently identify address patterns and dynamically adapt the timeliness of prefetching. The net effect of APOGEE is that fewer thread contexts are necessary to hide memory latency and thus sustain performance. This reduction in thread contexts and related hardware translates to simplification of hardware and leads to a reduction in power. For Graphics and GPGPU applications, APOGEE enables an 8X reduction in multi-threading hardware, while providing a performance benefit of 19%. This translates to a 52% increase in performance per watt over systems with high multi-threading and 33% over existing GPU prefetching techniques.

43 citations
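
The two headline figures in the APOGEE abstract above (a 19% performance benefit and a 52% increase in performance per watt) can be combined to estimate the power change they imply. This is a minimal sketch, assuming both percentages are measured against the same high-multithreading baseline and that performance per watt is simply performance divided by average power. The function name is illustrative and does not come from the paper.

```python
def implied_power_ratio(perf_gain: float, perf_per_watt_gain: float) -> float:
    """Return new_power / old_power implied by two relative gains.

    Since perf_per_watt = perf / power, the power ratio equals the
    performance ratio divided by the performance-per-watt ratio.
    Gains are fractions, so 0.19 means +19%.
    """
    if perf_per_watt_gain <= -1.0:
        raise ValueError("performance-per-watt gain must be greater than -100%")
    return (1.0 + perf_gain) / (1.0 + perf_per_watt_gain)


# Figures quoted in the abstract; the shared baseline is an assumption.
ratio = implied_power_ratio(0.19, 0.52)
print(f"implied power ratio: {ratio:.3f}")  # roughly 0.78
```

Under those assumptions the quoted numbers correspond to a power draw roughly 22% lower than the baseline.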

Journal Article

Cloud-Sea Computing Systems: Towards Thousand-Fold Improvement in Performance per Watt for the Coming Zettabyte Era

Zhiwei Xu¹

Chinese Academy of Sciences¹

23 Mar 2014-Journal of Computer Science and Technology

TL;DR: This special section contains 12 papers produced by the Cloud-Sea Computing Systems project team, presenting research results relating to sensing and REST 2.0, the elastic processor, the hyperparallel server, and the cloud-sea storage.

Abstract: We are entering a new era of computing, characterized by the need to handle over one zettabyte (10²¹ bytes, or ZB) of data. The world’s capacities to sense, transmit, store, and process information need to grow by three orders of magnitude while maintaining an energy consumption level similar to that of the year 2010. In other words, we need to produce a thousand-fold improvement in performance per watt. To face this challenge, in 2012 the Chinese Academy of Sciences launched a 10-year strategic priority research initiative called the Next Generation Information and Communication Technology initiative (the NICT initiative). A research thrust of the NICT program is the Cloud-Sea Computing Systems project. The main idea is to augment conventional cloud computing by cooperation and integration of the cloud-side systems and the sea-side systems, where the "sea-side" refers to an augmented client side consisting of human-facing and physical-world-facing devices and subsystems. The Cloud-Sea Computing Systems project consists of four research tasks: a new computing model called REST 2.0, which extends the REST (representational state transfer) architectural style of Web computing to cloud-sea computing; a three-tier storage system architecture capable of managing ZB of data; a billion-thread datacenter server with high energy efficiency; and an elastic processor aiming at an energy efficiency of one trillion operations per second per watt. This special section contains 12 papers produced by the Cloud-Sea Computing Systems project team, presenting research results relating to sensing and REST 2.0, the elastic processor, the hyperparallel server, and the cloud-sea storage.

42 citations
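
The elastic-processor target in the abstract above, one trillion operations per second per watt, can be restated as an energy budget per operation, because operations per second per watt is the same quantity as operations per joule. The following is a minimal sketch of that unit conversion; the function name is illustrative and does not come from the project.

```python
def joules_per_op(ops_per_second_per_watt: float) -> float:
    """Convert an efficiency in ops/s/W (equivalently ops/J) to J per op."""
    if ops_per_second_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return 1.0 / ops_per_second_per_watt


# Target quoted in the abstract: one trillion operations per second per watt.
target_efficiency = 1e12
picojoules = joules_per_op(target_efficiency) * 1e12  # 1 J = 1e12 pJ
print(f"{picojoules:.1f} pJ per operation")
```

Read this way, the stated goal corresponds to about one picojoule per operation.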

Proceedings Article

Performance Per Watt Benefits of Dynamic Core Morphing in Asymmetric Multicores

Rance Rodrigues¹, Arunachalam Annamalai¹, Israel Koren¹, Sandip Kundu¹, Omer Khan¹

University of Massachusetts Amherst¹

10 Oct 2011

TL;DR: This work proposes a heterogeneous multicore architecture with a Dynamic Core Morphing (DCM) capability, and shows that dynamic morphing of cores can provide average performance/watt gains of 43% and 16% compared to the homogeneous and baseline heterogeneous configurations, respectively.

Abstract: The trend toward multicore processors is moving the emphasis in computation from sequential to parallel processing. However, not all applications can be parallelized and benefit from multiple cores. Such applications lead to under-utilization of parallel resources, and hence to sub-optimal performance/watt. They may, however, benefit from powerful uniprocessors. On the other hand, not all applications can take advantage of more powerful uniprocessors. To address the competing requirements of diverse applications, we propose a heterogeneous multicore architecture with a Dynamic Core Morphing (DCM) capability. Depending on the computational demands of the currently executing applications, the resources of a few tightly coupled cores are morphed at runtime. We present a simple hardware-based algorithm that monitors the time-varying computational needs of the application and, when deemed beneficial, triggers reconfiguration of the cores at fine-grain time scales to maximize the performance/watt of the application. The proposed dynamic scheme is then compared against a baseline static heterogeneous multicore configuration and an equivalent homogeneous configuration. Our results show that dynamic morphing of cores can provide average performance/watt gains of 43% and 16% compared to the homogeneous and baseline heterogeneous configurations, respectively.

42 citations

Proceedings Article

A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation

Tsuyoshi Hamada¹, Khaled Benkrid², Keigo Nitadori, Makoto Taiji

Nagasaki University¹, University of Edinburgh²

29 Jul 2009

TL;DR: Results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar, whereas FPGAs are competitive in terms of performance per watt.

Abstract: In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. We describe high-performance implementations on general purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed, power efficiency and cost of development. The results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar, whereas FPGAs are competitive in terms of performance per watt.

39 citations

Proceedings Article

Comparing the power and performance of Intel's SCC to state-of-the-art CPUs and GPUs

Ehsan Totoni¹, Babak Behzad¹, Swapnil Ghike¹, Josep Torrellas¹

University of Illinois at Urbana–Champaign¹

01 Apr 2012

TL;DR: The results show that the GPGPU has outstanding results in performance, power consumption and energy efficiency for many applications, but it requires significant programming effort and is not general enough to show the same level of efficiency for all the applications.

Abstract: Power dissipation and energy consumption are becoming increasingly important architectural design constraints in different types of computers, from embedded systems to large-scale supercomputers. To continue the scaling of performance, it is essential that we build parallel processor chips that make the best use of exponentially increasing numbers of transistors within the power and energy budgets. Intel SCC is an appealing option for future many-core architectures. In this paper, we use various scalable applications to quantitatively compare and analyze the performance, power consumption and energy efficiency of different cutting-edge platforms that differ in architectural build. These platforms include the Intel Single-Chip Cloud Computer (SCC) many-core, the Intel Core i7 general-purpose multi-core, the Intel Atom low-power processor, and the Nvidia ION2 GPGPU. Our results show that the GPGPU has outstanding results in performance, power consumption and energy efficiency for many applications, but it requires significant programming effort and is not general enough to show the same level of efficiency for all the applications. The “light-weight” many-core presents an opportunity for better performance per watt over the “heavy-weight” multi-core, although the multi-core is still very effective for some sophisticated applications. In addition, the low-power processor is not necessarily energy-efficient, since the runtime delay effect can be greater than the power savings.

39 citations
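
The closing point of the abstract above, that a low-power processor is not necessarily energy-efficient, follows from energy being average power multiplied by runtime. The sketch below illustrates this with purely hypothetical numbers; they are not measurements from the paper, and the function name is illustrative.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy to solution: average power multiplied by runtime."""
    return avg_power_watts * runtime_seconds


# Hypothetical figures chosen only to illustrate the effect.
fast_chip = energy_joules(avg_power_watts=60.0, runtime_seconds=10.0)
low_power_chip = energy_joules(avg_power_watts=10.0, runtime_seconds=90.0)

print(f"fast chip:      {fast_chip:.0f} J")
print(f"low-power chip: {low_power_chip:.0f} J")
print("low-power chip uses more energy:", low_power_chip > fast_chip)
```

In this made-up case the low-power chip draws one sixth of the power but runs nine times longer, so it consumes more energy overall.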

Metrics

Papers: 315

Citations: 6,353

No. of papers in the topic in previous years
Year	Papers
2021	14
2020	15
2019	15
2018	36
2017	25
2016	31
