scispace - formally typeset
Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published on this topic, receiving 5,778 citations.


Papers
Proceedings Article · DOI
07 Oct 2013
TL;DR: APOGEE uses adjacent threads to identify address patterns more efficiently and to adapt the timeliness of prefetching dynamically, so fewer thread contexts are needed to hide memory latency and sustain performance.
Abstract: Modern graphics processing units (GPUs) combine large amounts of parallel hardware with fast context switching among thousands of active threads to achieve high performance. However, such designs do not translate well to mobile environments where power constraints often limit the amount of hardware. In this work, we investigate the use of prefetching as a means to increase the energy efficiency of GPUs. Classically, CPU prefetching results in higher performance but worse energy efficiency due to unnecessary data being brought on chip. Our approach, called APOGEE, uses an adaptive mechanism to dynamically detect and adapt to the memory access patterns found in both graphics and scientific applications that are run on modern GPUs to achieve prefetching efficiencies of over 90%. Rather than examining threads in isolation, APOGEE uses adjacent threads to more efficiently identify address patterns and dynamically adapt the timeliness of prefetching. The net effect of APOGEE is that fewer thread contexts are necessary to hide memory latency and thus sustain performance. This reduction in thread contexts and related hardware translates to simplification of hardware and leads to a reduction in power. For Graphics and GPGPU applications, APOGEE enables an 8X reduction in multi-threading hardware, while providing a performance benefit of 19%. This translates to a 52% increase in performance per watt over systems with high multi-threading and 33% over existing GPU prefetching techniques.

43 citations
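The inter-thread idea in the APOGEE abstract can be illustrated with a toy stride detector. This is a minimal sketch, not APOGEE's actual hardware mechanism: it assumes a hypothetical input in which `addresses[i]` is the address requested by thread i for a single load instruction, and it ignores APOGEE's adaptive timeliness control entirely. If adjacent threads are separated by one constant stride, the next group of threads is predicted to continue that stride.

```python
from typing import List, Optional


def detect_thread_stride(addresses: List[int]) -> Optional[int]:
    """Return the constant address stride between adjacent threads, or None.

    addresses[i] is the address thread i requests for one load instruction
    (a hypothetical input format chosen for this illustration).
    """
    if len(addresses) < 2:
        return None  # a single thread gives no inter-thread pattern
    stride = addresses[1] - addresses[0]
    for prev, cur in zip(addresses, addresses[1:]):
        if cur - prev != stride:
            return None  # irregular pattern: issue no prefetches
    return stride


def prefetch_candidates(addresses: List[int], group_size: int) -> List[int]:
    """Predict the addresses the next `group_size` threads will request.

    Assumes the next group of threads simply continues the stride observed
    across the current group.
    """
    stride = detect_thread_stride(addresses)
    if stride is None:
        return []
    next_base = addresses[-1] + stride
    return [next_base + i * stride for i in range(group_size)]


# Four adjacent threads reading consecutive 4-byte elements.
print([hex(a) for a in prefetch_candidates([0x1000, 0x1004, 0x1008, 0x100C], 4)])
```

The appeal of learning the stride across threads that run side by side, rather than from one thread's history over time, is that a prediction is available after a single round of requests. A real design would still have to decide how far ahead to prefetch, which is the timeliness problem the abstract mentions.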

Journal Article · DOI
TL;DR: This special section contains 12 papers produced by the Cloud-Sea Computing Systems project team, presenting research results relating to sensing and REST 2.0, the elastic processor, the hyperparallel server, and the cloud-sea storage.
Abstract: We are entering a new era of computing, characterized by the need to handle over one zettabyte (10²¹ bytes, or ZB) of data. The world’s capacities to sense, transmit, store, and process information need to grow by three orders of magnitude while maintaining an energy consumption level similar to that of 2010. In other words, we need a thousand-fold improvement in performance per watt. To meet this challenge, in 2012 the Chinese Academy of Sciences launched a 10-year strategic priority research initiative called the Next Generation Information and Communication Technology initiative (the NICT initiative). A research thrust of the NICT program is the Cloud-Sea Computing Systems project. The main idea is to augment conventional cloud computing through cooperation and integration of cloud-side and sea-side systems, where the "sea side" refers to an augmented client side consisting of human-facing and physical-world-facing devices and subsystems. The Cloud-Sea Computing Systems project consists of four research tasks: a new computing model called REST 2.0, which extends the REST (representational state transfer) architectural style of Web computing to cloud-sea computing; a three-tier storage system architecture capable of managing ZB of data; a billion-thread datacenter server with high energy efficiency; and an elastic processor aiming at an energy efficiency of one trillion operations per second per watt. This special section contains 12 papers produced by the Cloud-Sea Computing Systems project team, presenting research results relating to sensing and REST 2.0, the elastic processor, the hyperparallel server, and the cloud-sea storage.

42 citations
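The thousand-fold target follows from simple arithmetic. Treating "capacity" as performance P and writing W for power (both symbols introduced here for illustration), a 10³ growth in P at the 2010 power level means performance per watt must grow by the same factor. The elastic processor's target can also be restated as energy per operation, since 1 W = 1 J/s:

```latex
\frac{P_{\text{new}}/W_{\text{new}}}{P_{2010}/W_{2010}}
  = \frac{10^{3}\,P_{2010}/W_{2010}}{P_{2010}/W_{2010}} = 10^{3},
\qquad
10^{12}\ \frac{\text{ops/s}}{\text{W}} = 10^{12}\ \frac{\text{ops}}{\text{J}}
  \;\Longrightarrow\; 1\ \text{pJ per operation}
```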

Proceedings Article · DOI
10 Oct 2011
TL;DR: This work proposes a heterogeneous multicore architecture with a Dynamic Core Morphing (DCM) capability and shows that dynamic morphing of cores can provide average performance/watt gains of 43% and 16% compared to the homogeneous and baseline heterogeneous configurations, respectively.
Abstract: The trend toward multicore processors is moving the emphasis in computation from sequential to parallel processing. However, not all applications can be parallelized and benefit from multiple cores. Such applications lead to under-utilization of parallel resources and hence sub-optimal performance/watt. They may, however, benefit from powerful uniprocessors. On the other hand, not all applications can take advantage of more powerful uniprocessors. To address the competing requirements of diverse applications, we propose a heterogeneous multicore architecture with a Dynamic Core Morphing (DCM) capability. Depending on the computational demands of the currently executing applications, the resources of a few tightly coupled cores are morphed at runtime. We present a simple hardware-based algorithm to monitor the time-varying computational needs of the application and, when deemed beneficial, trigger reconfiguration of the cores at fine-grain time scales to maximize the performance/watt of the application. The proposed dynamic scheme is then compared against a baseline static heterogeneous multicore configuration and an equivalent homogeneous configuration. Our results show that dynamic morphing of cores can provide average performance/watt gains of 43% and 16% when compared to the homogeneous and baseline heterogeneous configurations, respectively.

42 citations
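The DCM abstract does not spell out its hardware algorithm, so the following is only a generic illustration of the kind of decision such a monitor makes, not the authors' method. It assumes performance is measured in retired instructions, so that performance per watt over an interval reduces to instructions per joule, and it assumes the controller is given an estimated efficiency for each candidate configuration. The configuration names and the 5% hysteresis threshold are hypothetical.

```python
from typing import Dict


def perf_per_watt(instructions: float, energy_j: float) -> float:
    """Performance per watt over one monitoring interval.

    (instructions / s) / (J / s) simplifies to instructions per joule,
    so the interval length cancels out.
    """
    if energy_j <= 0:
        raise ValueError("energy must be positive")
    return instructions / energy_j


def choose_config(current: str, estimates: Dict[str, float], threshold: float = 0.05) -> str:
    """Return the configuration to run next.

    `estimates` maps a (hypothetical) configuration name to its estimated
    performance per watt. Switching only when the best candidate beats the
    current configuration by more than `threshold` avoids thrashing between
    configurations of nearly equal efficiency, since each reconfiguration
    has its own time and energy cost.
    """
    best = max(estimates, key=lambda name: estimates[name])
    if estimates[best] > estimates[current] * (1.0 + threshold):
        return best
    return current


estimates = {
    "two_small_cores": perf_per_watt(instructions=2.0e9, energy_j=2.0),
    "one_morphed_core": perf_per_watt(instructions=3.0e9, energy_j=2.0),
}
print(choose_config("two_small_cores", estimates))
```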

Proceedings Article · DOI
29 Jul 2009
TL;DR: Results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar, whereas FPGAs are competitive in terms of performance per watt.
Abstract: In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. It describes high-performance implementations on general-purpose processors, GPUs, and FPGAs, and compares them using a number of criteria, including speed, power efficiency, and cost of development. These results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar, whereas FPGAs are competitive in terms of performance per watt.

39 citations
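In its simplest form, the kernel being accelerated is the direct O(N²) summation of pairwise gravitational accelerations, a_i = G Σ_{j≠i} m_j (r_j − r_i) / (|r_j − r_i|² + ε²)^(3/2). The sketch below is a plain-Python reference version, not the authors' CPU, GPU, or FPGA implementation; the softening length `eps` and the default G = 1 (simulation units) are assumptions commonly made in N-body codes. Each body's inner loop is independent of every other body's, which is the kind of regular, data-parallel structure that GPUs and FPGA pipelines exploit.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def gravitational_accelerations(
    pos: List[Vec3], mass: List[float], G: float = 1.0, eps: float = 1e-3
) -> List[Vec3]:
    """Direct-summation accelerations for N bodies, O(N^2) work.

    G defaults to 1 (simulation units) and eps is a softening length that
    keeps close encounters finite; both are illustrative assumptions.
    """
    n = len(pos)
    acc: List[Vec3] = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            if j == i:
                continue  # a body exerts no force on itself
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))  # 1 / (r^2 + eps^2)^(3/2)
            s = G * mass[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc


# Two unit masses one unit apart attract each other along the x axis.
print(gravitational_accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], eps=0.0))
```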

Proceedings Article · DOI
01 Apr 2012
TL;DR: The results show that the GPGPU has outstanding results in performance, power consumption and energy efficiency for many applications, but it requires significant programming effort and is not general enough to show the same level of efficiency for all the applications.
Abstract: Power dissipation and energy consumption are becoming increasingly important architectural design constraints in different types of computers, from embedded systems to large-scale supercomputers. To continue the scaling of performance, it is essential that we build parallel processor chips that make the best use of exponentially increasing numbers of transistors within the power and energy budgets. Intel SCC is an appealing option for future many-core architectures. In this paper, we use various scalable applications to quantitatively compare and analyze the performance, power consumption and energy efficiency of different cutting-edge platforms that differ in architectural build. These platforms include the Intel Single-Chip Cloud Computer (SCC) many-core, the Intel Core i7 general-purpose multi-core, the Intel Atom low-power processor, and the Nvidia ION2 GPGPU. Our results show that the GPGPU has outstanding results in performance, power consumption and energy efficiency for many applications, but it requires significant programming effort and is not general enough to show the same level of efficiency for all the applications. The “light-weight” many-core presents an opportunity for better performance per watt over the “heavy-weight” multi-core, although the multi-core is still very effective for some sophisticated applications. In addition, the low-power processor is not necessarily energy-efficient, since the runtime delay effect can be greater than the power savings.

39 citations
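The closing observation, that a low-power processor is not necessarily energy-efficient, follows from energy = average power × runtime. For a fixed workload, performance per watt equals work done per joule, so a chip that draws less power but runs proportionally longer can still lose. The numbers below are hypothetical and are not measurements from the paper.

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy consumed by a run: average power (W = J/s) times runtime (s)."""
    return avg_power_w * runtime_s


def work_per_joule(work_units: float, avg_power_w: float, runtime_s: float) -> float:
    """Performance per watt for a fixed workload.

    (work / s) / W == work / J, i.e. the inverse of energy per unit of work.
    """
    return work_units / energy_joules(avg_power_w, runtime_s)


# Hypothetical: the same job on a low-power chip and on a faster, hungrier chip.
low_power = energy_joules(avg_power_w=5.0, runtime_s=40.0)
high_power = energy_joules(avg_power_w=20.0, runtime_s=8.0)
print(f"low-power chip: {low_power:.0f} J, high-power chip: {high_power:.0f} J")
```

Here the low-power chip draws a quarter of the power but runs five times longer, so it consumes 25% more energy (200 J versus 160 J) for the same job.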

Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (81% related)
Benchmark (computing): 19.6K papers, 419.1K citations (80% related)
Programming paradigm: 18.7K papers, 467.9K citations (77% related)
Compiler: 26.3K papers, 578.5K citations (77% related)
Scalability: 50.9K papers, 931.6K citations (76% related)
Performance Metrics
Number of papers in the topic in previous years

Year  Papers
2021  14
2020  15
2019  15
2018  36
2017  25
2016  31