Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5,778 citations.
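As a quick illustration of the metric itself, the sketch below computes performance per watt as throughput divided by average power draw. The throughput and power figures are placeholders for illustration, not data from any of the papers listed here.

```python
# Minimal sketch: performance per watt as useful work per unit of power.
# The numbers below are illustrative placeholders, not benchmark data.

def performance_per_watt(ops_per_second: float, avg_power_watts: float) -> float:
    """Return the efficiency metric: operations per second per watt."""
    return ops_per_second / avg_power_watts

# Example: a device sustaining 2.5 TFLOP/s while drawing 180 W on average.
tflops = 2.5e12   # floating-point operations per second (assumed)
power = 180.0     # average power draw in watts (assumed)
print(f"{performance_per_watt(tflops, power) / 1e9:.1f} GFLOPS/W")  # -> 13.9 GFLOPS/W
```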


Papers
Journal Article
TL;DR: Performance and power results indicate that multi-core embedded system architectures leveraging shared last-level caches (LLCs) provide the best LLC performance per watt but may introduce main memory response-time and throughput bottlenecks at high cache miss rates, whereas architectures leveraging a hybrid of private and shared LLCs alleviate main memory bottlenecks at the expense of reduced performance per watt.

5 citations

Patent
Thomas J. Heller Jr.
16 Nov 2009
TL;DR: This patent describes a stack of microprocessor chips designed to work together in a multiprocessor system, with the hypervisor or operating system controlling the utilization of the individual chips in the stack.
Abstract: A computing system has a stack of microprocessor chips that are designed to work together in a multiprocessor system. The chips are interconnected with 3D through vias, or alternatively by compatible package carriers providing the interconnections, while logically the chips in the stack are interconnected via specialized cache-coherent interconnections. All of the chips in the stack use the same logical chip design, even though they can be easily personalized by setting specialized latches on the chips. One or more of the individual microprocessor chips in the stack are implemented in a silicon process optimized for high performance, while others are implemented in a silicon process optimized for power consumption, i.e., for the best performance per watt of electrical power consumed. The hypervisor or operating system controls the utilization of the individual chips of a stack.
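As a hedged illustration of the scheduling idea in this abstract, the sketch below shows a hypervisor-like policy choosing between a performance-optimized die and a power-optimized die in the stack. The chip names, power figures, and the 50% utilization threshold are all hypothetical, not taken from the patent.

```python
# Hypothetical sketch of the patent's scheduling idea: steer work to the
# power-optimized chip when demand is low and to the performance-optimized
# chip when demand is high. All names, numbers, and the policy threshold
# are assumptions for illustration only.

from dataclasses import dataclass

@dataclass
class Chip:
    name: str
    relative_speed: float   # throughput relative to the fastest die
    power_watts: float      # typical power draw under load

    @property
    def perf_per_watt(self) -> float:
        return self.relative_speed / self.power_watts

def pick_chip(chips: list[Chip], utilization: float) -> Chip:
    """Under light load, maximize performance per watt; under heavy load,
    maximize raw performance (an assumed policy, for illustration)."""
    if utilization < 0.5:
        return max(chips, key=lambda c: c.perf_per_watt)
    return max(chips, key=lambda c: c.relative_speed)

stack = [
    Chip("high-performance die", relative_speed=1.0, power_watts=90.0),
    Chip("low-power die", relative_speed=0.6, power_watts=30.0),
]
print(pick_chip(stack, utilization=0.2).name)  # -> low-power die
print(pick_chip(stack, utilization=0.9).name)  # -> high-performance die
```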

5 citations

Journal Article
TL;DR: This work proposes a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an “unlimited” number of dynamically created lightweight threads with very low thread management and synchronization overhead.
Abstract: Irregular and dynamic applications, such as graph problems and agent-based simulations, often require fine-grained parallelism to achieve good performance. However, current multicore processors only provide architectural support for coarse-grained parallelism, making it necessary to use software-based multithreading environments to effectively implement fine-grained parallelism. Although these software-based environments have demonstrated superior performance over heavyweight, OS-level threads, they are still limited by the significant overhead involved in thread management and synchronization. To address this, we propose a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an “unlimited” number of dynamically created lightweight threads with very low thread management and synchronization overhead. The LCMT architecture can be implemented atop a mainstream architecture with minimal extra hardware to leverage existing legacy software environments. We compare the LCMT architecture with a Niagara-like baseline architecture. Our results show up to 1.8X better scalability, 1.91X better performance, and, more importantly, 1.74X better performance per watt with the LCMT architecture on irregular and dynamic benchmarks, compared to the baseline architecture. The LCMT architecture delivers similar performance to the baseline architecture on regular benchmarks.
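The LCMT proposal is architectural, but the overhead it targets has a familiar software analogue. The sketch below contrasts spawning one OS thread per fine-grained task with amortizing thread management over a reused pool; nothing here reproduces the paper's hardware, and the workload and task count are assumptions for illustration only.

```python
# Software analogue of why fine-grained tasks need lightweight threads:
# per-thread creation/teardown cost dominates when each task is tiny.
# Absolute timings vary by machine; only the contrast matters here.

import threading
import time
from concurrent.futures import ThreadPoolExecutor

def tiny_task(x: int) -> int:
    return x * x  # a deliberately fine-grained unit of work

N = 2000  # assumed task count

# One OS thread per task: pays management cost N times.
start = time.perf_counter()
threads = [threading.Thread(target=tiny_task, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
per_task_threads = time.perf_counter() - start

# A reused pool: thread management cost is amortized across all tasks.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(tiny_task, range(N)))
pooled = time.perf_counter() - start

print(f"one thread per task: {per_task_threads:.3f}s, pooled: {pooled:.3f}s")
```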

5 citations

Proceedings Article
14 Jul 2014
TL;DR: The presentation gives the audience a high-level understanding of the goals of HSA, the properties of the HSA system architecture, and its use models by system software, tools, and applications.
Abstract: Summary form only given. The use of GPUs in computation-intensive tasks has an ever-increasing impact across all platforms, including embedded, and they are sometimes even used to create new forms of currency (Bitcoin, Litecoin, ...). The exponential improvements in performance per watt continue unabated. At the same time, owing to their “design heritage” as primarily 3D accelerators, GPUs have several properties that make it a software challenge to unlock their full benefit in many real-world application scenarios, whether due to limiting APIs (proprietary or of limited functionality) or properties that require an advanced understanding of the platform architecture and of managing memory and other system resources, beyond the reach of the “average programmer”. The Heterogeneous System Architecture (HSA) was established by the HSA Foundation to address many of these shortcomings at the system-architecture and programming-model level while providing a solid foundation for already-established software models; beyond the GPU, the architecture can be extended to other specialty processors such as DSPs and FPGAs so that they interoperate within the software framework, a main task for the next level of work in the HSA Foundation. The HSA Foundation is a not-for-profit consortium of SoC and SoC-IP vendors, OEMs, academia, OSVs, and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices such as GPUs and other accelerators. The presentation gives the audience a high-level understanding of the goals of HSA, the properties of the HSA system architecture, and its use models by system software, tools, and applications.

5 citations

Book Chapter
26 Apr 2017
TL;DR: A performance per watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal alignment of huge DNA sequences in multi-GPU platforms using the exact Smith-Waterman method demonstrates a good correlation between the performance attained and the extra energy required.
Abstract: We present a performance per watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal alignment of huge DNA sequences on multi-GPU platforms using the exact Smith-Waterman method. Speed-up factors and energy consumption are monitored across different stages of the algorithm with the goal of identifying advantageous scenarios to maximize acceleration and minimize power consumption. Experimental results using CUDA on a set of GeForce GTX 980 GPUs illustrate their capabilities as high-performance and low-power devices, with an energy cost that becomes more attractive as the number of GPUs increases. Overall, our results demonstrate a good correlation between the performance attained and the extra energy required, even in scenarios where multiple GPUs do not show great scalability.
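A minimal sketch of the measurement methodology this abstract describes: sample GPU board power while a workload runs, then divide sustained throughput by average power. This is not CUDAlign's own instrumentation; the workload stand-in and the GCUPS figure are assumptions, and the power query relies on nvidia-smi being available on the system.

```python
# Hedged sketch: poll GPU power via nvidia-smi during a workload, then
# report efficiency as throughput per average watt. Requires an NVIDIA
# GPU and driver; the workload and throughput figure are placeholders.

import subprocess
import threading
import time

samples: list[float] = []

def sample_power(stop: threading.Event, interval_s: float = 0.5) -> None:
    """Poll instantaneous board power draw (watts) for the first GPU."""
    while not stop.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        samples.append(float(out.splitlines()[0]))
        time.sleep(interval_s)

stop = threading.Event()
sampler = threading.Thread(target=sample_power, args=(stop,))
sampler.start()

# The GPU workload would run here; time.sleep stands in for a hypothetical
# run_alignment() call launching the real kernels.
time.sleep(5)

stop.set()
sampler.join()

avg_watts = sum(samples) / len(samples) if samples else float("nan")
gcups = 50.0  # assumed sustained billions of cell updates per second (GCUPS)
print(f"avg power: {avg_watts:.1f} W, efficiency: {gcups / avg_watts:.3f} GCUPS/W")
```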

5 citations

Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 81% related
Benchmark (computing): 19.6K papers, 419.1K citations, 80% related
Programming paradigm: 18.7K papers, 467.9K citations, 77% related
Compiler: 26.3K papers, 578.5K citations, 77% related
Scalability: 50.9K papers, 931.6K citations, 76% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2021  14
2020  15
2019  15
2018  36
2017  25
2016  31