Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published on this topic, receiving 5,778 citations.

Papers

Journal Article•DOI•

Photonic Interconnects for Exascale and Datacenter Architectures

Avinash Kodi¹, Brian Neel², William C. Brantley²•Institutions (2)

Ohio University¹, Advanced Micro Devices²

25 Jul 2014-IEEE Micro

TL;DR: Results indicate that multitier topologies are comparable to the single-level dragonfly topology in terms of power and latency while providing higher bisection and reduced area overhead, albeit at higher packet latency owing to increased diameter.

Abstract: Exascale and datacenter systems require terabits per second of internode communication bandwidth to meet the performance demands of high-performance computing applications. High-radix routers combined with scalable dragonfly topology have been proposed to reduce execution time and improve power dissipation. Although the dragonfly network has low diameter for exascale networks, fewer global links reduce the bisection bandwidth and require adaptive routing to prevent hot spots due to congestion. Moreover, the number of ports in a high-radix router affects the router cost when implemented with alternate emerging technologies. In this article, the authors advocate multitier network topologies that combine scalable topologies for local (intracabinet) and global (intercabinet) interconnects such as the k-ary n-cube, the flattened butterfly, and the dragonfly, to lead to improved bisection, manageable radix, and reduced link costs, albeit at higher packet latency owing to increased diameter. Because the performance per watt delivered by metallic interconnects or coaxial cables significantly exceeds the available power budget, we envision an entire exascale network composed of photonic links for communication and CMOS routers for switching. Results indicate that multitier topologies are comparable to the single-level dragonfly topology in terms of power and latency while providing higher bisection and reduced area overhead.

7 citations

Journal Article•DOI•

Performance and energy aware scheduling simulator for HPC: evaluating different resource selection methods

César Gómez-Martín, Miguel A. Vega-Rodríguez¹, José-Luis González-Sánchez•Institutions (1)

University of Extremadura¹

10 Dec 2015-Concurrency and Computation: Practice and Experience

TL;DR: The usefulness of the simulator for this type of studies is demonstrated and it is concluded that the superior behavior of multiobjective algorithms makes them recommended for use in modern scheduling systems.

Abstract: Today, in an energy-aware society, job scheduling is becoming an important task for computer engineers and system analysts that may lead to a performance per Watt trade-off of computing infrastructures. Thus, new algorithms, and a simulator of computing environments, may help information and communications technology and data center managers to make decisions with a solid experimental basis. There are several simulators that try to address performance and, somehow, estimate energy consumption, but there are none in which the energy model is based on benchmark data that have been countersigned by independent bodies such as the Standard Performance Evaluation Corporation. This is the reason why we have implemented a performance and energy-aware scheduling (PEAS) simulator for high-performance computing. Furthermore, to evaluate the simulator, we propose an implementation of the non-dominated sorting genetic algorithm-II (NSGA-II), a fast and elitist multiobjective genetic algorithm, for the resource selection. With the help of the PEAS simulator, we have studied whether it is possible to provide an intelligent job allocation policy that may be able to save energy and time without compromising performance. The results of our simulations show a great improvement in response time and power consumption. In most of the cases, NSGA-II performs better than other 'intelligent' algorithms like multiobjective heterogeneous earliest finish time and clearly outperforms the first-fit algorithm. We demonstrate the usefulness of the simulator for this type of study and conclude that the superior behavior of multiobjective algorithms makes them recommended for use in modern scheduling systems. Copyright © 2015 John Wiley & Sons, Ltd.
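The multiobjective approach behind NSGA-II rests on Pareto dominance: one candidate allocation dominates another if it is no worse in every objective and strictly better in at least one. The following is a minimal sketch of that dominance test only, not of the full NSGA-II algorithm or of the PEAS simulator. The function names and the (response time, energy) tuples are hypothetical illustrations, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """Return True if candidate a Pareto-dominates candidate b.

    Each candidate is a tuple of objective values to minimize,
    here assumed to be (response_time, energy).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [p for p in candidates
            if not any(dominates(q, p) for q in candidates)]


# Hypothetical (response_time, energy) values for four allocations.
allocations = [(10, 50), (12, 40), (11, 60), (15, 40)]
front = pareto_front(allocations)
```

Here `(11, 60)` is dominated by `(10, 50)` and `(15, 40)` by `(12, 40)`, so only the two remaining allocations represent genuine time/energy trade-offs.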

7 citations

Journal Article•DOI•

Voltage scaling and dark silicon in symmetric multicore processors

Hamid Nejatollahi¹, Mostafa E. Salehi¹•Institutions (1)

University of Tehran¹

01 Oct 2015-The Journal of Supercomputing

TL;DR: This paper proposes high-performance and energy-efficient multicore architectures for a variety of parallelisms and memory intensities in workloads and uses dynamic voltage and frequency scaling in Amdahl’s law to decrease the amount of dark silicon and improve performance and performance per watt/joule.

Abstract: As technology scales further, multicore and many-core processors emerge as an alternative to keep up with performance demands. However, because of power and thermal constraints, we are obliged to power off a remarkable area of the chip. Many innovative techniques have been presented to improve energy efficiency and maintain utilization at the highest level. In this paper, we discuss different models and methods of exploiting dark silicon, and by using dynamic voltage and frequency scaling in Amdahl's law and considering memory overheads, we attempt to decrease the amount of dark silicon and improve performance and performance per watt/joule. We propose high-performance and energy-efficient multicore architectures for a variety of parallelisms and memory intensities in workloads. According to the results, by voltage scaling, for a highly parallel CPU-intensive workload, we reach improvements of approximately 5.2× and 3.78× in performance per watt and performance per joule, respectively, while about 27% reduction of performance should be tolerated. For memory-intensive applications, a negligible change in speedup is detected by scaling, while performance per watt and performance per joule for both serial and parallel applications lead to around 6× enhancements.
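To see why voltage and frequency scaling can raise performance per watt while lowering raw performance, a toy model helps. The sketch below is not the paper's model: it assumes a symmetric multicore with all cores active, voltage scaling linearly with frequency, dynamic power per core proportional to V²f, and no leakage or memory overhead. All function and parameter names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores, freq_scale=1.0):
    """Speedup over one core at nominal frequency (Amdahl's law),
    multiplied by a relative frequency factor."""
    serial_fraction = 1.0 - parallel_fraction
    return freq_scale / (serial_fraction + parallel_fraction / cores)


def relative_power(cores, freq_scale=1.0):
    """Relative dynamic power. Assumption: V scales with f, so each
    core draws power proportional to V^2 * f = freq_scale ** 3."""
    return cores * freq_scale ** 3


def perf_per_watt(parallel_fraction, cores, freq_scale=1.0):
    return (amdahl_speedup(parallel_fraction, cores, freq_scale)
            / relative_power(cores, freq_scale))


def perf_per_joule(parallel_fraction, cores, freq_scale=1.0):
    """Performance divided by energy, where energy = power * time
    and time is taken as 1 / performance for a fixed workload."""
    perf = amdahl_speedup(parallel_fraction, cores, freq_scale)
    energy = relative_power(cores, freq_scale) / perf
    return perf / energy


# Hypothetical workload: 90% parallel, 16 cores, frequency halved.
ppw_gain = perf_per_watt(0.9, 16, 0.5) / perf_per_watt(0.9, 16, 1.0)
ppj_gain = perf_per_joule(0.9, 16, 0.5) / perf_per_joule(0.9, 16, 1.0)
perf_ratio = amdahl_speedup(0.9, 16, 0.5) / amdahl_speedup(0.9, 16, 1.0)
```

Under these simplified assumptions, halving the frequency quadruples performance per watt and doubles performance per joule at the cost of half the performance. The figures reported in the abstract come from the authors' own, more detailed model.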

7 citations

Proceedings Article•DOI•

Power-efficient embedded processing with resilience and real-time constraints

Liang Wang¹, Augusto Vega², Alper Buyuktosunoglu², Pradip Bose², Kevin Skadron¹•Institutions (2)

University of Virginia¹, IBM²

22 Jul 2015

TL;DR: This study examines a class of embedded system applications relevant to mobile vehicles to understand the limits of achievable energy efficiency under varying levels of system resilience constraints and considers static optimization of voltage-frequency settings on a per-application-segment basis.

Abstract: Low-power embedded processing typically relies on dynamic voltage-frequency scaling (DVFS) in order to optimize energy usage (and therefore, battery life). However, low voltage operation exacerbates the incidence of soft errors. Similarly, higher voltage operation (to meet real-time deadlines) is constrained by hard-failure rate limits. In this paper, we examine a class of embedded system applications relevant to mobile vehicles. We investigate the problem of assigning optimal voltage-frequency settings to individual segments within target workflows. The goal of this study is to understand the limits of achievable energy efficiency (performance per watt) under varying levels of system resilience constraints. To optimize for energy efficiency, we consider static optimization of voltage-frequency settings on a per-application-segment basis. We consider both linear and graph-structured workflows. In order to understand the loss in energy efficiency in the face of environmental uncertainties encountered by the mobile vehicle, we also study the effect of injecting random variations in the actual runtime of individual application segments. A dynamic re-optimization of the voltage-frequency settings is required to cope with such in-field uncertainties.
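The static per-segment optimization described above can be illustrated with a small exhaustive search. This is a hedged sketch for a linear workflow only, not the authors' method: it assumes segment energy proportional to cycles × V², segment time equal to cycles / frequency, and a single end-to-end deadline. Resilience limits are assumed to be applied beforehand by restricting the list of allowed settings. All names and numbers are hypothetical.

```python
from itertools import product


def best_static_assignment(segment_cycles, settings, deadline):
    """Pick one (voltage, frequency) setting per segment so that total
    energy is minimal and total runtime meets the deadline.

    segment_cycles: cycle count of each segment in a linear workflow.
    settings: allowed (voltage, frequency) pairs, assumed to be
        pre-filtered to satisfy soft-error and hard-failure limits.
    Returns (setting indices per segment, energy), or None if no
    assignment meets the deadline.
    """
    best = None
    for choice in product(range(len(settings)), repeat=len(segment_cycles)):
        total_time = 0.0
        total_energy = 0.0
        for cycles, idx in zip(segment_cycles, choice):
            voltage, freq = settings[idx]
            total_time += cycles / freq            # runtime of the segment
            total_energy += cycles * voltage ** 2  # assumed energy model
        if total_time <= deadline and (best is None or total_energy < best[1]):
            best = (choice, total_energy)
    return best


# Two segments, two hypothetical settings: low (0.8 V, 1.0) and high (1.0 V, 2.0).
result = best_static_assignment([100, 200], [(0.8, 1.0), (1.0, 2.0)], deadline=250)
infeasible = best_static_assignment([100, 200], [(0.8, 1.0), (1.0, 2.0)], deadline=100)
```

Exhaustive search grows exponentially with the number of segments, so it is suitable only for illustrating the trade-off. In this example the cheapest feasible choice runs the short segment at the high setting and the long segment at the low setting.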

7 citations

Proceedings Article•DOI•

A problem-based learning approach to GPU computing

Robert Geist¹, Joshua A. Levine¹, James Westall¹•Institutions (1)

Clemson University¹

15 Nov 2015

TL;DR: A course in GPU programming for senior undergraduates and first-year graduates that has been taught at Clemson University annually since 2010 is described, with focus on a large, real-world problem, in particular, a system for parallel solution of partial differential equations.

Abstract: Compared to CPUs, modern GPUs exhibit a high ratio of computing performance per watt, and so current supercomputer designs often include multiple racks of GPUs in order to achieve high teraflop counts at minimal energy cost. GPU programming is thus becoming increasingly important, and yet it remains a challenging task. This paper describes a course in GPU programming for senior undergraduates and first-year graduates that has been taught at Clemson University annually since 2010. The course uses problem-based learning, with focus on a large, real-world problem, in particular, a system for parallel solution of partial differential equations. Although the system for solving PDEs is useful in its own right, the problem is used as a vehicle in which to explore design issues that face those attempting to achieve new levels of performance on architectures.

7 citations

Metrics

Papers: 315

Citations: 6,353

No. of papers in the topic in previous years
Year	Papers
2021	14
2020	15
2019	15
2018	36
2017	25
2016	31
