
Performance per watt

About: Performance per watt is a research topic. Over the lifetime of the topic, 315 publications have been published, receiving 5,778 citations.


Papers
Proceedings ArticleDOI
Farah Fargo, Cihan Tunc, Youssif Al-Nashif, Ali Akoglu, Salim Hariri
08 Sep 2014
TL;DR: This paper presents an autonomic power and performance management method for cloud systems that dynamically matches application requirements with "just-enough" system resources at runtime, leading to significant power reduction while meeting the quality-of-service requirements of cloud applications.
Abstract: The power consumption of data centers and cloud systems has increased almost threefold between 2007 and 2012. Over-provisioning techniques are typically used to meet peak workloads. In this paper we present an autonomic power and performance management method for cloud systems that dynamically matches application requirements with "just-enough" system resources at runtime, leading to significant power reduction while meeting the quality-of-service (QoS) requirements of the cloud applications. Our solution offers the following capabilities: 1) real-time monitoring of the cloud resources and the workload behavior running on virtual machines (VMs), 2) determination of the current operating point of both the workloads and the VMs running them, 3) characterization of workload behavior and prediction of the next operating point for the VMs, 4) dynamic management of VM resources (scaling up and down the number of cores, CPU frequency, and memory amount) at runtime, and 5) assignment of available cloud resources that guarantee optimal power consumption without sacrificing the QoS requirements of cloud workloads. We validate the performance of our approach using the RUBiS benchmark, an auction model emulating eBay transactions that generates a wide range of workloads (such as browsing and bidding with different numbers of clients). Our experimental results show that our approach can reduce power consumption by up to 87% compared to a static resource allocation strategy, 72% compared to an adaptive frequency scaling strategy, and 66% compared to a similar multi-resource management strategy.

25 citations
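The "just-enough" provisioning loop described above monitors VM behavior, predicts the next operating point, and resizes resources accordingly. The following is a minimal Python sketch of that control loop under assumed utilization thresholds; all names (VmConfig, next_operating_point, size_resources) are hypothetical, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class VmConfig:
    cores: int      # number of vCPUs
    freq_mhz: int   # CPU frequency
    mem_mb: int     # memory allocation

def next_operating_point(history):
    """Predict the next utilization level from recent samples; a simple
    moving average stands in for the paper's workload characterization."""
    window = history[-4:]
    return sum(window) / len(window)

def size_resources(predicted_util, current):
    """Scale resources so predicted utilization lands in an assumed
    60-80% target band, mirroring dynamic core/frequency/memory
    management at runtime."""
    if predicted_util > 0.8:
        return VmConfig(current.cores + 1, current.freq_mhz, current.mem_mb)
    if predicted_util < 0.6 and current.cores > 1:
        return VmConfig(current.cores - 1, current.freq_mhz, current.mem_mb)
    return current

cfg = VmConfig(cores=2, freq_mhz=2400, mem_mb=4096)
cfg = size_resources(next_operating_point([0.70, 0.90, 0.85, 0.90]), cfg)
print(cfg)  # predicted load ~0.84 > 0.8, so the sketch scales up to 3 cores
```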

Proceedings ArticleDOI
09 Mar 2020
TL;DR: Fleet is presented, a framework that offers a massively parallel streaming model for FPGAs and is effective in a number of domains well-suited to FPGA acceleration, including parsing, compression, and machine learning.
Abstract: We present Fleet, a framework that offers a massively parallel streaming model for FPGAs and is effective in a number of domains well-suited for FPGA acceleration, including parsing, compression, and machine learning. Fleet requires the user to specify RTL for a processing unit that serially processes every input token in a stream, a far simpler task than writing a parallel processing unit. It then takes the user's processing unit and generates a hardware design with many copies of the unit as well as memory controllers to feed the units with separate streams and drain their outputs. Fleet includes a Chisel-based processing unit language. The language maintains Chisel's low-level performance control while adding a few productivity features, including automatic handling of ready-valid signaling and a native and automatically pipelined BRAM type. We evaluate Fleet on six different applications, including JSON parsing and integer compression, fitting hundreds of Fleet processing units on the Amazon F1 FPGA and outperforming CPU implementations by over 400x and GPU implementations by over 9x in performance per watt while requiring a similar number of lines of code.

25 citations
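Fleet's key idea, per the abstract above, is that the user writes only a serial per-token processing unit and the framework replicates it across many independent streams. A rough Python analogy of that programming model follows; Fleet itself uses a Chisel-based RTL language, and these function names are illustrative, not Fleet's API:

```python
def make_unit(process_token):
    """Wrap a serial per-token function into a stream processor with
    its own private state, like one Fleet processing unit."""
    def run(stream):
        state = {}
        for token in stream:
            out = process_token(token, state)
            if out is not None:
                yield out
    return run

def replicate(process_token, streams):
    """Instantiate one unit per stream; on the FPGA the copies run in
    parallel, here we simply iterate over them."""
    unit = make_unit(process_token)
    return [list(unit(s)) for s in streams]

# Example unit: running maximum over each stream.
def running_max(token, state):
    state["max"] = max(token, state.get("max", token))
    return state["max"]

print(replicate(running_max, [[3, 1, 4], [1, 5, 9]]))  # [[3, 3, 4], [1, 5, 9]]
```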

Journal ArticleDOI
TL;DR: This work proposes a novel decoupled access-execute CGRA design called CASCADE with full architecture and compiler support for high-throughput data streaming from an on-chip multi-bank memory.
Abstract: A Coarse-Grained Reconfigurable Array (CGRA) is a promising high-performance, low-power accelerator for compute-intensive loop kernels. While the mapping of computations onto the CGRA is a well-studied problem, bringing data into the array at high throughput remains a challenge. A conventional CGRA design involves on-array computations to generate memory addresses for data access, undermining the attainable throughput. A decoupled access-execute architecture, on the other hand, isolates memory access from the actual computations, resulting in significantly higher throughput. We propose a novel decoupled access-execute CGRA design called CASCADE with full architecture and compiler support for high-throughput data streaming from an on-chip multi-bank memory. CASCADE offloads the address computations for multi-bank data memory access to custom-designed programmable hardware. An end-to-end, fully-automated compiler synchronizes the conflict-free movement of data between the memory banks and the CGRA. Experimental evaluations show on average a 3× performance benefit and a 2.2× performance-per-watt improvement for CASCADE compared to an iso-area conventional CGRA that uses a bigger processing array in lieu of dedicated memory address generation hardware.

25 citations
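The decoupling CASCADE exploits can be pictured as two independent engines: one that generates addresses and streams operands, and one that only computes. A minimal Python sketch of that access-execute split, with invented parameters standing in for the programmable address generation hardware:

```python
def address_generator(base, stride, count):
    """Access engine: emits a strided address pattern without involving
    the compute array in any address arithmetic."""
    for i in range(count):
        yield base + i * stride

def execute_unit(operands):
    """Execute engine: consumes a prefetched operand stream and only
    computes (here, a reduction)."""
    acc = 0
    for value in operands:
        acc += value
    return acc

memory = list(range(100))  # stand-in for the on-chip multi-bank memory
addresses = address_generator(base=0, stride=4, count=10)
print(execute_unit(memory[a] for a in addresses))  # 180: sums memory[0,4,...,36]
```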

Proceedings ArticleDOI
01 Feb 2018
TL;DR: This paper proposes photonic interconnects for heterogeneous multicores using a checkerboard pattern that clusters CPU-GPU cores together and implements bandwidth reconfiguration using local router information without global coordination, and proposes a dynamic laser scaling technique that predicts the power level for the next epoch using the buffer occupancy of the previous epoch.
Abstract: As communication energy exceeds computation energy in future technologies, traditional on-chip electrical interconnects face fundamental challenges in the many-core era. Photonic interconnects have been proposed as a disruptive technology solution due to their superior performance per watt, distance-independent energy consumption, and CMOS compatibility for on-chip interconnects. Static power due to the laser being always switched on, varying link utilization due to spatial and temporal traffic fluctuations, and thermal sensitivity are some of the critical challenges facing photonic interconnects. In this paper, we propose photonic interconnects for heterogeneous multicores using a checkerboard pattern that clusters CPU-GPU cores together and implements bandwidth reconfiguration using local router information without global coordination. To reduce the static power, we also propose a dynamic laser scaling technique that predicts the power level for the next epoch using the buffer occupancy of the previous epoch. To further improve power-performance trade-offs, we also propose a regression-based machine learning technique for scaling the power of the photonic link. Our simulation results demonstrate a 34% performance improvement over a baseline electrical CMESH while consuming 25% less energy per bit when dynamically reallocating bandwidth. When dynamically scaling laser power, our buffer-based reactive and ML-based proactive prediction techniques show 40-65% power savings with 0-14% throughput loss depending on the reservation window size.

24 citations
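The buffer-based reactive scheme in the abstract above picks the laser power level for the next epoch from how full the router buffers were in the previous one. A hedged Python sketch with invented occupancy thresholds and power levels:

```python
def next_laser_level(mean_occupancy, levels=(0.25, 0.50, 0.75, 1.00)):
    """Map mean buffer occupancy in [0, 1] from the last epoch to a
    discrete laser power level for the next epoch. Thresholds are
    assumptions, not the paper's calibrated values."""
    if mean_occupancy > 0.75:
        return levels[3]   # near-saturated buffers: full laser power
    if mean_occupancy > 0.50:
        return levels[2]
    if mean_occupancy > 0.25:
        return levels[1]
    return levels[0]       # mostly idle link: deepest laser scaling

print(next_laser_level(0.62))  # 0.75 of full power for a moderately busy epoch
```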

Proceedings Article
01 Jan 2011
TL;DR: A programmable, pattern-based memory controller (PMC) that aims at improving the performance of heterogeneous or reconfigurable SoC devices by supporting scatter-gather and strided 1D, 2D, and 3D access patterns.
Abstract: Heterogeneous architectures are increasingly popular due to their flexibility and high performance-per-watt capability. One kind of heterogeneous architecture, the reconfigurable system-on-chip, offers high performance per watt through its reconfigurable logic and flexibility through its multiprocessor cores. But in order to achieve these performance goals, it is necessary to supply the accelerators with enough data. In this paper we describe a programmable, pattern-based memory controller (PMC) that aims at improving the performance of heterogeneous or reconfigurable SoC devices. The supported access patterns include scatter-gather and strided 1D, 2D, and 3D patterns. PMC can prefetch complete patterns into scratchpads that can then be accessed either by a microprocessor or by an accelerator. As a result, the microprocessors and accelerators can focus on computation and are relieved of having to perform address calculations. PMC has been implemented and tested on an ML505 evaluation board using the MicroBlaze soft core as the platform's microprocessor. While PMC adds some latency, it improves performance by offloading the processor and by making better use of the available bandwidth. PMC provides a 1.5x speedup when paired with the processor and a 27x speedup when using a hardware accelerator in the PMC-based SoC environment while executing a thresholding application.

23 citations
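A PMC-style pattern can be thought of as a descriptor that the controller expands into addresses, gathering data into a scratchpad so the processor or accelerator reads it linearly. A small Python sketch of a strided 2D pattern prefetch; the descriptor fields are illustrative, not the PMC's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Pattern2D:
    base: int        # start address (element index)
    width: int       # elements per row of the region
    height: int      # number of rows
    row_stride: int  # distance between consecutive row starts

def prefetch(memory, p):
    """Gather a strided 2D region into a contiguous scratchpad buffer,
    relieving the compute unit of all address calculations."""
    scratchpad = []
    for r in range(p.height):
        start = p.base + r * p.row_stride
        scratchpad.extend(memory[start:start + p.width])
    return scratchpad

mem = list(range(64))  # an 8x8 row-major array
tile = prefetch(mem, Pattern2D(base=9, width=3, height=3, row_stride=8))
print(tile)  # [9, 10, 11, 17, 18, 19, 25, 26, 27]: a 3x3 tile
```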

Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (81% related)
Benchmark (computing): 19.6K papers, 419.1K citations (80% related)
Programming paradigm: 18.7K papers, 467.9K citations (77% related)
Compiler: 26.3K papers, 578.5K citations (77% related)
Scalability: 50.9K papers, 931.6K citations (76% related)
Metrics
No. of papers in the topic in previous years:

Year    Papers
2021    14
2020    15
2019    15
2018    36
2017    25
2016    31