Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published on this topic, receiving 5778 citations.

Papers

Book Chapter

A Power Consumption Benchmark for Reasoners on Mobile Devices

Evan W. Patton¹, Deborah L. McGuinness¹

Rensselaer Polytechnic Institute¹

19 Oct 2014

TL;DR: A new methodology for benchmarking the performance per watt of semantic web reasoners and rule engines on smartphones is introduced to provide developers with information critical for deploying semantic web tools on power-constrained devices.

Abstract: We introduce a new methodology for benchmarking the performance per watt of semantic web reasoners and rule engines on smartphones to provide developers with information critical for deploying semantic web tools on power-constrained devices. We validate our methodology by applying it to three well-known reasoners and rule engines answering queries on two ontologies with expressivities in RDFS and OWL DL. While this validation was conducted on smartphones running Google's Android operating system, our methodology is general and may be applied to different hardware platforms, reasoners, ontologies, and entire applications to determine performance relevant to power consumption. We discuss the implications of our findings for balancing tradeoffs of local computation versus communication costs for semantic technologies on mobile platforms, sensor networks, the Internet of Things, and other power-constrained environments.

16 citations

Journal Article

A chip multithreaded processor for network-facing workloads

Sanjiv Kapil¹, H. McGhan¹, Jesse Lawrendra¹

Sun Microsystems¹

01 Mar 2004, IEEE Micro

TL;DR: The processor described is a member of Sun's first generation of CMT processors designed to efficiently execute network-facing workloads, and its dual-thread execution capability, compact die size, and minimal power consumption combine to produce high throughput performance per watt, per transistor, and per square millimeter of die area.

Abstract: Throughput computing is based on chip multithreading processor design technology. In CMT technology, maximizing the amount of work accomplished per unit of time or other relevant resource, rather than minimizing the time needed to complete a given task or set of tasks, defines performance. By CMT standards, the best processor accomplishes the most work per second of time, per watt of expended power, per square millimeter of die area, and so on (that is, it operates most efficiently). The processor described is a member of Sun's first generation of CMT processors designed to efficiently execute network-facing workloads. Network-facing systems primarily service network clients and are often grouped together under the label "Web servers". The processor's dual-thread execution capability, compact die size, and minimal power consumption combine to produce high throughput performance per watt, per transistor, and per square millimeter of die area. Given the short design cycle Sun needed to create the processor, the result is a compelling early proof of the value of throughput computing.

16 citations

Proceedings Article

Post-silicon CPU adaptation made practical using machine learning

Stephen J. Tarsa¹, Hong Wang¹, Rangeen Basu Roy Chowdhury¹, Julien Sebot¹, Gautham N. Chinya¹, Jayesh Gaur¹, Karthik Sankaranarayanan¹, Chit-Kwan Lin¹, Robert S. Chappell¹, Ronak Singhal¹

Intel¹

22 Jun 2019

TL;DR: This paper presents an adaptive CPU based on Intel SkyLake that closes the loop to deployment, and provides a novel mechanism for post-silicon customization, and shows how to optimize PPW using models trained to different SLAs or to specific applications, e.g. to improve datacenter hardware in situ.

Abstract: Processors that adapt architecture to workloads at runtime promise compelling performance per watt (PPW) gains, offering one way to mitigate diminishing returns from pipeline scaling. State-of-the-art adaptive CPUs deploy machine learning (ML) models on-chip to optimize hardware by recognizing workload patterns in event counter data. However, despite breakthrough PPW gains, such designs are not yet widely adopted due to the potential for systematic adaptation errors in the field. This paper presents an adaptive CPU based on Intel SkyLake that (1) closes the loop to deployment, and (2) provides a novel mechanism for post-silicon customization. Our CPU performs predictive cluster gating, dynamically setting the issue width of a clustered architecture while clock-gating unused resources. Gating decisions are driven by ML adaptation models that execute on an existing microcontroller, minimizing design complexity and allowing performance characteristics to be adjusted with the ease of a firmware update. Crucially, we show that although adaptation models can suffer from statistical blindspots that risk degrading performance on new workloads, these can be reduced to minimal impact with careful design and training. Our adaptive CPU improves PPW by 31.4% over a comparable non-adaptive CPU on SPEC2017, and exhibits two orders of magnitude fewer Service Level Agreement (SLA) violations than the state-of-the-art. We show how to optimize PPW using models trained to different SLAs or to specific applications, e.g. to improve datacenter hardware in situ. The resulting CPU meets real world deployment criteria for the first time and provides a new means to tailor hardware to individual customers, even as their needs change.

16 citations

Proceedings Article

Parameterized FPGA-based architecture for parallel 1-D filtering algorithms

Sami Hasan¹, Said Boussakta¹, Alex Yakovlev¹

Newcastle University¹

09 May 2011

TL;DR: A parallel 1-D signal filtering algorithm is implemented as a parameterized, efficient FPGA-based architecture using Xilinx System Generator, and shows power consumption down to 0.99 W at a maximum frequency of up to 216 MHz.

Abstract: A parallel 1-D signal filtering algorithm is implemented as a parameterized, efficient FPGA-based architecture using Xilinx System Generator. The implemented algorithm is a linear indirect filter realized by a parallel FFT/point-by-point complex inner product/IFFT convolution unit array. The implemented architecture shows 38% higher performance per watt at maximum frequency. The parameterized implementation provides rapid system-level FPGA prototyping and operating frequency portability. Consequently, the results are obtained independently of the two targeted Virtex-6 FPGA boards, namely xc6vlX240Tl–1lff1759 and xc6vlX130Tl–1lff1156, which achieve lower power consumption of 1.6 W and down to 0.99 W, respectively, at a maximum frequency of up to 216 MHz. A case study of real-time speech filtering shows excellent performance results, with power consumption down to 0.99 W at a maximum frequency of up to 216 MHz.

16 citations

Proceedings Article

RAPID: In-Memory Analytical Query Processing Engine with Extreme Performance per Watt

Cagri Balkesen¹, Nitin Kunal¹, Georgios Giannikis¹, Pit Fender¹, Seema Sundara¹, Felix Schmidt¹, Jarod Wen¹, Sandeep R. Agrawal¹, Arun Raghavan¹, Venkatanathan Varadarajan¹, Anand Viswanathan¹, Balakrishnan Chandrasekaran¹, Sam Idicula¹, Nipun Agarwal¹, Eric Sedlar¹

Oracle Corporation¹

27 May 2018

TL;DR: This paper demonstrates through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing that at least an order of magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today's complex processors.

Abstract: Today, an ever-increasing number of transistors are packed into processor designs with extra features to support a broad range of applications. As a consequence, processors are becoming more and more complex and power hungry. At the same time, they only sustain an average performance for a wide variety of applications while not providing the best performance for specific applications. In this paper, we demonstrate through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing that at least an order of magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today's complex processors. RAPID is designed from the ground up with hardware/software co-design in mind to provide architecture-conscious extreme performance while consuming less power in comparison to modern database systems. The paper presents in detail the design and implementation of RAPID, a relational, columnar, in-memory query processing engine supporting analytical query workloads.

16 citations

Metrics

Papers: 315

Citations: 6,353

No. of papers in the topic in previous years
Year	Papers
2021	14
2020	15
2019	15
2018	36
2017	25
2016	31
