Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have appeared within this topic, receiving 5,778 citations.


Papers
Book Chapter
19 Oct 2014
TL;DR: A new methodology for benchmarking the performance per watt of semantic web reasoners and rule engines on smartphones is introduced to provide developers with information critical for deploying semantic web tools on power-constrained devices.
Abstract: We introduce a new methodology for benchmarking the performance per watt of semantic web reasoners and rule engines on smartphones to provide developers with information critical for deploying semantic web tools on power-constrained devices. We validate our methodology by applying it to three well-known reasoners and rule engines answering queries on two ontologies with expressivities in RDFS and OWL DL. While this validation was conducted on smartphones running Google's Android operating system, our methodology is general and may be applied to different hardware platforms, reasoners, ontologies, and entire applications to determine performance relevant to power consumption. We discuss the implications of our findings for balancing tradeoffs of local computation versus communication costs for semantic technologies on mobile platforms, sensor networks, the Internet of Things, and other power-constrained environments.

16 citations

Journal Article
TL;DR: The processor described is a member of Sun's first generation of CMT processors designed to efficiently execute network-facing workloads, and its dual-thread execution capability, compact die size, and minimal power consumption combine to produce high throughput performance per watt, per transistor, and per square millimeter of die area.
Abstract: Throughput computing is based on chip multithreading processor design technology. In CMT technology, maximizing the amount of work accomplished per unit of time or other relevant resource, rather than minimizing the time needed to complete a given task or set of tasks, defines performance. By CMT standards, the best processor accomplishes the most work per second of time, per watt of expended power, per square millimeter of die area, and so on (that is, it operates most efficiently). The processor described is a member of Sun's first generation of CMT processors designed to efficiently execute network-facing workloads. Network-facing systems primarily service network clients and are often grouped together under the label "Web servers". The processor's dual-thread execution capability, compact die size, and minimal power consumption combine to produce high throughput performance per watt, per transistor, and per square millimeter of die area. Given the short design cycle Sun needed to create the processor, the result is a compelling early proof of the value of throughput computing.

16 citations

Proceedings Article
22 Jun 2019
TL;DR: This paper presents an adaptive CPU based on Intel SkyLake that closes the loop to deployment, and provides a novel mechanism for post-silicon customization, and shows how to optimize PPW using models trained to different SLAs or to specific applications, e.g. to improve datacenter hardware in situ.
Abstract: Processors that adapt architecture to workloads at runtime promise compelling performance per watt (PPW) gains, offering one way to mitigate diminishing returns from pipeline scaling. State-of-the-art adaptive CPUs deploy machine learning (ML) models on-chip to optimize hardware by recognizing workload patterns in event counter data. However, despite breakthrough PPW gains, such designs are not yet widely adopted due to the potential for systematic adaptation errors in the field. This paper presents an adaptive CPU based on Intel SkyLake that (1) closes the loop to deployment, and (2) provides a novel mechanism for post-silicon customization. Our CPU performs predictive cluster gating, dynamically setting the issue width of a clustered architecture while clock-gating unused resources. Gating decisions are driven by ML adaptation models that execute on an existing microcontroller, minimizing design complexity and allowing performance characteristics to be adjusted with the ease of a firmware update. Crucially, we show that although adaptation models can suffer from statistical blindspots that risk degrading performance on new workloads, these can be reduced to minimal impact with careful design and training. Our adaptive CPU improves PPW by 31.4% over a comparable non-adaptive CPU on SPEC2017, and exhibits two orders of magnitude fewer Service Level Agreement (SLA) violations than the state-of-the-art. We show how to optimize PPW using models trained to different SLAs or to specific applications, e.g. to improve datacenter hardware in situ. The resulting CPU meets real world deployment criteria for the first time and provides a new means to tailor hardware to individual customers, even as their needs change.

16 citations
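
The 31.4% PPW figure quoted in the abstract above is a relative gain over a baseline. As a minimal sketch of how such a figure is typically derived (not the paper's measurement code), the snippet below takes performance per watt to be throughput divided by average power, which is the same as work done per joule. The function names and the sample numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def performance_per_watt(work_units: float, seconds: float, joules: float) -> float:
    """Throughput (work/s) divided by average power (J/s = W)."""
    throughput = work_units / seconds
    avg_power_watts = joules / seconds
    return throughput / avg_power_watts


def relative_gain(ppw_new: float, ppw_baseline: float) -> float:
    """Fractional improvement of ppw_new over ppw_baseline (0.25 means 25%)."""
    return ppw_new / ppw_baseline - 1.0


# Hypothetical runs: same work and runtime, different energy use.
baseline = performance_per_watt(work_units=1000, seconds=10, joules=500)
adaptive = performance_per_watt(work_units=1000, seconds=10, joules=400)
gain = relative_gain(adaptive, baseline)

print(f"baseline PPW: {baseline}, adaptive PPW: {adaptive}, gain: {gain:.1%}")
```

Because runtime cancels out of the ratio, the same function can compare runs of different lengths, provided the work unit (queries, instructions, benchmark score) is the same in both.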

Proceedings Article
09 May 2011
TL;DR: A parallel 1-D signal filtering algorithm is implemented as a parameterized, efficient FPGA-based architecture using Xilinx System Generator, and a case study shows power consumption down to (0.99 W) at a maximum frequency of up to (216 MHz).
Abstract: A parallel 1-D signal filtering algorithm is implemented as a parameterized, efficient FPGA-based architecture using Xilinx System Generator. The implemented algorithm is a linear indirect filter realized by a parallel FFT/point-by-point complex inner product/IFFT convolution unit array. The implemented architecture manifests a 38% higher performance per watt at maximum frequency. The parameterized implementation provides rapid system-level FPGA prototyping and operating frequency portability. Consequently, the results are obtained independently on the two targeted Virtex-6 FPGA boards, namely xc6vlX240Tl–1lff1759 and xc6vlX130Tl–1lff1156, achieving power consumption of (1.6 W) and (0.99 W) respectively at a maximum frequency of up to (216 MHz). A case study of real-time speech filtering shows power consumption down to (0.99 W) at a maximum frequency of up to (216 MHz).

16 citations

Proceedings Article
27 May 2018
TL;DR: This paper demonstrates through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing that at least an order of magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today's complex processors.
Abstract: Today, an ever-increasing number of transistors is packed into processor designs with extra features to support a broad range of applications. As a consequence, processors are becoming more and more complex and power hungry. At the same time, they sustain only average performance across a wide variety of applications rather than the best performance for specific applications. In this paper, we demonstrate through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing that at least an order of magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today's complex processors. RAPID is designed from the ground up with hardware/software co-design in mind to provide architecture-conscious extreme performance while consuming less power than modern database systems. The paper presents in detail the design and implementation of RAPID, a relational, columnar, in-memory query processing engine supporting analytical query workloads.

16 citations

Network Information
Related Topics (5)
Cache
59.1K papers, 976.6K citations
81% related
Benchmark (computing)
19.6K papers, 419.1K citations
80% related
Programming paradigm
18.7K papers, 467.9K citations
77% related
Compiler
26.3K papers, 578.5K citations
77% related
Scalability
50.9K papers, 931.6K citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    14
2020    15
2019    15
2018    36
2017    25
2016    31