Topic

Performance per watt

About: Performance per watt is a research topic. Over the lifetime, 315 publications have been published within this topic receiving 5778 citations.


Papers
Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper evaluates the energy consumption of data types, operators, control statements, exceptions, and objects in Java at a granular level, to help standardize the energy consumption traits of Java so that software developers can leverage them to write energy-efficient code in the future.
Abstract: There has been a 10,000-fold increase in the performance of supercomputers since 1992 but only a 300-fold improvement in performance per watt. Dynamic hardware adaptation techniques such as fine-grain clock gating, power gating, and dynamic voltage/frequency scaling have been used for many years to improve computers' energy efficiency. However, the recent demands of exascale computation, as well as the increasing carbon footprint, require new breakthroughs to make ICT systems more energy efficient. Energy-efficient software has not been well studied in the last decade. In this paper, we take an early step to investigate the energy efficiency of Java, which is one of the most common languages used in ICT systems. We evaluate the energy consumption of data types, operators, control statements, exceptions, and objects in Java at a granular level. Intel Running Average Power Limit (RAPL) technology is applied to measure the relative power consumption of small code snippets. Several observations are made, and these results will help in standardizing the energy consumption traits of Java, which software developers can leverage to generate energy-efficient code in the future.

8 citations
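
The two growth figures quoted in the abstract above imply how much supercomputer power draw has grown, since performance per watt is simply performance divided by power. The short sketch below works through that arithmetic; the helper name `performance_per_watt` and the sample numbers in the usage note are illustrative assumptions, not part of the paper.

```python
# Illustrative arithmetic only; the two growth factors come from the abstract.
performance_growth = 10_000   # increase in supercomputer performance since 1992
efficiency_growth = 300       # improvement in performance per watt since 1992

# performance per watt = performance / power, therefore
# power growth = performance growth / efficiency growth.
power_growth = performance_growth / efficiency_growth


def performance_per_watt(operations_per_second: float, watts: float) -> float:
    """Return operations per second delivered for each watt drawn (hypothetical helper)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return operations_per_second / watts


print(f"Implied growth in power draw: {power_growth:.1f}x")
```

Under these figures, power draw has grown roughly 33-fold. The same helper can compare two systems directly, for example `performance_per_watt(100.0, 4.0)` against `performance_per_watt(100.0, 8.0)`.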

Proceedings ArticleDOI
03 Sep 2015
TL;DR: This paper presents a flexible parallel hardware-based architecture, used in conjunction with frequency scaling, as a technique for reducing power consumption in video streaming applications, and derives equations that ease the calculation of the level of parallelism and the maximum depth of the FIFOs used for clock domain crossing.
Abstract: Reconfigurable technology is well suited to real-time video streaming applications. It is considered a promising solution because of the performance per watt it offers compared to other technologies. As FPGAs have evolved, several techniques at different design levels, from the circuit level up to the system level, have been proposed to reduce the power consumption of FPGA devices. In this paper, we present a flexible parallel hardware-based architecture in conjunction with frequency scaling as a technique for reducing power consumption in video streaming applications. In this work, we derived equations to ease the calculation of the level of parallelism and the maximum depth of the FIFOs used for clock domain crossing. Accordingly, a design space was formed that includes all the design alternatives for the application. The preferred design alternative is selected according to how much hardware it costs and what power reduction goal it can satisfy. We used the Xilinx Zynq ZC706 evaluation board to implement two video streaming applications, a video downscaler (1:16) and the AES encryption algorithm, to verify our approach. The experimental results showed up to 19.6% power reduction for the video downscaler and up to 5.4% for the AES encryption.

8 citations
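
The abstract above mentions equations for the level of parallelism and the FIFO depth but does not reproduce them. As a rough sketch of the kind of calculation involved, the snippet below uses common textbook estimates instead: the number of lanes needed to sustain a sample rate at a reduced clock, and the minimum FIFO depth for a write burst crossing into a slower read clock. Both formulas, the function names, and the example numbers are assumptions and are not the paper's derived equations.

```python
import math


def lanes_needed(sample_rate_hz: float, lane_clock_hz: float) -> int:
    """Parallel lanes required if each lane handles one sample per clock cycle."""
    return math.ceil(sample_rate_hz / lane_clock_hz)


def fifo_depth(burst_words: int, write_clock_hz: float, read_clock_hz: float) -> int:
    """Textbook minimum depth for a burst written at write_clock_hz while the
    reader drains one word per cycle at read_clock_hz."""
    if read_clock_hz >= write_clock_hz:
        return 1  # the reader keeps up, so a single slot suffices in this model
    # Words left in the FIFO once the burst ends: B * (1 - f_read / f_write).
    return math.ceil(burst_words * (1 - read_clock_hz / write_clock_hz))


# Example: a 148.5 MHz pixel stream processed by lanes clocked at 50 MHz,
# and a 120-word burst crossing from a 100 MHz domain into a 50 MHz domain.
print(lanes_needed(148.5e6, 50e6))
print(fifo_depth(120, 100e6, 50e6))
```

Lowering the lane clock raises the lane count, so the hardware cost grows as the frequency (and with it the dynamic power per lane) falls; that is the trade-off a design space of this kind explores.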

01 Jan 2014
TL;DR: A novel modeling framework called PEARL is introduced to examine a class of embedded system applications relevant to mobile, airborne vehicles. The analysis shows that resilience constraints limit achievable efficiency and that higher variability in power dissipation across workflow segments provides more opportunities to boost efficiency, despite stringent resilience constraints.
Abstract: Low-power embedded processing relies on dynamic voltage-frequency scaling (DVFS) in order to optimize energy usage and therefore battery life. DVFS allows the processor to continuously adapt voltage and frequency to the minimum that still meets a program's current performance requirements. However, low-voltage operation exacerbates the incidence of soft errors. Similarly, high-voltage operation (to meet real-time deadlines) is constrained by power dissipation (and associated thermal) maxima, as dictated by aging limits. In this paper, we introduce a novel modeling framework called PEARL to examine a class of embedded system applications relevant to mobile, airborne vehicles. Using PEARL, we investigate the problem of assigning optimal voltage-frequency settings to individual segments within example workflows. The goal of this study is to understand the limits of achievable energy efficiency (performance per watt) under varying levels of system-resilience targets. The analysis results show that: (a) the resilience constraints limit achievable efficiency; and (b) higher variability in power dissipation across workflow segments provides more opportunities to boost efficiency, despite stringent resilience constraints.

8 citations
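
The problem described in the abstract above, choosing a voltage-frequency setting for each workflow segment, can be illustrated with a toy model. The sketch below is not PEARL: it assumes energy per segment scales as C·V²·cycles, treats a minimum-voltage floor as a stand-in for a resilience target (since low voltage worsens soft errors), and brute-forces all assignments that meet a deadline. The setting table, capacitance value, and function name are all hypothetical.

```python
from itertools import product

# Hypothetical (volts, hertz) operating points.
SETTINGS = [(0.8, 0.5e9), (1.0, 1.0e9), (1.2, 1.5e9)]


def assign_settings(segment_cycles, deadline_s, min_volts, capacitance=1e-9):
    """Return (energy_joules, per-segment settings) with the lowest energy that
    meets the deadline and the voltage floor, or None if no assignment fits."""
    allowed = [s for s in SETTINGS if s[0] >= min_volts]
    best = None
    for choice in product(allowed, repeat=len(segment_cycles)):
        # Execution time of a segment is its cycle count divided by frequency.
        total_time = sum(c / f for c, (_, f) in zip(segment_cycles, choice))
        if total_time > deadline_s:
            continue
        # Dynamic energy model: C * V^2 per cycle, independent of frequency.
        energy = sum(capacitance * v * v * c
                     for c, (v, _) in zip(segment_cycles, choice))
        if best is None or energy < best[0]:
            best = (energy, choice)
    return best


relaxed = assign_settings([1e9, 1e9], deadline_s=3.0, min_volts=0.8)
strict = assign_settings([1e9, 1e9], deadline_s=3.0, min_volts=1.0)
print(relaxed[0] < strict[0])
```

In this toy setup, raising the voltage floor removes the lowest-energy operating point, so the best feasible energy rises, which mirrors finding (a) in the abstract. Exhaustive search is only practical for a handful of segments; it is used here for clarity.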

Journal ArticleDOI
TL;DR: This work develops a strategy to simplify the pipeline, in which the square calculation is performed by the DSP48E1 of Xilinx 7 series FPGAs so as to reduce the logic resource utilization of each pipeline, and it exploits the particle-mesh scheme to overcome the bandwidth bottleneck.
Abstract: As a modified-gravity proposal to handle the dark matter problem on galactic scales, Modified Newtonian Dynamics (MOND) has shown great success. However, N-body MOND simulation is severely challenged by its computational complexity, which calls for acceleration of the simulation calculation. In this paper, we present a highly integrated accelerating solution for N-body MOND simulations. By using the FPGA-SoC, which integrates both an FPGA and an SoC (system on chip) in one chip, our solution exhibits potential for better performance, higher integration, and lower power consumption. To handle the calculation bottleneck of potential summation, on the one hand, we develop a strategy to simplify the pipeline, in which the square calculation is performed by the DSP48E1 of Xilinx 7 series FPGAs, so as to reduce the logic resource utilization of each pipeline; on the other hand, we take advantage of the particle-mesh scheme to overcome the bandwidth bottleneck. Our experimental results show that 2 more pipelines can be integrated in the Zynq-7020 FPGA-SoC with the simplified pipeline, and the bandwidth requirement is reduced significantly. Furthermore, our accelerating solution has a full range of advantages over different processors. Compared with a GPU, our work is about 10 times better in performance per watt and 50% better in performance per cost.

8 citations

Proceedings ArticleDOI
18 Dec 2010
TL;DR: This paper provides an autonomic power management scheme for the resource provisioning process for large-scale data centers while meeting the Service-Level Agreement (SLA) and power requirements.
Abstract: The dramatic fluctuation in resource provisioning for real-time applications calls for an elastic delivery of computing services. Current data center deployment schemes, which feature a strong tie between servers and applications, are increasingly challenged to ensure power efficiency in terms of multiple peak loads provisioning, optimal average resources utilization, variable runtime workloads profiling, data center manageability, and overhead control on the data center Total Cost of Ownership (TCO). Researchers have exploited paradigms such as virtualization and migration for large-scale computing systems; however, there is still a long way to go before we can optimally address the power-performance trade-off. This paper provides an autonomic power management scheme for the resource provisioning process for large-scale data centers while meeting the Service-Level Agreement (SLA) and power requirements. The system status is continuously monitored using a cross-layered hierarchy to optimally scale the virtual machine resources up and down such that power and performance can be optimized. We have applied our technique to autonomically manage high-performance platforms with multi-core processors and multi-rank memory subsystems. Our experimental results show around 56.25 percent platform energy savings for memory-intensive workload, 63.75 percent platform energy savings for processor-intensive workload, and 47.5 percent platform energy savings for mixed workload while maintaining

8 citations

Network Information
Related Topics (5)
Cache
59.1K papers, 976.6K citations
81% related
Benchmark (computing)
19.6K papers, 419.1K citations
80% related
Programming paradigm
18.7K papers, 467.9K citations
77% related
Compiler
26.3K papers, 578.5K citations
77% related
Scalability
50.9K papers, 931.6K citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2021  14
2020  15
2019  15
2018  36
2017  25
2016  31