Topic

Performance per watt

About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5778 citations.


Papers
01 Jan 2006
TL;DR: The results show that the algorithm, by incorporating physical interaction of the cores, consistently succeeds in maximizing the operating frequency of the most critical core while successfully relieving the thermal emergency of the core.
Abstract: Physical phenomena such as temperature and power play an increasingly important role in the performance and reliability of modern process technologies. This trend will only strengthen with future generations. In this dissertation we present mechanisms for thermal management and power optimization of integrated circuits. We present three thermal-aware high-level synthesis techniques for peak temperature reduction targeting the ASIC design flow. Decisions made during high-level synthesis impact the activity of functional resources and their power consumption. Power consumed is dissipated as heat. The first approach consists of two constructive temperature-aware resource allocation and binding algorithms: temperature-constrained resource minimization and resource-constrained temperature minimization. The second technique is an iterative temperature-aware binding algorithm, which evenly distributes activity across functional units. The third mechanism combines temperature-aware scheduling and binding based on feedback from post-floorplan thermal simulation. Our techniques are effective in reducing peak temperature, leakage, and total power consumption. In order to maintain performance per Watt in microprocessors, there is a shift toward the chip-level multiprocessing paradigm. With such large-scale integration and increasing power densities, Dynamic Thermal Management (DTM) continues to be a significant design effort to maintain performance and reliability. We present two mechanisms to perform real-time frequency scaling as part of dynamic frequency and voltage scaling to assist DTM. The results show that our algorithm, by incorporating physical interaction of the cores, consistently succeeds in maximizing the operating frequency of the most critical core while successfully relieving the thermal emergency of the core. DTM techniques rely on accurate readings of on-die thermal sensors.
Next, we present novel techniques for determining the optimal locations and allocations for thermal sensors to provide a high-fidelity thermal profile of a complex microprocessor system. We show that our tool is able to create a sensor distribution for a given microprocessor architecture that provides accurate thermal measurements. Increased logic density and programmability of FPGAs cause high power dissipation and on-chip temperature. We present techniques for placement and minimization of sensors, which can then be mapped onto an FPGA post-fabrication for thermal monitoring, and power-driven netlist partitioning for realizing low-power FPGAs.

1 citation

Proceedings Article
01 Oct 2016
TL;DR: This paper proposes a novel online program phase detection technique based on the frequency of cache misses and processor stalls, which correspond to core resource bottlenecks, and demonstrates as much as 22% improvement in average performance/Watt using Instructions per Second (IPS) as the performance metric.
Abstract: Heterogeneous architectures offer the promise of higher performance/Watt compared to symmetric multi-cores. Recent works have proposed the use of non-monotonic (NM) heterogeneous architectures with diverse core types where each core has unique power and performance characteristics. However, the power and performance benefits achieved by NM architectures are highly dependent on assigning the application to the most suitable core type for all program phases. In this paper we propose a novel online program phase detection technique that is based on the frequency of cache misses and processor stalls, which correspond to core resource bottlenecks. We track performance monitors to formulate a Bottleneck Type Vector (BTV) that helps direct the application to the most appropriate core type for execution. We compare the proposed BTV-based core assignment method to prior online core assignment approaches and demonstrate as much as 22% improvement in average performance/Watt using Instructions per Second (IPS) as the performance metric.

1 citation
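The abstract above does not specify how the Bottleneck Type Vector is encoded, so the following is only a rough sketch of the general idea: turn per-interval counter readings into a vector of bottleneck flags and look up a core type for it. The event names, thresholds, core type names, and the lookup table are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical events and thresholds (events per 1000 instructions).
THRESHOLDS: Dict[str, float] = {"cache_miss": 2.0, "stall": 300.0}

# Hypothetical mapping from bottleneck flags to a core type name.
# Flag order follows sorted(THRESHOLDS): ("cache_miss", "stall").
CORE_FOR_VECTOR: Dict[Tuple[bool, ...], str] = {
    (False, False): "small core",
    (True, False): "large-cache core",
    (False, True): "wide-issue core",
    (True, True): "big core",
}


def bottleneck_vector(counters: Dict[str, int], instructions: int) -> Tuple[bool, ...]:
    """Flag each event whose rate per 1000 instructions exceeds its threshold."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return tuple(
        counters[event] * 1000 / instructions > THRESHOLDS[event]
        for event in sorted(THRESHOLDS)
    )


def pick_core(counters: Dict[str, int], instructions: int) -> str:
    """Look up the core type for the current interval's bottleneck flags."""
    return CORE_FOR_VECTOR[bottleneck_vector(counters, instructions)]


# One sampling interval: 5 cache misses and 200 stall cycles per 1000 instructions.
sample = {"cache_miss": 5_000, "stall": 200_000}
vector = bottleneck_vector(sample, 1_000_000)
chosen = pick_core(sample, 1_000_000)
print(vector, chosen)
```

A real scheduler would re-evaluate the vector every interval and would need some hysteresis to avoid migrating on every small fluctuation; that is left out here.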

Journal Article
TL;DR: This paper analyzes disk I/O performance by assessing the disk bandwidth and latency for different read and write configurations for sequential and random patterns and proposes an estimation method which estimates disk latency at different disk queue depth settings.
Abstract: Disk I/O performance and power consumption associated with a given cloud workload are important, especially for workloads that are bounded by disk I/O. Disk performance becomes a bottleneck for achieving higher performance and lower power consumption, especially when memory size is not enough to process large blocks of data. This has a negative impact on the Quality-of-Service (QoS). In this paper, we analyze disk I/O performance by assessing the disk bandwidth and latency for different read and write configurations for sequential and random patterns. The systems used are based on ATOM D525 and Xeon X5660 processors. We analyze power consumption for both systems and provide a performance-per-watt optimum operation point. We also propose an estimation method which estimates disk latency at different disk queue depth settings. The estimation method is verified to estimate disk latency with < 5% error margin. Keywords: I/O performance, performance-per-watt analysis, cloud computing

1 citation
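As an illustration of what a "performance-per-watt optimum operation point" means in the abstract above, the sketch below picks the configuration with the highest bandwidth per watt from a set of measurements. The `Measurement` type, the configuration labels, and every number are made-up placeholders, not results from the paper, and the paper's own selection procedure may differ.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    label: str              # hypothetical configuration name
    bandwidth_mb_s: float   # measured disk bandwidth
    power_watts: float      # measured system power


def perf_per_watt(m: Measurement) -> float:
    """Bandwidth delivered per watt of power drawn."""
    return m.bandwidth_mb_s / m.power_watts


def optimum_operating_point(measurements: List[Measurement]) -> Measurement:
    """Return the measurement with the highest performance per watt."""
    return max(measurements, key=perf_per_watt)


# Placeholder data for illustration only.
measurements = [
    Measurement("queue depth 1", 40.0, 20.0),
    Measurement("queue depth 8", 90.0, 30.0),
    Measurement("queue depth 32", 100.0, 40.0),
]

best = optimum_operating_point(measurements)
print(best.label, perf_per_watt(best))
```

Note that the highest-bandwidth configuration is not necessarily the optimum: in this placeholder data the deepest queue delivers the most bandwidth but draws disproportionately more power.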

Patent
07 Aug 2014
TL;DR: The patent describes identifying and enabling the optimum set of processor cores to achieve the best performance at the lowest power consumption level, or within a given power budget, for a given workload.
Abstract: Various aspects provide a device and method for the intelligent control of a plurality of processor cores of a multi-core integrated circuit. The aspects may identify and enable the optimum set of processor cores in order to achieve the best performance for the lowest power consumption level, or for a given power budget, for a given workload. The optimal set of processor cores may be specified as a number of active processor cores or as a designation of specific active processor cores. If the temperature reading of the processor cores is under a threshold, a set of processor cores may be selected to provide the lowest power consumption for a given workload. If the temperature reading of the processor cores is above the threshold, a set of processor cores may be selected to provide the best performance for a given power budget.

1 citation
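The two-branch selection rule in the patent abstract above can be sketched roughly as follows. This is one possible reading, not the patented method: the `CoreSet` type, the idea of a per-workload performance requirement, and all numbers are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CoreSet:
    name: str            # hypothetical label for a set of active cores
    performance: float   # estimated throughput for the workload
    power_watts: float   # estimated power draw for the workload


def select_core_set(
    candidates: List[CoreSet],
    temperature_c: float,
    threshold_c: float,
    required_performance: float,
    power_budget_watts: float,
) -> Optional[CoreSet]:
    """Pick a core set using a temperature-dependent objective.

    Below the threshold: lowest power among sets that meet the workload's
    performance requirement. At or above it: best performance among sets
    that fit the power budget. Returns None if no candidate qualifies.
    """
    if temperature_c < threshold_c:
        feasible = [c for c in candidates if c.performance >= required_performance]
        return min(feasible, key=lambda c: c.power_watts, default=None)
    feasible = [c for c in candidates if c.power_watts <= power_budget_watts]
    return max(feasible, key=lambda c: c.performance, default=None)


# Placeholder candidates for illustration only.
candidates = [
    CoreSet("1 little", 10.0, 1.0),
    CoreSet("2 little", 18.0, 1.8),
    CoreSet("1 big", 25.0, 4.0),
    CoreSet("2 big", 45.0, 7.5),
]

cool = select_core_set(candidates, 60.0, 80.0, 20.0, 2.0)
hot = select_core_set(candidates, 90.0, 80.0, 20.0, 2.0)
print(cool.name, hot.name)
```

In the cool case the rule minimizes power subject to the workload requirement; in the hot case it maximizes performance subject to the power budget, which is why the two calls return different sets for the same candidates.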

01 Jan 2009
TL;DR: Proof-of-concept testing and total cost of ownership (TCO) analysis were conducted and seamless live migration between servers based on Intel Xeon processor 5500 series and previous Intel processor generations was verified using VMware Enhanced VMotion* and Intel Virtualization Technology FlexMigration assist.
Abstract: Intel IT, together with Intel’s Digital Enterprise Group, End User Platform Integration, and Intel’s Software and Services Group, conducted proof-of-concept testing and total cost of ownership (TCO) analysis to assess the virtualization capabilities of Intel® Xeon® processor 5500 series. A server based on Intel® Xeon® processor X5570 delivered up to 2.6x the performance and up to 2.05x the performance per watt of a server based on Intel® Xeon® processor E5450, resulting in the ability to support approximately twice as many virtual machines for the same TCO. We also verified seamless live migration between servers based on Intel Xeon processor 5500 series and previous Intel® processor generations using VMware Enhanced VMotion* and Intel® Virtualization Technology FlexMigration assist.

1 citation
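The two ratios quoted in the abstract above can be related by simple arithmetic: since performance per watt is performance divided by power, the ratio of power draw is the performance ratio divided by the performance-per-watt ratio. The sketch below applies this to the quoted 2.6x and 2.05x figures. It assumes both figures describe the same measurement, which the abstract does not state (both are "up to" values), so the result is only indicative.

```python
def implied_power_ratio(performance_ratio: float, perf_per_watt_ratio: float) -> float:
    """Power ratio implied by a performance ratio and a performance-per-watt ratio.

    perf_per_watt_ratio = performance_ratio / power_ratio, so
    power_ratio = performance_ratio / perf_per_watt_ratio.
    """
    if perf_per_watt_ratio <= 0:
        raise ValueError("perf_per_watt_ratio must be positive")
    return performance_ratio / perf_per_watt_ratio


# Figures quoted in the abstract; pairing them is an assumption.
power_ratio = implied_power_ratio(2.6, 2.05)
print(f"implied power ratio: {power_ratio:.2f}x")
```

Under that assumption, the newer server would draw roughly 1.27 times the power of the older one while delivering 2.6 times the performance.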

Network Information
Related Topics (5)
Cache
59.1K papers, 976.6K citations
81% related
Benchmark (computing)
19.6K papers, 419.1K citations
80% related
Programming paradigm
18.7K papers, 467.9K citations
77% related
Compiler
26.3K papers, 578.5K citations
77% related
Scalability
50.9K papers, 931.6K citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    14
2020    15
2019    15
2018    36
2017    25
2016    31