Topic

Performance per watt

About: Performance per watt is a research topic. Over the lifetime, 315 publications have been published within this topic receiving 5778 citations.


Papers
Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper optimizes a widely used kernel, the radial basis function in a support vector machine, as a case study to evaluate the potential of FPGAs and the capabilities of high-level synthesis (HLS) for data-intensive applications.
Abstract: In this paper, we optimize a widely used kernel, radial basis function, in a support vector machine as a case study to evaluate the potential of using FPGAs and the capabilities of high-level synthesis (HLS) for data intensive applications. We explain the HLS flow, and use it to develop and evaluate the kernels optimized with vectorization, loop unrolling, and half-precision storage format. Our optimizations improve the kernel performance by a factor of 15.8 compared to a baseline kernel on the Nallatech 385A FPGA card that features an Intel Arria 10 GX 1150 FPGA. The half storage format can reduce the DSP and memory utilizations at the cost of increasing the logic utilization. Compared to the single-precision floating-point kernels, the half-precision kernels can reduce the dynamic power consumption on the FPGA by approximately 30%. In terms of energy efficiency, the performance per watt on the FPGA platform is approximately 3X higher than that on an Intel Xeon 16-core CPU, and 1.8X higher than that on an Nvidia Tesla K80 GPU. On the other hand, the raw performance on the FPGA is approximately 2X and 2.7X lower than that on the CPU and GPU, respectively.
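The abstract's relative figures imply the platforms' power draw, since performance per watt is just throughput divided by power. A small sketch of that arithmetic (illustrative only; the absolute throughput and power numbers are not given in the abstract):

```python
# Illustrative arithmetic, not from the paper: deriving relative power draw
# from the reported relative performance and performance-per-watt figures.
# Since perf_per_watt = performance / power, it follows that
# relative_power = relative_performance / relative_perf_per_watt.

def relative_power(rel_perf, rel_perf_per_watt):
    """Power of a platform relative to the FPGA baseline (FPGA = 1.0)."""
    return rel_perf / rel_perf_per_watt

# Reported in the abstract, with the FPGA as the 1.0 baseline:
cpu_rel_perf = 2.0          # CPU raw performance is ~2x the FPGA's
gpu_rel_perf = 2.7          # GPU raw performance is ~2.7x the FPGA's
cpu_rel_ppw = 1 / 3.0       # FPGA perf/watt is ~3x the CPU's
gpu_rel_ppw = 1 / 1.8       # FPGA perf/watt is ~1.8x the GPU's

print(relative_power(cpu_rel_perf, cpu_rel_ppw))  # CPU draws ~6x the FPGA's power
print(relative_power(gpu_rel_perf, gpu_rel_ppw))  # GPU draws ~4.9x the FPGA's power
```

So the FPGA's efficiency win comes from its much lower power draw outweighing its lower raw throughput.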

1 citation

Book ChapterDOI
08 Jan 2016
TL;DR: This chapter presents a novel, queueing theory-based modeling technique for evaluating multicore embedded architectures that does not require architectural-level benchmark simulation, and proposes a method to quantify the computing requirements of real benchmarks probabilistically.
Abstract: This chapter presents a novel, queueing theory-based modeling technique for evaluating multicore embedded architectures that do not require architectural-level benchmark simulation. This modeling technique enables quick and inexpensive architectural evaluation, with respect to design time and resources, as compared to developing and/or using existing multicore simulators and running benchmarks on these simulators. Based on a preliminary evaluation using the models, architectural designers can run targeted benchmarks to verify the performance characteristics of selected multicore architectures. The chapter proposes a method to quantify computing requirements of real benchmarks probabilistically. The modeling technique provides performance evaluation for workloads with any computing requirements as opposed to simulation-driven architectural evaluation that can provide performance results for specific benchmarks. The queueing theoretic modeling approach can be used for performance per watt and performance per unit area characterizations of multicore embedded architectures, with varying number of processor cores and cache configurations, to provide a comparative analysis.
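The core idea of queueing-theoretic architectural evaluation can be illustrated with the simplest such model; the chapter's actual models are more elaborate, so the following is only a minimal M/M/1 sketch under assumed arrival and service rates:

```python
# Minimal M/M/1 queueing sketch (illustrative; the chapter's models are
# richer). Each core is treated as a server with service rate mu, and a
# workload's computing demand as a Poisson arrival rate lam.

def mm1_metrics(lam, mu):
    """Return (utilization, mean response time) for an M/M/1 queue."""
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    rho = lam / mu              # core utilization
    resp = 1.0 / (mu - lam)    # mean time a request spends in the system
    return rho, resp

# Compare two hypothetical core configurations for the same workload,
# without simulating a single benchmark instruction:
print(mm1_metrics(lam=8.0, mu=10.0))  # slower core: (0.8, 0.5)
print(mm1_metrics(lam=8.0, mu=16.0))  # faster core: (0.5, 0.125)
```

Models like this make the comparative evaluation cheap: changing a core or cache configuration only changes the model's rates, not a simulation run.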

1 citation

Proceedings ArticleDOI
02 Sep 2011
TL;DR: This paper proposes a new metric, ASTPI (Average Stall Time Per Instruction), and designs, implements, and evaluates ESHMP, an online monitoring approach based on that metric; the evaluation shows that ESHMP delivers scalability while adapting to a wide variety of applications.
Abstract: Recent research advocates performance heterogeneous multicore processors, where cores in the same processor have same instruction set architecture (ISA) but often different performance characteristics. These architectures are able to deliver higher performance per watt and area for programs with diverse architectural requirements than comparable homogeneous ones. However, such power and area efficiencies of performance heterogeneous multicore systems can only be accomplished when thread-to-core assignment is made according to the characteristics of both the workload and the core. In this paper, we propose a new metric, ASTPI (Average Stall Time Per Instruction), to measure the properties of threads. We design, implement and evaluate a new online monitoring approach called ESHMP, which is based on the metric. Our evaluation in the Linux 2.6.21 operating system shows that ESHMP delivers scalability while adapting to a wide variety of applications.
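The intuition behind a stall-based metric is that threads that rarely stall gain the most from a fast core. A hedged sketch of that idea (the counter names, threshold-free greedy policy, and numbers below are illustrative, not the paper's actual ESHMP algorithm):

```python
# Illustrative sketch of thread-to-core assignment driven by average stall
# time per instruction. Values are hypothetical hardware-counter readings.

def astpi(stall_cycles, instructions):
    """Average stall time per instruction from counter readings."""
    return stall_cycles / instructions

def assign_cores(threads, n_fast):
    """Greedy illustration: give the n_fast lowest-ASTPI threads fast cores,
    since compute-bound threads exploit a fast core's extra throughput."""
    ranked = sorted(threads, key=lambda t: astpi(t["stalls"], t["insns"]))
    return {t["name"]: ("fast" if i < n_fast else "slow")
            for i, t in enumerate(ranked)}

threads = [
    {"name": "cpu_bound", "stalls": 1_000_000,  "insns": 50_000_000},
    {"name": "mem_bound", "stalls": 40_000_000, "insns": 20_000_000},
]
print(assign_cores(threads, n_fast=1))  # cpu_bound -> fast, mem_bound -> slow
```

The memory-bound thread spends most of its time waiting on memory either way, so parking it on a slow core costs little performance while saving power.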

Posted Content
TL;DR: In this paper, the authors propose an information-theoretic framework referred to as PaRMIS to create Pareto-optimal resource management policies for given target applications and design objectives.
Abstract: Mobile system-on-chips (SoCs) are growing in their complexity and heterogeneity (e.g., Arm's Big-Little architecture) to meet the needs of emerging applications, including games and artificial intelligence. This makes it very challenging to optimally manage the resources (e.g., controlling the number and frequency of different types of cores) at runtime to meet the desired trade-offs among multiple objectives such as performance and energy. This paper proposes a novel information-theoretic framework referred to as PaRMIS to create Pareto-optimal resource management policies for given target applications and design objectives. PaRMIS specifies parametric policies to manage resources and learns statistical models from candidate policy evaluation data in the form of target design objective values. The key idea is to select a candidate policy for evaluation in each iteration guided by statistical models that maximize the information gain about the true Pareto front. Experiments on a commercial heterogeneous SoC show that PaRMIS achieves better Pareto fronts and is easily usable to optimize complex objectives (e.g., performance per Watt) when compared to prior methods.
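At the heart of any Pareto-optimal policy search is the dominance test over objective vectors. A minimal sketch of that building block for a (performance, energy) trade-off (PaRMIS's information-theoretic candidate selection is not shown; the policy values are hypothetical):

```python
# Illustrative Pareto-dominance filter for resource-management policies
# evaluated on two objectives: performance (higher is better) and
# energy (lower is better).

def dominates(a, b):
    """True if policy a is at least as good as b on both objectives
    and strictly better on at least one."""
    perf_a, energy_a = a
    perf_b, energy_b = b
    return (perf_a >= perf_b and energy_a <= energy_b and
            (perf_a > perf_b or energy_a < energy_b))

def pareto_front(points):
    """Keep only the points no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# (performance, energy) for hypothetical candidate policies:
policies = [(10, 5), (12, 7), (9, 4), (12, 6), (8, 8)]
print(pareto_front(policies))  # [(10, 5), (9, 4), (12, 6)]
```

Every policy on the front represents a distinct, defensible trade-off; the framework's job is to find this front with as few costly policy evaluations as possible.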

Book ChapterDOI
15 Sep 2020
TL;DR: NuPow as discussed by the authors is a hierarchical scheduling and power management framework for architectures with multiple cores per voltage and frequency domain and non-uniform memory access (NUMA) properties.
Abstract: Power management and task placement pose two of the greatest challenges for future many-core processors in data centers. With hundreds of cores on a single die, cores experience varying memory latencies and cannot individually regulate voltage and frequency, therefore calling for new approaches to scheduling and power management. This work presents NuPow, a hierarchical scheduling and power management framework for architectures with multiple cores per voltage and frequency domain and non-uniform memory access (NUMA) properties. NuPow considers the conflicting goals of grouping virtual machines (VMs) with similar load patterns while also placing them as close as possible to the accessed data. Implemented and evaluated on existing hardware, NuPow achieves significantly better performance per watt compared to competing approaches.
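The conflicting goals the abstract describes can be captured as a placement cost with two terms: load-pattern mismatch within a voltage/frequency domain, and distance to the accessed data. A hedged sketch (the weights, scoring formula, and numbers are illustrative, not NuPow's actual policy):

```python
# Illustrative placement cost balancing NuPow's two goals: group VMs with
# similar load patterns in one voltage/frequency domain, and keep each VM
# close to its data (low NUMA distance). Lower cost is better.

def placement_cost(vm_load, domain_loads, numa_dist, alpha=1.0, beta=1.0):
    """Load mismatch within the domain plus weighted memory distance."""
    mismatch = sum(abs(vm_load - l) for l in domain_loads) / max(len(domain_loads), 1)
    return alpha * mismatch + beta * numa_dist

def best_domain(vm_load, domains):
    """domains: list of (loads_of_resident_vms, numa_distance) pairs."""
    costs = [placement_cost(vm_load, loads, dist) for loads, dist in domains]
    return costs.index(min(costs))

domains = [
    ([0.9, 0.8], 2),  # busy domain, far from the VM's data
    ([0.2, 0.3], 0),  # lightly loaded domain, local to the data
]
print(best_domain(0.25, domains))  # 1: similar load AND close to the data
```

When the two goals conflict (e.g. the similar-load domain is the distant one), the weights decide the trade-off, which is exactly the tension a hierarchical framework has to resolve.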

Network Information
Related Topics (5)
Cache
59.1K papers, 976.6K citations
81% related
Benchmark (computing)
19.6K papers, 419.1K citations
80% related
Programming paradigm
18.7K papers, 467.9K citations
77% related
Compiler
26.3K papers, 578.5K citations
77% related
Scalability
50.9K papers, 931.6K citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    14
2020    15
2019    15
2018    36
2017    25
2016    31