Topic
Performance per watt
About: Performance per watt is a research topic. Over its lifetime, 315 publications have been published within this topic, receiving 5778 citations.
Papers published on a yearly basis
Papers
01 Dec 2013
TL;DR: The challenges faced by software programmers when using HLS to implement computing kernels within FPGAs are explored, and the specific new knowledge and skills required by programmers to succeed at the task are identified.
Abstract: For many application-specific computations, FPGA-based computing systems have been shown to deliver better performance per watt than many general-purpose architectures. However, the benefits of FPGA-based computing are difficult to exploit since FPGAs are challenging to program and require advanced hardware design skills. Recent developments in High Level Synthesis (HLS) provide the ability to create FPGA compute accelerators entirely in `C' code. Because the circuits are described in `C', it may be possible for software programmers to “program” FPGA accelerator circuits. This paper explores the challenges faced by software programmers when using HLS to implement computing kernels within FPGAs and identifies the specific new knowledge and skills required by these programmers to succeed at the task. A high-performance Sobel edge-detection acceleration core is developed and used to demonstrate the use of the Vivado HLS tool. A variety of simple directives and code restructuring steps are applied to demonstrate a variety of Sobel edge-detection accelerators that vary in performance from 10.9 frames per second (fps) to 388 fps. The concepts outlined in this paper suggest that with proper training, software programmers are able to create a wide range of FPGA acceleration circuits.
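The Sobel computation at the heart of the accelerator can be sketched in plain software form. This is a hedged reference sketch in Python, not the paper's Vivado HLS `C' code; the function and variable names are illustrative, and the paper's accelerators additionally apply HLS directives (pipelining, unrolling) that have no software equivalent here.

```python
# Software reference for a 3x3 Sobel edge-detection kernel: the
# computation an HLS tool would pipeline into an FPGA datapath.
# Names are illustrative, not taken from the paper.

def sobel(image):
    """Return gradient magnitudes for a grayscale image (list of rows)."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical kernel
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = image[y + i - 1][x + j - 1]
                    gx += gx_k[i][j] * p
                    gy += gy_k[i][j] * p
            # |gx| + |gy| approximates the true gradient magnitude
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```

The nested loops over the 3x3 window are exactly what HLS `PIPELINE` and `UNROLL` directives target: the fps range reported in the paper comes from restructuring this loop nest, not from changing the math.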
19 citations
01 Jun 2018
TL;DR: A dynamic cloud resource management (DCRM) policy to improve the quality of service (QoS) in multimedia mobile computing is proposed, and experimental results show that DCRM behaves better in both response time and QoS, thus proving that DCRM is good at shared resource management in mobile media cloud computing.
Abstract: Single-instruction-set architecture (Single-ISA) heterogeneous multi-core processors (HMPs) are superior to symmetric multi-core processors in performance per watt. They are popular in many aspects of the Internet of Things, including mobile multimedia cloud computing platforms. A Single-ISA HMP integrates both fast out-of-order cores and slower, simpler cores, while all cores share the same ISA. Quality of service (QoS) is most important for virtual machine (VM) resource management in multimedia mobile computing, particularly in Single-ISA heterogeneous multi-core cloud computing platforms. Therefore, in this paper, we propose a dynamic cloud resource management (DCRM) policy to improve the QoS in multimedia mobile computing. DCRM dynamically and optimally partitions shared resources according to service or application requirements. Moreover, DCRM combines resource-aware VM allocation to maximize the effectiveness of the heterogeneous multi-core cloud platform. The basic idea behind this performance improvement is to balance the shared resource allocations with the applications' resource requirements. The experimental results show that DCRM behaves better in both response time and QoS, thus proving that DCRM is good at shared resource management in mobile media cloud computing.
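The abstract does not give DCRM's partitioning algorithm; as a purely illustrative sketch, a demand-proportional split of a shared resource (e.g. cache ways or memory bandwidth units across VMs) could look like the following. All names and the rounding policy are assumptions, not the paper's method.

```python
# Illustrative demand-proportional partitioning of a shared resource
# across VMs, in the spirit of (but not taken from) the DCRM policy.

def partition_resource(total_units, demands):
    """Split total_units across VMs proportionally to their demands,
    giving every VM at least one unit."""
    total_demand = sum(demands.values())
    shares = {vm: max(1, round(total_units * d / total_demand))
              for vm, d in demands.items()}
    # Fix rounding drift so shares sum to exactly total_units,
    # charging the difference to the most demanding VM.
    drift = total_units - sum(shares.values())
    if drift:
        top = max(demands, key=demands.get)
        shares[top] += drift
    return shares
```

A dynamic policy would re-run such a partition whenever measured service requirements change, which is the "dynamically and optimally partitions" behavior the abstract describes.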
19 citations
28 Aug 2017
TL;DR: This paper characterizes the NVIDIA Jetson TK1 and TX1 platforms using Roofline models obtained through an empirical measurement-based approach and through a case study of a heterogeneous application (matrix multiplication).
Abstract: This study characterizes the NVIDIA Jetson TK1 and TX1 platforms, both built on an NVIDIA Tegra System on Chip combining a quad-core ARM CPU and an NVIDIA GPU. Their heterogeneous nature, as well as their wide operating frequency range, makes it hard for application developers to reason about performance and determine which optimizations are worth pursuing. This paper attempts to inform developers' choices by characterizing the platforms' performance using Roofline models obtained through an empirical measurement-based approach as well as through a case study of a heterogeneous application (matrix multiplication). Our results highlight a difference of more than an order of magnitude in compute performance between the CPU and GPU on both platforms. Given that the CPU and GPU share the same memory bus, their Roofline models' balance points are also more than an order of magnitude apart. We also explore the impact of frequency scaling: we build CPU and GPU Roofline profiles and characterize both platforms' balance point variation, power consumption, and performance per watt as frequency is scaled.
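The Roofline relationships the abstract relies on can be written down directly: attainable performance is capped by the lower of the compute roof and the memory roof, and the balance point is the arithmetic intensity where the two meet. The numbers below are made-up illustrations, not the paper's measured TK1/TX1 values.

```python
# Roofline model basics.  Attainable performance is the minimum of peak
# compute and (arithmetic intensity x memory bandwidth); the balance
# point is the intensity where the two roofs intersect.

def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline ceiling for a kernel with the given arithmetic
    intensity (FLOPs per byte moved)."""
    return min(peak_gflops, intensity * bandwidth_gbs)

def balance_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOPs/byte) above which a kernel stops
    being memory-bound on this platform."""
    return peak_gflops / bandwidth_gbs

# With a CPU/GPU pair sharing one memory bus, an order-of-magnitude gap
# in peak compute implies an order-of-magnitude gap in balance points
# (illustrative figures only):
cpu_balance = balance_point(peak_gflops=20.0, bandwidth_gbs=10.0)   # 2.0
gpu_balance = balance_point(peak_gflops=300.0, bandwidth_gbs=10.0)  # 30.0
```

This is why the shared memory bus matters in the abstract's observation: the same bandwidth in the denominator pushes the GPU's balance point far to the right, so many kernels that are compute-bound on the CPU remain memory-bound on the GPU.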
18 citations
13 Apr 2015
TL;DR: This work proposes ACFS, an asymmetry-aware completely fair scheduler that seeks to optimize fairness while ensuring acceptable throughput, and demonstrates that ACFS achieves an average 11% fairness improvement over state-of-the-art schemes, while providing better system throughput.
Abstract: Single-ISA (instruction set architecture) asymmetric multicore processors (AMPs) were shown to deliver higher performance per watt and area than symmetric CMPs (Chip Multi-Processors) for applications with diverse architectural requirements. A large body of work has demonstrated that this potential of AMP systems can be realized via OS scheduling. Yet, existing schedulers that seek to deliver fairness on AMPs do not ensure that equal-priority applications experience the same slowdown when sharing the system. Moreover, most of these schemes are also subject to high throughput degradation and fail to effectively deal with user priorities. In this work we propose ACFS, an asymmetry-aware completely fair scheduler that seeks to optimize fairness while ensuring acceptable throughput. Our evaluation on real AMP hardware, and using scheduler implementations on a general-purpose OS, demonstrates that ACFS achieves an average 11% fairness improvement over state-of-the-art schemes, while providing better system throughput.
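The fairness goal the abstract states, equal-priority applications experiencing the same slowdown, is commonly quantified with per-application slowdown ratios. The sketch below uses one common formulation (max/min slowdown); the paper's exact metric may differ, and the workload names and times are invented.

```python
# Fairness accounting for AMP scheduling (illustrative formulation):
# slowdown of each application relative to running alone on the system,
# and unfairness as the max/min slowdown ratio across applications.

def slowdown(shared_time, alone_time):
    """How much longer an app runs when sharing the AMP vs. alone."""
    return shared_time / alone_time

def unfairness(slowdowns):
    """1.0 means perfectly fair: every app is slowed equally."""
    return max(slowdowns) / min(slowdowns)

apps = {"mcf": slowdown(150.0, 100.0),     # 1.5x slower when sharing
        "bzip2": slowdown(120.0, 100.0)}   # 1.2x slower when sharing
print(unfairness(apps.values()))  # 1.25
```

A fairness-oriented scheduler like ACFS drives this ratio toward 1.0, e.g. by granting big-core time preferentially to the application currently suffering the larger slowdown.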
18 citations
12 Jul 2017
TL;DR: This paper reviews recent work using the Intel FPGA SDK for OpenCL and its optimization strategies, evaluating the framework for the design of a hyperspectral image spatial-spectral classifier accelerator, and shows how reasonable speedups are obtained on a device with scarce computing and embedded memory resources.
Abstract: Current computational demands require increasing designers' efficiency and system performance per watt. A broadly accepted solution for implementing efficient accelerators is reconfigurable computing. However, typical HDL methodologies require very specific skills and a considerable amount of designers' time. Despite new approaches to high-level synthesis like OpenCL, given the large heterogeneity in today's devices (manycore CPUs, GPUs, FPGAs), there is no one-size-fits-all solution, so to maximize performance, platform-driven optimization is needed. This paper reviews recent work using the Intel FPGA SDK for OpenCL and its optimization strategies, evaluating the framework for the design of a hyperspectral image spatial-spectral classifier accelerator. Results are reported for a Cyclone V SoC using the Intel FPGA OpenCL Offline Compiler 16.0 out of the box. Starting from a common baseline C implementation running on the embedded ARM® Cortex®-A9, OpenCL-based synthesis is evaluated applying different generic and vendor-specific optimizations. Results show how reasonable speedups are obtained on a device with scarce computing and embedded memory resources. A great step has been taken toward effectively raising the abstraction level, but a considerable amount of hardware design skill is still needed.
16 citations