A clustered manycore processor architecture for embedded and accelerated applications

doi:10.1109/HPEC.2013.6670342

Proceedings Article

Benoît Dupont de Dinechin, +10 more

pp. 1-6

TL;DR

This work demonstrates that the clustered manycore architecture of the MPPA-256 processor is effective on two different classes of applications: embedded computing, with the implementation of a professional H.264 video encoder that runs in real time at low power; and high-performance computing, with the acceleration of a financial option pricing application.

Abstract:

The Kalray MPPA-256 processor integrates 256 user cores and 32 system cores on a single chip in 28nm CMOS technology. Each core implements a 32-bit 5-issue VLIW architecture. These cores are distributed across 16 compute clusters of 16+1 cores each and 4 quad-core I/O subsystems. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them are ensured by data and control Networks-on-Chip (NoC). The MPPA-256 processor is also fitted with a variety of I/O controllers, in particular DDR, PCI, Ethernet, Interlaken and GPIO. We demonstrate that the clustered manycore architecture of the MPPA-256 processor is effective on two different classes of applications: embedded computing, with the implementation of a professional H.264 video encoder that runs in real time at low power; and high-performance computing, with the acceleration of a financial option pricing application. In the first case, a cyclostatic dataflow programming environment that automates application distribution over the execution resources is used. In the second case, an explicit parallel programming model based on POSIX processes, threads, and NoC-specific IPC is used.
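As a quick consistency check on the figures quoted in the abstract, the sketch below tallies core counts and address spaces from the cluster topology. It rests on an interpretation that is not stated verbatim: the "+1" core of each compute cluster and all cores of the I/O subsystems are taken to be the 32 system cores, which is the accounting that matches the stated totals. The class and field names are hypothetical and are not part of any Kalray API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mppa256Topology:
    """Hypothetical model of the core layout described in the abstract."""
    compute_clusters: int = 16
    user_cores_per_cluster: int = 16
    system_cores_per_cluster: int = 1   # assumed meaning of the "+1" in "16+1 cores"
    io_subsystems: int = 4
    cores_per_io_subsystem: int = 4     # "quad-core I/O subsystems"

    @property
    def user_cores(self) -> int:
        # User cores live only in the compute clusters.
        return self.compute_clusters * self.user_cores_per_cluster

    @property
    def system_cores(self) -> int:
        # Assumption: one system core per compute cluster plus every I/O core.
        return (self.compute_clusters * self.system_cores_per_cluster
                + self.io_subsystems * self.cores_per_io_subsystem)

    @property
    def total_cores(self) -> int:
        return self.user_cores + self.system_cores

    @property
    def address_spaces(self) -> int:
        # Each compute cluster and each I/O subsystem owns a private address space.
        return self.compute_clusters + self.io_subsystems


topo = Mppa256Topology()
print(topo.user_cores, topo.system_cores, topo.total_cores, topo.address_spaces)
```

With the default parameters this prints `256 32 288 20`: 16 × 16 user cores, 16 × 1 + 4 × 4 system cores, and 16 + 4 private address spaces that must communicate over the NoC rather than through shared memory.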

Citations

Proceedings Article

TicToc: Time Traveling Optimistic Concurrency Control

Xiangyao Yu, +3 more

TL;DR: TicToc is presented, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior T/O schemes and achieves up to 92% better throughput while reducing the abort rate by 3.3x over these previous algorithms.

Journal Article

Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations

Himar Fabelo, +21 more

19 Mar 2018

PLOS ONE

TL;DR: This study presents the development of a novel classification method that takes into account the spatial and spectral characteristics of hyperspectral images to help neurosurgeons accurately determine tumor boundaries in surgical time during resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor.

Journal Article

An Intraoperative Visualization System Using Hyperspectral Imaging to Aid in Brain Tumor Delineation.

Himar Fabelo, +22 more

01 Feb 2018

Sensors

TL;DR: In this preliminary study, thematic maps obtained from a validation database of seven hyperspectral images of in vivo brain tissue captured and processed during neurosurgical operations demonstrate that the system is able to discriminate between normal and tumor tissue in the brain.

Journal Article

PULP: A Ultra-Low Power Parallel Accelerator for Energy-Efficient and Flexible Embedded Vision

Francesco Conti, +4 more

TL;DR: This work proposes PULP (Parallel processing Ultra-Low Power platform), an architecture built on clusters of tightly coupled OpenRISC ISA cores, with advanced techniques for fast performance and energy scalability that exploit the capabilities of the STMicroelectronics UTBB FD-SOI 28nm technology.

Journal Article

Mixed-criticality scheduling on cluster-based manycores with shared communication and storage resources

Georgia Giannopoulou, +4 more

01 Jul 2016

Real-time Systems

TL;DR: A combined analysis of computing, memory and communication scheduling in a mixed-criticality setting is introduced; the cluster-based architecture model considered closely describes state-of-the-art many-core platforms such as the Kalray MPPA®-256.
