Proceedings Article

A clustered manycore processor architecture for embedded and accelerated applications

TLDR
This work demonstrates that the MPPA-256 processor clustered manycore architecture is effective on two different classes of applications: embedded computing, with the implementation of a professional H.264 video encoder that runs in real time at low power; and high-performance computing, with the acceleration of a financial option pricing application.
Abstract
The Kalray MPPA-256 processor integrates 256 user cores and 32 system cores on a chip in 28 nm CMOS technology. Each core implements a 32-bit 5-issue VLIW architecture. These cores are distributed across 16 compute clusters of 16+1 cores and 4 quad-core I/O subsystems. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them are ensured by data and control Networks-on-Chip (NoC). The MPPA-256 processor is also fitted with a variety of I/O controllers, in particular DDR, PCI, Ethernet, Interlaken and GPIO. We demonstrate that the MPPA-256 processor clustered manycore architecture is effective on two different classes of applications: embedded computing, with the implementation of a professional H.264 video encoder that runs in real time at low power; and high-performance computing, with the acceleration of a financial option pricing application. In the first case, a cyclostatic dataflow programming environment is used, which automates the distribution of the application over the execution resources. In the second case, an explicit parallel programming model based on POSIX processes, threads, and NoC-specific IPC is used.
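The core counts quoted in the abstract can be cross-checked from the cluster layout. The following is a minimal sketch, not vendor code: the constant and function names are illustrative, and it assumes that the extra core in each "16+1" compute cluster and all cores of the quad-core I/O subsystems are the ones counted as system cores, which is consistent with the stated totals.

```python
# Topology figures as given in the abstract; all names here are illustrative.
COMPUTE_CLUSTERS = 16
USER_CORES_PER_CLUSTER = 16      # the "16" in "16+1"
SYSTEM_CORES_PER_CLUSTER = 1     # the "+1" in "16+1"
IO_SUBSYSTEMS = 4
CORES_PER_IO_SUBSYSTEM = 4       # "quad-core"


def core_counts():
    """Return (user_cores, system_cores) derived from the cluster layout."""
    user = COMPUTE_CLUSTERS * USER_CORES_PER_CLUSTER
    # Assumption: I/O subsystem cores are counted as system cores.
    system = (COMPUTE_CLUSTERS * SYSTEM_CORES_PER_CLUSTER
              + IO_SUBSYSTEMS * CORES_PER_IO_SUBSYSTEM)
    return user, system


def private_address_spaces():
    """Each compute cluster and each I/O subsystem owns one address space."""
    return COMPUTE_CLUSTERS + IO_SUBSYSTEMS


if __name__ == "__main__":
    user, system = core_counts()
    print(f"user cores: {user}, system cores: {system}")
    print(f"private address spaces: {private_address_spaces()}")
```

Under these assumptions the layout yields 16 × 16 = 256 user cores and 16 × 1 + 4 × 4 = 32 system cores, matching the abstract, and 20 private address spaces that must exchange data over the NoC.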

