Author

Thomas Lehnig

Bio: Thomas Lehnig is an academic researcher. The author has contributed to research in topics including shared memory and programming. The author has an h-index of 1 and has co-authored 1 publication, which has received 263 citations.

Papers
Proceedings ArticleDOI
13 Nov 2010
TL;DR: Describes the programmer's view of the 48-core SCC processor, an intermediate design sharing traits of message passing and shared memory architectures, and presents RCCE, the native message passing model created for the SCC.
Abstract: The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by a need to optimize performance per watt. How best to connect these cores and how to program the resulting many-core processor, however, is an open research question. Designs vary from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of message passing and shared memory architectures. The hardware has been described elsewhere. In this paper, we describe the programmer's view of this chip. In particular, we describe RCCE, the native message passing model created for the SCC processor.
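
As a concrete illustration of the programmer's view described above, the sketch below shows a minimal two-core exchange using the basic RCCE interface (RCCE_init, RCCE_ue, RCCE_num_ues, RCCE_send, RCCE_recv, and the RCCE_APP entry point) as documented in the RCCE specification; exact signatures may vary between RCCE releases, so treat this as a sketch rather than a verbatim example.

    /* Minimal sketch of a two-core RCCE exchange (assumption: the basic
     * RCCE interface as documented for the SCC). */
    #include <stdio.h>
    #include "RCCE.h"

    int RCCE_APP(int argc, char **argv)  /* RCCE programs use RCCE_APP as entry point */
    {
        RCCE_init(&argc, &argv);

        int me  = RCCE_ue();             /* id of this core ("unit of execution") */
        int num = RCCE_num_ues();        /* number of participating cores */
        int token = 42;

        if (me == 0 && num > 1) {
            /* blocking send through the on-chip message passing buffers */
            RCCE_send((char *)&token, sizeof(token), 1);
        } else if (me == 1) {
            RCCE_recv((char *)&token, sizeof(token), 0);
            printf("core %d received %d from core 0\n", me, token);
        }

        RCCE_finalize();
        return 0;
    }

As in MPI, each core branches on its own rank; unlike a cluster, the payload moves through the SCC's fast on-chip buffers rather than over an external network.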

267 citations


Cited by
Journal ArticleDOI

01 Oct 2015
TL;DR: Within the T-CREST project, the authors propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time.
Abstract: Real-time systems need time-predictable platforms to allow static analysis of the worst-case execution time (WCET). Standard multi-core processors are optimized for the average case and are hardly analyzable. Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared to other processors, the WCET performance is outstanding. The T-CREST platform is evaluated with two industrial use cases. An application from the avionic domain demonstrates that tasks executing on different cores do not interfere with respect to their WCET. A signal processing application from the railway domain shows that the WCET can be reduced for computation-intensive tasks when distributing the tasks on several cores and using the network-on-chip for communication. With three cores the WCET is improved by a factor of 1.8, and with 15 cores by a factor of 5.7. The T-CREST project is the result of a collaborative research and development project executed by eight partners from academia and industry. The European Commission funded T-CREST.

166 citations

Journal ArticleDOI
TL;DR: Describes two communication libraries available on the Single-Chip Cloud Computer: RCCE, a light-weight, minimal library for writing message passing parallel applications, and Rckmb, which provides the data link layer for network services such as TCP/IP; both use SCC's non-cache-coherent shared memory to transfer data between cores without needing to go off-chip.
Abstract: Many-core chips are changing the way high-performance computing systems are built and programmed. As it is becoming increasingly difficult to maintain cache coherence across many cores, manufacturers are exploring designs that do not feature any cache coherence between cores. Communications on such chips are naturally implemented using message passing, which makes them resemble clusters, but with an important difference: special hardware can be provided that supports very fast on-chip communications, reducing latency and increasing bandwidth. We present one such chip, the Single-Chip Cloud Computer (SCC). This is an experimental processor, created by Intel Labs. We describe two communication libraries available on SCC: RCCE and Rckmb. RCCE is a light-weight, minimal library for writing message passing parallel applications. Rckmb provides the data link layer for running network services such as TCP/IP. Both utilize SCC's non-cache-coherent shared memory for transferring data between cores without needing to go off-chip. In this paper we describe the design and implementation of RCCE and Rckmb. To compare their performance, we consider simple benchmarks run with RCCE and with MPI over TCP/IP.
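
Because the shared memory the abstract mentions is not cache coherent, a sender must explicitly make its writes visible and a receiver must explicitly refresh its view before polling. The sketch below shows the flag-plus-payload pattern this implies; mpb_flush and mpb_invalidate are hypothetical stand-ins for the platform's cache-maintenance primitives, not SCC or RCCE API.

    /* Flag-based transfer over non-cache-coherent shared memory, in the
     * style of the on-chip buffers described above. mpb_flush() and
     * mpb_invalidate() are hypothetical placeholders. */
    #include <stddef.h>
    #include <string.h>

    typedef struct {
        volatile int flag;      /* 0 = slot empty, 1 = payload ready     */
        char         data[32];  /* payload slot in on-chip shared memory */
    } slot_t;

    extern void mpb_flush(void *addr, size_t n);       /* hypothetical */
    extern void mpb_invalidate(void *addr, size_t n);  /* hypothetical */

    void slot_send(slot_t *s, const void *msg, size_t n)
    {
        for (;;) {              /* wait until the receiver drained the slot */
            mpb_invalidate((void *)s, sizeof(*s));
            if (!s->flag) break;
        }
        memcpy(s->data, msg, n);
        mpb_flush(s->data, n);  /* push payload out before raising the flag */
        s->flag = 1;
        mpb_flush((void *)s, sizeof(int));
    }

    void slot_recv(slot_t *s, void *out, size_t n)
    {
        for (;;) {              /* poll: without coherence, re-read explicitly */
            mpb_invalidate((void *)s, sizeof(*s));
            if (s->flag) break;
        }
        memcpy(out, s->data, n);
        s->flag = 0;            /* hand the slot back to the sender */
        mpb_flush((void *)s, sizeof(int));
    }

Raising the flag only after flushing the payload is what makes the handoff safe: a receiver that observes flag == 1 is then guaranteed to read the completed payload, with no coherence protocol involved.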

93 citations

Journal ArticleDOI
TL;DR: Vicis is a fault-tolerant architecture and companion routing protocol that is robust to a large number of permanent transistor failures, allowing on-chip communication to continue as faults accumulate.
Abstract: Aggressive transistor scaling continues to drive increasingly complex digital designs. The large number of transistors available today enables the development of chip multiprocessors that include many cores on one die communicating through an on-chip interconnect. As the number of cores increases, scalable communication platforms, such as networks-on-chip (NoCs), have become more popular. However, as the sole communication medium, these interconnects are a single point of failure, so any permanent fault in the NoC can cause the entire system to fail. Compounding the problem, transistors have become increasingly susceptible to wear-out-related failures as their critical dimensions shrink. As a result, the on-chip network has become a critically exposed unit that must be protected. To this end, we present Vicis, a fault-tolerant architecture and companion routing protocol that is robust to a large number of permanent failures, allowing communication to continue in the face of permanent transistor failures. Vicis makes use of a two-level approach. First, it attempts to work around errors within a router by leveraging reconfigurable architectural components. Second, when faults within a router disable a link's connectivity, or even an entire router, Vicis reroutes around the faulty node or link with a novel, distributed routing algorithm for meshes and tori. Tolerating permanent faults in both the router components and the reliability hardware itself, Vicis enables graceful performance degradation of networks-on-chip.
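
The abstract's key mechanism is rerouting around failed routers and links. Vicis's own algorithm is distributed; purely as an illustration of the idea, the centralized sketch below rebuilds next-hop routes for a 2D mesh with a breadth-first search that skips routers marked faulty (for simplicity only router failures are modeled; link failures work the same way by skipping edges).

    /* Illustrative only: recomputing mesh routes around failed routers.
     * This is not Vicis's distributed algorithm, just the underlying idea. */
    #include <stdbool.h>
    #include <string.h>

    #define W 4
    #define H 4
    #define N (W * H)

    static int  next_hop[N];   /* neighbor to forward toward dst; -1 = unreachable */
    static bool faulty[N];     /* true = router is dead */

    void rebuild_routes(int dst)
    {
        int queue[N], head = 0, tail = 0;
        memset(next_hop, -1, sizeof(next_hop));
        if (faulty[dst]) return;

        next_hop[dst] = dst;               /* destination delivers locally */
        queue[tail++] = dst;

        while (head < tail) {              /* BFS outward from the destination */
            int v = queue[head++];
            int x = v % W, y = v / W;
            int nbr[4], n = 0;
            if (x > 0)     nbr[n++] = v - 1;   /* west  */
            if (x < W - 1) nbr[n++] = v + 1;   /* east  */
            if (y > 0)     nbr[n++] = v - W;   /* north */
            if (y < H - 1) nbr[n++] = v + W;   /* south */

            for (int i = 0; i < n; i++) {
                int u = nbr[i];
                if (!faulty[u] && next_hop[u] == -1) {
                    next_hop[u] = v;       /* u forwards to v to reach dst */
                    queue[tail++] = u;
                }
            }
        }
    }

Unlike fixed XY routing, a table built this way keeps packets flowing on whatever irregular topology the failures leave behind, at the cost of storing a next hop per destination.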

92 citations

Journal ArticleDOI
TL;DR: Presents MARD, a new framework for protecting end points, which are often the last line of defense against metamorphic malware; MARD provides automation, platform independence, optimizations for real-time performance, and modularity.

80 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This work identifies a shared-most OS model for multiple coherence domains: creating per-domain instances of core OS services with no shared state, while enabling other extended OS services to share state across domains.
Abstract: Mobile systems-on-chip (SoCs) that incorporate heterogeneous coherence domains promise high energy efficiency to a wide range of mobile applications, yet are difficult to program. To exploit the architecture, a desirable, yet missing capability is to replicate operating system (OS) services over multiple coherence domains with minimum inter-domain communication. In designing such an OS, we set three goals: to ease application development, to simplify OS engineering, and to preserve the current OS performance. To this end, we identify a shared-most OS model for multiple coherence domains: creating per-domain instances of core OS services with no shared state, while enabling other extended OS services to share state across domains. To test the model, we build K2, a prototype OS on the TI OMAP4 SoC, by reusing most of the Linux 3.4 source. K2 presents a single system image to applications with its two kernels running on top of the two coherence domains of OMAP4. The two kernels have independent instances of core OS services, such as page allocator and interrupt management, as coordinated by K2; the two kernels share most extended OS services, such as device drivers, whose state is kept coherent transparently by K2. Despite platform constraints and unoptimized code, K2 improves energy efficiency for light OS workloads by 8x-10x, while incurring less than 6% performance overhead for a device driver shared between kernels. Our experiences with K2 show that the shared-most model is promising.
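
To make the shared-most split concrete, the sketch below contrasts a replicated core service with a shared extended service; the structures and names are invented for illustration and are not K2's actual interfaces.

    /* Hedged illustration of the shared-most model described above:
     * core services are replicated per coherence domain with no shared
     * state, while extended services keep one instance whose state must
     * be kept coherent in software (names invented, not K2 API). */
    #include <stddef.h>

    #define NUM_DOMAINS 2   /* e.g., the two coherence domains of an OMAP4 */

    /* Core OS service: one private instance per domain. */
    struct page_allocator {
        void  *pool_base;   /* domain-local memory pool */
        size_t pool_free;
    };
    static struct page_allocator core_alloc[NUM_DOMAINS];

    /* Extended OS service: a single driver instance shared by all
     * domains; software keeps its state coherent across domains. */
    struct device_driver {
        int   irq;
        void *regs;         /* shared device registers */
    };
    static struct device_driver shared_driver;

    struct page_allocator *my_allocator(int domain_id)
    {
        return &core_alloc[domain_id];  /* no cross-domain traffic */
    }

    struct device_driver *get_driver(void)
    {
        return &shared_driver;          /* one instance; coherence is the
                                           OS's job, not the hardware's */
    }

The payoff of this split, per the abstract, is that hot core services never generate inter-domain traffic, while rarely-touched extended services avoid being duplicated and re-engineered per domain.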

69 citations