Proceedings Article

Real-Time Distributed Discrete-Event Execution with Fault Tolerance

TLDR
This paper takes a program transformation approach to automatically enhance DE models with incremental checkpointing and state recovery functionality and incorporates this mechanism into PTIDES for efficient execution of fault-tolerant real-time distributed DE systems.
Abstract
We build on PTIDES, a programming model for distributed embedded systems that uses discrete-event (DE) models as program specifications. PTIDES improves on distributed DE execution by allowing more concurrent event processing without backtracking. This paper discusses the general execution strategy for PTIDES, and provides two feasible implementations. This execution strategy is then extended with tolerance for hardware errors. We take a program transformation approach to automatically enhance DE models with incremental checkpointing and state recovery functionality. Our fault tolerance mechanism is lightweight and has low overhead. It requires very little human intervention. We incorporate this mechanism into PTIDES for efficient execution of fault-tolerant real-time distributed DE systems.
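
As a rough illustration of what incremental checkpointing with state recovery can look like, the sketch below keeps an undo log per checkpoint and records a field's old value only the first time it is written after that checkpoint. It is not the paper's program transformation: the class `CheckpointedState` and its methods are hypothetical names chosen for this example, and the actor state is assumed to be a flat set of named fields.

```python
class CheckpointedState:
    """Hypothetical actor state with incremental checkpoints (illustrative only)."""

    _MISSING = object()  # marks a field that did not exist before a write

    def __init__(self, **initial):
        self._values = dict(initial)
        self._undo_logs = {}  # checkpoint id -> {field: value before first write}
        self._next_id = 0

    def checkpoint(self):
        """Open a new checkpoint and return its id; no state is copied up front."""
        cid = self._next_id
        self._next_id += 1
        self._undo_logs[cid] = {}
        return cid

    def get(self, key):
        return self._values[key]

    def set(self, key, value):
        # Log the old value only on the first write after the latest checkpoint.
        if self._undo_logs:
            log = self._undo_logs[max(self._undo_logs)]
            if key not in log:
                log[key] = self._values.get(key, self._MISSING)
        self._values[key] = value

    def rollback(self, cid):
        """Restore the state as it was when checkpoint `cid` was taken."""
        # Undo the newest checkpoint first, working back to `cid`.
        for later in sorted((c for c in self._undo_logs if c >= cid), reverse=True):
            for key, old in self._undo_logs.pop(later).items():
                if old is self._MISSING:
                    self._values.pop(key, None)
                else:
                    self._values[key] = old
        self._undo_logs[cid] = {}  # keep `cid` usable for later rollbacks

    def snapshot(self):
        """Return a copy of the current field values."""
        return dict(self._values)
```

The cost of a checkpoint in this sketch is proportional to the number of fields modified afterwards rather than to the total state size, which is the usual motivation for incremental schemes.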


Citations
Journal Article

The past, present and future of cyber-physical systems: a focus on models.

TL;DR: Two projects show that deterministic CPS models with faithful physical realizations are possible and practical, and that the timing precision of synchronous digital logic can be practically made available at the software level of abstraction.
Proceedings Article

CPS foundations

TL;DR: It is argued that cyber-physical systems present a substantial intellectual challenge that requires changes in both theories of computation and dynamical systems theory, and demands models that embrace both.
Book Chapter

A Provenance-Based Fault Tolerance Mechanism for Scientific Workflows

TL;DR: This paper presents a method for capturing data value- and control-dependencies for provenance information collection in the Kepler scientific workflow system and describes how the collected information based on these dependencies could be used for a fault tolerance framework in different models of computation.
Proceedings Article

Execution Strategies for PTIDES, a Programming Model for Distributed Embedded Systems

TL;DR: This paper first defines a general execution strategy that conforms to the DE semantics, and then specializes this strategy to give practical, implementable and distributed policies.
Report

PTIDES: A Programming Model for Distributed Real-Time Embedded Systems

TL;DR: This report presents an execution strategy that is aggressive in concurrent execution of events, allows independent events to be processed out of time stamp order, and uses clock synchronization as a replacement for null message communication across distributed platforms.
References
Journal Article

Virtual time

TL;DR: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control.
Journal Article

A survey of rollback-recovery protocols in message-passing systems

TL;DR: This survey covers rollback-recovery techniques that do not require special language constructs and distinguishes checkpoint-based protocols, which rely solely on checkpointing for system state restoration, from log-based protocols, which combine checkpointing with logging of nondeterministic events.
Journal Article

System structure for software fault tolerance

TL;DR: In this article, the authors present a method for structuring complex computing systems by the use of what they term "recovery blocks", "conversations", and "fault-tolerant interfaces".

Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems

Kang B. Lee, +1 more
TL;DR: A protocol is provided in this standard that enables precise synchronization of clocks in measurement and control systems implemented with technologies such as network communication, local computing, and distributed objects.
Journal Article

Synchronization and Linearity: An Algebra for Discrete Event Systems

TL;DR: This book proposes a unified mathematical treatment of a class of 'linear' discrete event systems, which contains important subclasses of Petri nets and queuing networks with synchronization constraints, which is shown to parallel the classical linear system theory in several ways.