Proceedings ArticleDOI

DMTCP: Transparent checkpointing for cluster computations and the desktop

TL;DR: DMTCP as mentioned in this paper is a transparent user-level checkpointing package for distributed applications; it is used for the runCMS software of the CMS experiment at the Large Hadron Collider at CERN, and it can be incorporated and distributed as a checkpoint-restart module within some larger package.
Abstract: DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart are demonstrated for a wide range of over 20 well-known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. RunCMS runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads, as well as typical desktop applications. On 128 distributed cores (32 nodes), checkpoint and restart times are typically 2 seconds, with negligible run-time overhead. Typical checkpoint times are reduced to 0.2 seconds when using forked checkpointing. Experimental results show that checkpoint time remains nearly constant as the number of nodes increases on a medium-size cluster. DMTCP automatically accounts for fork, exec, ssh, mutexes/semaphores, TCP/IP sockets, UNIX domain sockets, pipes, ptys (pseudo-terminals), terminal modes, ownership of controlling terminals, signal handlers, open file descriptors, shared open file descriptors, I/O (including the readline library), shared memory (via mmap), parent-child process relationships, pid virtualization, and other operating system artifacts. By emphasizing an unprivileged, user-space approach, compatibility is maintained across Linux kernels from 2.6.9 through the current 2.6.28. Since DMTCP is unprivileged and does not require special kernel modules or kernel patches, it can be incorporated and distributed as a checkpoint-restart module within some larger package.
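The checkpoint-restart workflow described above is driven entirely from user space. A minimal sketch, wrapped in Python for illustration; the tool names (dmtcp_launch, dmtcp_command, dmtcp_restart_script.sh) follow current DMTCP documentation and may differ from the version described in the paper:

```python
# Sketch: checkpointing an application under DMTCP from Python.
# Assumes the DMTCP binaries are on PATH; command names follow current
# DMTCP documentation, not necessarily the paper-era release.
import subprocess
import time

# Launch the target application under DMTCP's control (user-level; no
# kernel modules or root privileges required).
app = subprocess.Popen(["dmtcp_launch", "./my_app", "arg1"])

time.sleep(60)  # let the application make some progress

# Ask the DMTCP coordinator to checkpoint all connected processes.
subprocess.run(["dmtcp_command", "--checkpoint"], check=True)

# After a failure, the whole computation can be restarted from the
# generated restart script, written alongside the checkpoint images:
# subprocess.run(["./dmtcp_restart_script.sh"], check=True)
```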


Citations
Dissertation
Constantinos Makassikis
02 Feb 2011
TL;DR: MoLOToF as mentioned in this paper is a model for application-level fault tolerance based on checkpointing.
Abstract: PC clusters are distributed architectures whose adoption is spreading because of their low cost as well as their extensibility in terms of nodes. In particular, the growth in the number of nodes is the source of a growing number of fail-stop failures that endanger the execution of distributed applications. The absence of efficient and portable solutions confines their use to non-critical applications or applications without time constraints. MoLOToF is a model for application-level fault tolerance based on checkpointing. To ease the addition of fault tolerance, it proposes structuring the application according to fault-tolerant skeletons, as well as collaborations between the programmer and the fault-tolerance system to gain efficiency. Applying MoLOToF to the SPMD and Master-Worker families of parallel algorithms led to the FT-GReLoSSS and ToMaWork frameworks, respectively. Each framework provides fault-tolerant skeletons suited to the targeted algorithm families, along with an original implementation. FT-GReLoSSS is implemented in C++ on top of MPI, whereas ToMaWork is implemented in Java on top of a virtual shared memory system provided by JavaSpaces technology. Evaluation of the frameworks shows a reasonable overhead in development time and negligible overheads in execution time in the absence of fault tolerance. Experiments conducted on up to 256 nodes of a dual-core PC cluster demonstrate better efficiency of FT-GReLoSSS's fault-tolerance solution compared to existing system-level solutions (LAM/MPI and DMTCP).
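To make the skeleton idea concrete, here is a hypothetical Python rendering of a fault-tolerant skeleton in the spirit the thesis describes; the actual frameworks are C++/MPI (FT-GReLoSSS) and Java/JavaSpaces (ToMaWork), and every name below is invented for illustration:

```python
import os
import pickle

CKPT_FILE = "skeleton_state.pkl"  # hypothetical checkpoint location

def ft_skeleton(steps, initial_state):
    """Run the programmer-supplied computation steps in order; the skeleton
    (the fault-tolerance system's side of the collaboration) checkpoints
    after each step and resumes from the last completed one on restart."""
    done, state = 0, initial_state
    if os.path.exists(CKPT_FILE):
        with open(CKPT_FILE, "rb") as f:
            done, state = pickle.load(f)   # recover after a fail-stop fault
    for i, step in enumerate(steps):
        if i < done:
            continue                        # skip work finished before the failure
        state = step(state)                 # programmer's side: the computation
        with open(CKPT_FILE, "wb") as f:
            pickle.dump((i + 1, state), f)  # skeleton's side: the checkpoint
    return state
```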

2 citations

Journal ArticleDOI
TL;DR: A power-efficient version of local rollback is proposed to reduce power consumption for non-critical, blocked processes using Dynamic Voltage and Frequency Scaling and clock modulation; it is estimated that, for settings with high recovery overheads, the proposed local rollback reduces the total energy waste of parallel codes.
Abstract: In fault tolerance for parallel and distributed systems, message logging protocols have played a prominent role in the last three decades. Such protocols enable local rollback to provide recovery from fail-stop errors. Global rollback techniques can be straightforward to implement but at times lead to slower recovery than local rollback. Local rollback is more complicated but can offer faster recovery times. In this work, we study the power and energy efficiency implications of global and local rollback. We propose a power-efficient version of local rollback to reduce power consumption for non-critical, blocked processes, using Dynamic Voltage and Frequency Scaling (DVFS) and clock modulation (CM). Our results for 3 different MPI codes on 2 parallel systems show that power-efficient local rollback reduces CPU energy waste up to 50% during the recovery phase, compared to existing global and local rollback techniques, without introducing significant overheads. Furthermore, we show that savings manifest for all blocked processes, which grow linearly with the process count. We estimate that for settings with high recovery overheads the total energy waste of parallel codes is reduced with the proposed local rollback.
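As a rough illustration of the mechanism (not the paper's implementation), a blocked process can be dropped to a low frequency for the duration of a peer's recovery and restored afterward; set_cpu_khz below is a hypothetical helper standing in for the Linux cpufreq interface or MSR-based clock modulation:

```python
import contextlib

LOW_KHZ, FULL_KHZ = 1_200_000, 3_600_000  # example frequencies, in kHz

def set_cpu_khz(cpu: int, khz: int) -> None:
    """Hypothetical DVFS helper; a real implementation would use the Linux
    cpufreq sysfs files or clock-modulation MSRs, as the paper does."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_setspeed"
    with open(path, "w") as f:   # requires root and the userspace governor
        f.write(str(khz))

@contextlib.contextmanager
def low_power_while_blocked(cpu: int):
    set_cpu_khz(cpu, LOW_KHZ)        # enter a low-power state while waiting
    try:
        yield                        # ... block on the recovering process ...
    finally:
        set_cpu_khz(cpu, FULL_KHZ)   # restore full speed once recovery ends
```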

2 citations

Posted Content
TL;DR: Fast Reversible Debugger (FReD) as mentioned in this paper provides an automated tool that searches through the process lifetime to locate the cause of a bug, which is useful when the cause is far back in time from the bug's manifestation.
Abstract: Reversible debuggers have been developed at least since 1970. Such a feature is useful when the cause of a bug is close in time to the bug manifestation. When the cause is far back in time, one resorts to setting appropriate breakpoints in the debugger and beginning a new debugging session. In such cases, where the cause of a bug is far in time from its manifestation, bug diagnosis requires a series of debugging sessions with which to narrow down the cause. For such "difficult" bugs, this work presents an automated tool to search through the process lifetime and locate the cause. As an example, the bug could be related to a program invariant failing. A binary search through the process lifetime suffices, since the invariant expression is true at the beginning of the program execution, and false when the bug is encountered. An algorithm for such a binary search is presented within the FReD (Fast Reversible Debugger) software. It is based on the ability to checkpoint, restart, and deterministically replay the multiple processes of a debugging session, and it builds on GDB (a debugger), DMTCP (for checkpoint-restart), and a custom deterministic record-replay plugin for DMTCP. FReD supports complex, real-world multithreaded programs, such as MySQL and Firefox. Further, the binary search is robust: it operates on multi-threaded programs and takes advantage of multi-core architectures during replay.
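The core of the automated search is an ordinary binary search over the sequence of checkpoints. A minimal sketch, where invariant_holds is a hypothetical stand-in for restarting a given checkpoint under the debugger and evaluating the failing invariant there:

```python
def find_failing_interval(num_ckpts, invariant_holds):
    """Return the index of the first checkpoint at which the invariant is
    false, assuming it holds at checkpoint 0 and fails at the last one.
    Each probe restarts checkpoint `mid` and deterministically replays
    to the same point before testing the invariant."""
    lo, hi = 0, num_ckpts - 1        # invariant holds at lo, fails at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if invariant_holds(mid):
            lo = mid                 # bug introduced after this checkpoint
        else:
            hi = mid                 # bug already present here
    return hi                        # cause lies between checkpoints lo and hi
```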

2 citations

Book ChapterDOI
19 Jun 2016
TL;DR: Application migration between nodes has been proposed as a tool to mitigate bottlenecks due to resource sharing; hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.
Abstract: It is predicted that the number of cores per node will rapidly increase with the upcoming era of exascale supercomputers. As a result, multiple applications will have to share one node and compete for the (often scarce) resources available on this node. Furthermore, the growing number of hardware components causes a decrease in the mean time between failures. Application migration between nodes has been proposed as a tool to mitigate these two problems: Bottlenecks due to resource sharing can be addressed by load balancing schemes which migrate applications; and hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.

2 citations

01 Jan 2011
TL;DR: The design of a package, Roomy, for parallel disk-based computation is described; Roomy allows one to more easily produce CPU-intensive, storage-limited, data-parallel computations, while being minimally invasive with respect to the original data-parallel algorithm.
Abstract: The design of a package, Roomy, for parallel disk-based computation is described. Many important algorithms run out of available RAM in minutes to hours. Roomy allows one to more easily produce CPU-intensive, storage-limited, data-parallel computations, while being minimally invasive with respect to the original data-parallel algorithm. Roomy supports a minimally invasive approach through its rich library of latency-tolerant parallel data structures that support delayed operations through a synchronization command. Roomy has been used to write some relatively short programs whose computations match those of some large, record-breaking computations reported in the recent literature.
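The key mechanism is the delayed operation: updates and queries are buffered and resolved in batch at an explicit synchronization point, so random accesses to disk-resident data become streaming passes. A minimal in-memory sketch of the idea (not Roomy's actual API, which is a C library):

```python
class DelayedSet:
    """Toy version of a latency-tolerant set: adds and membership queries
    are buffered and only take effect at sync(), mirroring how Roomy turns
    random accesses into batched, streaming passes over data on disk."""

    def __init__(self):
        self._data = set()     # disk-resident in the real system
        self._adds = []        # delayed updates
        self._queries = []     # delayed (element, callback) lookups

    def add(self, x):
        self._adds.append(x)   # returns immediately; not yet visible

    def contains(self, x, callback):
        self._queries.append((x, callback))  # answer arrives at sync()

    def sync(self):
        """Apply all buffered operations in one batch."""
        self._data.update(self._adds)
        self._adds.clear()
        for x, cb in self._queries:
            cb(x, x in self._data)
        self._queries.clear()

# Usage:
#   s = DelayedSet(); s.add(3)
#   s.contains(3, lambda x, found: print(x, found))
#   s.sync()   # prints: 3 True
```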

2 citations

References
Journal ArticleDOI
01 May 2007
TL;DR: The IPython project as mentioned in this paper provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation, building on Python's basic facilities for interactive work and its comprehensive library.
Abstract: Python offers basic facilities for interactive work and a comprehensive library on top of which more sophisticated systems can be built. The IPython project provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation.
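For a flavor of the environment, a script can drop into an IPython shell mid-execution; the embed entry point shown here is today's API and postdates the version of IPython described in the paper:

```python
# Drop from an ordinary script into the enhanced interactive environment.
# Requires `pip install ipython`; `embed` is the modern entry point.
from IPython import embed

data = [x ** 2 for x in range(10)]

# Opens an interactive IPython session here with `data` in scope:
# tab completion, `obj?` introspection, magics such as %timeit, etc.
embed()
```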

3,355 citations

Journal ArticleDOI
TL;DR: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation; the algorithm helps to solve an important class of problems: stable property detection.
Abstract: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “the system is deadlocked” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.
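A compact sketch of the marker-based snapshot algorithm the paper presents (commonly known as the Chandy-Lamport algorithm), from one process's point of view; the messaging layer is abstracted into hypothetical callbacks:

```python
MARKER = "MARKER"   # distinguished control message

class SnapshotProcess:
    """One process's role in the marker-based global-snapshot algorithm.
    Assumes FIFO channels; send_markers is a hypothetical callback that
    sends a marker on every outgoing channel."""

    def __init__(self, in_channels):
        self.in_channels = in_channels
        self.state = None          # recorded local state (None = not yet)
        self.open = set()          # incoming channels still being recorded
        self.channel_state = {}    # channel -> messages caught in transit

    def take_snapshot(self, local_state, send_markers):
        self.state = local_state                  # 1. record own state
        self.open = set(self.in_channels)         # 2. start recording channels
        self.channel_state = {c: [] for c in self.in_channels}
        send_markers()                            # 3. flood markers outward

    def on_message(self, ch, msg, local_state, send_markers):
        if msg == MARKER:
            if self.state is None:                # first marker seen
                self.take_snapshot(local_state, send_markers)
            self.open.discard(ch)                 # ch's in-transit set is final
        elif ch in self.open:
            self.channel_state[ch].append(msg)    # message was in flight
        # The snapshot is complete once self.state is set and self.open is empty.
```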

2,738 citations

01 Jan 1996
TL;DR: MPI (Message Passing Interface) as discussed by the authors is a specification for a standard message-passing library that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.
Abstract: MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.
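A minimal point-to-point example in the style MPI standardizes, written with the mpi4py Python bindings rather than MPICH's C interface:

```python
# Run with, e.g.:  mpiexec -n 2 python ping.py
# Requires an MPI implementation (such as MPICH) plus `pip install mpi4py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()     # this process's id within the communicator

if rank == 0:
    comm.send({"payload": 42}, dest=1, tag=0)   # point-to-point send
elif rank == 1:
    msg = comm.recv(source=0, tag=0)            # matching blocking receive
    print(f"rank 1 received {msg}")
```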

2,065 citations

Proceedings Article
16 Jan 1995
TL;DR: In this paper, the authors describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations reported in the literature and also supports the incorporation of user directives into the creation of checkpoints.
Abstract: Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from which it can be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint files, checkpointing remains unavailable to most application developers. In this paper we describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature. While libckpt can be used in a mode which is almost totally transparent to the programmer, it also supports the incorporation of user directives into the creation of checkpoints. This user-directed checkpointing is an innovation which is unique to our work.
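The basic rollback-recovery loop is easy to state; the sketch below uses plain Python pickling rather than libckpt's C interface, with the checkpoint placement chosen by the programmer in the spirit of user-directed checkpointing:

```python
import os
import pickle

CKPT_FILE = "state.ckpt"   # hypothetical checkpoint file

def checkpoint(state):
    """Write the checkpoint atomically, so a crash mid-write cannot
    corrupt the last good checkpoint."""
    tmp = CKPT_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT_FILE)

def recover(initial_state):
    """Roll back to the last checkpoint after a failure, if one exists."""
    if os.path.exists(CKPT_FILE):
        with open(CKPT_FILE, "rb") as f:
            return pickle.load(f)
    return initial_state

state = recover({"i": 0, "total": 0})
while state["i"] < 1_000_000:
    state["total"] += state["i"]
    state["i"] += 1
    if state["i"] % 100_000 == 0:   # user-directed: the programmer chooses
        checkpoint(state)           # where and what to checkpoint
```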

670 citations

Proceedings Article
10 Apr 2005
TL;DR: This paper describes the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating systems, including Microsoft Windows, Linux, Novell NetWare and others, to provide fast, transparent application migration.
Abstract: This paper describes the design and implementation of a system that uses virtual machine technology [1] to provide fast, transparent application migration. This is the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating systems, including Microsoft Windows, Linux, Novell NetWare and others. Neither the application nor any clients communicating with the application can tell that the application has been migrated. Experimental measurements show that for a variety of workloads, application downtime caused by migration is less than a second.

588 citations