Proceedings ArticleDOI

DMTCP: Transparent checkpointing for cluster computations and the desktop

TL;DR: DMTCP as mentioned in this paper is a transparent user-level checkpointing package for distributed applications; it is demonstrated on applications including runCMS, which is used for the CMS experiment of the Large Hadron Collider at CERN, and because it is unprivileged it can be incorporated and distributed as a checkpoint-restart module within some larger package.
Abstract: DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart are demonstrated for a wide range of over 20 well-known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. RunCMS runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads, as well as typical desktop applications. On 128 distributed cores (32 nodes), checkpoint and restart times are typically 2 seconds, with negligible run-time overhead. Typical checkpoint times are reduced to 0.2 seconds when using forked checkpointing. Experimental results show that checkpoint time remains nearly constant as the number of nodes increases on a medium-size cluster. DMTCP automatically accounts for fork, exec, ssh, mutexes/semaphores, TCP/IP sockets, UNIX domain sockets, pipes, ptys (pseudo-terminals), terminal modes, ownership of controlling terminals, signal handlers, open file descriptors, shared open file descriptors, I/O (including the readline library), shared memory (via mmap), parent-child process relationships, pid virtualization, and other operating system artifacts. By emphasizing an unprivileged, user-space approach, compatibility is maintained across Linux kernels from 2.6.9 through the current 2.6.28. Since DMTCP is unprivileged and does not require special kernel modules or kernel patches, it can be incorporated and distributed as a checkpoint-restart module within some larger package.
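
The abstract above describes DMTCP's coverage rather than its usage; as a rough illustration (not taken from the paper), the sketch below shows how a computation might be driven from a small Python script using the DMTCP command-line tools. Exact command names and flags vary across DMTCP releases (e.g., dmtcp_checkpoint in older versions versus dmtcp_launch in newer ones), and "./my_long_computation" is a placeholder, so treat this purely as a sketch.

    # Illustrative driver: run a program under DMTCP, checkpoint it, and
    # note how it would be restarted. Command names and flags should be
    # checked against the installed DMTCP version.
    import subprocess
    import time

    # Launch the target application under checkpoint control.
    # (dmtcp_launch typically starts a coordinator on the default port
    # if one is not already running.)
    app = subprocess.Popen(["dmtcp_launch", "./my_long_computation"])

    time.sleep(60)  # let the application make some progress

    # Ask the coordinator to write a checkpoint of all connected processes.
    subprocess.run(["dmtcp_command", "--checkpoint"], check=True)

    app.wait()

    # After a crash, the restart script that DMTCP generates alongside the
    # checkpoint images resumes the computation:
    #   subprocess.run(["./dmtcp_restart_script.sh"], check=True)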


Citations
Dissertation
28 Apr 2010
TL;DR: This work addresses high-performance computing on large-scale execution platforms such as computing grids and proposes an original fault-tolerance protocol that performs a partial restart of the application after a failure.
Abstract: This work addresses high-performance computing on large-scale execution platforms such as computing grids. Computing grids are characterized in particular by (1) frequent changes in execution conditions and (2) a high probability of failure due to the large number of components. To execute an application efficiently in such an environment, these characteristics must be taken into account. Our research relies on the abstract representation of the application as a data-flow graph in the parallel and distributed programming environment Athapascan/Kaapi. We use this abstract representation to address the problems of (1) dynamic reconfiguration and (2) fault tolerance. First, we propose a dynamic reconfiguration mechanism that transparently handles, on behalf of the programmer of the reconfiguration, concurrent accesses to the application state and the mutual consistency of states in the case of distributed reconfiguration. Second, we present an original fault-tolerance protocol that performs a partial restart of the application after a failure; to do so, it determines the set of compute tasks strictly necessary to resume the application. These contributions are evaluated using the Kaapi and X-Kaapi software on the Grid'5000 computing platform.

12 citations

Proceedings ArticleDOI
24 Sep 2012
TL;DR: This study examines the use of ARM-based clusters for low-power, high-performance computing, and relies on two recent extensions to the DMTCP checkpoint-restart package to demonstrate the ability to deploy pre-configured software in virtual machines hosted in the cloud, and further to migrate cluster computation between hosts in the cloud.
Abstract: In cluster computing, power and cooling represent a significant cost compared to the hardware itself. This is of special concern in the cloud, which provides access to large numbers of computers. We examine the use of ARM-based clusters for low-power, high-performance computing. This work examines two likely use-modes: (i) a standard dedicated cluster, and (ii) a cluster of pre-configured virtual machines in the cloud. A 40-node department-level cluster based on an ARM Cortex-A9 is compared against a similar cluster based on an Intel Core2 Duo, in contrast to a recent similar study on just a 4-node cluster. For the NAS benchmarks on 32-node clusters, ARM was found to have a power efficiency ranging from 1.3 to 6.2 times greater than that of Intel. This is despite Intel's approximately five times greater performance. The particular efficiency ratio depends primarily on the size of the working set relative to L2 cache. In addition to energy-efficient computing, this study also emphasizes fault tolerance: an important ingredient in high performance computing. It relies on two recent extensions to the DMTCP checkpoint-restart package. DMTCP was extended (i) to support ARM CPUs, and (ii) to support checkpointing of the Qemu virtual machine in user-mode. DMTCP is used both to checkpoint native distributed applications, and to checkpoint a network of virtual machines. This latter case demonstrates the ability to deploy pre-configured software in virtual machines hosted in the cloud, and further to migrate cluster computation between hosts in the cloud.

11 citations

Journal ArticleDOI
TL;DR: This paper motivates the need for redundancy elimination with a detailed analysis of checkpoint data from real scenarios and applies inline data deduplication to achieve the objective of reducing checkpoint size.
Abstract: The increasing scale and complexity of computer systems bring more frequent occurrences of hardware or software faults; thus fault-tolerant techniques become an essential component in high-performance computing systems. In order to achieve the goal of tolerating runtime faults, checkpoint restart is a typical and widely used method. However, the exploding sizes of checkpoint files that need to be saved to external storage pose a major scalability challenge, necessitating the design of efficient approaches to reducing the amount of checkpointing data. In this paper, we first motivate the need for redundancy elimination with a detailed analysis of checkpoint data from real scenarios. Based on the analysis, we apply inline data deduplication to achieve the objective of reducing checkpoint size. We use DMTCP, an open-source checkpoint restart package, to validate our method. Our experiment shows that, by using our method, single-computer programs can reduce the size of the checkpoint file by 20% and distributed programs can reduce the size of the checkpoint file by 47%.
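
The abstract does not spell out the deduplication pipeline; as a hedged illustration of the general idea of hash-based, block-level deduplication of a checkpoint image, the sketch below splits a file into fixed-size blocks and stores each distinct block only once. The block size, hash function, and file-based interface are assumptions for illustration, not the paper's inline implementation inside DMTCP.

    # Minimal sketch of block-level deduplication of a checkpoint image.
    # Illustrative only: block size and hash choice are assumptions.
    import hashlib

    BLOCK_SIZE = 4096  # assumed deduplication granularity

    def deduplicate(path):
        """Return (unique_blocks, recipe) describing the file.

        unique_blocks maps a SHA-1 digest to the block contents;
        recipe is the ordered list of digests needed to rebuild the file.
        """
        unique_blocks, recipe = {}, []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.sha1(block).hexdigest()
                unique_blocks.setdefault(digest, block)
                recipe.append(digest)
        return unique_blocks, recipe

    def reconstruct(unique_blocks, recipe, out_path):
        """Rebuild the original checkpoint file from the deduplicated store."""
        with open(out_path, "wb") as f:
            for digest in recipe:
                f.write(unique_blocks[digest])

The space savings come from recipe entries that point at blocks already stored, which is exactly the redundancy in real checkpoint data that the paper analyzes.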

11 citations

Proceedings ArticleDOI
23 Sep 2018
TL;DR: It is argued that enabling fault tolerance without any modification inside target MPI applications is possible, and that it could be a first step toward more integrated resiliency combined with failure mitigation such as ULFM.
Abstract: Fault tolerance has always been an important topic when it comes to running massively parallel programs at scale. Statistically, hardware and software failures are expected to occur more often on systems gathering millions of computing units. Moreover, the larger jobs are, the more computing hours are wasted by a crash. In this paper, we describe the work done in our MPI runtime to enable a transparent checkpointing mechanism. Unlike the MPI 4.0 User-Level Failure Mitigation (ULFM) interface, our work targets solely Checkpoint/Restart (C/R) and ignores wider features such as resiliency. We show how existing transparent checkpointing methods can be practically applied to MPI implementations given sufficient collaboration from the MPI runtime. Our C/R technique is then measured on MPI benchmarks such as IMB and Lulesh over an InfiniBand high-speed network, demonstrating that the chosen approach is sufficiently general and that performance is mostly preserved. We argue that enabling fault tolerance without any modification inside target MPI applications is possible, and show how it could be a first step toward more integrated resiliency combined with failure mitigation such as ULFM.

11 citations

Proceedings ArticleDOI
13 Apr 2020
TL;DR: This work proposes to snapshot the state of fully-deployed containers and restart future container instances from a pre-started application state, which effectively reduces the startup phase with speedups between 1x and 60x.
Abstract: In fog computing environments, container deployment is a frequent operation which often lies in the critical path of services being delivered to an end user. Although creating a container can be very fast, the container's application needs to start before the container starts producing useful work. Depending on the application, this startup process can be arbitrarily long. To speed up application startup times, we propose to snapshot the state of fully-deployed containers and restart future container instances from a pre-started application state. In our evaluations based on 14 real micro-service containers, this technique effectively reduces the startup phase with speedups between 1x (no speedup) and 60x.
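
The paper's own snapshotting mechanism is not detailed in this abstract; one publicly available way to experiment with the same idea is Docker's experimental, CRIU-backed checkpoint support, sketched below. The container and checkpoint names are placeholders, and this is an illustration of the approach rather than the system evaluated in the paper.

    # Illustrative use of CRIU-backed Docker checkpoints to capture a
    # container after its application has finished starting, and to start
    # it again later from that pre-started state. Requires Docker's
    # experimental checkpoint feature and CRIU on the host.
    import subprocess

    CONTAINER = "web-service"   # placeholder container name
    CHECKPOINT = "warm-start"   # placeholder checkpoint name

    def snapshot_after_warmup():
        # Capture the fully-deployed container once startup has completed.
        subprocess.run(
            ["docker", "checkpoint", "create", CONTAINER, CHECKPOINT],
            check=True,
        )

    def start_from_snapshot():
        # Resume the container from the pre-started application state,
        # skipping the startup phase.
        subprocess.run(
            ["docker", "start", "--checkpoint", CHECKPOINT, CONTAINER],
            check=True,
        )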

11 citations

References
Journal ArticleDOI
01 May 2007
TL;DR: The IPython project as mentioned in this paper provides an enhanced interactive environment for Python that includes, among other features, support for data visualization and facilities for distributed and parallel computation.
Abstract: Python offers basic facilities for interactive work and a comprehensive library on top of which more sophisticated systems can be built. The IPython project provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation.

3,355 citations

Journal ArticleDOI
TL;DR: An algorithm by which a process in a distributed system determines a global state of the system during a computation, which helps to solve an important class of problems: stable property detection.
Abstract: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “the system is deadlocked,” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.
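
For readers unfamiliar with the algorithm, the following sketch shows the marker rule of the Chandy-Lamport snapshot from a single process's point of view, assuming reliable FIFO channels. Message transport, channel naming, and application-level message handling are abstracted away; it is a simplified illustration, not the paper's own pseudocode.

    # Sketch of the Chandy-Lamport global-snapshot (marker) algorithm as
    # seen by one process. Assumes reliable FIFO channels.
    class SnapshotProcess:
        def __init__(self, incoming, outgoing, send, capture_state):
            self.incoming = incoming            # ids of incoming channels
            self.outgoing = outgoing            # ids of outgoing channels
            self.send = send                    # send(channel_id, message)
            self.capture_state = capture_state  # returns this process's state
            self.local_snapshot = None          # recorded local state
            self.channel_state = {}             # channel id -> in-flight msgs
            self.recording = set()              # channels still being recorded

        def start_snapshot(self):
            """Called on the process that initiates the snapshot."""
            self._record_and_propagate(marker_channel=None)

        def on_message(self, channel, message):
            """Called for every message delivered on an incoming channel."""
            if message == "MARKER":
                if self.local_snapshot is None:
                    # First marker seen: record local state; the state of
                    # the channel the marker arrived on is empty.
                    self._record_and_propagate(marker_channel=channel)
                else:
                    # Marker on a channel being recorded: stop recording it.
                    self.recording.discard(channel)
            elif channel in self.recording:
                # Message sent before the sender recorded its state:
                # it belongs to the channel's in-flight state.
                self.channel_state[channel].append(message)
            # (Normal application processing of the message is omitted.)

        def _record_and_propagate(self, marker_channel):
            self.local_snapshot = self.capture_state()
            for ch in self.incoming:
                self.channel_state[ch] = []
                if ch != marker_channel:
                    self.recording.add(ch)
            for ch in self.outgoing:
                self.send(ch, "MARKER")
            # The local part of the snapshot is complete once `recording`
            # becomes empty.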

2,738 citations

01 Jan 1996
TL;DR: MPI (Message Passing Interface), as discussed by the authors, is a specification for a standard message-passing library defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.
Abstract: MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.

2,065 citations

Proceedings Article
16 Jan 1995
TL;DR: In this paper, the authors describe a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature and also supports the incorporation of user directives into the creation of checkpoints.
Abstract: Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from which it can be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint files, checkpointing remains unavailable to most application developers. In this paper we describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature. While libckpt can be used in a mode which is almost totally transparent to the programmer, it also supports the incorporation of user directives into the creation of checkpoints. This user-directed checkpointing is an innovation which is unique to our work.

670 citations

Proceedings Article
10 Apr 2005
TL;DR: This is the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating systems, including Microsoft Windows, Linux, Novell NetWare and others, using virtual machine technology to provide fast, transparent application migration.
Abstract: This paper describes the design and implementation of a system that uses virtual machine technology [1] to provide fast, transparent application migration. This is the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating systems, including Microsoft Windows, Linux, Novell NetWare and others. Neither the application nor any clients communicating with the application can tell that the application has been migrated. Experimental measurements show that for a variety of workloads, application downtime caused by migration is less than a second.

588 citations