scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

DMTCP: Transparent checkpointing for cluster computations and the desktop

TL;DR: DMTCP as mentioned in this paper is a transparent user-level checkpointing package for distributed applications, which is used for the runCMS experiment of the Large Hadron Collider at CERN, and it can be incorporated and distributed as a checkpoint-restart module within some larger package.
Abstract: DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wide range of over 20 well known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. RunCMS runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads; as well as typical desktop applications. On 128 distributed cores (32 nodes), checkpoint and restart times are typically 2 seconds, with negligible run-time overhead. Typical checkpoint times are reduced to 0.2 seconds when using forked checkpointing. Experimental results show that checkpoint time remains nearly constant as the number of nodes increases on a medium-size cluster. DMTCP automatically accounts for fork, exec, ssh, mutexes/ semaphores, TCP/IP sockets, UNIX domain sockets, pipes, ptys (pseudo-terminals), terminal modes, ownership of controlling terminals, signal handlers, open file descriptors, shared open file descriptors, I/O (including the readline library), shared memory (via mmap), parent-child process relationships, pid virtualization, and other operating system artifacts. By emphasizing an unprivileged, user-space approach, compatibility is maintained across Linux kernels from 2.6.9 through the current 2.6.28. Since DMTCP is unprivileged and does not require special kernel modules or kernel patches, DMTCP can be incorporated and distributed as a checkpoint-restart module within some larger package.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
01 Sep 2016
TL;DR: This work presents DMTCP-PV, a new user-space transparent checkpointing system based on the concept of process virtualization, which separately models the state of each local or distributed subsystem while decoupling it from the core checkpointing engine.
Abstract: System-level checkpoint-restart is a critical technology for long-running jobs in high-performance computing. Yet, only two approaches to checkpointing MPI applications continue to survive in wide use today. One approach is to use the kernel module-based BLCR in combination with an MPI checkpoint-restart service particular to the MPI implementation in use. Unfortunately, this lacks support for some important Linux system services such as SysV IPC (e.g., shared memory objects). A second approach has been to use the original 2009 DMTCP implementation (herein referred to as DMTCP-09) for transparent, system-level checkpointing. Unfortunately, DMTCP-09 lacked support for checkpointing many of the necessary features found by MPI in a modern batch environment. These include: ssh, the InfiniBand network, process migration (restarting an MPI application on different cluster nodes), and modified file path prefixes on restart (typically due to a changing current directory, mount points, library paths, etc.). This work presents DMTCP-PV, a new user-space transparent checkpointing system based on the concept of process virtualization. This approach separately models the state of each local or distributed subsystem while decoupling it from the core checkpointing engine. By separating these concerns, a domain expert can extend checkpointing into a new domain without any knowledge of the core checkpointing engine. This allowed DMTCP-PV to address the deficiencies noted above and many others. It is shown that the runtime overhead of DMTCP-PV is generally less than 1%, and the checkpointing time is dominated by the time to write an image file to stable storage.

21 citations

Journal ArticleDOI
TL;DR: In this article, the authors discuss how to improve control software troubleshooting in software-defined networking control software, and how to find the root cause of software bugs in SDN control software.
Abstract: Software bugs are inevitable in software-defined networking control software, and troubleshooting is a tedious, time-consuming task. In this paper we discuss how to improve control software trouble...

21 citations

Journal ArticleDOI
TL;DR: Cache-based model checking as mentioned in this paper is an extension of Java PathFinder, a Java model checker, which allows to verify one process at a time and run other processes in another execution environment.
Abstract: Distributed systems are complex, being usually composed of several subsystems running in parallel. Concurrent execution and inter-process communication in these systems are prone to errors that are difficult to detect by traditional testing, which does not cover every possible program execution. Unlike testing, model checking can detect such faults in a concurrent system by exploring every possible state of the system. However, most model-checking techniques require that a system be described in a modeling language. Although this simplifies verification, faults may be introduced in the implementation. Recently, some model checkers verify program code at runtime but tend to be limited to stand-alone programs. This paper proposes cache-based model checking, which relaxes this limitation to some extent by verifying one process at a time and running other processes in another execution environment. This approach has been implemented as an extension of Java PathFinder, a Java model checker. It is a scalable and promising technique to handle distributed systems. To support a larger class of distributed systems, a checkpointing tool is also integrated into the verification system. Experimental results on various distributed systems show the capability and scalability of cache-based model checking.

21 citations

Proceedings ArticleDOI
01 Sep 2013
TL;DR: The first uniform mechanism for checkpointing a network of virtual machines is described, important for the parallel versions of common productivity software.
Abstract: The ability to easily deploy parallel computations on the Cloud is becoming ever more important The first uniform mechanism for checkpointing a network of virtual machines is described This is important for the parallel versions of common productivity software Potential examples of parallelism include Simulink for MATLAB, parallel R for the R statistical modelling language, parallel blastpy for the BLAST bioinformatics software, IPythonparallel for Python, and GNU parallel for parallel shells The checkpoint mechanism is implemented as a plugin in the DMTCP checkpoint-restart package It operates on KVM/QEMU, and has also been adapted to Lguest and pure user-space QEMU The plugin is surprisingly compact, comprising just 400 lines of code to checkpoint a single virtual machine, and 200 lines of code for a plugin to support saving and restoring network state Incremental checkpoints of the associated virtual filesystem are accommodated through the Btrfs filesystem Experiments demonstrate checkpoint times of a fraction of a second by using forked checkpointing, mmap-based restart, and incremental Btrfs-based snapshots

20 citations

Proceedings ArticleDOI
30 Oct 2015
TL;DR: This paper investigates capabilities provided by container to model and build the Micro-service infrastructure and compare different checkpoint and restore technologies for high availability, and presents preliminary performance results of the infrastructure tuned to the micro-service.
Abstract: In the present days of rapid adoption of micro-service, it is imperative to build a system to support and ensure the high performance and high availability for micro-services. Lightweight virtualization, which we also called container, has the ability to run multiple isolated sets of processes under a single kernel instance. Because of the possibility of obtaining a low overhead comparable to the near-native performance of a bare server, the container techniques, such as openvz, lxc, docker, they are widely used for micro-service (1). In this paper, we present the high availability of micro-service in containers. We investigate capabilities provided by container (docker, openvz) to model and build the Micro-service infrastructure and compare different checkpoint and restore technologies for high availability. Finally, we present preliminary performance results of the infrastructure tuned to the micro-service.

20 citations

References
More filters
Journal ArticleDOI
01 May 2007
TL;DR: The IPython project as mentioned in this paper provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation for interactive work and a comprehensive library on top of which more sophisticated systems can be built.
Abstract: Python offers basic facilities for interactive work and a comprehensive library on top of which more sophisticated systems can be built. The IPython project provides on enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation

3,355 citations

Journal ArticleDOI
TL;DR: An algorithm by which a process in a distributed system determines a global state of the system during a computation, which helps to solve an important class of problems: stable property detection.
Abstract: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “ the system is deadlocked” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.

2,738 citations

01 Jan 1996
TL;DR: The MPI Message Passing Interface (MPI) as discussed by the authors is a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.
Abstract: MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.

2,065 citations

Proceedings Article
16 Jan 1995
TL;DR: In this paper, the authors describe a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature and also supports the incorporation of user directives into the creation of checkpoints.
Abstract: Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from which it can be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint files, checkpointing remains unavailable to most application developers. In this paper we describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature. While libckpt can be used in a mode which is almost totally transparent to the programmer, it also supports the incorporation of user directives into the creation of checkpoints. This user-directed checkpointing is an innovation which is unique to our work.

670 citations

Proceedings Article
10 Apr 2005
TL;DR: This is the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating system, including Microsoft Windows, Linux, Novell NetWare and others, to provide fast, transparent application migration.
Abstract: This paper describes the design and implementation of a system that uses virtual machine technology [1] to provide fast, transparent application migration. This is the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating system, including Microsoft Windows, Linux, Novell NetWare and others. Neither the application nor any clients communicating with the application can tell that the application has been migrated. Experimental measurements show that for a variety of workloads, application downtime caused by migration is less than a second.

588 citations