scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

DMTCP: Transparent checkpointing for cluster computations and the desktop

TL;DR: DMTCP as mentioned in this paper is a transparent user-level checkpointing package for distributed applications, which is used for the runCMS experiment of the Large Hadron Collider at CERN, and it can be incorporated and distributed as a checkpoint-restart module within some larger package.
Abstract: DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wide range of over 20 well known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. RunCMS runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads; as well as typical desktop applications. On 128 distributed cores (32 nodes), checkpoint and restart times are typically 2 seconds, with negligible run-time overhead. Typical checkpoint times are reduced to 0.2 seconds when using forked checkpointing. Experimental results show that checkpoint time remains nearly constant as the number of nodes increases on a medium-size cluster. DMTCP automatically accounts for fork, exec, ssh, mutexes/ semaphores, TCP/IP sockets, UNIX domain sockets, pipes, ptys (pseudo-terminals), terminal modes, ownership of controlling terminals, signal handlers, open file descriptors, shared open file descriptors, I/O (including the readline library), shared memory (via mmap), parent-child process relationships, pid virtualization, and other operating system artifacts. By emphasizing an unprivileged, user-space approach, compatibility is maintained across Linux kernels from 2.6.9 through the current 2.6.28. Since DMTCP is unprivileged and does not require special kernel modules or kernel patches, DMTCP can be incorporated and distributed as a checkpoint-restart module within some larger package.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
17 Jul 2016
TL;DR: A three-phase debugging strategy is described that permits one to interactively debug long-running MPI applications that were developed for non-interactive batch environments and two strategies are proposed that take advantage of DMTCP (Distributed MultiThreaded CheckPointing) for system-level checkpointing.
Abstract: Batch environments are notoriously unfriendly because it's not easy to interactively diagnose the health of a job. A job may be terminated without warning when it reaches the end of an allotted runtime slot, or it may terminate even sooner due to an unsuspected bug that occurs only at large scale.Two strategies are proposed that take advantage of DMTCP (Distributed MultiThreaded CheckPointing) for system-level checkpointing. First, we describe a three-phase debugging strategy that permits one to interactively debug long-running MPI applications that were developed for non-interactive batch environments. Second, we review how to use the SLURM resource manager capability to easily implement extended batch sessions that overcome the typical limitation of 24 hours maximum for a single batch job on large HPC resources. We argue for greater use of this lesser known capability, as a means to remove the necessity for the application-specific checkpointing found in many long-running jobs.

1 citations

Posted Content
TL;DR: In this article, the authors present and analyze the results of a comprehensive performance assessment of several ways in which DCPMM can replace standard storage devices, and replace or augment DRAM for improving the performance of HPC scientific computations.
Abstract: As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, the computation should either be distributed between several compute nodes, or some portion of these data objects must be stored in a non-volatile block device such as a hard disk drive or an SSD storage device. Optane DataCenter Persistent Memory Module (DCPMM), a new technology introduced by Intel, provides non-volatile memory that can be plugged into standard memory bus slots and therefore be accessed much faster than standard storage devices. In this work, we present and analyze the results of a comprehensive performance assessment of several ways in which DCPMM can 1) replace standard storage devices, and 2) replace or augment DRAM for improving the performance of HPC scientific computations. To achieve this goal, we have configured an HPC system such that DCPMM can service I/O operations of scientific applications, replace standard storage devices and file systems (specifically for diagnostics and checkpoint-restarting), and serve for expanding applications' main memory. We focus on keeping the scientific codes with as few changes as possible, while allowing them to access the NVM transparently as if they access persistent storage. Our results show that DCPMM allows scientific applications to fully utilize nodes' locality by providing them with sufficiently-large main memory. Moreover, it can be used for providing a high-performance replacement for persistent storage. Thus, the usage of DCPMM has the potential of replacing standard HDD and SSD storage devices in HPC architectures and enabling a more efficient platform for modern supercomputing applications.

1 citations

Proceedings ArticleDOI
16 Jul 2013
TL;DR: This work proposes the users to predict the performance of their applications by generating a signature using a method called Parallel Application Signature for Performance Prediction (PAS2P), which represents the core of theirApplications, and presents an enhanced method for the signature adaptability to the application behavior.
Abstract: Predicting performance of parallel applications using benchmarks is becoming more complex. Each parallel application has different behavior and it is difficult to know which is the benchmark that most closely resembles the application, therefore the benchmarks have a low representativeness of real programs. We are proposing the users to predict the performance of their applications by generating a signature using a method called Parallel Application Signature for Performance Prediction (PAS2P), which represents the core of their applications. We are presenting an enhanced method for the signature adaptability to the application behavior. Until now, phases were considered independently, each one had a checkpoint, but now we analyze whether they are contiguous and can form a group. This is done by decreasing the storage needed for the signature, slightly increasing the signature execution time in order to predict the performance of the applications thus compensating the execution time of the signature and its size (MB). We show results from building different signatures for the same applications to allow the user to select between different options in order to create the signature.
Patent
Frédéric Dumoulin1
30 Sep 2013
TL;DR: In this paper, the authors propose a method for restoring a state of a circuit under test and test environment by stopping an application software process and solidifying test data and address pointers in a shared memory.
Abstract: The method involves stopping (20) application software process and solidifying (21) test data and address pointers in a shared memory. An emulator is stopped, and writing of the data by the emulator in the memory is authorized only in the presence of memory locations available in the memory. Backup (27) of a first file representative of a state of the emulator is performed during stopping, and the process of management module of the interface is disconnected. A backup (29) of a workable program file representative of a state of the process is performed while stopping of the process. Independent claims are also included for the following: (1) a method for restoration of a state of a circuit under test and test environment (2) a device for backup of a state of a circuit under test and a test environment (3) a device for restoration of a state of a circuit under test and test environment.
References
More filters
Journal ArticleDOI
01 May 2007
TL;DR: The IPython project as mentioned in this paper provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation for interactive work and a comprehensive library on top of which more sophisticated systems can be built.
Abstract: Python offers basic facilities for interactive work and a comprehensive library on top of which more sophisticated systems can be built. The IPython project provides on enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation

3,355 citations

Journal ArticleDOI
TL;DR: An algorithm by which a process in a distributed system determines a global state of the system during a computation, which helps to solve an important class of problems: stable property detection.
Abstract: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “ the system is deadlocked” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.

2,738 citations

01 Jan 1996
TL;DR: The MPI Message Passing Interface (MPI) as discussed by the authors is a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.
Abstract: MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.

2,065 citations

Proceedings Article
16 Jan 1995
TL;DR: In this paper, the authors describe a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature and also supports the incorporation of user directives into the creation of checkpoints.
Abstract: Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file from which it can be recovered after a failure. While recent research has developed a collection of powerful techniques for minimizing the overhead of writing checkpoint files, checkpointing remains unavailable to most application developers. In this paper we describe libckpt, a portable checkpointing tool for Unix that implements all applicable performance optimizations which are reported in the literature. While libckpt can be used in a mode which is almost totally transparent to the programmer, it also supports the incorporation of user directives into the creation of checkpoints. This user-directed checkpointing is an innovation which is unique to our work.

670 citations

Proceedings Article
10 Apr 2005
TL;DR: This is the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating system, including Microsoft Windows, Linux, Novell NetWare and others, to provide fast, transparent application migration.
Abstract: This paper describes the design and implementation of a system that uses virtual machine technology [1] to provide fast, transparent application migration. This is the first system that can migrate unmodified applications on unmodified mainstream Intel x86-based operating system, including Microsoft Windows, Linux, Novell NetWare and others. Neither the application nor any clients communicating with the application can tell that the application has been migrated. Experimental measurements show that for a variety of workloads, application downtime caused by migration is less than a second.

588 citations