Checkpoint-Restart for a Network of Virtual Machines
Rohan Garg, Komal Sodha, Zhengping Jin, Gene Cooperman
Northeastern University
Boston, MA / USA
{rohgarg,komal,jinzp,gene}@ccs.neu.edu
Abstract—The ability to easily deploy parallel compu-
tations on the Cloud is becoming ever more important.
The first uniform mechanism for checkpointing a network
of virtual machines is described. This is important for
the parallel versions of common productivity software.
Potential examples of parallelism include Simulink for
MATLAB, parallel R for the R statistical modelling
language, parallel blast.py for the BLAST bioinformatics
software, IPython.parallel for Python, and GNU parallel
for parallel shells. The checkpoint mechanism is imple-
mented as a plugin in the DMTCP checkpoint-restart
package. It operates on KVM/QEMU, and has also been
adapted to Lguest and pure user-space QEMU. The plugin
is surprisingly compact, comprising just 400 lines of code
to checkpoint a single virtual machine, and 200 lines of
code for a plugin to support saving and restoring network
state. Incremental checkpoints of the associated virtual
filesystem are accommodated through the Btrfs filesystem.
Experiments demonstrate checkpoint times of a fraction
of a second by using forked checkpointing, mmap-based
restart, and incremental Btrfs-based snapshots.
I. INTRODUCTION
An approach for providing fault-tolerance to complex
distributed applications is demonstrated. It is based on
checkpointing a network of virtual machines. Such a
network can be started locally, and later checkpointed
for re-deployment (restart from checkpoint images) in
the Cloud. This is especially important to support fault
tolerance and load balancing in the Cloud.
The approach also provides flexibility. It employs
DMTCP, an unprivileged, purely user-space checkpointing
package. Potential examples of flexible application-specific
policies are: incremental checkpointing; declaration of
cutouts (regions of memory that don’t require
checkpointing); and application-specific memory
compression during checkpoint (for example, conversion
of double to float). End users can write
application-specific DMTCP plugins to support flexible
checkpointing.
This work was partially supported by the National Science
Foundation under Grant OCI-0960978.
Further, the maintainability of a proposed architecture
is important. Here, we measure maintainability by the
number of lines of new code required, beyond the base
code of a checkpoint-restart package, or the base code
of the virtual machine itself. The proposed architecture
relies on just 600 lines of new code: 400 lines of code
for a KVM-specific plugin used to checkpoint the virtual
machine, and 200 lines of code for a TUN/TAP plugin.
The two DMTCP plugins above are external libraries
loaded into an unmodified DMTCP. Source code can be
found in the contrib directory of the DMTCP repository.
(See Section II for further details of plugins.)
The approach described here saves the state of an
arbitrary guest operating system, which runs within a
virtual machine under a Linux host operating system.
The primary virtual machine described in this work is
KVM/QEMU [1]. However, to demonstrate the gener-
ality of the approach, a plugin was also developed for
Lguest [2]. That plugin required about 100 lines of code,
as well as about 40 lines of modifications to the Lguest
kernel driver to extend its API. The methodology was
also applied to pure user-space QEMU [3]. Surprisingly,
DMTCP was able to checkpoint user-space QEMU
“out-of-the-box” (without the use of additional plugins).
Experiments in Section IV-C demonstrate compatibil-
ity with DMTCP’s performance optimizations: forked
checkpointing and mmap-based fast restart. Forked
checkpointing enables virtual machine snapshot in
0.4 seconds when running with the Btrfs filesystem,
while mmap-based fast restart allows resuming from
the snapshot in 0.3 seconds. In addition, Section IV-D
shows the run-time overhead to be too small to measure
when running the nbench2 [4] benchmark program.
Snapshots (including the filesystem): In VM ter-
minology, a snapshot saves not only the state of the
virtual machine, but also the filesystem used by that
virtual machine. The Btrfs filesystem [5] can be used to
implement copy-on-write incremental snapshots. Thus,
during checkpoint of a virtual machine, one can also
create either a full snapshot or an incremental snapshot
of the guest filesystem.
978-1-4799-0898-1/13/$31.00 © 2013 IEEE

On computers where the host operating system does
not provide the Btrfs filesystem, it is still possible to
employ Btrfs. An “inner” KVM/QEMU virtual machine
can be run nested inside an “outer” KVM/QEMU virtual
machine, which in turn runs under the host operating
system. The outer VM provides Btrfs and DMTCP runs
inside the outer VM, checkpointing the inner VM.
In the rest of this paper, Section II provides back-
ground on DMTCP plugins. Section III describes a
generic mechanism for checkpoint-restart for single
virtual machines. Section IV provides experimental
running times over a variety of scenarios, Section V
describes related work, and Section VI provides the
conclusion.
II. DMTCP, KVM, AND TUN/TAP: EXTENDING
CHECKPOINT-RESTART TO VMS
DMTCP (Distributed MultiThreaded CheckPoint-
ing) [6] is used to checkpoint and restart a network of
virtual machines. DMTCP provides a facility for third-
party plugins, as well as using them in its own internal
architecture. The work described here is based on the
svn revision 1967 of DMTCP [7].
DMTCP implements transparent user-space
checkpoint-restart. It does this by saving to a checkpoint
image all of user-space memory, along with pertinent
process state (thread information, open file descriptors,
associated terminal device, stdin/stdout/stderr, sockets,
shared memory regions, etc.). Internal DMTCP plugins
employ specific algorithms to checkpoint the state of
open files, network sockets, shared memory regions,
and other special cases.
This work uses the plugin mechanism to extend
DMTCP in two directions: support for KVM, and
support for the virtual-network kernel devices TUN
and TAP. TUN/TAP is used for networking of mul-
tiple KVM-based virtual machines. First, DMTCP
is extended to support checkpointing of a single
KVM/QEMU virtual machine. Second, DMTCP is ex-
tended to support checkpointing of the TUN/TAP net-
work, including any network data “in flight”.
In order to checkpoint KVM/QEMU, it is launched
under the control of DMTCP. A typical example of
launch, checkpoint, and restart is as follows:
% dmtcp_checkpoint --with-plugin \
    dmtcp_kvm_plugin.so \
    dmtcp_tun_plugin.so qemu ...
% dmtcp_command --checkpoint
% dmtcp_restart qemu_*.dmtcp
Section II-A discusses handling of the KVM/QEMU
virtual machine, while Section II-B discusses network
handling and the use of TUN/TAP.
A. Checkpointing the KVM/QEMU Virtual Machine
QEMU uses KVM to run user-space code natively
on hardware that supports virtualization. It uses KVM’s
API to initialize and control the guest virtual machine.
This API is based on the ioctl system call.
For the rest of this discussion, the term QEMU is
used both to refer to the QEMU virtual machine monitor
(VMM), and the virtual machine itself (including the
guest operating system).
DMTCP plugins offer two primary mechanisms to ex-
tend checkpoint-restart: a run-time mechanism (wrapper
functions around library calls made by the application);
and customization of checkpoint/restart to save and
restore the state of external objects. In this case, QEMU
is the target application being checkpointed, and the
KVM kernel module is the external object whose state
must be virtualized.
The run-time portion of the KVM plugin is primarily
concerned with a function wrapper around the ioctl
system call. This wrapper function captures system calls
by QEMU to KVM. This is used to make a local copy
of the parameters that QEMU used to initialize the
new virtual machine. At the time of restart, the plugin
replays those same parameters so that the state of the
KVM kernel module again corresponds to that of the
pre-checkpoint virtual machine.
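The record-and-replay idea behind this wrapper can be illustrated with a minimal Python sketch. It is not the plugin's code: the actual plugin wraps the C library's ioctl. The class name RecordingIoctl, the request codes, and the fake device function are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical request codes; these are NOT real KVM ioctl numbers.
CREATE_VM, SET_MEMORY_REGION, RUN = 1, 2, 3


class RecordingIoctl:
    """Wrap an ioctl-like callable and log the VM-setup requests it sees."""

    def __init__(self, real_ioctl: Callable[[int, int, object], int],
                 setup_requests: Set[int]) -> None:
        self._real_ioctl = real_ioctl
        self._setup_requests = setup_requests
        self.log: List[Tuple[int, object]] = []

    def __call__(self, fd: int, request: int, arg: object = 0) -> int:
        # Forward the call unchanged, then keep a local copy of setup requests.
        result = self._real_ioctl(fd, request, arg)
        if request in self._setup_requests:
            self.log.append((request, arg))
        return result

    def replay(self, new_fd: int) -> None:
        # At restart: re-issue the recorded setup requests on a fresh device.
        for request, arg in self.log:
            self._real_ioctl(new_fd, request, arg)


def demo():
    seen = []

    def fake_ioctl(fd, request, arg=0):
        # Stand-in for the kernel device; it only records what it was asked.
        seen.append((fd, request, arg))
        return 0

    wrapper = RecordingIoctl(fake_ioctl, {CREATE_VM, SET_MEMORY_REGION})
    wrapper(5, CREATE_VM, "vm0")
    wrapper(5, RUN, None)              # run-time request, not recorded
    wrapper(5, SET_MEMORY_REGION, "slot0")
    seen.clear()
    wrapper.replay(9)                  # restart: fresh descriptor 9
    return wrapper.log, seen
```

Only the requests that configure the virtual machine are logged, so the replay at restart touches configuration state and does not re-run the guest.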
The remainder of the KVM plugin is concerned with
saving state at checkpoint time, and restoring state at
restart time. The KVM saved state includes the state
of the virtual CPU (registers, etc.) and the state of the
interrupt controllers. The KVM API provides explicit
calls (issued through ioctl) that the plugin uses to save
and restore the above state.
Another example of KVM/QEMU state is the virtual
memory tables. These tables are contained within the
user-space memory of the QEMU process itself (here
viewing QEMU as a process in the host operating
system). At the time of restart, the original mapping
between the guest physical pages and host physical
pages has been lost. However, the DMTCP plugin does
not need to create a new mapping: the first access to
each guest page after restart triggers a page fault, which
causes the hypervisor to re-establish the mapping.
Figure 1 illustrates the generic architecture of a guest
virtual machine. At the time of checkpoint, the DMTCP
plugin discovers the parameters of the KVM hypervisor
in supporting the current state of the QEMU virtual
machine. DMTCP then writes to a checkpoint image the
memory of the QEMU virtual machine, which consists
of the user-space memory of the process of the host
operating system that is running QEMU.
Figure 1: Generic VM Architecture. This sketch shows
the VM components of interest for checkpoint-restart.
The VM shell refers to the uninitialized data structures
in the kernel driver that describe the virtual machine. A
VM launcher initializes those data structures. A generic
checkpoint-restart mechanism restores those data
structures appropriately. (Diagram labels. User Space
Memory: guest VM user-space component, vCPU threads,
async I/O threads, tables shared with kernel space.
Kernel Space Memory: kernel module for VM, VM shell,
hardware description (peripherals, IRQ, etc.), vCPU0
through vCPUn for the virtual cores, tables shared with
user space.)
Figure 2 presents the launching of a fresh virtual
machine at restart time, which is then modified to
correspond to the pre-checkpoint QEMU. At the time of
restart, the DMTCP plugin requests KVM to create a
fresh virtual machine (not specific to QEMU). Then,
DMTCP replaces this fresh virtual machine (which
exists as the user-space memory of a process in the host
operating system) with the original user-space memory
from the checkpoint image. Finally, the DMTCP plugin
makes calls to the KVM kernel module to reset the
KVM parameters so as to correspond to those of the
pre-checkpoint QEMU virtual machine.
B. Checkpointing the TUN/TAP Network
A TUN/TAP plugin extends DMTCP similarly to the
KVM plugin. Wrapper functions are implemented for
ioctl to detect how the network was set up.
Figure 2: Re-Starting Virtual Machine from Checkpoint
Image. The DMTCP plugin re-creates the original hardware
description from the checkpoint image. In addition, the
user-space memory of the guest VM is restored by
DMTCP at the original addresses. (Diagram labels as in
Figure 1, except that the VM shell holds an empty
hardware description.)
For background, we briefly review how DMTCP
provides checkpointing over a TCP/IP network. At the
time of checkpoint, DMTCP “drains the network” by:
(a) stopping user threads of all processes in the
computation; (b) receiving from each socket until all
network data “in flight” has been collected; and (c) then
writing a checkpoint image. A “cookie” (unique set of
data) is sent through each network connection so that the
receiver can determine when no further data is in flight.
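The cookie-based drain can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DMTCP's implementation: the function name drain_until_cookie is hypothetical, a local socket pair stands in for a real network connection, and the cookie is assumed to be the last data sent because the sender's user threads have already been stopped.

```python
import os
import socket


def drain_until_cookie(sock, cookie, bufsize=4096):
    """Receive until the cookie arrives; return the in-flight data before it."""
    buf = bytearray()
    while not buf.endswith(cookie):
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("peer closed before the cookie arrived")
        buf.extend(chunk)
    # Everything ahead of the cookie was "in flight" and must be saved.
    return bytes(buf[:-len(cookie)])


def demo():
    sender, receiver = socket.socketpair()
    cookie = os.urandom(16)                 # unique marker for this checkpoint
    sender.sendall(b"in-flight payload")    # data sent before the checkpoint
    sender.sendall(cookie)                  # marks the end of in-flight data
    drained = drain_until_cookie(receiver, cookie)
    sender.close()
    receiver.close()
    return drained
```

The drained bytes would be stored with the checkpoint image so that they can be delivered again at restart.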
The TUN/TAP plugin employs a similar strategy,
except that TUN/TAP does not provide an analog of
a socket connection. It operates at a lower level in
which network packets generated by the guest operating
system are injected directly into the physical network.
Only the guest operating system is aware of the socket
connections being used by the applications within it.
Two alternative approaches to draining the network
are: (a) to send a broadcast packet that plays the role
of the DMTCP cookie; and (b) to wait for a specified
time sufficient for all network packets to arrive. Mech-
anism (b) is used currently. For added reliability, at the
end of writing the checkpoint image, the network is
checked to see if any late packets have arrived. If a late
packet is detected, the user can be warned, or a second
DMTCP checkpoint can be automatically initiated.
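Mechanism (b), together with the late-packet check, might look as follows in Python. This is only a sketch: the function names and the wait time are hypothetical, and a local stream socket pair stands in for the TUN/TAP device, which the sketch does not open.

```python
import select
import socket
import time


def drain_for(sock, wait_seconds, bufsize=65536):
    """Collect everything that arrives on sock within wait_seconds."""
    deadline = time.monotonic() + wait_seconds
    received = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return received
        ready, _, _ = select.select([sock], [], [], remaining)
        if ready:
            chunk = sock.recv(bufsize)
            if not chunk:           # peer closed; nothing more can arrive
                return received
            received.append(chunk)


def late_packet_arrived(sock, timeout=0.0):
    """After the image is written, check whether more data has shown up."""
    ready, _, _ = select.select([sock], [], [], timeout)
    return bool(ready)


def demo():
    sender, receiver = socket.socketpair()
    sender.sendall(b"pkt1")
    sender.sendall(b"pkt2")
    drained = b"".join(drain_for(receiver, 0.2))
    quiet = late_packet_arrived(receiver)
    sender.sendall(b"late")                     # arrives after the drain
    late = late_packet_arrived(receiver, timeout=0.5)
    sender.close()
    receiver.close()
    return drained, quiet, late
```

A positive late-packet check corresponds to the case in which the user is warned or a second checkpoint is initiated.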
III. GENERIC MECHANISM FOR CHECKPOINTING A
SINGLE VIRTUAL MACHINE
The techniques employed by the KVM plugin from
Section II-A extend to other virtual machines. In partic-
ular, a DMTCP plugin was written for the Lguest virtual
machine. In this case, Lguest provides a control mecha-
nism by overloading the read and write system calls.
Plugin wrapper functions were written for these calls.
The Lguest kernel module also had to be modified with
about 40 lines of code, in order to extend the Lguest
API for read/write. This enables the Lguest plugin
to discover and restore the virtual machine state. The
plugin itself comprised 100 lines of code.
In the case of user-space QEMU (no KVM kernel
module), the task of checkpointing is even simpler.
The existing DMTCP package was found to correctly
checkpoint and restart QEMU without any additional
plugins. See Tables VII, VIII and X for timings across
Lguest, KVM/QEMU and pure QEMU.
IV. EXPERIMENTAL RESULTS
The experimental results are split into four subsec-
tions concerning: a network of virtual machines; the
use of Btrfs for filesystem snapshots; DMTCP optimiza-
tions; and performance on a commodity computer.
Scalability is tested for two different architectures:
distributed computing across a cluster of 12 nodes; and
shared memory computing employing 16 CPU cores.
Configuration (cluster of 12 nodes): Each of the
12 computers is a 12-core Intel Xeon (1.6 GHz) server
with 24 GB of RAM. The host operating system was a
64-bit version of CentOS-6.3 with Linux kernel 2.6.32.
KVM/QEMU was chosen as the VMM. The guests were
set up to run Ubuntu-12.04 Server version. DMTCP svn
revision 1967 was used for these experiments.
Configuration (single node with 16 cores): These
experiments were run on a 16-core AMD Opteron
(1 GHz) server with 128 GB of RAM. The host op-
erating system was a 64-bit version of Ubuntu-13.04
with Linux kernel 3.8. KVM/QEMU was chosen as the
VMM. The guests were set up to run Ubuntu-12.04
Server version. DMTCP svn revision 1967 was used
for these experiments.
A. Scalability of Checkpointing of Virtual Machines
Tables I, II, and III show that restart time increases
slowly with the number of VMs, while checkpoint time
is close to constant.
Further, Tables I and III show that two DMTCP
options (further analyzed in Section IV-C) can enable
checkpoint and restart in a fraction of a second. First,
in forked checkpointing, a child process is forked in
order to checkpoint while the parent continues running.
Second, in mmap-based fast restart, mmap is used to
map into RAM the memory saved within the check-
point image. Hence, the process restarts faster, while
remaining memory is paged into RAM on demand.
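The effect of the second option can be illustrated with Python's mmap module. This is a minimal sketch, not DMTCP's restart code: the function name mmap_restore and the tiny two-page image are hypothetical. The mapping is private (copy-on-write), so pages are read from the file only when touched, and writes by the restarted process never reach the image on disk.

```python
import mmap
import os
import tempfile


def mmap_restore(path):
    """Map a checkpoint image privately instead of reading it into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_COPY gives a private,
        # copy-on-write mapping whose pages are loaded on demand.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.img")
        with open(path, "wb") as f:
            f.write(b"A" * 4096 + b"B" * 4096)   # stand-in for saved memory
        mem = mmap_restore(path)
        first = mem[0:1]                 # touching a page brings it in
        mem[4096:4097] = b"Z"            # the restarted process modifies memory
        changed = mem[4096:4097]
        with open(path, "rb") as f:
            on_disk = f.read()[4096:4097]    # the image itself is untouched
        mem.close()
        return first, changed, on_disk
```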
1) Scalability for a Distributed Network of VMs: Ta-
ble I shows checkpoint and restart timings of HPCC [8].
Number    None (sec)      F/C (sec)       F/R (sec)     F/C + F/R (sec)
Nodes     Ckpt  Restart   Ckpt  Restart   Ckpt  Restart   Ckpt  Restart
1         9.45     2.83   0.29     3.10   3.78     0.38   0.31     0.34
2        10.11     3.17   0.34     3.22   3.56     0.36   0.33     0.38
4        10.63     3.45   0.36     3.73   3.85     0.42   0.38     0.50
8        11.38     4.59   0.38     4.23   4.17     0.51   0.41     0.52
12       11.53     5.01   0.42     4.90   4.18     0.59   0.48     0.55
Table I: Checkpoint-restart of HPCC [8] benchmark on
a Gigabit Ethernet cluster, as influenced by DMTCP’s
optional optimizations: forked checkpoint (F/C) and fast
restart (F/R). DMTCP’s default gzip compression of
checkpoint images is incompatible with DMTCP F/R,
and so is not used in those cases. (Memory allocated in
each case is 1024 MB.)
2) Scalability for a Network of Virtual Machines
in Multi-Core Shared Memory: Table II shows the
efficiency for a network of virtual machines under
shared memory. Coverage over three types of parallel
middleware is demonstrated: MPI (HPCC [8]), TCP/IP
sockets (IPython [9]), and PVM (the SNOW parallel
computing framework for the R statistical programming
language [10]).
Number           HPCC                  IPython               Parallel R
of VMs   Ckpt (s)  Restart (s)   Ckpt (s)  Restart (s)   Ckpt (s)  Restart (s)
1            9.84         3.31       9.63         3.46      10.02         3.68
2           10.08         3.75      10.44         4.10      10.54         4.17
3           10.18         3.86      10.67         4.06      11.13         4.16
Table II: Checkpoint-restart times for virtual machines
on a single multi-core computer. (The allocated memory
in each case is 1024 MB.)
Table III shows that the two DMTCP optimizations,
forked checkpoint and fast restart, greatly enhance
checkpoint and restart times. See Section IV-C for
descriptions of those optimizations.
DMTCP            HPCC (sec)      IPython (sec)   Parallel R (sec)
Optimizations    Ckpt  Restart    Ckpt  Restart    Ckpt  Restart
None            10.18     3.86   10.67     4.06   11.13     4.16
F/C              0.37     3.17    0.41     3.92    0.38     3.91
F/R              3.25     0.36    3.48     0.34    4.01     0.27
F/C + F/R        0.38     0.35    0.43     0.34    0.41     0.37
Table III: Checkpoint-restart of three VMs on a 16-
core computer, while running different applications. The
DMTCP optimizations are forked checkpoint (F/C) and
fast restart (F/R). DMTCP’s default gzip compression of
checkpoint images is incompatible with DMTCP F/R,
and so is not used in those cases. (Memory allocated in
each case is 1024 MB.)
B. Btrfs: Incremental Snapshots of Virtual Machines
A virtual machine snapshot mechanism includes the
ability to save the current state of the VM filesystem.
This is implemented through the Btrfs copy-on-write
filesystem for incremental snapshots of the guest virtual
filesystem. Even though the host machines in our
experimental facilities did not provide a Btrfs filesystem,
we were able to support a Btrfs filesystem through
nesting of one KVM/QEMU virtual machine inside
another. The outer virtual machine provides a Btrfs
virtual filesystem for the inner one. DMTCP runs as a
process inside the outer virtual machine, and is used
to checkpoint the inner virtual machine. Networking
of the VMs is supported through TUN/TAP, as before.
Table IV demonstrates the scalability for a distributed
computation across four nodes of the cluster.
                 1 node (sec)    2 nodes (sec)    4 nodes (sec)
Optimizations    Ckpt  Restart    Ckpt  Restart    Ckpt  Restart
with Btrfs       2.36     1.20    2.45     1.65    3.68     2.35
without Btrfs   33.28    35.67   34.46    37.20   39.73    39.47
Table IV: Snapshotting up to four distributed VMs run-
ning HPCC [8] under KVM/QEMU. The Btrfs filesys-
tem is used to snapshot the filesystem using nested
VMs. (Memory allocated in each case is 384 MB. The
size of the guest filesystem is 2 GB.)
                 Checkpoint (s)   Restart (s)
with Btrfs                 1.52           0.7
without Btrfs             10.23         12.48
Table V: Configuration is same as for Table IV, except
that three VMs run on a single 16-core computer.
Tables IV and V show the advantage of using
the copy-on-write feature of Btrfs to store the guest
VM’s filesystem. At checkpoint time a small additional
DMTCP plugin rapidly copies the state of the entire
filesystem (which appears as a single file on the outer
guest’s filesystem), using the --reflink option of the
GNU coreutils cp command. At restart time the state of
the guest filesystem is similarly copied back. DMTCP’s
facilities for forked checkpointing and mmap-based fast
restart were employed.
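A copy of this kind could be issued from Python roughly as follows. This is a sketch and not the plugin described above: the function names and file names are hypothetical. The only external fact relied on is that GNU cp accepts --reflink=auto (fall back to an ordinary copy when the filesystem cannot share blocks) and --reflink=always (fail instead of falling back).

```python
import subprocess


def reflink_copy_command(src, dst, mode="auto"):
    """Build the cp command line for a copy-on-write copy of a disk image."""
    if mode not in ("auto", "always"):
        raise ValueError("mode must be 'auto' or 'always'")
    # "--" ends option parsing, so odd file names are not read as options.
    return ["cp", f"--reflink={mode}", "--", src, dst]


def snapshot_guest_image(src, dst, mode="auto"):
    """Run the copy; on Btrfs this shares blocks rather than duplicating them."""
    subprocess.run(reflink_copy_command(src, dst, mode), check=True)
```

At restart time the same helper would be called with the arguments swapped, for example snapshot_guest_image("guest.img.snap", "guest.img"), to copy the saved filesystem state back.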
Tables IV and V show a performance penalty for
restarting without Btrfs (using nested VMs), as com-
pared to Table II (non-nested). DMTCP resides in the
outer VM. Since the virtualization of I/O devices is
never handled by KVM, the outer KVM then transfers
control back to the outer QEMU. The outer QEMU
resides in user space memory. The continual switching
between kernel and user-space accounts for the ineffi-
ciency.
C. Optimizing: Forked Checkpointing and Fast Restart
DMTCP supports two further performance opti-
mizations: forked-checkpointing and mmap-based fast-
restart. Table VI demonstrates the much improved per-
formance when using both of these optimizations. All
experiments are run on the 16-core computer with just
a single VM.
Allocated Memory          KVM/QEMU (F/C+F/R)
(MB)               Checkpoint (s)   Restart (s)   Image Size
128                          0.20          0.10       184 MB
256                          0.19          0.09       310 MB
512                          0.21          0.10       568 MB
768                          0.22          0.10       822 MB
1024                         0.21          0.10       1.1 GB
Table VI: Forked checkpoint (F/C) and fast restart (F/R)
times for an idle VM under KVM/QEMU.
1) Forked checkpointing: Times for the forked
checkpointing optimization are given for an
idle virtual machine in Table VII. This uses the
--enable-forked-checkpointing configure
option of DMTCP. At checkpoint time, after “draining
the network”, a child process is forked. The child
writes out the checkpoint image in parallel with the
parent process continuing its execution. As expected,
the parent completes its portion of the checkpoint
largely independently of the size of the checkpoint
image or allocated memory. Forked checkpointing
typically requires 0.2 seconds.
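The fork-based technique can be demonstrated in miniature with Python on a POSIX system. This sketch is not DMTCP's code: the function name forked_checkpoint is hypothetical, and a small dictionary serialized with pickle stands in for the process memory. The child inherits a copy-on-write view of the state as of the fork, so later changes made by the parent do not appear in the image.

```python
import os
import pickle
import tempfile


def forked_checkpoint(state, path):
    """Fork; the child writes `state` to `path` while the parent keeps running."""
    pid = os.fork()
    if pid != 0:
        return pid                      # parent: resume work immediately
    status = 1
    try:
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)       # child sees state as of the fork
        os.replace(tmp_path, path)      # publish the finished image atomically
        status = 0
    finally:
        os._exit(status)                # never return into the parent's code


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.img")
        state = {"counter": 1}
        pid = forked_checkpoint(state, path)
        state["counter"] = 2            # parent continues and mutates state
        _, wait_status = os.waitpid(pid, 0)
        with open(path, "rb") as f:
            saved = pickle.load(f)
    return wait_status, saved, state
```

The parent's cost is essentially that of the fork itself, which is consistent with the observation that checkpoint time is largely independent of image size.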
The times for checkpoint and restart for KVM/QEMU
are larger than the times for user-space QEMU. This is
because the plugin for KVM/QEMU makes extra system
calls at checkpoint and restart time. The times can be
reduced by modifying the kernel driver to implement a
new system call that coalesces all of the operations of
the previous system calls.
2) Fast Restart: Times for the fast-restart optimiza-
tion are given for an idle virtual machine in Table VIII.
This uses the --enable-fast-restart option
of DMTCP. This option uses mmap to map the check-
point image from disk directly into virtual memory,
instead of copying data from disk to virtual memory. In
