Distributed snapshots: determining global states of distributed systems
Summary (2 min read)
Introduction
- This paper presents algorithms by which a process in a distributed system can determine a global state of the system during a computation.
- The photographers must take several snapshots and piece the snapshots together to form a picture of the overall scene.
- Examples of stable properties are “computation has terminated,” “the system is deadlocked,” and “all tokens in a token ring have disappeared.”
- Channels are assumed to have infinite buffers, to be error-free, and to deliver messages in the order sent.
A global state of a distributed system is a set of component process and channel states
- The initial global state is one in which the state of each process is its initial state and the state of each channel is the empty sequence.
- The system contains one token that is passed from one process to another, and hence the authors call this system the “single-token conservation” system.
- Events “p sends M” and “q sends M’” may occur in the initial global state, and the next states after these events are different.
3.1. Motivation for the Steps of the Algorithm
- The global-state recording algorithm works as follows: Each process records its own state, and the two processes that a channel is incident on cooperate in recording the channel state.
- The algorithm may send messages and require processes to carry out computations; however, the messages and computation required to record the global state must not interfere with the underlying computation.
- Now assume that the global state transits to in-c (because p sends the token).
- This example suggests that the recorded global state may be inconsistent if the state of c is recorded before p sends a message along c and the state of p is recorded after p sends a message along c, that is, if n > n’.
3.2 Global-State-Detection Algorithm Outline
- Marker-Sending Rule for a Process p. For each channel c, incident on, and directed away from p: p sends one marker along c after p records its state and before p sends further messages along c.
- Marker-Receiving Rule for a Process q. On receiving a marker along a channel c: if q has not recorded its state, then q records its state and records the state of c as the empty sequence; otherwise, q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.
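The two marker rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the `Process` class, the token-count state, the `connect` helper, and the channel name `c2` (standing in for c’) are all assumptions made for the example, and channels are modelled as in-memory FIFO queues.

```python
from collections import deque

MARKER = object()  # sentinel; assumed distinct from every application message


class Process:
    """Hypothetical process whose state is the number of tokens it holds."""

    def __init__(self, name, tokens=0):
        self.name = name
        self.tokens = tokens
        self.incoming = {}          # channel name -> FIFO deque read by this process
        self.outgoing = {}          # channel name -> FIFO deque written by this process
        self.recorded_state = None  # None until the process records its state
        self.channel_state = {}     # incoming channel name -> recorded messages
        self.open_channels = set()  # incoming channels still being recorded

    def record_state(self):
        """Record own state, then send one marker on every outgoing channel."""
        self.recorded_state = self.tokens
        self.channel_state = {c: [] for c in self.incoming}
        self.open_channels = set(self.incoming)
        for chan in self.outgoing.values():
            chan.append(MARKER)  # the marker precedes any later message on chan

    def send_token(self, c):
        self.tokens -= 1
        self.outgoing[c].append("token")

    def receive(self, c):
        msg = self.incoming[c].popleft()
        if msg is MARKER:
            if self.recorded_state is None:
                self.record_state()        # state of c stays the empty sequence
            self.open_channels.discard(c)  # recording of c is now complete
        else:
            if c in self.open_channels:    # arrived after recording, before marker
                self.channel_state[c].append(msg)
            self.tokens += 1


def connect(src, dst, name):
    """Create a FIFO channel directed from src to dst."""
    chan = deque()
    src.outgoing[name] = chan
    dst.incoming[name] = chan


def run_example():
    p, q = Process("p", tokens=1), Process("q")
    connect(p, q, "c")
    connect(q, p, "c2")
    q.record_state()   # q initiates: records 0 tokens, marker goes out on c2
    p.send_token("c")  # the token is now in transit on c
    p.receive("c2")    # p sees the marker: records 0 tokens, sends marker on c
    q.receive("c")     # token arrives after q recorded: logged as state of c
    q.receive("c")     # the marker closes the recording of c
    return p, q
```

In `run_example`, neither process records a token in its own state, yet the token is captured in the recorded state of channel c, so the recorded global state still contains exactly one token.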
3.3 Termination of the Algorithm
- The marker-receiving and marker-sending rules guarantee that if a marker is received along every channel, then each process will record its state and the states of all incoming channels.
- Hence if p records its state and there is a path (in the graph representing the system) from p to a process q, then q will record its state in finite time because, by induction, every process along the path will record its state in finite time.
- The recorded process and channel states must be collected and assembled to form the recorded global state.
- The authors shall not describe algorithms for collecting the recorded information because such algorithms have been described elsewhere [4, 10].
- A simple algorithm for collecting information in a system whose topology is strongly connected is for each process to send the information it records along all outgoing channels, and for each process receiving information for the first time to copy it and propagate it along all of its outgoing channels.
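The flooding scheme described in the last bullet can be sketched as below. This is an illustrative simulation only: the function name `flood_collect`, the dictionary-based topology, and the single shared work queue (standing in for real message delivery) are assumptions, not part of the paper.

```python
from collections import deque


def flood_collect(out_edges, records):
    """Simulate flooding of recorded information.

    out_edges: process -> list of processes reachable over its outgoing channels.
    records:   process -> the information that process recorded locally.
    Returns    process -> {origin: info} for everything that process has learned.
    """
    known = {p: {p: records[p]} for p in out_edges}
    # Each queue item is one message in flight: (receiver, origin, info).
    in_flight = deque(
        (q, p, records[p]) for p in out_edges for q in out_edges[p]
    )
    while in_flight:
        receiver, origin, info = in_flight.popleft()
        if origin in known[receiver]:
            continue  # seen before: do not copy or propagate again
        known[receiver][origin] = info  # first time: copy it
        for nxt in out_edges[receiver]:
            in_flight.append((nxt, origin, info))  # and propagate it
    return known
```

Because each process forwards a given origin's information at most once, the simulation terminates, and in a strongly connected topology every process ends up with every record.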
4. PROPERTIES OF THE RECORDED GLOBAL STATE
- To gain an intuitive understanding of the properties of the global state recorded by the algorithm, the authors shall study Example 2.2.
- After recording its state, q sends a marker along channel c’.
- The recording algorithm was initiated in global state S0 and terminated in global state S3.
- Observe that the global state S* recorded by the algorithm is not identical to any of the global states S0, S1, S2, S3 that occurred in the computation.
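$\bar{S}_i = S_i$ for all $i$ where $i \neq j$ (here $\bar{S}_i$ denotes the global state immediately before the $i$-th event of the computation obtained by interchanging $e_{j-1}$ and $e_j$).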
- Now the authors shall show that the global state after all prerecording events and before all postrecording events in seq’ is S*.
- The sequence of messages sent by p along c before p sends a marker along c is the sequence corresponding to prerecorded sends on c. Part (2) now follows.
- The purpose of this example is to show how the computation seq’ is derived from the computation seq.
- ACM Transactions on Computer Systems, Vol. 3.
5. STABILITY DETECTION
- The authors now solve the stability-detection problem described in Section 1.
- A stability-detection algorithm is defined as follows. Input: a stable property y. Output: a Boolean value definite with the property $(y(S_\iota) \rightarrow \mathit{definite})$ and $(\mathit{definite} \rightarrow y(S_\phi))$, where $S_\iota$ and $S_\phi$ are the global states of the system when the algorithm is initiated and when it terminates, respectively.
- Definite = false implies that the stable property does not hold when the algorithm is initiated.
- The outline of the current version of the proof was suggested by them.
- On partially-ordered event models of distributed computations.
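The stability-detection specification in Section 5 above can be met by recording a global state S* and evaluating the stable property on it. The sketch below assumes a hypothetical `record_global_state` callable that returns such a recorded state; the snapshot layout and the `all_tokens_gone` property are also illustrative assumptions.

```python
def detect_stable(record_global_state, y):
    """Return the Boolean 'definite' for stable property y.

    record_global_state is assumed to run a global-state recording
    algorithm and return the recorded state S*; y is a predicate on it.
    """
    s_star = record_global_state()
    return y(s_star)


def all_tokens_gone(snapshot):
    """Example stable property: no token in any process or channel."""
    in_processes = sum(snapshot["processes"].values())
    in_channels = sum(
        msgs.count("token") for msgs in snapshot["channels"].values()
    )
    return in_processes + in_channels == 0
```

The reasoning behind this sketch is that S* is reachable from the state in which recording was initiated, and the state in which recording terminates is reachable from S*; since y is stable, a true result on S* still holds at termination, and a property that held at initiation also holds in S*.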
Frequently Asked Questions (10)
Q2. Why do the authors study the stability detection problem?
The authors study the stability-detection problem because it is a paradigm for many practical problems, such as distributed deadlock detection.
Q3. What is the simplest way to record a state?
To ensure that the global-state recording algorithm terminates in finite time, each process must ensure that (L1) no marker remains forever in an incident input channel and (L2) it records its state within finite time of initiation of the algorithm.
Q4. What is the state of a postrecording event?
There may be a postrecording event $e_{j-1}$ before a prerecording event $e_j$ for some $j$, $\iota < j < \phi$; this can occur only if $e_{j-1}$ and $e_j$ are in different processes (because if $e_{j-1}$ and $e_j$ are in the same process and $e_{j-1}$ is a postrecording event, then so is $e_j$).
Q5. What is the state of a channel in a global state?
The authors say that seq is a computation of the system if and only if event $e_i$ can occur in global state $S_i$, $0 \le i \le n$, where $S_0$ is the initial global state and $S_{i+1} = \mathit{next}(S_i, e_i)$ for $0 \le i \le n$.
Q6. What is the value of next(S, e)?
The authors define a function next, where next(S, e) is the global state immediately after the occurrence of event e in global state S. The value of next(S, e) is defined only if event e can occur in global state S, in which case next(S, e) is the global state identical to S except that: (1) the state of p in next(S, e) is s’; (2) if c is a channel directed towards p, then the state of c in next(S, e) is c’s state in S with message M deleted from its head; and (3) if c is a channel directed away from p, then the state of c in next(S, e) is the same as c’s state in S with message M added to the tail.
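The definition of next(S, e) can be made concrete with a small sketch. It assumes an event carries the process p, its states s and s’ before and after the event, and an optional message M on channel c; the `Event` tuple, the dictionary layout of a global state, and the `channels` direction map are assumptions chosen for the illustration.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    p: str                    # process in which the event occurs
    s: object                 # state of p immediately before the event
    s_next: object            # state of p immediately after the event (s')
    M: Optional[object] = None  # message sent or received, if any
    c: Optional[str] = None     # channel involved, if any


def can_occur(S, e, channels):
    """channels maps a channel name to its (source, destination) pair."""
    if S["proc"][e.p] != e.s:
        return False
    if e.c is None:
        return True               # internal event: no channel is altered
    src, dst = channels[e.c]
    if dst == e.p:                # receive: M must be at the head of c
        queue = S["chan"][e.c]
        return len(queue) > 0 and queue[0] == e.M
    return src == e.p             # send: c must be directed away from p


def next_state(S, e, channels):
    """Return next(S, e) without modifying S; undefined if e cannot occur."""
    if not can_occur(S, e, channels):
        raise ValueError("event cannot occur in this global state")
    proc = dict(S["proc"])
    chan = dict(S["chan"])
    proc[e.p] = e.s_next                      # (1) p moves to state s'
    if e.c is not None:
        _, dst = channels[e.c]
        if dst == e.p:
            chan[e.c] = chan[e.c][1:]         # (2) delete M from the head
        else:
            chan[e.c] = chan[e.c] + (e.M,)    # (3) add M to the tail
    return {"proc": proc, "chan": chan}
```

Channel states are kept as tuples so that each global state is an independent value; applying `next_state` repeatedly along a sequence of events checks whether that sequence is a computation in the sense defined above.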
Q7. What is the state of ei in seq?
Event ei in seq is called a postrecording event if and only if it is not a prerecording event; that is, if ei is in a process p and p records its state before ei in seq.
Q8. What is the effect of q recording its state?
Termination in finite time is ensured if for every process q: q spontaneously records its state or there is a path from a process p, which spontaneously records its state, to q.
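The reachability condition in this answer can be checked on a given topology with a breadth-first search. The sketch below is an assumption-laden illustration: the function name `all_will_record` and the adjacency-dictionary representation are hypothetical, and it checks only the graph condition, not message delivery.

```python
from collections import deque


def all_will_record(out_edges, initiators):
    """Check that every process is an initiator or reachable from one.

    out_edges:  process -> list of processes at the far end of its
                outgoing channels.
    initiators: processes that spontaneously record their state.
    """
    everyone = set(out_edges)
    for targets in out_edges.values():
        everyone.update(targets)
    reached = set(initiators)
    frontier = deque(initiators)
    while frontier:
        p = frontier.popleft()
        for q in out_edges.get(p, ()):
            if q not in reached:  # a marker from p makes q record its state
                reached.add(q)
                frontier.append(q)
    return reached >= everyone
```

For a strongly connected system a single initiator suffices; otherwise, each process that no initiator can reach must record its state spontaneously.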
Q9. What is the q-marker-receiving rule for a process?
Marker-Sending Rule for a Process p. For each channel c, incident on, and directed away from p: p sends one marker along c after p records its state and before p sends further messages along c.
Q10. What is the state-transition diagram for q in Example 2.2?
Fig. 6. State-transition diagram for process q in Example 2.2. The accompanying computation passes through global states S0 (initial), S1 (after p sends M), S2 (after q sends M’), and S3 (after p receives M’).