What are the two classes of realistic small-world topologies?

The Watts-Strogatz and scale-free topologies represent two classes of realistic small-world topologies that are often used to model different natural and artificial phenomena [1, 28].

How do the authors calculate the nth central moment?

To calculate the nth central moment, given by $\overline{(w - \overline{w})^n}$, the authors can either calculate all the raw moments up to the nth in parallel and combine them appropriately, or proceed in two sequential steps, first calculating the average and then the appropriate central moment.
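As a minimal sketch of the first option, the central moment can be assembled from raw moments with the binomial expansion $\overline{(w - \overline{w})^n} = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k} \, \overline{w^k} \, \overline{w}^{\,n-k}$. The function names are hypothetical, and the plain averages below stand in for the values that the averaging protocol would deliver to each node.

```python
from math import comb

def raw_moment(values, k):
    """k-th raw moment: the average of w**k over all values.

    In the gossip setting each of these averages would be estimated by
    one instance of the averaging protocol; here it is computed directly.
    """
    return sum(w ** k for w in values) / len(values)

def central_moment_from_raw(values, n):
    """n-th central moment assembled from raw moments 0..n."""
    mean = raw_moment(values, 1)
    return sum(
        comb(n, k) * (-1) ** (n - k) * raw_moment(values, k) * mean ** (n - k)
        for k in range(n + 1)
    )

values = [1.0, 2.0, 3.0, 4.0]
second = central_moment_from_raw(values, 2)  # population variance of the values
third = central_moment_from_raw(values, 3)   # zero for this symmetric data set
```

The second option (average first, then the moment) avoids the cancellation between large raw moments, at the cost of running two protocol phases in sequence.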

What is the mechanism used to terminate a protocol?

To implement termination, the authors adopt a very simple mechanism: each node executes the protocol for a predefined number of cycles, denoted as γ, depending on the required accuracy of the output and the convergence factor that can be achieved in the particular overlay topology adopted (see the convergence factor given in Section 3).
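A rough sketch of how γ might be chosen, assuming that the expected variance shrinks by the convergence factor ρ in every cycle, so that after γ cycles it has been reduced by a factor of ρ^γ. The target reduction `epsilon` and the function name are illustrative assumptions, not taken from the text above.

```python
import math

def cycles_needed(epsilon, rho):
    """Smallest integer gamma with rho**gamma <= epsilon.

    epsilon: desired ratio of final to initial variance (0 < epsilon < 1).
    rho: per-cycle convergence factor (0 < rho < 1).
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("epsilon and rho must lie strictly between 0 and 1")
    return math.ceil(math.log(epsilon) / math.log(rho))

# With rho = 1/e, a variance reduction by a factor of 1e-10:
gamma = cycles_needed(1e-10, 1.0 / math.e)
```

Because the number of cycles grows only logarithmically in 1/ε, a fixed, predefined γ can cover a wide range of accuracy requirements.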

What is the structure of the graph between the two extremes?

For intermediate values of β, the structure of the graph lies between these two extreme cases: complete order and complete disorder.

What is the exact implementation of dynamic queries?

The exact details of the implementation of dynamic queries (if necessary) will depend on the specific environment, taking into account efficiency and performance constraints and possible sources of new queries.

What is the reason why node crashes are important?

This represents another important source of error, although the authors note that from their point of view node crashes are more important because the authors model leaves as crashes, so in the presence of churn crash events dominate all other types of failure.

Why are static topologies considered unrealistic in the presence of churn?

While static topologies are unrealistic in the presence of churn, the authors still consider them due to their theoretical importance and the fact that their protocol can in fact be applied in static networks as well, although they are not the primary focus of the present discussion.

What is the probability of a variance reduction step?

In Section 3.2 it was proven that ρ = 1/e (where ρ is the convergence factor) if the authors assume that during a cycle for each particular variance reduction step, each pair of nodes has an equal probability to perform that particular variance reduction step.

How can the authors describe the waiting time between two consecutive selections of a given node?

When iterating AVG, the waiting time between two consecutive selections of a given node can be described by the exponential distribution.

(Open Access) Gossip-based aggregation in large dynamic networks (2005) | Márk Jelasity

Q: What are the contributions in "Gossip-based aggregation in large dynamic networks∗" ?

The authors propose a gossip-based protocol for computing aggregate values over network components in a fully decentralized fashion. The authors demonstrate the efficiency and robustness of their gossip-based protocol both theoretically and experimentally under a variety of scenarios including node and communication failures. The class of aggregate functions the authors can compute is very broad and includes many useful special cases such as counting, averages, sums, products and extremal values.

Q: What are some examples of aggregation functions?

Examples of aggregation functions include network size, total free storage, maximum load, average uptime, location and intensity of hotspots, etc.

Gossip-based Aggregation in Large Dynamic Networks∗

Márk Jelasity, Alberto Montresor and Ozalp Babaoglu

Università di Bologna

Abstract

As computer networks increase in size, become more heterogeneous and span greater geographic distances, applications must be designed to cope with the very large scale, poor reliability, and often, with the extreme dynamism of the underlying network. Aggregation is a key functional building block for such applications: it refers to a set of functions that provide components of a distributed system access to global information including network size, average load, average uptime, location and description of hotspots, etc. Local access to global information is often very useful, if not indispensable for building applications that are robust and adaptive. For example, in an industrial control application, some aggregate value reaching a threshold may trigger the execution of certain actions; a distributed storage system will want to know the total available free space; load balancing protocols may benefit from knowing the target average load so as to minimize the load they transfer. We propose a gossip-based protocol for computing aggregate values over network components in a fully decentralized fashion. The class of aggregate functions we can compute is very broad and includes many useful special cases such as counting, averages, sums, products and extremal values. The protocol is suitable for extremely large and highly dynamic systems due to its proactive structure: all nodes receive the aggregate value continuously, thus being able to track any changes in the system. The protocol is also extremely lightweight making it suitable for many distributed applications including peer-to-peer and grid computing systems. We demonstrate the efficiency and robustness of our gossip-based protocol both theoretically and experimentally under a variety of scenarios including node and communication failures.

1 Introduction

Computer networks in general, and the Internet in particular, are experiencing explosive growth in many dimensions, including size, performance, user base and geographical span. The potential for communication and access to computational resources has improved dramatically both quantitatively and qualitatively in a relatively short time. New design paradigms such as peer-to-peer (P2P) [18] and grid computing [14] have emerged in response to these trends. The Internet, and all similar networks, pose special challenges for large-scale, reliable, distributed application builders. The “best-effort” design philosophy that characterizes such networks renders the communication channels inherently unreliable and the continuous flux of nodes joining and leaving the network makes them highly dynamic. Control and monitoring in such systems are particularly challenging: performing global computations requires orchestrating a huge number of nodes.

In this paper, we focus on aggregation which is a useful building block in large, unreliable and dynamic systems [25]. Aggregation is a common name for a set of functions that provide a summary of some global system property. In other words, they allow local access to global information in order to simplify the task of controlling, monitoring and optimization in distributed applications. Examples of aggregation functions include network size, total free storage, maximum load, average uptime, location and intensity of hotspots, etc. Furthermore, simple aggregation functions can be used as building blocks to support more complex protocols. For example, the knowledge of average load in a system can be exploited to implement near-optimal load-balancing schemes [12].

∗ © ACM, 2005. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Computer Systems, 23(3):219–252, August 2005. http://doi.acm.org/10.1145/1082469.1082470

We distinguish reactive and proactive protocols for computing aggregation functions. Reactive protocols respond to specific queries issued by nodes in the network. The answers are returned directly to the issuer of the query while the rest of the nodes may or may not learn about the answer. Proactive protocols, on the other hand, continuously provide the value of some aggregate function to all nodes in the system in an adaptive fashion. By adaptive we mean that if the aggregate changes due to network dynamism or because of variations in the input values, the output of the aggregation protocol should track these changes reasonably quickly. Proactive protocols are often useful when aggregation is used as a building block for completely decentralized solutions to complex tasks. For example, in the load-balancing scheme cited above, the knowledge of the global average load is used by each node to decide if and when it should transfer load [12].

Contribution. In this paper we introduce a robust and adaptive protocol for calculating aggregates in a proactive manner. We assume that each node maintains a local approximate of the aggregate value. The core of the protocol is a simple gossip-based communication scheme in which each node periodically selects some other random node to communicate with. During this communication the nodes update their local approximate values by performing some aggregation-specific and strictly local computation based on their previous approximate values. This local pairwise interaction is designed in such a way that all approximate values in the system will quickly converge to the desired aggregate value.

In addition to introducing our gossip-based protocol, the contributions of this paper are threefold. First, we present a full-fledged practical solution for proactive aggregation in dynamic environments, complete with mechanisms for adaptivity, robustness and topology management. Second, we show how our approach can be extended to compute complex aggregates such as variances and different means. Third, we present theoretical and experimental evidence supporting the efficiency of the protocol and illustrating its robustness with respect to node and link failures and message loss.

Outline. In Section 2 we define the system model. Section 3 describes the core idea of the protocol and presents theoretical and simulation results of its performance. In Section 4 we discuss the extensions necessary for practical applications. Section 5 introduces novel algorithms for computing statistical functions including several means, network size and variance. Sections 6 and 7 present analytical and experimental evidence on the high robustness of our protocol. Section 8 describes the prototype implementation of our protocol on PlanetLab and gives experimental results of its performance. Section 9 discusses related work. Finally, conclusions are drawn in Section 10.

2 System Model

We consider a network consisting of a large collection of nodes that are assigned unique identifiers and that communicate through message exchanges. The network is highly dynamic; new nodes may join at any time, and existing nodes may leave, either voluntarily or by crashing. Our approach does not require any mechanism specific to leaves: spontaneous crashes and voluntary leaves are treated uniformly. Thus, in the following, we limit our discussion to node crashes. Byzantine failures, with nodes behaving arbitrarily, are excluded from the present discussion (but see [11]).

    do exactly once in each consecutive
    δ time units at a randomly picked time
        q ← GETNEIGHBOR()
        send s_p to q
        s_q ← receive(q)
        s_p ← UPDATE(s_p, s_q)

    (a) active thread

    do forever
        s_q ← receive(*)
        send s_p to sender(s_q)
        s_p ← UPDATE(s_p, s_q)

    (b) passive thread

Figure 1: Push-pull gossip protocol executed by node p. The local state of p is denoted as $s_p$.

We assume that nodes are connected through an existing routed network, such as the Internet, where every node can potentially communicate with every other node. To actually communicate, a node has to know the identifiers of a set of other nodes, called its neighbors. This neighborhood relation over the nodes defines the topology of an overlay network. Given the large scale and the dynamicity of our envisioned system, neighborhoods are typically limited to small subsets of the entire network. The set of neighbors of a node (thus the overlay network topology) can change dynamically. Communication incurs unpredictable delays and is subject to failures. Single messages may be lost, links between pairs of nodes may break. Occasional performance failures (e.g., delay in receiving or sending a message in time) can be seen as general communication failures, and are treated as such. Nodes have access to local clocks that can measure the passage of real time with reasonable accuracy, that is, with small short-term drift.

In this paper we focus on node and communication failures. Some other aspects of the model that are outside of the scope of the present analysis (such as clock drift and message delays) are discussed only informally in Section 4.

3 Gossip-based Aggregation

We assume that each node in the network holds a numeric value. In a practical setting, this value can characterize any (possibly dynamic) aspect of the node or its environment (e.g., the load at the node, available storage space, temperature measured by a sensor network, etc.). The task of a proactive protocol is to continuously provide all nodes with an up-to-date estimate of an aggregate function, computed over the values held by the current set of nodes.

3.1 The Basic Aggregation Protocol

Our basic aggregation protocol is based on the “push-pull gossiping” scheme illustrated in Figure 1. Each node p executes two different threads. The active thread periodically initiates an information exchange with a random neighbor q by sending it a message containing the local state $s_p$ and waiting for a response with the remote state $s_q$. The passive thread waits for messages sent by an initiator and replies with the local state. The term push-pull refers to the fact that each information exchange is performed in a symmetric manner: both participants send and receive their states.

Even though the system is not synchronous, we find it convenient to describe the protocol execution in terms of consecutive real time intervals of length δ called cycles that are enumerated starting from some convenient point.

Method GETNEIGHBOR can be thought of as an underlying service to the aggregation protocol, which is normally (but not necessarily) implemented by sampling a locally available set of neighbors. In other words, an overlay network is applied to find communication partners. In Section 3.2 we will assume that GETNEIGHBOR returns a uniform random sample over the entire set of nodes. In Section 4.4 we revisit this service from a practical point of view, by looking at realistic implementations based on non-uniform or dynamically changing overlay topologies.

Method UPDATE computes a new local state based on the current local state and the remote state received during the information exchange. The output of UPDATE and the semantics of the node state depend on the specific aggregation function being implemented by the protocol. In this section, we limit the discussion to computing the average over the set of numbers distributed among the nodes. Additional functions (most of them derived from the averaging protocol) are described in Section 5.

In the case of computing the average, each node stores a single numeric value representing the current estimate of the final aggregation output which is the global average. Each node initializes the estimate with the local value it holds. Method UPDATE($s_p$, $s_q$), where $s_p$ and $s_q$ are the estimates exchanged by p and q, returns $(s_p + s_q)/2$. After one exchange, the sum of the two local estimates remains unchanged since method UPDATE simply redistributes the initial sum equally among the two nodes. So, the operation does not change the global average but it decreases the variance over the set of all estimates in the system.
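The effect of a single exchange can be checked with a short sketch. The `Node` class and the `exchange` function are hypothetical stand-ins for the two threads of Figure 1, run sequentially in one process instead of over a network.

```python
from statistics import mean, pvariance

class Node:
    """Holds one local estimate, initialised with the node's own value."""
    def __init__(self, value):
        self.estimate = float(value)

def update(s_p, s_q):
    """UPDATE for averaging: both parties adopt the midpoint."""
    return (s_p + s_q) / 2.0

def exchange(p, q):
    """One push-pull exchange: p pushes its state, q replies, both update."""
    s_p, s_q = p.estimate, q.estimate
    p.estimate = update(s_p, s_q)
    q.estimate = update(s_q, s_p)

nodes = [Node(v) for v in (10, 2, 6, 6)]
before = [n.estimate for n in nodes]
exchange(nodes[0], nodes[1])          # 10 and 2 both become 6
after = [n.estimate for n in nodes]

# The global average is unchanged, while the variance over all estimates drops.
same_mean = mean(before) == mean(after)
lower_variance = pvariance(after) < pvariance(before)
```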

It is easy to see that the variance tends to zero, that is, the value at each node will converge to the true global average, as long as the network of nodes is not partitioned into disjoint clusters. To see this, one should consider the minimal value in the system. It can be proven that there is a positive probability in each cycle that either the number of instances of the minimal value decreases or the global minimum increases if there are different values from the minimal value (otherwise we are done because all values are equal). The idea is that if there is at least one different value, then at least one of the instances of the minimal value will have a neighbor with a different (thus larger) value and so it will have a positive probability to be matched with this neighbor.

In the following, we give basic theoretical results that characterize the speed of the convergence of the variance. We will show that each cycle results in a reduction of the variance by a constant factor, which provides exponential convergence. We will assume that no failures occur and that the starting point of the protocol is synchronized. Later in the paper, all of these assumptions will be relaxed.

3.2 Theoretical Analysis of Gossip-based Aggregation

We begin by introducing the conceptual framework and notations to be used for the purpose of the mathematical analysis. We proceed by calculating convergence rates for various algorithms. Our results are validated and illustrated by numerical simulation when necessary.

We will treat the averaging protocol as an iterative variance reduction algorithm over a vector of numbers. In this framework, we can formulate our approach as follows. We are given an initial vector of numbers $w_0 = (w_{0,1} \ldots w_{0,N})$. The elements of this vector correspond to the initial values at the nodes. We shall model this vector by assuming that $w_{0,1}, \ldots, w_{0,N}$ are independent random variables with identical expected values and a finite variance.

The assumption of identical expected values is not as restrictive as it may seem. To see this, observe that after any permutation of the initial values, the statistical behavior of the system remains unchanged since the protocol causes nodes to communicate in random order. This means that if we analyze the model in which we first apply a random permutation over the variables, we will obtain identical predictions for convergence. But if we apply a permutation, then we essentially transform the original vector of variables into another vector in which all variables have identical distribution, so the assumption of identical expected values holds.

    // vector w is the input
    do N times
        (i, j) = GETPAIR()
        // perform elementary variance reduction step
        w_i = w_j = (w_i + w_j)/2
    return w

Figure 2: Skeleton of global algorithm AVG used to model the distributed protocol of Figure 1.

In more detail, starting with random variables $w_{0,1}, \ldots, w_{0,N}$ with arbitrary expected values, after a random permutation, the new value at index i, denoted $b_i$, will have the distribution

$$P(b_i < x) = \frac{1}{N} \sum_{j=1}^{N} P(w_{0,j} < x) \qquad (1)$$

since all variables can be shifted to any position with equal probability. That is, while obtaining an equivalent probability model as mentioned above, the distributions of random variables $b_1, \ldots, b_N$ are now identical. Note that the assumption of independence is technically violated (variables $b_1, \ldots, b_N$ are not independent), but in the case of large networks, the consequences will be insignificant.

When considering the network as a whole, one cycle of the averaging protocol can be seen as a variance reduction algorithm (let us call it AVG) which takes a vector w of length N as a parameter and produces a new vector $w' = \mathrm{AVG}(w)$ of the same length. In other words, AVG is a single, central algorithm operating globally on the distributed state of the system, as opposed to the distributed protocol of Figure 1. This centralized view of the protocol serves to simplify our theoretical analysis of its behavior.

The consecutive cycles of the protocol result in a series of vectors $w_1, w_2, \ldots$, where $w_{i+1} = \mathrm{AVG}(w_i)$. The elements of vector $w_i$ are denoted as $w_i = (w_{i,1} \ldots w_{i,N})$. Algorithm AVG is illustrated in Figure 2 and takes w as a parameter and modifies it in place producing a new vector. The behavior of our distributed gossip-based protocol can be reproduced by an appropriate implementation of GETPAIR. In addition, other implementations of GETPAIR are possible that do not necessarily map to any distributed protocol but are of theoretical interest. We will discuss some important special cases as part of our analysis.
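A minimal Python sketch of AVG follows. It assumes one particular GETPAIR, which draws a pair of distinct indices uniformly at random; the helper names and the fixed seed are illustrative choices, not part of the paper.

```python
import random
from statistics import fmean, variance

def get_pair(n, rng):
    """Assumed GETPAIR: two distinct indices drawn uniformly at random."""
    i, j = rng.sample(range(n), 2)
    return i, j

def avg(w, rng):
    """One cycle of AVG: N elementary variance reduction steps, in place."""
    n = len(w)
    for _ in range(n):
        i, j = get_pair(n, rng)
        w[i] = w[j] = (w[i] + w[j]) / 2.0
    return w

rng = random.Random(42)                       # fixed seed for repeatability
w = [rng.uniform(0.0, 100.0) for _ in range(1000)]
mean_before, var_before = fmean(w), variance(w)

for _ in range(5):                            # five consecutive cycles
    avg(w, rng)

mean_after, var_after = fmean(w), variance(w)
```

Each step leaves the sum of the vector unchanged, so `mean_after` matches `mean_before` up to floating-point rounding, while `var_after` is smaller than `var_before`.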

We introduce the following empirical statistics for characterizing the state of the system in cycle i:

$$\overline{w}_i = \frac{1}{N} \sum_{k=1}^{N} w_{i,k} \qquad (2)$$

$$\sigma_i^2 = \sigma_{w_i}^2 = \frac{1}{N-1} \sum_{k=1}^{N} (w_{i,k} - \overline{w}_i)^2 \qquad (3)$$

where $\overline{w}_i$ is the target value of the protocol and $\sigma_i^2$ is a variance-like measure of homogeneity that characterizes the quality of local approximations. In other words, it expresses the deviation


References

Collective dynamics of small-world networks

Linked: The New Science of Networks

Reaching Agreement in the Presence of Faults

Small Worlds: The Dynamics of Networks between Order and Randomness

Epidemic algorithms for replicated database maintenance
