What is the effect of l on the reliability of the system?

With buffers for notifications of infinite length, as the authors have supposed in the analysis, reliability would remain constant as l becomes smaller.

How can the decreasing probability of partitioning with increasing n be explained intuitively?

That the membership becomes more stable as n increases can be understood intuitively: in a large system, membership information becomes more sparsely distributed, and the probability of having concentrated exclusive knowledge becomes vanishingly small.

What are the main problems of network-level protocols?

Network-level protocols have turned out to be insufficient: IP multicast [6] lacks reliability guarantees, and reliable protocols do not scale well.

How can the membership approach be combined with other gossip-based event dissemination algorithms?

The authors are indeed currently investigating how to combine their membership approach with other gossip-based event dissemination algorithms, e.g., using loggers to ensure strong reliability guarantees whenever this is required (cf. rpbcast).

Why does pbcast require a higher fanout than lpbcast to obtain similar results?

Because repetitions and hops are limited in the case of pbcast, a higher fanout is required to obtain results similar to those of lpbcast (F = 5 here vs. F = 3 in Figure 6(a)).

What bounds does the system model place on message loss and process crashes during a run?

The probability of a message loss does not exceed a predefined ε > 0, and the number of process crashes in a run does not exceed f < n.

(Open Access) Lightweight probabilistic broadcast (2001) | Patrick Eugster

Q: What are the contributions in "Lightweight probabilistic broadcast" ?

This paper presents Lightweight Probabilistic Broadcast ( lpbcast ), a novel gossip-based broadcast algorithm which preserves the inherent throughput scalability of traditional gossip-based algorithms and adds a notion of membership management scalability: every process only knows a random subset of fixed size of the processes in the system. The authors formally analyze their broadcast algorithm in terms of scalability with respect to the size of individual views, and compare the analytical results both with simulations and concrete measurements.

Q: What is the key concept underlying the scalability properties of gossip-based broadcast algorithms?

Decentralization is the key concept underlying the scalability properties of gossip-based broadcast algorithms, i.e., the overall load of retransmissions is reduced by decentralizing the effort.

Q: What is the nature of gossip-based broadcast protocols?

While retransmission requests in SRM can be handled by any process but lead to the re-broadcasting of a message, gossip-based protocols conform even better to the nature of peer-to-peer computing, by relying on pairwise interaction between peers.

Q: Why do protocols offering strong reliability guarantees not scale beyond a couple of hundred processes?

Due to the overhead of message loss detection and reparation, protocols offering such strong guarantees do not scale over a couple of hundred processes [25].

Lightweight Probabilistic Broadcast

P. Th. Eugster

R. Guerraoui

S. B. Handurukande

A.-M. Kermarrec

P. Kouznetsov

Federal Institute of Technology, Lausanne, Switzerland

Microsoft Research, Cambridge, UK

Abstract

The growing interest in peer-to-peer applications has underlined the importance of scalability in modern distributed systems. Not surprisingly, much research effort has been invested in gossip-based broadcast protocols. These trade the traditional strong reliability guarantees against very good “scalability” properties. Scalability is in that context usually expressed in terms of throughput, but there is only little work on how to reduce the overhead of membership management at large scale.

This paper presents Lightweight Probabilistic Broadcast (lpbcast), a novel gossip-based broadcast algorithm which preserves the inherent throughput scalability of traditional gossip-based algorithms and adds a notion of membership management scalability: every process only knows a random subset of fixed size of the processes in the system. We formally analyze our broadcast algorithm in terms of scalability with respect to the size of individual views, and compare the analytical results both with simulations and concrete measurements.

1 Introduction

Large scale event dissemination. Peer-to-peer computing has recently received much attention, as shown by the

success of large scale decentralized applications like Gnutella [30] or Groove [12]. In peer-to-peer computing, every

process acts as client and server, and scalability is a major concern.

The scalability properties solicited from such applications have evolved from hundreds to thousands of participants, but adequate algorithms for reliable propagation of events at large scale are still lacking. Network-level protocols have turned out to be insufficient: IP multicast [6] lacks reliability guarantees, and reliable protocols do not scale well. The well-known Reliable Multicast Transport Protocol (RMTP) [24], for instance, generates a flood of positive acknowledgements from receivers, loading both the network and the sender, where these acknowledgements converge. Any form of membership ([21, 15, 2]) is hidden by such network-level protocols, which consequently also makes them difficult to exploit with more dynamic dissemination (filtering, e.g., [22]), emphasizing the need for new forms of application-level broadcast.

Gossip-based broadcast algorithms. Gossip-based broadcast algorithms (e.g., [4, 26, 18]) appear to be more adequate in the field of large scale event dissemination than the “classical” strongly reliable approaches [14]. Though such gossip-based approaches have proven good scalability characteristics in terms of throughput, they often rely on the assumption that every process knows every other process. When managing large numbers of processes, i.e., a large number of references to processes acting as event producers and/or consumers, this assumption becomes a barrier to scalability. In fact, the data structures necessary to store the view of such a large scale membership consume considerable memory resources, let alone the communication required to ensure the consistency of the membership.

Partial view. Message routing and membership management are sometimes delegated to dedicated servers in order to relieve application processes. This only defers the problem, since those servers are limited in resources as well. To further increase scalability, the membership view should be split, i.e., every participating process should only have a partial view of the system. In order to avoid the isolation of processes or the partition of the membership, especially in the case of failures, membership information should nevertheless be shared by processes to some extent: introducing a certain degree of redundancy between the individual views is crucial to avoid single points of failure.

Gossip-based membership. While certain systems rely on a deterministic scheme to establish the individual views [28, 18], we introduce here a new completely randomized approach. The local view of every individual member consists of a random process list which continuously evolves, but never exceeds a fixed size. In short, after adding new processes to a view, it is truncated to the maximum length by removing randomly chosen entries. To ensure a uniform distribution of membership knowledge among processes, every gossip message, besides notifying events, also piggybacks a set of process identifiers which are used to update views. The membership protocol and the effective dissemination of events are thus dealt with at the same level. This symmetry is precisely the key to our formal analysis.
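The view update described above (add newly learned processes, then truncate by removing randomly chosen entries) can be sketched as follows. This is a minimal illustration, not the paper's pseudocode: the function name `update_view`, its parameters, and the rule that a process never lists itself are assumptions made for the example.

```python
import random

def update_view(view, new_processes, max_size, self_id, rng=random):
    """Merge newly learned process ids into a partial view, then truncate it.

    All names here are hypothetical; excluding self_id is an assumption.
    """
    merged = set(view) | set(new_processes)
    merged.discard(self_id)      # assumption: a process does not list itself
    entries = sorted(merged)     # fixed order so a seeded rng is reproducible
    # Remove randomly chosen entries until the view fits the fixed maximum size.
    while len(entries) > max_size:
        entries.pop(rng.randrange(len(entries)))
    return set(entries)
```

Because the evicted entries are chosen at random rather than by age or position, no process is systematically favoured, which matches the intent of keeping membership knowledge uniformly distributed.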

Contributions. We present in this paper our strongly scalable decentralized algorithm for event dissemination, called lpbcast, which we have used to implement a static publish/subscribe scheme based on topics [8]. We convey our claim of scalability in two steps. First, we formally analyze our algorithm using a stochastic approach, pointing out the fact that, with perfectly uniformly distributed individual views, the view size has no impact on the latency of delivery of an event. We similarly show that for a given view size, the probability of partition creation in the system decreases as the system grows in size. Second, we give some practical results that support the analytical approach, both in terms of simulation and prototype measurements.

It is important to notice that our membership approach is not intrinsically tied to our Lightweight Probabilistic

Broadcast (lpbcast) algorithm. We illustrate this by applying our membership scheme to the well-known pbcast [4]

algorithm.

Roadmap. Section 2 gives an overview of related gossip-based broadcast protocols. Section 3 presents our lpbcast algorithm and explains our randomized approach. Section 4 presents a formal analysis of our algorithm in terms of scalability and reliability. Section 5 gives some simulation and practical results supporting the formal analysis. Section 6 discusses the distribution of the views and also proves the general applicability of our membership approach by combining it with pbcast and contrasting the consolidated algorithm with lpbcast. Section 7 concludes the paper.

The dedicated servers to which message routing and membership management are sometimes delegated are also called event servers [5], routing daemons [27], or message brokers [1].

Due to its decoupling nature, the publish/subscribe paradigm has been used in various large scale contexts, e.g., [20, 9].

2 Background: Probabilistic Algorithms

The achievement of strong reliability guarantees (in the sense of [14]) in practical distributed systems requires expensive mechanisms to detect missing messages and initiate retransmissions. Due to the overhead of message loss detection and reparation, protocols offering such strong guarantees do not scale over a couple of hundred processes [25].

2.1 Reliability vs Scalability

Gossip, or rumor mongering, algorithms [7] are a class of epidemiologic algorithms, which have been introduced as an alternative to such “traditional” reliable broadcast protocols. They were first developed for replicated database consistency management [7]. The main motivation is to trade the reliability guarantees offered by costly deterministic protocols against weaker reliability guarantees, but in return obtain very good scalability properties.

Their analysis is usually based on stochastics similar to the theory of epidemics [3], where the execution is broken down into steps. Probabilities are associated with these steps, and such algorithms are therefore sometimes also referred to as probabilistic algorithms. The degree of reliability is typically expressed by a probability, such as the probability 1-α of reaching all processes in the system for any given message, or the probability 1-β of reaching any given process with any given message. Ideally, α and β, respectively, are precisely quantifiable.

2.2 Basic Concepts

Decentralization is the key concept underlying the scalability properties of gossip-based broadcast algorithms, i.e., the overall load of retransmissions is reduced by decentralizing the effort. In contrast to sender-reliable protocols (e.g., Reliable Multicast Transport Protocol (RMTP) [24]) or receiver-reliable protocols (e.g., Log-Based Receiver-Reliable Multicast (LBRM) [16]), gossip-based broadcast protocols are part of the class of peer-based protocols, just like Scalable Reliable Multicast (SRM) [10]. While retransmission requests in SRM can be handled by any process but lead to the re-broadcasting of a message, gossip-based protocols conform even better to the nature of peer-to-peer computing, by relying on pairwise interaction between peers. More precisely, retransmissions are initiated in most gossip-based algorithms by having every process periodically (every T ms, the step interval) send a digest of the messages it has delivered to a randomly chosen subset of processes inside the system (the gossip subset). The size of the subset is usually fixed, and is commonly called the fanout (F). Gossip protocols differ in the number of times the same information is gossiped, i.e., every process might gossip the same information only a limited number of times (repetitions are limited) and/or the same information might be forwarded only a limited number of times (hops are limited).

In sender-reliable protocols (e.g., RMTP), senders wait for acknowledgements from receivers, while in receiver-reliable protocols (e.g., LBRM), receivers are responsible for detecting missing messages and soliciting retransmissions from senders.
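As an illustration of one gossip step, the sketch below picks a gossip subset of size F uniformly at random and sends the same digest to each member. The function names, the `send` callback, and the representation of a digest as a frozen set of message identifiers are assumptions for this example; the periodic timer (every T ms) is left out.

```python
import random

def pick_gossip_targets(known_processes, fanout, rng=random):
    """Choose up to `fanout` distinct processes uniformly at random."""
    candidates = sorted(known_processes)   # fixed order for reproducibility
    k = min(fanout, len(candidates))       # fewer known processes than F is allowed
    return rng.sample(candidates, k)

def gossip_step(delivered_ids, known_processes, fanout, send, rng=random):
    """Send a digest of delivered message ids to a random gossip subset.

    `send(target, digest)` is a hypothetical transport callback.
    """
    digest = frozenset(delivered_ids)
    targets = pick_gossip_targets(known_processes, fanout, rng)
    for target in targets:
        send(target, digest)
    return targets
```

A caller would invoke `gossip_step` once per step interval; limits on repetitions or hops, which vary between protocols, would be enforced around this call.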

2.3 Membership Tracking in Gossip-Based Algorithms

Membership tracking in gossip-based algorithms is a challenging issue. Early approaches like [11] admit that

the individual views of processes diverge temporarily, but assume that they eventually converge in “stable” phases.

These views however represent the “complete” membership, which becomes a bottleneck at an increased scale. The

Bimodal Multicast [4] and Directional Gossip [18] algorithms are representatives of a new generation of probabilistic

algorithms – aware of the problem of scalable membership management.

Bimodal Multicast. Bimodal Multicast (also called pbcast) relies on two phases. A “classical” best-effort multicast protocol (e.g., IP multicast) is used for a first rough dissemination of messages. A second phase assures reliability with a certain probability, by using a peer-based retransmission based on gossips: every process in the system periodically gossips a digest of its received messages, and gossip receivers can solicit such messages from the sender if they have not received them previously.
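The receiver side of this second phase can be sketched as a set difference between the gossiped digest and the identifiers the receiver already holds. The names `missing_from_digest`, `handle_digest`, and the `request` callback are hypothetical and only serve the illustration.

```python
def missing_from_digest(digest, delivered_ids):
    """Return the message ids announced in the digest that are not held locally."""
    return sorted(set(digest) - set(delivered_ids))

def handle_digest(sender, digest, delivered_ids, request):
    """Solicit from the gossip sender every message the receiver is missing.

    `request(sender, ids)` is a hypothetical callback that asks `sender`
    to retransmit the messages identified by `ids`.
    """
    missing = missing_from_digest(digest, delivered_ids)
    if missing:                      # nothing to solicit if the digest adds nothing new
        request(sender, missing)
    return missing
```

Only identifiers travel in the periodic gossip; full messages are transferred solely on request, which keeps the steady-state cost of the digest exchange small.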

In [4], the membership problem is not dealt with, but the authors refer to another paper which deals with failure detection based on gossips [29], while a third paper describes Capt’n Cook [28], a gossip-based resource location protocol for the Internet, which can in that sense be seen as a membership protocol. This protocol enables the reduction of the view of each individual process: each process has a precise view of its immediate neighbours, while the knowledge becomes less exhaustive at increasing “distance”. The notion of distance is expressed in terms of host addresses. However, [28] only considers the propagation of membership information, and it is thus not clear how this membership interacts with pbcast.

Directional Gossip. Directional Gossip is a protocol especially targeted at wide area networks. By taking into

account the topology of the networks and the current processes, optimizations are performed. More precisely, a weight

is computed for each neighbour node, representing the connectivity of that given node. The larger the weight of a

node, the more possibilities exist thus for it to be infected by any node. The protocol applies a simple heuristic, which

consists in choosing nodes with higher weights with a smaller probability than nodes with smaller weights. That way,

redundant sends are reduced. The algorithm is also based on partial views, in the sense that there is a single gossip

server per LAN which acts as a bridge to other LANs. This however leads to a static hierarchy, in which the failure of

a gossip server can isolate several processes from the remaining system.
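The weight heuristic can be illustrated with a small sketch. The text only states that higher-weight nodes are chosen with smaller probability; selecting with probability proportional to 1/weight is an assumption made here for concreteness, as are the function name and the dictionary representation of weights.

```python
import random

def pick_neighbour(weights, rng=random):
    """Pick one neighbour, favouring nodes with a smaller connectivity weight.

    `weights` maps node id -> positive weight. Using 1/weight as the
    selection weight is an assumed rule, not the published heuristic.
    """
    nodes = sorted(weights)
    inverse = [1.0 / weights[n] for n in nodes]   # smaller weight, larger chance
    return rng.choices(nodes, weights=inverse, k=1)[0]
```

Under this assumed rule, a node with weight 1 is ten times as likely to be chosen as a node with weight 10, so well-connected nodes, which are likely to be infected through other paths anyway, receive fewer redundant sends.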

In contrast to the deterministic hierarchical membership approaches in Directional Gossip or Capt’n Cook, our lpbcast algorithm has a probabilistic approach to membership: each process has a random partial view of the system. lpbcast is lightweight in the sense that it consumes few resources in terms of memory and requires no dedicated messages for membership management: gossip messages are used to disseminate notifications and to propagate digests of received events, but also to propagate membership information.

In order to offer a complete guarantee of delivery, Reliable Probabilistic Multicast (rpbcast) [26] adds a deterministic third phase to the pbcast protocol, in which centralized loggers are used if the second gossip-based phase fails.

The scheme in which gossip receivers solicit missing messages from the gossip sender is commonly referred to as gossip pull, in contrast to gossip push, where gossip senders are updated by gossip receivers with messages missing in the digest gossiped by the former (rpbcast uses gossip push). The term anti-entropy usually refers to a mixed push/pull variant, where two processes symmetrically update each other.

Gossip-based garbage collection is dealt with in [13].

The notifications constitute the actual payload of the gossip messages, and can be viewed as application messages. In contrast, gossip messages constitute protocol messages. This distinction was not made previously, since gossips are seldom used as a “primary” means of dissemination.

3 Lightweight Probabilistic Broadcast (lpbcast)

In this section, we present our completely decentralized lightweight probabilistic algorithm for event dissemination

based on partial views. Though the parts concerning the event dissemination and the membership respectively can

be considered as independent, we present our solution as a monolithic algorithm. This is done in order to simplify

presentation, and to emphasize the possibility of dealing with membership and event dissemination at the same level.

3.1 System Model

We consider a system of processes Π = {p_1, p_2, ...}. Processes join and leave the system dynamically and have ordered distinct identifiers. We assume for presentation simplicity that there is not more than one process per node of the network.

Though our algorithm has been implemented in the context of topic-based publish/subscribe [8], we present it with

respect to a single topic, and do not discuss the effect of scaling up topics. In other terms, Π can be considered

as a single topic or group, and joining/leaving Π can be viewed as subscribing/unsubscribing from the topic. Such

subscriptions/unsubscriptions are assumed to be rare compared to the large flow of events, and every process in Π can

subscribe to and/or publish events.

3.2 Gossip Messages

Our lpbcast algorithm is based on non-synchronized periodical gossips, where a gossip message contains several types of information. To be more precise, a gossip message serves four purposes:

Notifications: A message piggybacks notifications received (for the first time) since the last outgoing gossip message. Each process stores these notifications in a variable events. Every such notification is gossiped at most once. Older notifications are stored in a different buffer, which is only required to satisfy retransmission requests.

Notification identifiers: Each message also carries a digest (history) of notifications that the sending process has received. To that end, every process stores identifiers of notifications it has already delivered in a variable eventIds. We suppose that these identifiers are unique, and include the identifier of the originator. That way, the buffer can be optimized by only retaining for each sender the identifiers of notifications delivered since the last one delivered in sequence.

Unsubscriptions: A gossip message also piggybacks a subset of unsubscriptions. This type of information enables the gradual removal of processes which have unsubscribed from local views. Unsubscriptions that are eligible to be forwarded with the next gossip(s) are stored in a variable unSubs.

Subscriptions: A set of subscriptions is attached to each message. These subscriptions are buffered in subs. A gossip receiver uses these subscriptions to update its view, stored in a variable view.
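The four parts of a gossip message, and the eventIds optimization mentioned above, might be represented as in the following sketch. It is illustrative only: the class and field names mirror the variables events, eventIds, unSubs and subs but are otherwise hypothetical, and identifiers are assumed to be (originator, sequence number) pairs with per-originator sequence numbers starting at 1, which the text does not specify.

```python
from dataclasses import dataclass, field

@dataclass
class GossipMessage:
    """One gossip message carrying the four kinds of information."""
    events: list = field(default_factory=list)     # notifications new since last gossip
    event_ids: set = field(default_factory=set)    # digest of delivered notification ids
    unsubs: set = field(default_factory=set)       # unsubscriptions to forward
    subs: set = field(default_factory=set)         # subscriptions used to update views

def compact_event_ids(event_ids):
    """Compact a set of (originator, seq) ids per originator.

    For each originator, keep the last sequence number delivered in
    sequence plus the ids delivered after it. Assumes seq starts at 1.
    """
    by_sender = {}
    for sender, seq in event_ids:
        by_sender.setdefault(sender, set()).add(seq)
    compact = {}
    for sender, seqs in by_sender.items():
        last = 0
        while last + 1 in seqs:      # advance over the gap-free prefix
            last += 1
        compact[sender] = (last, sorted(s for s in seqs if s > last))
    return compact
```

With this representation, a long gap-free history from one originator collapses to a single number, so the digest grows only with the number of originators and of out-of-order deliveries.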
