Open Access Proceedings Article

Lightweight probabilistic broadcast

P. Th. Eugster (1), R. Guerraoui (1), S. B. Handurukande (1), A.-M. Kermarrec (2), P. Kouznetsov (1)
(1) Federal Institute of Technology, Lausanne, Switzerland
(2) Microsoft Research, Cambridge, UK
Abstract
The growing interest in peer-to-peer applications has underlined the importance of scalability in modern distributed
systems. Not surprisingly, much research effort has been invested in gossip-based broadcast protocols. These trade
the traditional strong reliability guarantees for very good “scalability” properties. In that context, scalability is
usually expressed in terms of throughput, but there has been little work on how to reduce the overhead of membership
management at a large scale.
This paper presents Lightweight Probabilistic Broadcast (lpbcast), a novel gossip-based broadcast algorithm which
preserves the inherent throughput scalability of traditional gossip-based algorithms and adds a notion of membership
management scalability: every process only knows a random subset of fixed size of the processes in the system. We
formally analyze our broadcast algorithm in terms of scalability with respect to the size of individual views, and
compare the analytical results both with simulations and concrete measurements.
1 Introduction
Large-scale event dissemination. Peer-to-peer computing has recently received much attention, as shown by the
success of large-scale decentralized applications like Gnutella [30] or Groove [12]. In peer-to-peer computing, every
process acts as both client and server, and scalability is a major concern.
The scale expected of such applications has grown from hundreds to thousands of participants, but adequate
algorithms for the reliable propagation of events at large scale are still lacking. Network-level protocols have turned
out to be insufficient: IP multicast [6] lacks reliability guarantees, and reliable protocols do not scale well. The
well-known Reliable Multicast Transport Protocol (RMTP) [24], for instance, generates a flood of positive
acknowledgements from receivers, loading both the network and the sender, where these acknowledgements converge.
Such network-level protocols also hide any form of membership [21, 15, 2], which makes them difficult to exploit for
more dynamic dissemination (e.g., filtering [22]) and underscores the need for new forms of application-level broadcast.

Gossip-based broadcast algorithms. Gossip-based broadcast algorithms (e.g., [4, 26, 18]) appear to be better suited
to large-scale event dissemination than the “classical” strongly reliable approaches [14]. Though such gossip-based
approaches have shown good scalability characteristics in terms of throughput, they often rely on the assumption that
every process knows every other process. When managing large numbers of processes, i.e., a large number of
references to processes acting as event producers and/or consumers, this assumption becomes a barrier to scalability.
In fact, the data structures necessary to store the view of such a large-scale membership consume considerable memory
resources, let alone the communication required to keep the membership consistent.
Partial view. Message routing and membership management are sometimes delegated to dedicated servers¹ in order
to relieve application processes. This only defers the problem, since those servers are limited in resources as well. To
further increase scalability, the membership view should be split, i.e., every participating process should have only
a partial view of the system. In order to avoid the isolation of processes or the partitioning of the membership,
especially in the case of failures, membership information should nevertheless be shared by processes to some extent:
introducing a certain degree of redundancy between the individual views is crucial to avoid single points of failure.
Gossip-based membership. While certain systems rely on a deterministic scheme to establish the individual views
[28, 18], we introduce here a new, completely randomized approach. The local view of every individual member
consists of a random process list which continuously evolves but never exceeds a fixed size. In short, after new
processes are added to a view, the view is truncated to the maximum length by removing randomly chosen entries. To
ensure a uniform distribution of membership knowledge among processes, every gossip message, besides notifying
events, also piggybacks a set of process identifiers which are used to update views. The membership protocol and
the actual dissemination of events are thus dealt with at the same level. This symmetry is precisely the key to our
formal analysis.
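The add-then-truncate rule described above can be illustrated with a few lines of Python. This is a minimal sketch under our own assumptions, not the paper's implementation: process identifiers are plain integers, and `update_view` and its `max_size` parameter (the fixed view size) are hypothetical names. Only the view update step is shown; how the new identifiers arrive inside gossip messages is left out.

```python
import random


def update_view(view, new_ids, self_id, max_size, rng=random):
    """Merge newly learned process ids into a partial view, then truncate
    it back to max_size by evicting uniformly random entries."""
    merged = list(view)
    for pid in new_ids:
        # Skip our own id and ids that are already known.
        if pid != self_id and pid not in merged:
            merged.append(pid)
    # Truncate: remove randomly chosen entries until the size bound holds.
    while len(merged) > max_size:
        merged.pop(rng.randrange(len(merged)))
    return merged


rng = random.Random(42)
view = update_view([1, 2, 3], [4, 5, 6, 0], self_id=0, max_size=4, rng=rng)
print(len(view))  # 4
```

Which four identifiers survive depends on the random generator; only the size bound is deterministic. Evicting uniformly at random, rather than, say, the oldest entries, is what keeps this sketch in line with the completely randomized approach.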
Contributions. In this paper, we present our strongly scalable decentralized algorithm for event dissemination, called
lpbcast, which we have used to implement a static publish/subscribe² scheme based on topics [8]. We support our claim
of scalability in two steps. First, we formally analyze our algorithm using a stochastic approach, showing that, with
perfectly uniformly distributed individual views, the view size has no impact on the delivery latency of an event. We
similarly show that, for a given view size, the probability of partition creation in the system decreases as the system
grows. Second, we give practical results that support the analytical approach, from both simulation and prototype
measurements.
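The partition claim can be explored empirically with a toy model. The sketch below assumes idealized views in which each process knows l distinct processes drawn uniformly and independently at random, i.e., the perfectly uniform case the analysis refers to rather than the views lpbcast actually produces; all function names are hypothetical. It reports the fraction of trials in which the "p knows q" graph is strongly connected, i.e., not partitioned.

```python
import random
from collections import deque


def random_views(n, l, rng):
    """Idealized views: process p knows l distinct others chosen uniformly."""
    views = []
    for p in range(n):
        # Sample from the n - 1 other ids by skipping over p itself.
        picks = rng.sample(range(n - 1), l)
        views.append([q if q < p else q + 1 for q in picks])
    return views


def _reachable(adjacency, source):
    """Breadth-first search; returns every process reachable from source."""
    seen = {source}
    queue = deque([source])
    while queue:
        p = queue.popleft()
        for q in adjacency[p]:
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen


def strongly_connected(views):
    """True if every process can reach every other via 'p knows q' edges."""
    n = len(views)
    reverse = [[] for _ in range(n)]
    for p, view in enumerate(views):
        for q in view:
            reverse[q].append(p)
    return len(_reachable(views, 0)) == n and len(_reachable(reverse, 0)) == n


def connected_fraction(n, l, trials=200, seed=0):
    """Fraction of random trials in which the membership is not partitioned."""
    rng = random.Random(seed)
    ok = sum(strongly_connected(random_views(n, l, rng)) for _ in range(trials))
    return ok / trials


print(connected_fraction(50, 2), connected_fraction(500, 2))
```

Comparing the two printed fractions gives a rough feel for how partition frequency depends on system size for a fixed view size; this toy check is no substitute for the formal analysis.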
It is important to note that our membership approach is not intrinsically tied to our Lightweight Probabilistic
Broadcast (lpbcast) algorithm. We illustrate this by applying our membership scheme to the well-known pbcast [4]
algorithm.
Roadmap. Section 2 gives an overview of related gossip-based broadcast protocols. Section 3 presents our lpbcast
algorithm and explains our randomized approach. Section 4 presents a formal analysis of our algorithm in terms
of scalability and reliability. Section 5 gives simulation and practical results supporting the formal analysis.
Section 6 discusses the distribution of the views and demonstrates the general applicability of our membership
approach by combining it with pbcast and contrasting the consolidated algorithm with lpbcast. Section 7 concludes
the paper.

¹ These are also called event servers [5], routing daemons [27], or message brokers [1].
² Due to its decoupling nature, the publish/subscribe paradigm has been used in various large-scale contexts, e.g., [20, 9].
2 Background: Probabilistic Algorithms
The achievement of strong reliability guarantees (in the sense of [14]) in practical distributed systems requires
expensive mechanisms to detect missing messages and initiate retransmissions. Due to the overhead of message
loss detection and repair, protocols offering such strong guarantees do not scale beyond a couple of hundred
processes [25].
2.1 Reliability vs Scalability
Gossip, or rumor mongering, algorithms [7] are a class of epidemic algorithms which have been introduced as
an alternative to such “traditional” reliable broadcast protocols. They were first developed for replicated database
consistency management [7]. The main motivation is to trade the reliability guarantees offered by costly deterministic
protocols for weaker reliability guarantees, in return for very good scalability properties.
Their analysis is usually based on stochastics similar to the theory of epidemics [3], where the execution is broken
down into steps. Probabilities are associated with these steps, and such algorithms are therefore sometimes also
referred to as probabilistic algorithms. The degree of reliability is typically expressed by a probability, such as the
probability 1-α of reaching all processes in the system with any given message, or the probability 1-β of reaching
any given process with any given message. Ideally, α and β are precisely quantifiable.
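The two reliability measures 1-α and 1-β can be made concrete with a small Monte Carlo experiment. The sketch below is a toy model of our own, not the analysis used in the paper: it assumes synchronous rounds, full membership, a fixed number of rounds during which every informed process pushes the message to `fanout` random processes, and independent message loss with probability `loss`. All function and parameter names are hypothetical.

```python
import random


def run_push_gossip(n, fanout, rounds, loss, rng):
    """Toy model: process 0 broadcasts; in each synchronous round every informed
    process pushes the message to `fanout` random others (full membership).
    Each copy is lost independently with probability `loss`."""
    informed = {0}
    for _ in range(rounds):
        newly_informed = set()
        for p in informed:
            # Pick fanout distinct targets among the n - 1 other processes.
            targets = [q if q < p else q + 1 for q in rng.sample(range(n - 1), fanout)]
            for q in targets:
                if rng.random() >= loss:  # this copy was not lost
                    newly_informed.add(q)
        informed |= newly_informed
    return informed


def estimate_reliability(n=50, fanout=3, rounds=6, loss=0.05, trials=500, seed=1):
    """Monte Carlo estimates of (1 - alpha, 1 - beta) for the toy model."""
    rng = random.Random(seed)
    runs_reaching_all = 0
    processes_reached = 0
    for _ in range(trials):
        informed = run_push_gossip(n, fanout, rounds, loss, rng)
        runs_reaching_all += len(informed) == n
        processes_reached += len(informed)
    one_minus_alpha = runs_reaching_all / trials       # P(all processes reached)
    one_minus_beta = processes_reached / (trials * n)  # P(a given process reached)
    return one_minus_alpha, one_minus_beta


print(estimate_reliability())
```

Because a run that reaches everyone contributes a full 1 to the second estimate, the estimated 1-β can never be smaller than the estimated 1-α, mirroring the fact that reaching all processes is the stronger requirement. The originator counts as reached in this model.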
2.2 Basic Concepts
Decentralization is the key concept underlying the scalability properties of gossip-based broadcast algorithms, i.e.,
the overall load of retransmissions is reduced by decentralizing the effort. In contrast to sender-reliable protocols
(e.g., Reliable Multicast Transport Protocol (RMTP) [24]) or receiver-reliable protocols (e.g., Log-Based
Receiver-Reliable Multicast (LBRM) [16]),³ gossip-based broadcast protocols belong to the class of peer-based
protocols, just like Scalable Reliable Multicast (SRM) [10]. While retransmission requests in SRM can be handled by
any process but lead to the re-broadcasting of a message, gossip-based protocols fit the nature of peer-to-peer
computing even better by relying on pairwise interaction between peers. More precisely, in most gossip-based
algorithms, retransmissions are initiated by having every process periodically (every T ms, i.e., at each step interval)
send a digest of the messages it has delivered to a randomly chosen subset of processes inside the system (the gossip
subset). The size of this subset is usually fixed and is commonly called the fanout (F). Gossip protocols differ in the
number of times the same information is gossiped: every process might gossip the same information only a limited
number of times (repetitions are limited), and/or the same information might be forwarded only a limited number of
times (hops are limited).
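The digest-based gossip round and the two kinds of limits can be sketched as follows. This is an illustrative assumption, not a specific protocol from the literature: `GossipingProcess`, `Entry`, `max_repetitions`, and `max_hops` are hypothetical names, the digest is modeled as a list of (message id, hop count) pairs, and no actual network sending is performed.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Entry:
    hops: int                # forwards this message has already gone through
    times_gossiped: int = 0  # rounds in which this process has gossiped it


@dataclass
class GossipingProcess:
    pid: int
    fanout: int            # F: size of the gossip subset
    max_repetitions: int   # limit on how often this process gossips a message
    max_hops: int          # limit on how often a message is forwarded overall
    delivered: dict = field(default_factory=dict)  # message id -> Entry

    def deliver(self, msg_id, hops):
        if msg_id not in self.delivered:
            self.delivered[msg_id] = Entry(hops=hops)

    def gossip_round(self, members, rng):
        """Called every T ms: choose the gossip subset and build the digest."""
        others = [m for m in members if m != self.pid]
        targets = rng.sample(others, min(self.fanout, len(others)))
        digest = []
        for msg_id, entry in self.delivered.items():
            # Include a message only while both limits still allow it.
            if entry.times_gossiped < self.max_repetitions and entry.hops < self.max_hops:
                digest.append((msg_id, entry.hops + 1))  # receivers see one more hop
                entry.times_gossiped += 1
        return targets, digest


p = GossipingProcess(pid=0, fanout=2, max_repetitions=2, max_hops=3)
p.deliver("m1", hops=0)
p.deliver("m2", hops=3)  # already at the hop limit
rng = random.Random(7)
for _ in range(3):
    print(p.gossip_round(range(5), rng)[1])
# [('m1', 1)], then [('m1', 1)], then []
```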
³ In the first class of protocols (e.g., RMTP), senders wait for acknowledgements from receivers, while in the second class (e.g., LBRM), receivers are responsible for detecting missing messages and soliciting retransmissions from senders.

2.3 Membership Tracking in Gossip-Based Algorithms
Membership tracking in gossip-based algorithms is a challenging issue. Early approaches like [11] allow the individual
views of processes to diverge temporarily, but assume that they eventually converge in “stable” phases. These views,
however, represent the “complete” membership, which becomes a bottleneck at large scale. The Bimodal Multicast [4]
and Directional Gossip [18] algorithms are representatives of a new generation of probabilistic algorithms aware of
the problem of scalable membership management.
Bimodal Multicast. Bimodal Multicast (also called pbcast) relies on two phases. A “classical” best-effort multicast
protocol (e.g., IP multicast) is used for a first rough dissemination of messages. A second phase ensures reliability
with a certain probability by means of peer-based, gossip-driven retransmission:⁴ every process in the system
periodically gossips a digest of its received messages, and gossip receivers can solicit such messages from the sender
if they have not received them previously.⁵
The membership problem is not addressed in [4]; instead, the authors refer to another paper which deals with failure
detection based on gossips [29], while a third paper describes Capt’n Cook [28], a gossip-based resource location
protocol for the Internet, which can in that sense be seen as a membership protocol.⁶ This protocol enables the
reduction of the view of each individual process: each process has a precise view of its immediate neighbours, while
its knowledge becomes less exhaustive with increasing “distance”. The notion of distance is expressed in terms of host
addresses. However, [28] only considers the propagation of membership information, and it is thus not clear how this
membership interacts with pbcast.
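One gossip-pull exchange of the kind used in pbcast's second phase (a digest goes out, the receiver solicits what it lacks, the sender retransmits) can be sketched with in-memory dictionaries. The function names and the representation of a message store as a dict from message id to payload are our own assumptions; timing, buffer limits, and the first (best-effort multicast) phase are omitted.

```python
def make_digest(store):
    """Gossip sender: advertise the ids of the messages it has received."""
    return set(store)


def missing_from(digest, store):
    """Gossip receiver: ids advertised in the digest that it has not received."""
    return sorted(digest - set(store))


def answer_solicitation(requested, store):
    """Gossip sender: retransmit the solicited messages it still holds."""
    return {mid: store[mid] for mid in requested if mid in store}


sender = {1: "a", 2: "b", 3: "c"}   # message id -> payload
receiver = {1: "a"}
requested = missing_from(make_digest(sender), receiver)
receiver.update(answer_solicitation(requested, sender))
print(requested)  # [2, 3]
```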
Directional Gossip. Directional Gossip is a protocol targeted specifically at wide-area networks. It performs
optimizations by taking into account the network topology and the current processes. More precisely, a weight
is computed for each neighbour node, representing the connectivity of that node. The larger the weight of a
node, the more ways it can be infected by other nodes. The protocol applies a simple heuristic, which consists of
choosing nodes with higher weights with a smaller probability than nodes with smaller weights. That way,
redundant sends are reduced. The algorithm also relies on partial views, in the sense that there is a single gossip
server per LAN which acts as a bridge to other LANs. This, however, leads to a static hierarchy, in which the failure of
a gossip server can isolate several processes from the remaining system.
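The weight-based heuristic can be sketched as weighted random selection. The text above only states that higher-weight nodes are chosen with smaller probability; drawing each neighbour with probability proportional to 1/weight, as done here, is our assumption, and `pick_targets` is a hypothetical name. Targets are drawn one at a time without replacement so that no neighbour is picked twice.

```python
import random


def pick_targets(weights, k, rng):
    """Choose up to k distinct neighbours, favouring low-weight ones.
    Assumption: each remaining neighbour is drawn with probability
    proportional to 1 / weight (weights must be positive)."""
    candidates = dict(weights)  # copy so the caller's mapping is untouched
    chosen = []
    while candidates and len(chosen) < k:
        names = list(candidates)
        inverse = [1.0 / candidates[name] for name in names]
        pick = rng.choices(names, weights=inverse, k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # without replacement
    return chosen


rng = random.Random(3)
weights = {"a": 1.0, "b": 4.0, "c": 2.0}  # hypothetical connectivity weights
print(pick_targets(weights, 2, rng))
```

Under this rule, with two neighbours of weights 1 and 4, the first pick is the weight-1 neighbour with probability (1/1) / (1/1 + 1/4) = 0.8.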
In contrast to the deterministic hierarchical membership approaches in Directional Gossip or Capt’n Cook, our
lpbcast algorithm has a probabilistic approach to membership: each process has a random partial view of the system.
lpbcast is lightweight in the sense that it consumes few memory resources and requires no dedicated messages for
membership management: gossip messages are used to disseminate notifications⁷ and to propagate digests of received
events, but also to propagate membership information.

⁴ In order to offer a complete guarantee of delivery, Reliable Probabilistic Multicast (rpbcast) [26] adds a deterministic third phase to the pbcast protocol, in which centralized loggers are used if the second, gossip-based phase fails.
⁵ This is commonly referred to as gossip pull, in contrast to gossip push, where gossip senders are updated by gossip receivers with messages missing in the digest gossiped by the former (rpbcast uses gossip push). The term anti-entropy usually refers to a mixed push/pull variant, where two processes symmetrically update each other.
⁶ Gossip-based garbage collection is addressed in [13].
⁷ These notifications constitute the actual payload of the gossip messages and can be viewed as application messages. In contrast, gossip messages constitute protocol messages. This distinction was not made previously, since gossips are seldom used as the “primary” means of dissemination.
3 Lightweight Probabilistic Broadcast (lpbcast)
In this section, we present our completely decentralized lightweight probabilistic algorithm for event dissemination
based on partial views. Though the parts concerning event dissemination and membership can be considered
independent, we present our solution as a monolithic algorithm. This is done in order to simplify the presentation,
and to emphasize the possibility of dealing with membership and event dissemination at the same level.
3.1 System Model
We consider a system of processes Π = {p₁, p₂, ...}. Processes join and leave the system dynamically and have
ordered, distinct identifiers. For simplicity of presentation, we assume that there is at most one process per node of
the network.
Though our algorithm has been implemented in the context of topic-based publish/subscribe [8], we present it with
respect to a single topic, and do not discuss the effect of scaling up the number of topics. In other words, Π can be
considered as a single topic or group, and joining/leaving Π can be viewed as subscribing to/unsubscribing from the
topic. Such subscriptions/unsubscriptions are assumed to be rare compared to the large flow of events, and every
process in Π can subscribe to and/or publish events.
3.2 Gossip Messages
Our lpbcast algorithm is based on non-synchronized periodic gossips, where a gossip message contains several
types of information. More precisely, a gossip message serves four purposes:
Notifications: A message piggybacks notifications received (for the first time) since the last outgoing gossip message.
Each process stores these notifications in a variable events. Every such notification is gossiped at most once.
Older notifications are stored in a different buffer, which is only required to satisfy retransmission requests.
Notification identifiers: Each message also carries a digest (history) of notifications that the sending process has
received. To that end, every process stores identifiers of notifications it has already delivered in a variable eventIds.
We suppose that these identifiers are unique and include the identifier of the originator. That way, the buffer can be
optimized by retaining, for each sender, only the identifiers of notifications delivered since the last one delivered in
sequence.
Unsubscriptions: A gossip message also piggybacks a subset of unsubscriptions. This type of information enables
the gradual removal, from local views, of processes that have unsubscribed. Unsubscriptions that are eligible to be
forwarded with the next gossip(s) are stored in a variable unSubs.
Subscriptions: A set of subscriptions is attached to each message. These subscriptions are buffered in subs. A
gossip receiver uses these subscriptions to update its view, stored in a variable view.
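The four kinds of information, together with the eventIds optimization, can be sketched as follows. The field names follow the variables above (events, eventIds, unSubs, subs), but `GossipMessage` and `EventIdBuffer` are hypothetical, and we additionally assume that each originator numbers its notifications consecutively from 1, so that an identifier is an (originator, sequence number) pair.

```python
from dataclasses import dataclass


@dataclass
class GossipMessage:
    """One lpbcast gossip; the four fields mirror the list above."""
    events: list     # notifications received since the last outgoing gossip
    event_ids: dict  # compact digest: originator -> (last in-sequence id, out-of-order ids)
    un_subs: set     # unsubscriptions eligible for forwarding
    subs: set        # subscriptions used by receivers to update their views


class EventIdBuffer:
    """Delivered notification ids, stored compactly per originator.
    Assumption: each originator numbers its notifications 1, 2, 3, ..."""

    def __init__(self):
        self.last_in_seq = {}  # originator -> highest id delivered in sequence
        self.extras = {}       # originator -> ids delivered out of order

    def contains(self, originator, seq):
        return (seq <= self.last_in_seq.get(originator, 0)
                or seq in self.extras.get(originator, set()))

    def add(self, originator, seq):
        """Record a delivery; returns False if it was already delivered."""
        if self.contains(originator, seq):
            return False
        extras = self.extras.setdefault(originator, set())
        extras.add(seq)
        last = self.last_in_seq.get(originator, 0)
        # Fold any now-consecutive ids into the in-sequence counter.
        while last + 1 in extras:
            last += 1
            extras.remove(last)
        self.last_in_seq[originator] = last
        return True

    def digest(self):
        return {o: (last, sorted(self.extras.get(o, ())))
                for o, last in self.last_in_seq.items()}


buf = EventIdBuffer()
for seq in (1, 2, 4):
    buf.add(7, seq)
print(buf.digest())  # {7: (2, [4])}
buf.add(7, 3)
msg = GossipMessage(events=[], event_ids=buf.digest(), un_subs=set(), subs={5})
print(msg.event_ids)  # {7: (4, [])}
```

Once notification 3 arrives, the run 1 to 4 collapses into a single counter, so only identifiers delivered out of order are stored explicitly.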

References
The many faces of publish/subscribe (journal article)
Epidemic algorithms for replicated database maintenance (proceedings article)
A reliable multicast framework for light-weight sessions and application level framing (journal article)
Distributed Systems (book)
Frequently Asked Questions

With notification buffers of infinite length, as assumed in the analysis, reliability would remain constant as l decreases.

The membership becomes more stable as n increases. Intuitively, in a large system membership information is more sparsely distributed, and the probability of concentrated exclusive knowledge becomes vanishingly small.

The authors are indeed currently investigating how to combine their membership approach with other gossip-based event dissemination algorithms, e.g., using loggers to ensure strong reliability guarantees whenever this is required (cf. rpbcast). 

In fact, because repetitions and hops are limited in the case of pbcast, a higher fanout is required to obtain results similar to those of lpbcast (F = 5 here vs. F = 3 in Figure 6(a)).

The probability of a message loss does not exceed a predefined ε > 0, and the number of process crashes in a run does not exceed f < n.