SCOPE: Scalable Consistency Maintenance in
Structured P2P Systems
Xin Chen
Ask Jeeves Inc.
& Department of Computer Science
College of William and Mary
xinchen@cs.wm.edu
Shansi Ren, Haining Wang, Xiaodong Zhang
Department of Computer Science
College of William and Mary
Williamsburg, VA 23187-8795, USA
{sren, hnw, zhang}@cs.wm.edu
Abstract: While current Peer-to-Peer (P2P) systems facilitate
static file sharing, newly-developed applications demand that
P2P systems be able to manage dynamically-changing files.
Maintaining consistency between frequently-updated files and
their replicas is a fundamental reliability requirement for a P2P
system. In this paper, we present SCOPE, a structured P2P system
supporting consistency among a large number of replicas. By
building a replica-partition-tree (RPT) for each key, SCOPE
keeps track of the locations of replicas and then propagates
update notifications. Our theoretical analyses and experimental
results demonstrate that SCOPE can effectively maintain replica
consistency while preventing hot-spot and node-failure problems.
Its efficiency in maintenance and failure-recovery is particularly
attractive to the deployment of large-scale P2P systems.
Keywords: structured P2P systems, replica consistency, hierar-
chical trees.
I. INTRODUCTION
Structured P2P systems have been successfully designed
and implemented for global storage utility (such as PAST
[29], CFS [9], OceanStore [17], and Pangaea [30]), publishing
systems (such as FreeNet [7] and Scribe [4]), and Web-
related services (such as Squirrel [14], SFR [33], and Beehive
[21]). Among all these P2P-based applications, replication
and caching have been widely used to improve scalability
and performance. However, little attention has been paid to
maintaining replica consistency in structured P2P systems. On
one hand, without effective replica consistency maintenance, a
P2P system is limited to providing only static or infrequently-
updated object sharing. On the other hand, newly-developed
classes of P2P applications do need consistency support to
deliver frequently-updated contents, such as directory service,
online auction, and remote collaboration. In these applications,
files are frequently changed, and maintaining consistency
among replicas is a must for correctness. Therefore, scalable
consistency maintenance is essential to improve service quality
of existing P2P applications, and to meet the basic requirement
of newly-developed P2P applications.
Existing structured P2P systems rely on distributed hash
tables (DHTs) to assign objects to different nodes. Each node
is expected to receive roughly the same number of objects,
thanks to the load balance achieved by DHTs. However, the
system may become unbalanced when objects have different
popularities and numbers of replicas. In a scalable replica
updating mechanism, the location of a replica must be trace-
able, and no broadcasting is needed for the propagation of
an update notification. Current structured P2P systems take
a straightforward approach to track replica locations [32],
[24]—a single node stores the locations of all replicas. This
approach provides us with a simple solution of maintaining
data consistency. However, it only works well if the number of
replicas per object is relatively small in a reliable P2P system.
Otherwise, several problems may occur as follows.
Hot-spot problem: because objects differ in popularity, the
number of replicas per object varies significantly, leaving the
nodes responsible for popular objects heavily loaded while other
nodes carry far fewer replicas.
Node-failure problem: if the hashed node fails, update
notifications have to be propagated by broadcasting.
Privacy problem: the hashed node knows all replicas’
locations, which violates the privacy of original content
holders.
To address the deficiencies in existing structured P2P
systems, we propose a structured P2P system with replica
consistency support, called Scalable COnsistency maintenance
in structured PEer-to-peer systems (SCOPE). Unlike existing
structured P2P systems, SCOPE distributes all replicas’ loca-
tion information to a large number of nodes, thus preventing
hot-spot and node-failure problems. It also avoids recording
explicitly the IP address or node ID of a node that stores a
replica, thus protecting the privacy of the node. By building
a replica-partition-tree (RPT) for each key, SCOPE keeps
track of the location of replicas and then propagates update
notifications. We introduce three new operations in SCOPE to
maintain consistency.
Subscribe: when a node has an object and needs to keep
it up-to-date, it calls subscribe to receive a notification of
the object update.
Unsubscribe: when a node neither needs a replica nor
keeps it up-to-date, it calls unsubscribe to stop receiving
update notifications.
Update: when a node needs to change the content of an
object, it calls update to propagate the update notification
(an invalidation message or the key itself) to all subscribed nodes.
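To make these three operations concrete, the following is a minimal sketch of the peer-side interface they imply; the class name ScopeNode and the exact signatures are illustrative assumptions, not the paper's implementation.

```python
from typing import Optional


class ScopeNode:
    """Hypothetical sketch of the consistency interface a SCOPE peer exposes."""

    def subscribe(self, key: bytes) -> None:
        """Register interest in `key`: the node keeps a replica and asks to be
        notified, via its RPT representatives, whenever the key is updated."""
        ...

    def unsubscribe(self, key: bytes) -> None:
        """Drop the replica of `key` and stop receiving update notifications."""
        ...

    def update(self, key: bytes, payload: Optional[bytes] = None) -> None:
        """Propagate an update notification for `key` to all subscribed nodes.
        As described in the text, the notification may carry either an
        invalidation message or the new content itself."""
        ...
```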

In SCOPE, we allow multiple writers to co-exist, since the
update operation on a key can be invoked by any node keeping
a replica of that key. In contrast, in some practical applications,
usually only one node is authorized to update a key. SCOPE
can be easily applied to single-writer applications.
Since SCOPE directly utilizes DHTs to manage object repli-
cas, it effectively supports consistency among a large number
of peers. As a general solution, SCOPE can be deployed
in any existing structured P2P systems, such as CAN [24],
Chord [32], Pastry [28], and Tapestry [36]. Our theoretical
analyses and simulation experiments show that SCOPE can
achieve replica consistency in a scalable and efficient manner.
In an N-node network, each peer is guaranteed to keep at
most O(log N) partition vectors for a single key, regardless
of the key’s value and its popularity. Due to the hierarchical
management, only O(1) nodes are updated when a node joins
or leaves, and only O(log^2 N) messages are transmitted to
recover from a node failure.
The remainder of the paper is organized as follows. Section
2 surveys related work. Section 3 presents the RPT structure
in SCOPE. Section 4 describes the operations defined in
SCOPE. Maintenance and recovery procedures are introduced
in Section 5. We evaluate the performance of SCOPE using
Pastry routing algorithm in Section 6. In Section 7, we briefly
discuss SCOPE design alternatives. Finally, we conclude the
paper in Section 8.
II. RELATED WORK
Replication is effective to improve the scalability and ob-
ject availability of a P2P system. However, most proposed
replication schemes are focused on how to create replicas.
Maintaining consistency among a number of replicas is not
fully investigated, posing a challenge for building a consistent
large-scale P2P system. Different from all proposed solutions,
our approach utilizes the nature of DHTs to organize the
replicas in a distributed way. Therefore, it has better scalability
and higher efficiency.
Some existing file-sharing P2P systems assume that the
shared data are static or read-only, so that no update mech-
anism is needed. Most unstructured P2P systems, including
centralized ones (e.g., Napster) and decentralized ones (e.g.,
Gnutella), do not guarantee consistency among replicas. Re-
searchers have designed several algorithms to support con-
sistency in a best-effort way. In [10], a hybrid push/pull
algorithm is used to propagate updates to related nodes,
where flooding is substituted by rumor spreading to reduce
communication overhead. At every step of rumor spreading,
a node pushes updates to a subset of related nodes it knows,
only providing partial consistency. Similarly, in Gnutella, Lan
et al. [18] proposed to use the flooding-based active push
for static objects and the adaptive polling-based passive pull
for dynamic objects. However, it is hard to determine the
polling frequency, thus essentially no guaranteed consistency
is provided. In [27], Roussopoulos and Baker proposed an
incentive-based algorithm called CUP to cache metadata—
lookup results—and keep them updated in a structured P2P
system. However, CUP only caches the metadata, not the
object itself, along the lookup path with limited consistency
support. So, it cannot maintain consistency among the replicas
of an object. Considering the topology mismatch problem
between overlays and their physical layers in structured P2P
systems, [25] proposed an adaptive topology adjusting method
to reduce the average routing latency of a query. In [13],
a network of streaming media servers is organized into a
structured P2P system to fully utilize local cached copies of
an object, so that the average streaming start-up time can be
reduced.
For applications demanding consistency support among
replicas, different solutions have been proposed in various P2P
systems. Most proposed P2P-based publish/subscribe systems
record paths from subscribers to publishers, and use them
to propagate updates. As an anonymous P2P storage and
information retrieval system, FreeNet [7] protects the privacy
of both authors and readers. It uses a content-hash key to
distinguish different versions of a file. An update is routed
to other nodes based on key closeness. However, the update is
not guaranteed to reach every replica. Based on Pastry, Scribe
[4] provides a decentralized event notification mechanism for
publishing systems. A node can be a publisher by creating
a topic, and other nodes can become its subscribers through
registration. The paths from subscribers to the publisher are
recorded for update notifications. However, if any node on
the path fails, some subscribers are not reachable unless
broadcasting is used.
Being a major P2P application, a wide-area file system
relies on replication to improve its performance. In [8], a
decentralized replication solution is used to achieve practical
availability, without considering replica consistency. PAST
[29] is a P2P-based file system for large-scale persistent
storage service. In PAST, a user can specify the number
of replicas of a file through central management. Although
PAST utilizes caching to shorten client-perceived latency, it
does not maintain consistency of cached contents. Similarly,
CFS [9] is a P2P read-only storage system, and avoids most
cache inconsistency problems by content hashes. Each client
has to validate the freshness of a received file by itself, and
stale replicas are removed from caches by LRU replacement.
OceanStore [17] maintains two-tier replicas: a small durable
primary tier and a large, soft-state second tier. The primary
tier is organized as a Byzantine inner ring, keeping the most
up-to-date data. The replicas in the second tier are connected
through multicast trees, i.e., dissemination trees (d-tree). Pe-
riodic heartbeat messages are sent for fault resilience, which
incurs significant communication overhead. Similar solutions
have been used in P2P-based real-time multimedia stream-
ing (e.g., Bayeux [37] and SplitStream [5]). Pangaea [30]
creates replicas aggressively to improve overall performance.
By organizing all replicas of a file in a strongly-connected
graph, it propagates an update from one server to the others
through flooding, which does not scale well with a large
number of replicas. Automatic replica regeneration [35] has
been proposed to provide higher availability with a small

number of replicas, which are organized in a lease graph.
A two-phase write protocol is used to optimize reads and
linearize the read/write process.
Most newly-proposed Web services on P2P structures still
employ the time-to-live (TTL) mechanism to refresh their
replicas. For example, Squirrel [14] is such a system based on
the Pastry routing protocol. The freshness of a cached object
is determined by the Web cache expiration policy (e.g., TTL
field in response headers). In order to facilitate Web object
references, Semantic Free Reference (SFR) [33] has been
proposed to resolve the object locations. Based on DHTs, SFR
utilizes the caches of different infrastructure levels to improve
the resolving latency. Beehive, designed for domain name
systems [21], [22], provides O(1) lookup latency. Different
from widely used passive caching, it uses proactive replication
to significantly reduce the lookup latency. In [12], Gedik et
al. used a dynamic passive replication scheme to provide
reliable service for a P2P Internet monitoring system, where
the replication list is maintained by each Continual Queries
(CQ) owner.
III. THE BASE OF SCOPE PROTOCOL
The SCOPE protocol specifies: (1) how to record the
locations of all replicas; (2) how to propagate update noti-
fications to related peers; (3) how to join or leave the system
as a peer; and (4) how to recover from a node’s failure.
This section describes how to record the replica locations by
building a replica-partition-tree (RPT)—a distributed structure
for load balancing in SCOPE. The operation algorithms and
maintenance procedures will be presented in Sections 4 and
5, respectively.
A. Overview
In DHTs each key is assigned to a node according to its
identifier, and we call this original key-holder the primary
node of the key. To avoid the primary node becoming the hot
spot, SCOPE splits the whole identifier space into partitions
and selects one representative node in each partition to record
the replica locations within that partition. Each partition may
be further divided into smaller ones, in which child nodes are
selected as the representatives to take charge of the smaller
partitions. As the root of this partition-tree, the primary node
only records the key existence in the partition one level
beneath, while its child representative nodes record the key
existence in the partitions two levels below the root; and so
on and so forth. In this way, the overhead of maintaining
consistency at one node is greatly reduced and undertaken
by the representative nodes at lower levels. Since the hash
function used by DHTs distributes keys to the whole identifier
space, the load of tree maintenance is balanced among all
nodes at any partition level. Note that since the location information
at any level is obtainable from representative nodes at lower
levels, the partition-tree also provides a recovery mechanism
to handle a node failure.
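As a rough illustration of this top-down structure, the sketch below (an assumption-laden simplification, not SCOPE's actual code) shows how an update notification could walk down from the primary node: each representative knows only which of its child partitions contain replicas, and it forwards the notification to the representative of each such partition or directly to the subscribers it is responsible for.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Representative:
    """One node of a key's partition tree (simplified sketch, not SCOPE's code)."""
    bit_vector: List[bool] = field(default_factory=list)   # one bit per child partition
    children: Dict[int, "Representative"] = field(default_factory=dict)
    subscribers: List[int] = field(default_factory=list)   # replica holders contacted directly from here

    def notify(self, key: int) -> None:
        for node_id in self.subscribers:
            # Stand-in for sending an invalidation message over the network.
            print(f"send invalidation for key {key} to node {node_id}")
        for i, has_replica in enumerate(self.bit_vector):
            # Descend only into child partitions whose bit indicates a replica.
            if has_replica and i in self.children:
                self.children[i].notify(key)
```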
Fig. 1. (a) A 3-bit identifier space; (b) The same identifier space with two
partitions; (c) The same identifier space with four partitions.
B. Partitioning Identifier Space
A consistent hash function (e.g. SHA-1) assigns each node
and each key an m-bit identifier, respectively. If we use
a smaller identifier space, the key identifier can be easily
calculated by keeping a certain number of least significant bits.
By adding different most significant bits, the same key can be
mapped to multiple smaller equally-sized identifier spaces with
different identifier ranges. A partition can be further divided
into smaller ones, and it records the existence of all keys in
its sub-partitions. Figure 1(a) shows an identifier space with
m =3. Suppose there is a key 5 in the space. If the original
space is split into two partitions as shown in Figure 1(b), one
with space [0, 3] and the other with space [4, 7], the key can be
hashed to 1 in the first partition and 5 in the second partition,
respectively. If we further split each partition into two sub-partitions,
as illustrated in Figure 1(c), the identifiers of the same
key can be located in the smaller spaces at 1, 3, 5, and 7,
respectively. Figure 2 shows the root of key 5 (101) in the
original 3-bit identifier space and its representative nodes in the
two-level partitions. At the top level, the root node is located at
5 (101). At the intermediate level, the two least significant bits
(01) are inherited from the root, while the different value (0 or
1) is set at the most significant bit to locate the representative
nodes R1 and R2 in the two partitions, respectively. At the
bottom level, only the least significant bit (1) is inherited from
the root but two most significant bits are set to four different
values (00/01/10/11) in order to determine the locations of
representative nodes R11, R12, R21, and R22, respectively.
Note that the partitioning is logical and the same node can
reside in multiple levels. For example, the root node (101) is
used as the representative node in all partition levels.
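The mapping from a key to its representative identifiers at each level is plain bit manipulation: inherit the key's least significant bits and enumerate every setting of the most significant bits. The small sketch below (the function name and parameters are assumptions for illustration) reproduces the key-5 example above.

```python
def representative_ids(key: int, total_bits: int, m: int) -> dict:
    """For each partition level, return the identifiers of the key's
    representative nodes: the low-order bits are inherited from the key,
    the high-order bits index the partition at that level."""
    levels = {}
    level, kept = 0, total_bits
    while kept > 0:
        low = key & ((1 << kept) - 1)
        levels[level] = [(prefix << kept) | low
                         for prefix in range(1 << (total_bits - kept))]
        level, kept = level + 1, kept - m
    return levels


# representative_ids(5, total_bits=3, m=1)
# -> {0: [5], 1: [1, 5], 2: [1, 3, 5, 7]}   (matches the Figure 2 example)
```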
C. Building Replica-Partition-Trees (RPTs)
1) Basic Structure: After partitioning the identifier space
as mentioned above, we build an RPT for each key by
recursively checking the existence of replicas in the partitions.
The primary node of a key in the original identifier space
is the root of RPT(s). The representative node of a key in
each partition, recording the locations of replicas at the lower
levels, becomes one intermediate node of RPT(s). The leaves
of RPT(s) are those representative nodes at the bottom level.
Each node of RPT(s) uses a vector to record the existence of
replicas in its sub-trees, with one bit for each child partition.

Fig. 2. Key 5 (101) in a 3-bit identifier space and its representative nodes
at different levels of partitions.
Figure 3(a) shows an example with the identifier space of
[0, 7]. The nodes 0, 4, and 7 in the space keep a replica of key
5. The RPT for key 5 is shown in Figure 3(b). At the top level,
a 2-bit vector is used to indicate the existence of replicas in the
two sub-trees. At the bottom level, four 2-bit vectors are used
to indicate the existence of key 5 in all eight possible positions
from 0 to 7. In general, if the identifier space is 2^M, the height
of RPT(s) for any key is O(M). Consider that most DHTs use
a 160-bit SHA-1 hashing function, which may result in tens
of partition levels. For example, if we split each partition into
256 (2^8) pieces, we will have 20 levels. Obviously, too many
levels of partitions would make the RPT construction and the
update propagation inefficient.
2) Scalable RPT: Since the number of nodes is much
smaller than the identifier space, our goal is to reduce the
heights of RPTs to the logarithm of the number of nodes. In
the partitioning algorithm presented above, each partition is
recursively divided into smaller ones until only one identifier
remains. The leaf nodes of RPTs record the existence of keys
at the corresponding identifiers. However, if a partition only
contains one node, there is no need for further partitioning
to locate the node. For example, as shown in Figure 3(a),
only node 0 exists in the partition of [0, 3]. During sub-
scribe/unsubscribe operations, node 0 only needs to inform the
primary node of key 5, which records the first level partition
[0, 3] and [4, 7]. When the key is modified, it can directly notify
node 0 by sending an invalidation message to the first identifier
in [0, 3], which is 0. By removing the redundant leaf nodes,
we can build a much shorter RPT. The RPT after the removal
of redundant leaf nodes is shown in Figure 3(c).
The method given above can significantly reduce the par-
tition levels if nodes are distributed sparsely. However, even
if the total number of nodes is small, the number of partition
levels could still be large when most nodes are close to each
other. Figure 4(a) shows an example with the identifier space
of [0,7], where two nodes 6 and 7 subscribe key 5. The RPT is
illustrated in Figure 4(b). Both nodes are in the same partition
until the identifier space is decreased to 1—the bottom level
of the partition. The height of this RPT is 3, and it cannot be
condensed by reducing leaf nodes. In general, if the nodes’
identifiers happen to be consecutive and we only remove the
Fig. 3. (a) In the identifier space of [0,7], nodes 0, 4, and 7 subscribe key
5; (b) The RPT of key 5; (c) The RPT after removing redundant leaf nodes.
leaf nodes as above, the height of RPT(s) will still be O(M).
We resolve this problem by removing the redundant inter-
mediate nodes. If all nodes in a partition are clustered in
one of its lower-level partitions, it is possible to reduce the
intermediate nodes. Figure 4(c) shows one optimized RPT.
The intermediate node for the partition [4, 7] is removed since
only one of its lower-level partitions, [6, 7], has nodes. Thus, the
height of the RPT is decreased from 3 to 2.
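Both optimizations can be read as one recursive construction: stop splitting once a partition contains at most one subscriber, and skip any intermediate partition whose subscribers all fall into a single sub-partition. The sketch below is a simplified interpretation of these rules under those assumptions, not the paper's algorithm verbatim.

```python
def build_rpt(subscribers, lo, hi, fanout=2, is_root=True):
    """Return a pruned partition tree over identifier range [lo, hi) that
    covers the given subscriber identifiers (simplified sketch)."""
    inside = [s for s in subscribers if lo <= s < hi]
    if len(inside) <= 1 and not is_root:
        return inside                          # leaf: at most one subscriber, no further partitioning
    width = (hi - lo) // fanout
    parts = [(lo + i * width, lo + (i + 1) * width) for i in range(fanout)]
    children = {p: build_rpt(inside, p[0], p[1], fanout, is_root=False)
                for p in parts if any(p[0] <= s < p[1] for s in inside)}
    if len(children) == 1 and not is_root:
        # All subscribers cluster in one sub-partition: drop this redundant
        # intermediate node and keep only its single child.
        return next(iter(children.values()))
    return {"range": (lo, hi), "children": children}


# Nodes 6 and 7 subscribing in [0, 8): the [4, 8) level is skipped, so the
# height drops from 3 to 2, as in Figure 4(c).
# build_rpt([6, 7], 0, 8)
# -> {'range': (0, 8), 'children': {(4, 8): {'range': (6, 8),
#        'children': {(6, 7): [6], (7, 8): [7]}}}}
```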
Fig. 4. (a) Nodes 6 and 7 subscribe key 5; (b) The RPT for key 5 after
removing redundant leaf nodes; (c) The RPT after removing both redundant
leaf nodes and intermediate nodes.
Theorem 1: For an N-node network with partition size of 2^m,
the average height of RPTs is O((log N)/m), regardless of the
size of an identifier space.
Proof: Suppose the whole identifier space is 2^M. Every
partitioning generates 2^m smaller equally-sized partitions,
each with size 1/2^m of the previous partition range. After
(log N)/m rounds of partitioning, the identifier range of each partition
is reduced to 2^M / 2^(log N) = 2^M / N. The height of the RPT grows
to (log N)/m, with maximal height log N at m = 1. Note that the
expected number of node identifiers in a range of this size
is 1. Due to the identifier randomness induced by the SHA-1 hash
function, the average height of all RPTs is O((log N)/m).
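As a quick numeric illustration of Theorem 1 (the parameter choices below are arbitrary examples, not values from the paper), the expected height depends only on N and m, not on the size 2^M of the identifier space:

```python
import math

# Average RPT height ~ log2(N) / m, independent of the identifier-space size 2^M.
for n_nodes, m in [(2 ** 10, 1), (2 ** 16, 4), (2 ** 20, 8)]:
    print(f"N = 2^{int(math.log2(n_nodes))}, m = {m}: "
          f"average height ~ {math.log2(n_nodes) / m:.1f} levels")
```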
D. Load Balancing
RPT effectively balances the load across the network, dis-
regarding the key values and their popularities. By using RPT,
we conclude that:
Theorem 2: In an N-node network with partition size of 2^m,
for a key with C subscribers, the average number of
partition vectors in its RPT is O(log N · C).
Proof: In the RPT of the key, only one root is located
at the top level. At the second level, at most min(2^m, C)
representative nodes have one partition vector. At the x-th level
of the RPT, at most min(2^(xm), C) representative nodes are
involved. The total number of vectors S of the RPT is:

S = 1 + min(2^m, C) + min(2^(2m), C) + ... + min(2^(((log N)/m) m), C)
  = sum_{i=0}^{a-1} 2^(im) + sum_{i=a}^{(log N)/m} C
  = ((log N)/m - a) C + (2^(am) - 1) / (2^m - 1),

for 2^((a-1)m) < C <= 2^(am) and 1 <= a <= (log N)/m.
Compared with the number of subscribers C, the number
of vectors grows by a factor of ((log N)/m - a) + (2^(am) - 1) / ((2^m - 1) C). Since
(2^(am) - 1) / ((2^m - 1) C) < 2^(am) / ((2^m - 1) 2^((a-1)m)) = 2^m / (2^m - 1), which is less than 2
(for m >= 1), the maximal value is achieved when a = 1, and
the total number of vectors in the RPT is O(log N) times
the number of subscribers.
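The level-by-level count used in this proof is easy to check numerically. The sketch below sums min(2^(im), C) over the (log N)/m + 1 levels and compares the total with C·log2(N); the values of N, m, and C are arbitrary assumptions chosen for illustration.

```python
import math


def total_vectors(n_nodes: int, m: int, subscribers: int) -> int:
    """Sum of the per-level vector counts min(2**(i*m), C) for i = 0 .. log2(N)/m."""
    levels = int(math.log2(n_nodes)) // m
    return sum(min(2 ** (i * m), subscribers) for i in range(levels + 1))


for n_nodes, m, c in [(2 ** 20, 1, 16), (2 ** 20, 4, 1000), (2 ** 20, 4, 100000)]:
    s = total_vectors(n_nodes, m, c)
    bound = c * math.log2(n_nodes)
    print(f"N=2^20, m={m}, C={c}: S={s}, C*log2(N)={bound:.0f}, S/C={s / c:.1f}")
```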
IV. OPERATION ALGORITHMS
A. Subscribe/Unsubscribe
The subscribe/unsubscribe procedures are initiated by sub-
scribers and proceed toward the root of an RPT. The process
can be implemented in an iterative or recursive way. With
iteration, the subscriber itself has to inform all representative
nodes one by one. With recursion, each representative node is
responsible for forwarding the subscriptions to the next higher
level until the root node is reached. In SCOPE, we implement
the subscribe/unsubscribe operations recursively for routing
efficiency.
At the beginning, each subscriber locates its immediate
upper-level partition from its predecessor’s and successor’s
identifiers. Then, the node sends subscribe/unsubscribe re-
quests to the upper-level representative node. The repre-
sentative node checks if it has a vector allocated for the
key. If so, it sets/unsets the corresponding bit, and the sub-
scribe/unsubscribe procedure terminates there. Otherwise, it
creates/deletes the vector of the key, sets/unsets the bit, and
continues forwarding subscribe/unsubscribe requests to the
representative node at the next upper-level partition. This
process proceeds until it reaches the root of the RPT. The
routing algorithms of the operations depend on the type of the
specific structured P2P systems. In this section, we use Pastry
as the base routing scheme for the purpose of analysis. Note
that similar analysis is applicable to other hypercube routing
algorithms as well.
Figure 5 illustrates a subscribe/unsubscribe process
in a 3-bit identifier space, where node 2 (010) sub-
scribes/unsubscribes key 5 (101). At first, node 2 notifies the
representative node 3 (011) at the bottom level, then node 3
informs the representative node 1 (001) at the intermediate
level. Finally, node 1 informs the root node 5, which is the
representative node of the whole space.
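The recursive walk described above can be sketched as follows; the helper names (vectors, fanout, child_partition, upper_representative, is_primary, node_id) are assumptions standing in for whatever bookkeeping a real representative node would keep.

```python
def subscribe(rep, key: int, level: int, requester_id: int) -> None:
    """Sketch of the recursive subscribe walk: `rep` is the representative of
    the partition containing `requester_id` at this level."""
    vector = rep.vectors.get(key)
    had_vector = vector is not None
    if not had_vector:
        vector = rep.vectors[key] = [False] * rep.fanout
    # Remember which sub-partition now holds a replica of the key.
    vector[rep.child_partition(requester_id)] = True
    if had_vector or rep.is_primary(key):
        return                                            # the levels above already know this key
    # Otherwise keep climbing toward the root of the RPT.
    parent = rep.upper_representative(key, level - 1)
    subscribe(parent, key, level - 1, rep.node_id)
```

Unsubscribe would mirror this walk, clearing the corresponding bit and continuing upward only when a vector becomes empty.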
In order to improve routing efficiency, every node maintains
level indices to indicate the node’s partitions at different
Fig. 5. Node 2 (010) subscribes key 5 (101) in a 3-bit identifier space.
levels. As we have pointed out before, reducing intermediate
partitions makes the height of RPT different from the depth of
partitioning. The level index is used to record the change of
the RPT height with the increase of partitioning. Its maximal
length is equal to M for an identifier space with size of 2^M.
The i-th entry in a level index is the height of the RPT at the i-th
partition level.
Fig. 6. Level index changes after node 3 joins in a 3-bit space.
Figure 6 shows an example of the level index in a 3-
bit identifier space. Before node 3 joins, nodes 1 and 4 are
identified after first-time partitioning. Both of them have the
same level index {1, -, -}, where '-' represents an empty
entry. With the participation of node 3, the whole space needs
to be partitioned twice to identify nodes 0 and 3, and no
redundant intermediate partition exists. The RPT grows as
partitioning proceeds, and the level indices of nodes 0 and
3 become {1, 2, -}. Comparatively, node 4 is identified after
the first-time partitioning and its level index is {1, -, -}.
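Read as data, a level index is simply a per-node array mapping partition depth to the RPT height reached at that depth. A minimal sketch with the values from the Figure 6 discussion (the field names are assumed for illustration) is:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LevelIndex:
    """Per-node record of the RPT height at each partition depth.
    None marks an empty entry, shown as '-' in the text."""
    node_id: int
    heights: List[Optional[int]]


# Values after node 3 joins, as described in the Figure 6 discussion.
indices = [LevelIndex(0, [1, 2, None]),
           LevelIndex(3, [1, 2, None]),
           LevelIndex(4, [1, None, None])]
```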
With the Pastry routing table and leaf set, a node can reach
any other node in a range of 2^(xm) within O(log(N / 2^(M-xm))) hops.
When a node initiates a subscribe/unsubscribe operation, it
also forwards its level index to the representative nodes at
upper levels. Each intermediate representative uses the level
index to derive the location of its higher level representative.
Lemma 1: For an N-node network with partition size of 2^m
in a 2^M identifier space, in any range of 2^(xm), on average a
node can find the successor of a key in O(log(N / 2^(M-xm))) hops.
Proof: We use Pastry as the base routing algorithm, and
assume that a node's routing table is organized into log_{2^b}(2^M)
levels with 2^b - 1 entries each. Due to the usage of the
SHA-1 hash function, node and key identifiers are randomly

References
I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," in Proceedings of ACM SIGCOMM, 2001.
A. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems," in Proceedings of IFIP/ACM Middleware, 2001.
S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A scalable content-addressable network," in Proceedings of ACM SIGCOMM, 2001.
I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup protocol for internet applications," IEEE/ACM Transactions on Networking, 2003.
J. Kubiatowicz et al., "OceanStore: An architecture for global-scale persistent storage," in Proceedings of ASPLOS, 2000.