Journal Article•DOI•

Efficient and Scalable Consistency Maintenance for Heterogeneous Peer-to-Peer Systems

01 Dec 2008-IEEE Transactions on Parallel and Distributed Systems (IEEE)-Vol. 19, Iss: 12, pp 1695-1708
TL;DR: A scalable and efficient consistency maintenance scheme for heterogeneous P2P systems that takes node heterogeneity into account and organizes the replica nodes of a key into a locality-aware hierarchical structure, in which the upper layer is DHT-based and consists of powerful and stable replica nodes, while each replica node at the lower layer attaches to a physically close upper-layer node.
Abstract: A consistency maintenance mechanism is necessary for emerging peer-to-peer applications because of their frequent data updates. Centralized approaches suffer from a single point of failure, while previous decentralized approaches incur many duplicate update messages because of their locality-ignorant structures. To address this issue, we propose a scalable and efficient consistency maintenance scheme for heterogeneous P2P systems. Our scheme takes node heterogeneity into account and organizes the replica nodes of a key into a locality-aware hierarchical structure, in which the upper layer is DHT-based and consists of powerful and stable replica nodes, while each replica node at the lower layer attaches to a physically close upper-layer node. A d-ary update message propagation tree (UMPT) is dynamically built upon the upper layer for propagating updated contents. As a result, the tree structure does not need to be maintained all the time, which saves considerable maintenance cost. Through theoretical analysis and comprehensive simulations, we examine the efficiency and scalability of this design. The results show that, compared with previous designs, especially locality-ignorant ones, our approach reduces cost by about 25 to 67 percent.
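To make the propagation scheme concrete, the following is a minimal sketch (not the paper's implementation) of pushing an update through a d-ary tree built over the upper-layer replica nodes, with lower-layer replicas receiving the update from the upper-layer node they attach to. The node names, the flat list standing in for the DHT, and the breadth-first tree construction are illustrative assumptions.

```python
# Illustrative sketch only: a d-ary update propagation over a two-layer replica
# structure, in the spirit of the UMPT described in the abstract. Node names,
# the flat list layout, and the breadth-first construction are assumptions made
# for clarity; the paper builds the tree dynamically on top of a DHT.

from collections import deque

def propagate_update(upper_nodes, attached, d, update):
    """Deliver `update` to every replica.

    upper_nodes: list of upper-layer (powerful, stable) replica node ids;
                 upper_nodes[0] plays the role of the tree root.
    attached:    dict mapping an upper-layer node to the lower-layer replica
                 nodes that attached to it because they are physically close.
    d:           fan-out of the propagation tree.
    """
    deliveries = []                       # (sender, receiver, update) triples
    queue = deque([0])                    # indices into upper_nodes, BFS order
    next_child = 1
    while queue:
        parent = queue.popleft()
        # Each upper-layer node forwards the update to at most d children.
        for _ in range(d):
            if next_child >= len(upper_nodes):
                break
            deliveries.append((upper_nodes[parent], upper_nodes[next_child], update))
            queue.append(next_child)
            next_child += 1
        # It also pushes the update to its locally attached lower-layer replicas.
        for low in attached.get(upper_nodes[parent], []):
            deliveries.append((upper_nodes[parent], low, update))
    return deliveries

if __name__ == "__main__":
    uppers = ["U0", "U1", "U2", "U3", "U4"]
    lowers = {"U0": ["l0a"], "U2": ["l2a", "l2b"], "U4": ["l4a"]}
    for hop in propagate_update(uppers, lowers, d=2, update="v2 of key k"):
        print(hop)
```

Because the tree is rebuilt on demand from the upper-layer membership, no propagation state needs to be kept between updates, which is the cost saving the abstract refers to.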
Citations
Book•
31 Oct 2011
TL;DR: This book will teach you how to create high-performance, scalable, reliable systems, providing comprehensive coverage of distributed and cloud computing, including: Facilitating management, debugging, migration, and disaster recovery through virtualization
Abstract: From the leading minds in the field, Distributed and Cloud Computing is the first modern, up-to-date distributed systems textbook. Starting with an overview of modern distributed models, the book exposes the design principles, systems architecture, and innovative applications of parallel, distributed, and cloud computing systems. It will teach you how to create high-performance, scalable, reliable systems, providing comprehensive coverage of distributed and cloud computing, including: facilitating management, debugging, migration, and disaster recovery through virtualization; clustered systems for research or e-commerce applications; designing systems as web services; social networking systems using peer-to-peer computing; and principles of cloud computing using examples from open-source and commercial applications. Using examples from open-source and commercial vendors, the text describes cloud-based systems for research, e-commerce, social networking, and more. It offers complete coverage of modern distributed computing technology, including clusters, the grid, service-oriented architecture, massively parallel processors, peer-to-peer networking, and cloud computing, and includes case studies from the leading distributed computing vendors: Amazon, Microsoft, Google, and more. Designed to meet the needs of students taking a distributed systems course, each chapter includes exercises and further reading, with lecture slides and solutions available online.

307 citations

Journal Article•DOI•
TL;DR: This survey considers the transition of the Internet from a reliable, fault-tolerant network for host-to-host communication to a content-centric network, i.e., a network mostly devoted to supporting efficient generation, sharing, and access to content.

224 citations

Proceedings Article•DOI•
01 Oct 2013
TL;DR: This paper aims to reduce inter-datacenter communications while still achieving low service latency, and proposes the Selective Data replication mechanism in Distributed Datacenters (SD3), which incorporates three strategies to further enhance its performance: a locality-aware multicast update tree, replica deactivation, and datacenter congestion control.
Abstract: Though the new OSN model with many worldwide distributed small datacenters helps reduce service latency, it brings a problem of higher inter-datacenter communication load. In Facebook, each datacenter has a full copy of all data and the master datacenter updates all other datacenters, which obviously generates tremendous load in this new model. Distributed data storage that only stores a user's data at his/her geographically closest datacenters mitigates the problem. However, frequent interactions between far-away users lead to frequent inter-datacenter communication and hence long service latency. In this paper, we aim to reduce inter-datacenter communications while still achieving low service latency. We first verify the benefits of the new model and present typical OSN properties that form the basis of our design. We then propose the Selective Data replication mechanism in Distributed Datacenters (SD3). In SD3, a datacenter jointly considers update rate and visit rate to select user data for replication, and further atomizes a user's different types of data (e.g., status update, friend post) for replication, making sure that a replica always reduces inter-datacenter communication. The results of trace-driven experiments on the real-world PlanetLab testbed demonstrate the higher efficiency and effectiveness of SD3 in comparison to other replication methods.
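The selection rule can be illustrated with a short, hedged sketch: replicate a user's data of a given type at a remote datacenter only when the inter-datacenter traffic saved by serving visits locally outweighs the traffic added by pushing updates to the replica. The rates, unit costs, and function names below are assumptions made for illustration, not SD3's actual parameters.

```python
# Minimal sketch of the selection rule described above, under the assumption
# that replicating a (user, data-type) pair at a remote datacenter saves one
# inter-datacenter message per remote visit but costs one per update. Rates,
# names, and the unit costs are illustrative, not taken from the paper.

def should_replicate(visit_rate, update_rate, visit_cost=1.0, update_cost=1.0):
    """Replicate only if serving visits locally saves more inter-datacenter
    traffic than pushing updates to the replica would add."""
    return visit_rate * visit_cost > update_rate * update_cost

def select_replicas(per_type_rates):
    """per_type_rates: {data_type: (visit_rate, update_rate)} for one user as
    observed by one remote datacenter. Because data is 'atomized' per type,
    each type is decided independently."""
    return [t for t, (v, u) in per_type_rates.items() if should_replicate(v, u)]

if __name__ == "__main__":
    rates = {
        "status_update": (20.0, 2.0),   # visited often, rarely updated -> replicate
        "friend_post":   (1.0, 5.0),    # updated more than visited -> do not replicate
    }
    print(select_replicas(rates))        # ['status_update']
```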

61 citations

Journal Article•DOI•
Haiying Shen1•
TL;DR: An Integrated file Replication and consistency Maintenance mechanism (IRM) that integrates the two techniques in a systematic and harmonized manner and achieves high efficiency in file replication and consistency maintenance at a significantly low cost.
Abstract: In peer-to-peer file sharing systems, file replication and consistency maintenance are widely used techniques for high system performance. Despite significant interdependencies between them, these two issues are typically addressed separately. Most file replication methods rigidly specify replica nodes, leading to low replica utilization, unnecessary replicas, and hence extra consistency maintenance overhead. Most consistency maintenance methods propagate update messages based on message spreading or a structure without considering file replication dynamism, leading to inefficient file updates and hence a high possibility of outdated file responses. This paper presents an Integrated file Replication and consistency Maintenance mechanism (IRM) that integrates the two techniques in a systematic and harmonized manner. It achieves high efficiency in file replication and consistency maintenance at low cost. Instead of passively accepting replicas and updates, each node determines file replication and update polling by dynamically adapting to time-varying file query and update rates, which avoids unnecessary file replications and updates. Simulation results demonstrate the effectiveness of IRM in comparison with other approaches. It dramatically reduces overhead and yields significant improvements in the efficiency of both file replication and consistency maintenance.
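As a rough illustration of the adaptive idea (not IRM's actual algorithm), a node might smooth the query and update rates it observes for a file and derive from them both the replication decision and the update-polling interval. The smoothing factor, threshold, and interval formula below are assumptions.

```python
# Illustrative sketch of the adaptive idea in the abstract: a node tracks the
# query and update rates it observes for a file and derives (a) whether holding
# a replica is worthwhile and (b) how often to poll for updates. The smoothing,
# threshold, and interval formulas are assumptions for clarity, not IRM's.

class FileStats:
    def __init__(self, alpha=0.3):
        self.alpha = alpha          # EWMA smoothing factor
        self.query_rate = 0.0       # smoothed queries per time unit
        self.update_rate = 0.0      # smoothed updates per time unit

    def observe(self, queries, updates):
        """Fold one measurement interval into the smoothed rates."""
        a = self.alpha
        self.query_rate = a * queries + (1 - a) * self.query_rate
        self.update_rate = a * updates + (1 - a) * self.update_rate

    def keep_replica(self):
        # Hold a replica only while it is queried more often than it changes.
        return self.query_rate > self.update_rate

    def polling_interval(self, base=60.0):
        # Poll faster for frequently updated files, slower for stable ones.
        return base / (1.0 + self.update_rate)

if __name__ == "__main__":
    stats = FileStats()
    for q, u in [(10, 1), (12, 1), (8, 2)]:   # hypothetical per-interval counts
        stats.observe(q, u)
    print(stats.keep_replica(), round(stats.polling_interval(), 1))
```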

61 citations


Cites background or methods from "Efficient and Scalable Consistency ..."

  • ...On the other hand, in addition to centralized methods [8], [9], which are not suitable to decentralized large-scale P2P systems, most consistency maintenance methods update files by relying on a structure [10], [11], [12], [13], [14] or message spreading [15], [16], [17]....

  • ...We compared the performance of IRM with SCOPE [11], Hierarchy [10], and Push/poll [15] methods in terms of file consistency maintenance cost and the capability to keep the fidelity of file consistency....

  • ...In Hierarchy, we set the number of nodes in a cluster to 16....

  • ...Hierarchy is a hierarchical structure with super nodes on the upper level and regular nodes on the lower level....

  • ...We use Hierarchy to denote the work in [10] that builds a hierarchical structure for file consistency maintenance....

Journal Article•DOI•
TL;DR: This paper proposes the Selective Data replication mechanism in Distributed Datacenters (SD3), in which a datacenter jointly considers update rate and visit rate to select user data for replication, and further atomizes a user's different types of data for replication, ensuring that a replica always reduces inter-datacenter communication.
Abstract: Though the new OSN model, which deploys datacenters globally, helps reduce service latency, it causes higher inter-datacenter communication load. In Facebook, each datacenter has a full copy of all data, and the master datacenter updates all other datacenters, generating tremendous load in this new model. Distributed data storage, which only stores a user's data at his/her geographically closest datacenters, mitigates the problem. However, frequent interactions between distant users lead to frequent inter-datacenter communication and hence long service latencies. In this paper, we aim to reduce inter-datacenter communications while still achieving low service latency. We first verify the benefits of the new model and present typical OSN properties that form the basis of our design. We then propose the Selective Data replication mechanism in Distributed Datacenters (SD3). Since replicas need inter-datacenter data updates, datacenters in SD3 jointly consider update rates and visit rates to select user data for replication; furthermore, SD3 atomizes users' different types of data (e.g., status update, friend post, music) for replication, ensuring that a replica always reduces inter-datacenter communication. SD3 also incorporates three strategies to further enhance its performance: a locality-aware multicast update tree, replica deactivation, and datacenter congestion control. The results of trace-driven experiments on the real-world PlanetLab testbed demonstrate the higher efficiency and effectiveness of SD3 in comparison to other replication methods and the effectiveness of its three schemes.
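One of the three strategies, replica deactivation, can be sketched as follows: a datacenter stops propagating updates to replicas that have not been visited recently and resumes when visits return. The window, threshold, and field names below are illustrative assumptions, not SD3's actual parameters.

```python
# A hedged sketch of one of the three strategies named above, replica
# deactivation: stop propagating updates to a replica that is rarely visited,
# and resume when visits pick up again. The threshold, window, and field names
# are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Replica:
    datacenter: str
    recent_visits: int = 0     # visits observed in the last window
    active: bool = True        # inactive replicas receive no updates

def refresh_activity(replicas, min_visits=1):
    """Deactivate replicas whose visits fell below min_visits in the last
    window; reactivate any that were visited again. Returns active targets."""
    for r in replicas:
        r.active = r.recent_visits >= min_visits
        r.recent_visits = 0    # start a new observation window
    return [r.datacenter for r in replicas if r.active]

if __name__ == "__main__":
    replicas = [Replica("dc-east", recent_visits=7), Replica("dc-asia", recent_visits=0)]
    print(refresh_activity(replicas))   # ['dc-east']: dc-asia stops receiving updates
```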

49 citations


Cites background from "Efficient and Scalable Consistency ..."

  • ...Many structures for data updating [43], [44], [45]...

References
Proceedings Article•DOI•
27 Aug 2001
TL;DR: Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
Abstract: A fundamental problem that confronts peer-to-peer applications is to efficiently locate the node that stores a particular data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data item pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
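A minimal sketch of Chord's key-to-node mapping follows (the finger-table routing that gives the logarithmic lookup cost is omitted): node identifiers and keys are hashed into the same circular identifier space, and a key is assigned to its successor, the first node whose identifier is greater than or equal to the key's.

```python
# Minimal sketch of Chord's key-to-node mapping: node identifiers and keys are
# hashed onto the same circular identifier space, and a key is stored at its
# "successor", the first node whose identifier is >= the key's (wrapping
# around). Finger tables and O(log N) routing are not reproduced here.

import hashlib
from bisect import bisect_left

M = 160                                    # SHA-1 identifier space, 2**160 ids

def chord_id(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

def successor(node_names, key_name):
    """Return the node responsible for key_name."""
    ring = sorted((chord_id(n), n) for n in node_names)
    k = chord_id(key_name)
    i = bisect_left(ring, (k, ""))         # first node id >= key id
    return ring[i % len(ring)][1]          # wrap around the circle if needed

if __name__ == "__main__":
    nodes = ["node-a:9000", "node-b:9000", "node-c:9000"]
    print(successor(nodes, "some-file.mp3"))
```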

10,286 citations

Proceedings Article•DOI•
27 Aug 2001
TL;DR: The concept of a Content-Addressable Network (CAN) as a distributed infrastructure that provides hash table-like functionality on Internet-like scales is introduced and its scalability, robustness and low-latency properties are demonstrated through simulation.
Abstract: Hash tables - which map "keys" onto "values" - are an essential building block in modern software systems. We believe a similar functionality would be equally valuable to large distributed systems. In this paper, we introduce the concept of a Content-Addressable Network (CAN) as a distributed infrastructure that provides hash table-like functionality on Internet-like scales. The CAN is scalable, fault-tolerant and completely self-organizing, and we demonstrate its scalability, robustness and low-latency properties through simulation.

6,703 citations

Proceedings Article•DOI•
10 Dec 2001
TL;DR: This measurement study seeks to precisely characterize the population of end-user hosts that participate in Napster and Gnutella, and shows that there is significant heterogeneity and lack of cooperation across peers participating in these systems.
Abstract: The popularity of peer-to-peer multimedia file sharing applications such as Gnutella and Napster has created a flurry of recent research activity into peer-to-peer architectures. We believe that the proper evaluation of a peer-to-peer system must take into account the characteristics of the peers that choose to participate. Surprisingly, however, few of the peer-to-peer architectures currently being developed are evaluated with respect to such considerations. In this paper, we remedy this situation by performing a detailed measurement study of the two popular peer-to-peer file sharing systems, namely Napster and Gnutella. In particular, our measurement study seeks to precisely characterize the population of end-user hosts that participate in these two systems. This characterization includes the bottleneck bandwidths between these hosts and the Internet at large, IP-level latencies to send packets to these hosts, how often hosts connect and disconnect from the system, how many files hosts share and download, the degree of cooperation between the hosts, and several correlations between these characteristics. Our measurements show that there is significant heterogeneity and lack of cooperation across peers participating in these systems.

2,189 citations

Proceedings Article•DOI•
24 Mar 1996
TL;DR: This work considers the problem of efficiently generating graph models that accurately reflect the topological properties of real internetworks, and proposes efficient methods for generating topologies with particular properties, including a transit-stub model that correlates well with the internet structure.
Abstract: Graphs are commonly used to model the structure of internetworks, for the study of problems ranging from routing to resource reservation. A variety of graph models are found in the literature, including regular topologies such as rings or stars, "well-known" topologies such as the original ARPAnet, and randomly generated topologies. Less common is any discussion of how closely these models correlate with real network topologies. We consider the problem of efficiently generating graph models that accurately reflect the topological properties of real internetworks. We compare the properties of graphs generated using various methods with those of real internets. We also propose efficient methods for generating topologies with particular properties, including a transit-stub model that correlates well with the internet structure. Improved models for the internetwork structure have the potential to impact the significance of simulation studies of internetworking solutions, providing a basis for the validity of the conclusions.
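The two-level flavor of the transit-stub model can be sketched roughly as follows: a small, densely connected transit (backbone) domain with several sparser stub domains attached to individual transit nodes. The sizes and edge probabilities below are arbitrary assumptions; the actual generator models many more parameters.

```python
# A rough, illustrative sketch of the two-level transit-stub idea: a dense
# transit (backbone) domain, with several sparser stub domains hanging off
# individual transit nodes. Sizes and edge probabilities are arbitrary
# assumptions, not the parameters of the paper's generator.

import random

def random_edges(nodes, p, rng):
    """Connect each pair of nodes independently with probability p."""
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]
            if rng.random() < p]

def transit_stub(n_transit=4, n_stubs=3, stub_size=5, seed=1):
    rng = random.Random(seed)
    transit = [f"T{i}" for i in range(n_transit)]
    edges = random_edges(transit, p=0.8, rng=rng)      # dense transit core
    for s in range(n_stubs):
        stub = [f"S{s}.{j}" for j in range(stub_size)]
        edges += random_edges(stub, p=0.4, rng=rng)     # sparser stub domain
        edges.append((rng.choice(transit), stub[0]))    # attach stub to the core
    return edges

if __name__ == "__main__":
    for e in transit_stub():
        print(e)
```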

1,764 citations

Proceedings Article•DOI•
21 Oct 2001
TL;DR: The Cooperative File System is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval with a completely decentralized architecture that can scale to large systems.
Abstract: The Cooperative File System (CFS) is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval. CFS does this with a completely decentralized architecture that can scale to large systems. CFS servers provide a distributed hash table (DHash) for block storage. CFS clients interpret DHash blocks as a file system. DHash distributes and caches blocks at a fine granularity to achieve load balance, uses replication for robustness, and decreases latency with server selection. DHash finds blocks using the Chord location protocol, which operates in time logarithmic in the number of servers. CFS is implemented using the SFS file system toolkit and runs on Linux, OpenBSD, and FreeBSD. Experience on a globally deployed prototype shows that CFS delivers data to clients as fast as FTP. Controlled tests show that CFS is scalable: with 4,096 servers, looking up a block of data involves contacting only seven servers. The tests also demonstrate nearly perfect robustness and unimpaired performance even when as many as half the servers fail.
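The block-store idea can be illustrated with a short sketch: a file is split into fixed-size blocks, each block is keyed by a hash of its content, and a small root block lists the block keys in order. A plain dictionary stands in for DHash/Chord here; replication, caching, and server selection are omitted, and the block size is an assumption.

```python
# Illustrative sketch of the block-store idea described above: a file is split
# into fixed-size blocks, each block is keyed by a hash of its content, and a
# small "root" block lists the block keys in order. A plain dict stands in for
# DHash/Chord; replication, caching, and server selection are omitted.

import hashlib

BLOCK_SIZE = 8192   # assumed block size for the sketch

def put_file(dht: dict, data: bytes) -> str:
    """Store `data` as content-hashed blocks plus a root block; return root key."""
    keys = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        key = hashlib.sha1(block).hexdigest()
        dht[key] = block
        keys.append(key)
    root = "\n".join(keys).encode()
    root_key = hashlib.sha1(root).hexdigest()
    dht[root_key] = root
    return root_key

def get_file(dht: dict, root_key: str) -> bytes:
    """Rebuild the file by fetching the root block, then each data block."""
    keys = dht[root_key].decode().splitlines()
    return b"".join(dht[k] for k in keys)

if __name__ == "__main__":
    dht = {}
    original = b"example payload " * 2000
    root = put_file(dht, original)
    assert get_file(dht, root) == original
    print("stored", len(dht), "blocks under root", root[:12], "...")
```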

1,733 citations