
Showing papers presented at the "International Workshop on Peer-to-Peer Systems" in 2003


Book ChapterDOI
21 Feb 2003
TL;DR: Koorde is a new distributed hash table (DHT) based on Chord [15] and de Bruijn graphs [2] that meets various lower bounds, such as O(log n) hops per lookup request with only 2 neighbors per node.
Abstract: Koorde is a new distributed hash table (DHT) based on Chord [15] and de Bruijn graphs [2]. While inheriting the simplicity of Chord, Koorde meets various lower bounds, such as O(log n) hops per lookup request with only 2 neighbors per node (where n is the number of nodes in the DHT), and O(log n/log log n) hops per lookup request with O(log n) neighbors per node.

618 citations
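
The heart of Koorde is routing on a de Bruijn graph, where node m links to 2m mod 2^b and 2m+1 mod 2^b. Below is a minimal Python sketch of that routing rule on a fully populated b-bit identifier space; Koorde itself must additionally hop through the "imaginary" de Bruijn nodes between real nodes, and the names here are illustrative, not the paper's.

```python
# Hedged sketch of de Bruijn routing, the rule behind Koorde's
# 2-neighbor, O(log n) lookup. Assumes every identifier is a live node.

B = 8                        # identifier length in bits (2^B identifiers)
MASK = (1 << B) - 1

def de_bruijn_route(start: int, key: int):
    """Route toward `key` by shifting one bit of `key` into the node id
    per hop: m -> (2m + bit) mod 2^B. Exactly B hops reach any key."""
    path, node = [start], start
    for i in reversed(range(B)):          # consume key bits, high to low
        bit = (key >> i) & 1
        node = ((node << 1) | bit) & MASK
        path.append(node)
    assert node == key                    # after B shifts only key bits remain
    return path

print(de_bruijn_route(0b10110011, 0b01100101))
```

With O(log n) neighbors per node the same walk can shift several bits per hop, which is where the O(log n/log log n) bound comes from.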


Book ChapterDOI
21 Feb 2003
TL;DR: An ongoing effort to define common APIs for structured peer-to-peer overlays and the key abstractions that can be built on them is described; the aim is to facilitate independent innovation in overlay protocols, services, and applications, to allow direct experimental comparisons, and to encourage application development by third parties.
Abstract: In this paper, we describe an ongoing effort to define common APIs for structured peer-to-peer overlays and the key abstractions that can be built on them. In doing so, we hope to facilitate independent innovation in overlay protocols, services, and applications, to allow direct experimental comparisons, and to encourage application development by third parties. We provide a snapshot of our efforts and discuss open problems in an effort to solicit feedback from the research community.

578 citations
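
The kind of interface such an effort converges on is a thin key-based routing (KBR) layer with application upcalls, on top of which higher-level abstractions (DHT, object location, multicast) can be layered. A rough Python rendering of that shape, with illustrative names and signatures rather than the proposal's exact ones:

```python
# Hedged sketch of a key-based routing (KBR) interface of the kind the
# paper proposes: applications call route(); the overlay makes
# forward()/deliver() upcalls. Names and signatures are illustrative.

from abc import ABC, abstractmethod

class Application(ABC):
    @abstractmethod
    def forward(self, key, msg, next_hop):
        """Upcall at each node along the route; the application may
        inspect, modify, or redirect the message here."""

    @abstractmethod
    def deliver(self, key, msg):
        """Upcall at the node ultimately responsible for `key`."""

class Overlay(ABC):
    @abstractmethod
    def route(self, key, msg, hint=None):
        """Route `msg` toward the live node currently responsible for `key`."""

    @abstractmethod
    def neighbor_set(self, max_size):
        """Expose routing neighbors, e.g. for replication and maintenance."""
```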


Book ChapterDOI
21 Feb 2003
TL;DR: It is asserted that any society (and, in the context of this article, any large-scale distributed system) must address both death (failure) and the establishment and maintenance of infrastructure (which is a major motivation for taxes, so as to justify the title!).
Abstract: It has been reported [25] that life holds but two certainties, death and taxes. And indeed, it does appear that any society (and, in the context of this article, any large-scale distributed system) must address both death (failure) and the establishment and maintenance of infrastructure (which we assert is a major motivation for taxes, so as to justify our title!).

545 citations


Book ChapterDOI
21 Feb 2003
TL;DR: This paper explores the space of load-balancing algorithms that use the notion of “virtual servers” and presents three schemes that differ primarily in the amount of information used to decide how to re-arrange load.
Abstract: Most P2P systems that provide a DHT abstraction distribute objects among “peer nodes” by choosing random identifiers for the objects. This could result in an O(log N) imbalance. Besides, P2P systems can be highly heterogeneous, i.e., they may consist of peers that range from old desktops behind modem lines to powerful servers connected to the Internet through high-bandwidth lines. In this paper, we address the problem of load balancing in such P2P systems. We explore the space of designing load-balancing algorithms that use the notion of “virtual servers”. We present three schemes that differ primarily in the amount of information used to decide how to re-arrange load. Our simulation results show that even the simplest scheme is able to balance the load within 80% of the optimal value, while the most complex scheme is able to balance the load within 95% of the optimal value.

473 citations
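
The virtual-server idea is that each physical node hosts many small DHT nodes, so load can be shed in coarse units without re-hashing objects. A toy greedy rebalancer in this spirit; the pairing rule, thresholds, and equal capacities are invented for the sketch and are not the paper's three schemes:

```python
# Hedged sketch of load balancing with virtual servers: each physical
# node hosts several virtual servers and sheds whole ones when loaded.

def rebalance(loads, rounds=100):
    """loads: node -> list of virtual-server loads."""
    total = lambda n: sum(loads[n])
    for _ in range(rounds):
        heavy = max(loads, key=total)
        light = min(loads, key=total)
        if not loads[heavy]:
            break
        vs = max(loads[heavy])                 # heaviest virtual server
        if total(light) + vs >= total(heavy):  # move would not help
            break
        loads[heavy].remove(vs)                # transfer the whole virtual
        loads[light].append(vs)                # server; its objects move with it

loads = {"a": [5.0, 4.0, 3.0], "b": [1.0], "c": [0.5]}
rebalance(loads)
print(loads)   # {'a': [3.0], 'b': [1.0, 4.0], 'c': [0.5, 5.0]}
```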


Book ChapterDOI
21 Feb 2003
TL;DR: SplitStream is a high-bandwidth content distribution system based on application-level multicast that distributes the forwarding load among all the participants, and is able to accommodate participating nodes with different bandwidth capacities.
Abstract: In tree-based multicast systems, a relatively small number of interior nodes carry the load of forwarding multicast messages. This works well when the interior nodes are dedicated infrastructure routers. But it poses a problem in cooperative application-level multicast, where participants expect to contribute resources proportional to the benefit they derive from using the system. Moreover, many participants may not have the network capacity and availability required of an interior node in high-bandwidth multicast applications. SplitStream is a high-bandwidth content distribution system based on application-level multicast. It distributes the forwarding load among all the participants, and is able to accommodate participating nodes with different bandwidth capacities. We sketch the design of SplitStream and present some preliminary performance results.

422 citations
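
SplitStream's central trick is to stripe the content across k multicast trees whose interior nodes are disjoint, so every node forwards at most one stripe. A toy construction illustrating that interior-disjointness; SplitStream itself builds the trees with Scribe over Pastry, and the grouping and tree-shape rules below are invented for the sketch:

```python
# Hedged sketch of interior-node-disjoint stripe trees: content is split
# into k stripes, one tree per stripe, and each node is interior in only
# one tree, so forwarding load is spread over all participants.

def build_stripe_trees(nodes, k, fanout=2):
    """Return k (root, {parent: [children]}) trees; the node at index i
    of `nodes` is interior only in tree i % k."""
    trees = []
    for stripe in range(k):
        interior = [n for i, n in enumerate(nodes) if i % k == stripe]
        leaves = [n for i, n in enumerate(nodes) if i % k != stripe]
        ordered = interior + leaves            # interior nodes fill top levels
        tree = {n: [] for n in nodes}
        for j in range(1, len(ordered)):
            tree[ordered[(j - 1) // fanout]].append(ordered[j])
        trees.append((ordered[0], tree))
    return trees

for root, tree in build_stripe_trees(list("ABCDEFGH"), k=2):
    print("root", root, {p: c for p, c in tree.items() if c})
```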


Book ChapterDOI
21 Feb 2003
TL;DR: The study suggests that the peer-to-peer network does not have enough capacity to make naive use of either search technique attractive for Web search, and proposes a number of compromises that might achieve the last order of magnitude of feasibility.
Abstract: This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either search technique attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.

341 citations
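
The flavor of the feasibility analysis can be reproduced as a back-of-envelope calculation: naive DHT-based intersection ships whole posting lists between nodes. Every number below is an assumption chosen for the sketch, not a figure from the paper:

```python
# Back-of-envelope in the spirit of the paper's analysis; all constants
# are illustrative assumptions.

docs_per_term   = 10_000_000      # posting-list length for a common term
bytes_per_post  = 4               # bytes per (compressed) document ID
queries_per_sec = 1_000           # aggregate Web-search workload
lists_shipped   = 1               # naive DHT intersection ships one list

bytes_per_query = docs_per_term * bytes_per_post * lists_shipped
aggregate = bytes_per_query * queries_per_sec

print(f"per query: {bytes_per_query / 1e6:.0f} MB shipped between nodes")
print(f"aggregate: {aggregate / 1e9:.0f} GB/s across the overlay")
# Bloom filters, caching, compression, and clustering each shave a
# constant factor off bytes_per_query; the paper argues their product
# gets within one order of magnitude of feasibility.
```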


Book ChapterDOI
21 Feb 2003
TL;DR: This paper suggests the direct application of the “power of two choices” paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives, and considers how associating a small constant number of hash values with a key can be extended to support other load balancing strategies, including load-stealing or load-shedding, as well as providing natural fault-tolerance mechanisms.
Abstract: Distributed hash tables have recently become a useful building block for a variety of distributed applications. However, current schemes based upon consistent hashing require both considerable implementation complexity and substantial storage overhead to achieve desired load balancing goals. We argue in this paper that these goals can be achieved more simply and more cost-effectively. First, we suggest the direct application of the “power of two choices” paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing strategies, including load-stealing or load-shedding, as well as providing natural fault-tolerance mechanisms.

305 citations
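
The core rule is easy to state: hash each key with d independent hash functions and store the item on the least-loaded of the d candidate nodes; readers probe the same d candidates. A minimal self-contained sketch, with a toy in-memory "network" and a salted-hash stand-in for independent hash functions:

```python
# Hedged sketch of "power of two choices" placement in a DHT. The load
# counters stand in for nodes; the salting scheme is illustrative.

import hashlib

def candidates(key, d, num_nodes):
    """d candidate nodes for `key`, one per salted hash."""
    return [int(hashlib.sha1(f"{i}:{key}".encode()).hexdigest(), 16) % num_nodes
            for i in range(d)]

def insert(load, key, d=2):
    """Store the item on the least-loaded of its d candidates."""
    target = min(candidates(key, d, len(load)), key=lambda n: load[n])
    load[target] += 1
    return target

# A reader probes the same d candidates, so lookups stay O(d) messages.
load = [0] * 16
for i in range(200):
    insert(load, f"item-{i}")
print("max load:", max(load), "min load:", min(load))
```

Classic balls-into-bins analysis says that even d = 2 drops the maximum load from Θ(log n/log log n) to Θ(log log n), which is the simplification the paper leans on.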


Book ChapterDOI
21 Feb 2003
TL;DR: This paper explores a new point in design space in which increased memory usage and constant background communication overheads are tolerated to reduce file lookup times and increase stability to failures and churn.
Abstract: A peer-to-peer (p2p) distributed hash table (DHT) system allows hosts to join and fail silently (or leave), as well as to insert and retrieve files (objects). This paper explores a new point in design space in which increased memory usage and constant background communication overheads are tolerated to reduce file lookup times and increase stability to failures and churn. Our system, called Kelips, uses peer-to-peer gossip to partially replicate file index information. In Kelips, (a) under normal conditions, file lookups are resolved within 1 RPC, independent of system size, and (b) membership changes (e.g., even when a large number of nodes fail) are detected and disseminated to the system quickly. Per-node memory requirements are small in medium-sized systems. When there are failures, lookup success is ensured through query rerouting. Kelips achieves load balancing comparable to existing systems. Locality is supported by using topologically aware gossip mechanisms. Initial results of an ongoing experimental study are also discussed.

298 citations
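
Kelips's O(1)-lookup structure is easy to sketch: nodes fall into roughly sqrt(n) affinity groups, a file's index entry is replicated across the group its name hashes to, and every node keeps a contact in each group, so one RPC resolves a lookup. The gossip layer that fills and refreshes this soft state is elided below, and group assignment is round-robin purely for simplicity (Kelips hashes node ids):

```python
# Hedged sketch of Kelips-style one-hop lookup over affinity groups.

import hashlib, math, random

def h(s):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

nodes = [f"node-{i}" for i in range(100)]
g = round(math.sqrt(len(nodes)))              # number of affinity groups
groups = {i: nodes[i::g] for i in range(g)}   # round-robin membership

index = {i: {} for i in range(g)}             # group-replicated file index

def insert(filename, home_node):
    index[h(filename) % g][filename] = home_node

def lookup(filename):
    group = h(filename) % g
    contact = random.choice(groups[group])    # one RPC to this contact...
    return index[group][filename]             # ...returns the file's home node

insert("song.mp3", "node-42")
print(lookup("song.mp3"))
```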


Book ChapterDOI
21 Feb 2003
TL;DR: Lighthouse is a scalable location mechanism for wide-area networks that establishes network location without a fixed set of reference points, avoiding the communication bottlenecks and single points of failure that otherwise limit the practicality of such systems.
Abstract: This paper introduces Lighthouse, a scalable location mechanism for wide-area networks. Unlike existing vector-based systems such as GNP, we show how network location can be established without using a fixed set of reference points. This lets us avoid the communication bottlenecks and single points of failure that otherwise limit the practicality of such systems.

263 citations


Book ChapterDOI
21 Feb 2003
TL;DR: An efficient algorithm for performing a broadcast operation with minimal cost in structured DHT-based P2P networks; broadcasting is treated as a basic service that adds to existing DHTs the ability to search using arbitrary queries as well as to disseminate/collect global information.
Abstract: In this position paper, we present an efficient algorithm for performing a broadcast operation with minimal cost in structured DHT-based P2P networks. In a system of N nodes, a broadcast message originating at an arbitrary node reaches all other nodes after exactly N − 1 messages. We emphasize the perception of a class of DHT systems as a form of distributed k-ary search and we take advantage of that perception in constructing a spanning tree that is utilized for efficient broadcasting. We consider broadcasting as a basic service that adds to existing DHTs the ability to search using arbitrary queries as well as disseminate/collect global information.

238 citations
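
The algorithm can be seen concretely on a small simulated Chord-like ring: a node delivers the message locally, then hands each finger responsibility for the interval up to the next finger, so intervals never overlap and exactly N − 1 messages are sent. The ring below is simulated in-process and its parameters are illustrative:

```python
# Hedged sketch of broadcast-as-distributed-k-ary-search on a toy ring.

SPACE = 64                                   # identifier space size
nodes = sorted([1, 5, 9, 14, 22, 30, 41, 47, 55, 60])

def successor(k):
    k %= SPACE
    return min((x for x in nodes if x >= k), default=nodes[0])

def fingers(n):
    """Chord-style fingers: successor(n + 2^i), duplicates and self removed."""
    out = []
    for i in range(6):                       # log2(SPACE) finger entries
        f = successor(n + 2 ** i)
        if f != n and f not in out:
            out.append(f)
    return out

def covers(n, f, limit):
    """Is f inside n's responsibility interval (n, limit)?  limit == n
    means the whole ring (the initiator's case)."""
    size = (limit - n) % SPACE or SPACE
    return (f - n) % SPACE < size

received = []

def broadcast(node, msg, limit):
    """Deliver locally, then delegate disjoint sub-intervals to fingers."""
    received.append(node)
    fs = [f for f in fingers(node) if covers(node, f, limit)]
    for i, f in enumerate(fs):
        broadcast(f, msg, fs[i + 1] if i + 1 < len(fs) else limit)

broadcast(nodes[0], "hello", limit=nodes[0])
print(sorted(received) == nodes, "- messages sent:", len(received) - 1)  # N - 1
```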


Book ChapterDOI
21 Feb 2003
TL;DR: This paper advocates a different model in which peer to peer users are expected to be rational and self-interested, and considers problems that arise when a networking infrastructure contains rational agents.
Abstract: Much of the existing work in peer to peer networking assumes that users will follow prescribed protocols without deviation. This assumption ignores the user’s ability to modify the behavior of an algorithm for self-interested reasons. We advocate a different model in which peer to peer users are expected to be rational and self-interested. This model is found in the emergent fields of Algorithmic Mechanism Design (AMD) and Distributed Algorithmic Mechanism Design (DAMD), both of which introduce game-theoretic ideas into a computational system. We, as designers, must create systems (peer to peer search, routing, distributed auctions, resource allocation, etc.) that allow nodes to behave rationally while still achieving good overall system outcomes. This paper has three goals. The first is to convince the reader that rationality is a real issue in peer to peer networks. The second is to introduce mechanism design as a tool that can be used when designing networks with rational nodes. The third is to describe three open problems that are relevant in the peer to peer setting but are unsolved in existing AMD/DAMD work. In particular, we consider problems that arise when a networking infrastructure contains rational agents.

Book ChapterDOI
21 Feb 2003
TL;DR: The cost of overlay maintenance in realistic dynamic environments is analyzed and novel techniques are designed to reduce this cost by adapting to the operating conditions; simulations using real traces show that these techniques enable high reliability and performance even in very adverse conditions, at low maintenance cost.
Abstract: Structured peer-to-peer overlay networks provide a useful substrate for building distributed applications but there are general concerns over the cost of maintaining these overlays. The current approach is to configure the overlays statically and conservatively to achieve the desired reliability even under uncommon adverse conditions. This results in high cost in the common case, or poor reliability in worse than expected conditions. We analyze the cost of overlay maintenance in realistic dynamic environments and design novel techniques to reduce this cost by adapting to the operating conditions. With our techniques, the concerns over the overlay maintenance cost are no longer warranted. Simulations using real traces show that they enable high reliability and performance even in very adverse conditions with low maintenance cost.

Book ChapterDOI
21 Feb 2003
TL;DR: It is shown how requiring nodes to publish auditable records of their usage can give nodes economic incentives to report their usage truthfully, and simulation results are presented that show the communication overhead of auditing is small and scales well to large networks.
Abstract: Cooperative peer-to-peer applications are designed to share the resources of each computer in an overlay network for the common good of everyone. However, users do not necessarily have an incentive to donate resources to the system if they can get the system’s resources for free. This paper presents architectures for fair sharing of storage resources that are robust against collusions among nodes. We show how requiring nodes to publish auditable records of their usage can give nodes economic incentives to report their usage truthfully, and we present simulation results that show the communication overhead of auditing is small and scales well to large networks.

Book ChapterDOI
21 Feb 2003
TL;DR: When nodes leave the network in the middle of uploads, the algorithm minimizes the duplicate information shared by nodes with truncated downloads, so any two peers with partial knowledge of a given file can almost always fully benefit from each other's knowledge.
Abstract: This paper presents a novel algorithm for downloading big files from multiple sources in peer-to-peer networks. The algorithm is simple, but offers several compelling properties. It ensures low hand-shaking overhead between peers that download files (or parts of files) from each other. It is computationally efficient, with cost linear in the amount of data transferred. Most importantly, when nodes leave the network in the middle of uploads, the algorithm minimizes the duplicate information shared by nodes with truncated downloads. Thus, any two peers with partial knowledge of a given file can almost always fully benefit from each other’s knowledge. Our algorithm is made possible by the recent introduction of linear-time, rateless erasure codes.
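
The enabling primitive is a rateless erasure code in the LT/Online-code family: every encoded symbol is the XOR of a random subset of source blocks, so symbols generated independently by different peers are almost never redundant. A compact sketch with an ideal-soliton degree distribution and a peeling decoder; real codecs use a robust distribution and operate on byte blocks rather than small ints:

```python
# Hedged sketch of the rateless-code primitive (in the spirit of LT /
# Online codes), simplified for illustration.

import random

def encode_symbol(blocks, rng):
    """One symbol: (set of source indices, XOR of those blocks)."""
    k = len(blocks)
    # Ideal soliton degree distribution: P(1) = 1/k, P(d) = 1/(d(d-1)).
    weights = [1 / k] + [1 / (d * (d - 1)) for d in range(2, k + 1)]
    d = rng.choices(range(1, k + 1), weights=weights)[0]
    idx = set(rng.sample(range(k), d))
    val = 0
    for i in idx:
        val ^= blocks[i]
    return idx, val

def decode(symbols, k):
    """Peeling: degree-1 symbols reveal blocks, which reduce other symbols."""
    pending = [[set(idx), val] for idx, val in symbols]
    known, progress = {}, True
    while progress and len(known) < k:
        progress = False
        for sym in pending:
            for i in [i for i in sym[0] if i in known]:
                sym[0].discard(i)          # substitute recovered block out
                sym[1] ^= known[i]
            if len(sym[0]) == 1:
                i = sym[0].pop()
                if i not in known:
                    known[i] = sym[1]
                    progress = True
    return known

rng = random.Random(7)
blocks = [rng.randrange(256) for _ in range(32)]            # the source file
symbols = [encode_symbol(blocks, rng) for _ in range(60)]   # from any peers
known = decode(symbols, 32)
print(f"recovered {len(known)}/32 blocks;",
      "all correct:", all(known[i] == blocks[i] for i in known))
```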

Book ChapterDOI
21 Feb 2003
TL;DR: A construction is shown that is fault-tolerant against random deletions, has an optimal degree-dilation tradeoff, and has improved parameters compared to other DHTs.
Abstract: We introduce a distributed hash table (DHT) with logarithmic degree and logarithmic dilation. We show two lookup algorithms. The first has a message complexity of log n and is robust under random deletion of nodes. The second has parallel time of log n and message complexity of log² n. It is robust under spam induced by a random subset of the nodes. We then show a construction which is fault tolerant against random deletions and has an optimal degree-dilation tradeoff. The construction has improved parameters when compared to other DHTs. Its main merits are its simplicity, its flexibility and the fresh ideas introduced in its design. It is very easy to modify and to add more sophisticated protocols, such as dynamic caching and erasure correcting codes.

Book ChapterDOI
21 Feb 2003
TL;DR: Coral is a peer-to-peer content distribution system built on a distributed sloppy hash table (DSHT) that lets nodes locate nearby copies of a file, regardless of its popularity, without causing hot spots in the indexing infrastructure.
Abstract: We are building Coral, a peer-to-peer content distribution system. Coral creates self-organizing clusters of nodes that fetch information from each other to avoid communicating with more distant or heavily-loaded servers. Coral indexes data, but does not store it. The actual content resides where it is used, such as in nodes’ local web caches. Thus, replication happens exactly in proportion to demand. We present two novel mechanisms that let Coral achieve scalability and high performance. First, a new abstraction called a distributed sloppy hash table (DSHT) lets nodes locate nearby copies of a file, regardless of its popularity, without causing hot spots in the indexing infrastructure. Second, based on the DSHT interface, we introduce a decentralized clustering algorithm by which nodes can find each other and form clusters of varying network diameters.
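
A "sloppy" insert caps how many pointers any node stores per key and spills further pointers onto nodes earlier on the route toward the key's root, so popular keys fan out instead of hammering the root. A toy model of one put path (routing is elided; `path` is just the stores from requester to root, and MAX_POINTERS is an invented constant):

```python
# Hedged sketch of a sloppy-DHT insert and lookup along one route.

MAX_POINTERS = 2

def sloppy_put(path, key, pointer):
    """Walk toward the root; store at the last node with spare room."""
    target = path[0]                       # worst case: keep it at the start
    for store in path:
        if len(store.get(key, [])) >= MAX_POINTERS:
            break                          # nodes from here on are loaded
        target = store
    target.setdefault(key, []).append(pointer)

def sloppy_get(path, key):
    """Return the first (nearest) pointers met on the way to the root."""
    for store in path:
        if key in store:
            return store[key]
    return []

path = [dict() for _ in range(4)]          # requester ... key's root
for i in range(6):
    sloppy_put(path, "page.html", f"cache-{i}")
print([s.get("page.html") for s in path])  # pointers fan out from the root
print(sloppy_get(path, "page.html"))       # nearby copies found first
```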

Book ChapterDOI
21 Feb 2003
TL;DR: A highly scalable, efficient and robust infrastructure, called SOMO, is developed to perform resource management for P2P DHTs by gathering and disseminating system metadata in O(log N) time with a self-organizing and self-healing data overlay.
Abstract: In this paper, we first describe the concept of data overlay, which is a mechanism to implement arbitrary data structures on top of any structured P2P DHT. With this abstraction, we developed a highly scalable, efficient and robust infrastructure, called SOMO, to perform resource management for P2P DHTs. It does so by gathering and disseminating system metadata in O(log N) time with a self-organizing and self-healing data overlay. Our preliminary results of using SOMO to balance routing traffic with node capacities in a prefix-based overlay have demonstrated the utility of data overlay as well as the potential of SOMO.

Book ChapterDOI
21 Feb 2003
TL;DR: Existing peer-to-peer systems implement a single function well, data lookup, and there is now a wealth of research describing how to reliably disseminate, and later retrieve, data in a scalable and load-balanced manner.
Abstract: Existing peer-to-peer systems implement a single function well: data lookup. There is now a wealth of research describing how to reliably disseminate, and to later retrieve, data in a scalable and load-balanced manner.

Book ChapterDOI
21 Feb 2003
TL;DR: PeerNet is a peer-to-peer-based network layer for large networks; it is not an overlay on top of IP but an alternative to the IP layer.
Abstract: An unwritten principle of the Internet Protocol is that the IP address of a node also serves as its identifier. We observe that many scalability problems result from this principle, especially when we consider mobile networks. In this work, we examine how we would design a network with a separation between address and identity. We develop PeerNet, a peer-to-peer-based network layer for large networks. PeerNet is not an overlay on top of IP; it is an alternative to the IP layer. In PeerNet, the address reflects the node’s current location in the network. This simplifies routing significantly but creates two new challenges: the need for consistent address allocation and an efficient node lookup service. We develop fully distributed solutions to address these and other issues using a per-node state of O(log N), where N is the number of nodes in the network. PeerNet is a radically different alternative to current network layers, and our initial design suggests that the PeerNet approach is promising and worth further examination.

Book ChapterDOI
21 Feb 2003
TL;DR: This work denotes as peer selection the entire process of switching among peers and finally settling on one, and uses the methodology of machine learning for the construction of good peer selection strategies from past experience.
Abstract: In a peer-to-peer file-sharing system, a client desiring a particular file must choose a source from which to download. The problem of selecting a good data source is difficult because some peers may not be encountered more than once, and many peers are on low-bandwidth connections. Despite these facts, information obtained about peers just prior to the download can help guide peer selection. A client can gain additional time savings by aborting bad download attempts until an acceptable peer is discovered. We denote as peer selection the entire process of switching among peers and finally settling on one. Our main contribution is to use the methodology of machine learning for the construction of good peer selection strategies from past experience. Decision tree learning is used for rating peers based on low-cost information, and Markov decision processes are used for deriving a policy for switching among peers. Preliminary results with the Gnutella network demonstrate the promise of this approach.

Proceedings Article
01 Jan 2003
TL;DR: This paper explores the use of mechanism design techniques to offset the incentives for strategic behavior and facilitate the formation of networks with desirable global properties.
Abstract: Agents in a peer-to-peer system typically have incentives to influence its network structure, either to reduce their costs or to increase their ability to capture value. The problem is compounded when agents can join and leave the system dynamically. This paper explores the use of mechanism design techniques to offset the incentives for strategic behavior and facilitate the formation of networks with desirable global properties.

Book ChapterDOI
21 Feb 2003
TL;DR: The case for developing application-driven benchmarks for structured peer-to-peer overlays is presented, a model of the services they provide to applications is given, and the results of two preliminary benchmarks are described.
Abstract: Considerable research effort has recently been devoted to the design of structured peer-to-peer overlays, a term we use to encompass Content-Addressable Networks (CANs), Distributed Hash Tables (DHTs), and Decentralized Object Location and Routing networks (DOLRs). These systems share the property that they consistently map a large set of identifiers to a set of nodes in a network, and while at first sight they provide very similar services, they nonetheless embody a wide variety of design alternatives. We present the case for developing application-driven benchmarks for such overlays, give a model of the services they provide applications, describe and present the results of two preliminary benchmarks, and discuss the implications of our tests for application writers. We are unaware of other empirical comparative work in this area.

Book ChapterDOI
21 Feb 2003
TL;DR: SkipNet is a scalable overlay network that provides controlled data placement and routing locality guarantees by organizing data primarily by the lexicographic ordering of string names; the repair operations that restore routing when an organization disconnects can later be used to efficiently reconnect the organization's SkipNet back into the global one.
Abstract: SkipNet is a scalable overlay network that provides controlled data placement and routing locality guarantees by organizing data primarily by lexicographic ordering of string names. A key side-effect of the SkipNet design is that all nodes from an organization form one or a few contiguous overlay segments. When an entire organization disconnects from the rest of the system, repair of only a few pointers quickly enables efficient routing throughout the disconnected organization; full repair is done as a subsequent background task. These same operations can be later used to efficiently reconnect an organization’s SkipNet back into the global one.
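
The property doing the work is lexicographic placement: because nodes are ordered by name, an organization's nodes form a contiguous segment of the overlay, and data named with the organization's prefix stays on its own nodes. A toy illustration (SkipNet's skip-list routing pointers and its exact closest-name placement rule are elided; the successor rule below approximates it):

```python
# Hedged sketch of lexicographic placement and organizational locality.

import bisect

# Node names: organization prefix + machine name, kept in sorted order.
nodes = sorted([
    "com.microsoft/n1", "com.microsoft/n2", "com.microsoft/n3",
    "edu.mit/n1", "edu.mit/n2",
    "org.acm/n1",
])

def responsible_node(name):
    """Toy placement rule: the first node whose name is >= the data name."""
    i = bisect.bisect_left(nodes, name)
    return nodes[i % len(nodes)]           # wrap around the name ring

# A data name carrying the org prefix lands on one of the org's own
# nodes, which form a contiguous segment of the ring.
print(responsible_node("com.microsoft/budget.xls"))   # a com.microsoft node
print(responsible_node("edu.mit/lecture01.pdf"))      # an edu.mit node
```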

Book ChapterDOI
21 Feb 2003
TL;DR: It is observed that distributed hash tables (DHTs) provide an elegant and convenient platform for realizing application-independent, semantic-free reference resolution, and a general-purpose DHT-based Semantic-Free Referencing (SFR) architecture is presented.
Abstract: Every distributed system that employs linking requires a Reference Resolution Service (RRS) to convert link references to locations. We argue that the Web’s use of DNS for this function is a bad idea. This paper discusses the nature, design, and use of a scalable and dynamic RRS. We make two principal arguments about the nature of reference resolution: first, that there should be a general-purpose application-independent substrate for reference resolution, and second that the references themselves should be unstructured and semantic-free. We observe that distributed hash tables (DHTs) provide an elegant and convenient platform for realizing these goals, and we present a general-purpose DHT-based Semantic-Free Referencing (SFR) architecture.
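
The architecture reduces to a simple contract: a reference is a flat, semantic-free bit string (e.g. a hash), and a DHT maps it to a mutable location record. A minimal sketch with a dict standing in for the DHT; the record fields and example URLs are invented for illustration:

```python
# Hedged sketch of semantic-free referencing: flat hash -> location record.

import hashlib, json

dht = {}   # stand-in for a real DHT's put/get

def publish(location, metadata):
    """Mint a flat reference and bind it to a location record."""
    ref = hashlib.sha1(json.dumps(metadata, sort_keys=True).encode()).hexdigest()
    dht[ref] = {"location": location, "metadata": metadata}
    return ref

def resolve(ref):
    """Reference resolution: flat key -> current location."""
    return dht[ref]["location"]

ref = publish("http://mirror7.example.org/paper.pdf", {"title": "SFR", "v": 1})
print(ref[:12], "->", resolve(ref))
dht[ref]["location"] = "http://mirror2.example.org/paper.pdf"  # object moved
print(ref[:12], "->", resolve(ref))       # same reference still resolves
```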

Book ChapterDOI
21 Feb 2003
TL;DR: A general model, called the Search/Index Link (SIL) model, for studying peer-to-peer search networks, which allows us to analyze and visualize existing network architectures and to discover novel architectures that have desirable properties.
Abstract: We present a general model, called the Search/Index Link (SIL) model, for studying peer-to-peer search networks. This model allows us to analyze and visualize existing network architectures. It also allows us to discover novel architectures that have desirable properties. Finally, it can be used as a starting point for developing new network construction techniques.

Book ChapterDOI
21 Feb 2003
TL;DR: A layperson’s introduction to the copyright law principles most pertinent to peer-to-peer developers, including contributory and vicarious liability principles and potential legal defenses are presented.
Abstract: The future of peer-to-peer file-sharing and related technologies is entwined, for better or worse, with copyright law. This paper aims to present a layperson’s introduction to the copyright law principles most pertinent to peer-to-peer developers, including contributory and vicarious liability principles and potential legal defenses. After describing the current shape of the law, the paper concludes with twelve specific strategies that peer-to-peer developers can undertake to reduce their copyright vulnerabilities.

Book ChapterDOI
21 Feb 2003
TL;DR: The workshop had attracted 166 submissions out of which 27 position papers had been accepted for presentation, and the program included four invited talks.
Abstract: Attendees were welcomed by Frans Kaashoek and Ion Stoica. The workshop had attracted 166 submissions out of which 27 position papers had been accepted for presentation. In addition, the program included four invited talks.