Showing papers in "arXiv: Distributed, Parallel, and Cluster Computing in 2006"

Posted Content•

The Computational and Storage Potential of Volunteer Computing

David P. Anderson¹, Gilles Fedak²

University of California, Berkeley¹, French Institute for Research in Computer Science and Automation²

16 Feb 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: It is shown that volunteer computing can support applications that are significantly more data-intensive, or have larger memory and storage requirements, than those in current projects.

Abstract: "Volunteer computing" uses Internet-connected computers, volunteered by their owners, as a source of computing power and storage. This paper studies the potential capacity of volunteer computing. We analyzed measurements of over 330,000 hosts participating in a volunteer computing project. These measurements include processing power, memory, disk space, network throughput, host availability, user-specified limits on resource usage, and host churn. We show that volunteer computing can support applications that are significantly more data-intensive, or have larger memory and storage requirements, than those in current projects.

316 citations

Book Chapter•DOI•

Global grids and software toolkits: A study of four grid middleware technologies

Parvin Asadzadeh¹, Rajkumar Buyya¹, Chun Ling Kei¹, Deepa Nayar¹, Srikumar Venugopal¹

University of Melbourne¹

23 Jan 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This chapter surveys four Grid middleware systems (UNICORE, Globus, Legion and Gridbus), presents an implementation of a resource broker for UNICORE, which did not previously support this functionality, and compares the systems on the basis of architecture, implementation model and several other features.

Abstract: Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. This makes Grid application management and deployment a complex undertaking. Grid middlewares provide users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems have been developed all over the world, most of which are results of academic research projects. This chapter focuses on four of these middlewares: UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE, as this functionality was not supported in it. A comparison of these systems on the basis of the architecture, implementation model and several other features is included.

69 citations

Posted Content•

Utility Computing and Global Grids

Chee Shin Yeo, Marcos Dias De Assuncao, Jia Yu, Anthony Sulistio, Srikumar Venugopal, Martin Placek, Rajkumar Buyya

12 May 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This chapter focuses on the use of Grid technologies to achieve utility computing by presenting an overview of how Grids can support utility computing through the architecture of Utility Grids.

Abstract: This chapter focuses on the use of Grid technologies to achieve utility computing. An overview of how Grids can support utility computing is first presented through the architecture of Utility Grids. Then, utility-based resource allocation is described in detail at each level of the architecture. Finally, some industrial solutions for utility computing are discussed.

39 citations

Posted Content•

Distributed Metadata with the AMGA Metadata Catalog

Nuno Santos, Birger Koblitz

19 Apr 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The replication and distribution mechanisms the authors have designed and implemented into the AMGA Metadata Catalog, which is part of the gLite software stack being developed for the EGEE project, are presented.

Abstract: Catalog Services play a vital role in Data Grids by allowing users and applications to discover and locate the data they need. On large Data Grids, with hundreds of geographically distributed sites, centralized Catalog Services do not provide the required scalability, performance or fault-tolerance. In this article, we start by presenting and discussing the general requirements on Grid Catalogs of applications being developed by the EGEE user community. This provides the motivation for the second part of the article, where we present the replication and distribution mechanisms we have designed and implemented into the AMGA Metadata Catalog, which is part of the gLite software stack being developed for the EGEE project. Implementing these mechanisms in the catalog itself has the advantages of not requiring any special support from the relational database back-end, of being database independent, and of allowing the mechanisms to be tailored to the specific requirements and characteristics of Metadata Catalogs.

38 citations

Proceedings Article•DOI•

A General Framework for Scalability and Performance Analysis of DHT Routing Systems

Joseph S. Kong¹, Jesse S. A. Bridgewater¹, Vwani P. Roychowdhury¹

University of California, Los Angeles¹

28 Mar 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The reachable component method (RCM) is presented for analyzing the performance of different DHT routing systems subject to random failures; it shows that, in the large-network limit, the routability of certain DHT systems goes to zero for any non-zero probability of node failure.

Abstract: In recent years, many DHT-based P2P systems have been proposed and analyzed, and certain deployments have reached a global scale with nearly one million nodes. One is thus faced with the question of which particular DHT system to choose, and whether some are inherently more robust and scalable. Toward developing such a comparative framework, we present the reachable component method (RCM) for analyzing the performance of different DHT routing systems subject to random failures. We apply RCM to five DHT systems and obtain analytical expressions that characterize their routability as a continuous function of system size and node failure probability. An important consequence is that in the large-network limit, the routability of certain DHT systems goes to zero for any non-zero probability of node failure. These DHT routing algorithms are therefore unscalable, while some others, including Kademlia, which powers the popular eDonkey P2P system, are found to be scalable.

25 citations

Posted Content•

Non-Clairvoyant Batch Sets Scheduling: Fairness is Fair enough

Julien Robert¹, Nicolas Schabanel²

École normale supérieure de Lyon¹, Centre national de la recherche scientifique²

19 Dec 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: It is proved that the algorithm EQUIoEQUI is $(2+\sqrt{3}+o(1))\frac{\ln n}{\ln\ln n}$-competitive, where $n$ is the maximum size of a set, and that this ratio is optimal up to a constant factor.

Abstract: Scheduling questions arise naturally in many different areas, such as operating system design and compiling. In real-life systems, the characteristics of the jobs (such as release time and processing time) are usually unknown and unpredictable beforehand. The system is typically unaware of the remaining work in each job or of the ability of the job to take advantage of more resources. Following these observations, we adopt the job model by Edmonds et al. (2000, 2003) in which the jobs go through a sequence of different phases. Each phase consists of a certain quantity of work and a speed-up function that models how it takes advantage of the number of processors it receives. We consider the non-clairvoyant online setting where a collection of jobs arrives at time 0. We consider the setflowtime metric introduced by Robert et al. (2007). The goal is to minimize the sum of the completion times of the sets, where a set is completed when all of its jobs are done. If the input consists of a single set of jobs, this is simply the makespan of the jobs; and if the input consists of a collection of singleton sets, it is simply the flowtime of the jobs. We show that the non-clairvoyant strategy EQUIoEQUI, which evenly splits the available processors among the still unserved sets and then evenly splits these processors among the still uncompleted jobs of each unserved set, achieves a competitive ratio of $(2+\sqrt{3}+o(1))\frac{\ln n}{\ln\ln n}$ for setflowtime minimization, and that this is asymptotically optimal (up to a constant factor), where $n$ is the size of the largest set. For makespan minimization, we show that the non-clairvoyant strategy EQUI achieves a competitive ratio of $(1+o(1))\frac{\ln n}{\ln\ln n}$, which is again asymptotically optimal.
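
The two-level split described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `equi_o_equi_shares` and the dictionary-based input are assumptions made for illustration, and the sketch only computes the instantaneous processor shares, not the phases or speed-up functions of the job model.

```python
from fractions import Fraction


def equi_o_equi_shares(num_processors, unfinished_jobs_by_set):
    """Return the processor share of each unfinished job under an EQUIoEQUI-style split.

    unfinished_jobs_by_set maps a set name to the list of its uncompleted jobs
    (hypothetical input format, chosen for this illustration).
    """
    # A set is still "unserved" while it has at least one uncompleted job.
    active = {s: jobs for s, jobs in unfinished_jobs_by_set.items() if jobs}
    if not active:
        return {}
    # First level: split the processors evenly among the unserved sets.
    per_set = Fraction(num_processors, len(active))
    shares = {}
    for set_name, jobs in active.items():
        # Second level: split each set's share evenly among its uncompleted jobs.
        per_job = per_set / len(jobs)
        for job in jobs:
            shares[(set_name, job)] = per_job
    return shares


if __name__ == "__main__":
    demo = equi_o_equi_shares(12, {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": []})
    for key, share in sorted(demo.items()):
        print(key, share)
```

Exact fractions are used so that the shares always sum to the number of processors; a non-clairvoyant scheduler would recompute these shares whenever a job or a set completes.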

22 citations

Posted Content•

General Compact Labeling Schemes for Dynamic Trees

Amos Korman¹

Technion – Israel Institute of Technology¹

30 May 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper investigates labeling schemes for dynamic trees with asymptotically optimal label sizes and sublinear amortized message complexity for the ancestry relation, the id-based and label-based nearest common ancestor relation and the routing function.

Abstract: Let $F$ be a function on pairs of vertices. An $F$-labeling scheme is composed of a marker algorithm for labeling the vertices of a graph with short labels, coupled with a decoder algorithm allowing one to compute $F(u,v)$ of any two vertices $u$ and $v$ directly from their labels. As applications for labeling schemes concern mainly large and dynamically changing networks, it is of interest to study distributed dynamic labeling schemes. This paper investigates labeling schemes for dynamic trees and presents a general method for constructing them. Our method is based on extending an existing static tree labeling scheme to the dynamic setting. This approach fits many natural functions on trees, such as ancestry relation, routing (in both the adversary and the designer port models), nearest common ancestor, etc. Our resulting dynamic schemes incur overheads (over the static scheme) on the label size and on the communication complexity. Informally, for any function $k(n)$ and any static $F$-labeling scheme on trees, we present an $F$-labeling scheme on dynamic trees incurring multiplicative overhead factors (over the static scheme) of $O(\log_{k(n)} n)$ on the label size and $O(k(n)\log_{k(n)} n)$ on the amortized message complexity. In particular, by setting $k(n)=n^{\epsilon}$ for any $0<\epsilon<1$, we obtain dynamic labeling schemes with asymptotically optimal label sizes and sublinear amortized message complexity for all the above mentioned functions.
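
The final claim of the abstract follows from a short substitution, sketched here under the assumption that $\epsilon$ is a fixed constant with $0<\epsilon<1$ (the abstract does not spell this step out):

```latex
\[
  k(n) = n^{\epsilon}
  \quad\Longrightarrow\quad
  \log_{k(n)} n = \frac{\ln n}{\ln n^{\epsilon}} = \frac{\ln n}{\epsilon \ln n} = \frac{1}{\epsilon}.
\]
\[
  \text{Label-size overhead: } O\!\left(\log_{k(n)} n\right) = O\!\left(\tfrac{1}{\epsilon}\right) = O(1),
  \qquad
  \text{message overhead: } O\!\left(k(n)\log_{k(n)} n\right) = O\!\left(\tfrac{n^{\epsilon}}{\epsilon}\right) = O(n^{\epsilon}) = o(n).
\]
```

A constant-factor overhead on the label size preserves the asymptotic label size of the static scheme, while the overhead on the amortized message complexity grows as $n^{\epsilon}$, which is sublinear in $n$.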

21 citations

Posted Content•

Self-Stabilizing Byzantine Pulse Synchronization

Ariel Daliot¹, Danny Dolev¹

Hebrew University of Jerusalem¹

24 Aug 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The presented algorithm grants nodes the ability to infer that eventually all correct nodes will invoke their pulses within a very short time interval of each other and will do so regularly.

Abstract: The "Pulse Synchronization" problem can be loosely described as aiming to invoke a recurring distributed event as simultaneously as possible at the different nodes and with a frequency that is as regular as possible. This target becomes surprisingly subtle and difficult to achieve when facing both transient and permanent failures. In this paper we present an algorithm for pulse synchronization that self-stabilizes while at the same time tolerating a permanent presence of Byzantine faults. The Byzantine nodes might incessantly try to de-synchronize the correct nodes. Transient failures might throw the system into an arbitrary state in which correct nodes have no common notion whatsoever, such as time or round numbers, and thus cannot infer anything from their own local states about the state of other correct nodes. The presented algorithm grants nodes the ability to infer that eventually all correct nodes will invoke their pulses within a very short time interval of each other and will do so regularly. Pulse synchronization has previously been shown to be a powerful tool for designing general self-stabilizing Byzantine algorithms and is hitherto the only method that provides for the general design of efficient practical protocols in the confluence of these two fault models. The general difficulty of designing any algorithm in this fault model may be indicated by the remarkably few algorithms resilient to both fault models. The few published self-stabilizing Byzantine algorithms are typically complicated and sometimes converge from an arbitrary initial state only after exponential or super-exponential time.

20 citations

Proceedings Article•DOI•

A Case for Cooperative and Incentive-Based Coupling of Distributed Clusters

Rajiv Ranjan, Aaron Harwood, Rajkumar Buyya

15 May 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a Grid-Federation environment is proposed, which allows the transparent use of resources from the federation when local resources are insufficient to meet its users' requirements, and the use of computational economy methodology in coordinating resource allocation not only facilitates QoS based scheduling, but also enhances utility delivered by resources.

Abstract: Research interest in Grid computing has grown significantly over the past five years. Management of distributed resources is one of the key issues in Grid computing. Central to management of resources is the effectiveness of resource allocation, as it determines the overall utility of the system. The current approaches to superscheduling in a grid environment are non-coordinated, since application-level schedulers or brokers make scheduling decisions independently of the others in the system. Clearly, this can exacerbate the load sharing and utilization problems of distributed resources due to suboptimal schedules that are likely to occur. To overcome these limitations, we propose a mechanism for coordinated sharing of distributed clusters based on computational economy. The resulting environment, called Grid-Federation, allows the transparent use of resources from the federation when local resources are insufficient to meet its users' requirements. The use of computational economy methodology in coordinating resource allocation not only facilitates QoS-based scheduling, but also enhances the utility delivered by resources.

19 citations

Posted Content•

Entity Based Peer-to-Peer in a Data Grid Environment

Benoit Hudzia, Liam McDermott, Tariq N. Ellahi, M. Tahar Kechadi

29 Aug 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper presents DGET (Data Grid Environment and Tools), a new Grid system dedicated to dealing with data issues, characterized by its peer-to-peer communication system and entity-based architecture, which takes advantage of the main functionality of both P2P and Grid systems.

Abstract: During the last decade there has been a huge interest in Grid technologies, and numerous Grid projects have been initiated with various visions of the Grid. While all these visions have the same goal of resource sharing, they differ in the functionality that a Grid supports, the grid characterisation, programming environments, etc. In this paper we present a new Grid system dedicated to dealing with data issues, called DGET (Data Grid Environment and Tools). DGET is characterized by its peer-to-peer communication system and entity-based architecture, thereby taking advantage of the main functionality of both P2P and Grid systems. DGET is currently under development and a prototype implementing the main components is in its first phase of testing. In this paper we limit our description to the system architectural features and to the main differences with other systems.

17 citations

Posted Content•

Economy-based Content Replication for Peering Content Delivery Networks

Al-Mukaddim Khan Pathan, Rajkumar Buyya, James Broberg, Kris Bubendorfer

04 Dec 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, an open, scalable, and Service-Oriented Architecture (SOA)-based system is presented to assist the creation of open Content and Service Delivery Networks (CSDNs), which scale and support sharing of resources through peering with other CSDNs.

Abstract: Existing Content Delivery Networks (CDNs) are closed delivery networks that do not cooperate with other CDNs, so in practice islands of CDNs are formed. The logical separation between contents and services in this context results in two content networking domains. In addition, meeting the Quality of Service requirements of users according to the negotiated Service Level Agreement is crucial for a CDN. Present trends in content networks and content networking capabilities give rise to interest in interconnecting content networks. Hence, in this paper, we present an open, scalable, and Service-Oriented Architecture (SOA)-based system that assists the creation of open Content and Service Delivery Networks (CSDNs), which scale and support sharing of resources through peering with other CSDNs. To encourage resource sharing and peering arrangements between different CDN providers at a global level, we propose using market-based models by introducing an economy-based strategy for content replication.

Posted Content•

Strategies for Replica Placement in Tree Networks

Yves Robert¹, Anne Benoit¹, Veronika Rehn¹

French Institute for Research in Computer Science and Automation¹

08 Nov 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors discuss and compare several policies to place replicas in tree networks, subject to server capacity and QoS constraints, and assess the impact of these new policies on the total replication cost.

Abstract: In this paper, we discuss and compare several policies to place replicas in tree networks, subject to server capacity and QoS constraints. The client requests are known beforehand, while the number and location of the servers are to be determined. The standard approach in the literature is to enforce that all requests of a client be served by the closest server in the tree. We introduce and study two new policies. In the first policy, all requests from a given client are still processed by the same server, but this server can be located anywhere in the path from the client to the root. In the second policy, the requests of a given client can be processed by multiple servers. One major contribution of this paper is to assess the impact of these new policies on the total replication cost. Another important goal is to assess the impact of server heterogeneity, both from a theoretical and a practical perspective. In this paper, we establish several new complexity results, and provide several efficient polynomial heuristics for NP-complete instances of the problem. These heuristics are compared to an absolute lower bound provided by the formulation of the problem in terms of the solution of an integer linear program.

Journal Article•DOI•

Bulk Scheduling with DIANA Scheduler

Ashiq Anjum, Richard McClatchey, Arshad Ali, Ian Willers

08 Aug 2006-arXiv: Distributed, Parallel, and Cluster Computing

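TL;DR: In this article, a Data Intensive and Network Aware (DIANA) scheduling engine is proposed for Grid analyses. It treats scheduling as the coordinated management of computation and data at multiple locations, rather than only data replication or movement, and takes network costs into account.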

Abstract: Results from the research and development of a Data Intensive and Network Aware (DIANA) scheduling engine, to be used primarily for data-intensive sciences such as physics analysis, are described. In Grid analyses, tasks can involve thousands of computing, data handling, and network resources. The central problem in the scheduling of these resources is the coordinated management of computation and data at multiple locations and not just data replication or movement. However, this can prove to be a rather costly operation, and efficient scheduling can be a challenge if compute and data resources are mapped without considering network costs. We have implemented an adaptive algorithm within the so-called DIANA Scheduler which takes into account data location and size, network performance and computation capability in order to enable efficient global scheduling. DIANA is a performance-aware and economy-guided Meta Scheduler. It iteratively allocates each job to the site that is most likely to produce the best performance, while also optimizing the global queue for any remaining jobs. Therefore it is equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results indicate that considerable performance improvements can be gained by adopting the DIANA scheduling approach.

Posted Content•

SLA-Based Coordinated Superscheduling Scheme and Performance for Computational Grids

Rajiv Ranjan, Aaron Harwood, Rajkumar Buyya

15 May 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work presents a market-based SLA coordination mechanism that allows resource owners to have a finer degree of control over resource allocation, and allows superschedulers to bid for SLA contracts in the contract net with a focus on completing the job within the user-specified deadline.

Abstract: The Service Level Agreement (SLA) based grid superscheduling approach promotes coordinated resource sharing. Superscheduling is facilitated between administratively and topologically distributed grid sites by grid schedulers such as resource brokers. In this work, we present a market-based SLA coordination mechanism. We base our SLA model on the well-known contract net protocol. The key advantages of our approach are that it allows: (i) resource owners to have a finer degree of control over resource allocation than was previously possible through traditional mechanisms; and (ii) superschedulers to bid for SLA contracts in the contract net with a focus on completing the job within the user-specified deadline. In this work, we use simulation to show the effectiveness of our proposed approach.

Journal Article•DOI•

Heterogeneous Strong Computation Migration

Anolan Milanés, Noemi Rodriguez, Bruno Schulze

22 Dec 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The objective of this report is to survey the problem of strong migration in heterogeneous environments such as grids, the related implementation issues and the current solutions.

Abstract: The continuous increase in performance requirements, for both scientific computation and industry, motivates the need for a powerful computing infrastructure. The Grid appeared as a solution for inexpensive execution of heavy applications in a parallel and distributed manner. It allows combining resources independently of their physical location and architecture to form a global resource pool available to all grid users. However, grid environments are highly unstable and unpredictable. Adaptability is a crucial issue in this context, in order to guarantee an appropriate quality of service to users. Migration is a technique frequently used for achieving adaptation. The objective of this report is to survey the problem of strong migration in heterogeneous environments such as grids, the related implementation issues and the current solutions.

Book Chapter•DOI•

Discovering Network Topology in the Presence of Byzantine Faults

Mikhail Nesterenko¹, Sébastien Tixeuil

Kent State University¹

22 Nov 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors study the problem of Byzantine-robust topology discovery in an arbitrary asynchronous network and show that the problem is solvable only if the connectivity of the network exceeds the number of faults in the system.

Abstract: We study the problem of Byzantine-robust topology discovery in an arbitrary asynchronous network. We formally state the weak and strong versions of the problem. The weak version requires that either each node discovers the topology of the network or at least one node detects the presence of a faulty node. The strong version requires that each node discovers the topology regardless of faults. We focus on non-cryptographic solutions to these problems. We explore their bounds. We prove that the weak topology discovery problem is solvable only if the connectivity of the network exceeds the number of faults in the system. Similarly, we show that the strong version of the problem is solvable only if the network connectivity is more than twice the number of faults. We present solutions to both versions of the problem. The presented algorithms match the established graph connectivity bounds. The algorithms do not require the individual nodes to know either the diameter or the size of the network. The message complexity of both programs is low polynomial with respect to the network size. We describe how our solutions can be extended to add the property of termination, handle topology changes and perform neighborhood discovery.
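
The two connectivity bounds stated in the abstract can be written down as a small check. This is only an illustrative sketch: the function names are hypothetical, and the code encodes the stated conditions (connectivity greater than the number of faults for the weak version, greater than twice the number of faults for the strong version), not the discovery algorithms themselves.

```python
def is_within_bounds(connectivity, faults, strong=False):
    """Check the connectivity condition from the abstract for a given fault count.

    Weak version: connectivity must exceed the number of faults.
    Strong version: connectivity must exceed twice the number of faults.
    """
    if connectivity < 0 or faults < 0:
        raise ValueError("connectivity and faults must be non-negative")
    needed = 2 * faults if strong else faults
    return connectivity > needed


def max_tolerable_faults(connectivity):
    """Return the largest fault counts (weak, strong) that satisfy the bounds."""
    if connectivity < 1:
        raise ValueError("connectivity must be a positive integer")
    weak = connectivity - 1            # largest f with connectivity > f
    strong = (connectivity - 1) // 2   # largest f with connectivity > 2f
    return weak, strong


if __name__ == "__main__":
    print(max_tolerable_faults(5))
```

For example, under these conditions a network with connectivity 5 admits at most 4 faults for weak discovery and at most 2 faults for strong discovery.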

Posted Content•

Asymptotic Analysis of a Leader Election Algorithm

Christian Lavault¹, Guy Louchard²

University of Paris¹, Université libre de Bruxelles²

08 Jul 2006-arXiv: Distributed, Parallel, and Cluster Computing

Abstract: Itai and Rodeh showed that, on the average, the communication of a leader election algorithm takes no more than $LN$ bits, where $L \simeq 2.441716$ and $N$ denotes the size of the ring. We give a precise asymptotic analysis of the average number of rounds $M(n)$ required by the algorithm, proving for example that $M(\infty) := \lim_{n\to \infty} M(n) = 2.441715879...$, where $n$ is the number of starting candidates in the election. Accurate asymptotic expressions of the second moment $M^{(2)}(n)$ of the discrete random variable at hand, its probability distribution, and the generalization to all moments are given. Corresponding asymptotic expansions $(n\to \infty)$ are provided for sufficiently large $j$, where $j$ counts the number of rounds. Our numerical results show that all computations perfectly fit the observed values. Finally, we investigate the generalization to probability $t/n$, where $t$ is a nonnegative real parameter. The real function $M(\infty,t) := \lim_{n\to \infty} M(n,t)$ is shown to admit one unique minimum $M(\infty,t^{*})$ on the real segment $(0,2)$. Furthermore, the variations of $M(\infty,t)$ on the whole real line are also studied in detail.

Posted Content•

RAFDA: A policy-aware middleware supporting the flexible separation of application logic from distribution

Scott Mervyn Walker, Alan Dearle, Stuart J. Norcross, Graham N. C. Kirby, Andrew McCarthy¹

University of St Andrews¹

01 Jan 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper explores technology that provides control over the extent to which inter-address-space communication is exposed to programmers, in order to aid the creation, maintenance and evolution of distributed applications.

Abstract: Middleware technologies often limit the way in which object classes may be used in distributed applications due to the fixed distribution policies that they impose. These policies permeate applications developed using existing middleware systems and force an unnatural encoding of application level semantics. For example, the application programmer has no direct control over inter-address-space parameter passing semantics. Semantics are fixed by the distribution topology of the application, which is dictated early in the design cycle. This creates applications that are brittle with respect to changes in distribution. This paper explores technology that provides control over the extent to which inter-address-space communication is exposed to programmers, in order to aid the creation, maintenance and evolution of distributed applications. The described system permits arbitrary objects in an application to be dynamically exposed for remote access, allowing applications to be written without concern for distribution. Programmers can conceal or expose the distributed nature of applications as required, permitting object placement and distribution boundaries to be decided late in the design cycle and even dynamically. Inter-address-space parameter passing semantics may also be decided independently of object implementation and at varying times in the design cycle, again possibly as late as run-time. Furthermore, transmission policy may be defined on a per-class, per-method or per-parameter basis, maximizing plasticity. This flexibility is of utility in the development of new distributed applications, and the creation of management and monitoring infrastructures for existing applications.

Posted Content•

Large Scale In Silico Screening on Grid Infrastructures

17 Nov 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The deployment of large scale in silico docking within the framework of the WISDOM initiative against Malaria and Avian Flu requiring about 105 years of CPU on the EGEE, Auvergrid and TWGrid infrastructures demonstrated the relevance of large-scale grid infrastructure for the virtual screening by molecular docking.

Abstract: Large-scale grid infrastructures for in silico drug discovery open opportunities of particular interest to neglected and emerging diseases. In 2005 and 2006, we were able to deploy large-scale in silico docking within the framework of the WISDOM initiative against Malaria and Avian Flu, requiring about 105 years of CPU on the EGEE, Auvergrid and TWGrid infrastructures. These achievements demonstrated the relevance of large-scale grid infrastructures for virtual screening by molecular docking. They also made it possible to evaluate the performance of the grid infrastructures and to identify specific issues raised by large-scale deployment.

Posted Content•

Transparent Migration of Multi-Threaded Applications on a Java Based Grid

Tariq N. Ellahi¹, Benoit Hudzia, Liam McDermott, M. Tahar Kechadi

University College Dublin¹

29 Aug 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: An overview of migration support in a Java-based grid middleware called DGET, which includes multi-threaded migration as well as asynchronous migration, is presented.

Abstract: Grid computing has enabled the pooling of a very large number of heterogeneous resources administered by different security domains. Applications are dynamically deployed on the resources available at the time. The dynamic nature of the resources and of application requirements means that the grid middleware needs to support migrating a running application to a different resource. In particular, Grid applications are typically long-running, so stopping them and restarting them from scratch is not a feasible option. This paper presents an overview of migration support in a Java-based grid middleware called DGET. Migration support in DGET includes multi-threaded migration as well as asynchronous migration.

Posted Content•

IP over P2P: Enabling Self-configuring Virtual IP Networks for Grid Computing

Arijit Ganguly¹, Abhishek Agrawal¹, P. Oscar Boykin¹, Renato Figueiredo¹•Institutions (1)

University of Florida¹

22 Mar 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: IPOP is a system for creating virtual IP networks on top of a P2P overlay; it enables seamless access to Grid resources spanning multiple domains by aggregating them into a virtual IP network that is completely isolated from the physical network.

Abstract: Peer-to-peer (P2P) networks have mostly focused on task-oriented networking, where networks are constructed for single applications, e.g., file-sharing, DNS caching, etc. In this work, we introduce IPOP, a system for creating virtual IP networks on top of a P2P overlay. IPOP enables seamless access to Grid resources spanning multiple domains by aggregating them into a virtual IP network that is completely isolated from the physical network. The virtual IP network provided by IPOP supports deployment of existing IP-based protocols over a robust, self-configuring P2P overlay. We present implementation details as well as experimental measurement results taken from LAN, WAN, and PlanetLab tests.

Posted Content•

Circle Formation of Weak Robots and Lyndon Words

Yoann Dieudonné¹, Franck Petit¹•Institutions (1)

University of Picardie Jules Verne¹

22 May 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors show how Lyndon words can be used in the distributed control of a set of n weak mobile robots, which are anonymous, memoryless, without any common sense of direction, and unable to communicate in any way other than observation.

Abstract: A Lyndon word is a non-empty word strictly smaller in the lexicographic order than any of its suffixes, except itself and the empty word. In this paper, we show how Lyndon words can be used in the distributed control of a set of n weak mobile robots. By weak, we mean that the robots are anonymous, memoryless, without any common sense of direction, and unable to communicate in any way other than observation. An efficient and simple deterministic protocol to form a regular n-gon is presented and proven for n prime.
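
The definition of a Lyndon word given in this abstract can be checked directly. The sketch below is only an illustration of that definition, not the robots protocol from the paper; the function names `is_lyndon` and `lyndon_words` are hypothetical, and Python's built-in string comparison is assumed to stand in for the lexicographic order.

```python
from itertools import product


def is_lyndon(word):
    """Return True if `word` is a Lyndon word under Python's string ordering."""
    if not word:
        return False  # Lyndon words are non-empty by definition
    # Compare the word with each proper, non-empty suffix word[i:], i = 1..len-1.
    return all(word < word[i:] for i in range(1, len(word)))


def lyndon_words(alphabet, length):
    """Brute-force list of all Lyndon words of exactly `length` over `alphabet`."""
    candidates = ("".join(letters) for letters in product(sorted(alphabet), repeat=length))
    return [w for w in candidates if is_lyndon(w)]
```

For example, `is_lyndon("aab")` holds because "aab" precedes both "ab" and "b", while `is_lyndon("aba")` fails because the suffix "a" precedes "aba".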

Posted Content•

Labeling Schemes with Queries

Amos Korman¹, Shay Kutten¹•Institutions (1)

Technion – Israel Institute of Technology¹

29 Sep 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors introduce the notion of labeling schemes with queries, in which the value of a function on a pair of vertices can be inferred by inspecting not only the labels of the two vertices but also the labels of some additional vertices.

Abstract: We study the question of "how robust are the known lower bounds of labeling schemes when one increases the number of consulted labels". Let $f$ be a function on pairs of vertices. An $f$-labeling scheme for a family of graphs $\mathcal{F}$ labels the vertices of all graphs in $\mathcal{F}$ such that for every graph $G\in\mathcal{F}$ and every two vertices $u,v\in G$, the value $f(u,v)$ can be inferred by merely inspecting the labels of $u$ and $v$. This paper introduces a natural generalization: the notion of $f$-labeling schemes with queries, in which the value $f(u,v)$ can be inferred by inspecting not only the labels of $u$ and $v$ but possibly the labels of some additional vertices. We show that inspecting the label of a single additional vertex (one query) enables us to reduce the label size of many labeling schemes significantly.
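
To make the definition of an $f$-labeling scheme concrete, here is a minimal sketch of one classic instance: ancestry in rooted trees, decoded from depth-first interval labels. It is an assumption chosen for illustration and is not taken from the paper; it shows only the plain scheme (no queries), and the names `ancestry_labels` and `is_ancestor` are hypothetical.

```python
def ancestry_labels(children, root):
    """Label each node of a rooted tree with a DFS interval (enter, exit).

    `children` maps a node to the list of its children; leaves may be omitted.
    """
    labels = {}
    clock = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants have been numbered; close the interval.
            labels[node] = (labels[node][0], clock)
            continue
        labels[node] = (clock, None)
        clock += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(label_u, label_v):
    """Decode f(u, v) = 'u is an ancestor of v (or u == v)' from the two labels alone."""
    return label_u[0] <= label_v[0] and label_v[1] <= label_u[1]
```

The decoder never looks at the tree itself, only at the two labels, which is the property the definition asks for. A scheme with queries would additionally be allowed to read the label of some third vertex.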

Posted Content•

A peer-to-peer middleware framework for resilient persistent programming

Alan Dearle, Graham N. C. Kirby, Stuart J. Norcross, Andrew McCarthy¹•Institutions (1)

University of St Andrews¹

01 Jan 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The vision is of an infrastructure within which an application can be developed and distributed with minimal modification, whereupon the application becomes resilient to certain failure modes, and can be achieved within a spectrum of application programmer intervention, ranging from minimal to totally prescriptive, as desired.

Abstract: The persistent programming systems of the 1980s offered a programming model that integrated computation and long-term storage. In these systems, reliable applications could be engineered without requiring the programmer to write translation code to manage the transfer of data to and from non-volatile storage. More importantly, this model simplified the programmer's conceptual model of an application, and avoided the many coherency problems that result from multiple cached copies of the same information. Although technically innovative, persistent languages were not widely adopted, perhaps due in part to their closed-world model. Each persistent store was located on a single host, and there were no flexible mechanisms for communication or transfer of data between separate stores. Here we re-open the work on persistence and combine it with modern peer-to-peer techniques in order to provide support for orthogonal persistence in resilient and potentially long-running distributed applications. Our vision is of an infrastructure within which an application can be developed and distributed with minimal modification, whereupon the application becomes resilient to certain failure modes. If a node, or the connection to it, fails during execution of the application, the objects are re-instantiated from distributed replicas, without their reference holders being aware of the failure. Furthermore, we believe that this can be achieved within a spectrum of application programmer intervention, ranging from minimal to totally prescriptive, as desired. The same mechanisms encompass an orthogonally persistent programming model. We outline our approach to implementing this vision, and describe current progress.

Posted Content•

A Case for Peering of Content Delivery Networks

Rajkumar Buyya¹, Al-Mukaddim Khan Pathan¹, James Broberg², Zahir Tari²•Institutions (2)

University of Melbourne¹, RMIT University²

06 Sep 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors present an open, scalable and Service-Oriented Architecture based system to assist the creation of open Content and Service Delivery Networks (CSDN) that scale and support sharing of resources with other CSDNs.

Abstract: The proliferation of Content Delivery Networks (CDN) reveals that existing content networks are owned and operated by individual companies. As a consequence, closed delivery networks have evolved that do not cooperate with other CDNs, and in practice islands of CDNs are formed. Moreover, the logical separation between contents and services in this context results in two content networking domains. However, present trends in content networks and content networking capabilities give rise to interest in interconnecting content networks. Finding ways for distinct content networks to coordinate and cooperate with other content networks is necessary for better overall service. In addition, meeting the QoS requirements of users according to the negotiated Service Level Agreements between the user and the content network is a pressing issue in this context. In this article, we present an open, scalable and Service-Oriented Architecture based system to assist the creation of open Content and Service Delivery Networks (CSDN) that scale and support sharing of resources with other CSDNs.

Posted Content•

Lossy Bulk Synchronous Parallel Processing Model for Very Large Scale Grids

Elankovan A Sundararajan, Aaron Harwood¹, Kotagiri Ramamohanarao•Institutions (1)

University of Melbourne¹

20 Nov 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper considers the use of UDP and examines the relationship between packet loss and speedup with respect to the number of grid nodes, and demonstrates that by using an appropriate number of packet copies, one can increase the performance of a parallel program.

Abstract: The performance of a parallel algorithm in a very large scale grid is significantly influenced by the underlying Internet protocols and inter-connectivity. Many grid programming platforms use TCP due to its reliability, usually with some optimizations to reduce its costs. However, TCP does not perform well in a high-bandwidth, high-delay network environment. On the other hand, UDP is the fastest protocol available because it omits the connection setup process, acknowledgments and retransmissions, sacrificing reliable transfer. Many new bulk data transfer schemes using UDP for data transmission, such as RBUDP, Tsunami, and SABUL, have been introduced and shown to have better performance compared to TCP. In this paper, we consider the use of UDP and examine the relationship between packet loss and speedup with respect to the number of grid nodes. Our measurements suggest that packet loss rates between 5%-15% on average are not uncommon between PlanetLab nodes that are widely distributed over the Internet. We show that transmitting multiple copies of the same packet produces higher speedup. We show the minimum number of packet duplications required to maximize the possible speedup for a given number of nodes using a BSP-based model. Our work demonstrates that by using an appropriate number of packet copies, we can increase the performance of a parallel program.
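
The effect of sending several copies of a packet can be illustrated with simple arithmetic. The sketch below is not the BSP-based model from the paper: it assumes each copy is lost independently with the same probability, and the function names are hypothetical.

```python
def delivery_probability(loss_rate, copies):
    """Probability that at least one of `copies` independent copies arrives."""
    return 1.0 - loss_rate ** copies


def superstep_success(loss_rate, copies, messages):
    """Probability that every one of `messages` distinct packets is delivered,
    when each packet is sent `copies` times (independence assumed)."""
    return delivery_probability(loss_rate, copies) ** messages


def copies_needed(loss_rate, target):
    """Smallest number of copies whose delivery probability reaches `target`."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    copies = 1
    while delivery_probability(loss_rate, copies) < target:
        copies += 1
    return copies
```

Under these assumptions, a 15% loss rate needs `copies_needed(0.15, 0.999)` = 4 copies per packet to reach 99.9% delivery, and a 5% loss rate needs 3. The trade-off the abstract points to is that each extra copy also costs bandwidth, so more copies are not always better.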

Posted Content•

A Calculus for Sensor Networks

Miguel S. Silva, Francisco Martins, Luís Lopes, Joao Barros

19 Dec 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A Calculus for Sensor Networks (CSN) is proposed that captures the main abstractions for programming applications for this class of devices and shows its expressiveness by providing implementations for several examples of typical operations on sensor networks.

Abstract: We consider the problem of providing a rigorous model for programming wireless sensor networks. Assuming that collisions, packet losses, and errors are dealt with at the lower layers of the protocol stack, we propose a Calculus for Sensor Networks (CSN) that captures the main abstractions for programming applications for this class of devices. Besides providing the syntax and semantics for the calculus, we show its expressiveness by providing implementations for several examples of typical operations on sensor networks. Also included is a detailed discussion of possible extensions to CSN that enable the modeling of other important features of these networks such as sensor state, sampling strategies, and network security.

Posted Content•

About the Lifespan of Peer to Peer Networks

Rudi Cilibrasi¹, Zvi Lotker¹, Alfredo Navarra², Stéphane Pérennes³, Paul M. B. Vitányi⁴•Institutions (4)

Centrum Wiskunde & Informatica¹, L'Abri², French Institute for Research in Computer Science and Automation³, University of Amsterdam⁴

07 Dec 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors analyze the ability of peer-to-peer networks to deliver a complete file among the peers, and they motivate a broad generalization of network behavior organizing it into one of two successive phases.

Abstract: We analyze the ability of peer-to-peer networks to deliver a complete file among the peers. Early on we motivate a broad generalization of network behavior, organizing it into one of two successive phases. According to this view the network has two main states: first centralized, in which a few sources (roots) hold the complete file, and next distributed, in which peers hold some parts (chunks) of the file such that the entire network has the whole file, but no individual has it. In the distributed state we study two scenarios: first, when the peers are "patient", i.e., do not leave the system until they obtain the complete file; second, when peers are "impatient" and almost always leave the network before obtaining the complete file.
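
The two network states described in this abstract can be expressed as a small check over which chunks each peer holds. This is a sketch under stated assumptions, not the paper's model: the representation of holdings as sets of chunk indices, the function name `network_state`, and the extra "dead" label (for a network that can no longer assemble the file) are all hypothetical.

```python
def network_state(holdings, num_chunks):
    """Classify a network from a mapping peer -> set of chunk indices held.

    Returns "centralized" if at least one peer holds the complete file,
    "distributed" if no single peer does but the peers jointly hold every chunk,
    and "dead" if some chunk is held by nobody.
    """
    full_file = set(range(num_chunks))
    if any(chunks >= full_file for chunks in holdings.values()):
        return "centralized"
    available = set().union(*holdings.values())
    if available >= full_file:
        return "distributed"
    return "dead"
```

In the distributed state, an impatient peer that leaves while holding the only copy of a chunk moves the network to "dead", which is one way to read the lifespan question in the title.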

Posted Content•

A verification algorithm for Declarative Concurrent Programming

Jean Krivine

22 Jun 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A verification method for distributed systems based on decoupling forward and backward behaviour is proposed, using an event structure based algorithm that constructs its causal compression relative to a choice of observable actions.

Abstract: A verification method for distributed systems based on decoupling forward and backward behaviour is proposed. This method uses an event structure based algorithm that, given a CCS process, constructs its causal compression relative to a choice of observable actions. Verifying the original process equipped with distributed backtracking on non-observable actions is equivalent to verifying its relative compression, which in general is much smaller. We call this method Declarative Concurrent Programming (DCP). The DCP technique compares well with direct bisimulation based methods. Benchmarks for the classic dining philosophers problem show that causal compression is rather efficient both time- and space-wise. State of the art verification tools can successfully handle more than 15 agents, whereas they can handle no more than 5 following the traditional direct method; an altogether spectacular improvement, since in this example the specification size is exponential in the number of agents.

Posted Content•

Energy Efficient Randomized Communication in Unknown AdHoc Networks

Petra Berenbrink, Colin Cooper, Zengjian Hu

15 Dec 2006-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors studied broadcasting and gossiping algorithms in random and general ad hoc networks, and proposed, for random networks, a broadcasting algorithm in which every node transmits at most once and a gossiping algorithm using O(log n) messages per node, where n is the number of nodes.

Abstract: This paper studies broadcasting and gossiping algorithms in random and general AdHoc networks. Our goal is not only to minimise the broadcasting and gossiping time, but also to minimise the energy consumption, which is measured in terms of the total number of messages (or transmissions) sent. We assume that the nodes of the network do not know the network, and that they can only send with a fixed power, meaning they cannot adjust the size of the area that their messages cover. We believe that under these circumstances the number of transmissions is a very good measure of the overall energy consumption. For random networks, we present a broadcasting algorithm where every node transmits at most once. We show that our algorithm broadcasts in $O(\log n)$ steps, w.h.p., where $n$ is the number of nodes. We then present an $O(d \log n)$ ($d$ is the expected degree) gossiping algorithm using $O(\log n)$ messages per node. For general networks with known diameter $D$, we present a randomised broadcasting algorithm with optimal broadcasting time $O(D \log (n/D) + \log^2 n)$ that uses an expected number of $O(\log^2 n / \log (n/D))$ transmissions per node. We also show a tradeoff result between the broadcasting time and the number of transmissions: we construct a network such that any oblivious algorithm using a time-invariant distribution requires $\Omega(\log^2 n / \log (n/D))$ messages per node in order to finish broadcasting in optimal time. This demonstrates the tightness of our upper bound. We also show that no oblivious algorithm can complete broadcasting w.h.p. using $o(\log n)$ messages per node.
