Showing papers in "arXiv: Distributed, Parallel, and Cluster Computing" in 2009

Posted Content

Modeling and Simulation of Scalable Cloud Computing Environments and the CloudSim Toolkit: Challenges and Opportunities

Rajkumar Buyya¹, Rajiv Ranjan², Rodrigo N. Calheiros¹

University of Melbourne¹, University of New South Wales²

28 Jul 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper proposes CloudSim: an extensible simulation toolkit that enables modelling and simulation of Cloud computing environments and allows simulation of multiple Data Centers to enable a study on federation and associated policies for migration of VMs for reliability and automatic scaling of applications.

Abstract: Cloud computing aims to power the next generation data centers and enables application service providers to lease data center capabilities for deploying applications depending on user QoS (Quality of Service) requirements. Cloud applications have different composition, configuration, and deployment requirements. Quantifying the performance of resource allocation policies and application scheduling algorithms in finer detail in Cloud computing environments for different application and service models under varying load, energy performance (power consumption, heat dissipation), and system size is a challenging problem to tackle. To simplify this process, in this paper we propose CloudSim: an extensible simulation toolkit that enables modelling and simulation of Cloud computing environments. The CloudSim toolkit supports modelling and creation of one or more virtual machines (VMs) on a simulated node of a Data Center, jobs, and their mapping to suitable VMs. It also allows simulation of multiple Data Centers to enable a study on federation and associated policies for migration of VMs for reliability and automatic scaling of applications.
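
To make the mapping of VMs onto simulated data center nodes concrete, here is a minimal Python sketch of one simple placement policy, first fit. It is not CloudSim's API (CloudSim is a Java toolkit); the `Host` and `Vm` classes, the `first_fit` function and the cores-plus-RAM resource model are hypothetical simplifications for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Vm:
    vm_id: int
    cores: int
    ram_mb: int

@dataclass
class Host:
    host_id: int
    cores: int
    ram_mb: int
    vms: list = field(default_factory=list)   # VMs currently placed on this host

    def fits(self, vm):
        """True if the host still has enough free cores and RAM for vm."""
        used_cores = sum(v.cores for v in self.vms)
        used_ram = sum(v.ram_mb for v in self.vms)
        return used_cores + vm.cores <= self.cores and used_ram + vm.ram_mb <= self.ram_mb

def first_fit(hosts, vms):
    """Place each VM on the first host with enough free capacity.
    Returns {vm_id: host_id}, with None for VMs that fit nowhere."""
    placement = {}
    for vm in vms:
        host = next((h for h in hosts if h.fits(vm)), None)
        if host is not None:
            host.vms.append(vm)
        placement[vm.vm_id] = None if host is None else host.host_id
    return placement

hosts = [Host(0, cores=4, ram_mb=8192), Host(1, cores=2, ram_mb=4096)]
vms = [Vm(0, 2, 2048), Vm(1, 2, 4096), Vm(2, 2, 2048), Vm(3, 1, 512)]
print(first_fit(hosts, vms))   # {0: 0, 1: 0, 2: 1, 3: None}
```

First fit is only one candidate policy; the value of a simulator is that such policies can be swapped and compared under varying load without real hardware.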

1,033 citations

Posted Content

CloudSim: A Novel Framework for Modeling and Simulation of Cloud Computing Infrastructures and Services

Rodrigo N. Calheiros, Rajiv Ranjan, César A. F. De Rose, Rajkumar Buyya

14 Mar 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper proposes CloudSim: a new generalized and extensible simulation framework that enables seamless modelling, simulation, and experimentation of emerging Cloud computing infrastructures and management services.

Abstract: Cloud computing focuses on delivery of reliable, secure, fault-tolerant, sustainable, and scalable infrastructures for hosting Internet-based application services. These applications have different composition, configuration, and deployment requirements. Quantifying the performance of scheduling and allocation policies on a Cloud infrastructure (hardware, software, services) for different application and service models under varying load, energy performance (power consumption, heat dissipation), and system size is an extremely challenging problem to tackle. To simplify this process, in this paper we propose CloudSim: a new generalized and extensible simulation framework that enables seamless modelling, simulation, and experimentation of emerging Cloud computing infrastructures and management services. The simulation framework has the following novel features: (i) support for modelling and instantiation of large scale Cloud computing infrastructure, including data centers, on a single physical computing node and Java virtual machine; (ii) a self-contained platform for modelling data centers, service brokers, scheduling, and allocation policies; (iii) availability of a virtualization engine, which aids in creation and management of multiple, independent, and co-hosted virtualized services on a data center node; and (iv) flexibility to switch between space-shared and time-shared allocation of processing cores to virtualized services.
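
The difference between the two allocation modes in feature (iv) shows up on a toy example. This Python sketch assumes tasks of known length in millions of instructions (MI) on a VM with `cores` identical cores of `mips` speed; the function names are hypothetical and the model leaves out everything else CloudSim simulates (hosts, brokers, events).

```python
import heapq

def space_shared(lengths, cores, mips):
    """Each task gets a dedicated core; extra tasks wait in FCFS order.
    Returns the finish time of every task."""
    free_at = [0.0] * cores              # min-heap of times at which cores become free
    finish = []
    for length in lengths:
        start = heapq.heappop(free_at)   # earliest available core
        end = start + length / mips
        finish.append(end)
        heapq.heappush(free_at, end)
    return finish

def time_shared(lengths, cores, mips):
    """All tasks start at once and share the cores equally (processor sharing)."""
    remaining = {i: float(n) for i, n in enumerate(lengths)}
    finish = [0.0] * len(lengths)
    now = 0.0
    while remaining:
        rate = mips * min(1.0, cores / len(remaining))   # per-task speed right now
        step = min(remaining.values()) / rate            # time until the next task ends
        now += step
        for i in list(remaining):
            remaining[i] -= rate * step
            if remaining[i] <= 1e-9:                     # tolerate float round-off
                finish[i] = now
                del remaining[i]
    return finish

tasks = [1000, 1000, 1000]   # three 1000-MI tasks on a 2-core, 1000-MIPS VM
print(space_shared(tasks, cores=2, mips=1000))                          # [1.0, 1.0, 2.0]
print([round(t, 6) for t in time_shared(tasks, cores=2, mips=1000)])    # [1.5, 1.5, 1.5]
```

Space sharing finishes two tasks early and makes the third wait; time sharing starts all three immediately and they finish together, later than the first two would have.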

537 citations

Posted Content

Survey of clustering algorithms for MANET

Ratish Agarwal, Mahesh Motwani

11 Dec 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A survey of different clustering schemes for ad hoc networks, developed by researchers focusing on different performance metrics, is presented.

Abstract: Many clustering schemes have been proposed for ad hoc networks. A systematic classification of these clustering schemes enables one to better understand them and make improvements. In mobile ad hoc networks, the movement of the network nodes may quickly change the topology, increasing the message overhead of topology maintenance. Protocols try to keep the number of nodes in a cluster around a pre-defined threshold to facilitate the optimal operation of the medium access control protocol. The clusterhead election is invoked on demand and aims to reduce the computation and communication costs. A large variety of approaches for ad hoc clustering have been developed by researchers, focusing on different performance metrics. This paper presents a survey of different clustering schemes.
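
As one concrete instance of the clusterhead election schemes such a survey covers, the classic lowest-ID heuristic fits in a few lines of Python. This is an illustration under assumptions: it runs centrally on a static adjacency snapshot (a real MANET protocol runs it distributively and repeats it as nodes move), and `lowest_id_clustering` is a hypothetical name.

```python
def lowest_id_clustering(adjacency):
    """Lowest-ID clustering on a static snapshot of the topology.
    adjacency maps node id -> set of neighbour ids; returns node -> clusterhead."""
    head_of = {}
    for node in sorted(adjacency):          # visit nodes in increasing ID order
        if node in head_of:
            continue                        # already a member of some cluster
        head_of[node] = node                # lowest uncovered ID becomes clusterhead
        for nb in adjacency[node]:
            head_of.setdefault(nb, node)    # uncovered neighbours join this cluster
    return head_of

# A 5-node chain 1-2-3-4-5
topology = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
clusters = lowest_id_clustering(topology)
print(sorted(set(clusters.values())))      # [1, 3, 5]
```

Every member is one hop from its clusterhead; when nodes move, the adjacency changes and clustering must be redone, which is the maintenance overhead the abstract refers to.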

229 citations

Posted Content

Aneka: A Software Platform for .NET-based Cloud Computing

Christian Vecchiola¹, Xingchen Chu², Rajkumar Buyya²

University of Genoa¹, University of Melbourne²

26 Jul 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: Aneka is a platform for deploying Clouds and developing applications on top of them; it provides a runtime environment and a set of APIs that allow developers to build .NET applications that leverage their computation on either public or private clouds.

Abstract: Aneka is a platform for deploying Clouds and developing applications on top of them. It provides a runtime environment and a set of APIs that allow developers to build .NET applications that leverage their computation on either public or private clouds. One of the key features of Aneka is the ability to support multiple programming models, which are ways of expressing the execution logic of applications by using specific abstractions. This is accomplished by creating a customizable and extensible service-oriented runtime environment represented by a collection of software containers connected together. By leveraging this architecture, advanced services including resource reservation, persistence, storage management, security, and performance monitoring have been implemented. On top of this infrastructure, different programming models can be plugged in to provide support for different scenarios, as demonstrated by the engineering, life science, and industry applications.

101 citations

Posted Content

Peer-to-Peer Cloud Provisioning: Service Discovery and Load-Balancing

Rajiv Ranjan, Liang Zhao, Xiaomin Wu, Anna Liu

10 Dec 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, a layered peer-to-peer cloud provisioning architecture is presented, with particular emphasis on service discovery and load-balancing, and an experimental evaluation is presented that demonstrates the feasibility of building next generation Cloud provisioning systems based on P2P network management and information dissemination models.

Abstract: This chapter presents: (i) a layered peer-to-peer Cloud provisioning architecture; (ii) a summary of the current state-of-the-art in Cloud provisioning with particular emphasis on service discovery and load-balancing; (iii) a classification of the existing peer-to-peer network management models with focus on extending DHTs for indexing and managing complex provisioning information; and (iv) the design and implementation of a novel, extensible software fabric (Cloud peer) that combines public/private clouds, overlay networking and structured peer-to-peer indexing techniques for supporting scalable and self-managing service discovery and load-balancing in Cloud computing environments. Finally, an experimental evaluation is presented that demonstrates the feasibility of building next generation Cloud provisioning systems based on peer-to-peer network management and information dissemination models. The experimental test-bed has been deployed on a public cloud computing platform, Amazon EC2, which demonstrates the effectiveness of the proposed peer-to-peer Cloud provisioning software fabric.
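
Structured peer-to-peer indexing of the kind item (iii) refers to is usually built on a distributed hash table. The Python sketch below shows the core idea of a Chord-style ring (consistent hashing): peers and keys are hashed onto the same identifier space, and each key is owned by the first peer clockwise from it. The names and the resource-description key are hypothetical, everything runs in one process, and only exact-match lookup is shown, not the multi-dimensional provisioning queries the chapter targets.

```python
import bisect
import hashlib

def ring_id(key: str, bits: int = 32) -> int:
    """Hash a string onto a ring of 2**bits identifiers (SHA-1, truncated)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class Ring:
    """Single-process model of a consistent-hashing ring."""
    def __init__(self, peers):
        self.points = sorted((ring_id(p), p) for p in peers)
        self.ids = [pid for pid, _ in self.points]

    def successor(self, key: str) -> str:
        """Peer responsible for key: the first peer clockwise from the key's id."""
        i = bisect.bisect_left(self.ids, ring_id(key))
        return self.points[i % len(self.points)][1]    # wrap past the largest id

ring = Ring(["peer-a", "peer-b", "peer-c"])
print(ring.successor("cpu=4;os=linux"))   # one of the three peers
```

Adding a peer only moves the keys that fall between the new peer and its predecessor, which is what makes this family of schemes attractive for self-managing systems.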

89 citations

Posted Content

Building on Quicksand

Pat Helland¹, David G. Campbell¹

Microsoft¹

09 Sep 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The paper describes asynchronous state capture, a relaxed model for fault tolerance in distributed systems in which the primary system acknowledges a work request and its actions without waiting to ensure that the backup has been notified of the work; as a result, everything the primary promises is probabilistic and applications must ensure eventual consistency.

Abstract: Reliable systems have always been built out of unreliable components. Early on, the reliable components were small such as mirrored disks or ECC (Error Correcting Codes) in core memory. These systems were designed such that failures of these small components were transparent to the application. Later, the size of the unreliable components grew larger and semantic challenges crept into the application when failures occurred. As the granularity of the unreliable component grows, the latency to communicate with a backup becomes unpalatable. This leads to a more relaxed model for fault tolerance. The primary system will acknowledge the work request and its actions without waiting to ensure that the backup is notified of the work. This improves the responsiveness of the system. There are two implications of asynchronous state capture: 1) Everything promised by the primary is probabilistic. There is always a chance that an untimely failure shortly after the promise results in a backup proceeding without knowledge of the commitment. Hence, nothing is guaranteed! 2) Applications must ensure eventual consistency. Since work may be stuck in the primary after a failure and reappear later, the processing order for work cannot be guaranteed. Platform designers are struggling to make this easier for their applications. Emerging patterns of eventual consistency and probabilistic execution may soon yield a way for applications to express requirements for a "looser" form of consistency while providing availability in the face of ever larger failures. This paper recounts portions of the evolution of these trends, attempts to show the patterns that span these changes, and talks about future directions as we continue to "build on quicksand".
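
The first implication of asynchronous state capture (acknowledged work can be lost) can be reproduced with a toy model. This Python sketch is only an illustration under stated assumptions: a single primary, a backup modelled as a plain list, and a hypothetical `ship` step standing in for asynchronous log shipping.

```python
class Primary:
    """Toy primary that acknowledges work before the backup has it."""
    def __init__(self):
        self.log = []       # work the primary has accepted and acknowledged
        self.shipped = 0    # how many log entries the backup has received

    def submit(self, item):
        self.log.append(item)
        return "ack"        # acknowledged at once, without waiting for the backup

    def ship(self, n, backup):
        """Asynchronously copy up to n pending entries to the backup."""
        pending = self.log[self.shipped:self.shipped + n]
        backup.extend(pending)
        self.shipped += len(pending)

backup = []
p = Primary()
for order in ["o1", "o2", "o3"]:
    assert p.submit(order) == "ack"
p.ship(2, backup)       # replication lags behind the acknowledgements
# Suppose the primary fails here: the backup takes over knowing only o1 and o2.
lost = [w for w in p.log if w not in backup]
print(backup, lost)     # ['o1', 'o2'] ['o3']
```

The client holds an acknowledgement for `o3`, yet the backup never saw it; if the old primary later recovers, `o3` may reappear out of order, which is why applications must tolerate eventual consistency.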

77 citations

Posted Content

Energy-Efficient Scheduling of HPC Applications in Cloud Computing Environments

Saurabh Garg, Chee Shin Yeo, Arun Anandasivam, Rajkumar Buyya

08 Sep 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work proposes near-optimal scheduling policies that exploit heterogeneity across multiple data centers for a Cloud provider; these carbon/energy-based policies achieve on average up to 30% energy savings compared with profit-based scheduling policies, leading to higher profit and lower carbon emissions.

Abstract: The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. These applications need the ability to gain rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay based on what they use. However, the growing demand drastically increases the energy consumption of data centers, which has become a critical issue. High energy consumption not only translates to high energy cost, which reduces the profit margin of Cloud providers, but also to high carbon emissions, which are not environmentally sustainable. Hence, energy-efficient solutions are required that can address the large increase in energy consumption from the perspective of not only the Cloud provider but also the environment. To address this issue, we propose near-optimal scheduling policies that exploit heterogeneity across multiple data centers for a Cloud provider. We consider a number of energy efficiency factors, such as energy cost, carbon emission rate, workload, and CPU power efficiency, which change across data centers depending on their location, architectural design, and management system. Our carbon/energy-based scheduling policies achieve on average up to 30% energy savings compared with profit-based scheduling policies, leading to higher profit and lower carbon emissions.
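
A carbon-aware placement rule in this spirit can be sketched as a greedy choice. This is not the paper's near-optimal policy: the `DataCenter` fields, the energy model (busy CPUs × per-CPU power × runtime) and all numbers below are hypothetical, chosen only to show how a carbon-based choice can differ from a cost-based one.

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    # Hypothetical per-site parameters, for illustration only.
    name: str
    carbon_rate: float   # kg CO2 emitted per kWh drawn from the local grid
    price: float         # electricity price, $ per kWh
    power_kw: float      # power drawn by one busy CPU, kW
    speed: float         # relative CPU speed, work units per hour
    free_cpus: int

def job_energy_kwh(dc, cpus, work):
    """Energy for a job using `cpus` CPUs, each doing `work` units of work."""
    return cpus * dc.power_kw * (work / dc.speed)

def pick(dcs, cpus, work, per_kwh):
    """Feasible site minimising energy x per_kwh(site); None if nothing fits."""
    feasible = [dc for dc in dcs if dc.free_cpus >= cpus]
    return min(feasible, key=lambda dc: job_energy_kwh(dc, cpus, work) * per_kwh(dc), default=None)

sites = [
    DataCenter("A", carbon_rate=0.9, price=0.10, power_kw=0.20, speed=1.0, free_cpus=64),
    DataCenter("B", carbon_rate=0.3, price=0.15, power_kw=0.25, speed=0.8, free_cpus=32),
    DataCenter("C", carbon_rate=0.1, price=0.20, power_kw=0.20, speed=1.0, free_cpus=4),
]
greenest = pick(sites, cpus=16, work=10, per_kwh=lambda dc: dc.carbon_rate)
cheapest = pick(sites, cpus=16, work=10, per_kwh=lambda dc: dc.price)
print(greenest.name, cheapest.name)   # B A
```

In this made-up example site C is greenest per kWh but lacks capacity, and the carbon-based choice (B) disagrees with the cost-based choice (A): the trade-off between emissions and profit that the paper's policies address.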

67 citations

Posted Content

Distributed Abstract Optimization via Constraints Consensus: Theory and Applications

Giuseppe Notarstefano, Francesco Bullo¹

University of California, Santa Barbara¹

30 Oct 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work proposes novel constraints consensus algorithms for distributed abstract programs with guaranteed finite-time convergence to a global optimum and shows how the constraints consensus algorithm may be applied to suitable target localization and formation control problems.

Abstract: Distributed abstract programs are a novel class of distributed optimization problems where (i) the number of variables is much smaller than the number of constraints and (ii) each constraint is associated with a network node. Abstract optimization programs are a generalization of linear programs that captures numerous geometric optimization problems. We propose novel constraints consensus algorithms for distributed abstract programs: as each node iteratively identifies locally active constraints and exchanges them with its neighbors, the network computes the active constraints determining the global optimum. The proposed algorithms are appropriate for networks with weak time-dependent connectivity requirements and tight memory constraints. We show how suitable target localization and formation control problems can be tackled via constraints consensus.
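
To illustrate the idea, take a deliberately tiny abstract program: the smallest interval enclosing a set of points on a line, where each point is a constraint held by one node and the basis is just the two extreme points. The Python sketch below works in the spirit of constraints consensus rather than reproducing the paper's algorithm: each node repeatedly merges its own constraints with its neighbours' current bases. The graph, data and function names are hypothetical, and updates are synchronous for simplicity.

```python
def basis(points):
    """Basis of the 1-D smallest enclosing interval: its two extreme points."""
    return frozenset({min(points), max(points)})

def constraints_consensus(local, neighbors, rounds):
    """local: node -> its own points (constraints); neighbors: node -> adjacent nodes.
    Each round, every node recomputes a basis from its own constraints, its current
    basis and its neighbours' bases (all nodes update simultaneously)."""
    B = {v: basis(pts) for v, pts in local.items()}
    for _ in range(rounds):
        B = {
            v: basis(set(local[v]) | B[v] | set().union(*(B[u] for u in neighbors[v])))
            for v in B
        }
    return B

points = {0: [5, 7], 1: [2], 2: [9, 4], 3: [6]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}              # line graph 0-1-2-3
result = constraints_consensus(points, path, rounds=3)     # rounds >= graph diameter
print({v: sorted(b) for v, b in result.items()})           # {0: [2, 9], 1: [2, 9], 2: [2, 9], 3: [2, 9]}
```

Each node only ever stores and sends a basis (two numbers here), not all constraints, which matches the tight memory requirements mentioned in the abstract.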

65 citations

Posted Content

A Linear Programming Driven Genetic Algorithm for Meta-Scheduling on Utility Grids

Saurabh Garg¹, Pramod Kumar Konugurthi², Rajkumar Buyya¹

University of Melbourne¹, Department of Space²

08 Mar 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A novel algorithm, LPGA (linear programming driven genetic algorithm), which combines the capabilities of LP and GA, is proposed; it offers the best meta-schedule for utility grids, minimizing the combined cost of all users in a coordinated manner.

Abstract: The user-level brokers in grids consider individual application QoS requirements and minimize their cost without considering demands from other users. This results in contention for resources and sub-optimal schedules. Meta-scheduling in grids aims to address this scheduling problem, which is NP-hard due to its combinatorial nature. Thus, many heuristic-based solutions using Genetic Algorithms (GA) have been proposed, apart from traditional algorithms such as Greedy and FCFS. We propose a Linear Programming/Integer Programming model (LP/IP) for scheduling these applications to multiple resources. We also propose a novel algorithm, LPGA (Linear Programming driven Genetic Algorithm), which combines the capabilities of LP and GA. The aim of this algorithm is to obtain the best meta-schedule for utility grids, one that minimizes the combined cost of all users in a coordinated manner. Simulation results show that our proposed integrated algorithm offers the best schedule, having the minimum processing cost with negligible time overhead.
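
The idea of seeding a genetic algorithm with a mathematically derived schedule can be illustrated on a toy problem: assign jobs (with work amounts) to resources (with unit prices and capacities) at minimum total cost. This Python sketch is a hypothetical illustration, not the paper's LPGA: a greedy cheapest-feasible assignment stands in for the LP solution used to seed the population, and all names and numbers are made up. Because the best half always survives, the search never returns a schedule worse than a feasible seed.

```python
import random

def total_cost(assign, jobs, price):
    """assign[j] is the resource chosen for job j; cost = work x unit price."""
    return sum(jobs[j] * price[r] for j, r in enumerate(assign))

def fits(assign, jobs, capacity):
    load = [0] * len(capacity)
    for j, r in enumerate(assign):
        load[r] += jobs[j]
    return all(l <= c for l, c in zip(load, capacity))

def greedy_seed(jobs, price, capacity):
    """Stand-in for the LP step: each job goes to the cheapest resource with room.
    Raises ValueError if some job fits nowhere."""
    load, assign = [0] * len(capacity), []
    for w in jobs:
        r = min((r for r in range(len(price)) if load[r] + w <= capacity[r]),
                key=lambda r: price[r])
        load[r] += w
        assign.append(r)
    return assign

def genetic_search(jobs, price, capacity, seed, generations=200, pop_size=20, rng=None):
    rng = rng or random.Random(0)
    n, m = len(jobs), len(price)

    def fitness(a):                      # infeasible schedules are ranked last
        return total_cost(a, jobs, price) if fits(a, jobs, capacity) else float("inf")

    pop = [list(seed)] + [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]     # elitism: the best half always survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]    # one-point crossover
            if rng.random() < 0.3:       # occasional mutation
                child[rng.randrange(n)] = rng.randrange(m)
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

jobs, price, capacity = [4, 3, 2, 5], [1.0, 2.0, 3.0], [6, 6, 10]
seed = greedy_seed(jobs, price, capacity)
best = genetic_search(jobs, price, capacity, seed)
print(total_cost(seed, jobs, price), total_cost(best, jobs, price))   # 27.0, then a cost no higher
```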

62 citations

Journal Article

PetFMM--A dynamically load-balancing parallel fast multipole library

Felipe A. Cruz¹, Matthew G. Knepley², Lorena A. Barba³

University of Bristol¹, University of Chicago², Boston University³

15 May 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: An extensible parallel library for N‐body interactions utilizing the fast multipole method, designed to be extensible, with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes.

Abstract: Fast algorithms for the computation of $N$-body problems can be broadly classified into mesh-based interpolation methods, and hierarchical or multiresolution methods. To this last class belongs the well-known fast multipole method (FMM), which offers $\mathcal{O}(N)$ complexity. This paper presents an extensible parallel library for $N$-body interactions utilizing the FMM algorithm, built on the framework of PETSc. A prominent feature of this library is that it is designed to be extensible, with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based $N$-body algorithms in parallel, including both work estimates and communications estimates. With this model, we are able to implement a method to provide automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communications. Using a client application that performs the calculation of velocity induced by $N$ vortex particles, ample verification and testing of the library was performed. Strong scaling results are presented with close to a million particles on up to 64 processors, including both speedup and parallel efficiency. The library is currently able to achieve over 85% parallel efficiency for 64 processors. The software library is open source under the PETSc license; this guarantees the maximum impact to the scientific community and encourages peer-based collaboration for the extensions and applications.
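
A priori load balancing means assigning subtrees to processors from work estimates before any computation runs. As a much simpler stand-in (not PetFMM's method, which also models communication costs), the Python sketch below applies the classic longest-processing-time-first greedy rule to hypothetical per-subtree work estimates and reports a load-balance figure.

```python
import heapq

def lpt_partition(work, nprocs):
    """Greedy LPT: largest estimated subtree first, always to the least-loaded processor.
    work maps subtree id -> estimated cost; returns (owner map, per-processor loads)."""
    heap = [(0.0, p) for p in range(nprocs)]     # (current load, processor id)
    owner = {}
    for tree, w in sorted(work.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(heap)
        owner[tree] = p
        heapq.heappush(heap, (load + w, p))
    loads = [0.0] * nprocs
    for load, p in heap:
        loads[p] = load
    return owner, loads

estimates = {"t0": 7, "t1": 5, "t2": 4, "t3": 3, "t4": 1}   # hypothetical cost units
owner, loads = lpt_partition(estimates, nprocs=2)
balance = sum(loads) / (len(loads) * max(loads))           # 1.0 = perfect balance, ignores communication
print(owner, loads, balance)   # {'t0': 0, 't1': 1, 't2': 1, 't3': 0, 't4': 1} [10.0, 10.0] 1.0
```

A greedy rule like this balances computation only; a partitioner that also weighs communication estimates, as described in the abstract, would additionally try to keep neighbouring subtrees on the same processor.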

57 citations

Posted Content

CRDTs: Consistency without concurrency control

Mihai Leția, Nuno Preguiça, Marc Shapiro

06 Jul 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work exhibits a non-trivial CRDT: a shared edit buffer called Treedoc, and discusses how the CRDT concept can be generalised, and its limitations.

Abstract: A CRDT is a data type whose operations commute when they are concurrent. Replicas of a CRDT eventually converge without any complex concurrency control. As an existence proof, we exhibit a non-trivial CRDT: a shared edit buffer called Treedoc. We outline the design, implementation and performance of Treedoc. We discuss how the CRDT concept can be generalised, and its limitations.
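
A minimal way to see why commuting operations remove the need for concurrency control is to apply the same set of concurrent operations in every possible order and check that all replicas end in the same state. The Python sketch below uses a counter and a grow-only set, two of the simplest commutative types; it illustrates the CRDT property only and is not Treedoc, whose tree-structured identifiers are far more involved. The `Replica` class is hypothetical.

```python
import itertools

class Replica:
    """Toy operation-based CRDT state: a counter plus a grow-only set.
    Both operations commute, so delivery order does not matter."""
    def __init__(self):
        self.count = 0
        self.items = set()

    def apply(self, op):
        kind, arg = op
        if kind == "inc":
            self.count += arg          # addition commutes
        elif kind == "add":
            self.items.add(arg)        # set insertion commutes
        else:
            raise ValueError(kind)

    def state(self):
        return self.count, frozenset(self.items)

ops = [("inc", 2), ("add", "x"), ("inc", -1), ("add", "y")]
states = set()
for order in itertools.permutations(ops):   # every possible delivery order
    r = Replica()
    for op in order:
        r.apply(op)
    states.add(r.state())
print(len(states))   # 1: every order converges to the same state
```

Increments commute but are not idempotent, so an operation-based design like this one assumes every operation is delivered exactly once to each replica.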

Posted Content

Optimization of multiple vehicle routing problems using approximation algorithms

R. Nallusamy, K. Duraiswamy, R. Dhanalaksmi, P. Parthiban

01 Jan 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: After the application of the various heuristic techniques, it was found that the Genetic algorithm gave a better result and a more optimal tour for mVRPs in short computational time than other Algorithms due to the extensive search and constructive nature of the algorithm.

Abstract: This paper deals with generating an optimized route for multiple Vehicle Routing Problems (mVRP). We used a methodology of clustering the given cities depending upon the number of vehicles, and each cluster is allotted to a vehicle. The k-means clustering algorithm has been used for easy clustering of the cities. In this way, the mVRP is converted into VRPs, which are simpler to compute than the mVRP. After clustering, an optimized route is generated for each vehicle in its allotted cluster: once the cities had been allocated to the various vehicles, each cluster/tour was treated as an individual Vehicle Routing Problem, and the steps of a Genetic Algorithm were applied to the cluster and iterated to obtain the most optimal value of the distance after convergence. After the application of the various heuristic techniques, it was found that the Genetic Algorithm gave a better result and a more optimal tour for mVRPs in short computational time than other algorithms, due to the extensive search and constructive nature of the algorithm.
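
The cluster-first, route-second pipeline described above can be sketched in Python. Two assumptions to flag: k-means is seeded with the first k cities so the result is deterministic, and a nearest-neighbour heuristic replaces the paper's Genetic Algorithm for the per-cluster routing step, purely to keep the sketch short. The depot and coordinates are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's k-means; centroids are seeded with the first k points (assumption)."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

def nearest_neighbor_tour(depot, cities):
    """Greedy route: always visit the closest unvisited city, then return to the depot."""
    tour, here, todo = [depot], depot, list(cities)
    while todo:
        nxt = min(todo, key=lambda c: math.dist(here, c))
        todo.remove(nxt)
        tour.append(nxt)
        here = nxt
    return tour + [depot]

def tour_length(tour):
    return sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))

cities = [(0, 0), (10, 10), (1, 0), (0, 1), (10, 11), (11, 10)]
depot = (5, 5)
for vehicle, cluster in enumerate(kmeans(cities, k=2)):   # one cluster per vehicle
    route = nearest_neighbor_tour(depot, cluster)
    print(vehicle, route, round(tour_length(route), 2))
```

Splitting first keeps each routing subproblem small; the price is that cities near a cluster boundary are never considered for the other vehicle's route.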

Posted Content

A New Fuzzy Approach for Dynamic Load Balancing Algorithm

Abbas Karimi¹, Faraneh Zarafshan, Adznan B. Jantan, Abdul Rahman Ramli, M. Iqbal Saripan

Islamic Azad University¹

02 Oct 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper presents a new approach for implementing a dynamic load balancing algorithm with fuzzy logic, which can deal with the uncertainty and inconsistency in state information that previous algorithms do not handle effectively, and which shows better response time than round-robin and randomized algorithms.

Abstract: Load balancing is the process of improving the performance of a parallel and distributed system through redistribution of load among the processors (1-2). Most of the previous work in load balancing, and in distributed decision making in general, does not effectively take into account the uncertainty and inconsistency in state information, but with fuzzy logic we have the advantage of using crisp inputs. In this paper, we present a new approach for implementing a dynamic load balancing algorithm with fuzzy logic, which can deal with the uncertainty and inconsistency that previous algorithms do not handle; furthermore, our algorithm shows better response time than the round-robin and randomized algorithms, by 30.84% and 45.45% respectively.
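
To show how crisp load readings become a graded migration decision, here is a small fuzzy-inference sketch in Python. The abstract does not give the paper's membership functions or rule base, so everything below is an assumption for illustration: triangular memberships over queue length with made-up breakpoints, three hypothetical rules, min for AND, and a weighted average to defuzzify.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def load_memberships(q):
    """Fuzzify a queue length q into low/medium/high degrees (hypothetical breakpoints)."""
    return {
        "low": tri(q, -1, 0, 5),
        "medium": tri(q, 2, 5, 8),
        "high": 1.0 if q >= 8 else tri(q, 5, 8, 11),
    }

def transfer_degree(local_q, remote_q):
    """How strongly to move work from the local node to the remote one, in [0, 1].
    Rule 1: local high AND remote low     -> transfer fully (1.0)
    Rule 2: local medium AND remote low   -> transfer partially (0.5)
    Rule 3: remote NOT low                -> keep the work (0.0)"""
    L, R = load_memberships(local_q), load_memberships(remote_q)
    rules = [
        (min(L["high"], R["low"]), 1.0),
        (min(L["medium"], R["low"]), 0.5),
        (1.0 - R["low"], 0.0),
    ]
    strength = sum(w for w, _ in rules)
    return sum(w * out for w, out in rules) / strength if strength else 0.0

print(transfer_degree(9, 0), round(transfer_degree(5, 1), 3), transfer_degree(1, 9))   # 1.0 0.4 0.0
```

The graded output is what lets a fuzzy balancer hedge when state information is uncertain: a partly loaded remote node gets partial transfers instead of an all-or-nothing decision.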

Journal Article

FastFlow: Efficient Parallel Streaming Applications on Multi-core

Marco Aldinucci¹, Massimo Torquati, Massimiliano Meneghin

University of Turin¹

02 Sep 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper introduces FastFlow, a low-level programming framework based on lock-free queues explicitly designed to support high-level languages for streaming applications, and compares it with state-of-the-art programming frameworks such as Cilk, OpenMP, and Intel TBB.

Abstract: Shared memory multiprocessors are coming back into popularity thanks to the rapid spread of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers with optimising tools and programming frameworks is a current challenge. Few efforts have been made to support effective streaming applications on these architectures. In this paper we introduce FastFlow, a low-level programming framework based on lock-free queues explicitly designed to support high-level languages for streaming applications. We compare FastFlow with state-of-the-art programming frameworks such as Cilk, OpenMP, and Intel TBB. We experimentally demonstrate that FastFlow is always more efficient than all of them in a set of micro-benchmarks and on a real-world application; the speedup edge of FastFlow over other solutions can be large for fine-grained tasks, for example +35% over OpenMP, +226% over Cilk, and +96% over TBB for the alignment of protein P01111 against UniProt DB using the Smith-Waterman algorithm.
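
The building block behind lock-free streaming frameworks of this kind is a single-producer/single-consumer queue. The Python sketch below shows only the algorithmic idea, a Lamport-style ring buffer in which each index has exactly one writer; it is not FastFlow's C++ API, the names are hypothetical, and Python threads will not reproduce FastFlow's performance.

```python
import threading
import time

class SPSCQueue:
    """Bounded single-producer/single-consumer ring buffer in the style of Lamport's queue.
    Only the producer writes `tail` and only the consumer writes `head`, so no lock is used.
    Caveat: correctness here leans on CPython's GIL; a C/C++ version needs atomic
    loads/stores with acquire/release ordering on head and tail."""

    def __init__(self, capacity):
        self.buf = [None] * (capacity + 1)   # one slot stays empty to tell full from empty
        self.head = 0                         # next slot to read (consumer-owned)
        self.tail = 0                         # next slot to write (producer-owned)

    def push(self, item):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False                      # queue full
        self.buf[self.tail] = item            # write the slot first...
        self.tail = nxt                       # ...then publish it
        return True

    def pop(self):
        if self.head == self.tail:
            return None                       # queue empty (so items must not be None)
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item

def run(n, capacity=64):
    """Stream 0..n-1 from a producer thread to a consumer thread."""
    q, received = SPSCQueue(capacity), []

    def producer():
        for i in range(n):
            while not q.push(i):
                time.sleep(0)                 # back off briefly while the queue is full

    def consumer():
        while len(received) < n:
            item = q.pop()
            if item is None:
                time.sleep(0)                 # back off briefly while the queue is empty
            else:
                received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

print(run(1000) == list(range(1000)))   # True: FIFO order, nothing lost
```

Because each index has a single writer, neither side ever needs a lock; fine-grained streaming tasks benefit most from this, since lock overhead would otherwise dominate their cost.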

Posted Content

BlobSeer: How to Enable Efficient Versioning for Large Object Storage under Heavy Access Concurrency

Bogdan Nicolae¹, Gabriel Antoniu¹, Luc Bougé¹

French Institute for Research in Computer Science and Automation¹

07 May 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors propose an efficient versioning scheme allowing a large number of clients to concurrently read, write and append data to huge blobs that are fragmented and distributed at a very large scale.

Abstract: To accommodate the needs of large-scale distributed P2P systems, scalable data management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This paper addresses the problem of efficiently storing and accessing very large binary data objects (blobs). It proposes an efficient versioning scheme allowing a large number of clients to concurrently read, write and append data to huge blobs that are fragmented and distributed at a very large scale. Scalability under heavy concurrency is achieved thanks to an original metadata scheme, based on a distributed segment tree built on top of a Distributed Hash Table (DHT). Our approach has been implemented and tested experimentally within our BlobSeer prototype on the Grid'5000 testbed, using up to 175 nodes.
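
The versioning idea behind a segment-tree metadata scheme can be sketched with path copying: each write produces a new root, copies only the nodes on the path to the modified page, and shares every untouched subtree with the previous version, so readers of older versions are never disturbed. The Python sketch below is a single-process, hypothetical illustration; it does not model BlobSeer's distribution of tree nodes over a DHT, its handling of concurrent writers, or appends.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    # Immutable segment-tree node covering pages [lo, hi); leaves hold a page descriptor.
    lo: int
    hi: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    page: Optional[str] = None

def build(lo, hi):
    """Empty tree over pages [lo, hi)."""
    if hi - lo == 1:
        return Node(lo, hi)
    mid = (lo + hi) // 2
    return Node(lo, hi, build(lo, mid), build(mid, hi))

def write(root, idx, page):
    """Return a new root; only the nodes on the path to idx are copied."""
    if root.hi - root.lo == 1:
        return Node(root.lo, root.hi, page=page)
    mid = (root.lo + root.hi) // 2
    if idx < mid:
        return Node(root.lo, root.hi, write(root.left, idx, page), root.right)
    return Node(root.lo, root.hi, root.left, write(root.right, idx, page))

def read(root, idx):
    """Descend to the leaf for page idx in the given version."""
    while root.hi - root.lo > 1:
        root = root.left if idx < (root.lo + root.hi) // 2 else root.right
    return root.page

versions = [build(0, 8)]                          # version 0: empty 8-page blob
versions.append(write(versions[-1], 3, "p3-v1"))
versions.append(write(versions[-1], 6, "p6-v2"))
print(read(versions[1], 6), read(versions[2], 3), read(versions[2], 6))   # None p3-v1 p6-v2
print(versions[2].left is versions[1].left)       # True: the unchanged half is shared
```

Each write creates only O(log n) new nodes, and old roots remain valid snapshots, which is what lets readers proceed concurrently with writers.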

Posted Content

The Open Cloud Testbed: A Wide Area Testbed for Cloud Computing Utilizing High Performance Network Services

Robert L. Grossman, Yunhong Gu, Michal Sabala, Collin Bennett, Jonathan Seidman, Joe Mambretti

28 Jul 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The Open Cloud Testbed is designed and implemented, and several utilities to support the development of cloud computing systems and services are developed, including novel node and network provisioning services, a monitoring system, and an RPC system.

Abstract: Recently, a number of cloud platforms and services have been developed for data intensive computing, including Hadoop, Sector, CloudStore (formerly KFS), HBase, and Thrift. In order to benchmark the performance of these systems, to investigate their interoperability, and to experiment with new services based on flexible compute node and network provisioning capabilities, we have designed and implemented a large scale testbed called the Open Cloud Testbed (OCT). Currently the OCT has 120 nodes in four data centers: Baltimore, Chicago (two locations), and San Diego. In contrast to other cloud testbeds, which are in small geographic areas and which are based on commodity Internet services, the OCT is a wide area testbed and the four data centers are connected with a high performance 10Gb/s network, based on a foundation of dedicated lightpaths. This testbed can address the requirements of extremely large data streams that challenge other types of distributed infrastructure. We have also developed several utilities to support the development of cloud computing systems and services, including novel node and network provisioning services, a monitoring system, and an RPC system. In this paper, we describe the OCT architecture and monitoring system. We also describe some benchmarks that we developed and some interoperability studies we performed using these benchmarks.

Proceedings Article

High-Performance Cloud Computing: A View of Scientific Applications

Christian Vecchiola¹, Suraj Pandey¹, Rajkumar Buyya¹

University of Melbourne¹

11 Oct 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: Aneka, an enterprise cloud computing solution, harnesses the power of compute resources by relying on private and public clouds and delivers to users the desired Quality of Service (QoS).

Abstract: Scientific computing often requires the availability of a massive number of computers for performing large-scale experiments. Traditionally, these needs have been addressed by using high-performance computing solutions and installed facilities such as clusters and supercomputers, which are difficult to set up, maintain, and operate. Cloud computing provides scientists with a completely new model of utilizing the computing infrastructure. Compute resources, storage resources, as well as applications, can be dynamically provisioned (and integrated within the existing infrastructure) on a pay-per-use basis. These resources can be released when they are no longer needed. Such services are often offered within the context of a Service Level Agreement (SLA), which ensures the desired Quality of Service (QoS). Aneka, an enterprise Cloud computing solution, harnesses the power of compute resources by relying on private and public Clouds and delivers to users the desired QoS. Its flexible and service-based infrastructure supports multiple programming paradigms that allow Aneka to address a variety of different scenarios: from finance applications to computational science. As examples of scientific computing in the Cloud, we present a preliminary case study on using Aneka for the classification of gene expression data and the execution of an fMRI brain imaging workflow.

Journal Article

PT-Scotch: A tool for efficient parallel graph ordering

Cédric Chevalier¹, François Pellegrini¹

LaBRI¹

08 Jul 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: PT-Scotch uses the classical nested dissection approach but relies on several novel features to solve the parallel graph bipartitioning problem, and produces consistently better orderings than ParMeTiS on large numbers of processors.

Abstract: The parallel ordering of large graphs is a difficult problem, because on the one hand minimum degree algorithms do not parallelize well, and on the other hand obtaining high-quality orderings with the nested dissection algorithm requires efficient graph bipartitioning heuristics, the best sequential implementations of which are also hard to parallelize. This paper presents a set of algorithms, implemented in the PT-Scotch software package, which allows one to order large graphs in parallel, yielding orderings whose quality is only slightly worse than that of state-of-the-art sequential algorithms. Our implementation uses the classical nested dissection approach but relies on several novel features to solve the parallel graph bipartitioning problem. Thanks to these improvements, PT-Scotch produces consistently better orderings than ParMeTiS on large numbers of processors.
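
Nested dissection itself is easy to show on a regular grid, where a middle row or column is an obvious vertex separator: order both halves recursively, then number the separator last, so the two halves can be eliminated independently of each other. The Python sketch below is a textbook illustration under that assumption, with a hypothetical function name; PT-Scotch handles general graphs, where finding good separators requires the parallel bipartitioning heuristics the abstract describes.

```python
def nested_dissection(rows, cols):
    """Return an elimination order for the vertices (r, c) of a rows x cols grid graph."""
    order = []

    def recurse(r0, r1, c0, c1):            # half-open block [r0, r1) x [c0, c1)
        if r1 - r0 <= 0 or c1 - c0 <= 0:
            return
        if (r1 - r0) * (c1 - c0) <= 2:       # tiny blocks are ordered directly
            order.extend((r, c) for r in range(r0, r1) for c in range(c0, c1))
            return
        if c1 - c0 >= r1 - r0:               # cut the longer side with a separator line
            m = (c0 + c1) // 2
            recurse(r0, r1, c0, m)
            recurse(r0, r1, m + 1, c1)
            order.extend((r, m) for r in range(r0, r1))   # separator numbered last
        else:
            m = (r0 + r1) // 2
            recurse(r0, m, c0, c1)
            recurse(m + 1, r1, c0, c1)
            order.extend((m, c) for c in range(c0, c1))

    recurse(0, rows, 0, cols)
    return order

print(nested_dissection(3, 3))
# [(0, 0), (2, 0), (1, 0), (0, 2), (2, 2), (1, 2), (0, 1), (1, 1), (2, 1)]
```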

Posted Content

Using Graphics Processors for Parallelizing Hash-based Data Carving

Sylvain Collange¹, Yoginder S. Dandass², Marc Daumas¹, David Defour¹

University of Perpignan¹, Mississippi State University²

09 Jan 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: Results are presented from research into the use of Graphics Processing Units (GPUs) in detecting specific image file byte patterns in disk clusters and the GPU-based implementation outperforms the software implementation by a significant margin.

Abstract: The ability to detect fragments of deleted image files and to reconstruct these image files from all available fragments on disk is a key activity in the field of digital forensics. Although reconstruction of image files from the file fragments on disk can be accomplished by simply comparing the content of sectors on disk with the content of known files, this brute-force approach can be time consuming. This paper presents results from research into the use of Graphics Processing Units (GPUs) in detecting specific image file byte patterns in disk clusters. A unique identifying pattern for each disk sector is compared against patterns in known images. A pattern match indicates the potential presence of an image and flags the disk sector for further in-depth examination to confirm the match. The GPU-based implementation outperforms the software implementation by a significant margin.
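
The matching step described above can be sketched on the CPU in Python: hash every fixed-size sector of each known file into an index, then hash every sector of the disk image and flag hits for further examination. Assumptions to note: a 512-byte sector, SHA-1 as the identifying pattern (the abstract does not name one), hypothetical function names, and made-up file contents. The GPU parallelisation that is the paper's actual subject is not shown.

```python
import hashlib

SECTOR = 512   # bytes per sector (assumption)

def sector_hashes(data: bytes):
    """Yield (sector_index, digest) for every full sector of data."""
    for i in range(0, len(data) - SECTOR + 1, SECTOR):
        yield i // SECTOR, hashlib.sha1(data[i:i + SECTOR]).digest()

def build_index(known_files):
    """Map digest -> (file name, sector number within that file)."""
    index = {}
    for name, content in known_files.items():
        for n, h in sector_hashes(content):
            index.setdefault(h, (name, n))
    return index

def scan(disk: bytes, index):
    """Candidate matches as (disk sector, file name, file sector), to be verified later."""
    return [(s, *index[h]) for s, h in sector_hashes(disk) if h in index]

# A made-up 1024-byte "image" (two distinct sectors) scattered on a 4-sector disk.
image = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))
index = build_index({"cat.jpg": image})
disk = b"\x00" * SECTOR + image[SECTOR:] + b"\xff" * SECTOR + image[:SECTOR]
print(scan(disk, index))   # [(1, 'cat.jpg', 1), (3, 'cat.jpg', 0)]
```

Every sector is hashed and looked up independently, which is what makes the workload a natural fit for massively parallel hardware such as GPUs.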

Posted Content•

Byzantine Convergence in Robots Networks: The Price of Asynchrony

Zohir Bouzid, Maria Potop-Butucaru¹, Sébastien Tixeuil•Institutions (1)

French Institute for Research in Computer Science and Automation¹

04 Aug 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a deterministic algorithm for fully asynchronous, uni-dimensional robot networks that are prone to Byzantine failures has been proposed, where oblivious robots with arbitrary initial positions are required to eventually converge to an a priori unknown position despite a subset of them exhibiting Byzantine behavior.

Abstract: We study the convergence problem in fully asynchronous, uni-dimensional robot networks that are prone to Byzantine (i.e., malicious) failures. In these settings, oblivious anonymous robots with arbitrary initial positions are required to eventually converge to an a priori unknown position despite a subset of them exhibiting Byzantine behavior. Our contribution is twofold. We propose a deterministic algorithm that solves the problem in the most generic settings: fully asynchronous robots that operate in the non-atomic CORDA model. Our algorithm provides convergence in networks of 5f+1 robots, where f is the upper bound on the number of Byzantine robots. Additionally, we prove that 5f+1 is a lower bound whenever robot scheduling is fully asynchronous. This contrasts with previous results in partially synchronous robot networks, where 3f+1 robots are necessary and sufficient.
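
Why a bound on the number of Byzantine robots matters can be seen from a classic idea in approximate agreement: discard the f most extreme observed positions on each side before computing a target. The sketch below is a generic, one-shot, synchronous illustration under that assumption, not the authors' algorithm, which additionally copes with full asynchrony in CORDA and needs 5f+1 robots. The function name is hypothetical.

```python
def trimmed_midpoint(positions, f):
    """Midpoint of the observed positions after discarding the f lowest and f highest.

    If at most f positions are Byzantine and len(positions) > 2f, both kept
    extremes lie between the minimum and maximum correct positions, so the
    returned target cannot be dragged outside the correct robots' range.
    """
    n = len(positions)
    if n <= 2 * f:
        raise ValueError("need more than 2f positions to trim f from each side")
    s = sorted(positions)
    return (s[f] + s[n - 1 - f]) / 2
```

With correct robots at 0 through 4 and one Byzantine robot reporting 100, the target is 2.5, still inside the correct range.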

Posted Content•

Architecture and Performance Models for QoS-Driven Effective Peering of Content Delivery Networks

Mukaddim Pathan¹, Rajkumar Buyya¹•Institutions (1)

University of Melbourne¹

28 Jul 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors present an architecture to support peering arrangements between CDNs, based on a Virtual Organization (VO) model, which is achieved through proper policy management of negotiated Service Level Agreements (SLAs) between peers.

Abstract: The proprietary nature of existing Content Delivery Networks (CDNs) means they are closed and do not naturally cooperate. A CDN is expected to provide high-performance Internet content delivery through global coverage, which might be an obstacle for new CDN providers, as well as affecting the commercial viability of existing ones. Finding ways for distinct CDNs to coordinate and cooperate with other CDNs is necessary to achieve better overall service, as perceived by end-users, at lower cost. In this paper, we present an architecture to support peering arrangements between CDNs, based on a Virtual Organization (VO) model. Our approach promotes peering among providers, while upholding user-perceived performance. This is achieved through proper policy management of negotiated Service Level Agreements (SLAs) between peers. We also present a Quality of Service (QoS)-driven performance modeling approach for peering CDNs in order to predict the user-perceived performance. We show that peering between CDNs upholds user-perceived performance by satisfying the target QoS. The methodology presented in this paper provides CDNs with a way to dynamically distribute user requests to other peers according to different request-redirection policies. The model-based approach helps an overloaded CDN to return to a normal state by offloading excess requests to the peers. It also assists in making concrete QoS guarantees for a CDN provider. Our approach endeavors to achieve scalability and resource sharing among CDNs through effective peering in a user-transparent manner, thus evolving past the current landscape where non-cooperative and distinct CDNs exist.
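
As an illustration of how an overloaded CDN might offload excess requests to its peers, here is a greedy redirection policy. It is a hypothetical policy chosen for simplicity: loads and capacities are plain request counts, and peers with more spare capacity are preferred. The paper's SLA-driven policies and QoS performance model are not reproduced, and the function name is an assumption.

```python
def plan_redirection(load, capacity, peer_spare):
    """Split requests exceeding `capacity` across peers, largest spare capacity first.

    peer_spare maps peer name -> spare request capacity.
    Returns (plan, unserved): plan maps peer name -> requests redirected to it,
    and unserved counts excess requests no peer could absorb.
    """
    excess = max(0, load - capacity)
    plan = {}
    for peer, spare in sorted(peer_spare.items(), key=lambda kv: kv[1], reverse=True):
        if excess == 0:
            break
        take = min(spare, excess)
        if take > 0:
            plan[peer] = take
            excess -= take
    return plan, excess
```

For instance, a CDN with capacity 100 facing 150 requests and peers `{"A": 30, "B": 10, "C": 40}` sends 40 requests to C and 10 to A, leaving nothing unserved. A real policy would also weigh SLA terms and predicted response times, not just spare capacity.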

Posted Content•

Accelerator-Oriented Algorithm Transformation for Temporal Data Mining

Debprakash Patnaik¹, Sean P. Ponce¹, Yong Cao¹, Naren Ramakrishnan¹•Institutions (1)

Virginia Tech¹

13 May 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors present a novel implementation of a frequent episode discovery algorithm by revisiting "in-the-large" issues such as problem decomposition and memory access patterns.

Abstract: Temporal data mining algorithms are becoming increasingly important in many application domains including computational neuroscience, especially the analysis of spike train data. While application scientists have been able to readily gather multi-neuronal datasets, analysis capabilities have lagged behind, due to both a lack of powerful algorithms and a lack of access to powerful hardware platforms. The advent of GPU architectures such as Nvidia's GTX 280 offers a cost-effective option to bring these capabilities to the neuroscientist's desktop. Rather than port existing algorithms onto this architecture, we advocate the need for algorithm transformation, i.e., rethinking the design of the algorithm in a way that need not strictly mirror its serial implementation. We present a novel implementation of a frequent episode discovery algorithm by revisiting "in-the-large" issues such as problem decomposition as well as "in-the-small" issues such as data layouts and memory access patterns. This is non-trivial because frequent episode discovery does not lend itself to GPU-friendly data-parallel mapping strategies. Applications to many datasets and comparisons with CPU as well as prior GPU implementations showcase the advantages of our approach.
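
The core counting task in frequent episode discovery can be sketched serially: count non-overlapped occurrences of a serial episode (an ordered sequence of event types) in an event stream. This simplified version assumes events are bare symbols and ignores timestamps and inter-event time constraints, which spike-train analysis normally uses; it is not the authors' GPU transformation. It does show the state carried from one event to the next, one reason the problem resists a straightforward data-parallel mapping. The function name is hypothetical.

```python
def count_non_overlapped(events, episode):
    """Count non-overlapped occurrences of a serial episode in an event sequence.

    A single automaton advances on the next expected event type. Greedy
    advancing completes the earliest-ending occurrence, and restarting after
    each completion yields the maximum number of non-overlapped occurrences.
    """
    if not episode:
        return 0
    count, state = 0, 0
    for e in events:
        if e == episode[state]:
            state += 1
            if state == len(episode):
                count += 1
                state = 0  # restart after a complete occurrence
    return count
```

For example, the episode A→B→C occurs twice (non-overlapped) in `"ABCABC"` but only once in `"AABBCC"`.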

Posted Content•

New Results in the Simultaneous Message Passing Model

Rahul Jain¹, Hartmut Klauck•Institutions (1)

National University of Singapore¹

18 Feb 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The gap between the SMP model and the one-way model in communication complexity is investigated, and a partial function is exhibited that is exponentially more expensive in the former if quantum communication with entanglement is allowed, compared to the latter even in the deterministic case.

Abstract: Consider the following Simultaneous Message Passing (SMP) model for computing a relation f ⊆ X × Y × Z. In this model, Alice, on input x ∈ X, and Bob, on input y ∈ Y, send one message each to a third party, the Referee, who then outputs a z ∈ Z such that (x, y, z) ∈ f. We first show optimal 'Direct sum' results for all relations f in this model, in both the quantum and classical settings, in the situation where we allow shared resources (shared entanglement in quantum protocols and public coins in classical protocols) between Alice and the Referee and between Bob and the Referee, but no shared resource between Alice and Bob. This implies that, in this model, the communication required to compute k simultaneous instances of f, with constant success overall, is at least k times the communication required to compute one instance with constant success. This in particular implies an earlier Direct sum result, shown by Chakrabarti, Shi, Wirth and Yao, 2001, for the Equality function (and a class of other so-called robust functions), in the classical SMP model with no shared resources between any parties. Furthermore, we investigate the gap between the SMP model and the one-way model in communication complexity and exhibit a partial function that is exponentially more expensive in the former if quantum communication with entanglement is allowed, compared to the latter even in the deterministic case.

Posted Content•

Self-stabilizing Byzantine Agreement

Ariel Daliot¹, Danny Dolev¹•Institutions (1)

Hebrew University of Jerusalem¹

02 Aug 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, a self-stabilizing Byzantine agreement algorithm is proposed that reaches agreement among the correct nodes with an optimal ratio of faulty to correct nodes, using only the assumption of eventually bounded message transmission delay.

Abstract: Byzantine agreement algorithms typically assume implicit initial state consistency and synchronization among the correct nodes and then operate in coordinated rounds of information exchange to reach agreement based on the input values. The implicit initial assumptions enable correct nodes to infer the progression of the algorithm at other nodes from their local state. This paper considers a more severe fault model than permanent Byzantine failures, one in which the system can in addition be subject to severe transient failures that can temporarily throw the system out of its assumption boundaries. When the system eventually returns to behaving according to the presumed assumptions, it may be in an arbitrary state in which any synchronization among the nodes might be lost, and each node may be in an arbitrary state. We present a self-stabilizing Byzantine agreement algorithm that reaches agreement among the correct nodes with an optimal ratio of faulty to correct nodes, using only the assumption of eventually bounded message transmission delay. In the process of solving the problem, two additional important and challenging building blocks were developed: a unique self-stabilizing protocol for assigning consistent relative times to protocol initialization and a Reliable Broadcast primitive that progresses at the speed of actual message delivery time.

Posted Content•

A Boundary Approximation Algorithm for Distributed Sensor Networks

Michael I. Ham¹, Marko A. Rodriguez¹•Institutions (1)

Los Alamos National Laboratory¹

22 Jan 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors presented an algorithm for boundary approximation in locally-linked sensor networks that communicate with a remote monitoring station, which reduces remote station communication by approximating boundaries via a decentralized computation executed within the sensor network.

Abstract: We present an algorithm for boundary approximation in locally-linked sensor networks that communicate with a remote monitoring station. Delaunay triangulations and Voronoi diagrams are used to generate a sensor communication network and define boundary segments between sensors, respectively. The proposed algorithm reduces remote station communication by approximating boundaries via a decentralized computation executed within the sensor network. Moreover, the algorithm identifies boundaries based on differences between neighboring sensor readings, and not absolute sensor values. An analysis of the bandwidth consumption of the algorithm is presented and compared to two naive approaches. The proposed algorithm reduces the amount of remote communication (compared to the naive approaches) and becomes increasingly useful in networks with more nodes.
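
The local rule of comparing a sensor with its neighbors, rather than against an absolute level, can be sketched as below. Assumptions: the neighbor pairs are already known (for example, from the Delaunay triangulation the paper uses), a fixed difference threshold marks a boundary, and only the flagged pairs would be reported to the remote station. The threshold and function name are hypothetical, and the Voronoi geometry of the boundary segments is not computed here.

```python
def boundary_edges(readings, edges, threshold):
    """Neighbor pairs whose readings differ by more than `threshold`.

    readings: dict mapping sensor id -> reading
    edges: iterable of (u, v) neighbor pairs in the communication network
    Only differences are used, so adding a constant to every reading
    leaves the detected boundary unchanged.
    """
    return [(u, v) for u, v in edges if abs(readings[u] - readings[v]) > threshold]
```

With readings `{"a": 10, "b": 11, "c": 30, "d": 31}` and a threshold of 5, only the pairs that straddle the jump between the low and high regions are reported, and shifting every reading by the same offset yields the same pairs.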

Posted Content•

Domain Decomposition Based High Performance Parallel Computing

Mandhapati Raju, Siddhartha Kumar Khaitan

04 Nov 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The study deals with the parallelization of finite-element-based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers, motivated by significant improvements in the performance of sparse direct solvers.

Abstract: The study deals with the parallelization of finite-element-based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers. However, parallel sparse direct solvers are not found to exhibit good scalability. Hence, the parallelization of sparse direct solvers is done using domain decomposition techniques. A highly efficient sparse direct solver, PARDISO, is used in this study. The scalability of both the Newton and modified Newton algorithms is tested.
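
Domain decomposition with direct solvers can be made concrete with a Schur complement solve: the interior unknowns of each subdomain are eliminated by independent local direct solves (the part that runs in parallel), leaving a smaller system on the interface. The sketch below assumes a 1D Laplacian split into subdomains whose interiors are uncoupled, with NumPy's dense `numpy.linalg.solve` standing in for a sparse direct solver such as PARDISO; it is not the authors' Navier-Stokes code, and the function names are hypothetical.

```python
import numpy as np


def laplacian_1d(n):
    """Dense tridiagonal matrix tridiag(-1, 2, -1) of size n."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def schur_solve(A, b, subdomains, interface):
    """Solve A x = b by eliminating subdomain interiors onto the interface.

    Assumes no matrix entries couple the interiors of different subdomains.
    """
    G = np.array(interface)
    S = A[np.ix_(G, G)].copy()  # Schur complement, accumulated below
    g = b[G].copy()             # reduced right-hand side
    local = []
    for idx in subdomains:
        I = np.array(idx)
        A_II, A_IG, A_GI = A[np.ix_(I, I)], A[np.ix_(I, G)], A[np.ix_(G, I)]
        # Independent local direct solves: one per subdomain.
        X = np.linalg.solve(A_II, A_IG)
        y = np.linalg.solve(A_II, b[I])
        S -= A_GI @ X
        g -= A_GI @ y
        local.append((I, X, y))
    x = np.zeros_like(b)
    x_G = np.linalg.solve(S, g)  # small interface system
    x[G] = x_G
    for I, X, y in local:
        x[I] = y - X @ x_G       # back-substitute into each subdomain
    return x
```

For a 9-point problem, `schur_solve(laplacian_1d(9), b, [range(0, 4), range(5, 9)], [4])` agrees with a direct solve of the full system.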

Posted Content•

Leader Election Problem Versus Pattern Formation Problem

Yoann Dieudonné¹, Franck Petit², Vincent Villain¹•Institutions (2)

University of Picardie Jules Verne¹, Pierre-and-Marie-Curie University²

17 Feb 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: It is deduced that both problems are equivalent in CORDA for n ≥ 4 when the robots share the same chirality (and for n ≥ 5 when they do not); the solvability of both tasks turns out to be necessary in order to achieve more complex tasks.

Abstract: Leader election and arbitrary pattern formation are fundamental tasks for a set of autonomous mobile robots. The former consists of distinguishing a unique robot, called the leader. The latter aims at arranging the robots in the plane to form any given pattern. The solvability of both these tasks turns out to be necessary in order to achieve more complex tasks. In this paper, we study the relationship between these two tasks in a model, called CORDA, wherein the robots are weak in several aspects. In particular, they are fully asynchronous and they have no direct means of communication. They cannot remember any previous observation or computation performed in any previous step. Such robots are said to be oblivious. The robots are also uniform and anonymous, i.e., they all have the same program, using no global parameter (such as an identity) that would allow any of them to be differentiated. Moreover, we assume that none of them share any kind of common coordinate mechanism or common sense of direction, and we discuss the influence of a common handedness (i.e., chirality). In such a system, Flocchini et al. proved in [11] that it is possible to elect a leader for n ≥ 3 robots if it is possible to form any pattern for n ≥ 3. In this paper, we show that the converse is true for n ≥ 4 when the robots share a common handedness and for n ≥ 5 when they do not. Thus, we deduce that with chirality (resp. without chirality) both problems are equivalent for n ≥ 4 (resp. n ≥ 5) in CORDA.

Posted Content•

Managing Distributed MARF with SNMP

Serguei A. Mokhov, Lee Wei Huynh, Jian Li¹•Institutions (1)

Concordia University¹

30 May 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The scope of this project's work focuses on the research and prototyping of an extension of Distributed MARF such that its services can be managed through the most popular and familiar management protocol, SNMP.

Abstract: The scope of this project's work focuses on the research and prototyping of an extension of Distributed MARF such that its services can be managed through the most popular and familiar management protocol, SNMP. The rationale for choosing SNMP over MARF's proprietary management protocols is that it can be integrated with common network service and device management, so administrators can manage MARF nodes via an already familiar protocol, as well as monitor their performance, gather statistics, set desired configurations, etc., perhaps using the same management tools they have been using for other network devices and application servers.

Posted Content•

Introduction to Distributed Systems

Sabu M. Thampi

23 Nov 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: An overview of distributed computing systems is provided, ranging from simplistic data sharing to advanced systems supporting a multitude of services, and client/server computing, the World Wide Web and types of distributed systems are discussed.

Abstract: Computing has passed through many transformations since the birth of the first computing machines. Developments in technology have resulted in the availability of fast and inexpensive processors, and progress in communication technology has resulted in the availability of lucrative and highly capable computer networks. Among these, centralized networks have one component that is shared by users all the time. All resources are accessible, but there is a single point of control as well as a single point of failure. The integration of computer and networking technologies gave birth to a new paradigm of computing called distributed computing in the late 1970s. Distributed computing has changed the face of computing and offered quick and precise solutions for a variety of complex problems in different fields. Nowadays, we are fully immersed in the information age and spend more and more time communicating and gathering information through the Internet. The Internet keeps growing by orders of magnitude, enabling end systems to communicate in more and more different ways. Over the years, several methods have evolved to enable these developments, ranging from simplistic data sharing to advanced systems supporting a multitude of services. This article provides an overview of distributed computing systems. The definition, architecture, and characteristics of distributed systems and the various distributed computing fallacies are discussed at the beginning. Finally, the article discusses client/server computing, the World Wide Web, and types of distributed systems.

Posted Content•

Review of Replication Schemes for Unstructured P2P Networks

Sabu M. Thampi, K. Chandra Sekaran¹•Institutions (1)

National Institute of Technology, Karnataka¹

10 Mar 2009-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The Q-replication technique replicates objects autonomously to suitable sites based on object popularity and site selection logic by extensively employing Q-learning concept to increase availability of objects in unstructured P2P networks.

Abstract: To improve unstructured P2P system performance, one wants to minimize the number of peers that have to be probed in order to shorten the search time. One solution to the problem is to employ a replication scheme, which provides a high hit rate for target files. Replication can also provide load balancing and reduce access latency if the file is accessed by a large population of users. This paper briefly describes various replication schemes that have appeared in the literature and also focuses on a novel replication technique called Q-replication to increase the availability of objects in unstructured P2P networks. The Q-replication technique replicates objects autonomously to suitable sites based on object popularity and site selection logic by extensively employing the Q-learning concept.

I. Introduction

P2P traffic keeps increasing, and its share of overall network traffic is escalating quickly. The major operations associated with a decentralized unstructured P2P network can be summarized in two phases: (i) the query phase and (ii) the download phase. In the query phase, several query packets pass through the network searching for the target objects. The heterogeneity of these query packets creates local traffic disparity and congestion. The downloading of large objects in the download phase in response to requests also causes congestion in nodes. One efficient method for forestalling this load concentration is replicating the target objects at various sites. Replication increases object availability and fault tolerance. Single node failures, such as node crashes, can be tolerated by the system as a whole thanks to the redundancy introduced by replicas. If a host of a replica fails, requestors may access another host with a replica. Data replicated at more than one site helps minimize the number of hops before the data are found.
Replicating objects to multiple sites raises several issues, such as the selection of objects for replication, the granularity of replicas, and choosing an appropriate site for hosting a new replica [1]. The existing replication techniques address these issues differently. Excessive replication can waste network and peer resources, while, at the same time, scarcity of resources decreases the search success rate and increases the search delay. Two important aspects of replication, selecting the file for replication and selecting the site for hosting the new replica, have a direct impact on the performance of the system. Suitable criteria should be followed for selecting a file for replication. If popular files are not replicated appropriately, overwhelming requests from peers can cause network congestion and slow download speeds. Based on the location selection logic for hosting new replicas, replicated copies should be placed in proximity to peers who are likely to request the resource. This allows peers to search for and find desired resources, and reduces delays during search and download. The replication strategy should use different characteristics of peers, such as available storage, and attributes of their surrounding usage environment, such as network bandwidth, to determine which peers should perform replication and where the resulting replicas should be stored. The majority of existing replication methods only replicate objects to intermediate nodes between the query node and the target node. These replication schemes depend completely on the search path. Because of this, objects are unnecessarily replicated to low-performing nodes on the search path. Objects should not be replicated to low-performing nodes, since these nodes are not queried frequently by other nodes; excluding such nodes from replicating files can save bandwidth. In a network, many peers might decide to replicate the same file at the same time.
This should be managed; otherwise, the same file could be copied to nodes repeatedly. A replication scheme should be designed to handle the frequent failure of nodes in the network and provide a good success rate by maintaining replicas in other suitable peers. These various issues in replication demand a more assertive replication approach for unstructured P2P networks. This paper briefly describes various replication schemes that have appeared in the literature and also focuses on a novel replication technique called
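
The Q-learning concept that Q-replication relies on can be sketched as a standard tabular update plus an epsilon-greedy choice of replica site. Everything problem-specific here is an assumption for illustration: states stand for an object's popularity class, actions are candidate peers, and the reward (for example, hits served by a replica minus a storage cost) is left to the caller. This is not the paper's Q-replication algorithm, and the function names are hypothetical.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update; q maps (state, action) -> value (missing = 0)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_site(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice of a candidate replica site (explore with probability epsilon)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

After two rewarded placements of a popular object on `peer1`, its value rises from 0.5 to 0.975, and a greedy choice (`epsilon=0`) then selects `peer1`; a small positive epsilon keeps trying other peers so that better sites can still be discovered.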
