
Showing papers in "arXiv: Distributed, Parallel, and Cluster Computing in 2010"


Posted Content
TL;DR: The results demonstrate that the Cloud computing model has immense potential, as it offers significant performance gains with regard to response time and cost savings under dynamic workload scenarios.
Abstract: Cloud computing is offering utility-oriented IT services to users worldwide. Based on a pay-as-you-go model, it enables hosting of pervasive applications from consumer, scientific, and business domains. However, data centers hosting Cloud applications consume huge amounts of energy, contributing to high operational costs and carbon footprints. Therefore, we need Green Cloud computing solutions that can not only save energy for the environment but also reduce operational costs. This paper presents the vision, challenges, and architectural elements for energy-efficient management of Cloud computing environments. We focus on the development of dynamic resource provisioning and allocation algorithms that consider the synergy between various data center infrastructures (i.e., the hardware, power units, cooling and software), and holistically work to boost data center energy efficiency and performance. In particular, this paper proposes (a) architectural principles for energy-efficient management of Clouds; (b) energy-efficient resource allocation policies and scheduling algorithms that consider quality-of-service expectations and the power usage characteristics of devices; and (c) a novel software technology for energy-efficient management of Clouds. We have validated our approach by conducting a rigorous performance evaluation study using the CloudSim toolkit. The results demonstrate that the Cloud computing model has immense potential, as it offers significant performance gains with regard to response time and cost savings under dynamic workload scenarios.

603 citations


Posted Content
TL;DR: The Cost Modeling tool is evaluated using a case study of an organization that is considering the migration of some of its IT systems to the cloud, and it is shown how practitioners can use it to examine the costs of deploying their IT systems on the cloud.
Abstract: Cloud computing promises a radical shift in the provisioning of computing resource within the enterprise. This paper describes the challenges that decision makers face when assessing the feasibility of the adoption of cloud computing in their organisations, and describes our Cloud Adoption Toolkit, which has been developed to support this process. The toolkit provides a framework to support decision makers in identifying their concerns, and matching these concerns to appropriate tools/techniques that can be used to address them. Cost Modeling is the most mature tool in the toolkit, and this paper shows its effectiveness by demonstrating how practitioners can use it to examine the costs of deploying their IT systems on the cloud. The Cost Modeling tool is evaluated using a case study of an organization that is considering the migration of some of its IT systems to the cloud. The case study shows that running systems on the cloud using a traditional "always on" approach can be less cost effective, and the elastic nature of the cloud has to be used to reduce costs. Therefore, decision makers have to be able to model the variations in resource usage and their systems deployment options to obtain accurate cost estimates.

287 citations
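
The cost comparison described in the abstract above (an "always on" deployment versus one that uses the cloud's elasticity) can be illustrated with a small arithmetic sketch. This is not the Cloud Adoption Toolkit's Cost Modeling tool; the function names, the hourly rate, and the demand profile are all hypothetical values chosen for illustration. Prices are kept in integer cents to avoid floating-point rounding.

```python
def always_on_cost(peak_instances, rate_cents, hours):
    """Cost of keeping enough instances for peak demand running all the time."""
    return peak_instances * rate_cents * hours


def elastic_cost(hourly_demand, rate_cents):
    """Cost when instances are started and stopped to follow demand each hour."""
    return sum(hourly_demand) * rate_cents


# Hypothetical one-day workload: 4 instances during working hours, 1 otherwise.
demand = [4 if 9 <= hour < 17 else 1 for hour in range(24)]
rate_cents = 10  # hypothetical price per instance-hour, in cents

fixed = always_on_cost(max(demand), rate_cents, len(demand))
elastic = elastic_cost(demand, rate_cents)
print(f"always on: {fixed} cents, elastic: {elastic} cents")
```

Under these assumed numbers the always-on deployment pays for 96 instance-hours while the elastic one pays for 48, which is why an accurate estimate needs a model of how resource usage varies over time.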


Posted Content
TL;DR: The main contribution of this paper is to review and integrate the collection of these concepts, formalisms, and related results found in the literature into a unified coherent framework, called TVG (for time-varying graphs).
Abstract: The past few years have seen intensive research efforts carried out in some apparently unrelated areas of dynamic systems -- delay-tolerant networks, opportunistic-mobility networks, social networks -- obtaining closely related insights. Indeed, the concepts discovered in these investigations can be viewed as parts of the same conceptual universe; and the formal models proposed so far to express some specific concepts are components of a larger formal description of this universe. The main contribution of this paper is to integrate the vast collection of concepts, formalisms, and results found in the literature into a unified framework, which we call TVG (for time-varying graphs). Using this framework, it is possible to express directly in the same formalism not only the concepts common to all those different areas, but also those specific to each. Based on this definitional work, employing both existing results and original observations, we present a hierarchical classification of TVGs; each class corresponds to a significant property examined in the distributed computing literature. We then examine how TVGs can be used to study the evolution of network properties, and propose different techniques, depending on whether the indicators for these properties are a-temporal (as in the majority of existing studies) or temporal. Finally, we briefly discuss the introduction of randomness in TVGs.

253 citations


Journal ArticleDOI
TL;DR: In this article, the authors develop an intelligent scheduling technique for virtual machines which monitors the workload types and deadlines, and calculates the system overhead in real time to maximize the number of jobs finishing within their agreed deadlines.
Abstract: The primary motivation for the uptake of virtualization has been resource isolation, capacity management and resource customization, allowing resource providers to consolidate their resources in virtual machines. Various approaches have been taken to integrate virtualization into scientific Grids, especially in the arena of High Performance Computing (HPC), to run grid jobs in virtual machines, thus enabling better provisioning of the underlying resources and customization of the execution environment at runtime. Despite the gains, the virtualization layer also incurs a performance penalty, and it is not well understood how such an overhead will impact the performance of systems where jobs are scheduled with tight deadlines. This overhead varies with the type of workload (memory intensive, CPU intensive or network I/O bound) and could lead to unpredictable deadline estimation for the running jobs in the system. In our study, we have attempted to tackle this problem by developing an intelligent scheduling technique for virtual machines which monitors the workload types and deadlines, and calculates the system overhead in real time to maximize the number of jobs finishing within their agreed deadlines.

233 citations


Posted Content
TL;DR: This paper discusses some of the research challenges for cloud computing from an enterprise or organizational perspective, and puts them in context by reviewing the existing body of literature in cloud computing.
Abstract: Cloud computing represents a shift away from computing as a product that is purchased, to computing as a service that is delivered to consumers over the internet from large-scale data centers - or "clouds". This paper discusses some of the research challenges for cloud computing from an enterprise or organizational perspective, and puts them in context by reviewing the existing body of literature in cloud computing. Various research challenges relating to the following topics are discussed: the organizational changes brought about by cloud computing; the economic and organizational implications of its utility billing model; the security, legal and privacy issues that cloud computing raises. It is important to highlight these research challenges because cloud computing is not simply about a technological improvement of data centers but a fundamental change in how IT is provisioned and used. This type of research has the potential to influence wider adoption of cloud computing in enterprise, and in the consumer market too.

231 citations


Posted Content
TL;DR: This paper is the first systematic review of peer-reviewed academic research published in cloud computing, and aims to provide an overview of the swiftly developing advances in the technical foundations of cloud computing and their research efforts.
Abstract: Cloud computing is the latest effort in delivering computing resources as a service. It represents a shift away from computing as a product that is purchased, to computing as a service that is delivered to consumers over the internet from large-scale data centres - or "clouds". Whilst cloud computing is gaining growing popularity in the IT industry, academia appeared to be lagging behind the rapid developments in this field. This paper is the first systematic review of peer-reviewed academic research published in this field, and aims to provide an overview of the swiftly developing advances in the technical foundations of cloud computing and their research efforts. Structured along the technical aspects on the cloud agenda, we discuss lessons from related technologies; advances in the introduction of protocols, interfaces, and standards; techniques for modelling and building clouds; and new use-cases arising through cloud computing.

164 citations


Posted Content
TL;DR: In this paper, the authors proposed a federated cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource and network conditions.
Abstract: Cloud computing providers have set up several data centers at different geographical locations over the Internet in order to optimally serve the needs of their customers around the world. However, existing systems do not support mechanisms and policies for dynamically coordinating load distribution among different Cloud-based data centers in order to determine the optimal location for hosting application services to achieve reasonable QoS levels. Further, Cloud computing providers are unable to predict the geographic distribution of users consuming their services, hence the load coordination must happen automatically, and the distribution of services must change in response to changes in the load. To counter this problem, we advocate the creation of a federated Cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource and network conditions. The overall goal is to create a computing environment that supports dynamic expansion or contraction of capabilities (VMs, services, storage, and database) for handling sudden variations in service demands. This paper presents the vision, challenges, and architectural elements of InterCloud for utility-oriented federation of Cloud computing environments. The proposed InterCloud environment supports scaling of applications across multiple vendor clouds. We have validated our approach by conducting a rigorous performance evaluation study using the CloudSim toolkit. The results demonstrate that the federated Cloud computing model has immense potential, as it offers significant performance gains with regard to response time and cost savings under dynamic workload scenarios.

155 citations


Posted Content
TL;DR: In this article, the authors give a poly-logarithmic lower bound on the complexity of local computation for a large class of optimization problems including minimum vertex cover, minimum dominating set, maximum matching, maximal independent set, and maximal matching.
Abstract: The questions of what can be computed, and how efficiently, are at the core of computer science. Not surprisingly, in distributed systems and networking research, an equally fundamental question is what can be computed in a distributed fashion. More precisely, if nodes of a network must base their decision on information in their local neighborhood only, how well can they compute or approximate a global (optimization) problem? In this paper we give the first poly-logarithmic lower bound on such local computation for (optimization) problems including minimum vertex cover, minimum (connected) dominating set, maximum matching, maximal independent set, and maximal matching. In addition we present a new distributed algorithm for solving general covering and packing linear programs. For some problems this algorithm is tight with the lower bounds, for others it is a distributed approximation scheme. Together, our lower and upper bounds establish the local computability and approximability of a large class of problems, characterizing how much local information is required to solve these tasks.

134 citations


Journal ArticleDOI
TL;DR: Gossip algorithms are attractive for in-network processing in sensor networks because they do not require any specialized routing, there is no bottleneck or single point of failure, and they are robust to unreliable wireless network conditions.
Abstract: Gossip algorithms are attractive for in-network processing in sensor networks because they do not require any specialized routing, there is no bottleneck or single point of failure, and they are robust to unreliable wireless network conditions. Recently, there has been a surge of activity in the computer science, control, signal processing, and information theory communities, developing faster and more robust gossip algorithms and deriving theoretical performance guarantees. This article presents an overview of recent work in the area. We describe convergence rate results, which are related to the number of transmitted messages and thus the amount of energy consumed in the network for gossiping. We discuss issues related to gossiping over wireless links, including the effects of quantization and noise, and we illustrate the use of gossip algorithms for canonical signal processing tasks including distributed estimation, source localization, and compression.

127 citations
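
The in-network averaging idea behind the gossip algorithms surveyed in the abstract above can be sketched as randomized pairwise averaging: in each round one randomly chosen link is activated and its two endpoints replace their values with their mean. This is a minimal sketch, not code from the article; the ring topology, the initial readings, the round count, and the function name `gossip_average` are assumptions made for illustration, and wireless effects such as quantization and noise are ignored.

```python
import random


def gossip_average(values, edges, rounds, seed=0):
    """Randomized pairwise gossip: each round, one random edge averages its endpoints."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)          # activate one link at random
        x[i] = x[j] = (x[i] + x[j]) / 2   # both endpoints adopt the pair's mean
    return x


# Hypothetical sensor readings on a ring of six nodes.
values = [10.0, 0.0, 4.0, 6.0, 2.0, 8.0]
edges = [(k, (k + 1) % len(values)) for k in range(len(values))]
result = gossip_average(values, edges, rounds=2000)
print(result)
```

Each exchange preserves the sum of the values, so every node drifts toward the global average without any routing or central coordinator. The number of rounds needed is what the convergence rate results in the article relate to the number of transmitted messages.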


Posted Content
TL;DR: The impact of the migration topology on the performance of a PGOA which uses the Island Model is analyzed and first conclusions that emerge from the conducted experiments are drawn.
Abstract: Parallel Global Optimization Algorithms (PGOA) provide an efficient way of dealing with hard optimization problems. One method of parallelization of GOAs that is frequently applied and commonly found in the contemporary literature is the so-called Island Model (IM). In this paper we analyze the impact of the migration topology on the performance of a PGOA which uses the Island Model. In particular we consider parallel Differential Evolution and Simulated Annealing with Adaptive Neighborhood and draw first conclusions that emerge from the conducted experiments.

90 citations


Posted Content
TL;DR: PaGMO is built to tackle high-dimensional global optimisation problems, and it has been successfully used to find solutions to real-life engineering problems among which the preliminary design of interplanetary spacecraft trajectories, the inverse design of nano-structured radiators and the design of non-reactive controllers for planetary rovers are found.
Abstract: A software platform for global optimisation, called PaGMO, has been developed within the Advanced Concepts Team (ACT) at the European Space Agency, and was recently released as an open-source project. PaGMO is built to tackle high-dimensional global optimisation problems, and it has been successfully used to find solutions to real-life engineering problems among which the preliminary design of interplanetary spacecraft trajectories - both chemical (including multiple flybys and deep-space maneuvers) and low-thrust (limited, at the moment, to single phase trajectories), the inverse design of nano-structured radiators and the design of non-reactive controllers for planetary rovers. Featuring an arsenal of global and local optimisation algorithms (including genetic algorithms, differential evolution, simulated annealing, particle swarm optimisation, compass search, improved harmony search, and various interfaces to libraries for local optimisation such as SNOPT, IPOPT, GSL and NLopt), PaGMO is at its core a C++ library which employs an object-oriented architecture providing a clean and easily-extensible optimisation framework. Adoption of multi-threaded programming ensures the efficient exploitation of modern multi-core architectures and allows for a straightforward implementation of the island model paradigm, in which multiple populations of candidate solutions asynchronously exchange information in order to speed-up and improve the optimisation process. In addition to the C++ interface, PaGMO's capabilities are exposed to the high-level language Python, so that it is possible to easily use PaGMO in an interactive session and take advantage of the numerous scientific Python libraries available.

Posted Content
TL;DR: In this paper, the authors present an algorithm for Byzantine agreement with high probability against an adaptive adversary, which can take over processors at any time during the protocol, up to the point of taking over arbitrarily close to a 1/3 fraction.
Abstract: We describe an algorithm for Byzantine agreement that is scalable in the sense that each processor sends only $\tilde{O}(\sqrt{n})$ bits, where $n$ is the total number of processors. Our algorithm succeeds with high probability against an adaptive adversary, which can take over processors at any time during the protocol, up to the point of taking over arbitrarily close to a 1/3 fraction. We assume synchronous communication but a rushing adversary. Moreover, our algorithm works in the presence of flooding: processors controlled by the adversary can send out any number of messages. We assume the existence of private channels between all pairs of processors but make no other cryptographic assumptions. Finally, our algorithm has latency that is polylogarithmic in $n$. To the best of our knowledge, ours is the first algorithm to solve Byzantine agreement against an adaptive adversary, while requiring $o(n^{2})$ total bits of communication.

Posted Content
TL;DR: This work shows the influence of thread pinning on performance using the well-known OpenMP STREAM triad benchmark, and uses the affinity and hardware counter tools to study the performance of a stencil code specifically optimized to utilize shared caches on multicore chips.
Abstract: Exploiting the performance of today's processors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that addresses four key problems: Probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and toggling hardware prefetchers. An API for using the performance counting features from user code is also included. We clearly state the differences to the widely used PAPI interface. To demonstrate the capabilities of the tool set we show the influence of thread pinning on performance using the well-known OpenMP STREAM triad benchmark, and use the affinity and hardware counter tools to study the performance of a stencil code specifically optimized to utilize shared caches on multicore chips.

Posted Content
TL;DR: It is considered how underused computing resources within an enterprise may be harnessed to improve utilization and create an elastic computing infrastructure.
Abstract: We consider how underused computing resources within an enterprise may be harnessed to improve utilization and create an elastic computing infrastructure. Most current cloud provision involves a data center model, in which clusters of machines are dedicated to running cloud infrastructure software. We propose an additional model, the ad hoc cloud, in which infrastructure software is distributed over resources harvested from machines already in existence within an enterprise. In contrast to the data center cloud model, resource levels are not established a priori, nor are resources dedicated exclusively to the cloud while in use. A participating machine is not dedicated to the cloud, but has some other primary purpose such as running interactive processes for a particular user. We outline the major implementation challenges and one approach to tackling them.

Posted Content
TL;DR: CloneCloud is a flexible application partitioner and execution runtime that enables unmodified mobile applications running in an application-level virtual machine to seamlessly off-load part of their execution from mobile devices onto device clones operating in a computational cloud.
Abstract: Mobile applications are becoming increasingly ubiquitous and provide ever richer functionality on mobile devices. At the same time, such devices often enjoy strong connectivity with more powerful machines ranging from laptops and desktops to commercial clouds. This paper presents the design and implementation of CloneCloud, a system that automatically transforms mobile applications to benefit from the cloud. The system is a flexible application partitioner and execution runtime that enables unmodified mobile applications running in an application-level virtual machine to seamlessly off-load part of their execution from mobile devices onto device clones operating in a computational cloud. CloneCloud uses a combination of static analysis and dynamic profiling to optimally and automatically partition an application so that it migrates, executes in the cloud, and re-integrates computation in a fine-grained manner that makes efficient use of resources. Our evaluation shows that CloneCloud can achieve up to 21.2x speedup of smartphone applications we tested and it allows different partitioning for different inputs and networks.

Posted Content
TL;DR: This work has developed a service-based distributed structure for the parallel execution of match workflows and proposes different strategies to partition the input data and generate multiple match tasks that can be independently executed.
Abstract: Entity matching is an important and difficult step for integrating web data. To reduce the typically high execution time for matching we investigate how we can perform entity matching in parallel on a distributed infrastructure. We propose different strategies to partition the input data and generate multiple match tasks that can be independently executed. One of our strategies supports both blocking, to reduce the search space for matching, and parallel matching, to improve efficiency. Special attention is given to the number and size of data partitions as they impact the overall communication overhead and memory requirements of individual match tasks. We have developed a service-based distributed infrastructure for the parallel execution of match workflows. We evaluate our approach in detail for different match strategies for matching real-world product data of different web shops. We also consider caching of input entities and affinity-based scheduling of match tasks.

Posted Content
TL;DR: This paper presents an optimal technique to map virtual machines to physical machines (nodes) such that the number of required nodes is minimized and provides two approaches based on linear programming and quadratic programming techniques that significantly improve over the existing theoretical bounds and efficiently solve the problem of virtual machine placement in data centers.
Abstract: Cloud computing provides a computing platform for the users to meet their demands in an efficient, cost-effective way. Virtualization technologies are used in the clouds to aid the efficient usage of hardware. Virtual machines (VMs) are utilized to satisfy the user needs and are placed on physical machines (PMs) of the cloud for effective usage of hardware resources and electricity in the cloud. Optimizing the number of PMs used helps in cutting down the power consumption by a substantial amount. In this paper, we present an optimal technique to map virtual machines to physical machines (nodes) such that the number of required nodes is minimized. We provide two approaches based on linear programming and quadratic programming techniques that significantly improve over the existing theoretical bounds and efficiently solve the problem of virtual machine (VM) placement in data centers.

Posted Content
TL;DR: The Cloud Adoption Toolkit, whilst still under development, shows signs that it is a useful tool for decision makers as it helps address the feasibility challenges of cloud adoption in the enterprise.
Abstract: Cloud computing promises a radical shift in the provisioning of computing resource within the enterprise. This paper: i) describes the challenges that decision makers face when attempting to determine the feasibility of the adoption of cloud computing in their organisations; ii) illustrates a lack of existing work to address the feasibility challenges of cloud adoption in the enterprise; iii) introduces the Cloud Adoption Toolkit that provides a framework to support decision makers in identifying their concerns, and matching these concerns to appropriate tools/techniques that can be used to address them. The paper adopts a position paper methodology such that case study evidence is provided, where available, to support claims. We conclude that the Cloud Adoption Toolkit, whilst still under development, shows signs that it is a useful tool for decision makers as it helps address the feasibility challenges of cloud adoption in the enterprise.

Posted Content
TL;DR: A virtual machine resource manager (Cloud Scheduler) for distributed compute clouds that boots and manages the user-customized virtual machines in response to a user's job submission and presents results on its use on both science and commercial clouds.
Abstract: The availability of Infrastructure-as-a-Service (IaaS) computing clouds gives researchers access to a large set of new resources for running complex scientific applications. However, exploiting cloud resources for large numbers of jobs requires significant effort and expertise. In order to make it simple and transparent for researchers to deploy their applications, we have developed a virtual machine resource manager (Cloud Scheduler) for distributed compute clouds. Cloud Scheduler boots and manages the user-customized virtual machines in response to a user's job submission. We describe the motivation and design of the Cloud Scheduler and present results on its use on both science and commercial clouds.

Posted Content
TL;DR: In this paper, a discrete beeping communication model is presented, where nodes have no information regarding the local or global structure of the network, don't have access to synchronized clocks and are woken up by an adversary.
Abstract: We present the discrete beeping communication model, which assumes nodes have minimal knowledge about their environment and severely limited communication capabilities. Specifically, nodes have no information regarding the local or global structure of the network, don't have access to synchronized clocks and are woken up by an adversary. Moreover, instead of communicating through messages they rely solely on carrier sensing to exchange information. We study the problem of interval coloring, a variant of vertex coloring specially suited for the studied beeping model. Given a set of resources, the goal of interval coloring is to assign every node a large contiguous fraction of the resources, such that neighboring nodes share no resources. To highlight the importance of the discreteness of the model, we contrast it against a continuous variant described in [17]. We present an $O(1)$ time algorithm that terminates with probability 1 and assigns an interval of size $\Omega(T/\Delta)$ that repeats every $T$ time units to every node of the network. This improves an $O(\log n)$ time algorithm with the same guarantees presented in \cite{infocom09}, and accentuates the unrealistic assumptions of the continuous model. Under the more realistic discrete model, we present a Las Vegas algorithm that solves $\Omega(T/\Delta)$-interval coloring in $O(\log n)$ time with high probability and describe how to adapt the algorithm for dynamic networks where nodes may join or leave. For constant degree graphs we prove a lower bound of $\Omega(\log n)$ on the time required to solve interval coloring for this model against randomized algorithms. This lower bound implies that our algorithm is asymptotically optimal for constant degree graphs.

Posted Content
TL;DR: In this paper, the authors present a taxonomy of energy-efficient design of computing systems covering the hardware, operating system, virtualization and data center levels, and discuss causes and problems of high power / energy consumption.
Abstract: Traditionally, the development of computing systems has been focused on performance improvements driven by the demand of applications from consumer, scientific and business domains. However, the ever increasing energy consumption of computing systems has started to limit further performance growth due to overwhelming electricity bills and carbon dioxide footprints. Therefore, the goal of the computer system design has been shifted to power and energy efficiency. To identify open challenges in the area and facilitate future advancements it is essential to synthesize and classify the research on power and energy-efficient design conducted to date. In this work we discuss causes and problems of high power / energy consumption, and present a taxonomy of energy-efficient design of computing systems covering the hardware, operating system, virtualization and data center levels. We survey various key works in the area and map them to our taxonomy to guide future design and development efforts. This chapter is concluded with a discussion of advancements identified in energy-efficient computing and our vision on future research directions.

Posted Content
TL;DR: A benchmark called MalStone is introduced that is specifically designed to measure the performance of cloud computing middleware that supports the type of data intensive computing common when building data mining models.
Abstract: Developing data mining algorithms that are suitable for cloud computing platforms is currently an active area of research, as is developing cloud computing platforms appropriate for data mining. Currently, the most common benchmark for cloud computing is the Terasort (and related) benchmarks. Although the Terasort Benchmark is quite useful, it was not designed for data mining per se. In this paper, we introduce a benchmark called MalStone that is specifically designed to measure the performance of cloud computing middleware that supports the type of data intensive computing common when building data mining models. We also introduce MalGen, which is a utility for generating data on clouds that can be used with MalStone.

Posted Content
TL;DR: A new format for storing sparse matrices is suggested, designed to perform well mainly on GPU devices, and its implementation in CUDA is presented.
Abstract: A new format for storing sparse matrices is suggested. It is designed to perform well mainly on GPU devices. Its implementation in CUDA is presented. Its performance is tested on 1600 different types of matrices. This format is compared in detail with a hybrid format, and strong and weak points of both formats are shown.

Journal ArticleDOI
TL;DR: This paper describes MapReduce, shows how image coaddition is adapted to the MapReduce framework, and reports experimental results comparing the performance of a number of optimizations to the basic approach.
Abstract: In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computation challenges such as anomaly detection and classification, and moving object tracking. Since such studies benefit from the highest quality data, methods such as image coaddition (stacking) will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources or transient objects, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this paper we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data is partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources, e.g., Amazon's EC2. We report on our experience implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image coaddition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
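
The way image coaddition fits the map, shuffle, and reduce pattern described in the abstract above can be sketched in plain Python. This is a simplified illustration, not the authors' Hadoop pipeline: it assumes each image is already registered to a named sky tile and flattened to a list of pixel values, and it stacks by taking the per-pixel mean. The names `map_image`, `reduce_tile`, `coadd`, and the tile labels are hypothetical.

```python
from collections import defaultdict


def map_image(image):
    """Map step: emit (tile_id, pixels) so images of the same sky tile meet at one reducer."""
    yield image["tile"], image["pixels"]


def reduce_tile(pixel_lists):
    """Reduce step: coadd (stack) the images of one tile by per-pixel mean."""
    count = len(pixel_lists)
    return [sum(column) / count for column in zip(*pixel_lists)]


def coadd(images):
    """Run map, group by key (the shuffle), and reduce, all in one process."""
    groups = defaultdict(list)
    for image in images:
        for tile, pixels in map_image(image):
            groups[tile].append(pixels)
    return {tile: reduce_tile(pixel_lists) for tile, pixel_lists in groups.items()}


# Hypothetical input: two exposures of tile "A" and one of tile "B".
images = [
    {"tile": "A", "pixels": [1.0, 2.0, 3.0]},
    {"tile": "A", "pixels": [3.0, 2.0, 1.0]},
    {"tile": "B", "pixels": [5.0, 5.0, 5.0]},
]
stacked = coadd(images)
print(stacked)
```

In a real deployment the grouping step is performed by the framework across worker nodes, and the map step would also handle projecting each image onto a common pixel grid, which this sketch leaves out.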

Posted Content
TL;DR: This paper introduces PreciseTracer, a precise, scalable, and online request tracing tool for multi-tier services of black boxes, which achieves higher tracing accuracy and faster response time than WAP5, a black-box tracing approach.
Abstract: As more and more multi-tier services are developed from commercial off-the-shelf components or heterogeneous middleware without source code available, both developers and administrators need a request tracing tool to (1) know exactly how a user request of interest travels through services of black boxes; and (2) obtain macro-level user request behavior information of services without having to wade through massive logs. Previous research efforts either accept the imprecision of probabilistic correlation methods or present precise but unscalable tracing approaches that have to collect and analyze large amounts of logs. Besides, previous precise request tracing approaches for black boxes fail to propose macro-level abstractions that enable debugging performance-in-the-large, and hence users have to manually interpret massive logs. This paper introduces a precise, scalable and online request tracing tool, named PreciseTracer, for multi-tier services of black boxes. Our contributions are four-fold: first, we propose a precise request tracing algorithm for multi-tier services of black boxes, which only uses application-independent knowledge; second, we present micro-level and macro-level abstractions, namely component activity graphs and dominated causal path patterns, to represent, respectively, the causal paths of each individual request and the repeatedly executed causal paths that account for significant fractions; third, we present two mechanisms, tracing on demand and sampling, to significantly increase system scalability; fourth, we design and implement an online request tracing tool. PreciseTracer's fast response, low overhead and scalability make it a promising tracing tool for large-scale production systems.

Posted Content
TL;DR: This work proposes and evaluates two MapReduce-based implementations for Sorted Neighborhood blocking that either use multiple MapReduce jobs or apply a tailored data replication.
Abstract: Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity resolution. In particular, we propose and evaluate two MapReduce-based implementations for Sorted Neighborhood blocking that either use multiple MapReduce jobs or apply a tailored data replication.
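To make the partitioning problem concrete, the sketch below shows Sorted Neighborhood blocking with a sliding window of size `w`, and one way boundary replication might recover the pairs that are lost when the sorted list is split across reducers. The abstract does not give the details of the authors' replication scheme, so the rule used here (copy the last `w - 1` predecessors into each partition) is an assumption, and the function names are hypothetical. Everything runs in one process rather than on MapReduce.

```python
def sn_pairs(sorted_records, w):
    """Sequential Sorted Neighborhood: pair each record with the
    w - 1 records that follow it in sort order."""
    pairs = set()
    for i, a in enumerate(sorted_records):
        for b in sorted_records[i + 1:i + w]:
            pairs.add((a, b))
    return pairs


def partitioned_sn_pairs(records, key, w, num_parts):
    """Partitioned variant: each partition also receives the w - 1
    records that precede it, so windows can span partition boundaries."""
    ordered = sorted(records, key=key)
    if not ordered:
        return set()
    size = max(1, -(-len(ordered) // num_parts))  # ceiling division
    pairs = set()
    for start in range(0, len(ordered), size):
        lo = max(0, start - (w - 1))
        chunk = ordered[lo:start + size]
        replicated = start - lo  # number of records copied from earlier partitions
        for i, a in enumerate(chunk):
            for j in range(i + 1, min(i + w, len(chunk))):
                if j < replicated:
                    continue  # both records are copies; an earlier partition emits this pair
                pairs.add((a, chunk[j]))
    return pairs
```

With this rule every pair is produced by exactly one partition, so the partitioned result matches the sequential one at the cost of `w - 1` extra records per partition.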

Posted Content
TL;DR: This work presents the first parallel algorithms that achieve increasing speed ups for an unbounded number of processors, based on two-dimensional block distribution of sparse matrices where serial sections use a novel hypersparse kernel for scalability.
Abstract: Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speed ups for an unbounded number of processors. Our algorithms are based on two-dimensional block distribution of sparse matrices where serial sections use a novel hypersparse kernel for scalability. We give a state-of-the-art MPI implementation of one of our algorithms. Our experiments show scaling up to thousands of processors on a variety of test scenarios.
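The idea of a two-dimensional block distribution can be sketched serially: both operands are cut into a `g` by `g` grid of sparse blocks, and the block of the product owned by grid position `(bi, bj)` is accumulated from the matching block row of A and block column of B. This is only an illustration under simplifying assumptions: it uses plain dictionaries rather than the hypersparse kernel mentioned in the abstract, it loops over grid positions instead of running on MPI processes, and all names are hypothetical.

```python
from collections import defaultdict


def split_blocks(M, n, g):
    """Distribute a sparse n x n matrix {(i, j): value} over a g x g grid."""
    b = -(-n // g)  # block edge length, rounded up
    blocks = defaultdict(dict)
    for (i, j), v in M.items():
        blocks[(i // b, j // b)][(i, j)] = v
    return blocks


def local_spgemm(A, B, C):
    """Accumulate C += A * B for two sparse blocks in dictionary form."""
    rows_of_B = defaultdict(list)
    for (k, j), v in B.items():
        rows_of_B[k].append((j, v))
    for (i, k), a in A.items():
        for j, bv in rows_of_B.get(k, ()):
            C[(i, j)] = C.get((i, j), 0) + a * bv


def spgemm_2d(A, B, n, g):
    """Multiply A and B block by block; each (bi, bj) plays the role of
    one processor that owns the corresponding block of the result."""
    Ab, Bb = split_blocks(A, n, g), split_blocks(B, n, g)
    C = {}
    for bi in range(g):
        for bj in range(g):
            local = {}
            for bk in range(g):  # one stage per block along the shared dimension
                local_spgemm(Ab.get((bi, bk), {}), Bb.get((bk, bj), {}), local)
            C.update(local)
    return C
```

Because result blocks are disjoint, the per-position results can simply be merged at the end.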

Journal Article
TL;DR: In this article, a multi-GPU implementation using a block-structured MPI parallelization is proposed for load balancing and heterogeneous computations on CPUs and GPUs, which achieves nearly perfect weak scalability on InfiniBand clusters.
Abstract: Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail and it is demonstrated that the kernel performance can be sustained to a large extent. With our GPU implementation, we achieve nearly perfect weak scalability on InfiniBand clusters. However, in strong scaling scenarios multi-GPUs make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task. Additionally, weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously are presented using clusters equipped with varying node configurations.

Posted Content
TL;DR: This paper reviews present virtualization methods and virtual computing software, provides a brief analysis of the performance issues inherent to each, and presents testing results of KVM-QEMU on two current multi-core CPU architectures and system configurations.
Abstract: Virtualization has rapidly become a go-to technology for increasing efficiency in the data center. With virtualization technologies providing tremendous flexibility, even disparate architectures may be deployed on a single machine without interference. Awareness of the limitations and requirements of physical hosts to be used for virtualization is important. This paper reviews present virtualization methods and virtual computing software, and provides a brief analysis of the performance issues inherent to each. Finally, we present testing results of KVM-QEMU on two current multi-core CPU architectures and system configurations.

Posted Content
TL;DR: In this article, the authors consider the problem of constructing a breadth-first spanning tree and show that it is impossible to contain the impact of Byzantine nodes in a strictly or strongly stabilizing manner.
Abstract: Self-stabilization is a versatile approach to fault-tolerance since it permits a distributed system to recover from any transient fault that arbitrarily corrupts the contents of all memories in the system. Byzantine tolerance is an attractive feature of distributed systems that makes it possible to cope with arbitrary malicious behaviors. We consider the well-known problem of constructing a breadth-first spanning tree in this context. Combining these two properties proves difficult: we demonstrate that it is impossible to contain the impact of Byzantine nodes in a strictly or strongly stabilizing manner. We then adopt the weaker scheme of topology-aware strict stabilization and we present a similar weakening of strong stabilization. We prove that the classical $min+1$ protocol has optimal Byzantine containment properties with respect to these criteria.
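For readers unfamiliar with the $min+1$ rule, here is a small simulation of it: the root keeps distance 0 and every other node repeatedly sets its distance to one more than the smallest distance among its neighbors. The sketch assumes a connected graph, synchronous rounds, and no Byzantine nodes, so it only illustrates recovery from a corrupted initial state, not the containment results of the paper. The function names and the `max_rounds` cap are hypothetical.

```python
def min_plus_one_round(graph, root, dist):
    """One synchronous round: every node applies the min+1 rule
    using its neighbors' distances from the previous round."""
    return {v: 0 if v == root else min(dist[u] for u in nbrs) + 1
            for v, nbrs in graph.items()}


def stabilize(graph, root, dist, max_rounds=1000):
    """Repeat rounds from an arbitrary initial state until nothing changes,
    then let each node pick a closest neighbor as its parent."""
    for _ in range(max_rounds):
        nxt = min_plus_one_round(graph, root, dist)
        if nxt == dist:
            break
        dist = nxt
    parent = {v: min(graph[v], key=lambda u: dist[u])
              for v in graph if v != root}
    return dist, parent
```

Starting from arbitrary non-negative values, the distances settle to breadth-first distances from the root, and the parent pointers form a breadth-first spanning tree.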