Showing papers in "arXiv: Distributed, Parallel, and Cluster Computing in 2017"

Posted Content•

A Comprehensive Survey on Fog Computing: State-of-the-art and Research Challenges

Carla Mouradian¹, Diala Naboulsi¹, Sami Yangui¹, Roch Glitho¹, Monique Morrow², Paul Anthony Polakos²•Institutions (2)

Concordia University¹, Cisco Systems, Inc.²

30 Oct 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This article presents a comprehensive survey on fog computing that critically reviews the state of the art against a concise set of evaluation criteria and introduces challenges and research directions.

Abstract: Cloud computing with its three key facets (i.e., IaaS, PaaS, and SaaS) and its inherent advantages (e.g., elasticity and scalability) still faces several challenges. The distance between the cloud and the end devices might be an issue for latency-sensitive applications such as disaster management and content delivery applications. Service Level Agreements (SLAs) may also impose processing at locations where the cloud provider does not have data centers. Fog computing is a novel paradigm to address such issues. It enables provisioning resources and services outside the cloud, at the edge of the network, closer to end devices or, possibly, at locations stipulated by SLAs. Fog computing is not a substitute for cloud computing but a powerful complement. It enables processing at the edge while still offering the possibility to interact with the cloud. This article presents a comprehensive survey on fog computing. It critically reviews the state of the art in the light of a concise set of evaluation criteria. We cover both the architectures and the algorithms that make up fog systems. Challenges and research directions are also introduced. In addition, the lessons learned are reviewed and the prospects are discussed in terms of the key role fog is likely to play in emerging technologies such as Tactile Internet.

450 citations

Posted Content•

Next Generation Cloud Computing: New Trends and Research Directions

Blesson Varghese, Rajkumar Buyya

24 Jul 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors discuss the changing cloud infrastructure and consider the use of infrastructure from multiple providers and the benefit of decentralising computing away from data centers, and lay out a roadmap of challenges that will need to be addressed for realising the potential of next generation cloud systems.

Abstract: The landscape of cloud computing has significantly changed over the last decade. Not only have more providers and service offerings crowded the space, but also cloud infrastructure that was traditionally limited to single provider data centers is now evolving. In this paper, we firstly discuss the changing cloud infrastructure and consider the use of infrastructure from multiple providers and the benefit of decentralising computing away from data centers. These trends have resulted in the need for a variety of new computing architectures that will be offered by future cloud infrastructure. These architectures are anticipated to impact areas, such as connecting people and devices, data-intensive computing, the service space and self-learning systems. Finally, we lay out a roadmap of challenges that will need to be addressed for realising the potential of next generation cloud systems.

440 citations

Posted Content•

Blockchain Consensus Protocols in the Wild

Christian Cachin, Marko Vukolic

06 Jul 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The process of assessing and gaining confidence in the resilience of consensus protocols exposed to faults and adversarial nodes is discussed, and the consensus protocols in some prominent permissioned blockchain platforms are reviewed with respect to their fault models and resilience against attacks.

Abstract: A blockchain is a distributed ledger for recording transactions, maintained by many nodes without central authority through a distributed cryptographic protocol. All nodes validate the information to be appended to the blockchain, and a consensus protocol ensures that the nodes agree on a unique order in which entries are appended. Consensus protocols for tolerating Byzantine faults have received renewed attention because they also address blockchain systems. This work discusses the process of assessing and gaining confidence in the resilience of consensus protocols exposed to faults and adversarial nodes. We advocate following the established practice in cryptography and computer security, relying on public reviews, detailed models, and formal proofs; the designers of several practical systems appear to be unaware of this. Moreover, we review the consensus protocols in some prominent permissioned blockchain platforms with respect to their fault models and resilience against attacks. The protocol comparison covers Hyperledger Fabric, Tendermint, Symbiont, R3 Corda, Iroha, Kadena, Chain, Quorum, MultiChain, Sawtooth Lake, Ripple, Stellar, and IOTA.

439 citations

Posted Content•

Computation Rate Maximization for Wireless Powered Mobile-Edge Computing with Binary Computation Offloading

Suzhi Bi, Ying Jun Zhang

29 Aug 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper considers a multi-user mobile edge computing (MEC) network powered by wireless power transfer (WPT), and proposes a joint optimization method based on the alternating direction method of multipliers (ADMM) decomposition technique, which enjoys a much slower increase of computational complexity as the network size increases.

Abstract: In this paper, we consider a multi-user mobile edge computing (MEC) network powered by wireless power transfer (WPT), where each energy-harvesting wireless device (WD) follows a binary computation offloading policy, i.e., the data set of a task has to be executed as a whole either locally or remotely at the MEC server via task offloading. In particular, we are interested in maximizing the (weighted) sum computation rate of all the WDs in the network by jointly optimizing the individual computing mode selection (i.e., local computing or offloading) and the system transmission time allocation (on WPT and task offloading). The major difficulty lies in the combinatorial nature of multi-user computing mode selection and its strong coupling with transmission time allocation. To tackle this problem, we first consider a decoupled optimization, where we assume that the mode selection is given and propose a simple bi-section search algorithm to obtain the conditional optimal time allocation. On top of that, a coordinate descent method is devised to optimize the mode selection. The method is simple in implementation but may suffer from high computational complexity in a large-size network. To address this problem, we further propose a joint optimization method based on the ADMM (alternating direction method of multipliers) decomposition technique, which enjoys a much slower increase of computational complexity as the network size increases. Extensive simulations show that both the proposed methods can efficiently achieve near-optimal performance under various network setups, and significantly outperform the other representative benchmark methods considered.

428 citations

Posted Content•

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

12 Nov 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: It is demonstrated that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs, using several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule.

Abstract: We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.

282 citations

Posted Content•

Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

Yudong Chen¹, Lili Su², Jiaming Xu³•Institutions (3)

Cornell University¹, Massachusetts Institute of Technology², Purdue University³

16 May 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work considers the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks and designs robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the interruption of Byzantine attacks.

Abstract: We consider the problem of distributed statistical machine learning in adversarial settings, where some unknown and time-varying subset of working machines may be compromised and behave arbitrarily to prevent an accurate model from being learned. This setting captures the potential adversarial attacks faced by Federated Learning -- a modern machine learning paradigm that is proposed by Google researchers and has been intensively studied for ensuring user privacy. Formally, we focus on a distributed system consisting of a parameter server and $m$ working machines. Each working machine keeps $N/m$ data samples, where $N$ is the total number of samples. The goal is to collectively learn the underlying true model parameter of dimension $d$. In classical batch gradient descent methods, the gradients reported to the server by the working machines are aggregated via simple averaging, which is vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine gradient descent method based on the geometric median of means of the gradients. We show that our method can tolerate $q \le (m-1)/2$ Byzantine failures, and the parameter estimate converges in $O(\log N)$ rounds with an estimation error of $\sqrt{d(2q+1)/N}$, hence approaching the optimal error rate $\sqrt{d/N}$ in the centralized and failure-free setting. The total computational complexity of our algorithm is of $O((Nd/m) \log N)$ at each working machine and $O(md + kd \log^3 N)$ at the central server, and the total communication cost is of $O(m d \log N)$. We further provide an application of our general results to the linear regression problem. A key challenge that arises in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. We prove that the aggregated gradient converges uniformly to the true gradient function.
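
The aggregation idea named in this abstract, a geometric median of means, can be illustrated with a minimal sketch. It is not the authors' implementation: the round-robin batching, the use of Weiszfeld's iteration to approximate the geometric median, and all function names are assumptions made for illustration.

```python
import math

def mean_vectors(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def geometric_median(points, iters=200, eps=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration."""
    y = mean_vectors(points)  # start from the ordinary mean
    for _ in range(iters):
        # Each point is weighted by the inverse of its distance to the estimate.
        weights = [1.0 / max(math.dist(p, y), eps) for p in points]
        total = sum(weights)
        y_new = [sum(w * p[i] for w, p in zip(weights, points)) / total
                 for i in range(len(y))]
        if math.dist(y, y_new) < eps:
            return y_new
        y = y_new
    return y

def median_of_means(gradients, num_batches):
    """Average the gradients within each batch, then take the geometric median.

    Round-robin batching is an illustrative assumption.
    """
    batches = [gradients[i::num_batches] for i in range(num_batches)]
    batch_means = [mean_vectors(b) for b in batches if b]
    return geometric_median(batch_means)

# Four honest workers report similar gradients; one Byzantine worker lies.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
byzantine = [[1000.0, -1000.0]]
reports = honest + byzantine

robust = median_of_means(reports, num_batches=5)
naive = mean_vectors(reports)
```

In this toy run the plain average `naive` is dragged far from the honest gradients by a single bad report, while `robust` stays close to them, which is the vulnerability of simple averaging that the abstract describes.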

237 citations

Posted Content•

Towards Blockchain-based Auditable Storage and Sharing of IoT Data

Hossein Shafagh¹, Lukas Burkhalter¹, Anwar Hithnawi¹, Simon Duquennoy•Institutions (1)

ETH Zurich¹

22 May 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors present a blockchain-based design for the IoT that provides distributed access control and data management, departing from the current trust model that delegates access control of user data to a centralized trusted authority and instead empowering users with data ownership.

Abstract: Today the cloud plays a central role in storing, processing, and distributing data. Despite contributing to the rapid development of IoT applications, the current IoT cloud-centric architecture has led to a myriad of isolated data silos that hinder the full potential of holistic data-driven analytics within the IoT. In this paper, we present a blockchain-based design for the IoT that brings distributed access control and data management. We depart from the current trust model that delegates access control of our data to a centralized trusted authority and instead empower the users with data ownership. Our design is tailored for IoT data streams and enables secure data sharing. We enable secure and resilient access control management by utilizing the blockchain as an auditable and distributed access control layer to the storage layer. We facilitate the storage of time-series IoT data at the edge of the network via a locality-aware decentralized storage system that is managed with blockchain technology. Our system is agnostic of the physical storage nodes and also supports the use of cloud storage resources as storage nodes.

219 citations

Posted Content•

When Mobile Blockchain Meets Edge Computing

Zehui Xiong¹, Yang Zhang², Dusit Niyato, Ping Wang¹, Zhu Han³•Institutions (3)

Nanyang Technological University¹, Wuhan University of Technology², University of Houston³

16 Nov 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors introduce a novel concept of edge computing for mobile blockchain, propose an economic approach for edge computing resource management, and present a prototype of a mobile edge computing enabled blockchain system with experimental results to justify the proposed concept.

Abstract: Blockchain, as the backbone technology of the current popular Bitcoin digital currency, has become a promising decentralized data management framework. Although blockchain has been widely adopted in many applications, e.g., finance, healthcare, and logistics, its application in mobile services is still limited. This is due to the fact that blockchain users need to solve preset proof-of-work puzzles to add new data, i.e., a block, to the blockchain. Solving the proof-of-work, however, consumes substantial resources in terms of CPU time and energy, which is not suitable for resource-limited mobile devices. To facilitate blockchain applications in future mobile Internet of Things systems, multiple access mobile edge computing appears to be an auspicious solution to solve the proof-of-work puzzles for mobile users. We first introduce a novel concept of edge computing for mobile blockchain. Then, we introduce an economic approach for edge computing resource management. Moreover, a prototype of mobile edge computing enabled blockchain systems is presented with experimental results to justify the proposed concept.
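
To make concrete why proof-of-work puzzles are costly for mobile devices, here is a minimal sketch of a hash-based puzzle. The puzzle format (a SHA-256 digest with a number of leading zero hex digits) and the names `solve_pow` and `verify_pow` are illustrative assumptions, not the scheme used in the paper's prototype.

```python
import hashlib

def verify_pow(block_data: bytes, nonce: int, difficulty: int) -> bool:
    """Check that SHA-256(block_data + nonce) starts with `difficulty` zero hex digits."""
    digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)

def solve_pow(block_data: bytes, difficulty: int) -> int:
    """Brute-force search for the smallest nonce that satisfies the puzzle."""
    nonce = 0
    while not verify_pow(block_data, nonce, difficulty):
        nonce += 1
    return nonce

# Each extra hex digit of difficulty multiplies the expected work by 16.
nonce = solve_pow(b"example block", difficulty=3)
```

Verification costs a single hash while solving needs about 16^difficulty attempts on average. That asymmetry is what makes offloading the search to an edge server attractive: the mobile device only has to check the returned nonce.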

197 citations

Posted Content•

TensorFlow-Serving: Flexible, High-Performance ML Serving

Christopher Olston, Fangwei Li, Jeremiah Harmsen, Jordan Soyke, Kiril Gorovoy, Li Lao, Noah Fiedel, Sukriti Ramesh, Vinu Rajashekhar

17 Dec 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper describes TensorFlow-Serving, a system for serving machine learning models inside Google that is also available in the cloud and as open source, and that is flexible both in the ML platforms it supports and in how it integrates with systems that convey new models and updated versions from training to serving.

Abstract: We describe TensorFlow-Serving, a system to serve machine learning models inside Google which is also available in the cloud and via open-source. It is extremely flexible in terms of the types of ML platforms it supports, and ways to integrate with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. Google uses it in many production deployments, including a multi-tenant model hosting service called TFS².

193 citations

Posted Content•

Distributed Data Stream Processing and Edge Computing: A Survey on Resource Elasticity and Future Directions

Marcos Dias De Assuncao, Alexandre da Silva Veith, Rajkumar Buyya

05 Sep 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors survey the state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing, examine the challenges of achieving elasticity across edge and cloud resources, and discuss solutions proposed in the literature to address them.

Abstract: Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, wearable assistance, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. More recently, architectures have been proposed to use edge computing for data stream processing. This paper surveys the state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing. Resource elasticity allows for an application or service to scale out/in according to fluctuating demands. Although such features have been extensively investigated for enterprise applications, stream processing poses challenges in achieving elastic systems that can make efficient resource management decisions based on current load. Elasticity becomes even more challenging in highly distributed environments comprising edge and cloud computing resources. This work examines some of these challenges and discusses solutions proposed in the literature to address them.

142 citations

Posted Content•

Occupy the Cloud: Distributed Computing for the 99%

Eric Jonas¹, Qifan Pu¹, Shivaram Venkataraman¹, Ion Stoica¹, Benjamin Recht¹•Institutions (1)

University of California¹

13 Feb 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: It is suggested that stateless functions are a natural fit for data processing in future computing environments, based on recent trends in network bandwidth and the advent of disaggregated storage.

Abstract: Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead, fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently. Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.
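
The stateless-function model argued for in this abstract can be sketched as a parallel map over independent inputs. The example below is a local stand-in only: it uses Python's standard `concurrent.futures` thread pool rather than PyWren or any cloud function service, and `stateless_map` and `word_count` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(chunk: str) -> int:
    """Stateless task: the result depends only on the input argument."""
    return len(chunk.split())

def stateless_map(func, inputs, max_workers=4):
    """Apply func to every input in parallel and return results in input order.

    Locally this uses threads; the assumption is that a serverless platform
    would instead launch one short-lived function invocation per input.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))

chunks = ["a b c", "d e", "f"]
counts = stateless_map(word_count, chunks)
total = sum(counts)
```

Because `word_count` keeps no state between calls, any invocation can run on any worker and can simply be retried on failure, which is what removes the need for the user to manage a cluster.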

Journal Article•DOI•

SCARFF: a Scalable Framework for Streaming Credit Card Fraud Detection with Spark

Fabrizio Carcillo¹, Andrea Dal Pozzolo¹, Yann-Aël Le Borgne¹, Olivier Caelen, Yannis Mazzer, Gianluca Bontempi¹•Institutions (1)

Université libre de Bruxelles¹

26 Sep 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A SCAlable Real-time Fraud Finder (SCARFF) is presented which integrates Big Data tools (Kafka, Spark and Cassandra) with a machine learning approach which deals with imbalance, nonstationarity and feedback latency and shows that this framework is scalable, efficient and accurate over a big stream of transactions.

Abstract: The expansion of electronic commerce, together with the increasing confidence of customers in electronic payments, makes fraud detection a critical factor. Detecting fraud in a (nearly) real-time setting demands the design and implementation of scalable learning techniques able to ingest and analyse massive amounts of streaming data. Recent advances in analytics and the availability of open source solutions for Big Data storage and processing open new perspectives to the fraud detection field. In this paper we present a SCAlable Real-time Fraud Finder (SCARFF) which integrates Big Data tools (Kafka, Spark and Cassandra) with a machine learning approach which deals with imbalance, nonstationarity and feedback latency. Experimental results on a massive dataset of real credit card transactions show that this framework is scalable, efficient and accurate over a big stream of transactions.

Posted Content•

FOCAN: A Fog-supported Smart City Network Architecture for Management of Applications in the Internet of Everything Environments

Paola G. Vinueza Naranjo¹, Zahra Pooranian², Mohammad Shojafar¹,², Mauro Conti², Rajkumar Buyya³•Institutions (3)

Sapienza University of Rome¹, University of Padua², University of Melbourne³

04 Oct 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, a multi-tier architecture called Fog Computing Architecture Network (FOCAN) is proposed to reduce the latency and energy consumption of Internet of Everything (IoE) devices running various applications.

Abstract: Smart city vision brings emerging heterogeneous communication technologies such as Fog Computing (FC) together to substantially reduce the latency and energy consumption of Internet of Everything (IoE) devices running various applications. The key feature that distinguishes the FC paradigm for smart cities is that it spreads communication and computing resources over the wired/wireless access network (e.g., proximate access points and base stations) to provide resource augmentation (e.g., cyberforaging) for resource and energy-limited wired/wireless (possibly mobile) things. Moreover, smart city applications are developed with the goal of improving the management of urban flows and allowing real-time responses to challenges that can arise in users' transactional relationships. This article presents a Fog-supported smart city network architecture called Fog Computing Architecture Network (FOCAN), a multi-tier structure in which the applications running on things jointly compute, route, and communicate with one another through the smart city environment to decrease latency and improve energy provisioning and the efficiency of services among things with different capabilities. An important concern that arises with the introduction of FOCAN is the need to avoid transferring data to/from distant things and instead to cover the nearest region for an IoT application. We define three types of communications between FOCAN devices (i.e., interprimary, primary, and secondary communication) to manage applications in a way that meets the quality of service standards for the IoE. One of the main advantages of FOCAN is that the devices can provide the services with low energy usage and in an efficient manner. Simulation results for a selected case study demonstrate the tremendous impact of the FOCAN energy-efficient solution on the communication performance of various types of things in smart cities.

Posted Content•

The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser

Onur Mutlu¹•Institutions (1)

ETH Zurich¹

02 Mar 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work discusses the RowHammer problem in DRAM, which is a prime (and perhaps the first) example of how a circuit-level failure mechanism can cause a practical and widespread system security vulnerability, and describes and advocates a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.

Abstract: As memory scales down to smaller technology nodes, new failure mechanisms emerge that threaten its correct operation. If such failure mechanisms are not anticipated and corrected, they can not only degrade system reliability and availability but also, perhaps even more importantly, open up security vulnerabilities: a malicious attacker can exploit the exposed failure mechanism to take over the entire system. As such, new failure mechanisms in memory can become practical and significant threats to system security. In this work, we discuss the RowHammer problem in DRAM, which is a prime (and perhaps the first) example of how a circuit-level failure mechanism in DRAM can cause a practical and widespread system security vulnerability. RowHammer, as it is popularly referred to, is the phenomenon that repeatedly accessing a row in a modern DRAM chip causes bit flips in physically-adjacent rows at consistently predictable bit locations. It is caused by a hardware failure mechanism called DRAM disturbance errors, which is a manifestation of circuit-level cell-to-cell interference in a scaled memory technology. We analyze the root causes of the RowHammer problem and examine various solutions. We also discuss what other vulnerabilities may be lurking in DRAM and other types of memories, e.g., NAND flash memory or Phase Change Memory, that can potentially threaten the foundations of secure systems, as the memory technologies scale to higher densities. We conclude by describing and advocating a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.

Posted Content•DOI•

Status of Serverless Computing and Function-as-a-Service (FaaS) in Industry and Research

Geoffrey C. Fox, Vatche Ishakian, Vinod Muthusamy, Aleksander Slominski

27 Aug 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This whitepaper summarizes issues raised during the First International Workshop on Serverless Computing (WoSC) 2017 and especially in the panel and associated discussion that concluded the workshop.

Abstract: This whitepaper summarizes issues raised during the First International Workshop on Serverless Computing (WoSC) 2017 held June 5th 2017 and especially in the panel and associated discussion that concluded the workshop. We also include comments from the keynote and submitted papers. A glossary at the end (section 8) defines many technical terms used in this report.

Posted Content•

ENORM: A Framework For Edge NOde Resource Management

Nan Wang, Blesson Varghese, Michail Matthaiou, Dimitrios S. Nikolopoulos

12 Sep 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper develops the first framework to manage edge nodes, namely the Edge NOde Resource Management (ENORM) framework, and demonstrates the feasibility of the framework on a Pokémon Go-like online game use-case.

Abstract: Current computing techniques using the cloud as a centralised server will become untenable as billions of devices get connected to the Internet. This raises the need for fog computing, which leverages computing at the edge of the network on nodes, such as routers, base stations and switches, along with the cloud. However, to realise fog computing the challenge of managing edge nodes will need to be addressed. This paper addresses the resource management challenge. We develop the first framework to manage edge nodes, namely the Edge NOde Resource Management (ENORM) framework. Mechanisms for provisioning and auto-scaling edge node resources are proposed. The feasibility of the framework is demonstrated on a Pokémon Go-like online game use-case. Using ENORM reduces application latency by between 20% and 80% and reduces data transfer and communication frequency between the edge node and the cloud by up to 95%. These results highlight the potential of fog computing for improving the quality of service and experience.

Posted Content•

Coded Computation over Heterogeneous Clusters

Amirhossein Reisizadeh¹, Saurav Prakash¹, Ramtin Pedarsani², Amir Salman Avestimehr²•Institutions (2)

University of California, Santa Barbara¹, University of Southern California²

21 Jan 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper proposes the Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, which is provably asymptotically optimal, and provides numerical results demonstrating speedups of up to 49% and 34% for HCMM over the “uncoded” and “homogeneous coded” schemes, respectively.

Abstract: In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy an abundance of redundancy: a vast number of computing nodes and large storage capacity. There have been recent results that demonstrate the impact of coding for efficient utilization of computation and storage redundancy to alleviate the effect of stragglers and communication bottlenecks in homogeneous clusters. In this paper, we focus on general heterogeneous distributed computing clusters consisting of a variety of computing machines with different capabilities. We propose a coding framework for speeding up distributed computing in heterogeneous clusters by trading redundancy for reducing the latency of computation. In particular, we propose the Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters that is provably asymptotically optimal for a broad class of processing time distributions. Moreover, we show that HCMM is unboundedly faster than any uncoded scheme. To demonstrate the practicality of HCMM, we carry out experiments over Amazon EC2 clusters where HCMM is found to be up to 61%, 46%, and 36% faster, respectively, than three benchmark load allocation schemes: Uniform Uncoded, Load-balanced Uncoded, and Uniform Coded. Additionally, we provide a generalization to the problem of optimal load allocation in heterogeneous settings, where we take into account the monetary costs associated with the clusters. We argue that HCMM is asymptotically optimal for budget-constrained scenarios as well, and we develop a heuristic algorithm for (HCMM) load allocation for budget-limited computation tasks.
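
The general idea of trading redundancy for latency can be shown with a toy coded matrix-vector product. This is not HCMM, which chooses load allocations for heterogeneous workers. It is the simplest homogeneous case, assumed here for illustration: the matrix is split into two halves plus one parity block, so the results of any two of three workers suffice. All names are hypothetical.

```python
def matvec(rows, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def encode(A):
    """Split A (even number of rows) into halves A1, A2 and a parity block A1 + A2."""
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return {"w1": A1, "w2": A2, "w3": parity}

def decode(results):
    """Recover A @ x from the outputs of any two of the three workers."""
    if "w1" in results and "w2" in results:
        y1, y2 = results["w1"], results["w2"]
    elif "w1" in results:
        y1 = results["w1"]
        y2 = [p - a for p, a in zip(results["w3"], y1)]  # A2 x = (A1 + A2) x - A1 x
    else:
        y2 = results["w2"]
        y1 = [p - b for p, b in zip(results["w3"], y2)]  # A1 x = (A1 + A2) x - A2 x
    return y1 + y2

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
tasks = encode(A)

# Suppose worker w2 is a straggler and never reports back.
finished = {name: matvec(block, x) for name, block in tasks.items() if name != "w2"}
y = decode(finished)
```

The master can stop waiting as soon as any two workers finish, at the price of 50% extra computation in total. HCMM addresses the harder question of how much coded work to give each worker when their speeds differ.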

Proceedings Article•DOI•

Demystifying Fog Computing: Characterizing Architectures, Applications and Abstractions

Prateeksha Varshney¹, Yogesh Simmhan¹•Institutions (1)

Indian Institute of Science¹

21 Feb 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper reviews various dimensions of system architecture, application characteristics and platform abstractions that are manifest in this Edge, Fog and Cloud eco-system and highlights novel capabilities of the Edge and Fog layers, such as physical and application mobility, privacy sensitivity, and a nascent runtime environment.

Abstract: Internet of Things (IoT) has accelerated the deployment of millions of sensors at the edge of the network, through Smart City infrastructure and lifestyle devices. Cloud computing platforms are often tasked with handling these large volumes and fast streams of data from the edge. Recently, Fog computing has emerged as a concept for low-latency and resource-rich processing of these observation streams, to complement Edge and Cloud computing. In this paper, we review various dimensions of system architecture, application characteristics and platform abstractions that are manifest in this Edge, Fog and Cloud eco-system. We highlight novel capabilities of the Edge and Fog layers, such as physical and application mobility, privacy sensitivity, and a nascent runtime environment. IoT application case studies based on first-hand experiences across diverse domains drive this categorization. We also highlight the gap between the potential and the reality of Fog computing, and identify challenges that need to be overcome for the solution to be sustainable. Together, our article can help platform and application developers bridge the gap that remains in making Fog computing viable.

Posted Content•

An OpenCL(TM) Deep Learning Accelerator on Arria 10

Utku Aydonat, Shane O'Connell, Davor Capalija, Andrew Ling, Gordon Raymond Chiu

13 Jan 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work shows a novel architecture written in OpenCL(TM), which is referred to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth, and shows how the Winograd transform can be used to significantly boost the performance of the FPGA.

Abstract: Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited external memory bandwidth on the FPGA device. We show a novel architecture written in OpenCL(TM), which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth. Furthermore, we show how we can use the Winograd transform to significantly boost the performance of the FPGA. As a result, when running our DLA on Intel's Arria 10 device we can achieve a performance of 1020 img/s, or 23 img/s/W when running the AlexNet CNN benchmark. This comes to 1382 GFLOPs and is 10x faster with 8.4x more GFLOPS and 5.8x better efficiency than the state-of-the-art on FPGAs. Additionally, 23 img/s/W is competitive against the best publicly known implementation of AlexNet on nVidia's TitanX GPU.

Proceedings Article•DOI•

Affordable and Energy-Efficient Cloud Computing Clusters: The Bolzano Raspberry Pi Cloud Cluster Experiment

Pekka Abrahamsson¹, Sven Helmer¹, Nattakarn Phaphoom¹, Lorenzo Nicolodi¹, Nick Preda¹, Lorenzo Miori¹, Matteo Angriman¹, Juha Rikkila¹, Xiaofeng Wang¹, Karim Hamily¹, Sara Bugoloni¹•Institutions (1)

Free University of Bozen-Bolzano¹

20 Sep 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The ongoing work building a Raspberry Pi cluster consisting of 300 nodes is presented, with potential use cases being an inexpensive and green test bed for cloud computing research and a robust and mobile data center for operating in adverse environments.

Abstract: We present our ongoing work building a Raspberry Pi cluster consisting of 300 nodes. The unique characteristics of this single board computer pose several challenges, but also offer a number of interesting opportunities. On the one hand, a single Raspberry Pi can be purchased cheaply and has a low power consumption, which makes it possible to create an affordable and energy-efficient cluster. On the other hand, it lacks in computing power, which makes it difficult to run computationally intensive software on it. Nevertheless, by combining a large number of Raspberries into a cluster, this drawback can be (partially) offset. Here we report on the first important steps of creating our cluster: how to set up and configure the hardware and the system software, and how to monitor and maintain the system. We also discuss potential use cases for our cluster, the two most important being an inexpensive and green test bed for cloud computing research and a robust and mobile data center for operating in adverse environments.

Posted Content•

Revisiting Fast Practical Byzantine Fault Tolerance.

Ittai Abraham, Guy Golan Gueta, Dahlia Malkhi, Lorenzo Alvisi, Ramakrishna Kotla, Jean-Philippe Martin

04 Dec 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A safety violation in Zyzzyva and a liveness violation in FaB are observed, and the problem is manifested already in the first log slot.

Abstract: In this note, we observe a safety violation in Zyzzyva and a liveness violation in FaB. To demonstrate these issues, we require relatively simple scenarios, involving only four replicas, and one or two view changes. In all of them, the problem is manifested already in the first log slot.

Posted Content•

A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade

Rajkumar Buyya¹, Satish Narayana Srirama², Giuliano Casale³, Rodrigo N. Calheiros⁴, Yogesh Simmhan⁵, Blesson Varghese⁶, Erol Gelenbe³, Bahman Javadi⁴, Luis M. Vaquero⁷, Marco A. S. Netto⁸, Adel Nadjaran Toosi⁹, Maria Alejandra Rodriguez¹, Ignacio M. Llorente¹⁰, Sabrina De Capitani di Vimercati¹¹, Pierangela Samarati¹¹, Dejan Milojicic¹², Carlos A. Varela¹³, Rami Bahsoon¹⁴, Marcos Dias De Assuncao, Omer Rana¹⁵, Wanlei Zhou¹⁶, Hai Jin¹⁷, Wolfgang Gentzsch, Albert Y. Zomaya⁴, Haiying Shen¹⁸•Institutions (18)

University of Melbourne¹, University of Tartu², Imperial College London³, University of Sydney⁴, Indian Institute of Science⁵, Queen's University Belfast⁶, University of Bristol⁷, IBM⁸, Monash University, Clayton campus⁹, Complutense University of Madrid¹⁰, University of Milan¹¹, Hewlett-Packard¹², Rensselaer Polytechnic Institute¹³, University of Birmingham¹⁴, Cardiff University¹⁵, University of Technology, Sydney¹⁶, Huazhong University of Science and Technology¹⁷, University of Virginia¹⁸

24 Nov 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The proposed manifesto identifies the major open challenges in Cloud computing, emerging trends, and impact areas, and offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.

Abstract: The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention of academia, industries, and government bodies. Now, it has emerged as the backbone of modern economy by offering subscription-based services anytime, anywhere following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. The recent technological developments and paradigms such as serverless computing, software-defined networking, Internet of Things, and processing at network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends, and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.

Posted Content•

Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation

Yanfeng Zhang¹, Qixin Gao¹, Lixin Gao², Cuirong Wang¹•Institutions (2)

Northeastern University (China)¹, University of Massachusetts Amherst²

16 Oct 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: Maiter is an asynchronous graph processing framework built on delta-based accumulative iterative computation (DAIC), which accelerates large-scale graph-based iterative computations and can bypass the high-cost synchronous barriers in heterogeneous distributed environments.

Abstract: A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the "changes" between iterations. With DAIC, we can process only the "changes" and so avoid negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronous barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on a local cluster as well as on the Amazon EC2 Cloud. The results show that Maiter achieves as much as 60x speedup over Hadoop and outperforms other state-of-the-art frameworks.
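The delta-accumulation idea in this abstract can be sketched on PageRank. This is a minimal single-process sketch, not Maiter's implementation: the function names, the damping factor of 0.8, the unnormalized PageRank variant, and the three-node graph are all assumptions chosen for illustration.

```python
# Delta-based accumulative PageRank versus the classic synchronous iteration.
# Assumption: unnormalized PageRank, rank(v) = (1 - d) + d * sum(rank(u) / outdeg(u)).

def pagerank_classic(out_edges, damping=0.8, iterations=200):
    """Recompute every rank from the previous iteration's ranks."""
    rank = {v: 1.0 - damping for v in out_edges}
    for _ in range(iterations):
        new_rank = {v: 1.0 - damping for v in out_edges}
        for v, targets in out_edges.items():
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank

def pagerank_delta(out_edges, damping=0.8, tol=1e-12):
    """Accumulate only the pending changes (deltas) into the ranks."""
    rank = {v: 0.0 for v in out_edges}
    delta = {v: 1.0 - damping for v in out_edges}
    while True:
        # Only vertices with a non-negligible pending change do any work.
        active = [v for v in out_edges if abs(delta[v]) > tol]
        if not active:
            return rank
        for v in active:
            change = delta[v]
            delta[v] = 0.0
            rank[v] += change                      # accumulate the change
            targets = out_edges[v]
            for t in targets:                      # forward it to neighbours
                delta[t] += damping * change / len(targets)

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
classic = pagerank_classic(graph)
accumulated = pagerank_delta(graph)
max_gap = max(abs(classic[v] - accumulated[v]) for v in graph)
```

Each vertex in `pagerank_delta` reads whatever delta it currently holds, with no barrier between passes, so the order of updates does not affect the fixed point. That order-independence is what makes an asynchronous execution possible.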

Posted Content•

A General Technique for Non-blocking Trees

Trevor Brown¹, Faith Ellen¹, Eric Ruppert²•Institutions (2)

University of Toronto¹, York University²

18 Dec 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A general technique for obtaining provably correct, non-blocking implementations of a large class of tree data structures where pointers are directed from parents to children and an experimental performance analysis demonstrates that the Java implementation of a chromatic tree rivals, and often significantly outperforms, other leading concurrent dictionaries.

Abstract: We describe a general technique for obtaining provably correct, non-blocking implementations of a large class of tree data structures where pointers are directed from parents to children. Updates are permitted to modify any contiguous portion of the tree atomically. Our non-blocking algorithms make use of the LLX, SCX and VLX primitives, which are multi-word generalizations of the standard LL, SC and VL primitives and have been implemented from single-word CAS. To illustrate our technique, we describe how it can be used in a fairly straightforward way to obtain a non-blocking implementation of a chromatic tree, which is a relaxed variant of a red-black tree. The height of the tree at any time is $O(c+ \log n)$, where $n$ is the number of keys and $c$ is the number of updates in progress. We provide an experimental performance analysis which demonstrates that our Java implementation of a chromatic tree rivals, and often significantly outperforms, other leading concurrent dictionaries.

Posted Content•

A Concurrent Perspective on Smart Contracts

Ilya Sergey¹, Aquinas Hobor²•Institutions (2)

University College London¹, National University of Singapore²

17 Feb 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors explore remarkable similarities between multi-transactional behaviors of smart contracts in cryptocurrencies such as Ethereum and classical problems of shared-memory concurrency, and examine two real-world examples from the Ethereum blockchain, analyzing how they are vulnerable to bugs that are closely reminiscent of those that often occur in traditional concurrent programs.

Abstract: In this paper, we explore remarkable similarities between multi-transactional behaviors of smart contracts in cryptocurrencies such as Ethereum and classical problems of shared-memory concurrency. We examine two real-world examples from the Ethereum blockchain and analyze how they are vulnerable to bugs that are closely reminiscent of those that often occur in traditional concurrent programs. We then elaborate on the relation between observable contract behaviors and well-studied concurrency topics, such as atomicity, interference, synchronization, and resource ownership. The described contracts-as-concurrent-objects analogy provides deeper understanding of potential threats for smart contracts, indicates better engineering practices, and enables applications of existing state-of-the-art formal verification techniques.

Proceedings Article•DOI•

Performance Overhead Comparison between Hypervisor and Container based Virtualization

Zheng Li¹, Maria Kihl¹, Qinghua Lu², Jens A Andersson¹•Institutions (2)

Lund University¹, China University of Petroleum²

04 Aug 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The evaluation results show that the virtualization's performance overhead could vary not only on a feature-by-feature basis but also on a job-to-job basis.

Abstract: The current virtualization solution in the Cloud widely relies on hypervisor-based technologies. Along with the recent popularity of Docker, the container-based virtualization starts receiving more attention for being a promising alternative. Since both of the virtualization solutions are not resource-free, their performance overheads would lead to negative impacts on the quality of Cloud services. To help fundamentally understand the performance difference between these two types of virtualization solutions, we use a physical machine with "just-enough" resource as a baseline to investigate the performance overhead of a standalone Docker container against a standalone virtual machine (VM). With findings contrary to the related work, our evaluation results show that the virtualization's performance overhead could vary not only on a feature-by-feature basis but also on a job-to-job basis. Although the container-based solution is undoubtedly lightweight, the hypervisor-based technology does not come with higher performance overhead in every case. For example, Docker containers particularly exhibit lower QoS in terms of storage transaction speed.

Posted Content•

ChainerMN: Scalable Distributed Deep Learning Framework

Takuya Akiba, Keisuke Fukuda, Shuji Suzuki

31 Oct 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: It is demonstrated that ChainerMN can scale the learning process of the ResNet-50 model to the ImageNet dataset up to 128 GPUs with the parallel efficiency of 90%.

Abstract: One of the keys for deep learning to have made a breakthrough in various fields was to utilize high computing powers centering around GPUs. Enabling the use of further computing abilities by distributed processing is essential not only to make the deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the learning process of the ResNet-50 model to the ImageNet dataset up to 128 GPUs with the parallel efficiency of 90%.

Posted Content•

MST in O(1) Rounds of the Congested Clique

Tomasz Jurdzinski, Krzysztof Nowicki

26 Jul 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A new technique is developed which combines connected components of sample sparse subgraphs of the input graph in order to accelerate the process of uncovering connected components of the original input graph, together with a sparsification technique which reduces an initial CC problem in O(1) rounds to its two restricted instances.

Abstract: We present a distributed randomized algorithm finding Minimum Spanning Tree (MST) of a given graph in O(1) rounds, with high probability, in the Congested Clique model. The input graph in the Congested Clique model is a graph of n nodes, where each node initially knows only its incident edges. The communication graph is a clique with limited edge bandwidth: each two nodes (not necessarily neighbours in the input graph) can exchange $O(\log n)$ bits. As in previous works, the key part of the MST algorithm is an efficient Connected Components (CC) algorithm. However, unlike the former approaches, we do not aim at simulating the standard Boruvka algorithm, at least at initial stages of the CC algorithm. Instead, we develop a new technique which combines connected components of sample sparse subgraphs of the input graph in order to accelerate the process of uncovering connected components of the original input graph. More specifically, we develop a sparsification technique which reduces an initial CC problem in $O(1)$ rounds to its two restricted instances. The former instance has a graph with maximal degree $O(\log \log n)$ as the input -- here our sample-combining technique helps. In the latter instance, a partition of the input graph into $O(n/\log \log n)$ connected components is known. This gives an opportunity to apply previous algorithms to determine connected components in $O(1)$ rounds. Our result addresses the problem from Lotker et al. [SPAA 2003; SICOMP 2005] and improves over their $O(\log \log n)$ algorithm, over the previous $O(\log^* n)$ algorithm of Ghaffari et al. [PODC 2016], and over the $O(\log \log \log n)$ algorithm of Hegeman et al. [PODC 2015]. It also determines $\Theta(1)$ round complexity in the congested clique for MST, as well as other graph problems, including bipartiteness, cut verification, s-t connectivity and cycle containment.

Book Chapter•DOI•

Fog Computing in Medical Internet-of-Things: Architecture, Implementation, and Applications

Harishchandra Dubey¹, Admir Monteiro², Nicholas Constant², Mohammadreza Abtahi², Debanjan Borthakur², Leslie Mahler², Yan Sun², Qing Yang², Umer Akbar³, Kunal Mankodiya²•Institutions (3)

University of Texas at Dallas¹, University of Rhode Island², Rhode Island Hospital³

24 Jun 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors define fog computing in the context of medical IoT and implement and test a fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage and communication of various medical data such as pathological speech data of individuals with speech disorders, Phonocardiogram (PCG) signal for heart rate estimation, and Electrocardiogram (ECG)-based Q, R, S detection.

Abstract: In the era when the market segment of Internet of Things (IoT) tops the chart in various business reports, it is apparently envisioned that the field of medicine expects to gain a large benefit from the explosion of wearables and internet-connected sensors that surround us to acquire and communicate unprecedented data on symptoms, medication, food intake, and daily-life activities impacting one's health and wellness. However, IoT-driven healthcare would have to overcome many barriers, such as: 1) There is an increasing demand for data storage on cloud servers where the analysis of the medical big data becomes increasingly complex, 2) The data, when communicated, are vulnerable to security and privacy issues, 3) The communication of the continuously collected data is not only costly but also energy hungry, 4) Operating and maintaining the sensors directly from the cloud servers are non-trivial tasks. This book chapter defines Fog Computing in the context of medical IoT. Conceptually, Fog Computing is a service-oriented intermediate layer in IoT, providing the interfaces between the sensors and cloud servers for facilitating connectivity, data transfer, and queryable local database. The centerpiece of Fog computing is a low-power, intelligent, wireless, embedded computing node that carries out signal conditioning and data analytics on raw data collected from wearables or other medical sensors and offers efficient means to serve telehealth interventions. We implemented and tested a fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage and communication of the various medical data such as pathological speech data of individuals with speech disorders, Phonocardiogram (PCG) signal for heart rate estimation, and Electrocardiogram (ECG)-based Q, R, S detection.

Posted Content•

Reclaiming memory for lock-free data structures: there has to be a better way

Trevor Brown¹•Institutions (1)

University of Toronto¹

04 Dec 2017-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, a distributed variant of EBR is proposed, which is based on signaling and takes $O(1)$ amortized steps per high-level operation on the data structure and $O(1)$ steps in the worst case each time an object is removed from the data structure; at any point, $O(mn^2)$ objects are waiting to be freed, where $n$ is the number of processes and $m$ is a small constant.

Abstract: Memory reclamation for lock-based data structures is typically easy. However, it is a significant challenge for lock-free data structures. Automatic techniques such as garbage collection are inefficient or use locks, and non-automatic techniques either have high overhead, or do not work for many data structures. For example, subtle problems can arise when hazard pointers, one of the most common non-automatic techniques, are applied to many lock-free data structures. Epoch based reclamation (EBR), which is by far the most efficient non-automatic technique, allows the number of unreclaimed objects to grow without bound, because one crashed process can prevent all other processes from reclaiming memory. We develop a more efficient, distributed variant of EBR that solves this problem. It is based on signaling, which is provided by many operating systems, such as Linux and UNIX. Our new scheme takes $O(1)$ amortized steps per high-level operation on the data structure and $O(1)$ steps in the worst case each time an object is removed from the data structure. At any point, $O(mn^2)$ objects are waiting to be freed, where $n$ is the number of processes and $m$ is a small constant for most data structures. Experiments show that our scheme has very low overhead: on average 10\%, and at worst 28\%, for a balanced binary search tree over many thread counts, operation mixes and contention levels. Our scheme also outperforms a highly tuned implementation of hazard pointers by an average of 75\%. Typically, memory reclamation is tightly woven into lock-free data structure code. To improve modularity and facilitate the comparison of different memory reclamation schemes, we also introduce a highly flexible abstraction. It allows a programmer to easily interchange schemes for reclamation, object pooling, allocation and deallocation with virtually no overhead, by changing a single line of code.
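The weakness of classic epoch based reclamation that this abstract describes can be reproduced in a small single-threaded simulation. This is a simplified model of textbook EBR, not the signaling-based scheme the paper proposes: the class `EpochReclaimer`, its method names, and the two-process scenario are assumptions made for illustration.

```python
# Simplified, single-threaded model of classic epoch based reclamation (EBR).
# Assumption: objects retired in epoch e become safe once the global epoch
# reaches e + 2; this is NOT the distributed variant proposed in the paper.

class EpochReclaimer:
    def __init__(self, n_procs):
        self.global_epoch = 0
        self.announced = [None] * n_procs   # None means "not in an operation"
        self.limbo = {}                     # epoch -> objects retired in it
        self.freed = []

    def enter(self, pid):
        """Start an operation: announce the epoch this process observed."""
        self.announced[pid] = self.global_epoch

    def leave(self, pid):
        """Finish an operation: the process holds no references any more."""
        self.announced[pid] = None

    def retire(self, obj):
        """Unlink an object; it cannot be freed until readers have moved on."""
        self.limbo.setdefault(self.global_epoch, []).append(obj)

    def try_advance(self):
        """Advance the epoch only if every active process has caught up."""
        if any(e is not None and e != self.global_epoch for e in self.announced):
            return False
        self.global_epoch += 1
        safe = sorted(e for e in self.limbo if e <= self.global_epoch - 2)
        for epoch in safe:
            self.freed.extend(self.limbo.pop(epoch))
        return True

    def pending(self):
        return sum(len(objs) for objs in self.limbo.values())

def run(stalled, steps=10):
    reclaimer = EpochReclaimer(n_procs=2)
    if stalled:
        reclaimer.enter(0)   # process 0 starts an operation and never finishes
    for i in range(steps):
        reclaimer.enter(1)
        reclaimer.retire(f"node-{i}")
        reclaimer.leave(1)
        reclaimer.try_advance()
    return len(reclaimer.freed), reclaimer.pending()

healthy = run(stalled=False)
crashed = run(stalled=True)
```

With both processes responsive, all but the most recently retired object is freed. With process 0 stuck inside an operation, the epoch advances once and then never again, so nothing is freed and the backlog grows with every retirement.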
