scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Distributed, Parallel, and Cluster Computing in 2017"


Posted Content
TL;DR: A comprehensive survey on fog computing is presented in this article, which critically reviews the state of the art in the light of a concise set of evaluation criteria and challenges and research directions.
Abstract: Cloud computing with its three key facets (i.e., IaaS, PaaS, and SaaS) and its inherent advantages (e.g., elasticity and scalability) still faces several challenges. The distance between the cloud and the end devices might be an issue for latency-sensitive applications such as disaster management and content delivery applications. Service Level Agreements (SLAs) may also impose processing at locations where the cloud provider does not have data centers. Fog computing is a novel paradigm to address such issues. It enables provisioning resources and services outside the cloud, at the edge of the network, closer to end devices or eventually, at locations stipulated by SLAs. Fog computing is not a substitute for cloud computing but a powerful complement. It enables processing at the edge while still offering the possibility to interact with the cloud. This article presents a comprehensive survey on fog computing. It critically reviews the state of the art in the light of a concise set of evaluation criteria. We cover both the architectures and the algorithms that make fog systems. Challenges and research directions are also introduced. In addition, the lessons learned are reviewed and the prospects are discussed in terms of the key role fog is likely to play in emerging technologies such as Tactile Internet.

450 citations


Posted Content
TL;DR: In this article, the authors discuss the changing cloud infrastructure and consider the use of infrastructure from multiple providers and the benefit of decentralising computing away from data centers, and lay out a roadmap of challenges that will need to be addressed for realising the potential of next generation cloud systems.
Abstract: The landscape of cloud computing has significantly changed over the last decade. Not only have more providers and service offerings crowded the space, but also cloud infrastructure that was traditionally limited to single provider data centers is now evolving. In this paper, we firstly discuss the changing cloud infrastructure and consider the use of infrastructure from multiple providers and the benefit of decentralising computing away from data centers. These trends have resulted in the need for a variety of new computing architectures that will be offered by future cloud infrastructure. These architectures are anticipated to impact areas, such as connecting people and devices, data-intensive computing, the service space and self-learning systems. Finally, we lay out a roadmap of challenges that will need to be addressed for realising the potential of next generation cloud systems.

440 citations


Posted Content
TL;DR: The process of assessing and gaining confidence in the resilience of a consensus protocols exposed to faults and adversarial nodes is discussed, and the consensus protocols in some prominent permissioned blockchain platforms with respect to their fault models and resilience against attacks are reviewed.
Abstract: A blockchain is a distributed ledger for recording transactions, maintained by many nodes without central authority through a distributed cryptographic protocol. All nodes validate the information to be appended to the blockchain, and a consensus protocol ensures that the nodes agree on a unique order in which entries are appended. Consensus protocols for tolerating Byzantine faults have received renewed attention because they also address blockchain systems. This work discusses the process of assessing and gaining confidence in the resilience of a consensus protocols exposed to faults and adversarial nodes. We advocate to follow the established practice in cryptography and computer security, relying on public reviews, detailed models, and formal proofs; the designers of several practical systems appear to be unaware of this. Moreover, we review the consensus protocols in some prominent permissioned blockchain platforms with respect to their fault models and resilience against attacks. The protocol comparison covers Hyperledger Fabric, Tendermint, Symbiont, R3~Corda, Iroha, Kadena, Chain, Quorum, MultiChain, Sawtooth Lake, Ripple, Stellar, and IOTA.

439 citations


Posted Content
TL;DR: This paper considers a multi-user MEC network powered by the WPT, and proposes a joint optimization method based on the alternating direction method of multipliers (ADMM) decomposition technique, which enjoys a much slower increase of computational complexity as the networks size increases.
Abstract: In this paper, we consider a multi-user mobile edge computing (MEC) network powered by wireless power transfer (WPT), where each energy-harvesting WD follows a binary computation offloading policy, i.e., data set of a task has to be executed as a whole either locally or remotely at the MEC server via task offloading. In particular, we are interested in maximizing the (weighted) sum computation rate of all the WDs in the network by jointly optimizing the individual computing mode selection (i.e., local computing or offloading) and the system transmission time allocation (on WPT and task offloading). The major difficulty lies in the combinatorial nature of multi-user computing mode selection and its strong coupling with transmission time allocation. To tackle this problem, we first consider a decoupled optimization, where we assume that the mode selection is given and propose a simple bi-section search algorithm to obtain the conditional optimal time allocation. On top of that, a coordinate descent method is devised to optimize the mode selection. The method is simple in implementation but may suffer from high computational complexity in a large-size network. To address this problem, we further propose a joint optimization method based on the ADMM (alternating direction method of multipliers) decomposition technique, which enjoys much slower increase of computational complexity as the networks size increases. Extensive simulations show that both the proposed methods can efficiently achieve near-optimal performance under various network setups, and significantly outperform the other representative benchmark methods considered.

428 citations


Posted Content
TL;DR: It is demonstrated that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs with several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule.
Abstract: We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.

282 citations


Posted Content
TL;DR: This work considers the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks and designs robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the interruption of Byzantine attacks.
Abstract: We consider the problem of distributed statistical machine learning in adversarial settings, where some unknown and time-varying subset of working machines may be compromised and behave arbitrarily to prevent an accurate model from being learned. This setting captures the potential adversarial attacks faced by Federated Learning -- a modern machine learning paradigm that is proposed by Google researchers and has been intensively studied for ensuring user privacy. Formally, we focus on a distributed system consisting of a parameter server and $m$ working machines. Each working machine keeps $N/m$ data samples, where $N$ is the total number of samples. The goal is to collectively learn the underlying true model parameter of dimension $d$. In classical batch gradient descent methods, the gradients reported to the server by the working machines are aggregated via simple averaging, which is vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine gradient descent method based on the geometric median of means of the gradients. We show that our method can tolerate $q \le (m-1)/2$ Byzantine failures, and the parameter estimate converges in $O(\log N)$ rounds with an estimation error of $\sqrt{d(2q+1)/N}$, hence approaching the optimal error rate $\sqrt{d/N}$ in the centralized and failure-free setting. The total computational complexity of our algorithm is of $O((Nd/m) \log N)$ at each working machine and $O(md + kd \log^3 N)$ at the central server, and the total communication cost is of $O(m d \log N)$. We further provide an application of our general results to the linear regression problem. A key challenge arises in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. We prove that the aggregated gradient converges uniformly to the true gradient function.

237 citations


Posted Content
TL;DR: In this article, the authors present a blockchain-based design for the IoT that brings a distributed access control and data management, where the authors depart from the current trust model that delegates access control of our data to a centralized trusted authority and instead empower the users with data ownership.
Abstract: Today the cloud plays a central role in storing, processing, and distributing data. Despite contributing to the rapid development of IoT applications, the current IoT cloud-centric architecture has led into a myriad of isolated data silos that hinders the full potential of holistic data-driven analytics within the IoT. In this paper, we present a blockchain-based design for the IoT that brings a distributed access control and data management. We depart from the current trust model that delegates access control of our data to a centralized trusted authority and instead empower the users with data ownership. Our design is tailored for IoT data streams and enables secure data sharing. We enable a secure and resilient access control management, by utilizing the blockchain as an auditable and distributed access control layer to the storage layer. We facilitate the storage of time-series IoT data at the edge of the network via a locality-aware decentralized storage system that is managed with the blockchain technology. Our system is agnostic of the physical storage nodes and supports as well utilization of cloud storage resources as storage nodes.

219 citations


Posted Content
TL;DR: In this article, the authors introduce a novel concept of edge computing for mobile blockchain and introduce an economic approach for edge computing resource management. And a prototype of mobile edge computing enabled blockchain systems is presented with experimental results to justify the proposed concept.
Abstract: Blockchain, as the backbone technology of the current popular Bitcoin digital currency, has become a promising decentralized data management framework. Although blockchain has been widely adopted in many applications, e.g., finance, healthcare, and logistics, its application in mobile services is still limited. This is due to the fact that blockchain users need to solve preset proof-of-work puzzles to add new data, i.e., a block, to the blockchain. Solving the proof-of-work, however, consumes substantial resources in terms of CPU time and energy, which is not suitable for resource-limited mobile devices. To facilitate blockchain applications in future mobile Internet of Things systems, multiple access mobile edge computing appears to be an auspicious solution to solve the proof-of-work puzzles for mobile users. We first introduce a novel concept of edge computing for mobile blockchain. Then, we introduce an economic approach for edge computing resource management. Moreover, a prototype of mobile edge computing enabled blockchain systems is presented with experimental results to justify the proposed concept.

197 citations


Posted Content
TL;DR: TensorFlow-Serving is described, a system to serve machine learning models inside Google which is also available in the cloud and via open-source, and ways to integrate with systems that convey new models and updated versions from training to serving.
Abstract: We describe TensorFlow-Serving, a system to serve machine learning models inside Google which is also available in the cloud and via open-source. It is extremely flexible in terms of the types of ML platforms it supports, and ways to integrate with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. Google uses it in many production deployments, including a multi-tenant model hosting service called TFS^2.

193 citations


Posted Content
TL;DR: In this paper, the authors present a survey of the state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing and discuss solutions proposed in the literature to address them.
Abstract: Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, wearable assistance, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner. More recently, architecture has been proposed to use edge computing for data stream processing. This paper surveys state of the art on stream processing engines and mechanisms for exploiting resource elasticity features of cloud computing in stream processing. Resource elasticity allows for an application or service to scale out/in according to fluctuating demands. Although such features have been extensively investigated for enterprise applications, stream processing poses challenges on achieving elastic systems that can make efficient resource management decisions based on current load. Elasticity becomes even more challenging in highly distributed environments comprising edge and cloud computing resources. This work examines some of these challenges and discusses solutions proposed in the literature to address them.

142 citations


Posted Content
TL;DR: It is suggested that stateless functions are a natural fit for data processing in future computing environments, based on recent trends in network bandwidth and the advent of disaggregated storage.
Abstract: Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead, fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently. Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.

Journal ArticleDOI
TL;DR: A SCAlable Real-time Fraud Finder (SCARFF) is presented which integrates Big Data tools (Kafka, Spark and Cassandra) with a machine learning approach which deals with imbalance, nonstationarity and feedback latency and shows that this framework is scalable, efficient and accurate over a big stream of transactions.
Abstract: The expansion of the electronic commerce, together with an increasing confidence of customers in electronic payments, makes of fraud detection a critical factor. Detecting frauds in (nearly) real time setting demands the design and the implementation of scalable learning techniques able to ingest and analyse massive amounts of streaming data. Recent advances in analytics and the availability of open source solutions for Big Data storage and processing open new perspectives to the fraud detection field. In this paper we present a SCAlable Real-time Fraud Finder (SCARFF) which integrates Big Data tools (Kafka, Spark and Cassandra) with a machine learning approach which deals with imbalance, nonstationarity and feedback latency. Experimental results on a massive dataset of real credit card transactions show that this framework is scalable, efficient and accurate over a big stream of transactions.

Posted Content
TL;DR: In this paper, a multi-tier architecture called Fog Computing Architecture Network (FOCAN) is proposed to reduce the latency and energy consumption of Internet of Everything (IoE) devices running various applications.
Abstract: Smart city vision brings emerging heterogeneous communication technologies such as Fog Computing (FC) together to substantially reduce the latency and energy consumption of Internet of Everything (IoE) devices running various applications. The key feature that distinguishes the FC paradigm for smart cities is that it spreads communication and computing resources over the wired/wireless access network (e.g., proximate access points and base stations) to provide resource augmentation (e.g., cyberforaging) for resource and energy-limited wired/wireless (possibly mobile) things. Moreover, smart city applications are developed with the goal of improving the management of urban flows and allowing real-time responses to challenges that can arise in users' transactional relationships. This article presents a Fog-supported smart city network architecture called Fog Computing Architecture Network (FOCAN), a multi-tier structure in which the applications running on things jointly compute, route, and communicate with one another through the smart city environment to decrease latency and improve energy provisioning and the efficiency of services among things with different capabilities. An important concern that arises with the introduction of FOCAN is the need to avoid transferring data to/from distant things and instead to cover the nearest region for an IoT application. We define three types of communications between FOCAN devices (e.g., interprimary, primary, and secondary communication) to manage applications in a way that meets the quality of service standards for the IoE. One of the main advantages of FOCAN is that the devices can provide the services with low energy usage and in an efficient manner. Simulation results for a selected case study demonstrate the tremendous impact of the FOCAN energy-efficient solution on the communication performance of various types of things in smart cities.

Posted Content
Onur Mutlu1
TL;DR: This work discusses the RowHammer problem in DRAM, which is a prime (and perhaps the first) example of how a circuit-level failure mechanism can cause a practical and widespread system security vulnerability, and describes and advocates a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.
Abstract: As memory scales down to smaller technology nodes, new failure mechanisms emerge that threaten its correct operation. If such failure mechanisms are not anticipated and corrected, they can not only degrade system reliability and availability but also, perhaps even more importantly, open up security vulnerabilities: a malicious attacker can exploit the exposed failure mechanism to take over the entire system. As such, new failure mechanisms in memory can become practical and significant threats to system security. In this work, we discuss the RowHammer problem in DRAM, which is a prime (and perhaps the first) example of how a circuit-level failure mechanism in DRAM can cause a practical and widespread system security vulnerability. RowHammer, as it is popularly referred to, is the phenomenon that repeatedly accessing a row in a modern DRAM chip causes bit flips in physically-adjacent rows at consistently predictable bit locations. It is caused by a hardware failure mechanism called DRAM disturbance errors, which is a manifestation of circuit-level cell-to-cell interference in a scaled memory technology. We analyze the root causes of the RowHammer problem and examine various solutions. We also discuss what other vulnerabilities may be lurking in DRAM and other types of memories, e.g., NAND flash memory or Phase Change Memory, that can potentially threaten the foundations of secure systems, as the memory technologies scale to higher densities. We conclude by describing and advocating a principled approach to memory reliability and security research that can enable us to better anticipate and prevent such vulnerabilities.

Posted ContentDOI
TL;DR: This whitepaper summarizes issues raised during the First International Workshop on Serverless Computing (WoSC) 2017 and especially in the panel and associated discussion that concluded the workshop.
Abstract: This whitepaper summarizes issues raised during the First International Workshop on Serverless Computing (WoSC) 2017 held June 5th 2017 and especially in the panel and associated discussion that concluded the workshop. We also include comments from the keynote and submitted papers. A glossary at the end (section 8) defines many technical terms used in this report.

Posted Content
TL;DR: This paper develops the first framework to manage edge nodes, namely the Edge NOde Resource Management (ENORM) framework, and demonstrates the feasibility of the framework on a PokéMon Go-like online game use-case.
Abstract: Current computing techniques using the cloud as a centralised server will become untenable as billions of devices get connected to the Internet. This raises the need for fog computing, which leverages computing at the edge of the network on nodes, such as routers, base stations and switches, along with the cloud. However, to realise fog computing the challenge of managing edge nodes will need to be addressed. This paper is motivated to address the resource management challenge. We develop the first framework to manage edge nodes, namely the Edge NOde Resource Management (ENORM) framework. Mechanisms for provisioning and auto-scaling edge node resources are proposed. The feasibility of the framework is demonstrated on a PokeMon Go-like online game use-case. The benefits of using ENORM are observed by reduced application latency between 20% - 80% and reduced data transfer and communication frequency between the edge node and the cloud by up to 95\%. These results highlight the potential of fog computing for improving the quality of service and experience.

Posted Content
TL;DR: This paper proposes Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters that is provably asymptotically optimal and provides numerical results demonstrating significant speedups of up to 49% and 34% for HCMM in comparison to the “uncoded” and “homogeneous coded” schemes.
Abstract: In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy abundance of redundancy - a vast number of computing nodes and large storage capacity. There have been recent results that demonstrate the impact of coding for efficient utilization of computation and storage redundancy to alleviate the effect of stragglers and communication bottlenecks in homogeneous clusters. In this paper, we focus on general heterogeneous distributed computing clusters consisting of a variety of computing machines with different capabilities. We propose a coding framework for speeding up distributed computing in heterogeneous clusters by trading redundancy for reducing the latency of computation. In particular, we propose Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters that is provably asymptotically optimal for a broad class of processing time distributions. Moreover, we show that HCMM is unboundedly faster than any uncoded scheme. To demonstrate practicality of HCMM, we carry out experiments over Amazon EC2 clusters where HCMM is found to be up to $61\%$, $46\%$ and $36\%$ respectively faster than three benchmark load allocation schemes - Uniform Uncoded, Load-balanced Uncoded, and Uniform Coded. Additionally, we provide a generalization to the problem of optimal load allocation in heterogeneous settings, where we take into account the monetary costs associated with the clusters. We argue that HCMM is asymptotically optimal for budget-constrained scenarios as well, and we develop a heuristic algorithm for (HCMM) load allocation for budget-limited computation tasks.

Proceedings ArticleDOI
TL;DR: This paper reviews various dimensions of system architecture, application characteristics and platform abstractions that are manifest in this Edge, Fog and Cloud eco-system and highlights novel capabilities of the Edge and Fog layers, such as physical and application mobility, privacy sensitivity, and a nascent runtime environment.
Abstract: Internet of Things (IoT) has accelerated the deployment of millions of sensors at the edge of the network, through Smart City infrastructure and lifestyle devices. Cloud computing platforms are often tasked with handling these large volumes and fast streams of data from the edge. Recently, Fog computing has emerged as a concept for low-latency and resource-rich processing of these observation streams, to complement Edge and Cloud computing. In this paper, we review various dimensions of system architecture, application characteristics and platform abstractions that are manifest in this Edge, Fog and Cloud eco-system. We highlight novel capabilities of the Edge and Fog layers, such as physical and application mobility, privacy sensitivity, and a nascent runtime environment. IoT application case studies based on first-hand experiences across diverse domains drive this categorization. We also highlight the gap between the potential and the reality of Fog computing, and identify challenges that need to be overcome for the solution to be sustainable. Together, our article can help platform and application developers bridge the gap that remains in making Fog computing viable.

Posted Content
TL;DR: This work shows a novel architecture written in OpenCL(TM), which is referred to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth, and shows how the Winograd transform can be used to significantly boost the performance of the FPGA.
Abstract: Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited external memory bandwidth on the FPGA device. We show a novel architecture written in OpenCL(TM), which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth. Furthermore, we show how we can use the Winograd transform to significantly boost the performance of the FPGA. As a result, when running our DLA on Intel's Arria 10 device we can achieve a performance of 1020 img/s, or 23 img/s/W when running the AlexNet CNN benchmark. This comes to 1382 GFLOPs and is 10x faster with 8.4x more GFLOPS and 5.8x better efficiency than the state-of-the-art on FPGAs. Additionally, 23 img/s/W is competitive against the best publicly known implementation of AlexNet on nVidia's TitanX GPU.

Proceedings ArticleDOI
TL;DR: The ongoing work building a Raspberry Pi cluster consisting of 300 nodes is presented, with potential use cases being an inexpensive and green test bed for cloud computing research and a robust and mobile data center for operating in adverse environments.
Abstract: We present our ongoing work building a Raspberry Pi cluster consisting of 300 nodes. The unique characteristics of this single board computer pose several challenges, but also offer a number of interesting opportunities. On the one hand, a single Raspberry Pi can be purchased cheaply and has a low power consumption, which makes it possible to create an affordable and energy-efficient cluster. On the other hand, it lacks in computing power, which makes it difficult to run computationally intensive software on it. Nevertheless, by combining a large number of Raspberries into a cluster, this drawback can be (partially) offset. Here we report on the first important steps of creating our cluster: how to set up and configure the hardware and the system software, and how to monitor and maintain the system. We also discuss potential use cases for our cluster, the two most important being an inexpensive and green test bed for cloud computing research and a robust and mobile data center for operating in adverse environments.

Posted Content
TL;DR: A safety violation in Zyzzyva and a liveness violation in FaB are observed, and the problem is manifested already in the first log slot.
Abstract: In this note, we observe a safety violation in Zyzzyva and a liveness violation in FaB. To demonstrate these issues, we require relatively simple scenarios, involving only four replicas, and one or two view changes. In all of them, the problem is manifested already in the first log slot.

Posted Content
TL;DR: The proposed manifesto addresses the major open challenges in Cloud computing by identifying themajor open challenges, emerging trends, and impact areas, and offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.
Abstract: The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention of academia, industries, and government bodies. Now, it has emerged as the backbone of modern economy by offering subscription-based services anytime, anywhere following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. The recent technological developments and paradigms such as serverless computing, software-defined networking, Internet of Things, and processing at network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends, and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.

Posted Content
TL;DR: Maiter as discussed by the authors proposes delta-based accumulative iterative computation (DAIC) to accelerate large-scale graph-based iterative computations, which can bypass the high-cost synchronous barriers in heterogeneous distributed environments.
Abstract: Myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the "changes" between iterations. By DAIC, we can process only the "changes" to avoid the negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronous barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on local cluster as well as on Amazon EC2 Cloud. The results show that Maiter achieves as much as 60x speedup over Hadoop and outperforms other state-of-the-art frameworks.

Posted Content
TL;DR: A general technique for obtaining provably correct, non-blocking implementations of a large class of tree data structures where pointers are directed from parents to children and an experimental performance analysis demonstrates that the Java implementation of a chromatic tree rivals, and often significantly outperforms, other leading concurrent dictionaries.
Abstract: We describe a general technique for obtaining provably correct, non-blocking implementations of a large class of tree data structures where pointers are directed from parents to children. Updates are permitted to modify any contiguous portion of the tree atomically. Our non-blocking algorithms make use of the LLX, SCX and VLX primitives, which are multi-word generalizations of the standard LL, SC and VL primitives and have been implemented from single-word CAS. To illustrate our technique, we describe how it can be used in a fairly straightforward way to obtain a non-blocking implementation of a chromatic tree, which is a relaxed variant of a red-black tree. The height of the tree at any time is $O(c+ \log n)$, where $n$ is the number of keys and $c$ is the number of updates in progress. We provide an experimental performance analysis which demonstrates that our Java implementation of a chromatic tree rivals, and often significantly outperforms, other leading concurrent dictionaries.

Posted Content
TL;DR: In this article, the authors explore remarkable similarities between multi-transactional behaviors of smart contracts in cryptocurrencies such as Ethereum and classical problems of shared-memory concurrency and examine two real-world examples from the Ethereum blockchain and analyze how they are vulnerable to bugs that are closely reminiscent to those that often occur in traditional concurrent programs.
Abstract: In this paper, we explore remarkable similarities between multi-transactional behaviors of smart contracts in cryptocurrencies such as Ethereum and classical problems of shared-memory concurrency. We examine two real-world examples from the Ethereum blockchain and analyzing how they are vulnerable to bugs that are closely reminiscent to those that often occur in traditional concurrent programs. We then elaborate on the relation between observable contract behaviors and well-studied concurrency topics, such as atomicity, interference, synchronization, and resource ownership. The described contracts-as-concurrent-objects analogy provides deeper understanding of potential threats for smart contracts, indicate better engineering practices, and enable applications of existing state-of-the-art formal verification techniques.

Proceedings ArticleDOI
TL;DR: The evaluation results show that the virtualization's performance overhead could vary not only on a feature-by-feature basis but also on a job-to-job basis.
Abstract: The current virtualization solution in the Cloud widely relies on hypervisor-based technologies. Along with the recent popularity of Docker, the container-based virtualization starts receiving more attention for being a promising alternative. Since both of the virtualization solutions are not resource-free, their performance overheads would lead to negative impacts on the quality of Cloud services. To help fundamentally understand the performance difference between these two types of virtualization solutions, we use a physical machine with "just-enough" resource as a baseline to investigate the performance overhead of a standalone Docker container against a standalone virtual machine (VM). With findings contrary to the related work, our evaluation results show that the virtualization's performance overhead could vary not only on a feature-by-feature basis but also on a job-to-job basis. Although the container-based solution is undoubtedly lightweight, the hypervisor-based technology does not come with higher performance overhead in every case. For example, Docker containers particularly exhibit lower QoS in terms of storage transaction speed.

Posted Content
TL;DR: It is demonstrated that ChainerMN can scale the learning process of the ResNet-50 model to the ImageNet dataset up to 128 GPUs with the parallel efficiency of 90%.
Abstract: One of the keys for deep learning to have made a breakthrough in various fields was to utilize high computing powers centering around GPUs. Enabling the use of further computing abilities by distributed processing is essential not only to make the deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the learning process of the ResNet-50 model to the ImageNet dataset up to 128 GPUs with the parallel efficiency of 90%.

Posted Content
TL;DR: A new technique which combines connected components of sample sparse subgraphs of the input graph in order to accelerate the process of uncoveringconnected components of the original input graph, and develops a sparsification technique which reduces an initial CC problem in O(1) rounds to its two restricted instances.
Abstract: We present a distributed randomized algorithm finding Minimum Spanning Tree (MST) of a given graph in O(1) rounds, with high probability, in the Congested Clique model. The input graph in the Congested Clique model is a graph of n nodes, where each node initially knows only its incident edges. The communication graph is a clique with limited edge bandwidth: each two nodes (not necessarily neighbours in the input graph) can exchange $O(\log n)$ bits. As in previous works, the key part of the MST algorithm is an efficient Connected Components (CC) algorithm. However, unlike the former approaches, we do not aim at simulating the standard Boruvka algorithm, at least at initial stages of the CC algorithm. Instead, we develop a new technique which combines connected components of sample sparse subgraphs of the input graph in order to accelerate the process of uncovering connected components of the original input graph. More specifically, we develop a sparsification technique which reduces an initial CC problem in $O(1)$ rounds to its two restricted instances. The former instance has a graph with maximal degree $O(\log \log n)$ as the input -- here our sample-combining technique helps. In the latter instance, a partition of the input graph into $O(n/\log \log n)$ connected components is known. This gives an opportunity to apply previous algorithms to determine connected components in $O(1)$ rounds. Our result addresses the problem from and the $O(\log \log n)$ algorithm of Lotker et al. [SPAA 2003; SICOMP 2005], improves over previous $O(\log* n)$ algorithm of Ghaffari et al. [PODC 2016] and $O(\log \log \log n)$ algorithm of Hegeman et al. [PODC 2015] . It also determines $\Theta(1)$ round complexity in the congested clique for MST, as well as other graph problems, including bipartiteness, cut verification, s-t connectivity and cycle containment.

Book ChapterDOI
TL;DR: In this article, the authors defined fog computing in the context of medical IoT and implemented and tested an fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage and communication of the various medical data such as pathological speech data of individuals with speech disorders, Phonocardiogram (PCG) signal for heart rate estimation, and Electrocardiogram (ECG)-based Q, R, S detection.
Abstract: In the era when the market segment of Internet of Things (IoT) tops the chart in various business reports, it is apparently envisioned that the field of medicine expects to gain a large benefit from the explosion of wearables and internet-connected sensors that surround us to acquire and communicate unprecedented data on symptoms, medication, food intake, and daily-life activities impacting one's health and wellness. However, IoT-driven healthcare would have to overcome many barriers, such as: 1) There is an increasing demand for data storage on cloud servers where the analysis of the medical big data becomes increasingly complex, 2) The data, when communicated, are vulnerable to security and privacy issues, 3) The communication of the continuously collected data is not only costly but also energy hungry, 4) Operating and maintaining the sensors directly from the cloud servers are non-trial tasks. This book chapter defined Fog Computing in the context of medical IoT. Conceptually, Fog Computing is a service-oriented intermediate layer in IoT, providing the interfaces between the sensors and cloud servers for facilitating connectivity, data transfer, and queryable local database. The centerpiece of Fog computing is a low-power, intelligent, wireless, embedded computing node that carries out signal conditioning and data analytics on raw data collected from wearables or other medical sensors and offers efficient means to serve telehealth interventions. We implemented and tested an fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage and communication of the various medical data such as pathological speech data of individuals with speech disorders, Phonocardiogram (PCG) signal for heart rate estimation, and Electrocardiogram (ECG)-based Q, R, S detection.

Posted Content
TL;DR: In this paper, a distributed variant of EBR is proposed, which is based on signaling and takes O(1)$ amortized steps per high-level operation on the data structure and O(mn^2)$ steps in the worst case each time an object is removed from a data structure, where n is the number of processes and m is a small constant.
Abstract: Memory reclamation for lock-based data structures is typically easy. However, it is a significant challenge for lock-free data structures. Automatic techniques such as garbage collection are inefficient or use locks, and non-automatic techniques either have high overhead, or do not work for many data structures. For example, subtle problems can arise when hazard pointers, one of the most common non-automatic techniques, are applied to many lock-free data structures. Epoch based reclamation (EBR), which is by far the most efficient non-automatic technique, allows the number of unreclaimed objects to grow without bound, because one crashed process can prevent all other processes from reclaiming memory. We develop a more efficient, distributed variant of EBR that solves this problem. It is based on signaling, which is provided by many operating systems, such as Linux and UNIX. Our new scheme takes $O(1)$ amortized steps per high-level operation on the data structure and $O(1)$ steps in the worst case each time an object is removed from the data structure. At any point, $O(mn^2)$ objects are waiting to be freed, where $n$ is the number of processes and $m$ is a small constant for most data structures. Experiments show that our scheme has very low overhead: on average 10\%, and at worst 28\%, for a balanced binary search tree over many thread counts, operation mixes and contention levels. Our scheme also outperforms a highly tuned implementation of hazard pointers by an average of 75\%. Typically, memory reclamation is tightly woven into lock-free data structure code. To improve modularity and facilitate the comparison of different memory reclamation schemes, we also introduce a highly flexible abstraction. It allows a programmer to easily interchange schemes for reclamation, object pooling, allocation and deallocation with virtually no overhead, by changing a single line of code.