Showing papers in "arXiv: Distributed, Parallel, and Cluster Computing in 2021"


Posted Content•

ZeRO-Offload: Democratizing Billion-Scale Model Training


Jie Ren¹, Samyam Rajbhandari², Reza Yazdani Aminabadi², Olatunji Ruwase², Shuangyan Yang¹, Minjia Zhang², Dong Li¹, Yuxiong He²•Institutions (2)

University of California, Merced¹, Microsoft²

18 Jan 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: ZeRO-Offload democratizes large-scale model training making it accessible to even data scientists with access to just a single GPU, and combines compute and memory efficiency with ease-of-use.


Abstract: Large-scale model training has been a playing ground for a limited few requiring complex model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload changes the large model training landscape by making large model training accessible to nearly everyone. It can train models with over 13 billion parameters on a single GPU, a 10x increase in size compared to popular frameworks such as PyTorch, and it does so without requiring any model change from the data scientists or sacrificing computational efficiency. ZeRO-Offload enables large model training by offloading data and compute to CPU. To preserve compute efficiency, it is designed to minimize the data movement to/from GPU, and reduce CPU compute time while maximizing memory savings on GPU. As a result, ZeRO-Offload can achieve 40 TFlops/GPU on a single NVIDIA V100 GPU for a 10B parameter model, compared to 30 TFlops using PyTorch alone for a 1.4B parameter model, the largest that can be trained without running out of memory. ZeRO-Offload is also designed to scale on multiple GPUs when available, offering near-linear speedup on up to 128 GPUs. Additionally, it can work together with model parallelism to train models with over 70 billion parameters on a single DGX-2 box, a 4.5x increase in model size compared to using model parallelism alone. By combining compute and memory efficiency with ease-of-use, ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU.
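
As a rough back-of-the-envelope check on the model sizes quoted in this abstract, the sketch below estimates GPU memory for model states only. The abstract does not give a memory breakdown, so the figures here are assumptions for illustration: mixed-precision Adam at 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state), and only the fp16 weights staying on the GPU once the rest is offloaded to the CPU. Activations and temporary buffers are ignored, and the function name `model_state_gb` is hypothetical.

```python
# Rough estimate of GPU memory needed for model states (activations ignored).
# ASSUMPTION: mixed-precision Adam, 16 bytes per parameter in total.
BYTES_FP16_WEIGHTS = 2
BYTES_FP16_GRADS = 2
BYTES_FP32_OPTIMIZER = 12  # fp32 weight copy + momentum + variance

GB = 1e9


def model_state_gb(num_params: float, offload: bool) -> float:
    """Approximate GPU memory (GB) for model states of a model with num_params."""
    per_param = BYTES_FP16_WEIGHTS
    if not offload:
        # Without offloading, gradients and optimizer state also live on the GPU.
        per_param += BYTES_FP16_GRADS + BYTES_FP32_OPTIMIZER
    return num_params * per_param / GB


if __name__ == "__main__":
    for params in (1.4e9, 10e9, 13e9):
        without = model_state_gb(params, offload=False)
        with_offload = model_state_gb(params, offload=True)
        print(f"{params / 1e9:.1f}B params: {without:.1f} GB without offload, "
              f"{with_offload:.1f} GB with offload")
```

Under these assumptions, a 13B-parameter model needs about 26 GB on the GPU for fp16 weights, whereas keeping all model states on the GPU would need about 208 GB, which suggests why offloading changes what a single GPU can hold.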


118 citations

Proceedings Article•DOI•

Towards Demystifying Serverless Machine Learning Training


Jiawei Jiang¹, Shaoduo Gan¹, Yue Liu¹, Fanlin Wang¹, Gustavo Alonso¹, Ana Klimovic¹, Ankit Singla¹, Wentao Wu², Ce Zhang¹•Institutions (2)

ETH Zurich¹, Microsoft²

17 May 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a comparative study of distributed ML training over FaaS and IaaS is presented, showing that ML training pays off in serverless only for models with efficient (i.e., reduced) communication and that quickly converge.


Abstract: The appeal of serverless (FaaS) has triggered a growing interest on how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several systems exist for training large-scale ML models on top of serverless infrastructures (e.g., AWS Lambda) but with inconclusive results in terms of their performance and relative advantage over "serverful" infrastructures (IaaS). In this paper we present a systematic, comparative study of distributed ML training over FaaS and IaaS. We present a design space covering design choices such as optimization algorithms and synchronization protocols, and implement a platform, LambdaML, that enables a fair comparison between FaaS and IaaS. We present experimental results using LambdaML, and further develop an analytic model to capture cost/performance tradeoffs that must be considered when opting for a serverless infrastructure. Our results indicate that ML training pays off in serverless only for models with efficient (i.e., reduced) communication and that quickly converge. In general, FaaS can be much faster but it is never significantly cheaper than IaaS.


53 citations

Journal Article•DOI•

Blockchain platform for COVID-19 vaccine supply management


Claudia Antal¹, Tudor Cioara¹, Marcel Antal¹, Ionut Anghel¹•Institutions (1)

Technical University of Cluj-Napoca¹

04 Jan 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, blockchain technology is used for assuring the transparent tracing of COVID-19 vaccine registration, storage and delivery, and side effects self-reporting, where smart contracts are defined to monitor and track the proper vaccine distribution conditions against the safe handling rules defined by vaccine producers.


Abstract: In the context of the COVID-19 pandemic, the rapid roll-out of a vaccine and the implementation of a worldwide immunization campaign is critical, but its success will depend on the availability of an operational and transparent distribution chain that can be audited by all relevant stakeholders. In this paper, we discuss how blockchain technology can be used for assuring the transparent tracing of COVID-19 vaccine registration, storage and delivery, and side effects self-reporting. We present such system implementation in which blockchain technology is used for assuring data integrity and immutability in case of beneficiary registration for vaccination, eliminating identity thefts and impersonations. Smart contracts are defined to monitor and track the proper vaccine distribution conditions against the safe handling rules defined by vaccine producers enabling the awareness of all network peers. For vaccine administration, a transparent and tamper-proof side effects self-reporting solution is provided considering person identification and administrated vaccine association. A prototype was implemented using the Ethereum test network, Ropsten, considering the COVID-19 vaccine distribution tracking conditions. The results obtained for each on-chain operation can be checked and validated on the Etherscan, demonstrating various aspects of the proposed system such as immunization actors and safe rules registration, vaccine tracking, and administration. In terms of throughput and scalability, the proposed blockchain system shows promising results.


45 citations

Proceedings Article•DOI•

Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters


Qinghao Hu¹, Peng Sun, Shengen Yan, Yonggang Wen¹, Tianwei Zhang¹•Institutions (1)

Nanyang Technological University¹

03 Sep 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors present a comprehensive study of the characteristics of DL jobs and resource management, and propose a general-purpose framework that manages resources based on historical data; its case-study services reduce the cluster-wide average job completion time by up to 6.5x and improve overall cluster utilization by up to 13%.


Abstract: Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services in both the research community and industry. When operating a datacenter, optimization of resource scheduling and management can bring significant financial benefits. Achieving this goal requires a deep understanding of the job features and user behaviors. We present a comprehensive study about the characteristics of DL jobs and resource management. First, we perform a large-scale analysis of real-world job traces from SenseTime. We uncover some interesting conclusions from the perspectives of clusters, jobs and users, which can facilitate the cluster system designs. Second, we introduce a general-purpose framework, which manages resources based on historical data. As case studies, we design: a Quasi-Shortest-Service-First scheduling service, which can minimize the cluster-wide average job completion time by up to 6.5x; and a Cluster Energy Saving service, which improves overall cluster utilization by up to 13%.
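
To illustrate why ordering jobs by expected service can lower average job completion time, the sketch below compares arrival order with a shortest-service-first order. It is not the authors' Quasi-Shortest-Service-First scheduler: the `Job` fields, the use of GPUs × predicted hours as the service estimate, and the simplification that the cluster works through one job at a time at full capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int
    predicted_hours: float  # hypothetical output of a history-based predictor

    @property
    def predicted_service(self) -> float:
        """Estimated service demand in GPU-hours (assumed priority metric)."""
        return self.gpus * self.predicted_hours


def average_completion_time(jobs, cluster_gpus: int) -> float:
    """Average completion time (hours) when jobs run one after another.

    SIMPLIFICATION: the cluster is treated as a single server that processes
    GPU-hours at a rate of cluster_gpus per hour.
    """
    clock = 0.0
    total = 0.0
    for job in jobs:
        clock += job.predicted_service / cluster_gpus
        total += clock
    return total / len(jobs)


def shortest_service_first(jobs):
    """Return jobs ordered by ascending predicted service demand."""
    return sorted(jobs, key=lambda job: job.predicted_service)


if __name__ == "__main__":
    queue = [Job("A", 8, 10.0), Job("B", 1, 2.0), Job("C", 4, 1.0)]
    print("arrival order:", average_completion_time(queue, cluster_gpus=8))
    print("shortest first:",
          average_completion_time(shortest_service_first(queue), cluster_gpus=8))
```

In this toy queue the large job no longer delays the two small ones, so the average drops even though the total work is unchanged.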


45 citations

Posted Content•

Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback.


Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, Zhuolun Xiang

18 Jun 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work designs Ditto, a Byzantine SMR protocol that enjoys the best of both worlds: optimal communication on and off the happy path (linear and quadratic, respectively) and a progress guarantee under asynchrony and DDoS attacks.


Abstract: Existing committee-based Byzantine state machine replication (SMR) protocols, typically deployed in production blockchains, face a clear trade-off: (1) they either achieve linear communication cost in the happy path, but sacrifice liveness during periods of asynchrony, or (2) they are robust (progress with probability one) but pay quadratic communication cost. We believe this trade-off is unwarranted since existing linear protocols still have asymptotic quadratic cost in the worst case. We design Ditto, a Byzantine SMR protocol that enjoys the best of both worlds: optimal communication on and off the happy path (linear and quadratic, respectively) and progress guarantee under asynchrony and DDoS attacks. We achieve this by replacing the view-synchronization of partially synchronous protocols with an asynchronous fallback mechanism at no extra asymptotic cost. Specifically, we start from HotStuff, a state-of-the-art linear protocol, and gradually build Ditto. As a separate contribution and an intermediate step, we design a 2-chain version of HotStuff, Jolteon, which leverages a quadratic view-change mechanism to reduce the latency of the standard 3-chain HotStuff. We implement and experimentally evaluate all our systems. Notably, Jolteon's commit latency outperforms HotStuff by 200-300ms with varying system size. Additionally, Ditto adapts to the network and provides better performance than Jolteon under faulty conditions and better performance than VABA (a state-of-the-art asynchronous protocol) under faultless conditions. This proves our case that breaking the robustness-efficiency trade-off is in the realm of practicality.


37 citations

Posted Content•

IFogSim2: An Extended iFogSim Simulator for Mobility, Clustering, and Microservice Management in Edge and Fog Computing Environments


Md. Redowan Mahmud, Samodha Pallewatta, Mohammad Goudarzi¹, Rajkumar Buyya•Institutions (1)

University of Melbourne¹

12 Sep 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors develop multiple simulation models for service migration, dynamic distributed cluster formation, and microservice orchestration in Edge/Fog computing, and integrate them with the existing iFogSim simulation toolkit to release it as iFogSim2.


Abstract: Internet of Things (IoT) has already proven to be the building block for next-generation Cyber-Physical Systems (CPSs). The considerable amount of data generated by the IoT devices needs latency-sensitive processing, which is not feasible by deploying the respective applications in remote Cloud datacentres. Edge/Fog computing, a promising extension of Cloud at the IoT-proximate network, can meet such requirements for smart CPSs. However, the structural and operational differences of Edge/Fog infrastructure resist employing Cloud-based service regulations directly to these environments. As a result, many research works have been recently conducted, focusing on efficient application and resource management in Edge/Fog computing environments. Scalable Edge/Fog infrastructure is a must to validate these policies, which is also challenging to accommodate in the real-world due to high cost and implementation time. Considering simulation as a key to this constraint, various software has been developed that can imitate the physical behaviour of Edge/Fog computing environments. Nevertheless, the existing simulators often fail to support advanced service management features because of their monolithic architecture, lack of actual dataset, and limited scope for a periodic update. To overcome these issues, we have developed multiple simulation models for service migration, dynamic distributed cluster formation, and microservice orchestration for Edge/Fog computing in this work and integrated them with the existing iFogSim simulation toolkit, releasing it as iFogSim2. The performance of iFogSim2 and its built-in policies are evaluated using three use case scenarios and compared with the contemporary simulators and benchmark policies under different settings. Results indicate that the proposed solution outperforms others in service management time, network usage, RAM consumption, and simulation time.


36 citations

Journal Article•DOI•

Cloud, Fog or Edge: Where to Compute?


Dragi Kimovski¹, Roland Matha¹, Josef Hammer¹, Narges Mehran¹, Hermann Hellwagner¹, Radu Prodan¹•Institutions (1)

Information Technology University¹

25 Jan 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors provide a detailed performance and carbon footprint analysis of a selection of use case applications with complementary resource requirements across the computing continuum over a real-life evaluation testbed.


Abstract: The computing continuum extends the high-performance cloud data centers with energy-efficient and low-latency devices close to the data sources located at the edge of the network. However, the heterogeneity of the computing continuum raises multiple challenges related to application management. These include where to offload an application - from the cloud to the edge - to meet its computation and communication requirements. To support these decisions, we provide in this article a detailed performance and carbon footprint analysis of a selection of use case applications with complementary resource requirements across the computing continuum over a real-life evaluation testbed.


35 citations

Journal Article•DOI•

COSCO: Container Orchestration using Co-Simulation and Gradient Based Optimization for Fog Computing Environments


Shreshth Tuli¹, Shivananda R. Poojara², Satish Narayana Srirama³, Giuliano Casale¹, Nicholas R. Jennings¹•Institutions (3)

Imperial College London¹, University of Tartu², University UCINF³

29 Apr 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a hybrid simulation and container orchestration framework is proposed to optimize Quality of Service (QoS) parameters in large-scale fog platforms, using a gradient-based optimization strategy using back-propagation of gradients with respect to input.


Abstract: Intelligent task placement and management of tasks in large-scale fog platforms is challenging due to the highly volatile nature of modern workload applications and sensitive user requirements of low energy consumption and response time. Container orchestration platforms have emerged to alleviate this problem with prior art either using heuristics to quickly reach scheduling decisions or AI driven methods like reinforcement learning and evolutionary approaches to adapt to dynamic scenarios. The former often fail to quickly adapt in highly dynamic environments, whereas the latter have run-times that are slow enough to negatively impact response time. Therefore, there is a need for scheduling policies that are both reactive to work efficiently in volatile environments and have low scheduling overheads. To achieve this, we propose a Gradient Based Optimization Strategy using Back-propagation of gradients with respect to Input (GOBI). Further, we leverage the accuracy of predictive digital-twin models and simulation capabilities by developing a Coupled Simulation and Container Orchestration Framework (COSCO). Using this, we create a hybrid simulation driven decision approach, GOBI*, to optimize Quality of Service (QoS) parameters. Co-simulation and the back-propagation approaches allow these methods to adapt quickly in volatile environments. Experiments conducted using real-world data on fog applications using the GOBI and GOBI* methods, show a significant improvement in terms of energy consumption, response time, Service Level Objective and scheduling time by up to 15, 40, 4, and 82 percent respectively when compared to the state-of-the-art algorithms.


33 citations

Proceedings Article•DOI•

MIND: In-Network Memory Management for Disaggregated Data Centers.


Seung-seob Lee¹, Yanpeng Yu², Yupeng Tang², Anurag Khandelwal², Lin Zhong², Abhishek Bhattacharjee²•Institutions (2)

Association for Computing Machinery¹, Yale University²

01 Jul 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: MIND is an in-network memory management unit for rack-scale memory disaggregation that enables transparent resource elasticity while matching the performance of prior memory disaggregation proposals for real-world workloads.


Abstract: Memory-compute disaggregation promises transparent elasticity, high utilization and balanced usage for resources in data centers by physically separating memory and compute into network-attached resource "blades". However, existing designs achieve performance at the cost of resource elasticity, restricting memory sharing to a single compute blade to avoid costly memory coherence traffic over the network. In this work, we show that emerging programmable network switches can enable an efficient shared memory abstraction for disaggregated architectures by placing memory management logic in the network fabric. We find that centralizing memory management in the network permits bandwidth and latency-efficient realization of in-network cache coherence protocols, while programmable switch ASICs support other memory management logic at line-rate. We realize these insights into MIND, an in-network memory management unit for rack-scale memory disaggregation. MIND enables transparent resource elasticity while matching the performance of prior memory disaggregation proposals for real-world workloads.


29 citations

Posted Content•DOI•

Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language.


Michael R. Crusoe, Sanne Abeln, Alexandru Iosup, Peter Amstutz, John Chilton, Nebojsa Tijanic, Hervé Ménager, Stian Soiland-Reyes, Carole Goble

14 May 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A widely used standard for portable multilingual data analysis pipelines would enable considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and to the environment.


Abstract: A widely used standard for portable multilingual data analysis pipelines would enable considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and to the environment. Published research that used multiple computer languages for their analysis pipelines would include a complete and reusable description of that analysis that is runnable on a diverse set of computing environments. Researchers would be able to collaborate more easily and reuse these pipelines, adding or exchanging components regardless of programming language used; collaborations with and within the industry would be easier; approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact would also be reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for the modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. However, without a standard for reusable and portable multilingual workflows, reusing published multilingual workflows, collaborating on open problems, and optimizing their execution would be severely hampered. Moreover, only a standard for multilingual data analysis pipelines that was widely used would enable considerable benefits to research-industry collaboration, regulatory cost control, and to preserving the environment. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although there exist hundreds of single-vendor and other single-source systems that run workflows, none is a general, community-driven, and consensus-built standard.


26 citations

Proceedings Article•DOI•

LaSS: Running Latency Sensitive Serverless Computations at the Edge


Bin Wang¹, Ahmed Ali-Eldin², Prashant Shenoy¹•Institutions (2)

University of Massachusetts Amherst¹, Chalmers University of Technology²

29 Apr 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors present a platform that uses model-driven approaches for running latency-sensitive serverless computations on edge resources, which can accurately predict the resources needed for serverless functions in the presence of highly dynamic workloads, and reprovision container capacity within hundreds of milliseconds while maintaining fair share allocation guarantees.


Abstract: Serverless computing has emerged as a new paradigm for running short-lived computations in the cloud. Due to its ability to handle IoT workloads, there has been considerable interest in running serverless functions at the edge. However, the constrained nature of the edge and the latency sensitive nature of workloads result in many challenges for serverless platforms. In this paper, we present LaSS, a platform that uses model-driven approaches for running latency-sensitive serverless computations on edge resources. LaSS uses principled queuing-based methods to determine an appropriate allocation for each hosted function and auto-scales the allocated resources in response to workload dynamics. LaSS uses a fair-share allocation approach to guarantee a minimum of allocated resources to each function in the presence of overload. In addition, it utilizes resource reclamation methods based on container deflation and termination to reassign resources from over-provisioned functions to under-provisioned ones. We implement a prototype of our approach on an OpenWhisk serverless edge cluster and conduct a detailed experimental evaluation. Our results show that LaSS can accurately predict the resources needed for serverless functions in the presence of highly dynamic workloads, and reprovision container capacity within hundreds of milliseconds while maintaining fair share allocation guarantees.
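
The abstract mentions queuing-based allocation without giving the model. As one possible illustration, the sketch below sizes the number of containers for a function using a textbook M/M/c queue (Erlang C). Treating each container as one server, assuming Poisson arrivals and exponential service times, and the function names used here are all assumptions; this is not presented as the model LaSS actually uses.

```python
import math


def erlang_c_wait_probability(arrival_rate: float, service_rate: float,
                              servers: int) -> float:
    """Probability that a request has to queue in an M/M/c system."""
    load = arrival_rate / service_rate  # offered load in Erlangs
    if load >= servers:
        return 1.0  # unstable system: every request waits
    below = sum(load ** k / math.factorial(k) for k in range(servers))
    tail = (load ** servers / math.factorial(servers)) * (servers / (servers - load))
    return tail / (below + tail)


def mean_response_time(arrival_rate: float, service_rate: float,
                       servers: int) -> float:
    """Mean time in system (queueing delay plus service) for M/M/c."""
    if arrival_rate / service_rate >= servers:
        return math.inf
    p_wait = erlang_c_wait_probability(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate) + 1.0 / service_rate


def containers_needed(arrival_rate: float, service_rate: float,
                      target_latency: float) -> int:
    """Smallest container count whose mean response time meets the target."""
    if target_latency <= 1.0 / service_rate:
        raise ValueError("target must exceed the mean service time")
    servers = max(1, math.ceil(arrival_rate / service_rate))
    while mean_response_time(arrival_rate, service_rate, servers) > target_latency:
        servers += 1
    return servers


if __name__ == "__main__":
    # Hypothetical workload: 40 requests/s, 10 requests/s per container, 150 ms target.
    print(containers_needed(arrival_rate=40.0, service_rate=10.0, target_latency=0.15))
```

An autoscaler built on such a model would re-run the calculation as the measured arrival rate changes; fairness across functions and container deflation, which the abstract also describes, are outside this sketch.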


Proceedings Article•DOI•

Benchmarking, Analysis, and Optimization of Serverless Function Snapshots


Dmitrii Ustiugov¹, Plamen Petrov¹, Marios Kogias², Edouard Bugnion³, Boris Grot¹•Institutions (3)

University of Edinburgh¹, Microsoft², École Polytechnique Fédérale de Lausanne³

16 Jan 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors introduce vHive, an open-source framework for serverless experimentation with the goal of enabling researchers to study and innovate across the entire serverless stack.


Abstract: Serverless computing has seen rapid adoption due to its high scalability and flexible, pay-as-you-go billing model. In serverless, developers structure their services as a collection of functions, sporadically invoked by various events like clicks. High inter-arrival time variability of function invocations motivates the providers to start new function instances upon each invocation, leading to significant cold-start delays that degrade user experience. To reduce cold-start latency, the industry has turned to snapshotting, whereby an image of a fully-booted function is stored on disk, enabling a faster invocation compared to booting a function from scratch. This work introduces vHive, an open-source framework for serverless experimentation with the goal of enabling researchers to study and innovate across the entire serverless stack. Using vHive, we characterize a state-of-the-art snapshot-based serverless infrastructure, based on industry-leading Containerd orchestration framework and Firecracker hypervisor technologies. We find that the execution time of a function started from a snapshot is 95% higher, on average, than when the same function is memory-resident. We show that the high latency is attributable to frequent page faults as the function's state is brought from disk into guest memory one page at a time. Our analysis further reveals that functions access the same stable working set of pages across different invocations of the same function. By leveraging this insight, we build REAP, a light-weight software mechanism for serverless hosts that records functions' stable working set of guest memory pages and proactively prefetches it from disk into memory. Compared to baseline snapshotting, REAP slashes the cold-start delays by 3.7x, on average.


Posted Content•DOI•

Blockchain for Mobile Edge Computing: Consensus Mechanisms and Scalability


Jorge Peña Queralta¹, Tomi Westerlund¹•Institutions (1)

University of Turku¹

01 Jan 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This chapter reviews existing consensus protocols and scalability techniques in both well-established and next-generation blockchain architectures for managing MEC services and evaluates the most suitable solutions and discusses the benefits and drawbacks of the available alternatives.


Abstract: Mobile edge computing (MEC) and next-generation mobile networks are set to disrupt the way intelligent and autonomous systems are interconnected. This will have an effect on a wide range of domains, from the Internet of Things to autonomous mobile robots. The integration of such a variety of MEC services in an inherently distributed architecture requires a robust system for managing hardware resources, balancing the network load and securing the distributed applications. Blockchain technology has emerged as a solution for managing MEC services, with consensus protocols and data integrity checks that enable transparent and efficient distributed decision-making. In addition to transparency, the benefits from a security point of view are evident. Nonetheless, blockchain technology faces significant challenges in terms of scalability. In this chapter, we review existing consensus protocols and scalability techniques in both well-established and next-generation blockchain architectures. From this, we evaluate the most suitable solutions for managing MEC services and discuss the benefits and drawbacks of the available alternatives.


Posted Content•

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning.


Samyam Rajbhandari¹, Olatunji Ruwase¹, Jeff Rasley¹, Shaden Smith¹, Yuxiong He¹•Institutions (1)

Microsoft¹

16 Apr 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: ZeRO-Infinity leverages GPU, CPU, and NVMe memory to allow for unprecedented model scale on limited resources without requiring model code refactoring.


Abstract: In the last three years, the largest dense deep learning models have grown over 1000x to reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 GB to 80 GB). Therefore, the growth in model scale has been supported primarily through system innovations that allow large models to fit in the aggregate GPU memory of multiple GPUs. However, we are getting close to the GPU memory wall. It requires 800 NVIDIA V100 GPUs just to fit a trillion parameter model for training, and such clusters are simply out of reach for most data scientists. In addition, training models at that scale requires complex combinations of parallelism techniques that put a big burden on the data scientists to refactor their model. In this paper we present ZeRO-Infinity, a novel heterogeneous system technology that leverages GPU, CPU, and NVMe memory to allow for unprecedented model scale on limited resources without requiring model code refactoring. At the same time it achieves excellent training throughput and scalability, unencumbered by the limited CPU or NVMe bandwidth. ZeRO-Infinity can fit models with tens and even hundreds of trillions of parameters for training on current generation GPU clusters. It can be used to fine-tune trillion parameter models on a single NVIDIA DGX-2 node, making large models more accessible. In terms of training throughput and scalability, it sustains over 25 petaflops on 512 NVIDIA V100 GPUs (40% of peak), while also demonstrating super-linear scalability. An open source implementation of ZeRO-Infinity is available through DeepSpeed, a deep learning optimization library that makes distributed training easy, efficient, and effective.


Posted Content•DOI•

Let's Wait Awhile: How Temporal Workload Shifting Can Reduce Carbon Emissions in the Cloud.


Philipp Wiesner¹, Ilja Behnke¹, Dominik Scheinert¹, Kordian Gontarska², Lauritz Thamsen¹•Institutions (2)

Technical University of Berlin¹, University of Potsdam²

25 Oct 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors examine the potential impact of shifting computational workloads towards times where the energy supply is expected to be less carbon-intensive, and identify characteristics of delay-tolerant workloads and analyze the potential for temporal workload shifting in Germany, Great Britain, France, and California over the year 2020.


Abstract: Depending on energy sources and demand, the carbon intensity of the public power grid fluctuates over time. Exploiting this variability is an important factor in reducing the emissions caused by data centers. However, regional differences in the availability of low-carbon energy sources make it hard to provide general best practices for when to consume electricity. Moreover, existing research in this domain focuses mostly on carbon-aware workload migration across geo-distributed data centers, or addresses demand response purely from the perspective of power grid stability and costs. In this paper, we examine the potential impact of shifting computational workloads towards times where the energy supply is expected to be less carbon-intensive. To this end, we identify characteristics of delay-tolerant workloads and analyze the potential for temporal workload shifting in Germany, Great Britain, France, and California over the year 2020. Furthermore, we experimentally evaluate two workload shifting scenarios in a simulation to investigate the influence of time constraints, scheduling strategies, and the accuracy of carbon intensity forecasts. To accelerate research in the domain of carbon-aware computing and to support the evaluation of novel scheduling algorithms, our simulation framework and datasets are publicly available.
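
The idea of temporal workload shifting can be sketched as choosing, within a job's deadline, the start time whose window has the lowest forecast carbon intensity. The sketch below is an illustration only, not the authors' simulation framework: the hourly forecast values are made up, and `best_start_hour` is a hypothetical helper that assumes a non-interruptible job running at constant power.

```python
def best_start_hour(forecast, duration: int, deadline: int) -> int:
    """Return the start hour that minimizes total forecast carbon intensity.

    forecast: hourly carbon intensity values (e.g. gCO2/kWh), index 0 = now.
    duration: job length in whole hours; the job is not interruptible.
    deadline: hour index by which the job must have finished.
    """
    if duration <= 0 or duration > deadline or deadline > len(forecast):
        raise ValueError("need 0 < duration <= deadline <= len(forecast)")
    candidates = range(0, deadline - duration + 1)
    return min(candidates, key=lambda start: sum(forecast[start:start + duration]))


if __name__ == "__main__":
    # Made-up forecast for the next eight hours.
    hourly_forecast = [400, 380, 300, 200, 180, 220, 350, 420]
    start = best_start_hour(hourly_forecast, duration=2, deadline=8)
    print(f"start the 2-hour job at hour {start}")
```

A tighter deadline shrinks the set of candidate windows, which is one way time constraints limit the achievable savings; forecast errors would further reduce them.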


Proceedings Article•DOI•

Rearchitecting Kubernetes for the Edge


Andrew Jeffery¹, Heidi Howard¹, Richard Mortier¹•Institutions (1)

University of Cambridge¹

06 Apr 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors revisit the requirement of strong consistency and propose an eventually consistent approach instead, which enables higher performance, availability and scalability whilst still supporting the broad needs of Kubernetes.

Abstract: Recent years have seen Kubernetes emerge as a primary choice for container orchestration. Kubernetes largely targets the cloud environment but new use cases require performant, available and scalable orchestration at the edge. Kubernetes stores all cluster state in etcd, a strongly consistent key-value store. We find that at larger etcd cluster sizes, offering higher availability, write request latency significantly increases and throughput decreases similarly. Coupled with approximately 30% of Kubernetes requests being writes, this directly impacts the request latency and availability of Kubernetes, reducing its suitability for the edge. We revisit the requirement of strong consistency and propose an eventually consistent approach instead. This enables higher performance, availability and scalability whilst still supporting the broad needs of Kubernetes. This aims to make Kubernetes much more suitable for performance-critical, dynamically-scaled edge solutions.
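
The trade-off between availability and write cost in a strongly consistent store can be made concrete with majority-quorum arithmetic: a write must be acknowledged by more than half of the members, so a larger cluster tolerates more failures but needs more acknowledgements per write. The helper functions below are a generic illustration of that arithmetic (the names are hypothetical); they are not code from etcd or from the paper.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Number of members that can fail while a majority remains."""
    return (members - 1) // 2


# Cluster size, acknowledgements needed per write, failures tolerated.
for n in (1, 3, 5, 7, 9):
    print(n, quorum_size(n), tolerated_failures(n))
```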

Journal Article•DOI•

Joint QoS-aware and Cost-efficient Task Scheduling for Fog-Cloud Resources in a Volunteer Computing System

Farooq Hoseiny¹, Sadoon Azizi², Mohammad Shojafar³, Rahim Tafazolli³•Institutions (3)

IT University¹, University of Kurdistan², University of Surrey³

28 Apr 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, two task scheduling algorithms for VCSs, named Min-CCV and Min-V, were proposed to jointly minimize the computation, communication and delay violation cost for the Internet of Things (IoT) requests.

Abstract: Volunteer computing is an Internet-based distributed computing system in which volunteers share their extra available resources to manage large-scale tasks. However, computing devices in a Volunteer Computing System (VCS) are highly dynamic and heterogeneous in terms of their processing power, monetary cost, and data transfer latency. To ensure both high Quality of Service (QoS) and low cost for different requests, all of the available computing resources must be used efficiently. Task scheduling is an NP-hard problem that is considered one of the main challenges in a heterogeneous VCS. Therefore, in this paper, we design two task scheduling algorithms for VCSs, named Min-CCV and Min-V. The main goal of the proposed algorithms is to jointly minimize the computation, communication, and delay violation costs for Internet of Things (IoT) requests. Our extensive simulation results show that the proposed algorithms allocate tasks to volunteer fog/cloud resources more efficiently than the state of the art. Specifically, our algorithms improve the deadline satisfaction task rates by around 99.5% and decrease the total cost by between 15% and 53% in comparison with the genetic-based algorithm.

Proceedings Article•DOI•

Basil: Breaking up BFT with ACID (transactions).

Florian Suri-Payer¹, Matthew Burke¹, Zheng Wang¹, Yunhao Zhang¹, Lorenzo Alvisi¹, Natacha Crooks²•Institutions (2)

Cornell University¹, University of California, Berkeley²

25 Sep 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: Basil leverages ACID transactions to scalably implement the abstraction of a trusted shared log in the presence of Byzantine actors. It improves throughput over traditional BFT systems by four to five times and is only four times slower than TAPIR, a non-Byzantine replicated system.

Abstract: This paper presents Basil, the first transactional, leaderless Byzantine Fault Tolerant key-value store. Basil leverages ACID transactions to scalably implement the abstraction of a trusted shared log in the presence of Byzantine actors. Unlike traditional BFT approaches, Basil executes non-conflicting operations in parallel and commits transactions in a single round-trip during fault-free executions. Basil improves throughput over traditional BFT systems by four to five times, and is only four times slower than TAPIR, a non-Byzantine replicated system. Basil's novel recovery mechanism further minimizes the impact of failures: with 30% Byzantine clients, throughput drops by less than 25% in the worst-case.

Journal Article•DOI•

Con-Pi: A Distributed Container-based Edge and Fog Computing Framework.

Redowan Mahmud¹, Adel Nadjaran Toosi¹•Institutions (1)

Monash University¹

10 Jan 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: Con-Pi exploits the concept of containerization and harnesses Docker containers to run IoT applications as micro-services. It operates in a distributed manner across multiple Raspberry Pis (RPis) and enables them to share resources.

Abstract: Edge and Fog computing paradigms overcome the limitations of cloud-centric execution for different latency-sensitive Internet of Things (IoT) applications by offering computing resources closer to the data sources. Small single-board computers (SBCs) like Raspberry Pis (RPis) are widely used as computing nodes in both paradigms. These devices are usually equipped with moderate speed processors and provide support for peripheral interfacing and networking, making them well-suited to deal with IoT-driven operations such as data sensing, analysis, and actuation. However, these small Edge devices are constrained in facilitating multi-tenancy and resource sharing. The management of computing and peripheral resources through centralized entities further degrades their performance and service quality significantly. To address these issues, a fully distributed framework, named Con-Pi, is proposed in this work to manage resources in Edge or Fog environments. Con-Pi exploits the concept of containerization and harnesses Docker containers to run IoT applications as micro-services. Moreover, Con-Pi operates in a distributed manner across multiple RPis and enables them to share resources. The software system of the proposed framework also provides a scope to integrate different IoT applications, resource and energy management policies for Edge and Fog computing. Its performance is compared with the state-of-the-art frameworks through real-world experiments. The experimental results show that Con-Pi outperforms others in enhancing response time and managing energy usage and computing resources through its distributed offloading model. Further, we have developed an automated pest bird deterrent system using Con-Pi to demonstrate its suitability in developing practical solutions for various IoT-enabled use cases, including smart agriculture.

Posted Content•

Blockchain for IoT Access Control: Recent Trends and Future Research Directions.

Shantanu Pal¹, Ali Dorri¹, Raja Jurdak¹•Institutions (1)

Queensland University of Technology¹

09 Jun 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors review the recent trends and critical needs for blockchain-based solutions for IoT access control. They identify important aspects of blockchain, including decentralised control, secure storage, and sharing information in a trustless manner, together with their benefits and limitations.

Abstract: With the rapid development of wireless sensor networks, smart devices, and traditional information and communication technologies, there is tremendous growth in the use of Internet of Things (IoT) applications and services in our everyday life. IoT systems deal with high volumes of data. This data can be particularly sensitive, as it may include health, financial, location, and other highly personal information. Fine-grained security management in IoT demands effective access control. Several proposals discuss access control for the IoT; however, limited attention is given to the emerging blockchain-based solutions for IoT access control. In this paper, we review the recent trends and critical needs for blockchain-based solutions for IoT access control. We identify several important aspects of blockchain for IoT access control, including decentralised control, secure storage, and sharing information in a trustless manner, together with their benefits and limitations. Finally, we note some future research directions on how to integrate blockchain into IoT access control efficiently and effectively.

Posted Content•

Clio: A Hardware-Software Co-Designed Disaggregated Memory System.

Zhiyuan Guo, Yizhou Shan, Xuhao Luo, Yutong Huang, Yiying Zhang

07 Aug 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: Clio is a hardware-based memory disaggregation solution that places the right amount of processing power at memory nodes. The authors take a clean-slate approach, starting from the requirements of memory disaggregation and designing a memory-disaggregation-native system.

Abstract: Memory disaggregation has attracted great attention recently because of its benefits in efficient memory utilization and ease of management. So far, memory disaggregation research has taken one of two approaches: building or emulating memory nodes with either regular servers or raw memory devices with no processing power. The former incurs higher monetary cost and faces tail latency and scalability limitations, while the latter introduces performance, security, and management problems. Server-based memory nodes and memory nodes with no processing power are two extreme approaches. We seek a sweet spot in the middle by proposing a hardware-based memory disaggregation solution that has the right amount of processing power at memory nodes. Furthermore, we take a clean-slate approach by starting from the requirements of memory disaggregation and designing a memory-disaggregation-native system. We propose a hardware-based disaggregated memory system, Clio, that virtualizes and manages disaggregated memory at the memory node. Clio includes a new hardware-based virtual memory system, a customized network system, and a framework for computation offloading. In building Clio, we not only co-design OS functionalities, hardware architecture, and the network system, but also co-design the compute node and memory node. We prototyped Clio's memory node with FPGA and implemented its client-node functionalities in a user-space library. Clio achieves 100 Gbps throughput and an end-to-end latency of 2.5 µs at the median and 3.2 µs at the 99th percentile. Clio scales much better and has orders of magnitude lower tail latency than RDMA; it saves 1.1x to 3.4x energy compared to CPU-based and SmartNIC-based disaggregated memory systems and is 2.7x faster than software-based SmartNIC solutions.

Journal Article•DOI•

A Survey on Resilience in the IoT: Taxonomy, Classification and Discussion of Resilience Mechanisms

Christian Berger¹, Philipp Eichhammer¹, Hans P. Reiser¹, Jörg Domaschka², Franz J. Hauck², Gerhard Habiger²•Institutions (2)

University of Passau¹, University of Ulm²

06 Sep 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors present a taxonomy and classification of resilience and resilience mechanisms and subsequently survey state-of-the-art resilience mechanisms that have been proposed by research work and are applicable to IoT.

Abstract: Internet-of-Things (IoT) ecosystems tend to grow both in scale and complexity as they consist of a variety of heterogeneous devices, which span over multiple architectural IoT layers (e.g., cloud, edge, sensors). Further, IoT systems increasingly demand the resilient operability of services as they become part of critical infrastructures. This leads to a broad variety of research works that aim to increase the resilience of these systems. In this paper, we create a systematization of knowledge about existing scientific efforts of making IoT systems resilient. In particular, we first discuss the taxonomy and classification of resilience and resilience mechanisms and subsequently survey state-of-the-art resilience mechanisms that have been proposed by research work and are applicable to IoT. As part of the survey, we also discuss questions that focus on the practical aspects of resilience, e.g., which constraints resilience mechanisms impose on developers when designing resilient systems by incorporating a specific mechanism into IoT systems.

Posted Content•

Machine Learning (ML)-Centric Resource Management in Cloud Computing: A Review and Future Directions.

Tahseen Khan, Wenhong Tian, Rajkumar Buyya

09 May 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a detailed review of challenges in ML-based resource management in current research, as well as current approaches to resolve these challenges and their advantages and limitations is presented.

Abstract: Cloud computing has rapidly emerged as a model for delivering Internet-based utility computing services. In cloud computing, Infrastructure as a Service (IaaS) is one of the most important and rapidly growing fields. In this service model, cloud providers offer users and machines resources such as virtual machines, raw (block) storage, firewalls, load balancers, and network devices. One of the most important aspects of cloud computing for IaaS is resource management. Scalability, quality of service, optimum utility, reduced overheads, increased throughput, reduced latency, specialised environment, cost effectiveness, and a streamlined interface are some of the advantages of resource management for IaaS in cloud computing. Traditionally, resource management has been done through static policies, which impose certain limitations in various dynamic scenarios, prompting cloud service providers to adopt data-driven, machine-learning-based approaches. Machine learning is being used to handle a variety of resource management tasks, including workload estimation, task scheduling, VM consolidation, resource optimization, and energy optimization, among others. This paper provides a detailed review of challenges in ML-based resource management in current research, together with current approaches to resolving these challenges and their advantages and limitations. Finally, we propose potential future research directions based on identified challenges and limitations in current research.

Proceedings Article•DOI•

The Synergy of Complex Event Processing and Tiny Machine Learning in Industrial IoT

Haoyu Ren¹, Darko Anicic², Thomas A. Runkler¹•Institutions (2)

Technische Universität München¹, Siemens²

04 May 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, a framework that exploits the synergy of ML and CEP at the edge in distributed sensor networks is proposed. It shifts computation from the cloud to power-constrained IIoT devices and allows users to adapt the on-device ML model and the CEP reasoning logic on the fly without re-uploading the whole program.

Abstract: Focusing on comprehensive networking, big data, and artificial intelligence, the Industrial Internet-of-Things (IIoT) facilitates efficiency and robustness in factory operations. Various sensors and field devices play a central role, as they generate a vast amount of real-time data that can provide insights into manufacturing. The synergy of complex event processing (CEP) and machine learning (ML) has been actively developed in recent years in IIoT to identify patterns in heterogeneous data streams and fuse raw data into tangible facts. In a traditional compute-centric paradigm, the raw field data are continuously sent to the cloud and processed centrally. As IIoT devices become increasingly pervasive and ubiquitous, concerns are raised because transmitting such amounts of data is energy-intensive, vulnerable to interception, and subject to high latency. The data-centric paradigm can essentially solve these problems by empowering IIoT to perform decentralized on-device ML and CEP, keeping data primarily on edge devices and minimizing communications. However, this is no mean feat because most IIoT edge devices are designed to be computationally constrained with low power consumption. This paper proposes a framework that exploits the synergy of ML and CEP at the edge in distributed sensor networks. By leveraging tiny ML and micro CEP, we shift the computation from the cloud to the power-constrained IIoT devices and allow users to adapt the on-device ML model and the CEP reasoning logic flexibly on the fly without needing to re-upload the whole program. Lastly, we evaluate the proposed solution and show its effectiveness and feasibility using an industrial use case of machine safety monitoring.

Posted Content•

Energy Footprint of Blockchain Consensus Mechanisms Beyond Proof-of-Work.

Moritz Platt, Johannes Sedlmeir, Daniel Platt, Paolo Tasca, Jiahua Xu, Nikhil Vadgama, Juan Ignacio Ibañez

08 Sep 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors take a first step towards comparing the energy requirements of different proof-of-stake (PoS) based DLT systems to understand whether they address the extreme energy requirements of proof-of-work (PoW) equally well.

Abstract: Popular distributed ledger technology (DLT) systems using proof-of-work (PoW) for Sybil attack resistance have extreme energy requirements, drawing stern criticism from academia, businesses, and the media. DLT systems building on alternative consensus mechanisms, foremost proof-of-stake (PoS), aim to address this downside. In this paper, we take a first step towards comparing the energy requirements of such systems to understand whether they achieve this goal equally well. While multiple studies have been undertaken that analyze the energy demands of individual blockchains, little comparative work has been done. We approach this research question by formalizing a basic consumption model for PoS blockchains. Applying this model to six archetypal blockchains generates three main findings: First, we confirm the concerns around the energy footprint of PoW by showing that Bitcoin's energy consumption exceeds the energy consumption of all PoS-based systems analyzed by at least three orders of magnitude. Second, we illustrate that there are significant differences in energy consumption among the PoS-based systems analyzed, with permissionless systems having an overall larger energy footprint. Third, we point out that the type of hardware that validators use has a considerable impact on whether PoS blockchains' energy consumption is comparable with or considerably larger than that of centralized, non-DLT systems.

Posted Content•

A Review on Parallel Virtual Screening Softwares for High Performance Computers.

Natarajan Arul Murugan, Artur Podobas, Davide Gadioli, Emanuele Vitali, Gianluca Palermo, Stefano Markidis

30 Nov 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the authors review implementations of parallelization algorithms in virtual screening programs. They discuss the nature of different scoring functions and search algorithms and present a performance analysis of several docking programs ported to high-performance computing architectures.

Abstract: Drug discovery is the most expensive, time-demanding, and challenging project in biopharmaceutical companies; it aims at the identification and optimization of lead compounds from large chemical libraries. The lead compounds should have high binding affinity and specificity for a target associated with a disease, and in addition they should have favorable pharmacodynamic and pharmacokinetic properties (grouped as ADMET properties). Overall, drug discovery is a multivariable optimization problem and can be carried out on supercomputers using a reliable scoring function, which is a measure of the binding affinity or inhibition potential of the drug-like compound. The major problem is that the number of compounds in the chemical spaces is huge, making computational drug discovery very demanding. However, it is cheaper and less time-consuming than experimental high-throughput screening. As the problem is to find the most stable (global) minima for numerous protein-ligand complexes (on the order of $10^6$ to $10^{12}$), the parallel implementation of in-silico virtual screening can be exploited to carry out drug discovery in affordable time. In this review, we discuss such implementations of parallelization algorithms in virtual screening programs. The nature of different scoring functions and search algorithms is discussed, together with a performance analysis of several docking programs ported to high-performance computing architectures.

Posted Content•

Federated Learning for Privacy-Preserving Open Innovation Future on Digital Health

Guodong Long, Tao Shen, Yue Tan, Leah Gerrard, Allison Clarke, Jing Jiang

24 Aug 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors discuss how federated learning can enable the development of an open health ecosystem with the support of AI, through a collaborative framework that offers knowledge sharing from diverse data in a privacy-preserving manner.

Abstract: Privacy protection is an ethical issue of broad concern in Artificial Intelligence (AI). Federated learning is a new machine learning paradigm for learning a shared model across users or organisations without direct access to the data. It has great potential to be the next-generation AI model training framework that offers privacy protection and therefore has broad implications for the future of digital health and healthcare informatics. Implementing an open innovation framework in the healthcare industry, namely open health, aims to enhance the innovation and creative capability of health-related organisations by building a next-generation collaborative framework with partner organisations and the research community. In particular, this game-changing collaborative framework offers knowledge sharing from diverse data in a privacy-preserving manner. This chapter will discuss how federated learning can enable the development of an open health ecosystem with the support of AI. Existing challenges and solutions for federated learning will be discussed.

Journal Article•DOI•

LightChain: Scalable DHT-Based Blockchain

Yahya Hassanzadeh-Nazarabadi¹, Alptekin Küpçü¹, Oznur Ozkasap¹•Institutions (1)

Koç University¹

01 Sep 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: LightChain is a permissionless blockchain that provides addressable blocks and transactions within the network, making them efficiently accessible by all peers. Each block and transaction is replicated within the DHT of peers and retrieved on demand.

Abstract: As an append-only distributed database, blockchain is utilized in a vast variety of applications including cryptocurrency and the Internet-of-Things (IoT). The existing blockchain solutions show downsides in communication and storage scalability, as well as decentralization. In this article, we propose LightChain, which is the first blockchain architecture that operates over a Distributed Hash Table (DHT) of participating peers. LightChain is a permissionless blockchain that provides addressable blocks and transactions within the network, which makes them efficiently accessible by all peers. Each block and transaction is replicated within the DHT of peers and is retrieved in an on-demand manner. Hence, peers in LightChain are not required to retrieve or keep the entire ledger. LightChain is fair as all of the participating peers have a uniform chance of being involved in the consensus regardless of their influence such as hashing power or stake. We provide formal mathematical analysis and experimental results (simulations and cloud deployment) to demonstrate the security, efficiency, and fairness of LightChain, and show that LightChain is the only existing blockchain that can provide integrity under the corrupted majority power of peers. As we experimentally demonstrate, compared to mainstream blockchains such as Bitcoin and Ethereum, LightChain requires around 66 times less per-node storage and is around 380 times faster at bootstrapping a new node to the system, and each LightChain node is equally likely to be rewarded for participating in the protocol.

Journal Article•DOI•

VPIC 2.0: Next Generation Particle-in-Cell Simulations

Robert Bird¹, Nigel Tan¹, Scott V. Luedtke², Stephen Lien Harrell², Michela Taufer², Brian Albright³•Institutions (3)

University of Tennessee¹, Los Alamos National Laboratory², University of Texas at Austin³

25 Feb 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this paper, the authors demonstrate the unique challenges involved in preparing the VPIC Particle-in-Cell code for operation at exascale, outlining important optimizations to make VPIC efficient on accelerators.

Abstract: VPIC is a general purpose Particle-in-Cell simulation code for modeling plasma phenomena such as magnetic reconnection, fusion, solar weather, and laser-plasma interaction in three dimensions using large numbers of particles. VPIC's capacity in both fidelity and scale makes it particularly well-suited for plasma research on pre-exascale and exascale platforms. In this paper we demonstrate the unique challenges involved in preparing the VPIC code for operation at exascale, outlining important optimizations to make VPIC efficient on accelerators. Specifically, we show the work undertaken in adapting VPIC to exploit the portability-enabling framework Kokkos and highlight the enhancements to VPIC's modeling capabilities to achieve performance at exascale. We assess the achieved performance-portability trade-off through a suite of studies on nine different varieties of modern pre-exascale hardware. Our performance-portability study includes weak-scaling runs on three of the top ten TOP500 supercomputers, as well as a comparison of low-level system performance of hardware from four different vendors.

Proceedings Article•DOI•

Towards a Computing Platform for the LEO Edge

Tobias Pfandzelter¹, Jonathan Hasenburg¹, David Bermbach¹•Institutions (1)

Technical University of Berlin¹

06 Apr 2021-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: In this article, the suitability of three organization paradigms for applications considering developer requirements is analyzed and the serverless approach is the most promising solution, opening up the field for future research.

Abstract: The new space race is heating up as private companies such as SpaceX and Amazon are building large satellite constellations in low-earth orbit (LEO) to provide global broadband internet access. As the number of subscribers connected to this access network grows, it becomes necessary to investigate if and how edge computing concepts can be applied to LEO satellite networks. In this paper, we discuss the unique characteristics of the LEO edge and analyze the suitability of three organization paradigms for applications considering developer requirements. We conclude that the serverless approach is the most promising solution, opening up the field for future research.
