
Showing papers in "IEEE Transactions on Parallel and Distributed Systems in 2021"


Journal ArticleDOI
Moming Duan, Duo Liu, Xianzhang Chen, Renping Liu, Yujuan Tan, Liang Liang
TL;DR: A self-balancing FL framework named Astraea is built, which relieves global imbalance by adaptive data augmentation and downsampling; to mitigate local imbalance, it creates mediators that reschedule the training of clients based on the Kullback–Leibler divergence (KLD) of their data distributions.
Abstract: Federated learning (FL) is a distributed deep learning method that enables multiple participants, such as mobile and IoT devices, to collaboratively train a neural network while their private training data remains on local devices. This distributed approach is promising for mobile systems, which have a large corpus of decentralized data and require high privacy. However, unlike common datasets, the data distribution of mobile systems is imbalanced, which increases the bias of the model. In this article, we demonstrate that imbalanced distributed training data causes an accuracy degradation of FL applications. To counter this problem, we build a self-balancing FL framework named Astraea, which alleviates the imbalances by 1) Z-score-based data augmentation, and 2) mediator-based multi-client rescheduling. The proposed framework relieves global imbalance by adaptive data augmentation and downsampling, and to mitigate local imbalance, it creates mediators to reschedule the training of clients based on the Kullback–Leibler divergence (KLD) of their data distributions. Compared with FedAvg, the vanilla FL algorithm, Astraea shows +4.39 and +6.51 percent improvements in top-1 accuracy on the imbalanced EMNIST and imbalanced CINIC-10 datasets, respectively. Meanwhile, the communication traffic of Astraea is reduced by 75 percent compared to FedAvg.
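As a rough illustration of the mediator idea described above, the sketch below (Python, with hypothetical function names; not the authors' implementation) computes the KLD of each client's label distribution from the uniform distribution and greedily groups clients so that each mediator's combined data is as balanced as possible.

```python
# Hypothetical sketch of KLD-based client grouping in the spirit of Astraea's
# mediator rescheduling; names and the greedy rule are illustrative only.
import numpy as np

def kld_to_uniform(label_counts):
    """KL divergence between a client's label distribution and the uniform one."""
    p = label_counts / label_counts.sum()
    q = np.full_like(p, 1.0 / len(p))
    mask = p > 0  # treat 0 * log(0) as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def assign_to_mediators(client_counts, num_mediators):
    """Greedily assign clients so each mediator's combined data is close to uniform."""
    groups = [np.zeros_like(client_counts[0]) for _ in range(num_mediators)]
    assignment = [[] for _ in range(num_mediators)]
    # Process the most skewed clients first.
    order = sorted(range(len(client_counts)),
                   key=lambda i: kld_to_uniform(client_counts[i]), reverse=True)
    for i in order:
        # Pick the mediator whose distribution becomes most uniform after adding client i.
        best = min(range(num_mediators),
                   key=lambda m: kld_to_uniform(groups[m] + client_counts[i]))
        groups[best] += client_counts[i]
        assignment[best].append(i)
    return assignment

# Example: 6 clients, 4 classes, skewed label counts.
rng = np.random.default_rng(0)
clients = [rng.integers(0, 100, size=4).astype(float) for _ in range(6)]
print(assign_to_mediators(clients, num_mediators=2))
```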

199 citations


Journal ArticleDOI
TL;DR: An optimal double-layer PBFT is proposed, and it is proved that the communication complexity is minimized when the nodes are evenly distributed within the sub-groups of the second layer; the security threshold is analyzed based on the faulty probability determined (FPD) and faulty number determined (FND) models, respectively.
Abstract: The Practical Byzantine Fault Tolerance (PBFT) consensus mechanism shows great potential to break the performance bottleneck of Proof-of-Work (PoW)-based blockchain systems, which typically support only dozens of transactions per second and require minutes to hours for transaction confirmation. However, due to frequent inter-node communications, the PBFT mechanism has poor node scalability and is thus typically adopted only in small networks. To enable PBFT in large systems such as massive Internet of Things (IoT) ecosystems and blockchain, in this article, a scalable multi-layer PBFT-based consensus mechanism is proposed by hierarchically grouping nodes into different layers and limiting the communication within each group. We first propose an optimal double-layer PBFT and show that the communication complexity is significantly reduced. Specifically, we prove that when the nodes are evenly distributed within the sub-groups in the second layer, the communication complexity is minimized. The security threshold is analyzed based on the faulty probability determined (FPD) and faulty number determined (FND) models, respectively. We also provide a practical protocol for the proposed double-layer PBFT system. Finally, the results are extended to arbitrary-layer PBFT systems with communication complexity and security analysis. Simulation results verify the effectiveness of the analytical results.
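To see why an even split across sub-groups helps, the toy model below counts messages for a double-layer scheme under the simplifying assumption that flat PBFT among n nodes costs on the order of 2n^2 messages per consensus round; the paper's exact complexity expressions may differ.

```python
# Simplified message-count model for double-layer PBFT, only to illustrate the
# even-split intuition; not the paper's exact formulas.

def pbft_messages(n):
    """Rough per-consensus message count of flat PBFT with n nodes: O(n^2)."""
    return 2 * n * n  # the prepare and commit phases dominate

def double_layer_messages(group_sizes):
    """First layer: PBFT among the g group leaders; second layer: PBFT inside
    each sub-group."""
    g = len(group_sizes)
    return pbft_messages(g) + sum(pbft_messages(s) for s in group_sizes)

n, g = 100, 5
even = [n // g] * g                    # 5 sub-groups of 20 nodes -> 4050
skewed = [60, 10, 10, 10, 10]          # same total, uneven split -> 8050
print(double_layer_messages(even))     # smaller
print(double_layer_messages(skewed))   # larger: the quadratic term penalizes skew
```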

160 citations


Journal ArticleDOI
TL;DR: An online algorithm called CEDC-O is proposed; it is developed based on Lyapunov optimization, works online without requiring future information, and achieves provably close-to-optimal performance.
Abstract: In the edge computing (EC) environment, edge servers are deployed at base stations to offer highly accessible computing and storage resources to nearby app users. From the app vendor's perspective, caching data on edge servers can ensure low latency in app users’ retrieval of app data. However, an edge server normally owns limited resources due to its limited size. In this article, we investigate the collaborative caching problem in the EC environment with the aim to minimize the system cost, including data caching cost, data migration cost, and quality-of-service (QoS) penalty. We model this collaborative edge data caching problem (CEDC) as a constrained optimization problem and prove that it is $\mathcal{NP}$-complete. We propose an online algorithm, called CEDC-O, to solve this CEDC problem during all time slots. CEDC-O is developed based on Lyapunov optimization, works online without requiring future information, and achieves provably close-to-optimal performance. CEDC-O is evaluated on a real-world data set, and the results demonstrate that it significantly outperforms four representative approaches.
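For readers unfamiliar with the technique, the Lyapunov drift-plus-penalty framework that CEDC-O is stated to build on has the following generic per-slot form (standard textbook notation, not the paper's own symbols): the algorithm greedily minimizes an upper bound on the drift plus a V-weighted cost in each slot.

```latex
% Generic drift-plus-penalty form (standard Lyapunov optimization, not the
% paper's notation). L(t) is a quadratic Lyapunov function of virtual queues
% Q_i(t), C(t) is the per-slot cost, and V > 0 trades queue stability against
% cost optimality.
\begin{align}
  L(t) &= \tfrac{1}{2}\sum_i Q_i(t)^2, \qquad
  \Delta(t) = \mathbb{E}\left[L(t+1) - L(t) \mid \mathbf{Q}(t)\right], \\
  \text{each slot: } &\min_{\text{decision}} \;
  \Delta(t) + V\,\mathbb{E}\left[C(t) \mid \mathbf{Q}(t)\right].
\end{align}
```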

130 citations


Journal ArticleDOI
TL;DR: This article makes the first attempt to formulate the Edge Data Distribution (EDD) problem as a constrained optimization problem from the app vendor's perspective and proposes an optimal approach named EDD-IP to solve it exactly with the Integer Programming technique.
Abstract: Edge computing, as an extension of cloud computing, distributes computing and storage resources from the centralized cloud to distributed edge servers, to power a variety of applications demanding low latency, e.g., IoT services, virtual reality, real-time navigation, etc. From an app vendor's perspective, app data needs to be transferred from the cloud to specific edge servers in an area to serve the app users in that area. However, according to the pay-as-you-go business model, distributing a large amount of data from the cloud to edge servers can be expensive. The optimal data distribution strategy must minimize the cost incurred, which includes two major components, the cost of data transmission between the cloud and edge servers and the cost of data transmission between edge servers. In the meantime, the delay constraint must be fulfilled - the data distribution must not take too long. In this article, we make the first attempt to formulate this Edge Data Distribution (EDD) problem as a constrained optimization problem from the app vendor's perspective and prove its $\mathcal{NP}$-hardness. We propose an optimal approach named EDD-IP to solve this problem exactly with the Integer Programming technique. Then, we propose an $O(k)$-approximation algorithm named EDD-A for finding approximate solutions to large-scale EDD problems efficiently. EDD-IP and EDD-A are evaluated on a real-world dataset and the results demonstrate that they significantly outperform three representative approaches.

108 citations


Journal ArticleDOI
TL;DR: Biscotti as discussed by the authors uses blockchain and cryptographic primitives to coordinate a privacy-preserving ML process between peering clients, which is able to both protect the privacy of an individual client's update and maintain the performance of the global model at scale when 30 percent adversaries are present in the system.
Abstract: Federated Learning is the current state-of-the-art in supporting secure multi-party machine learning (ML): data is maintained on the owner's device and the updates to the model are aggregated through a secure protocol. However, this process assumes a trusted centralized infrastructure for coordination, and clients must trust that the central service does not use the byproducts of client data. In addition to this, a group of malicious clients could also harm the performance of the model by carrying out a poisoning attack. As a response, we propose Biscotti: a fully decentralized peer to peer (P2P) approach to multi-party ML, which uses blockchain and cryptographic primitives to coordinate a privacy-preserving ML process between peering clients. Our evaluation demonstrates that Biscotti is scalable, fault tolerant, and defends against known attacks. For example, Biscotti is able to both protect the privacy of an individual client's update and maintain the performance of the global model at scale when 30 percent adversaries are present in the system.

98 citations


Journal ArticleDOI
TL;DR: This article proposes a lightweight sampling-based probabilistic approach, namely EDI-V, to help app vendors audit the integrity of their data cached on a large scale of edge servers, and proposes a new data structure named variable Merkle hash tree (VMHT) for generating the integrity proofs of those data replicas during the audit.
Abstract: Edge computing allows app vendors to deploy their applications and relevant data on distributed edge servers to serve nearby users. Caching data on edge servers can minimize users’ data retrieval latency. However, such cache data are subject to both intentional and accidental corruption in the highly distributed, dynamic, and volatile edge computing environment. Given a large number of edge servers and their limited computing resources, how to effectively and efficiently audit the integrity of app vendors’ cache data is a critical and challenging problem. This article makes the first attempt to tackle this Edge Data Integrity (EDI) problem. We first analyze the threat model and the audit objectives, then propose a lightweight sampling-based probabilistic approach, namely EDI-V, to help app vendors audit the integrity of their data cached on a large scale of edge servers. We propose a new data structure named variable Merkle hash tree (VMHT) for generating the integrity proofs of those data replicas during the audit. VMHT can ensure the audit accuracy of EDI-V by maintaining sampling uniformity. EDI-V allows app vendors to inspect their cache data and locate the corrupted ones efficiently and effectively. Both theoretical analysis and comprehensive experimental evaluation demonstrate the efficiency and effectiveness of EDI-V.
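The VMHT is a variant of the classic Merkle hash tree; the sketch below shows how an ordinary Merkle tree produces and verifies integrity proofs for cached data blocks, as a point of reference only. The variable structure and sampling-uniformity properties of VMHT itself are not reproduced here.

```python
# Plain Merkle hash tree sketch to illustrate the integrity proofs that a
# VMHT-style structure builds on; the variable-arity/sampling aspects of VMHT
# are not reproduced.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    """Return all tree levels, leaves first, root level last."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd-sized levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof(levels, index):
    """Sibling hashes from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], index % 2 == 0))
        index //= 2
    return path

def verify(leaf_block, path, root):
    node = h(leaf_block)
    for sibling, leaf_is_left in path:
        node = h(node + sibling) if leaf_is_left else h(sibling + node)
    return node == root

blocks = [b"replica-0", b"replica-1", b"replica-2", b"replica-3", b"replica-4"]
levels = build_levels(blocks)
root = levels[-1][0]
print(verify(blocks[3], proof(levels, 3), root))    # True: cached block intact
print(verify(b"tampered", proof(levels, 3), root))  # False: corruption detected
```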

85 citations


Journal ArticleDOI
TL;DR: This work proposes a decentralized computation offloading algorithm with the purpose of minimizing the average task completion time in pervasive edge computing networks and shows that the solution has a significant advantage compared with other representative algorithms.
Abstract: Pervasive edge computing refers to one kind of edge computing that merely relies on edge devices with sensing, storage and communication abilities to realize peer-to-peer offloading without centralized management. Due to the lack of unified coordination, users always pursue profits by maximizing their own utilities. However, on one hand, users may not make appropriate scheduling decisions based on their local observations. On the other hand, how to guarantee fairness among different edge devices in the fully decentralized environment is rather challenging. To solve the above issues, we propose a decentralized computation offloading algorithm with the purpose of minimizing the average task completion time in pervasive edge computing networks. We first derive a Nash equilibrium among devices by stochastic game theory based on full observations of the system states. After that, we design a traffic offloading algorithm based on partial observations by integrating generative adversarial imitation learning. Multiple experts can provide demonstrations, so that devices can mimic the behaviors of corresponding experts by minimizing the gaps between the distributions of their observation-action pairs. Finally, theoretical and performance results show that our solution has a significant advantage compared with other representative algorithms.

84 citations


Journal ArticleDOI
TL;DR: This work first utilizes the Lyapunov optimization method to decompose the long-term optimization problem into a series of instant optimization problems; then a sample average approximation-based stochastic algorithm is proposed to approximate the future expected system utility.
Abstract: The explosive growth of mobile devices promotes the prosperity of novel mobile applications, which can be realized by service offloading with the assistance of edge computing servers. However, due to the limited computation and storage capabilities of a single server, long service latency hinders the continuous development of service offloading in mobile networks. By supporting multi-server cooperation, Pervasive Edge Computing (PEC) is promising to enable service migration in highly dynamic mobile networks. With the objective of maximizing the system utility, we formulate the optimization problem by jointly considering the constraints of server storage capability and service execution latency. To enable dynamic service placement, we first utilize the Lyapunov optimization method to decompose the long-term optimization problem into a series of instant optimization problems. Then, a sample average approximation-based stochastic algorithm is proposed to approximate the future expected system utility. Afterwards, a distributed Markov approximation algorithm is utilized to determine the service placement configurations. Through theoretical analysis, the time complexity of our proposed algorithm is shown to be linear in the number of users, and the backlog queue of PEC servers is stable. Performance evaluations are conducted based on both synthetic and real trace-driven scenarios, with numerical results demonstrating the effectiveness of our proposed algorithm from various aspects.

83 citations


Journal ArticleDOI
TL;DR: A collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications, is presented, aiming to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.
Abstract: Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target spatial computing architectures, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes, due to fundamentally distinct aspects of hardware design, such as programming for deep pipelines, distributed memory resources, and scalable routing. To alleviate this, we present a collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. We systematically identify classes of transformations (pipelining, scalability, and memory), the characteristics of their effect on the HLS code and the resulting hardware (e.g., increasing data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip dataflow, allowing for massively parallel architectures. To quantify the effect of various transformations, we cover the optimization process of a sample set of HPC kernels, provided as open source reference codes. We aim to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.

83 citations


Journal ArticleDOI
TL;DR: An overview of recent advances in resource allocation in NFV is provided, and the representative work for solving the generalized problems is classified and summarized by considering various QoS parameters and different scenarios (e.g., edge cloud, online provisioning, and distributed provisioning).
Abstract: Network Function Virtualization (NFV) has been emerging as an appealing solution that transforms complex network functions from dedicated hardware implementations to software instances running in a virtualized environment. Due to the numerous advantages such as flexibility, efficiency, scalability, short deployment cycles, and service upgrade, NFV has been widely recognized as the next-generation network service provisioning paradigm. In NFV, the requested service is implemented by a sequence of Virtual Network Functions (VNF) that can run on generic servers by leveraging the virtualization technology. These VNFs are stitched together in a predefined order through which data flows traverse, which is also known as Service Function Chaining (SFC). In this article, we provide an overview of recent advances in resource allocation in NFV. We generalize and analyze four representative resource allocation problems, namely, (1) the VNF Placement and Traffic Routing problem, (2) the VNF Placement problem, (3) the Traffic Routing problem in NFV, and (4) the VNF Redeployment and Consolidation problem. After that, we study the delay calculation models and VNF protection (availability) models in NFV resource allocation, which are two important Quality of Service (QoS) parameters. Subsequently, we classify and summarize the representative work for solving the generalized problems by considering various QoS parameters (e.g., cost, delay, reliability, and energy) and different scenarios (e.g., edge cloud, online provisioning, and distributed provisioning). Finally, we conclude our article with a short discussion on the state-of-the-art and emerging topics in the related fields, and highlight areas where we expect high potential for future research.

79 citations


Journal ArticleDOI
TL;DR: An estimation of the model exchange time between each client and the server is proposed, based on which a fairness-guaranteed algorithm termed RBCS-F is designed for problem-solving.
Abstract: The issue of potential privacy leakage during centralized AI’s model training has drawn intensive concern from the public. A Parallel and Distributed Computing (or PDC) scheme, termed Federated Learning (FL), has emerged as a new paradigm to cope with the privacy issue by allowing clients to perform model training locally, without the necessity to upload their personal sensitive data. In FL, the number of clients could be sufficiently large, but the bandwidth available for model distribution and re-upload is quite limited, making it sensible to involve only part of the volunteers in the training process. The client selection policy is critical to an FL process in terms of training efficiency, the final model’s quality, as well as fairness. In this article, we model the fairness-guaranteed client selection as a Lyapunov optimization problem, and then a $\mathbf{C^2MAB}$-based method is proposed to estimate the model exchange time between each client and the server, based on which we design a fairness-guaranteed algorithm termed RBCS-F for problem-solving. The regret of RBCS-F is strictly bounded by a finite constant, justifying its theoretical feasibility. Beyond the theoretical results, more empirical data can be derived from our real training experiments on public datasets.

Journal ArticleDOI
TL;DR: A new construct to formally define a serverless application workflow is proposed, along with a heuristic algorithm named the Probability Refined Critical Path Greedy algorithm (PRCP) with four greedy strategies to answer two fundamental optimization questions regarding performance and cost.
Abstract: Function-as-a-Service (FaaS) and serverless applications have proliferated significantly in recent years because of their high scalability, ease of resource management, and pay-as-you-go pricing model. However, cloud users are facing practical problems when they migrate their applications to the serverless pattern, namely the lack of an analytical performance and billing model and the trade-off between a limited budget and the desired quality of service of serverless applications. In this article, we fill this gap by proposing and answering two research questions regarding the prediction and optimization of the performance and cost of serverless applications. We propose a new construct to formally define a serverless application workflow, and then implement analytical models to predict the average end-to-end response time and the cost of the workflow. Consequently, we propose a heuristic algorithm named the Probability Refined Critical Path Greedy algorithm (PRCP) with four greedy strategies to answer two fundamental optimization questions regarding performance and cost. We extensively evaluate the proposed models by conducting experimentation on AWS Lambda and Step Functions. Our analytical models can predict the performance and cost of serverless applications with more than 98 percent accuracy. The PRCP algorithms can achieve the optimal configurations of serverless applications with 97 percent accuracy on average.
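As a point of reference for the kind of estimate such an analytical model produces, the sketch below (illustrative only; not the paper's model or the PRCP algorithm) computes the expected end-to-end response time of a workflow DAG as its critical path over mean function latencies, plus a simple duration-times-memory cost estimate.

```python
# Illustrative DAG model for a serverless workflow: end-to-end response time as
# the critical (longest) path over mean function latencies, and a simple
# duration x memory cost estimate. A sketch of the general idea only.
from collections import defaultdict

def critical_path_latency(latency_ms, edges):
    """latency_ms: mean latency per function; edges: (upstream, downstream) pairs."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    finish = {}

    def finish_time(f):
        if f not in finish:
            finish[f] = latency_ms[f] + max((finish_time(p) for p in preds[f]), default=0.0)
        return finish[f]

    return max(finish_time(f) for f in latency_ms)

def workflow_cost(latency_ms, memory_gb, price_per_gb_s=0.0000166667):
    # GB-second pricing in the style of AWS Lambda; the rate here is illustrative.
    return sum(latency_ms[f] / 1000.0 * memory_gb[f] * price_per_gb_s for f in latency_ms)

# Example workflow: A -> (B, C) -> D
lat = {"A": 120.0, "B": 300.0, "C": 80.0, "D": 150.0}
mem = {"A": 0.5, "B": 1.0, "C": 0.5, "D": 0.5}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(critical_path_latency(lat, edges))   # 120 + 300 + 150 = 570.0 ms
print(round(workflow_cost(lat, mem), 8))
```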

Journal ArticleDOI
TL;DR: The CASpMV customizes an auto-tuning four-way partition scheme for SpMV based on the proposed statistical model, which describes the sparse matrix structure characteristics, to make it better fit in with the computing architecture and memory hierarchy of the Sunway.
Abstract: The Sunway TaihuLight, equipped with 10 million cores, is currently the world's third fastest supercomputer. SpMV is one of the core algorithms in many high-performance computing applications. This paper implements a fine-grained design for generic parallel SpMV based on the special Sunway architecture and finds three main performance limitations, i.e., storage limitation, load imbalance, and huge overhead of irregular memory accesses. To address these problems, this paper introduces a customized and accelerative framework for SpMV (CASpMV) on the Sunway. The CASpMV customizes an auto-tuning four-way partition scheme for SpMV based on the proposed statistical model, which describes the sparse matrix structure characteristics, to make it better fit in with the computing architecture and memory hierarchy of the Sunway. Moreover, the CASpMV provides an accelerative method and customized optimizations to avoid irregular memory accesses and further improve its performance on the Sunway. Our CASpMV achieves a performance improvement that ranges from 588.05 to 2118.62 percent over the generic parallel SpMV on a CG (which corresponds to an MPI process) of the Sunway on average and has good scalability on multiple CGs. The performance comparisons of the CASpMV with state-of-the-art methods on the Sunway indicate that the sparsity and irregularity of data structures have less impact on CASpMV.
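For context, the baseline kernel being accelerated is the generic CSR sparse matrix-vector multiply shown below; this plain Python version only serves to expose the irregular, column-index-driven accesses that make the kernel memory-bound, and bears no relation to the Sunway-specific implementation.

```python
# Generic CSR sparse matrix-vector multiply (y = A @ x). The gather through
# col_idx is the irregular memory access pattern the abstract refers to.
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):                       # one output element per row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]     # gather: irregular access to x
    return y

# 3x3 example:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 2.0, 3.0])
print(spmv_csr(values, col_idx, row_ptr, x))      # [ 7.  4. 18.]
```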

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors studied how to efficiently offload dependent tasks to edge nodes with limited (and predetermined) service caching, and designed an efficient convex programming based algorithm (CP) to solve this problem.
Abstract: In Mobile Edge Computing (MEC), many tasks require specific service support for execution and, in addition, have a dependent order of execution among the tasks. However, previous works often ignore the impact of having limited services cached at the edge nodes on (dependent) task offloading, thus may lead to an infeasible offloading decision or a longer completion time. To bridge the gap, this article studies how to efficiently offload dependent tasks to edge nodes with limited (and predetermined) service caching. We formally define the problem of offloading dependent tasks with service caching (ODT-SC), and prove that there exists no algorithm with constant approximation for this hard problem. Then, we design an efficient convex programming based algorithm (CP) to solve this problem. Moreover, we study a special case with a homogeneous MEC and propose a favorite successor based algorithm (FS) to solve this special case with a competitive ratio of $O(1)$. Extensive simulation results using Google data traces show that our proposed algorithms can significantly reduce applications’ completion time by about 21-47 percent compared with other alternatives.

Journal ArticleDOI
TL;DR: The task scheduling problem of microservices is defined as a cost optimization problem with deadline constraints; a statistics-based strategy is proposed to determine the configuration of containers under a streaming workload, together with an urgency-based workflow scheduling algorithm.
Abstract: Microservices are widely used for flexible software development. Recently, containers have become the preferred deployment technology for microservices because of fast start-up and low overhead. However, the container layer complicates task scheduling and auto-scaling in clouds. Existing algorithms do not adapt to the two-layer structure composed of virtual machines and containers, and they often ignore streaming workloads. To this end, this article proposes an Elastic Scheduling for Microservices (ESMS) that integrates task scheduling with auto-scaling. ESMS aims to minimize the cost of virtual machines while meeting deadline constraints. Specifically, we define the task scheduling problem of microservices as a cost optimization problem with deadline constraints and propose a statistics-based strategy to determine the configuration of containers under a streaming workload. Then, we propose an urgency-based workflow scheduling algorithm that assigns tasks and determines the type and quantity of instances for scale-up. Finally, we model the mapping of new containers to virtual machines as a variable-sized bin-packing problem and solve it to achieve integrated scaling of the virtual machines and containers. Via simulation-based experiments with well-known workflow applications, the ability of ESMS to improve the success ratio of meeting deadlines and reduce the cost is verified through comparison with existing algorithms.
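The final mapping step is a variable-sized bin-packing problem; the sketch below shows a simple first-fit-decreasing heuristic for packing containers (by CPU demand) onto VM types of different capacities and prices. It only illustrates the problem structure and is not necessarily the heuristic ESMS uses.

```python
# First-fit-decreasing sketch for variable-sized bin packing: map new containers
# (items = CPU demand) to VMs of different types; illustrative only, and it
# assumes every container fits into at least one VM type.
def pack_containers(container_cpus, vm_types):
    """vm_types: list of (capacity, hourly_cost); returns opened VMs and total cost."""
    vms = []  # each entry: [type_index, remaining_capacity, [placed containers]]
    for demand in sorted(container_cpus, reverse=True):
        placed = False
        for vm in vms:                               # first fit into an already open VM
            if vm[1] >= demand:
                vm[1] -= demand
                vm[2].append(demand)
                placed = True
                break
        if not placed:                               # otherwise open the cheapest VM that fits
            feasible = [(cost, cap, t) for t, (cap, cost) in enumerate(vm_types) if cap >= demand]
            cost, cap, t = min(feasible)
            vms.append([t, cap - demand, [demand]])
    total_cost = sum(vm_types[vm[0]][1] for vm in vms)
    return vms, total_cost

containers = [1.5, 0.5, 2.0, 1.0, 0.5]               # vCPU demands of new containers
vm_types = [(2.0, 0.10), (4.0, 0.19)]                # (capacity in vCPU, $/hour)
print(pack_containers(containers, vm_types))
```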

Journal ArticleDOI
TL;DR: This work proposes a distributed and collective DRL algorithm called DC-DRL with several improvements, combining the advantages of deep neuroevolution and policy gradient to maximize the utilization of multiple environments and prevent premature convergence.
Abstract: Mobile edge computing (MEC) is a promising solution to support resource-constrained devices by offloading tasks to the edge servers. However, traditional approaches (e.g., linear programming and game-theory methods) for computation offloading mainly focus on the immediate performance, potentially leading to performance degradation in the long run. Recent breakthroughs regarding deep reinforcement learning (DRL) provide alternative methods, which focus on maximizing the cumulative reward. Nonetheless, a large gap remains in deploying real DRL applications in MEC. This is because: 1) training a well-performed DRL agent typically requires data with large quantities and high diversity, and 2) DRL training is usually accompanied by huge costs caused by trial-and-error. To address this mismatch, we study the applications of DRL on the multi-user computation offloading problem from a more practical perspective. In particular, we propose a distributed and collective DRL algorithm called DC-DRL with several improvements: 1) a distributed and collective training scheme that assimilates knowledge from multiple MEC environments, which not only greatly increases data amount and diversity but also spreads the exploration costs, 2) an updating method called adaptive n-step learning, which can improve training efficiency without suffering from the high variance caused by distributed training, and 3) combining the advantages of deep neuroevolution and policy gradient to maximize the utilization of multiple environments and prevent premature convergence. Lastly, evaluation results demonstrate the effectiveness of our proposed algorithm. Compared with the baselines, the exploration costs and final system costs are reduced by at least 43 and 9.4 percent, respectively.

Journal ArticleDOI
Chubo Liu, Fan Tang, Yikun Hu, Kenli Li, Zhuo Tang, Keqin Li
TL;DR: In this article, a distributed task migration algorithm based on counterfactual multi-agent (COMA) reinforcement learning approach is proposed to solve the task migration problem in MEC.
Abstract: Being closer to mobile users geographically, mobile edge computing (MEC) can provide some cloud-like capabilities to users more efficiently. This makes it possible for resource-limited mobile users to offload their computation-intensive and latency-sensitive tasks to MEC nodes. Owing to its great benefits, MEC has drawn wide attention and extensive work has been done. However, few of them address the task migration problem caused by distributed user mobility, which cannot be ignored when quality of service (QoS) is considered. In this article, we study the task migration problem and try to minimize the average completion time of tasks under a migration energy budget. There are multiple independent users and the movement of each mobile user is memoryless with a sequential decision-making process, so a reinforcement learning algorithm based on a Markov chain model is applied with low computation complexity. To further facilitate cooperation among users, we devise a distributed task migration algorithm based on the counterfactual multi-agent (COMA) reinforcement learning approach to solve this problem. Extensive experiments are carried out to assess the performance of this distributed task migration algorithm. Compared with the no-migration (NM) and single-agent actor-critic (AC) algorithms, the proposed distributed task migration algorithm can achieve up to a 30-50 percent reduction in average completion time.

Journal ArticleDOI
TL;DR: A joint partial offloading and flow scheduling heuristic (JPOFH) is proposed that decides the partial offloading ratio by considering both the waiting times at the devices and the start time of network flows.
Abstract: Collaborative edge computing (CEC) is a recent popular paradigm where different edge devices collaborate by sharing data and computation resources. One of the fundamental issues in CEC is making the task offloading decision. However, it is a challenging problem to solve, as tasks can be offloaded to a device at multi-hop distance, leading to conflicting network flows due to the limited bandwidth constraint. There are some works on the multi-hop computation offloading problem in the literature. However, existing works have not jointly considered multi-hop partial computation offloading and network flow scheduling, which can cause network congestion and inefficient performance in terms of completion time. This article formulates the joint multi-task partial computation offloading and network flow scheduling problem to minimize the average completion time of all tasks. The formulated problem optimizes several dependent decision variables, including the partial offloading ratio, remote offloading device, start time of tasks, routing path, and start time of network flows. The problem is formulated as an MINLP optimization problem and shown to be NP-hard. We propose a joint partial offloading and flow scheduling heuristic (JPOFH) that decides the partial offloading ratio by considering both the waiting times at the devices and the start time of network flows. We also relax the formulated MINLP problem to an LP problem using the McCormick envelope to obtain a lower-bound solution. Performance comparison done using simulation shows that JPOFH leads to up to a 32 percent improvement in average completion time compared to benchmark solutions which do not make a joint decision.
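For reference, the McCormick envelope mentioned above is a standard linear relaxation of a bilinear term; in generic notation (not the paper's variables), replacing $w = xy$ with the four inequalities below yields the linear relaxation that provides the lower bound.

```latex
% Standard McCormick envelope for a bilinear term w = x y with bounds
% x^L <= x <= x^U and y^L <= y <= y^U (generic notation, not the paper's
% variables). The four linear inequalities relax the bilinear constraint,
% so solving the resulting LP gives a lower bound for the minimization.
\begin{align}
  w &\ge x^{L} y + x\, y^{L} - x^{L} y^{L}, &
  w &\ge x^{U} y + x\, y^{U} - x^{U} y^{U}, \\
  w &\le x^{U} y + x\, y^{L} - x^{U} y^{L}, &
  w &\le x^{L} y + x\, y^{U} - x^{L} y^{U}.
\end{align}
```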

Journal ArticleDOI
TL;DR: This work investigates offloading DNN inference requests in a 5G-enabled mobile edge cloud (MEC), proposes exact and approximate solutions to the inference offloading problem, and also considers dynamic task offloading for inference requests, devising an online algorithm that can be adapted in real time.
Abstract: With increasing focus on Artificial Intelligence (AI) applications, Deep Neural Networks (DNNs) have been successfully used in a number of application areas. As the number of layers and neurons in DNNs increases rapidly, significant computational resources are needed to execute a learned DNN model. This ever-increasing resource demand of DNNs is currently met by large-scale data centers with state-of-the-art GPUs. However, the increasing availability of mobile edge computing and 5G technologies provides new possibilities for DNN-driven AI applications, especially where these applications make use of data sets that are distributed in different locations. One fundamental process of a DNN-driven application in mobile edge clouds is the adoption of “inferencing” – the process of executing a pre-trained DNN based on newly generated image and video data from mobile devices. We investigate offloading DNN inference requests in a 5G-enabled mobile edge cloud (MEC), with the aim to admit as many inference requests as possible. We propose exact and approximate solutions to the problem of inference offloading in MECs. We also consider dynamic task offloading for inference requests, and devise an online algorithm that can be adapted in real time. The proposed algorithms are evaluated through large-scale simulations and using a real-world test-bed implementation. The experimental results demonstrate that the empirical performance of the proposed algorithms outperforms their theoretical counterparts and other similar heuristics reported in the literature.

Journal ArticleDOI
TL;DR: In this paper, a joint task scheduling and containerizing (JTSC) scheme is developed to improve the execution efficiency of an application in an edge server, where task assignment and task containerization need to be considered together.
Abstract: Container-based operating system (OS) level virtualization has been adopted by many edge-computing platforms. However, for an edge server, inter-container communications and container management consume significant CPU resources. Given an application composed of interdependent tasks, the number of such operations is closely related to the dependency between the scheduled tasks. Thus, to improve the execution efficiency of an application in an edge server, task scheduling and task containerizing need to be considered together. To this end, a joint task scheduling and containerizing (JTSC) scheme is developed in this article. Experiments are first carried out to quantify the resource utilization of container operations. System models are then built to capture the features of task execution in containers in an edge server with multiple processors. With these models, joint task scheduling and containerizing is conducted as follows. First, tasks are scheduled without considering containerization, which results in initial schedules. Second, based on system models and guidelines gained from the initial schedules, several containerization algorithms are designed to map tasks to containers. Third, task execution durations are updated by adding the time for inter-container communications, and then the task schedules are updated accordingly. The JTSC scheme is evaluated through extensive simulations. The results show that it reduces inefficient container operations and enhances the execution efficiency of applications by 60 percent.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper utilize data as a tuning knob and propose two efficient polynomial-time algorithms to schedule different workloads on various mobile devices, when data is identically or non-identically distributed.
Abstract: Originated from distributed learning, federated learning enables privacy-preserved collaboration on a new abstracted level by sharing the model parameters only. While the current research mainly focuses on optimizing learning algorithms and minimizing communication overhead left by distributed learning, there is still a considerable gap when it comes to the real implementation on mobile devices. In this article, we start with an empirical experiment to demonstrate computation heterogeneity is a more pronounced bottleneck than communication on the current generation of battery-powered mobile devices, and the existing methods are haunted by mobile stragglers. Further, non-identically distributed data across the mobile users makes the selection of participants critical to the accuracy and convergence. To tackle the computational and statistical heterogeneity, we utilize data as a tuning knob and propose two efficient polynomial-time algorithms to schedule different workloads on various mobile devices, when data is identically or non-identically distributed. For identically distributed data, we combine partitioning and linear bottleneck assignment to achieve near-optimal training time without accuracy loss. For non-identically distributed data, we convert it into an average cost minimization problem and propose a greedy algorithm to find a reasonable balance between computation time and accuracy. We also establish an offline profiler to quantify the runtime behavior of different devices, which serves as the input to the scheduling algorithms. We conduct extensive experiments on a mobile testbed with two datasets and up to 20 devices. Compared with the common benchmarks, the proposed algorithms achieve 2-100× speedup epoch-wise, 2–7 percent accuracy gain and boost the convergence rate by more than 100 percent on CIFAR10.
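A minimal sketch of the workload-tuning idea for the identically distributed case is given below: each device receives a data share inversely proportional to its profiled per-sample time, so all devices finish an epoch at roughly the same moment. This is only an illustration of the intuition; it is not the paper's partitioning plus linear-bottleneck-assignment algorithm.

```python
# Simplified straggler-aware partitioning: data shares inversely proportional to
# measured per-sample time. Illustrative only; not the paper's scheduling scheme.
def partition_samples(total_samples, per_sample_ms):
    speeds = [1.0 / t for t in per_sample_ms]            # samples per millisecond
    total_speed = sum(speeds)
    shares = [int(total_samples * s / total_speed) for s in speeds]
    shares[0] += total_samples - sum(shares)             # give rounding remainder to device 0
    return shares

per_sample_ms = [2.0, 5.0, 10.0]                          # fast, medium, slow device
shares = partition_samples(10000, per_sample_ms)
print(shares)                                             # [6250, 2500, 1250]
print([round(n * t) for n, t in zip(shares, per_sample_ms)])  # [12500, 12500, 12500] ms per epoch
```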

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a framework for IoT-fog-cloud systems that offers minimal service provisioning delay through an adaptive task offloading mechanism, which relies on the available fog resources to flexibly select the optimal offloading policy.
Abstract: In IoT-based systems, fog computing allows the fog nodes to offload and process tasks requested by IoT-enabled devices in a distributed manner, instead of using the centralized cloud servers, to reduce the response delay. However, achieving such a benefit is still challenging in systems with a high rate of requests, which implies long queues of tasks in the fog nodes and thus probable latency inefficiency in offloading the tasks. In addition, a high degree of heterogeneity in the fog environment introduces an additional issue: many single fog nodes cannot process heavy tasks due to a lack of available resources or limited computing capabilities. To cope with this situation, this article introduces FRATO (Fog Resource aware Adaptive Task Offloading), a framework for IoT-fog-cloud systems to offer minimal service provisioning delay through an adaptive task offloading mechanism. Fundamentally, FRATO relies on the available fog resources to flexibly select the optimal offloading policy, which in particular includes a collaborative task offloading solution based on the data fragment concept. In addition, two distributed fog resource allocation algorithms, namely TPRA and MaxRU, are developed to deploy the optimized offloading solutions efficiently in cases of resource competition. Through extensive simulation analysis, the FRATO-based service provisioning approaches show potential advantages in reducing the average delay significantly in systems with a high rate of service requests and a heterogeneous fog environment compared with the existing solutions.

Journal ArticleDOI
TL;DR: A new searchable encryption scheme is presented that not only enables multi-keyword search over encrypted data under a multi-writer/multi-reader setting but also guarantees data and search pattern privacy, making it practical to adopt in distributed systems.
Abstract: As cloud storage has been widely adopted in various applications, how to protect data privacy while allowing efficient data search and retrieval in a distributed environment remains a challenging research problem. Existing searchable encryption schemes are still inadequate on desired functionality and security/privacy perspectives. Specifically, supporting multi-keyword search under the multi-user setting, hiding search pattern and access pattern, and resisting keyword guessing attacks (KGA) are the most challenging tasks. In this article, we present a new searchable encryption scheme that addresses the above problems simultaneously, which makes it practical to be adopted in distributed systems. It not only enables multi-keyword search over encrypted data under a multi-writer/multi-reader setting but also guarantees the data and search pattern privacy. To prevent KGA, our scheme adopts a multi-server architecture, which accelerates search response, shares the workload, and lowers the key leakage risk by allowing only authorized servers to jointly test whether a search token matches a stored ciphertext. A novel subset decision mechanism is also designed as the core technique underlying our scheme and can be further used in applications other than keyword search. Finally, we prove the security and evaluate the computational and communication efficiency of our scheme to demonstrate its practicality.

Journal ArticleDOI
TL;DR: A comprehensive survey of DL compilers is provided in this article, with an emphasis on the DL-oriented multi-level IRs and frontend/backend optimizations, and several insights are highlighted as potential research directions of DL compilers.
Abstract: The difficulty of deploying various deep learning (DL) models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed by both industry and academia, such as TensorFlow XLA and TVM. Similarly, the DL compilers take the DL models described in different DL frameworks as input, and then generate optimized codes for diverse DL hardware as output. However, none of the existing surveys has analyzed the unique design architecture of the DL compilers comprehensively. In this article, we perform a comprehensive survey of existing DL compilers by dissecting the commonly adopted design in detail, with emphasis on the DL-oriented multi-level IRs and frontend/backend optimizations. We present a detailed analysis of the design of multi-level IRs and illustrate the commonly adopted optimization techniques. Finally, several insights are highlighted as potential research directions of DL compilers. This is the first survey article focusing on the design architecture of DL compilers, which we hope can pave the road for future research towards DL compilers.

Journal ArticleDOI
TL;DR: In this article, the authors proposed an Evolutionary Quantum Neural Network (EQNN) based workload prediction model for cloud datacenter which exploits the computational efficiency of quantum computing by encoding workload information into qubits and propagating this information through the network to estimate the workload or resource demands with enhanced accuracy proactively.
Abstract: This work presents a novel Evolutionary Quantum Neural Network (EQNN) based workload prediction model for Cloud datacenters. It exploits the computational efficiency of quantum computing by encoding workload information into qubits and propagating this information through the network to proactively estimate the workload or resource demands with enhanced accuracy. The rotation and reverse rotation effects of the Controlled-NOT (C-NOT) gate serve as the activation function at the hidden and output layers to adjust the qubit weights. In addition, a Self Balanced Adaptive Differential Evolution (SB-ADE) algorithm is developed to optimize the qubit network weights. The accuracy of the EQNN prediction model is extensively evaluated and compared with seven state-of-the-art methods using eight real-world benchmark datasets of three different categories. Experimental results reveal that the use of the quantum approach to evolutionary neural networks substantially improves the prediction accuracy, by up to 91.6 percent over the existing approaches.

Journal ArticleDOI
TL;DR: In this article, the authors propose a modular version of PoS-based blockchain systems called e-PoS that resists the centralization of network resources by extending mining opportunities to a wider set of stakeholders.
Abstract: Blockchain applications that rely on Proof-of-Work (PoW) have increasingly become energy inefficient with a staggering carbon footprint. In contrast, energy-efficient alternative consensus protocols such as Proof-of-Stake (PoS) may cause centralization and unfairness in the blockchain system. To address these challenges, we propose a modular version of PoS-based blockchain systems called e-PoS that resists the centralization of network resources by extending mining opportunities to a wider set of stakeholders. Moreover, e-PoS leverages the in-built system operations to promote fair mining practices by penalizing malicious entities. We validate e-PoS's achievable objectives through theoretical analysis and simulations. Our results show that e-PoS ensures fairness and decentralization, and can be applied to existing blockchain applications.

Journal ArticleDOI
TL;DR: In this paper, a reverse game-based data trading mechanism and a privacy-preserving model verification mechanism are proposed; the former guards against training data leakage, while the latter verifies the accuracy of a trained model with privacy preservation of the task requester's test data as well as the pool's submitted model.
Abstract: Proof of work (PoW), the most popular consensus mechanism for blockchain, requires ridiculously large amounts of energy but without any useful outcome beyond determining accounting rights among miners. To tackle this drawback of PoW, we propose a novel energy-recycling consensus algorithm, namely proof of federated learning (PoFL), where the energy originally wasted to solve difficult but meaningless puzzles in PoW is reinvested to federated learning. Federated learning and pooled-mining, a trend of PoW, have a natural fit in terms of organization structure. However, the separation between the data usufruct and ownership in blockchain leads to data privacy leakage in model training and verification, deviating from the original intention of federated learning. To address this challenge, a reverse game-based data trading mechanism and a privacy-preserving model verification mechanism are proposed. The former can guard against training data leakage, while the latter verifies the accuracy of a trained model with privacy preservation of the task requester's test data as well as the pool's submitted model. To the best of our knowledge, our article is the first work to employ federated learning as the proof of work for blockchain. Extensive simulations based on synthetic and real-world data demonstrate the effectiveness and efficiency of our proposed mechanisms.

Journal ArticleDOI
TL;DR: The proposed IPPTS algorithm significantly outperforms previous list scheduling algorithms in terms of makespan, speedup, makespan standard deviation, efficiency, and frequency of best results.
Abstract: Efficient scheduling algorithms are key to attaining high performance in heterogeneous computing systems. In this article, we propose a new list scheduling algorithm for assigning task graphs to fully connected heterogeneous processors with the aim of minimizing the scheduling length. The proposed algorithm, called the Improved Predict Priority Task Scheduling (IPPTS) algorithm, has two phases: a task prioritization phase, which gives priority to tasks, and a processor selection phase, which selects a processor for a task. The IPPTS algorithm has the same quadratic time complexity as related algorithms with the same goal, that is, $O(t^{2} \times p)$ for $t$ tasks and $p$ processors. Our algorithm reduces the scheduling length significantly by looking ahead in both the task prioritization phase and the processor selection phase. In this way, the algorithm looks ahead to schedule a task and its heaviest successor task on the optimistic processor, i.e., the processor that minimizes their computation and communication costs. The experiments based on both randomly generated graphs and graphs of real-world applications show that the IPPTS algorithm significantly outperforms previous list scheduling algorithms in terms of makespan, speedup, makespan standard deviation, efficiency, and frequency of best results.
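To make the two phases concrete, the sketch below is a generic list-scheduling skeleton in the HEFT family: a rank-based task priority followed by earliest-finish-time processor selection. IPPTS's specific look-ahead priority and selection functions are not reproduced here.

```python
# Generic list-scheduling skeleton (HEFT-style): upward-rank priority, then
# earliest-finish-time processor selection. Illustrative only, not IPPTS itself.
def upward_rank(task, succ, avg_cost, comm, memo):
    """Average execution cost of a task plus the heaviest path below it."""
    if task not in memo:
        memo[task] = avg_cost[task] + max(
            (comm.get((task, s), 0) + upward_rank(s, succ, avg_cost, comm, memo)
             for s in succ.get(task, [])),
            default=0)
    return memo[task]

def list_schedule(tasks, succ, cost, comm):
    """cost[t][p]: execution time of task t on processor p; comm[(u, v)]: edge cost."""
    num_procs = len(next(iter(cost.values())))
    avg_cost = {t: sum(cost[t]) / num_procs for t in tasks}
    memo = {}
    order = sorted(tasks, key=lambda t: upward_rank(t, succ, avg_cost, comm, memo),
                   reverse=True)                              # prioritization phase
    preds = {t: [u for u in tasks if t in succ.get(u, [])] for t in tasks}
    proc_free = [0.0] * num_procs
    finish, placed_on = {}, {}
    for t in order:                                           # processor selection phase
        best = None
        for p in range(num_procs):
            # data from a predecessor on another processor pays the edge cost
            ready = max((finish[u] + (0 if placed_on[u] == p else comm.get((u, t), 0))
                         for u in preds[t]), default=0.0)
            eft = max(ready, proc_free[p]) + cost[t][p]
            if best is None or eft < best[0]:
                best = (eft, p)
        finish[t], placed_on[t] = best
        proc_free[best[1]] = best[0]
    return finish, placed_on

tasks = ["t1", "t2", "t3", "t4"]
succ = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}
cost = {"t1": [2, 3], "t2": [4, 2], "t3": [3, 3], "t4": [2, 2]}
comm = {("t1", "t2"): 1, ("t1", "t3"): 1, ("t2", "t4"): 2, ("t3", "t4"): 2}
print(list_schedule(tasks, succ, cost, comm))   # makespan here is 9
```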

Journal ArticleDOI
TL;DR: VeriML is a novel and efficient framework to bring integrity assurances and fair payments to MLaaS; with it, clients can be assured that ML tasks are correctly executed on an untrusted server and that the resource consumption claimed by the service provider equals the actual workload.
Abstract: Machine Learning as a Service (MLaaS) allows clients with limited resources to outsource their expensive ML tasks to powerful servers. Despite the huge benefits, current MLaaS solutions still lack strong assurances on: 1) service correctness (i.e., whether the MLaaS works as expected); 2) trustworthy accounting (i.e., whether the bill for the MLaaS resource consumption is correctly accounted); 3) fair payment (i.e., whether a client gets the entire MLaaS result before making the payment). Without these assurances, unfaithful service providers can return improperly-executed ML task results or partially-trained ML models while asking for over-claimed rewards. Moreover, it is hard to argue for wide adoption of MLaaS by both the client and the service provider, especially in the open market without a trusted third party. In this article, we present VeriML, a novel and efficient framework to bring integrity assurances and fair payments to MLaaS. With VeriML, clients can be assured that ML tasks are correctly executed on an untrusted server, and the resource consumption claimed by the service provider equals the actual workload. We strategically use succinct non-interactive arguments of knowledge (SNARK) on randomly-selected iterations during the ML training phase for efficiency with tunable probabilistic assurance. We also develop multiple ML-specific optimizations to the arithmetic circuit required by SNARK. Our system implements six common algorithms: linear regression, logistic regression, neural network, support vector machine, K-means and decision tree. The experimental results have validated the practical performance of VeriML.

Journal ArticleDOI
TL;DR: In this article, the authors propose Dependency and Topology-aware Failure Resilience (DTFR), a two-stage scheduler that minimizes either failure probability or redundancy cost, while maintaining low network delay.
Abstract: Edge computing services are exposed to infrastructural failures due to geographical dispersion, ad hoc deployment, and rudimentary support systems. Two unique characteristics of the edge computing paradigm necessitate a novel failure resilience approach. First, edge servers, contrary to cloud counterparts with reliable data center networks, are typically connected via ad hoc networks. Thus, link failures need more attention to ensure truly resilient services. Second, network delay is a critical factor for the deployment of edge computing services. This restricts replication decisions to geographical proximity and necessitates joint consideration of delay and resilience. In this article, we propose a novel machine learning based mechanism that evaluates the failure resilience of a service deployed redundantly on the edge infrastructure. Our approach learns the spatiotemporal dependencies between edge server failures and combines them with the topological information to incorporate link failures. Ultimately, we infer the probability that a certain set of servers fails or disconnects concurrently during service runtime. Furthermore, we introduce Dependency- and Topology-aware Failure Resilience (DTFR), a two-stage scheduler that minimizes either failure probability or redundancy cost, while maintaining low network delay. Extensive evaluation with various real-world failure traces and workload configurations demonstrate superior performance in terms of availability, number of failures, network delay, and cost with respect to the state-of-the-art schedulers.