
Showing papers on "Overhead (computing) published in 2022"


Journal ArticleDOI
TL;DR: In this article, a multi-task federated learning (MTFL) algorithm is proposed that introduces non-federated batch-normalization (BN) layers into the federated DNN.
Abstract: Federated Learning (FL) is an emerging approach for collaboratively training Deep Neural Networks (DNNs) on mobile devices, without private user data leaving the devices. Previous works have shown that non-Independent and Identically Distributed (non-IID) user data harms the convergence speed of the FL algorithms. Furthermore, most existing work on FL measures global-model accuracy, but in many cases, such as user content-recommendation, improving individual User model Accuracy (UA) is the real objective. To address these issues, we propose a Multi-Task FL (MTFL) algorithm that introduces non-federated Batch-Normalization (BN) layers into the federated DNN. MTFL benefits UA and convergence speed by allowing users to train models personalised to their own data. MTFL is compatible with popular iterative FL optimisation algorithms such as Federated Averaging (FedAvg), and we show empirically that a distributed form of Adam optimisation (FedAvg-Adam) benefits convergence speed even further when used as the optimisation strategy within MTFL. Experiments using MNIST and CIFAR10 demonstrate that MTFL is able to significantly reduce the number of rounds required to reach a target UA, by up to 5× when using existing FL optimisation strategies, and with a further 3× improvement when using FedAvg-Adam. We compare MTFL to competing personalised FL algorithms, showing that it is able to achieve the best UA for MNIST and CIFAR10 in all considered scenarios. Finally, we evaluate MTFL with FedAvg-Adam on an edge-computing testbed, showing that its convergence and UA benefits outweigh its overhead.
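
To make the idea concrete, here is a minimal PyTorch-style sketch of how non-federated BN layers can be kept out of the aggregation step. The model, the helper names, and the plain FedAvg averaging are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):            # illustrative MNIST-sized model
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)   # private: never uploaded
        self.fc = nn.Linear(16 * 28 * 28, 10)

    def forward(self, x):
        x = torch.relu(self.bn(self.conv(x)))
        return self.fc(x.flatten(1))

def is_private(model, tensor_name):
    # A tensor stays on-device iff it belongs to a BatchNorm module.
    owner = tensor_name.rsplit(".", 1)[0]
    return isinstance(model.get_submodule(owner),
                      nn.modules.batchnorm._BatchNorm)

def upload_state(model):
    # Only non-BN parameters/buffers take part in federated averaging.
    return {k: v for k, v in model.state_dict().items()
            if not is_private(model, k)}

def fedavg(states):
    return {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}

def download_state(model, global_state):
    # strict=False leaves the local (personalised) BN layers untouched.
    model.load_state_dict(global_state, strict=False)
```

A server would call fedavg on the uploaded states and broadcast the result; each user's BN statistics and affine parameters never leave the device.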

85 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose Overlap-FedAvg, an innovative framework that loosens the chain-like constraint of federated learning and runs the model training phase in parallel with the model communication phase, so that the latter can be completely hidden behind the former.
Abstract: While petabytes of data are generated each day by independent computing devices, only a small fraction can ultimately be collected and used for deep learning (DL) because of concerns about data security and privacy leakage, which seriously hampers the spread of DL. In this circumstance, federated learning (FL) was proposed to train models on multiple clients' combined data without sharing datasets within the cluster. Nevertheless, federated learning with periodic model averaging (FedAvg) introduces massive communication overhead, as the data synchronized in each iteration is about the same size as the model, leading to low communication efficiency. Consequently, various proposals focusing on reducing communication rounds and on data compression were made to decrease the communication overhead of FL. In this article, we propose Overlap-FedAvg, an innovative framework that loosens the chain-like constraint of federated learning and runs the model training phase in parallel with the model communication phase (i.e., uploading local models and downloading the global model), so that the latter can be totally covered by the former. Compared to vanilla FedAvg, Overlap-FedAvg is further developed with a hierarchical computing strategy, a data compensation mechanism, and a Nesterov accelerated gradients (NAG) algorithm. In particular, Overlap-FedAvg is orthogonal to many other compression methods, so they can be applied together to maximize the utilization of the cluster. Besides, a theoretical analysis is provided to prove the convergence of the proposed framework. Extensive experiments conducted on both image classification and natural language processing tasks with multiple models and datasets also demonstrate that the proposed framework substantially reduces the communication overhead and accelerates the federated learning process.
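
A minimal sketch of the overlapping idea, assuming a placeholder communicate() callable for the server exchange; the hierarchical computing strategy, data compensation, and NAG steps from the paper are only indicated in a comment:

```python
import copy
import threading

def local_train(model, optimizer, batches, loss_fn):
    for x, y in batches:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

def overlapped_round(model, optimizer, batches, loss_fn, communicate):
    snapshot = copy.deepcopy(model.state_dict())     # round-t model to upload

    result = {}
    def exchange():                                  # runs concurrently
        result["global"] = communicate(snapshot)     # upload + download

    comm = threading.Thread(target=exchange)
    comm.start()
    local_train(model, optimizer, batches, loss_fn)  # hidden behind transfer
    comm.join()

    # The downloaded global model is one round stale; Overlap-FedAvg applies
    # a data-compensation step and NAG here, which this sketch omits.
    model.load_state_dict(result["global"])
```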

34 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose an optimized solution for user assignment and resource allocation over a hierarchical federated learning (FL) architecture for heterogeneous IoT systems. Hierarchical FL is a promising approach for telemonitoring systems that demand intensive data collection from different locations for the detection, classification, and prediction of future events, while maintaining strict privacy constraints.

21 citations


Journal ArticleDOI
TL;DR: This work proposes EDL, which enables elastic deep learning with a simple API and can be easily integrated with existing deep learning frameworks such as TensorFlow and PyTorch. EDL also incorporates techniques necessary to reduce the overhead of parallelism adjustments, such as stop-free scaling and a dynamic data pipeline.
Abstract: We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e., the number of GPUs), for deep neural network (DNN) training in a GPU cluster. Elasticity can benefit multi-tenant GPU cluster management in many ways, for example, achieving various scheduling objectives (e.g., job throughput, job completion time, GPU efficiency) according to cluster load variations, utilizing transient idle resources, and supporting performance profiling, job migration, and straggler mitigation. We propose EDL, which enables elastic deep learning with a simple API and can be easily integrated with existing deep learning frameworks such as TensorFlow and PyTorch. EDL also incorporates techniques that are necessary to reduce the overhead of parallelism adjustments, such as stop-free scaling and dynamic data pipeline. We demonstrate with experiments that EDL can indeed bring significant benefits to the above-listed applications in GPU cluster management.
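
As a rough illustration (not EDL's actual API, which the abstract does not detail), stop-free scaling plus a dynamic data pipeline amount to re-sharding the remaining data whenever the world size changes, without restarting the job:

```python
def shard(dataset, rank, size, start=0):
    # Each worker owns every size-th remaining sample, beginning at `start`.
    return [dataset[i] for i in range(start, len(dataset)) if i % size == rank]

# Worker 0 of 4 trains for 10 steps (40 samples consumed cluster-wide), then
# the cluster grows to 8 workers; the pipeline is simply re-sharded over the
# remaining data and training continues with no job restart.
data = list(range(1_000))
first_phase = shard(data, rank=0, size=4)[:10]
second_phase = shard(data, rank=0, size=8, start=40)
```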

19 citations


Journal ArticleDOI
TL;DR: In this article, a multi-objective RL algorithm was proposed to minimize the application completion time, energy consumption of the mobile device, and usage charge for edge computing, subject to dependency constraints.
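
The TL;DR does not give the exact formulation, but a common way to set up such a multi-objective RL problem is a weighted scalarized reward with a penalty for violating dependency constraints; a hypothetical sketch (weights and penalty are assumptions):

```python
def reward(completion_time, energy, charge, deps_satisfied,
           w=(0.4, 0.3, 0.3), penalty=1e3):
    # Negative weighted cost: maximizing reward minimizes the three
    # objectives; the weights w are illustrative, not from the paper.
    r = -(w[0] * completion_time + w[1] * energy + w[2] * charge)
    return r - (0.0 if deps_satisfied else penalty)

print(reward(12.0, 3.5, 0.8, True))   # approximately -6.09
```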

16 citations


Journal ArticleDOI
01 Jan 2022
TL;DR: A comprehensive survey of recent advancements in clustering schemes for vehicular networks is presented in this article. The authors take a holistic approach and classify the algorithms by focusing on (i) the objective of clustering mechanisms (i.e., reliability, scalability, stability, routing overhead, and delay), (ii) general-purpose clustering algorithms, (iii) application-based (i.e., QoS, MAC, security, etc.) clustering, and (iv) technology-based clustering (machine learning-based, nature-inspired, fuzzy logic-based, and software-defined networking-based).
Abstract: Vehicular networks are on the verge of deployment, thanks to advancements in computation and communication technologies. This breed of ad hoc networks leverages vehicles as nodes with the Vehicle-to-everything (V2X) communication paradigm. Clustering is considered one of the most important techniques used to enhance network stability, reliability, and scalability. Furthermore, clustering enables bandwidth optimization by reducing overhead and transmission delay, and helps mitigate the hidden node problem. To date, extensive research has been done to address clustering issues in vehicular networks, and several surveys have been published in the literature. However, a holistic approach towards clustering in vehicular networks is still lacking. In this regard, we conduct a comprehensive survey of the recent advancements in clustering schemes for vehicular networks. We take a holistic approach to classify the algorithms by focusing on (i) the objective of clustering mechanisms (i.e., reliability, scalability, stability, routing overhead, and delay), (ii) general-purpose clustering algorithms, (iii) application-based (i.e., QoS, MAC, security, etc.) clustering, and (iv) technology-based clustering (machine learning-based, nature-inspired, fuzzy logic-based, and software-defined networking-based). We investigate the existing clustering mechanisms with factors such as cluster formation, maintenance, and management in mind. Additionally, we present a comprehensive set of parameters for selecting cluster heads and discuss the role of enabling technologies in cluster maintenance. Finally, we identify future research trends in clustering techniques for vehicular networks and their various breeds. This survey will act as a one-stop shop for researchers, practitioners, and system designers selecting the right clustering mechanism for their applications, services, or research. As a result of this survey, we can see that clustering is heavily dependent on the underlying application, context, environment, and communication paradigm. Furthermore, clustering in vehicular networks can greatly benefit from enabling technologies such as artificial intelligence.

16 citations


Journal ArticleDOI
TL;DR: An RNA genetic algorithm with hairpin genetic operators (hRNA-GA), inspired by the hairpin structure of RNA molecules, is proposed. The hRNA-GA has better search ability and is applied to find the optimal parameters of ANFISs for modeling an actual overhead crane system.
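
The TL;DR does not specify the operators, but in RNA-inspired genetic algorithms a chromosome is a base string, and a hairpin-style operator typically folds a segment onto its reverse complement, mimicking hairpin self-pairing. A purely illustrative sketch of one such mutation:

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def hairpin_mutate(chrom, stem=4):
    # Pick a stem, then overwrite the following segment with the stem's
    # reverse complement, as if the strand folded back on itself.
    i = random.randrange(0, len(chrom) - 2 * stem)
    left = chrom[i:i + stem]
    folded = "".join(COMPLEMENT[b] for b in reversed(left))
    return chrom[:i + stem] + folded + chrom[i + 2 * stem:]

print(hairpin_mutate("AUGGCUACGUAGCUAG"))
```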

15 citations



Journal ArticleDOI
TL;DR: In this article, the authors proposed a data replica creation scheme based on a Level of Privacy (LoP) defined by data owners and the service capacity of fog nodes, which can significantly achieve efficient replicas privacy, prediction accuracy, and outperform the existing state-of-the-art schemes in terms of computational and memory costs.

5 citations


Journal ArticleDOI
TL;DR: Data-Flow Integrity (DFI) is a well-known approach to effectively detecting a wide range of software attacks as discussed by the authors, however, its real-world application has been quite limited so far because of the prohib...
Abstract: Data-Flow Integrity (DFI) is a well-known approach to effectively detecting a wide range of software attacks. However, its real-world application has been quite limited so far because of the prohib...
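
For context, the classic DFI scheme (Castro et al.) assigns an ID to every store site and checks, at each load, that the last writer of the address belongs to a statically computed reaching-definitions set. A toy Python simulation of that check (an illustration of the general technique, not this paper's mechanism):

```python
last_writer = {}   # runtime definitions table: address -> store-site ID

REACHING = {       # static reaching-definitions sets, one per load site
    "load_balance": {"store_init", "store_deposit"},
}

def store(addr, value, site, memory):
    memory[addr] = value
    last_writer[addr] = site

def load(addr, site, memory):
    if last_writer.get(addr) not in REACHING[site]:
        raise RuntimeError(f"DFI violation at {site}: "
                           f"tainted by {last_writer.get(addr)}")
    return memory[addr]

mem = {}
store(0x10, 100, "store_init", mem)
print(load(0x10, "load_balance", mem))    # ok -> 100
store(0x10, -1, "store_overflow", mem)    # simulated attacker write
try:
    load(0x10, "load_balance", mem)
except RuntimeError as e:
    print(e)                              # DFI violation detected
```

Maintaining and checking the definitions table on every memory access is exactly where the overhead mentioned in the abstract comes from.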

5 citations


Journal ArticleDOI
TL;DR: ViTrack is a framework for efficient multi-video tracking that uses computation resources at the edge for commodity video surveillance systems; it leverages a Markov-model-based approach to efficiently recover missing information and derive complete trajectories.
Abstract: Nowadays, video surveillance systems are widely deployed in various places, e.g., schools, parks, airports, and roads. However, existing video surveillance systems are far from fully utilized due to the high computation overhead of video processing. In this work, we present ViTrack, a framework for efficient multi-video tracking that uses computation resources at the edge for commodity video surveillance systems. At the heart of ViTrack lies a two-layer spatial/temporal compressed target detection method that significantly reduces the computation overhead by combining videos from multiple cameras. Further, ViTrack derives the video relationships and camera information even in the absence of camera locations, directions, etc. To alleviate the impact of varying video quality and missing targets, ViTrack leverages a Markov-model-based approach to efficiently recover missing information and finally derive the complete trajectory. We implement ViTrack on a real deployed video surveillance system with 110 cameras. The experimental results demonstrate that ViTrack provides efficient trajectory tracking with 45× less processing time than the existing approach. For 110 video cameras, ViTrack can run on a Dell OptiPlex 390 computer to track given targets in almost real time. We believe ViTrack can enable practical video analysis for widely deployed commodity video surveillance systems.
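
A simplified sketch of the Markov-model idea: camera-to-camera transition probabilities learned from history can fill a missing detection between two observed cameras. The probabilities and the single-gap assumption are illustrative; the paper's model is more elaborate:

```python
P = {  # P[a][b]: probability of a target moving from camera a to camera b
    "cam1": {"cam2": 0.7, "cam3": 0.3},
    "cam2": {"cam4": 0.9, "cam3": 0.1},
    "cam3": {"cam4": 1.0},
}

def fill_gap(before, after):
    # Most likely single intermediate camera m, maximizing P[b][m] * P[m][a].
    score, cam = max((P[before][m] * P[m].get(after, 0.0), m)
                     for m in P[before])
    return cam if score > 0 else None

trajectory = ["cam1", None, "cam4"]
trajectory[1] = fill_gap("cam1", "cam4")   # "cam2": 0.7*0.9 beats 0.3*1.0
```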

Journal ArticleDOI
TL;DR: In this article, a multi-subset data aggregation scheme for the smart grid is proposed that requires no trusted third party: the control center learns the number of users in each subset and the sum of electricity consumption in each subset, while individual users' data privacy is still preserved.
Abstract: Data aggregation has been widely researched to address the privacy concern when data is published; however, data aggregation only obtains the sum or average in an area. In reality, more fine-grained data brings more value for data consumers, enabling, for example, more accurate management and dynamic price adjustment in the grid system. In this paper, a multi-subset data aggregation scheme (MSDA) for the smart grid is proposed without a trusted third party, in which the control center collects the number of users in different subsets and obtains the sum of electricity consumption in each subset, while individual users' data privacy is still preserved. In addition, a dynamic and flexible user management mechanism is guaranteed through a secret-key negotiation process among users. The analysis shows that MSDA not only protects users' privacy against various attacks but also achieves additional functionality: multi-subset aggregation, no reliance on any trusted third party, and dynamic user management. Performance evaluation demonstrates that MSDA is efficient and practical in terms of communication and computation overhead.
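
The abstract does not spell out the protocol, but the standard trick behind aggregation without a trusted third party is pairwise masks, derived from negotiated keys, that cancel in the sum. A toy sketch of that trick (MSDA's actual scheme additionally encodes subset membership and counts):

```python
import random

def masked_reading(uid, reading, peers, shared_key, modulus=2**32):
    m = reading
    for v in peers:
        pad = shared_key(uid, v)          # same value computed by both ends
        m = (m + pad if uid < v else m - pad) % modulus
    return m

users = [1, 2, 3]
readings = {1: 30, 2: 42, 3: 17}

keys = {}
def shared_key(a, b):                     # stand-in for a DH key agreement
    return keys.setdefault((min(a, b), max(a, b)), random.getrandbits(31))

reports = [masked_reading(u, readings[u],
                          [v for v in users if v != u], shared_key)
           for u in users]
print(sum(reports) % 2**32)               # 89: the pairwise masks cancel
```

Each individual report looks random to the control center; only the sum is meaningful, which is exactly the privacy property the scheme needs.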

Journal ArticleDOI
TL;DR: In this paper, an innovative reliability-aware task mapping technique is presented, based on a hybridization between Multi-Objective Optimization (MOO) and Reinforcement Learning (RL).

Journal ArticleDOI
Yi Xu, Changgen Peng, Weijie Tan, Youliang Tian, Minyao Ma, Kun Niu
TL;DR: Xu et al. propose a non-interactive verifiable privacy-preserving FL scheme based on a dual-server architecture (NIVP-DS), which improves the efficiency and security of the system and is robust to clients dropping out, under the constraint that the communication overhead between client and server is no more than 2× that of plaintext computation.

Journal ArticleDOI
TL;DR: In this article, the authors propose a scalable hierarchical control-plane architecture for SDN/NFV-based next-generation application domains, such as immersive media delivery systems, and implement the proposed architecture on top of the well-known ZeroSDN controller.
Abstract: The rigidity of traditional network architectures, with tightly coupled control and data planes, impairs their ability to adapt to the highly dynamic requirements of future application domains. While Software-Defined Networking (SDN) can provide the required dynamism, it suffers from scalability issues. Therefore, efforts have been made to propose alternative decentralized solutions, such as the flat distributed SDN architecture. Such alternatives address the scalability problem mainly for local flows, but are impaired by a substantial increase in overhead for cross-domain flow setup. To manage the trade-off between scalability and overhead, intermediate hierarchical solutions are needed; however, these have not been explored to their full potential so far. Furthermore, the Network Function Virtualization (NFV) paradigm complements SDN by offering computational and storage services in the form of Virtual Network Functions (VNFs). When integrated seamlessly, SDN and NFV together can solve the problems posed by highly dynamic application domains. Hence, this work proposes a scalable hierarchical SDN control-plane architecture for SDN/NFV-based next-generation application domains such as immersive media delivery systems. We have implemented the proposed architecture based on the well-known state-of-the-art ZeroSDN controller. To evaluate its performance, we have implemented an on-demand immersive media (point cloud) streaming application and varied the load on the control plane using background traffic. To benchmark our solution, we have evaluated its performance against centralized and flat distributed architectures. We show that the proposed architecture performs better than the rest in terms of scalability, lost flows, and processing latency. Our study shows that the proposed architecture, when distributed across three controllers, accepts 23% more flows with almost 70% lower processing latency compared to the state-of-the-art ONOS controller.
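
A toy sketch of why a hierarchy keeps most flow-setup overhead local: a leaf controller installs intra-domain flows itself and escalates only cross-domain requests to its parent. The escalation logic below is illustrative, not the paper's implementation:

```python
class Controller:
    def __init__(self, domain, parent=None):
        self.domain, self.parent = domain, parent

    def setup_flow(self, src, dst):
        if src in self.domain and dst in self.domain:
            return f"installed locally in {sorted(self.domain)}"
        return self.parent.setup_flow(src, dst)   # escalate one level up

root = Controller(domain={"h1", "h2", "h3", "h4"})
left = Controller(domain={"h1", "h2"}, parent=root)
print(left.setup_flow("h1", "h2"))   # handled by the leaf controller
print(left.setup_flow("h1", "h3"))   # crosses domains -> parent resolves it
```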

Journal ArticleDOI
TL;DR: In this article, a method for searching generic tuning spaces is proposed; the tuning spaces can contain tuning parameters that change any user-defined property of the source code, and the method navigates the search process towards faster implementations.



Journal ArticleDOI
TL;DR: Wang et al. propose an efficient privacy-preserving homoglyph search scheme supporting arbitrary languages (POSA), which enhances the performance of fuzzy keyword search in three aspects.
Abstract: Searchable encryption is an effective way to ensure the security and availability of encrypted outsourced cloud data. Among existing solutions, keyword exact-search schemes are relatively inflexible, while fuzzy keyword search schemes either have a high index overhead or suffer from false positives. Furthermore, no existing fuzzy keyword search solution considers homoglyph search on encrypted data. In this paper, we propose an efficient privacy-preserving homoglyph search scheme supporting arbitrary languages (POSA for short). We enhance the performance of fuzzy keyword search in three aspects. Firstly, we formulate the similarity of homoglyphs and propose a privacy-preserving homoglyph search. Secondly, we put forward an index-building mechanism without false positives, which reduces the storage overhead of the index and is suitable for arbitrary languages. Thirdly, POSA returns exactly what the user searches for: all returned documents contain the search keyword or one of its homoglyphs. The theoretical analysis and experimental evaluations on real-world datasets demonstrate the effectiveness and efficiency of POSA.
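
The core homoglyph intuition can be illustrated with a canonicalization map applied before indexing and searching. This is a plain-text illustration only; POSA's similarity formulation and encrypted index are not reproduced here:

```python
HOMOGLYPHS = {
    "а": "a", "е": "e", "о": "o", "р": "p", "с": "c",   # Cyrillic -> Latin
    "ѕ": "s", "і": "i", "0": "o", "1": "l",             # digit look-alikes
}

def canonical(keyword):
    # Map every visually confusable character to one canonical form, so
    # homoglyph variants of a keyword produce the same index token.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in keyword.lower())

# "раypal" spelled with Cyrillic р/а collapses to the same token as "paypal".
assert canonical("раypal") == canonical("paypal")
```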

Journal ArticleDOI
TL;DR: Zhang et al. propose a lightweight framework for finding nearest neighbours that progressively expands the investigation region as a series of concentric squares, with the query point's cell as the centre, and that optimizes the search space by exploiting the distance between the query point and distant candidate neighbours.
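
A runnable sketch of the described expansion strategy over a grid index, with a simplified version of the pruning rule (the framework's exact optimization is not given in the TL;DR):

```python
import math

def ring(cx, cy, r):
    # Cells at Chebyshev distance exactly r from (cx, cy).
    if r == 0:
        return [(cx, cy)]
    cells = [(cx + dx, cy + dy) for dx in (-r, r) for dy in range(-r, r + 1)]
    cells += [(cx + dx, cy + dy) for dy in (-r, r) for dx in range(-r + 1, r)]
    return cells

def nearest(query, grid, cell_size, max_r=64):
    # grid: dict mapping (i, j) cell coordinates to lists of points.
    cx, cy = int(query[0] // cell_size), int(query[1] // cell_size)
    best, best_d = None, math.inf
    for r in range(max_r):
        for cell in ring(cx, cy, r):
            for p in grid.get(cell, []):
                d = math.dist(p, query)
                if d < best_d:
                    best, best_d = p, d
        # Every unexplored cell lies at least r * cell_size away.
        if best is not None and best_d <= r * cell_size:
            return best
    return best
```

The stopping test is conservative: once the best candidate is closer than r × cell_size, no unexplored ring can contain a closer point, so the expansion halts.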

Journal ArticleDOI
TL;DR: The simulation results show that the proposed DWBA scheme mitigates the frame-rearrangement problem compared to existing DWBA algorithms and proves more efficient in terms of average end-to-end delay and completion time.

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, the authors propose a Cloud-Machine Learning (CloudML) model for encrypted heart disease datasets that employs a privacy-preservation scheme, designed so that clustering accuracy does not vary when operating on the encrypted datasets.
Abstract: Cloud computing is a necessity of the twenty-first century, given the exponential increase in the volume of data. Compared to other technologies, the cloud has seen the fastest adoption in industry. The popularity of the cloud is closely linked to the benefits it offers, which extend from small groups of stakeholders to large numbers of entrepreneurs. It enables prominent features such as elasticity, scalability, high availability, and accessibility. The increase in the cloud's popularity is thus linked to an influx of data, involving big data along with specialized techniques and tools. Many data analysis applications use clustering techniques combined with machine learning to derive useful information by grouping similar data, especially in the healthcare and medical domain for predicting symptoms of diseases. However, securing healthcare data while a machine learning model classifies patient information and genetic data is a major concern. To solve this problem, this paper proposes a Cloud-Machine Learning (CloudML) model for encrypted heart disease datasets that employs a privacy-preservation scheme. The model is designed in such a way that its clustering accuracy does not vary on encrypted datasets. The performance analysis shows that the proposed approach yields significant results in terms of communication overhead, storage overhead, runtime, scalability, and encryption cost.

Journal ArticleDOI
TL;DR: In this article, the authors propose a traffic anomaly detection scheme, analyzing and defining the specific security threat faced by the SDON: the non-directional denial-of-service (ND-DoS) attack.

Book ChapterDOI
01 Jan 2022
TL;DR: A Holistically-Nested Edge Detection (HED) network model is built and trained on the TensorFlow deep learning framework, and pixel-level segmentation of the wires in the image is realized through pixel-level classification.
Abstract: Power line detection is an important part of power system inspection, and is of great significance for UAV inspection, obstacle avoidance, and visual measurement. Traditional power line detection algorithms are mostly based on straight-line detection, with the power lines further extracted from the detected straight lines. Consequently, they are strongly disturbed by background information, and their detection of curved lines is unsatisfactory. Classical deep learning object detection algorithms are not suitable for detecting elongated objects such as overhead power lines. Based on the Holistically-Nested Edge Detection (HED) algorithm, this paper uses the TensorFlow deep learning framework to build and train an HED network model. Through pixel-level classification of the image, pixel-level segmentation of the wires is realized. Experimental verification shows that the power line detection achieves a high accuracy rate.
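
A minimal tf.keras stand-in for an HED-style network: each convolutional stage emits a side output that is upsampled to input resolution, and a 1×1 convolution fuses the side outputs into the final wire/background probability map. The layer sizes and the plain binary cross-entropy loss are simplifying assumptions (HED as published uses a VGG backbone with deep supervision and a class-balanced loss):

```python
import tensorflow as tf
from tensorflow.keras import layers

def hed_like(input_shape=(320, 320, 3), stages=(32, 64, 128)):
    x = inp = layers.Input(input_shape)
    sides = []
    for i, ch in enumerate(stages):
        x = layers.Conv2D(ch, 3, padding="same", activation="relu")(x)
        side = layers.Conv2D(1, 1)(x)                 # per-stage prediction
        sides.append(layers.UpSampling2D(2 ** i,
                                         interpolation="bilinear")(side))
        x = layers.MaxPool2D(2)(x)
    fused = layers.Conv2D(1, 1, activation="sigmoid")(
        layers.Concatenate()(sides))                  # fuse all side outputs
    return tf.keras.Model(inp, fused)

model = hed_like()
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(images, wire_masks) trains against pixel-level wire labels.
```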

Journal ArticleDOI
TL;DR: In this paper, an algorithm using auxiliary data structures to crack the Adaptive Radix Tree (ART) index structure of in-memory databases has been proposed, which makes it possible to build up an ART index step by step with incessant queries, and hence avoids the poor instant availability of a complete index which is constructed once and for all.

Journal ArticleDOI
TL;DR: In this article, the authors propose a micro-architecture with lightweight out-of-order execution capability enabling instruction-level parallelism to complement the conventional Thread-Level Parallelism model.
Abstract: GPU is the dominant platform for accelerating general-purpose workloads due to its computing capacity and cost-efficiency. GPU applications cover an ever-growing range of domains. To achieve high throughput, GPUs rely on massive multi-threading and fast context switching to overlap computations with memory operations. We observe that among the diverse GPU workloads, there exists a significant class of kernels that fail to maintain a sufficient number of active warps to hide the latency of memory operations, and thus suffer from frequent stalling. We argue that the dominant Thread-Level Parallelism model is not enough to efficiently accommodate the variability of modern GPU applications. To address this inherent inefficiency, we propose a novel micro-architecture with lightweight Out-Of-Order execution capability enabling Instruction-Level Parallelism to complement the conventional Thread-Level Parallelism model. To minimize the hardware overhead, we carefully design our extension to highly re-use the existing micro-architectural structures and study various design trade-offs to contain the overall area and power overhead, while providing improved performance. We show that the proposed architecture outperforms traditional platforms by 23 percent on average for low-occupancy kernels, with an area and power overhead of 1.29 and 10.05 percent, respectively. Finally, we establish the potential of our proposal as a micro-architecture alternative by providing 16 percent speedup over a wide collection of 60 general-purpose kernels.

Journal ArticleDOI
TL;DR: In this paper, the authors optimize the power allocation factors to improve random beamforming performance in mmWave-NOMA networks and analyze the impact of the non-line-of-sight (NLOS) path alongside the line-of-sight (LOS) path in terms of outage probability.

Journal ArticleDOI
TL;DR: In this paper, simple modifications to van der Hoeven's forward and inverse truncated Fourier transforms are shown to allow the algorithms to be performed in place, with only a linear overhead in complexity.


Journal ArticleDOI
TL;DR: In this article, the authors present an algorithm for a high performance, unbounded, portable, multi-producer/multi-consumer, lock-free FIFO (first-in first-out) queue.
Abstract: In this article we present an algorithm for a high performance, unbounded, portable, multi-producer/multi-consumer, lock-free FIFO (first-in first-out) queue. Aside from its competitive performance on current hardware, it is further characterized by its integrated memory reclamation mechanism, which is able to reliably and deterministically de-allocate nodes as soon as the final operation with a reference has concluded, similar to reference counting. This differentiates our approach from most other lock-free data structures, which usually require external (generic) memory reclamation or garbage collection mechanisms such as hazard pointers. Our deterministic memory reclamation mechanism completely prevents the build-up of memory awaiting reclamation and is hence very memory efficient, yet it does not introduce any substantial performance overhead. By utilizing concrete knowledge about the internal structure and access patterns of our queue, we are able to construct and constrain the reclamation mechanism in such a way that keeps the overhead for memory management almost entirely out of the common fast path. The presented algorithm is portable to all modern 64-bit processor architectures, as it only relies on the commonly available and lock-free atomic synchronization primitives compare-and-swap and fetch-and-add.