
Showing papers on "Scalability published in 2021"


Journal ArticleDOI
TL;DR: In this paper, a comprehensive survey of the emerging applications of federated learning in IoT networks is provided, which explores and analyzes the potential of FL for enabling a wide range of IoT services, including IoT data sharing, data offloading and caching, attack detection, localization, mobile crowdsensing and IoT privacy and security.
Abstract: The Internet of Things (IoT) is penetrating many facets of our daily life with the proliferation of intelligent services and applications empowered by artificial intelligence (AI). Traditionally, AI techniques require centralized data collection and processing that may not be feasible in realistic application scenarios due to the high scalability of modern IoT networks and growing data privacy concerns. Federated Learning (FL) has emerged as a distributed collaborative AI approach that can enable many intelligent IoT applications, by allowing for AI training at distributed IoT devices without the need for data sharing. In this article, we provide a comprehensive survey of the emerging applications of FL in IoT networks, beginning from an introduction to the recent advances in FL and IoT to a discussion of their integration. Particularly, we explore and analyze the potential of FL for enabling a wide range of IoT services, including IoT data sharing, data offloading and caching, attack detection, localization, mobile crowdsensing, and IoT privacy and security. We then provide an extensive survey of the use of FL in various key IoT applications such as smart healthcare, smart transportation, Unmanned Aerial Vehicles (UAVs), smart cities, and smart industry. The important lessons learned from this review of the FL-IoT services and applications are also highlighted. We complete this survey by highlighting the current challenges and possible directions for future research in this booming area.
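
FL's core mechanic referenced throughout this survey, training locally and aggregating only model updates, can be illustrated with a minimal FedAvg-style sketch. The code below is illustrative only (toy logistic-regression clients, hypothetical function names), not code from the surveyed works.

```python
# A minimal FedAvg-style aggregation sketch (illustrative only; not the
# survey's own code). Model parameters are plain NumPy arrays and each
# "client" is a hypothetical (data, labels) pair held locally on a device.
import numpy as np

def local_update(weights, data, labels, lr=0.1, epochs=1):
    """One round of local logistic-regression training on a client (toy example)."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-data @ w))         # sigmoid
        grad = data.T @ (preds - labels) / len(labels)  # gradient of log-loss
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Aggregate local models weighted by the number of samples per client."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Usage: two clients train locally; only model updates leave the device.
rng = np.random.default_rng(0)
global_w = np.zeros(5)
clients = [(rng.normal(size=(20, 5)), rng.integers(0, 2, 20)) for _ in range(2)]
locals_ = [local_update(global_w, X, y) for X, y in clients]
global_w = fed_avg(locals_, [len(y) for _, y in clients])
```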

319 citations


Journal ArticleDOI
TL;DR: This survey article proposes to answer the question of how to train distributed machine learning models for resource-constrained IoT devices, gives an overview of FL, and provides a comprehensive survey of the problem statements and emerging challenges.
Abstract: Federated learning (FL) is a distributed machine learning strategy that generates a global model by learning from multiple decentralized edge clients. FL enables on-device training, keeping the client’s local data private, and further, updating the global model based on the local model updates. While FL methods offer several advantages, including scalability and data privacy, they assume there are available computational resources at each edge-device/client. However, Internet of Things (IoT)-enabled devices, e.g., robots, drone swarms, and low-cost computing devices (e.g., Raspberry Pi), may have limited processing ability, low bandwidth and power, or limited storage capacity. In this survey paper, we propose to answer this question: how to train distributed machine learning models for resource-constrained IoT devices? To this end, we first explore the existing studies on FL, their assumptions for distributed implementation using IoT devices, and their drawbacks. We then discuss the implementation challenges and issues when applying FL to an IoT environment. We give an overview of FL and provide a comprehensive survey of the problem statements and emerging challenges, particularly when applying FL within heterogeneous IoT environments. Finally, we point out the future research directions for scientists and researchers who are interested in working at the intersection of FL and resource-constrained IoT environments.

197 citations


Journal ArticleDOI
TL;DR: The IoT/IIoT critical infrastructure in Industry 4.0 is introduced, the blockchain and edge computing paradigms are briefly presented, and it is shown how the convergence of these two paradigms can enable secure and scalable critical infrastructures.
Abstract: Critical infrastructure systems are vital to underpin the functioning of a society and economy. Due to the ever-increasing number of Internet-connected Internet-of-Things (IoT)/Industrial IoT (IIoT) devices, and the high volume of data generated and collected, security and scalability are becoming burning concerns for critical infrastructures in Industry 4.0. The blockchain technology is essentially a distributed and secure ledger that records all the transactions into a hierarchically expanding chain of blocks. Edge computing brings the cloud capabilities closer to the computation tasks. The convergence of blockchain and edge computing paradigms can overcome the existing security and scalability issues. In this article, we first introduce the IoT/IIoT critical infrastructure in Industry 4.0, and then we briefly present the blockchain and edge computing paradigms. After that, we show how the convergence of these two paradigms can enable secure and scalable critical infrastructures. Then, we provide a survey of the state of the art in security, privacy, and scalability of IoT/IIoT critical infrastructures. A list of potential research challenges and open issues in this area is also provided, which can serve as a useful resource to guide future research.

171 citations


Journal ArticleDOI
TL;DR: An optimal double-layer PBFT is proposed and it is proved that when the nodes are evenly distributed within the sub-groups in the second layer, the communication complexity is minimized and the security threshold is analyzed based on faulty probability determined (FPD) and faulty number determined models, respectively.
Abstract: Practical Byzantine Fault Tolerance (PBFT) consensus mechanism shows a great potential to break the performance bottleneck of the Proof-of-Work (PoW)-based blockchain systems, which typically support only dozens of transactions per second and require minutes to hours for transaction confirmation. However, due to frequent inter-node communications, PBFT mechanism has a poor node scalability and thus it is typically adopted in small networks. To enable PBFT in large systems such as massive Internet of Things (IoT) ecosystems and blockchain, in this article, a scalable multi-layer PBFT-based consensus mechanism is proposed by hierarchically grouping nodes into different layers and limiting the communication within the group. We first propose an optimal double-layer PBFT and show that the communication complexity is significantly reduced. Specifically, we prove that when the nodes are evenly distributed within the sub-groups in the second layer, the communication complexity is minimized. The security threshold is analyzed based on faulty probability determined (FPD) and faulty number determined (FND) models, respectively. We also provide a practical protocol for the proposed double-layer PBFT system. Finally, the results are extended to arbitrary-layer PBFT systems with communication complexity and security analysis. Simulation results verify the effectiveness of the analytical results.
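
The intuition behind the double-layer result, that an even split of nodes across second-layer sub-groups minimizes message count, can be checked with back-of-the-envelope arithmetic. The sketch below assumes, as a simplification, that a PBFT group of m nodes exchanges on the order of m² messages; the exact expressions and security analysis in the paper differ.

```python
# Back-of-the-envelope message count for a two-layer PBFT, assuming (as a
# simplification) that a group of m nodes running PBFT exchanges on the
# order of m**2 messages. Sub-group sizes and the "even split is best"
# comparison are illustrative, not the paper's exact expressions.
def two_layer_messages(group_sizes):
    k = len(group_sizes)                      # number of second-layer groups
    top_layer = k ** 2                        # consensus among group leaders
    sub_layers = sum(m ** 2 for m in group_sizes)
    return top_layer + sub_layers

n = 90
even   = two_layer_messages([30, 30, 30])     # evenly distributed sub-groups
skewed = two_layer_messages([10, 20, 60])     # same total nodes, uneven split
flat   = n ** 2                               # single-layer PBFT baseline
print(even, skewed, flat)                     # even < skewed < flat here
```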

160 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss new materials and structural designs for the engineering of soft actuators with physical intelligence and advanced properties, such as adaptability, multimodal locomotion, self-healing and multi-responsiveness.
Abstract: Inspired by physically adaptive, agile, reconfigurable and multifunctional soft-bodied animals and human muscles, soft actuators have been developed for a variety of applications, including soft grippers, artificial muscles, wearables, haptic devices and medical devices. However, the complex performance of biological systems cannot yet be fully replicated in synthetic designs. In this Review, we discuss new materials and structural designs for the engineering of soft actuators with physical intelligence and advanced properties, such as adaptability, multimodal locomotion, self-healing and multi-responsiveness. We examine how performance can be improved and multifunctionality implemented by using programmable soft materials, and highlight important real-world applications of soft actuators. Finally, we discuss the challenges and opportunities for next-generation soft actuators, including physical intelligence, adaptability, manufacturing scalability and reproducibility, extended lifetime and end-of-life strategies. Soft actuators are flexible and compliant and thus perfectly suited to interact with the human body. This Review discusses tethered, untethered and biohybrid soft actuation strategies, highlights promising real-world applications of soft robots and identifies key future challenges, such as implementing physical intelligence and end-of-life strategies.

138 citations


Journal ArticleDOI
Yuzheng Li, Chuan Chen, Liu Nan, Huawei Huang, Zibin Zheng, Qiang Yan
TL;DR: This work proposes a decentralized federated learning framework based on blockchain, that is, a Blockchain-based Federated Learning framework with Committee consensus (BFLC), and devises an innovative committee consensus mechanism that can effectively reduce the amount of consensus computation and resist malicious attacks.
Abstract: Federated learning has been widely studied and applied to various scenarios, such as financial credit, medical identification, and so on. Under these settings, federated learning protects users from exposing their private data, while cooperatively training a shared machine learning model (i.e., the global model) for a variety of real-world applications. The only data exchanged is the gradient of the model or the updated model (i.e., the local model update). However, the security of federated learning is increasingly being questioned, due to constant attacks on the global model or user privacy data by malicious clients or central servers. To address these security issues, we propose a decentralized federated learning framework based on blockchain, that is, a Blockchain-based Federated Learning framework with Committee consensus (BFLC). Without a centralized server, the framework uses blockchain for the global model storage and the local model update exchange. To enable the proposed BFLC, we also devise an innovative committee consensus mechanism, which can effectively reduce the amount of consensus computation and resist malicious attacks. We then discuss the scalability of BFLC, including theoretical security, storage optimization, and incentives. Finally, based on a FISCO blockchain system, we perform experiments using an AlexNet model on several frameworks with a real-world dataset, FEMNIST. The experimental results demonstrate the effectiveness and security of the BFLC framework.
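
As a rough illustration of the committee idea described in the abstract (committee nodes validating an incoming local update before it is accepted), a toy scoring step might look like the following; the function names, the linear toy model, and the median-vote rule are assumptions for the sketch, not the BFLC mechanism itself.

```python
# A toy sketch of a committee-style validation step: committee nodes score an
# incoming local update on their own data, and the update is accepted only if
# the aggregate score clears a threshold. Names and scoring rule are
# illustrative assumptions, not the paper's protocol.
import numpy as np

def committee_accept(update_weights, committee_datasets, threshold=0.6):
    scores = []
    for X, y in committee_datasets:
        preds = (X @ update_weights > 0).astype(int)   # toy linear classifier
        scores.append((preds == y).mean())             # per-member accuracy
    return float(np.median(scores)) >= threshold       # median vote of committee
```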

137 citations


Journal ArticleDOI
TL;DR: The proposed initial access algorithm and pilot assignment schemes outperform their corresponding benchmarks, P-LSFD achieves scalability with a negligible performance loss compared to the conventional optimal large-scale fading decoding, and scalable fractional power control provides a controllable trade-off between user fairness and the average SE.
Abstract: How to meet the demand for increasing number of users, higher data rates, and stringent quality-of-service (QoS) in the beyond fifth-generation (B5G) networks? Cell-free massive multiple-input multiple-output (MIMO) is considered as a promising solution, in which many wireless access points cooperate to jointly serve the users by exploiting coherent signal processing. However, there are still many unsolved practical issues in cell-free massive MIMO systems, whereof scalable massive access implementation is one of the most vital. In this paper, we propose a new framework for structured massive access in cell-free massive MIMO systems, which comprises one initial access algorithm, a partial large-scale fading decoding (P-LSFD) strategy, two pilot assignment schemes, and one fractional power control policy. New closed-form spectral efficiency (SE) expressions with maximum ratio (MR) combining are derived. The simulation results show that our proposed framework provides high SE when using local partial minimum mean-square error (LP-MMSE) and MR combining. Specifically, the proposed initial access algorithm and pilot assignment schemes outperform their corresponding benchmarks, P-LSFD achieves scalability with a negligible performance loss compared to the conventional optimal large-scale fading decoding (LSFD), and scalable fractional power control provides a controllable trade-off between user fairness and the average SE.

126 citations


Journal ArticleDOI
TL;DR: The Partitional Implementation of Unified Form (PIUF) algorithm is designed to be used on a single machine when the processed dataset is too big to be entirely loaded in memory, and it can be scaled to multiple processing nodes to reduce the processing time required to find the optimal solution.
Abstract: This paper proposes as an element of novelty the Unified Form (UF) clustering algorithm, which treats the Fuzzy C-Means (FCM) and K-Means (KM) algorithms as a single configurable algorithm. The UF algorithm was designed to facilitate the software implementation of FCM and KM by offering a single algorithm that can be configured to work as either FCM or KM. The second element of novelty is the Partitional Implementation of Unified Form (PIUF) algorithm, which is built upon the UF algorithm and designed to solve in an elegant manner the challenges of processing large datasets sequentially and of scaling the UF algorithm to datasets of any size. The PIUF algorithm has the advantage of overcoming any possible hardware limitations that can occur when large volumes of data must be stored, loaded in memory, and processed by a given computational system. The PIUF algorithm is designed and formulated to be used on a single machine if the processed dataset is too big to be entirely loaded in memory; at the same time, it can be scaled to multiple processing nodes to reduce the processing time required to find the optimal solution. The UF and PIUF algorithms are implemented and validated in the BigTim platform, a distributed platform developed by the authors that supports processing various datasets in a parallel manner, but they can be implemented in any other data processing platform. The Iris dataset is considered and then modified to obtain datasets of different sizes in order to test the algorithm implementations in the BigTim platform in different configurations. The analysis of the PIUF algorithm and the comparison with the FCM, KM and DBSCAN clustering algorithms are carried out using two performance indices; three performance indices are employed to evaluate the quality of the obtained clusters.
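
To make the "single configurable algorithm" idea concrete, here is a minimal sketch in which one update loop behaves as K-Means when memberships are hard and as Fuzzy C-Means when a fuzzifier is supplied; it illustrates the concept only and is not the authors' UF/PIUF implementation.

```python
# Minimal sketch of a "unified" clustering update in the spirit of the UF
# idea: one routine that behaves like K-Means with hard memberships and like
# Fuzzy C-Means with soft memberships. Illustrative only.
import numpy as np

def unified_clustering(X, k, fuzzifier=None, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        if fuzzifier is None:                        # K-Means: hard memberships
            u = np.zeros((len(X), k))
            u[np.arange(len(X)), d.argmin(axis=1)] = 1.0
        else:                                        # FCM: soft memberships
            ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (fuzzifier - 1.0))
            u = 1.0 / ratio.sum(axis=2)
        w = u if fuzzifier is None else u ** fuzzifier
        centers = (w.T @ X) / w.sum(axis=0)[:, None]  # weighted center update
    return centers, u
```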

109 citations


Journal ArticleDOI
TL;DR: In this paper, a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm is proposed to accelerate the convergence rate by integrating momentum effects into its training process.
Abstract: A recommender system (RS) relying on latent factor analysis usually adopts stochastic gradient descent (SGD) as its learning algorithm. However, owing to its serial mechanism, an SGD algorithm suffers from low efficiency and scalability when handling large-scale industrial problems. Aiming at addressing this issue, this study proposes a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm, whose main idea is two-fold: a) implementing parallelization via a novel data-splitting strategy, and b) accelerating convergence rate by integrating momentum effects into its training process. With it, an MPSGD-based latent factor (MLF) model is achieved, which is capable of performing efficient and high-quality recommendations. Experimental results on four high-dimensional and sparse matrices generated by industrial RS indicate that owing to an MPSGD algorithm, an MLF model outperforms the existing state-of-the-art ones in both computational efficiency and scalability.
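
The momentum half of the idea can be sketched for a latent factor model as follows; the parallel data-splitting half of MPSGD is omitted, and all names and hyperparameters are illustrative.

```python
# A single-worker sketch of momentum SGD for a latent-factor model on a
# sparse rating matrix. The paper's MPSGD additionally parallelizes training
# by splitting the rating matrix into disjoint blocks; only the
# momentum-accelerated update is illustrated here.
import numpy as np

def momentum_sgd_mf(ratings, n_users, n_items, rank=8, lr=0.01,
                    beta=0.9, reg=0.05, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, rank))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, rank))   # item factors
    vP, vQ = np.zeros_like(P), np.zeros_like(Q)      # momentum buffers
    for _ in range(epochs):
        for u, i, r in ratings:                      # (user, item, rating) triples
            err = r - P[u] @ Q[i]
            gP = -err * Q[i] + reg * P[u]
            gQ = -err * P[u] + reg * Q[i]
            vP[u] = beta * vP[u] - lr * gP           # momentum accumulates past grads
            vQ[i] = beta * vQ[i] - lr * gQ
            P[u] += vP[u]
            Q[i] += vQ[i]
    return P, Q
```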

108 citations


Journal ArticleDOI
TL;DR: A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, distillation algorithms and applications is provided.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher–student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.
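
For readers new to the topic, the classical response-based distillation objective surveyed here (a weighted sum of hard-label cross-entropy and a temperature-softened teacher-student KL term) can be written in a few lines; the NumPy sketch below is a generic textbook formulation, not code from the survey.

```python
# Minimal response-based knowledge-distillation loss (softened-logit matching),
# written with NumPy for clarity rather than a specific deep-learning framework.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * cross-entropy with hard labels + (1-alpha) * KL to softened teacher."""
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()
    ps_T = softmax(student_logits, T)
    pt_T = softmax(teacher_logits, T)
    kl = (pt_T * (np.log(pt_T + 1e-12) - np.log(ps_T + 1e-12))).sum(axis=1).mean()
    return alpha * ce + (1 - alpha) * (T ** 2) * kl   # T**2 rescales soft-term gradients
```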

105 citations


Journal ArticleDOI
TL;DR: In this article, an HVAC control algorithm is proposed to solve the Markov game based on multi-agent deep reinforcement learning with an attention mechanism, which does not require any prior knowledge of uncertain parameters and can operate without knowing building thermal dynamics models.
Abstract: In commercial buildings, about 40%–50% of the total electricity consumption is attributed to Heating, Ventilation, and Air Conditioning (HVAC) systems, which places an economic burden on building operators. In this paper, we intend to minimize the energy cost of an HVAC system in a multi-zone commercial building with the consideration of random zone occupancy, thermal comfort, and indoor air quality comfort. Due to the existence of unknown thermal dynamics models, parameter uncertainties (e.g., outdoor temperature, electricity price, and number of occupants), spatially and temporally coupled constraints associated with indoor temperature and CO2 concentration, a large discrete solution space, and a non-convex and non-separable objective function, it is very challenging to achieve the above aim. To this end, the above energy cost minimization problem is reformulated as a Markov game. Then, an HVAC control algorithm is proposed to solve the Markov game based on multi-agent deep reinforcement learning with an attention mechanism. The proposed algorithm does not require any prior knowledge of uncertain parameters and can operate without knowing building thermal dynamics models. Simulation results based on real-world traces show the effectiveness, robustness and scalability of the proposed algorithm.

Proceedings ArticleDOI
21 Jan 2021
TL;DR: SSTVOS as discussed by the authors extracts per-pixel representations for each object in a video using sparse attention over spatio-temporal features, which allows a model to learn to attend over a history of multiple frames and provides suitable inductive bias for performing correspondence-like computations necessary for solving motion segmentation.
Abstract: In this paper we introduce a Transformer-based approach to video object segmentation (VOS). To address compounding error and scalability issues of prior work, we propose a scalable, end-to-end method for VOS called Sparse Spatiotemporal Transformers (SST). SST extracts per-pixel representations for each object in a video using sparse attention over spatiotemporal features. Our attention-based formulation for VOS allows a model to learn to attend over a history of multiple frames and provides suitable inductive bias for performing correspondence-like computations necessary for solving motion segmentation. We demonstrate the effectiveness of attention-based over recurrent networks in the spatiotemporal domain. Our method achieves competitive results on YouTube-VOS and DAVIS 2017 with improved scalability and robustness to occlusions compared with the state of the art. Code is available at https://github.com/dukebw/SSTVOS.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a scalable graph learning framework based on the ideas of anchor points and bipartite graphs, addressing the expensive time overhead, the lack of explicit clusters, and the inability to generalize to unseen data points that affect existing graph-based subspace clustering methods.
Abstract: Graph-based subspace clustering methods have exhibited promising performance. However, they still suffer from some drawbacks: they incur expensive time overhead, they fail to explore the explicit clusters, and they cannot generalize to unseen data points. In this work, we propose a scalable graph learning framework, seeking to address the above three challenges simultaneously. Specifically, it is based on the ideas of anchor points and bipartite graphs. Rather than building an n × n graph, where n is the number of samples, we construct a bipartite graph to depict the relationship between samples and anchor points. Meanwhile, a connectivity constraint is employed to ensure that the connected components indicate clusters directly. We further establish the connection between our method and K-means clustering. Moreover, a model to process multiview data is also proposed, which scales linearly with respect to n. Extensive experiments demonstrate the efficiency and effectiveness of our approach with respect to many state-of-the-art clustering methods.
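
The anchor/bipartite-graph construction at the heart of the abstract can be sketched as follows: pick m << n anchors (here via k-means, an assumption), connect each sample to its few nearest anchors, and work with the resulting n × m graph instead of an n × n one. Details such as the weighting and the connectivity constraint differ from the authors' model.

```python
# Illustrative sketch of the anchor/bipartite-graph idea: instead of an n x n
# affinity graph, build an n x m graph between samples and m << n anchor
# points (chosen here by k-means), so the cost scales linearly in n.
import numpy as np
from sklearn.cluster import KMeans

def anchor_bipartite_graph(X, n_anchors=50, n_neighbors=5, seed=0):
    km = KMeans(n_clusters=n_anchors, random_state=seed, n_init=10).fit(X)
    anchors = km.cluster_centers_
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)   # n x m distances
    Z = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :n_neighbors]                   # keep a few nearest anchors
    rows = np.repeat(np.arange(len(X)), n_neighbors)
    sims = np.exp(-d[rows, idx.ravel()] / d.mean())                # simple heat-kernel weights
    Z[rows, idx.ravel()] = sims
    return Z / Z.sum(axis=1, keepdims=True)                        # row-normalized bipartite graph
```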

Proceedings ArticleDOI
13 Feb 2021
TL;DR: In this paper, a scalable neural-network inference accelerator in 16nm is presented, based on an array of programmable cores employing mixed-signal In-Memory Computing (IMC), digital near-memory computing (NMC), and localized buffering/control.
Abstract: This paper presents a scalable neural-network (NN) inference accelerator in 16nm, based on an array of programmable cores employing mixed-signal In-Memory Computing (IMC), digital Near-Memory Computing (NMC), and localized buffering/control. IMC achieves high energy efficiency and throughput for matrix-vector multiplications (MVMs), which dominate NNs; but, scalability poses numerous challenges, both technologically, going to advanced nodes to maintain gains over digital architectures, and architecturally, for full execution of diverse NNs. Recent demonstrations have explored integrating IMC in programmable processors [1, 2], but have not achieved IMC efficiency and throughput for full executions. The central challenge is drastically different physical design points and associated tradeoffs incurred by IMC compared to digital engines. Namely, IMC substantially increases compute energy efficiency and HW density/parallelism, but retains the overheads of HW virtualization (state and data swapping/buffering/communication across spatial/temporal computation mappings). The demonstrated architecture is co-designed with SW-mapping algorithms (encapsulated in a custom graph compiler), to provide efficiency across a broad range of mapping strategies, to overcome these overheads.

Proceedings ArticleDOI
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He
14 Nov 2021
TL;DR: The ZeRO-Infinity project as mentioned in this paper leverages GPU, CPU, and NVMe memory to allow for unprecedented model scale on limited resources without requiring model code refactoring.
Abstract: In the last three years, the largest dense deep learning models have grown over 1000x to reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 GB to 80 GB). Therefore, the growth in model scale has been supported primarily through system innovations that allow large models to fit in the aggregate GPU memory of multiple GPUs. However, we are getting close to the GPU memory wall. It requires 800 NVIDIA V100 GPUs just to fit a trillion-parameter model for training, and such clusters are simply out of reach for most data scientists. In addition, training models at that scale requires complex combinations of parallelism techniques that put a big burden on data scientists to refactor their models. In this paper we present ZeRO-Infinity, a novel heterogeneous system technology that leverages GPU, CPU, and NVMe memory to allow for unprecedented model scale on limited resources without requiring model code refactoring. At the same time it achieves excellent training throughput and scalability, unencumbered by the limited CPU or NVMe bandwidth. ZeRO-Infinity can fit models with tens and even hundreds of trillions of parameters for training on current-generation GPU clusters. It can be used to fine-tune trillion-parameter models on a single NVIDIA DGX-2 node, making large models more accessible. In terms of training throughput and scalability, it sustains over 25 petaflops on 512 NVIDIA V100 GPUs (40% of peak), while also demonstrating super-linear scalability. An open source implementation of ZeRO-Infinity is available through DeepSpeed.
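
The "GPU memory wall" argument can be made concrete with rough arithmetic, using the commonly cited mixed-precision Adam accounting of about 16 bytes of model state per parameter; the numbers below are order-of-magnitude estimates that ignore activations and working buffers, which is why the abstract's 800-GPU figure is higher.

```python
# Rough memory arithmetic behind the "GPU memory wall" claim, assuming about
# 16 bytes of model state per parameter (fp16 weights + fp16 gradients +
# fp32 optimizer states). Order-of-magnitude estimate only.
params = 1e12                     # a trillion-parameter model
bytes_per_param = 16              # weights + gradients + optimizer states
model_state_tb = params * bytes_per_param / 1e12
gpu_mem_gb = 32                   # one V100 variant
gpus_needed = model_state_tb * 1e3 / gpu_mem_gb
print(f"~{model_state_tb:.0f} TB of model state -> ~{gpus_needed:.0f} GPUs "
      "just to hold it, before activations and working buffers")
```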

Journal ArticleDOI
TL;DR: This article proposes a graph theory based algorithm to efficiently solve the data sharing problem, which is formulated as a maximum weighted independent set problem on the constructed conflict graph, and proposes a balanced greedy algorithm, which can make the content distribution more balanced.
Abstract: It is widely recognized that connected vehicles have the potential to further improve road safety and transportation intelligence and to enhance in-vehicle entertainment. By leveraging the 5G enabled Vehicular Ad hoc NETworks (VANET) technology, which is referred to as 5G-VANET, a flexible software-defined communication can be achieved with ultra-high reliability, low latency, and high capacity. Many enabling applications in 5G-VANET rely on sharing mobile data among vehicles, which is still a challenging issue due to the extremely large data volume and the prohibitive cost of transmitting such data using 5G cellular networks. This article focuses on efficient cooperative data sharing in edge computing assisted 5G-VANET. First, to enable efficient cooperation between cellular communication and Dedicated Short-Range Communication (DSRC), we propose a software-defined cooperative data sharing architecture in 5G-VANET. The cellular link allows the communications between OpenFlow enabled vehicles and the Controller to collect contextual information, while the DSRC serves as the data plane, enabling cooperative data sharing among adjacent vehicles. Second, we propose a graph theory based algorithm to efficiently solve the data sharing problem, which is formulated as a maximum weighted independent set problem on the constructed conflict graph. Specifically, considering the continuous data sharing, we propose a balanced greedy algorithm, which can make the content distribution more balanced. Furthermore, due to the fixed amount of computing resources allocated to this software-defined cooperative data sharing service, we propose an integer linear programming based decomposition algorithm to make full use of the computing resources. Extensive simulations in NS3 and SUMO demonstrate the superiority and scalability of the proposed software-defined architecture and cooperative data sharing algorithms.
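
As background for the formulation, a maximum weighted independent set instance can be attacked with a simple greedy heuristic; the sketch below is a generic weight-over-degree greedy, not the paper's balanced greedy algorithm.

```python
# Generic greedy heuristic for maximum weighted independent set (MWIS) on a
# conflict graph, the problem the data-sharing task is reduced to in the
# abstract. This is a textbook weight/degree greedy, not the authors'
# balanced greedy variant.
def greedy_mwis(weights, adjacency):
    """weights: dict node -> weight; adjacency: dict node -> set of neighbors."""
    remaining = set(weights)
    chosen = set()
    while remaining:
        # pick the node with the best weight-to-(degree+1) ratio among remaining
        best = max(remaining,
                   key=lambda v: weights[v] / (len(adjacency[v] & remaining) + 1))
        chosen.add(best)
        remaining -= adjacency[best] | {best}        # drop it and its conflicts
    return chosen

# Usage: three transmissions where 'a' conflicts with both 'b' and 'c'.
w = {"a": 3.0, "b": 2.0, "c": 2.5}
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
print(greedy_mwis(w, adj))   # {'b', 'c'} has total weight 4.5 > 3.0
```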

Journal ArticleDOI
TL;DR: A Secured Privacy-Preserving Framework for smart agricultural UAVs is proposed and it is identified that the SP2F framework outperforms several state-of-the-art techniques in both non-blockchain and blockchain frameworks.

Journal ArticleDOI
TL;DR: In this article, the authors presented an optimized energy-efficient and secure blockchain-based software-defined IoT framework for smart networks, which ensures efficient cluster-head selection and secure network communication via the identification and isolation of rogue switches.
Abstract: Software-Defined Networking (SDN) and Blockchain are leading technologies used worldwide to establish safe network communication as well as build secure network infrastructures. They provide a robust and reliable platform to address threats and face challenges such as security, privacy, flexibility, scalability, and confidentiality. Driven by these assumptions, this paper presents an optimized energy-efficient and secure Blockchain-based software-defined IoT framework for smart networks. Indeed, SDN and Blockchain technologies have proven to be able to suitably manage resource utilization and to develop secure network communication across the IoT ecosystem. However, there is a lack of research works that present a comprehensive definition of such a framework that can meet the requirements of the IoT ecosystem (i.e., efficient energy utilization and reduced end-to-end delay). Therefore, in this research, we present a layered hierarchical architecture for the deployment of a distributed yet efficient Blockchain-enabled SDN-IoT framework that ensures efficient cluster-head selection and secure network communication via the identification and isolation of rogue switches. Besides, the Blockchain-enabled flow-rules record keeps track of the rules enforced in the switches and maintains the consistency within the controller cluster. Finally, we assess the performance of the proposed framework in a simulation environment and show that it can achieve optimized energy utilization, end-to-end delay, and throughput compared to considered baselines, thus being able to achieve efficiency and security in the smart network.

Journal ArticleDOI
TL;DR: In this paper, the authors presented a biological-inspired cognitive supercomputing system (BiCoSS) that integrates multiple granules (GRs) of SNNs to realize a hybrid compatible neuromorphic platform.
Abstract: The further exploration of the neural mechanisms underlying the biological activities of the human brain depends on the development of large-scale spiking neural networks (SNNs) with different categories at different levels, as well as the corresponding computing platforms. Neuromorphic engineering provides approaches to high-performance biologically plausible computational paradigms inspired by neural systems. In this article, we present a biologically inspired cognitive supercomputing system (BiCoSS) that integrates multiple granules (GRs) of SNNs to realize a hybrid compatible neuromorphic platform. A scalable hierarchical heterogeneous multicore architecture is presented, and a synergistic routing scheme for hybrid neural information is proposed. The BiCoSS system can accommodate different levels of GRs and biological plausibility of SNN models in an efficient and scalable manner. Over four million neurons can be realized on BiCoSS with a power efficiency 2.8k times larger than the GPU platform, and the average latency of BiCoSS is 3.62 and 2.49 times higher than conventional architectures of digital neuromorphic systems. For the verification, BiCoSS is used to replicate various biological cognitive activities, including motor learning, action selection, context-dependent learning, and movement disorders. Comprehensively considering the programmability, biological plausibility, learning capability, computational power, and scalability, BiCoSS is shown to outperform the alternative state-of-the-art works for large-scale SNN, while its real-time computational capability enables a wide range of potential applications.

Journal ArticleDOI
TL;DR: PolyShard as mentioned in this paper is a coded storage and computation protocol for blockchains, which achieves information-theoretic upper bounds on the efficiency of the storage, system throughput, as well as on trust.
Abstract: Today’s blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notion of sharding: different subsets of nodes handle different portions of the blockchain, thereby reducing the load for each individual node. However, existing sharding proposals achieve efficiency scaling by compromising on trust - corrupting the nodes in a given shard will lead to the permanent loss of the corresponding portion of data. In this paper, we settle the trilemma by demonstrating a new protocol for coded storage and computation in blockchains. In particular, we propose PolyShard: a “polynomially coded sharding” scheme that achieves information-theoretic upper bounds on the efficiency of the storage, system throughput, as well as on trust, thus enabling a truly scalable system. We provide simulation results that numerically demonstrate the performance improvement over the state of the art, and the scalability of the PolyShard system. Finally, we discuss potential enhancements, and highlight practical considerations in building such a system.
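
The flavor of polynomially coded storage can be shown with a toy example: treat K shards as values of a degree-(K-1) polynomial, let each node store one evaluation, and recover any shard from any K surviving evaluations by Lagrange interpolation. The field size, layout, and absence of coded computation make this an illustration only, not the PolyShard scheme.

```python
# Toy illustration of polynomially coded storage: K original shards define a
# degree-(K-1) polynomial over a prime field, each of N nodes stores one
# evaluation, and any K surviving evaluations recover every shard.
P = 2_147_483_647                                    # prime modulus (toy field)

def lagrange_at(x0, xs, ys):
    """Evaluate the unique interpolating polynomial through (xs, ys) at x0 mod P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * ((x0 - xm) % P) % P
                den = den * ((xj - xm) % P) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P   # den^-1 via Fermat
    return total

shards = [17, 42, 99]                                # K = 3 original data shards
K, N = len(shards), 5
stored = [lagrange_at(j, range(1, K + 1), shards) for j in range(1, N + 1)]
# Nodes 2 and 4 fail; any K = 3 survivors still reconstruct shard 2 (value 42):
survivors = [1, 3, 5]
recovered = lagrange_at(2, survivors, [stored[j - 1] for j in survivors])
assert recovered == 42
```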

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a fine-grained cross-attention architecture is proposed to improve retrieval accuracy and is combined, via distillation and reranking, with a fast dual-encoder model, since cross-attention alone is often inapplicable in practice for large-scale retrieval given its per-sample cost at test time.
Abstract: Our objective is language-based search of large-scale image and video datasets. For this task, the approach that consists of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive as retrieval scales and is efficient for billions of images using approximate nearest neighbour search. An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings, but is often inapplicable in practice for large-scale retrieval given the cost of the cross-attention mechanisms required for each sample at test time. This work combines the best of both worlds. We make the following three contributions. First, we equip transformer-based models with a new fine-grained cross-attention architecture, providing significant improvements in retrieval accuracy whilst preserving scalability. Second, we introduce a generic approach for combining a Fast dual encoder model with our Slow but accurate transformer-based model via distillation and reranking. Finally, we validate our approach on the Flickr30K image dataset where we show an increase in inference speed by several orders of magnitude while having results competitive to the state of the art. We also extend our method to the video domain, improving the state of the art on the VATEX dataset.
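
The Fast/Slow combination described above reduces, at inference time, to a shortlist-then-rerank pipeline; the sketch below uses stand-in scoring functions (a dot-product dual encoder and an arbitrary slow_scorer callable) rather than the paper's models.

```python
# Conceptual sketch of a fast-then-slow retrieval pipeline: a cheap
# dual-encoder similarity shortlists candidates, and an expensive
# cross-attention scorer reranks only that shortlist. Stand-in scorers only.
import numpy as np

def fast_dual_encoder_scores(query_emb, gallery_embs):
    """Dot-product similarity in a shared embedding space (cheap, scalable)."""
    return gallery_embs @ query_emb

def rerank(query_emb, gallery_embs, slow_scorer, shortlist=100, top=10):
    coarse = fast_dual_encoder_scores(query_emb, gallery_embs)
    candidates = np.argsort(-coarse)[:shortlist]          # cheap shortlist
    fine = [slow_scorer(query_emb, gallery_embs[i]) for i in candidates]
    order = np.argsort(-np.asarray(fine))[:top]           # expensive reranking
    return candidates[order]
```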

Journal ArticleDOI
TL;DR: The Optimal Resource Provisioning (ORP) algorithms with different instances are developed to optimize the computation capacity of edge hosts while dynamically adjusting the cloud tenancy strategy, and are proved to have polynomial computational complexity.
Abstract: Mobile edge computing is emerging as a new computing paradigm that provides enhanced experience to mobile users via low latency connections and augmented computation capacity. As the amount of user requests is time-varying, while the computation capacity of edge hosts is limited, the Cloud Assisted Mobile Edge (CAME) computing framework is introduced to improve the scalability of the edge platform. By outsourcing mobile requests to clouds with various types of instances, the CAME framework can accommodate dynamic mobile requests with diverse quality of service requirements. In order to provide guaranteed services at minimal system cost, the edge resource provisioning and cloud outsourcing of the CAME framework should be carefully designed in a cost-efficient manner. Specifically, two fundamental issues should be answered: (1) what is the optimal edge computation capacity configuration? and (2) what types of cloud instances should be tenanted and what is the amount of each type? To solve these issues, we formulate the resource provisioning in the CAME framework as an optimization problem. By exploiting the piecewise convex property of this problem, the Optimal Resource Provisioning (ORP) algorithms with different instances are proposed, so as to optimize the computation capacity of edge hosts and meanwhile dynamically adjust the cloud tenancy strategy. The proposed algorithms are proved to have polynomial computational complexity. To evaluate the performance of the ORP algorithms, extensive simulations and experiments are conducted based on both the widely-used traffic models and the Google cluster usage tracelogs, respectively. It is shown that the proposed ORP algorithms outperform the local-first and cloud-first benchmark algorithms in system flexibility and cost-efficiency.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a scalable neuromorphic fault-tolerant context-dependent learning (FCL) hardware framework, which can learn associations between stimulation and response in two context-dependent learning tasks from experimental neuroscience, despite possible faults in the hardware nodes.
Abstract: Neuromorphic computing is a promising technology that realizes computation based on event-based spiking neural networks (SNNs). However, fault-tolerant on-chip learning remains a challenge in neuromorphic systems. This study presents the first scalable neuromorphic fault-tolerant context-dependent learning (FCL) hardware framework. We show how this system can learn associations between stimulation and response in two context-dependent learning tasks from experimental neuroscience, despite possible faults in the hardware nodes. Furthermore, we demonstrate how our novel fault-tolerant neuromorphic spike routing scheme can avoid multiple fault nodes successfully and can enhance the maximum throughput of the neuromorphic network by 0.9%-16.1% in comparison with previous studies. By utilizing the real-time computational capabilities and multiple-fault-tolerant property of the proposed system, the neuronal mechanisms underlying the spiking activities of neuromorphic networks can be readily explored. In addition, the proposed system can be applied in real-time learning and decision-making applications, brain-machine integration, and the investigation of brain cognition during learning.

Journal ArticleDOI
TL;DR: In this article, the fundamental tradeoffs between communication performance, computational complexity, and fronthaul signaling requirements are thoroughly analyzed, while open problems related to these and other resource allocation problems are reviewed.
Abstract: Imagine a coverage area where each mobile device is communicating with a preferred set of wireless access points (among many) that are selected based on its needs and cooperate to jointly serve it, instead of creating autonomous cells. This effectively leads to a user-centric post-cellular network architecture, which can resolve many of the interference issues and service-quality variations that appear in cellular networks. This concept is called User-centric Cell-free Massive MIMO (multiple-input multiple-output) and has its roots in the intersection between three technology components: Massive MIMO, coordinated multipoint processing, and ultra-dense networks. The main challenge is to achieve the benefits of cell-free operation in a practically feasible way, with computational complexity and fronthaul requirements that are scalable to enable massively large networks with many mobile devices. This monograph covers the foundations of User-centric Cell-free Massive MIMO, starting from the motivation and mathematical definition. It continues by describing the state-of-the-art signal processing algorithms for channel estimation, uplink data reception, and downlink data transmission with either centralized or distributed implementation. The achievable spectral efficiency is mathematically derived and evaluated numerically using a running example that exposes the impact of various system parameters and algorithmic choices. The fundamental tradeoffs between communication performance, computational complexity, and fronthaul signaling requirements are thoroughly analyzed. Finally, the basic algorithms for pilot assignment, dynamic cooperation cluster formation, and power optimization are provided, while open problems related to these and other resource allocation problems are reviewed. All the numerical examples can be reproduced using the accompanying Matlab code.

Journal ArticleDOI
TL;DR: A comprehensive overview of fault tolerance-related issues in cloud computing is presented, emphasizing the significant concepts, architectural details, and the state-of-the-art techniques and methods.

Journal ArticleDOI
TL;DR: A collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications, is presented, aiming to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.
Abstract: Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target spatial computing architectures, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes, due to fundamentally distinct aspects of hardware design, such as programming for deep pipelines, distributed memory resources, and scalable routing. To alleviate this, we present a collection of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. We systematically identify classes of transformations (pipelining, scalability, and memory), the characteristics of their effect on the HLS code and the resulting hardware (e.g., increasing data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip dataflow, allowing for massively parallel architectures. To quantify the effect of various transformations, we cover the optimization process of a sample set of HPC kernels, provided as open source reference codes. We aim to establish a common toolbox to guide both performance engineers and compiler engineers in tapping into the performance potential offered by spatial computing architectures using HLS.

Journal ArticleDOI
TL;DR: A distributed pricing strategy for P2P transactive energy systems considering voltage and line congestion management, which can be utilized in various power network topologies is presented and a new mutual reputation index is introduced as a product differentiation between the prosumers to consider their bilateral trading willingness.
Abstract: In recent years, the rapid growth of active consumers in distribution networks has made the structure of modern power markets more independent, flexible, and distributed. Specifically, in the recent trend of peer-to-peer (P2P) transactive energy systems, traditional consumers become prosumers (producer + consumer) who can maximize their energy utilization by sharing it with neighbors without any conventional arbitrator in the transactions. Although a distributed energy pricing scheme is indispensable in such systems to make optimal decisions, it is challenging to establish under the influence of non-linear physical network constraints with limited information. Therefore, this paper presents a distributed pricing strategy for P2P transactive energy systems considering voltage and line congestion management, which can be utilized in various power network topologies. This paper also introduces a new mutual reputation index as a product differentiation between the prosumers to consider their bilateral trading willingness. In this paper, a Fast Alternating Direction Method of Multipliers (F-ADMM) algorithm is realized instead of the standard ADMM algorithm to improve the convergence rate. The effectiveness of the proposed approach is validated through software simulations. The results show that the algorithm is scalable, converges faster, facilitates easy implementation, and ensures maximum social welfare/profit.

Proceedings ArticleDOI
23 May 2021
TL;DR: CryptGPU as discussed by the authors is a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit) and achieves state-of-the-art performance on convolutional neural networks.
Abstract: We introduce CryptGPU, a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit). Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. In this work, we start by introducing a new interface to losslessly embed cryptographic operations over secret-shared values (in a discrete domain) into floating-point operations that can be processed by highly-optimized CUDA kernels for linear algebra. We then identify a sequence of "GPU-friendly" cryptographic protocols to enable privacy-preserving evaluation of both linear and non-linear operations on the GPU. Our microbenchmarks indicate that our private GPU-based convolution protocol is over 150× faster than the analogous CPU-based protocol; for non-linear operations like the ReLU activation function, our GPU-based protocol is around 10× faster than its CPU analog. With CryptGPU, we support private inference and training on convolutional neural networks with over 60 million parameters as well as handle large datasets like ImageNet. Compared to the previous state-of-the-art, our protocols achieve a 2× to 8× improvement in private inference for large networks and datasets. For private training, we achieve a 6× to 36× improvement over prior state-of-the-art. Our work not only showcases the viability of performing secure multiparty computation (MPC) entirely on the GPU to newly enable fast privacy-preserving machine learning, but also highlights the importance of designing new MPC primitives that can take full advantage of the GPU’s computing capabilities.
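
The secret-shared values mentioned in the abstract are, at their simplest, additive shares over a ring; the toy NumPy sketch below shows 3-party additive sharing over Z_2^64. CryptGPU's actual protocols, and its lossless embedding of shares into floating point for CUDA kernels, are far more involved.

```python
# A tiny illustration of additive secret sharing: a tensor is split into
# random shares that individually look uniform yet sum back to the secret
# modulo 2**64 (uint64 arithmetic wraps naturally). Not CryptGPU's protocol.
import numpy as np

def share(secret, n_parties=3, seed=0):
    """Split a uint64 tensor into n additive shares."""
    rng = np.random.default_rng(seed)
    shares = [rng.integers(0, np.iinfo(np.uint64).max, size=secret.shape,
                           dtype=np.uint64, endpoint=True)
              for _ in range(n_parties - 1)]
    last = secret.astype(np.uint64)
    for s in shares:
        last = last - s              # uint64 subtraction wraps mod 2**64
    return shares + [last]

def reconstruct(shares):
    total = shares[0].copy()
    for s in shares[1:]:
        total = total + s            # wraps mod 2**64, recovering the secret
    return total

x = np.array([1, 2, 3], dtype=np.uint64)
assert np.array_equal(reconstruct(share(x)), x)
```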

Journal ArticleDOI
TL;DR: This manuscript analyzes more than forty SDN controllers in terms of the following performance parameters: scalability, reliability, consistency, and security, and examines the mechanisms used by various SDN controllers to address these performance parameters.
Abstract: Software Defined Networking simplifies design, monitoring and management of next generation networks by segregating a legacy network into a centralized control plane and a remotely programmable data plane. The intelligent centralized SDN control plane controls the behavior of forwarding devices in processing the incoming packets and provides a bird's-eye view of the entire network at a single central point. The centralized control provides network programmability and facilitates the introduction of adaptive and automatic network control. The SDN control plane can be implemented by using the following three deployment models: (i) physically centralized, in which a single SDN controller is configured for a network; (ii) physically distributed but logically centralized, wherein multiple SDN controllers are used to manage a network; and (iii) hybrid, in which both legacy distributed control and centralized SDN control coexist. This manuscript presents all these control plane architectures and discusses various SDN controllers supporting these architectures. We have analyzed more than forty SDN controllers in terms of the following performance parameters: scalability, reliability, consistency and security. We have examined the mechanisms used by various SDN controllers to address the said performance parameters and have highlighted the pros and cons associated with each mechanism. In addition, this manuscript also highlights a number of research challenges and open issues in different SDN control plane architectures.

Journal ArticleDOI
TL;DR: In this paper, two new C2H6-selective MOF adsorbents (NKMOF-8-Br and -Me) were designed and synthesized with ultrahigh chemical and thermal stability, including water resistance.
Abstract: The development of new techniques and materials that can separate ethylene from ethane is highly relevant in modern applications. Although adsorption-based separation techniques using metal-organic frameworks (MOFs) have gained increasing attention, the relatively low stability (especially water resistance) and unscalable synthesis of MOFs severely limit their application in real industrial scenarios. Addressing these challenges, we rationally designed and synthesized two new C2H6-selective MOF adsorbents (NKMOF-8-Br and -Me) with ultrahigh chemical and thermal stability, including water resistance. Attributed to the nonpolar/hydrophobic pore environments and appropriate pore apertures, the MOFs can capture C2 hydrocarbon gases at ambient conditions even in high humidity. The single-crystal structures of gas@NKMOF-8 realized the direct visualization of adsorption sites of the gases. Both the single-crystal data and simulated data elucidate the mechanism of selective adsorption. Moreover, the NKMOF-8 possesses high C2H6 adsorption capacity and high selectivity, allowing for efficient C2H6/C2H4 separation, as verified by experimental breakthrough tests. Most importantly, NKMOF-8-Br and -Me can be scalably synthesized through stirring at room temperature in minutes, which confers them with great potential for industrial application. This work offers new adsorbents that can address major chemical industrial challenges and provides an in-depth understanding of the gas binding sites in a visual manner.