
Showing papers on "Scalability published in 2020"


Proceedings ArticleDOI
Mingxing Tan1, Ruoming Pang1, Quoc V. Le1
14 Jun 2020
TL;DR: EfficientDet, as discussed by the authors, proposes a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion, and a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time.
Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4x – 9x smaller and using 13x – 42x fewer FLOPs than previous detectors.
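To make the compound scaling idea above concrete, here is a minimal Python sketch that derives a detector configuration from a single compound coefficient phi. The constants follow the scaling heuristics reported for EfficientDet (the largest models deviate slightly, e.g., D7 uses a larger input resolution), so treat the function and its values as illustrative rather than authoritative.

import math

def efficientdet_compound_scaling(phi: int) -> dict:
    """Return an illustrative detector configuration for compound coefficient phi."""
    return {
        "input_resolution": 512 + 128 * phi,        # R = 512 + 128 * phi
        "bifpn_width": int(64 * (1.35 ** phi)),     # W_bifpn = 64 * 1.35^phi
        "bifpn_depth": 3 + phi,                     # D_bifpn = 3 + phi
        "head_depth": 3 + math.floor(phi / 3),      # D_box = D_class = 3 + floor(phi/3)
    }

if __name__ == "__main__":
    for phi in range(8):                            # roughly D0 .. D7
        print(f"D{phi}: {efficientdet_compound_scaling(phi)}")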

3,423 citations


Journal ArticleDOI
TL;DR: A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications can be found in this paper.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison, and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed.
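As a concrete illustration of the response-based distillation the survey covers, the following is a minimal PyTorch sketch of the classic soft-target loss; the function name and the temperature/alpha hyperparameters are illustrative choices, not values taken from the survey.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Blend a soft-target KL term (teacher -> student) with hard-label cross-entropy."""
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as in the original soft-target formulation.
    kd_term = F.kl_div(soft_student, soft_teacher, log_target=True,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term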

1,027 citations


Journal ArticleDOI
TL;DR: In this article, a Deep Reinforcement Learning-based Online Offloading (DROO) framework is proposed that optimally adapts task offloading decisions and wireless resource allocation to time-varying wireless channel conditions.
Abstract: Wireless powered mobile-edge computing (MEC) has recently emerged as a promising paradigm to enhance the data processing capability of low-power networks, such as wireless sensor networks and the internet of things (IoT). In this paper, we consider a wireless powered MEC network that adopts a binary offloading policy, so that each computation task of wireless devices (WDs) is either executed locally or fully offloaded to an MEC server. Our goal is to acquire an online algorithm that optimally adapts task offloading decisions and wireless resource allocations to the time-varying wireless channel conditions. This requires quickly solving hard combinatorial optimization problems within the channel coherence time, which is hardly achievable with conventional numerical optimization methods. To tackle this problem, we propose a Deep Reinforcement learning-based Online Offloading (DROO) framework that implements a deep neural network as a scalable solution that learns the binary offloading decisions from experience. It eliminates the need to solve combinatorial optimization problems, and thus greatly reduces the computational complexity, especially in large-size networks. To further reduce the complexity, we propose an adaptive procedure that automatically adjusts the parameters of the DROO algorithm on the fly. Numerical results show that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computation time by more than an order of magnitude compared with existing optimization methods. For example, the CPU execution latency of DROO is less than 0.1 second in a 30-user network, making real-time and optimal offloading truly viable even in a fast fading environment.
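The decision loop described above can be sketched roughly as follows: a DNN maps the channel gains to a relaxed offloading vector, which is quantized into a handful of binary candidates whose achievable rates are compared. This is a hedged approximation of DROO's order-preserving quantization; the quantizer and the caller-supplied dnn_forward and compute_rate are illustrative placeholders.

import numpy as np

def quantize(relaxed, K):
    """Threshold the relaxed vector at 0.5, then flip the K-1 entries closest to 0.5
    to generate alternative binary offloading candidates."""
    base = (relaxed > 0.5).astype(int)
    candidates = [base]
    order = np.argsort(np.abs(relaxed - 0.5))
    for k in range(min(K - 1, len(relaxed))):
        alt = base.copy()
        alt[order[k]] ^= 1
        candidates.append(alt)
    return candidates

def droo_step(dnn_forward, compute_rate, channel_gains, K=10):
    relaxed = dnn_forward(channel_gains)                 # relaxed decisions in [0, 1]
    candidates = quantize(relaxed, K)
    rewards = [compute_rate(channel_gains, a) for a in candidates]
    best = int(np.argmax(rewards))
    return candidates[best], rewards[best]               # best pair is also cached for replay training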

403 citations


Journal ArticleDOI
TL;DR: A novel framework called HealthFog is proposed for integrating ensemble deep learning in edge computing devices and is deployed for a real-life application of automatic heart disease analysis.

387 citations


Journal ArticleDOI
TL;DR: In this article, a review of state-of-the-art scalable Gaussian process regression (GPR) models is presented, covering global approximations that distill the entire data and local approximations that divide the data for subspace learning.
Abstract: The vast quantity of information brought by big data as well as the evolving computer hardware encourages success stories in the machine learning community. Meanwhile, it poses challenges for the Gaussian process regression (GPR), a well-known nonparametric and interpretable Bayesian model, which suffers from cubic complexity in the data size. To improve the scalability while retaining desirable prediction quality, a variety of scalable GPs have been presented. However, they have not yet been comprehensively reviewed and analyzed to be well understood by both academia and industry. The review of scalable GPs in the GP community is timely and important due to the explosion of data size. To this end, this article is devoted to reviewing state-of-the-art scalable GPs involving two main categories: global approximations that distill the entire data and local approximations that divide the data for subspace learning. Particularly, for global approximations, we mainly focus on sparse approximations comprising prior approximations that modify the prior but perform exact inference, posterior approximations that retain the exact prior but perform approximate inference, and structured sparse approximations that exploit specific structures in the kernel matrix; for local approximations, we highlight the mixture/product of experts that conducts model averaging from multiple local experts to boost predictions. To present a complete review, recent advances for improving the scalability and capability of scalable GPs are reviewed. Finally, the extensions and open issues of scalable GPs in various scenarios are reviewed and discussed to inspire novel ideas for future research avenues.
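A minimal sketch of one of the sparse (prior-approximation) families discussed above: a subset-of-regressors predictor with M inducing inputs, which replaces the O(n^3) exact-GP solve with an M x M system. Kernel choice, jitter, and noise level are illustrative assumptions.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sor_predict(X, y, X_star, Z, noise=1e-2):
    """Predictive mean at X_star using inducing inputs Z (subset-of-regressors)."""
    Kmm = rbf_kernel(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for numerical stability
    Knm = rbf_kernel(X, Z)
    Ksm = rbf_kernel(X_star, Z)
    A = noise * Kmm + Knm.T @ Knm                    # M x M system instead of n x n
    return Ksm @ np.linalg.solve(A, Knm.T @ y)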

381 citations


Journal ArticleDOI
TL;DR: An optimization algorithm is developed to minimize the trace of the estimated ellipsoid set, and the effect of the adopted event-triggered threshold is thoroughly discussed as well.
Abstract: This paper is concerned with the distributed set-membership filtering problem for a class of general discrete-time nonlinear systems under event-triggered communication protocols over sensor networks. To mitigate the communication burden, each intelligent sensing node broadcasts its measurement to the neighboring nodes only when a predetermined event-based media-access condition is satisfied. According to the interval mathematics theory, a recursive distributed set-membership scheme is designed to obtain an ellipsoid set containing the target states of interest via adequately fusing the measurements from neighboring nodes, where both the accurate estimate on the Lagrange remainder and the event-based media-access condition are skillfully utilized to improve the filter performance. Furthermore, such a scheme is only dependent on neighbor information and local adjacency weights, thereby fulfilling the scalability requirement of sensor networks. In addition, an optimization algorithm is developed to minimize the trace of the estimated ellipsoid set, and the effect of the adopted event-triggered threshold is thoroughly discussed as well. Finally, a simulation example is utilized to illustrate the usefulness of the proposed distributed set-membership filtering scheme.

271 citations


Journal ArticleDOI
TL;DR: The experimental results demonstrate that the proposed work can provide an efficient model and architecture for large-scale biologically meaningful networks, while the hardware synthesis results demonstrate low area utilization and high computational speed that supports the scalability of the approach.
Abstract: Multicompartment emulation is an essential step to enhance the biological realism of neuromorphic systems and to further understand the computational power of neurons. In this paper, we present a hardware-efficient, scalable, and real-time computing strategy for the implementation of large-scale biologically meaningful neural networks with one million multi-compartment neurons (CMNs). The hardware platform uses four Altera Stratix III field-programmable gate arrays, and both the cellular and the network levels are considered, which provides an efficient implementation of a large-scale spiking neural network with biophysically plausible dynamics. At the cellular level, a cost-efficient multi-CMN model is presented, which can reproduce the detailed neuronal dynamics with representative neuronal morphology. A set of efficient neuromorphic techniques for single-CMN implementation is presented, which removes the hardware cost of memory and multiplier resources and enhances computational speed by 56.59% in comparison with the classical digital implementation method. At the network level, a scalable network-on-chip (NoC) architecture is proposed with a novel routing algorithm to enhance the NoC performance, including throughput and computational latency, leading to higher computational efficiency and capability in comparison with state-of-the-art projects. The experimental results demonstrate that the proposed work can provide an efficient model and architecture for large-scale biologically meaningful networks, while the hardware synthesis results demonstrate low area utilization and high computational speed that supports the scalability of the approach.

240 citations


Posted Content
TL;DR: A new federated learning algorithm is proposed that jointly learns compact local representations on each device and a global model across all devices, which helps to keep device data private and enable communication-efficient training while retaining performance.
Abstract: Federated learning is a method of training models on private data distributed over multiple devices. To keep device data private, the global model is trained by only communicating parameters and updates which poses scalability challenges for large models. To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices. As a result, the global model can be smaller since it only operates on local representations, reducing the number of communicated parameters. Theoretically, we provide a generalization analysis which shows that a combination of local and global models reduces both variance in the data as well as variance across device distributions. Empirically, we demonstrate that local models enable communication-efficient training while retaining performance. We also evaluate on the task of personalized mood prediction from real-world mobile data where privacy is key. Finally, local models handle heterogeneous data from new devices, and learn fair representations that obfuscate protected attributes such as race, age, and gender.
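A hedged sketch of the split described in the abstract: each device keeps a private local encoder and only a small shared head is communicated and averaged each round, FedAvg style. The device dictionary layout, the update routine, and the use of PyTorch modules are assumptions for illustration, not the authors' implementation.

import copy
import torch

def federated_round(devices, global_head: torch.nn.Module):
    """devices: list of dicts with a 'local_encoder', a 'data_loader', and an
    'update'(local_encoder, head, data_loader) routine supplied by the caller."""
    head_states, weights = [], []
    for dev in devices:
        head = copy.deepcopy(global_head)
        dev["update"](dev["local_encoder"], head, dev["data_loader"])   # local training
        head_states.append(head.state_dict())
        weights.append(len(dev["data_loader"].dataset))
    # Weighted average of only the shared head -> fewer communicated parameters.
    total = float(sum(weights))
    avg = {k: sum((w / total) * s[k] for w, s in zip(weights, head_states))
           for k in head_states[0]}
    global_head.load_state_dict(avg)
    return global_head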

235 citations


Journal ArticleDOI
TL;DR: A B5G framework is proposed that utilizes the 5G network's low-latency, high-bandwidth functionality to detect COVID-19 using chest X-ray or CT scan images, and to develop a mass surveillance system to monitor social distancing, mask wearing, and body temperature.
Abstract: Tactile edge technology that focuses on 5G or beyond 5G reveals an exciting approach to control infectious diseases such as COVID-19 internationally. The control of epidemics such as COVID-19 can be managed effectively by exploiting edge computation through the 5G wireless connectivity network. The implementation of a hierarchical edge computing system provides many advantages, such as low latency, scalability, and the protection of application and training model data, enabling COVID-19 to be evaluated by a dependable local edge server. In addition, many deep learning (DL) algorithms suffer from two crucial disadvantages: first, training requires a large COVID-19 dataset consisting of various aspects, which will pose challenges for local councils; second, to acknowledge the outcome, the findings of deep learning require ethical acceptance and clarification by the health care sector, as well as other contributors. In this article, we propose a B5G framework that utilizes the 5G network's low-latency, high-bandwidth functionality to detect COVID-19 using chest X-ray or CT scan images, and to develop a mass surveillance system to monitor social distancing, mask wearing, and body temperature. Three DL models, ResNet50, Deep tree, and Inception v3, are investigated in the proposed framework. Furthermore, blockchain technology is also used to ensure the security of healthcare data.

233 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper introduced a privacy-preserving machine learning technique named federated learning and proposed a Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU), which differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations.
Abstract: Existing traffic flow forecasting approaches based on deep learning models achieve excellent success using large volumes of datasets gathered by governments and organizations. However, these datasets may contain lots of users' private data, which challenges the current prediction approaches as user privacy has become a public concern in recent years. Therefore, how to develop accurate traffic prediction while preserving privacy is a significant problem to be solved, and there is a trade-off between these two objectives. To address this challenge, we introduce a privacy-preserving machine learning technique named federated learning and propose a Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction. FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations. In the secure parameter aggregation mechanism, we adopt a Federated Averaging algorithm to reduce the communication overhead during the model parameter transmission process. Furthermore, we design a Joint Announcement Protocol to improve the scalability of FedGRU. We also propose an ensemble clustering-based scheme for traffic flow prediction by grouping the organizations into clusters before applying the FedGRU algorithm. Through extensive case studies on a real-world dataset, it is shown that FedGRU's prediction accuracy is 90.96% higher than the advanced deep learning models, which confirms that FedGRU can achieve accurate and timely traffic prediction without compromising the privacy and security of raw data.
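The secure parameter aggregation step can be illustrated with a minimal Federated Averaging sketch: each organization trains locally and uploads only model parameters, which the server averages weighted by sample counts. The parameter representation (a dict of NumPy arrays) and the function names are illustrative, not FedGRU's actual code.

import numpy as np

def federated_averaging(client_params, client_sizes):
    """client_params: list of {name: np.ndarray}; client_sizes: samples per client."""
    total = float(sum(client_sizes))
    return {
        name: sum((n / total) * params[name]
                  for params, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }

# One communication round (local training routines are supplied by the caller):
#   updates = [train_local_gru(copy_of_global, org_data) for org_data in organizations]
#   global_params = federated_averaging([u.params for u in updates],
#                                       [u.num_samples for u in updates])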

217 citations


Journal ArticleDOI
TL;DR: This paper provides a review of neuromorphic CMOS-memristive architectures that can be integrated into edge computing devices, discusses why neuromorphic architectures are useful for edge devices, and shows the advantages, drawbacks, and open problems in the field of neuromemristive circuits for edge computing.
Abstract: The volume, veracity, variability, and velocity of data produced by the ever-increasing network of sensors connected to the Internet pose challenges for power management, scalability, and sustainability of cloud computing infrastructure. Increasing the data processing capability of edge computing devices at lower power requirements can reduce several overheads for cloud computing solutions. This paper provides a review of neuromorphic CMOS-memristive architectures that can be integrated into edge computing devices. We discuss why neuromorphic architectures are useful for edge devices and show the advantages, drawbacks, and open problems in the field of neuromemristive circuits for edge computing.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper presents a method, named Grid-GCN, for fast and scalable point cloud learning that uses a novel data structuring strategy, Coverage-Aware Grid Query (CAGQ), which improves spatial coverage while reducing the theoretical time complexity.
Abstract: Due to the sparsity and irregularity of the point cloud data, methods that directly consume points have become popular. Among all point-based models, graph convolutional networks (GCN) lead to notable performance by fully preserving the data granularity and exploiting point interrelation. However, point-based networks spend a significant amount of time on data structuring (e.g., Farthest Point Sampling (FPS) and neighbor points querying), which limits the speed and scalability. In this paper, we present a method, named Grid-GCN, for fast and scalable point cloud learning. Grid-GCN uses a novel data structuring strategy, Coverage-Aware Grid Query (CAGQ). By leveraging the efficiency of grid space, CAGQ improves spatial coverage while reducing the theoretical time complexity. Compared with popular sampling methods such as Farthest Point Sampling (FPS) and Ball Query, CAGQ achieves up to 50 times speed-up. With a Grid Context Aggregation (GCA) module, Grid-GCN achieves state-of-the-art performance on major point cloud classification and segmentation benchmarks with significantly faster runtime than previous studies. Remarkably, Grid-GCN achieves an inference speed of 50 FPS on ScanNet using 81920 points as input. The supplementary material (xharlie.github.io/papers/GGCN_supCamReady.pdf) and the code (github.com/xharlie/Grid-GCN) are released.
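A rough sketch of the grid-based querying idea behind CAGQ (illustrative only, not the authors' implementation): points are hashed into voxel cells in a single pass and group centers are drawn per occupied cell, which improves spatial coverage and avoids the cost of Farthest Point Sampling. The voxel size and per-cell point budget are assumed parameters.

import numpy as np
from collections import defaultdict

def grid_group_centers(points, voxel_size=0.1, max_pts_per_cell=32):
    """points: (N, 3) array. Returns one candidate group center per occupied voxel."""
    cells = defaultdict(list)
    idx = np.floor(points / voxel_size).astype(np.int64)
    for i, key in enumerate(map(tuple, idx)):
        if len(cells[key]) < max_pts_per_cell:        # cheap per-cell point budget
            cells[key].append(i)
    # Use the centroid of the stored points in each occupied cell as a group center.
    centers = np.stack([points[ids].mean(axis=0) for ids in cells.values()])
    return centers, list(cells.values())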

Journal ArticleDOI
TL;DR: This work introduces a privacy-preserving machine learning technique named federated learning (FL) and proposes an FL-based gated recurrent unit neural network algorithm (FedGRU) for traffic flow prediction (TFP) that differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
Abstract: Existing traffic flow forecasting approaches based on deep learning models achieve excellent success using large volumes of data sets gathered by governments and organizations. However, these data sets may contain lots of users' private data, which challenges the current prediction approaches as user privacy has become a public concern in recent years. Therefore, how to develop accurate traffic prediction while preserving privacy is a significant problem to be solved, and there is a tradeoff between these two objectives. To address this challenge, we introduce a privacy-preserving machine learning technique named federated learning (FL) and propose an FL-based gated recurrent unit neural network algorithm (FedGRU) for traffic flow prediction (TFP). FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism rather than directly sharing raw data among organizations. In the secure parameter aggregation mechanism, we adopt a federated averaging algorithm to reduce the communication overhead during the model parameter transmission process. Furthermore, we design a joint announcement protocol to improve the scalability of FedGRU. We also propose an ensemble clustering-based scheme for TFP by grouping the organizations into clusters before applying the FedGRU algorithm. Extensive case studies on a real-world data set demonstrate that FedGRU can produce predictions that are merely 0.76 km/h worse than the state of the art in terms of mean average error under the privacy preservation constraint, confirming that the proposed model develops accurate traffic predictions without compromising the data privacy.

Journal ArticleDOI
TL;DR: Various architectures that support DNN executions in terms of computing units, dataflow optimization, targeted network topologies, architectures on emerging technologies, and accelerators for emerging applications are discussed.

Journal ArticleDOI
01 Apr 2020
TL;DR: The researcher found that the proposed mechanism enables low-latency fog computing services for delay-sensitive IoT applications and reduces communication delay significantly.
Abstract: In cloud–fog environments, it is possible to avoid using the upstream communication channel from the clients to the cloud server all the time by adapting conventional concurrency control protocols. In this paper, the researcher introduces a new variant of the optimistic concurrency control protocol. Through the deployment of an augmented partial validation protocol, read-only IoT transactions can be processed locally at the fog node. Only update transactions are sent to the cloud for final validation. Moreover, update transactions go through partial validation at the fog node, which makes them more likely to commit at the cloud. This protocol reduces communication and computation at the cloud as much as possible while supporting scalability of the transactional services needed by the applications running in such environments. Based on numerical studies, the researcher assessed the partial validation procedure under three concurrency protocols. The results indicate that employing the proposed mechanism generates benefits for IoT users of transactional services. We evaluated the effect of deploying partial validation at the fog node for the three concurrency protocols, namely AOCCRBSC, AOCCRB, and STUBcast. We performed a set of intensive experiments to compare the three protocols with and without such deployment. The results reported a reduction in miss rate, restart rate, and communication delay in all of them. The proposed mechanism reduces communication delay significantly and enables low-latency fog computing services for delay-sensitive IoT applications.
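A hedged sketch of the partial-validation idea, expressed as generic optimistic concurrency control rather than the exact AOCCRBSC/AOCCRB/STUBcast logic: the fog node validates read sets against its latest committed versions, commits read-only transactions locally, and forwards passing update transactions to the cloud for final validation. All names and the version-map representation are illustrative.

from dataclasses import dataclass, field

@dataclass
class Txn:
    read_set: dict                                    # item -> version observed when read
    write_set: dict = field(default_factory=dict)     # item -> new value

def partial_validate(txn: Txn, committed_versions: dict) -> bool:
    """Backward validation: every item read must still be at the observed version."""
    return all(committed_versions.get(item, 0) == ver
               for item, ver in txn.read_set.items())

def fog_node_handle(txn: Txn, committed_versions: dict, send_to_cloud):
    if not partial_validate(txn, committed_versions):
        return "restart"                  # conflict detected locally, no cloud round trip
    if not txn.write_set:
        return "commit-local"             # read-only transaction commits at the fog node
    send_to_cloud(txn)                    # update transaction: cloud performs final validation
    return "forwarded"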

Journal ArticleDOI
01 Jul 2020
TL;DR: A memristor-based annealing system that uses an analogue neuromorphic architecture based on a Hopfield neural network can solve non-deterministic polynomial-time (NP)-hard max-cut problems in an approach that is potentially more efficient than current quantum, optical and digital approaches.
Abstract: To tackle important combinatorial optimization problems, a variety of annealing-inspired computing accelerators, based on several different technology platforms, have been proposed, including quantum-, optical- and electronics-based approaches. However, to be of use in industrial applications, further improvements in speed and energy efficiency are necessary. Here, we report a memristor-based annealing system that uses an energy-efficient neuromorphic architecture based on a Hopfield neural network. Our analogue–digital computing approach creates an optimization solver in which massively parallel operations are performed in a dense crossbar array that can inject the needed computational noise through the analogue array and device errors, amplified or dampened by using a novel feedback algorithm. We experimentally show that the approach can solve non-deterministic polynomial-time (NP)-hard max-cut problems by harnessing the intrinsic hardware noise. We also use experimentally grounded simulations to explore scalability with problem size, which suggest that our memristor-based approach can offer a solution throughput over four orders of magnitude higher per power consumption relative to current quantum, optical and fully digital approaches. A memristor-based annealing system that uses an analogue neuromorphic architecture based on a Hopfield neural network can solve non-deterministic polynomial (NP)-hard max-cut problems in an approach that is potentially more efficient than current quantum, optical and digital approaches.
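A software analogue of the annealing dynamics described above can be sketched as a Hopfield network with weights W = -A (A being the graph adjacency matrix), whose noisy asynchronous updates tend toward a maximum cut. In the paper the matrix-vector products and the noise come from the analogue memristor crossbar; here both are simulated in NumPy with an assumed linearly decaying noise schedule.

import numpy as np

def hopfield_max_cut(A, iters=2000, noise0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    s = rng.choice([-1, 1], size=n)          # spin per vertex: which side of the cut
    W = -A.astype(float)                     # Hopfield weights for max-cut
    for t in range(iters):
        i = rng.integers(n)
        noise = noise0 * (1 - t / iters) * rng.standard_normal()
        s[i] = 1 if W[i] @ s + noise >= 0 else -1   # noisy asynchronous update
    cut = sum(A[i, j] for i in range(n) for j in range(i + 1, n) if s[i] != s[j])
    return s, cut

# Example: a 4-cycle has a maximum cut of 4.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
print(hopfield_max_cut(A))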

Proceedings ArticleDOI
18 May 2020
TL;DR: It is demonstrated how to use ZEXE to realize privacy-preserving analogues of popular applications: private user-defined assets and private decentralized exchanges for these assets.
Abstract: Ledger-based systems that support rich applications often suffer from two limitations. First, validating a transaction requires re-executing the state transition that it attests to. Second, transactions not only reveal which application had a state transition but also reveal the application's internal state. We design, implement, and evaluate ZEXE, a ledger-based system where users can execute offline computations and subsequently produce transactions, attesting to the correctness of these computations, that satisfy two main properties. First, transactions hide all information about the offline computations. Second, transactions can be validated in constant time by anyone, regardless of the offline computation. The core of ZEXE is a construction for a new cryptographic primitive that we introduce, decentralized private computation (DPC) schemes. In order to achieve an efficient implementation of our construction, we leverage tools in the area of cryptographic proofs, including succinct zero-knowledge proofs and recursive proof composition. Overall, transactions in ZEXE are 968 bytes regardless of the offline computation, and generating them takes less than 1 min plus a time that grows with the offline computation. We demonstrate how to use ZEXE to realize privacy-preserving analogues of popular applications: private user-defined assets and private decentralized exchanges for these assets.

Journal ArticleDOI
TL;DR: The proposed model sufficiently exploits advantages of edge computing and blockchain to establish a privacy-preserving mechanism while considering other constraints, such as energy cost, and improves privacy protections without lowering down the performance in an energy-efficient manner.
Abstract: Contemporarily, two emerging techniques, blockchain and edge computing, are driving dramatically rapid growth in the field of Internet-of-Things (IoT). Edge computing offers an adaptable complement to cloud computing, while blockchain offers an alternative for constructing a transparent and secure environment for data storage and governance. Instead of using these two techniques independently, in this article we propose a novel approach that integrates IoT with edge computing and blockchain, called the blockchain-based Internet of Edge model. The proposed model, designed for a scalable and controllable IoT system, sufficiently exploits the advantages of edge computing and blockchain to establish a privacy-preserving mechanism while considering other constraints, such as energy cost. We implement experimental evaluations running on Ethereum. According to our data collections, the proposed model improves privacy protection without lowering performance, in an energy-efficient manner.

Journal ArticleDOI
TL;DR: A lightweight blockchain-based protocol called Directed Acyclic Graph-based V2G network (DV2G) is proposed, where blockchain refers to any Distributed Ledger Technology (DLT) and not just the bitcoin chain of blocks; it is shown to be highly scalable and supports the micro-transactions required in V2G networks.
Abstract: The Vehicle-to-Grid (V2G) network, in which battery-powered vehicles provide energy to the power grid, is rapidly emerging. A robust, scalable, and cost-optimal mechanism that can support the increasing number of transactions in a V2G network is required. Existing studies use traditional blockchains to achieve this requirement. Blockchain-enabled V2G networks require high computation power and are not suitable for micro-transactions because the mining reward is higher than the transaction value itself. Moreover, the transaction throughput in a generic blockchain is too low to support the increasing number of frequent transactions in V2G networks. To address these challenges, in this paper, a lightweight blockchain-based protocol called Directed Acyclic Graph-based V2G network (DV2G) is proposed. Here blockchain refers to any Distributed Ledger Technology (DLT) and not just the bitcoin chain of blocks. A tangle data structure is used to record the transactions in the network in a secure and scalable manner. A game theory model is used to perform negotiation between the grid and vehicles at an optimized cost. The proposed model does not require the heavy computation associated with adding transactions to the data structure and does not require any fees to post a transaction. The proposed model is shown to be highly scalable and supports the micro-transactions required in V2G networks.

Journal ArticleDOI
TL;DR: About 215 of the most important WSN clustering techniques are extracted, reviewed, categorized, and classified based on clustering objectives as well as network properties such as mobility and heterogeneity, providing highly useful insights into the design of clustering techniques in WSNs.

Journal ArticleDOI
TL;DR: A fully distributed framework to investigate the cooperative behavior of multiagent systems in the presence of distributed denial-of-service (DoS) attacks launched by multiple adversaries is developed and it is emphasized that the eigenvalue information of the Laplacian matrix is not required in the design of both the control protocol and event conditions.
Abstract: This paper develops a fully distributed framework to investigate the cooperative behavior of multiagent systems in the presence of distributed denial-of-service (DoS) attacks launched by multiple adversaries. In such an insecure network environment, two kinds of communication schemes, that is, sampled-data and event-triggered communication schemes, are discussed. Then, a fully distributed control protocol with strong robustness and high scalability is well designed. This protocol guarantees asymptotic consensus against distributed DoS attacks. In this paper, "fully" emphasizes that the eigenvalue information of the Laplacian matrix is not required in the design of both the control protocol and the event conditions. For the event-triggered case, two effective dynamical event-triggered schemes are proposed, which are independent of any global information. Such event-triggered schemes do not exhibit Zeno behavior even in the insecure environment. Finally, a simulation example is provided to verify the effectiveness of the theoretical analysis.
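A much simplified sketch of event-triggered coordination for single-integrator agents (not the paper's fully distributed protocol under DoS attacks): each agent rebroadcasts its state only when it deviates from its last broadcast value by more than a threshold, so communication is event-driven rather than periodic. The threshold, step size, and example graph are illustrative.

import numpy as np

def event_triggered_consensus(adj, x0, steps=200, dt=0.05, threshold=0.05):
    x = np.array(x0, dtype=float)
    x_hat = x.copy()                              # last broadcast states known to neighbors
    L = np.diag(adj.sum(axis=1)) - adj            # graph Laplacian
    events = 0
    for _ in range(steps):
        trigger = np.abs(x - x_hat) > threshold   # per-agent event condition
        x_hat[trigger] = x[trigger]               # broadcast only when triggered
        events += int(trigger.sum())
        x = x - dt * (L @ x_hat)                  # consensus update on broadcast values
    return x, events

adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
print(event_triggered_consensus(adj, [1.0, -2.0, 3.0, 0.5]))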

Journal ArticleDOI
TL;DR: An Efficient Lightweight integrated Blockchain (ELIB) model is developed to meet the requirements of IoT; it is deployed in a smart home environment and shows maximum performance under several evaluation parameters.

Journal ArticleDOI
TL;DR: This article aims to provide a comprehensive survey of distributed control and communication strategies in NMGs and advances in several promising communication and computation technologies and their potential applications in N MGs.
Abstract: Networked microgrids (NMGs) provide a promising solution for accommodating various distributed energy resources (DERs) and enhancing the system performance in terms of reliability, resilience, flexibility, and energy efficiency. With the penetration of MGs, the communication-based distributed control is playing an increasingly important role in NMGs for coordinating a multitude of heterogeneous and spatially distributed DERs, which feature enhanced efficiency, reliability, resilience, scalability, and privacy-preserving as compared with conventional centralized control. This article aims to provide a comprehensive survey of distributed control and communication strategies in NMGs. We provide thorough discussions and elaborate on: 1) Essential merits of MGs and NMGs, and their practical implementations; 2) Distributed communication network characteristics and specific operation objectives of NMGs; 3) Classifications of distributed control strategies in NMGs and their salient features; 4) Communication reliability issues concerning data timeliness, data availability, and data accuracy, and the development of countermeasures; 5) Advancements in several promising communication and computation technologies and their potential applications in NMGs.

Journal ArticleDOI
TL;DR: Evaluation results show that DDQN-VNFPA achieves improved network performance in terms of the reject number and reject ratio of Service Function Chain Requests, throughput, end-to-end delay, VNFI running time, and load balancing compared with algorithms in the existing literature.
Abstract: The emerging paradigm - Software-Defined Networking (SDN) and Network Function Virtualization (NFV) - makes it feasible and scalable to run Virtual Network Functions (VNFs) in commercial-off-the-shelf devices, which provides a variety of network services with reduced cost. Benefiting from centralized network management, lots of information about network devices, traffic, and resources can be collected in SDN/NFV-enabled networks. Using powerful machine learning tools, algorithms can be designed in a customized way according to the collected information to efficiently optimize network performance. In this paper, we study the VNF placement problem in SDN/NFV-enabled networks, which is naturally formulated as a Binary Integer Programming (BIP) problem. Using deep reinforcement learning, we propose a Double Deep Q Network-based VNF Placement Algorithm (DDQN-VNFPA). Specifically, DDQN determines the optimal solution from a prohibitively large solution space and DDQN-VNFPA then places/releases VNF Instances (VNFIs) following a threshold-based policy. We evaluate DDQN-VNFPA with trace-driven simulations on a real-world network topology. Evaluation results show that DDQN-VNFPA achieves improved network performance in terms of the reject number and reject ratio of Service Function Chain Requests (SFCRs), throughput, end-to-end delay, VNFI running time, and load balancing compared with algorithms in the existing literature.
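The learning core of DDQN can be illustrated with the standard double Q-learning target, in which the online network selects the next action and the target network evaluates it, reducing overestimation bias. This is a generic PyTorch sketch; the paper's state/action encoding for VNF placement is not reproduced here.

import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """rewards, dones: (B,) float tensors; next_states: (B, state_dim)."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)    # action selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)   # action evaluation
        return rewards + gamma * (1.0 - dones) * next_q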

Proceedings ArticleDOI
30 May 2020
TL;DR: RecNMP as mentioned in this paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference, which is specifically tailored to production environments with heavy co-location of operators on a single server.
Abstract: Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques such as memory-side caching, table-aware packet scheduling, and hot entry profiling are studied, providing up to 9.8× memory latency speedup over a highly-optimized baseline. Overall, RecNMP offers 4.2× throughput improvement and 45.8% memory energy savings.
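The memory-bound operation RecNMP targets can be sketched as a sparse embedding lookup followed by pooling (a gather-and-reduce over embedding-table rows), whose irregular row accesses dominate inference time. Table size, pooling factor, and names below are illustrative.

import numpy as np

def sparse_lengths_sum(table, indices, lengths):
    """table: (rows, dim) embedding table; indices: flat row ids; lengths: ids per sample."""
    out, offset = [], 0
    for n in lengths:
        out.append(table[indices[offset:offset + n]].sum(axis=0))   # gather + reduce
        offset += n
    return np.stack(out)

table = np.random.rand(200_000, 64).astype(np.float32)   # table too large to cache well
indices = np.random.randint(0, 200_000, size=800)        # irregular row accesses
lengths = [80] * 10                                       # 10 samples, 80 lookups each
pooled = sparse_lengths_sum(table, indices, lengths)      # shape (10, 64)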

Journal ArticleDOI
TL;DR: An improved algorithm based on Two_Arch2 is proposed to improve the scalability and decentralization while reducing the latency and cost of the blockchain.
Abstract: The Industrial Internet of Things (IIoT) has developed rapidly in recent years. Private blockchains with decentralization, flexible rules, and good privacy protection can be applied in the IIoT to process the massive data and tackle the security problem. However, the scalability of blockchain places a restriction on IIoT. Accordingly, this article proposes an improved algorithm based on Two_Arch2 to improve the scalability and decentralization while reducing the latency and cost of the blockchain. By integrating the private blockchain theory to IIoT and simultaneously considering the above four objectives, a many-objective blockchain-enabled IIoT model is constructed. Then an improved Two_Arch2 algorithm is utilized to solve the model. Experimental results show that the improved algorithm can effectively optimize four indicators of the model.

Journal ArticleDOI
01 Aug 2020
TL;DR: This work designs a Raft-based HTAP database, TiDB, builds an SQL engine to process large-scale distributed transactions and expensive analytical queries, and includes a powerful analysis engine, TiSpark, to help TiDB connect to the Hadoop ecosystem.
Abstract: Hybrid Transactional and Analytical Processing (HTAP) databases require processing transactional and analytical queries in isolation to remove the interference between them. To achieve this, it is necessary to maintain different replicas of data specified for the two types of queries. However, it is challenging to provide a consistent view for distributed replicas within a storage system, where analytical requests can efficiently read consistent and fresh data from transactional workloads at scale and with high availability. To meet this challenge, we propose extending replicated state machine-based consensus algorithms to provide consistent replicas for HTAP workloads. Based on this novel idea, we present a Raft-based HTAP database: TiDB. In the database, we design a multi-Raft storage system which consists of a row store and a column store. The row store is built based on the Raft algorithm. It is scalable to materialize updates from transactional requests with high availability. In particular, it asynchronously replicates Raft logs to learners which transform row format to column format for tuples, forming a real-time updatable column store. This column store allows analytical queries to efficiently read fresh and consistent data with strong isolation from transactions on the row store. Based on this storage system, we build an SQL engine to process large-scale distributed transactions and expensive analytical queries. The SQL engine optimally accesses row-format and column-format replicas of data. We also include a powerful analysis engine, TiSpark, to help TiDB connect to the Hadoop ecosystem. Comprehensive experiments show that TiDB achieves isolated high performance under CH-benCHmark, a benchmark focusing on HTAP workloads.
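The learner-side transformation mentioned above (row-format log entries materialized into a column-oriented layout) can be illustrated with a toy sketch; this is not TiDB/TiFlash internals, and the log-entry schema and upsert handling are assumptions.

def apply_log_to_column_store(log_entries, column_store):
    """log_entries: list of {'key': ..., 'row': {col: value}} from the replicated log.
    column_store: dict mapping column name -> list of values (one slot per key)."""
    key_index = column_store.setdefault("_keys", {})
    for entry in log_entries:
        key, row = entry["key"], entry["row"]
        pos = key_index.setdefault(key, len(key_index))       # upsert position for this key
        for col, value in row.items():
            column = column_store.setdefault(col, [])
            column.extend([None] * (pos + 1 - len(column)))   # grow the column to cover pos
            column[pos] = value                               # columnar write
    return column_store

store = {}
apply_log_to_column_store(
    [{"key": 1, "row": {"city": "SF", "amount": 10}},
     {"key": 2, "row": {"city": "NY", "amount": 7}},
     {"key": 1, "row": {"amount": 12}}],                      # later update to key 1
    store)
print(store["amount"])   # [12, 7]: analytical queries scan this contiguous column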


Journal ArticleDOI
TL;DR: A novel artificial intelligence algorithm, called deep Q-learning task scheduling (DQTS), which combines the advantages of the Q-learning algorithm and a deep neural network, is proposed, aimed at solving the problem of handling directed acyclic graph tasks in a cloud computing environment.

Journal ArticleDOI
TL;DR: A novel lightweight proof of block and trade (PoBT) consensus algorithm for IoT blockchain and its integration framework is proposed that allows the validation of trades as well as blocks with reduced computation time and a ledger distribution mechanism to decrease the memory requirements of IoT nodes.
Abstract: Efficient and smart business processes are heavily dependent on the Internet of Things (IoT) networks, where end-to-end optimization is critical to the success of the whole ecosystem. These systems, including industrial, healthcare, and others, are large scale complex networks of heterogeneous devices. This introduces many security and access control challenges. Blockchain has emerged as an effective solution for addressing several such challenges. However, the basic algorithms used in the business blockchain are not feasible for large scale IoT systems. To make them scalable for IoT, the complex consensus-based security has to be downgraded. In this article, we propose a novel lightweight proof of block and trade (PoBT) consensus algorithm for IoT blockchain and its integration framework. This solution allows the validation of trades as well as blocks with reduced computation time. Also, we present a ledger distribution mechanism to decrease the memory requirements of IoT nodes. The analysis and evaluation of security aspects, computation time, memory, and bandwidth requirements show significant improvement in the performance of the overall system.