
Showing papers on "Scalability published in 2018"


Proceedings Article
24 Jun 2018
TL;DR: The proposed algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques.
Abstract: This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.
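
As an illustration of the continuous-relaxation idea, the sketch below (not the authors' released code) builds a toy mixed operation whose softmax-weighted architecture parameters receive gradients like any other weight; the three candidate operations are placeholders chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One searchable edge: a softmax-weighted mixture of candidate operations."""
    def __init__(self, channels):
        super().__init__()
        # Toy candidate set; the real search space contains more operation types.
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])
        # One architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)      # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.ops))

mixed = MixedOp(channels=16)
x = torch.randn(2, 16, 8, 8)
mixed(x).mean().backward()
print(mixed.alpha.grad)   # architecture parameters get ordinary gradients
```

After search, the strongest candidate on each edge would be kept to form the discrete architecture.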

2,466 citations


Journal ArticleDOI
TL;DR: SOAPnuke is demonstrated as a tool with abundant functions for a “QC-Preprocess-QC” workflow and a MapReduce acceleration framework that scales by distributing all the processing work across an entire compute cluster.
Abstract: Quality control (QC) and preprocessing are essential steps for sequencing data analysis to ensure the accuracy of results. However, existing tools cannot provide a satisfying solution with integrated comprehensive functions, proper architectures, and highly scalable acceleration. In this article, we demonstrate SOAPnuke as a tool with abundant functions for a "QC-Preprocess-QC" workflow and a MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes processing functions into 1 executable and predefines their order to avoid the necessity of reformatting different files when switching tools. Furthermore, the MapReduce framework enables large scalability by distributing all the processing work across an entire compute cluster. We conducted a benchmarking study in which SOAPnuke and other tools were used to preprocess a ∼30× NA12878 dataset published by GIAB. The standalone operation of SOAPnuke struck a balance between resource occupancy and performance. When accelerated on 16 working nodes with MapReduce, SOAPnuke achieved a speed ∼5.7 times that of the fastest of the other tools.
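
The sketch below is not SOAPnuke itself; it only illustrates the kind of per-read filtering a "QC-Preprocess-QC" step applies, with made-up thresholds and a Phred+33 quality encoding assumed.

```python
def passes_qc(seq, qual, max_n_frac=0.05, min_mean_q=20):
    """Keep a read if it has few ambiguous bases and high mean base quality."""
    n_frac = seq.upper().count("N") / max(len(seq), 1)
    mean_q = sum(ord(c) - 33 for c in qual) / max(len(qual), 1)   # Phred+33
    return n_frac <= max_n_frac and mean_q >= min_mean_q

def filter_fastq(lines):
    """Yield 4-line FASTQ records that pass the quality filter."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        if passes_qc(seq.strip(), qual.strip()):
            yield header, seq, plus, qual

record = ["@read1\n", "ACGTACGTA\n", "+\n", "IIIIIIIII\n"]
print(list(filter_fastq(record)))   # the read is kept: no Ns, mean quality 40
```

In a MapReduce deployment, each mapper would run such a filter over its own chunk of reads, which is what lets the workflow scale across a cluster.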

1,043 citations


Proceedings Article
03 Dec 2018
TL;DR: A novel γ-decaying heuristic theory is developed that unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs.
Abstract: Link prediction is a key problem for network-structured data. Link prediction heuristics use some score functions, such as common neighbors and Katz index, to measure the likelihood of links. They have obtained wide practical uses due to their simplicity, interpretability, and for some of them, scalability. However, every heuristic has a strong assumption on when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable way should be learning a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a "heuristic" that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel γ-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs reserve rich information related to link existence. Second, based on the γ-decaying theory, we propose a new method to learn heuristics from local subgraphs using a graph neural network (GNN). Its experimental results show unprecedented performance, working consistently well on a wide range of problems.
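
A rough sketch of the subgraph-extraction step described above (not the authors' SEAL code): collect the nodes within h hops of both endpoints of a candidate link and return the induced subgraph, which a GNN would then classify as link or non-link.

```python
import networkx as nx

def enclosing_subgraph(G, u, v, h=1):
    """Induced subgraph on all nodes within h hops of either endpoint."""
    near_u = nx.single_source_shortest_path_length(G, u, cutoff=h)
    near_v = nx.single_source_shortest_path_length(G, v, cutoff=h)
    return G.subgraph(set(near_u) | set(near_v)).copy()

G = nx.karate_club_graph()
sub = enclosing_subgraph(G, 0, 33, h=1)
print(sub.number_of_nodes(), sub.number_of_edges())
```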

980 citations


Journal ArticleDOI
TL;DR: The effectiveness of using machine learning for model-free prediction of spatiotemporally chaotic systems of arbitrarily large spatial extent and attractor dimension purely from observations of the system's past evolution is demonstrated.
Abstract: We demonstrate the effectiveness of using machine learning for model-free prediction of spatiotemporally chaotic systems of arbitrarily large spatial extent and attractor dimension purely from observations of the system's past evolution. We present a parallel scheme with an example implementation based on the reservoir computing paradigm and demonstrate the scalability of our scheme using the Kuramoto-Sivashinsky equation as an example of a spatiotemporally chaotic system.
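
The following is a minimal echo-state-network sketch in the spirit of the reservoir computing paradigm mentioned above, not the authors' parallel scheme; reservoir size, spectral radius, and the ridge parameter are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))        # keep spectral radius below 1

def run_reservoir(u_seq):
    """Drive the fixed random reservoir with an input sequence; return its states."""
    x, states = np.zeros(n_res), []
    for u in u_seq:
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Train only the linear readout to predict the next value of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
u = np.sin(t)
X, Y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ Y)   # ridge readout
print("train MSE:", np.mean((X @ W_out - Y) ** 2))
```

The paper's parallel scheme tiles many such reservoirs across space, each predicting a local patch of the chaotic field.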

916 citations


Proceedings ArticleDOI
20 May 2018
TL;DR: OmniLedger ensures security and correctness by using a bias-resistant public-randomness protocol for choosing large, statistically representative shards that process transactions, and by introducing an efficient cross-shard commit protocol that atomically handles transactions affecting multiple shards.
Abstract: Designing a secure permissionless distributed ledger (blockchain) that performs on par with centralized payment processors, such as Visa, is a challenging task. Most existing distributed ledgers are unable to scale-out, i.e., to grow their total processing capacity with the number of validators; and those that do, compromise security or decentralization. We present OmniLedger, a novel scale-out distributed ledger that preserves long-term security under permissionless operation. It ensures security and correctness by using a bias-resistant public-randomness protocol for choosing large, statistically representative shards that process transactions, and by introducing an efficient cross-shard commit protocol that atomically handles transactions affecting multiple shards. OmniLedger also optimizes performance via parallel intra-shard transaction processing, ledger pruning via collectively-signed state blocks, and low-latency "trust-but-verify" validation for low-value transactions. An evaluation of our experimental prototype shows that OmniLedger’s throughput scales linearly in the number of active validators, supporting Visa-level workloads and beyond, while confirming typical transactions in under two seconds.
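
A toy sketch of how an unbiased public random seed can deterministically assign validators to shards (only the assignment step; OmniLedger's actual randomness generation and consensus machinery are not modeled here, and the seed value is made up).

```python
import hashlib

def assign_shards(validators, epoch_seed, n_shards):
    """Deterministic, seed-dependent shuffle dealt round-robin into shards."""
    def prio(v):
        return hashlib.sha256(f"{epoch_seed}:{v}".encode()).hexdigest()
    shards = [[] for _ in range(n_shards)]
    for i, v in enumerate(sorted(validators, key=prio)):
        shards[i % n_shards].append(v)
    return shards

validators = [f"val{i}" for i in range(12)]
print(assign_shards(validators, epoch_seed="a1b2c3", n_shards=3))
```

Because every honest node derives the same assignment from the same seed, no single party can bias which validators end up in which shard.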

856 citations


Proceedings ArticleDOI
15 Oct 2018
TL;DR: RapidChain is proposed, the first sharding-based public blockchain protocol that is resilient to Byzantine faults from up to a 1/3 fraction of its participants, and achieves complete sharding of the communication, computation, and storage overhead of processing transactions without assuming any trusted setup.
Abstract: A major approach to overcoming the performance and scalability limitations of current blockchain protocols is sharding, which splits the overhead of processing transactions among multiple, smaller groups of nodes. These groups work in parallel to maximize performance while requiring significantly smaller communication, computation, and storage per node, allowing the system to scale to large networks. However, existing sharding-based blockchain protocols still require a linear amount of communication (in the number of participants) per transaction, and hence attain only part of the potential benefits of sharding. We show that this introduces a major bottleneck to the throughput and latency of these protocols. Aside from the limited scalability, these protocols achieve weak security guarantees due to either a small fault resiliency (e.g., 1/8 and 1/4) or high failure probability, or they rely on strong assumptions (e.g., trusted setup) that limit their applicability to mainstream payment systems. We propose RapidChain, the first sharding-based public blockchain protocol that is resilient to Byzantine faults from up to a 1/3 fraction of its participants, and achieves complete sharding of the communication, computation, and storage overhead of processing transactions without assuming any trusted setup. RapidChain employs an optimal intra-committee consensus algorithm that can achieve very high throughputs via block pipelining, a novel gossiping protocol for large blocks, and a provably-secure reconfiguration mechanism to ensure robustness. Using an efficient cross-shard transaction verification technique, our protocol avoids gossiping transactions to the entire network. Our empirical evaluations suggest that RapidChain can process (and confirm) more than 7,300 tx/sec with an expected confirmation latency of roughly 8.7 seconds in a network of 4,000 nodes with an overwhelming time-to-failure of more than 4,500 years.

695 citations


Journal ArticleDOI
TL;DR: As discussed by the authors, fog computing is not a substitute for cloud computing but a powerful complement: it enables processing at the edge while still offering the possibility to interact with the cloud. Cloud computing, meanwhile, still faces several challenges, such as the distance between the cloud and the end devices.
Abstract: Cloud computing with its three key facets (i.e., Infrastructure-as-a-Service, Platform-as-a-Service, and Software-as-a-Service) and its inherent advantages (e.g., elasticity and scalability) still faces several challenges. The distance between the cloud and the end devices might be an issue for latency-sensitive applications such as disaster management and content delivery applications. Service level agreements (SLAs) may also impose processing at locations where the cloud provider does not have data centers. Fog computing is a novel paradigm to address such issues. It enables provisioning resources and services outside the cloud, at the edge of the network, closer to end devices, or eventually, at locations stipulated by SLAs. Fog computing is not a substitute for cloud computing but a powerful complement. It enables processing at the edge while still offering the possibility to interact with the cloud. This paper presents a comprehensive survey on fog computing. It critically reviews the state of the art in the light of a concise set of evaluation criteria. We cover both the architectures and the algorithms that make fog systems. Challenges and research directions are also introduced. In addition, the lessons learned are reviewed and the prospects are discussed in terms of the key role fog is likely to play in emerging technologies such as tactile Internet.

598 citations


Journal ArticleDOI
TL;DR: An iterative cluster Primal-Dual Splitting algorithm solves the large-scale sSVM problem in a decentralized fashion; the important features it discovers are predictive of future hospitalizations, thus providing a way to interpret the classification results and inform prevention efforts.

577 citations


Journal ArticleDOI
TL;DR: The results of the evaluation show that performance is improved by reducing the induced delay and the response time, increasing throughput, and enabling the detection of real-time attacks in the IoT network with low performance overheads.
Abstract: The recent expansion of the Internet of Things (IoT) and the consequent explosion in the volume of data produced by smart devices have led to the outsourcing of data to designated data centers. However, centralized data centers such as cloud storage cannot manage these huge data stores in a satisfactory way. Many challenges must be addressed in the traditional network architecture due to the rapid growth in the diversity and number of devices connected to the internet, because that architecture was not designed to provide high availability, real-time data delivery, scalability, security, resilience, and low latency. To address these issues, this paper proposes a novel blockchain-based distributed cloud architecture with software-defined networking (SDN)-enabled controller fog nodes at the edge of the network to meet the required design principles. The proposed model is a distributed cloud architecture based on blockchain technology, which provides low-cost, secure, and on-demand access to the most competitive computing infrastructures in an IoT network. By creating a distributed cloud infrastructure, the proposed model enables cost-effective high-performance computing. Furthermore, to bring computing resources to the edge of the IoT network and allow low-latency access to large amounts of data in a secure manner, we provide a secure distributed fog node architecture that uses SDN and blockchain techniques. Fog nodes are distributed fog computing entities that allow the deployment of fog services, and are formed by multiple computing resources at the edge of the IoT network. We evaluated the performance of our proposed architecture and compared it with existing models using various performance measures. The results of our evaluation show that performance is improved by reducing the induced delay and the response time, increasing throughput, and enabling the detection of real-time attacks in the IoT network with low performance overheads.

549 citations


Journal ArticleDOI
TL;DR: A comprehensive top-down survey of the most recently proposed security and privacy solutions in IoT in terms of flexibility and scalability, together with a general classification of existing solutions, is given.

432 citations


Journal ArticleDOI
TL;DR: In this article, a Deep Reinforcement Learning-based Online Offloading (DROO) framework is proposed that optimally adapts task offloading decisions and wireless resource allocations to the time-varying wireless channel conditions.
Abstract: Wireless powered mobile-edge computing (MEC) has recently emerged as a promising paradigm to enhance the data processing capability of low-power networks, such as wireless sensor networks and internet of things (IoT). In this paper, we consider a wireless powered MEC network that adopts a binary offloading policy, so that each computation task of wireless devices (WDs) is either executed locally or fully offloaded to an MEC server. Our goal is to acquire an online algorithm that optimally adapts task offloading decisions and wireless resource allocations to the time-varying wireless channel conditions. This requires quickly solving hard combinatorial optimization problems within the channel coherence time, which is hardly achievable with conventional numerical optimization methods. To tackle this problem, we propose a Deep Reinforcement learning-based Online Offloading (DROO) framework that implements a deep neural network as a scalable solution that learns the binary offloading decisions from the experience. It eliminates the need of solving combinatorial optimization problems, and thus greatly reduces the computational complexity especially in large-size networks. To further reduce the complexity, we propose an adaptive procedure that automatically adjusts the parameters of the DROO algorithm on the fly. Numerical results show that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computation time by more than an order of magnitude compared with existing optimization methods. For example, the CPU execution latency of DROO is less than 0.1 second in a 30-user network, making real-time and optimal offloading truly viable even in a fast fading environment.
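
The sketch below illustrates only the quantize-and-select step behind this kind of learning-based offloading, not the authors' DROO implementation: a relaxed offloading vector (standing in for a DNN output) is quantized into several binary candidates, and the candidate with the best stand-in utility is kept. The utility function and channel gains are invented placeholders.

```python
import numpy as np

def quantize_candidates(relaxed, k):
    """Order-preserving quantization: threshold at 0.5, then at each entry's value."""
    cands = [(relaxed > 0.5).astype(int)]
    for thr in sorted(relaxed)[:k - 1]:
        cands.append((relaxed > thr).astype(int))
    return np.unique(np.array(cands), axis=0)

def utility(decision, gains):
    # Placeholder objective: reward offloading devices with good channels,
    # penalize congestion when too many devices offload at once.
    return float(decision @ gains - 0.3 * decision.sum() ** 2)

rng = np.random.default_rng(1)
gains = rng.uniform(0.5, 2.0, size=5)       # stand-in channel gains
relaxed = rng.uniform(0, 1, size=5)         # stand-in DNN output
candidates = quantize_candidates(relaxed, k=4)
best = max(candidates, key=lambda d: utility(d, gains))
print("chosen offloading decision:", best)
```

In the full framework, the chosen decision and its achieved reward are fed back as training experience for the DNN, closing the learning loop.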

Proceedings Article
09 Apr 2018
TL;DR: The design of AccelNet is presented, including the hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.
Abstract: Modern cloud architectures rely on each server running its own networking stack to implement policies such as tunneling for virtual networks, security, and load balancing. However, these networking stacks are becoming increasingly complex as features are added and as network speeds increase. Running these stacks on CPU cores takes away processing power from VMs, increasing the cost of running cloud services, and adding latency and variability to network performance. We present Azure Accelerated Networking (AccelNet), our solution for offloading host networking to hardware, using custom Azure SmartNICs based on FPGAs. We define the goals of AccelNet, including programmability comparable to software, and performance and efficiency comparable to hardware. We show that FPGAs are the best current platform for offloading our networking stack as ASICs do not provide sufficient programmability, and embedded CPU cores do not provide scalable performance, especially on single network flows. Azure SmartNICs implementing AccelNet have been deployed on all new Azure servers since late 2015 in a fleet of >1M hosts. The AccelNet service has been available for Azure customers since 2016, providing consistent <15µs VM-VM TCP latencies and 32Gbps throughput, which we believe represents the fastest network available to customers in the public cloud. We present the design of AccelNet, including our hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.

Proceedings ArticleDOI
20 May 2018
TL;DR: As discussed by the authors, Angora is a new mutation-based fuzzer that outperforms state-of-the-art fuzzers by a wide margin; its main goal is to increase branch coverage by solving path constraints without symbolic execution.
Abstract: Fuzzing is a popular technique for finding software bugs. However, the performance of the state-of-the-art fuzzers leaves a lot to be desired. Fuzzers based on symbolic execution produce quality inputs but run slow, while fuzzers based on random mutation run fast but have difficulty producing quality inputs. We propose Angora, a new mutation-based fuzzer that outperforms the state-of-the-art fuzzers by a wide margin. The main goal of Angora is to increase branch coverage by solving path constraints without symbolic execution. To solve path constraints efficiently, we introduce several key techniques: scalable byte-level taint tracking, context-sensitive branch count, search based on gradient descent, and input length exploration. On the LAVA-M data set, Angora found almost all the injected bugs, found more bugs than any other fuzzer that we compared with, and found eight times as many bugs as the second-best fuzzer in the program who. Angora also found 103 bugs that the LAVA authors injected but could not trigger. We also tested Angora on eight popular, mature open source programs. Angora found 6, 52, 29, 40 and 48 new bugs in file, jhead, nm, objdump and size, respectively. We measured the coverage of Angora and evaluated how its key techniques contribute to its impressive performance.
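
A toy illustration of the gradient-descent search idea (not Angora itself, which works on compiled programs with byte-level taint tracking): treat a branch's distance-to-satisfaction as a black-box function of the tainted input bytes, estimate a gradient by finite differences, and walk the bytes downhill until the branch flips. The target predicate is invented.

```python
def branch_distance(data):
    # Hypothetical target branch: if data[0]*256 + data[1] == 0x1234: ...
    return abs((data[0] * 256 + data[1]) - 0x1234)

def solve_branch(data, tainted, max_iters=2000):
    data = list(data)
    for _ in range(max_iters):
        d0 = branch_distance(data)
        if d0 == 0:
            return bytes(data)                 # constraint satisfied
        # Finite-difference gradient over the tainted byte positions only.
        grad = []
        for i in tainted:
            bumped = list(data)
            bumped[i] = min(bumped[i] + 1, 255)
            grad.append(branch_distance(bumped) - d0)
        # Move each tainted byte one step against its gradient sign.
        for i, g in zip(tainted, grad):
            if g < 0:
                data[i] = min(data[i] + 1, 255)
            elif g > 0:
                data[i] = max(data[i] - 1, 0)
    return None

print(solve_branch(b"\x00\x00rest-of-input", tainted=[0, 1]))
```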

Proceedings ArticleDOI
01 Jun 2018
TL;DR: MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers, which is scalable to large networks, adaptable to specific resource constraints, and capable of increasing the network's performance.
Abstract: We present MorphNet, an approach to automate the design of neural network structures. MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network's performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.
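
The snippet below is a rough sketch of the two MorphNet ingredients named above, under the common reading that the sparsifying regularizer acts on batch-norm scale factors weighted by each layer's resource cost; the cost numbers and layer sizes are invented.

```python
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.BatchNorm2d(32), nn.BatchNorm2d(64)])
flop_cost = [1.0, 4.0]     # assumed per-channel resource weight for each layer

def resource_weighted_l1(layers, flop_cost, strength=1e-3):
    """Shrink step: penalize channels in proportion to how much compute they cost."""
    return strength * sum(c * bn.weight.abs().sum()
                          for c, bn in zip(flop_cost, layers))

def widen(channel_counts, factor):
    """Expand step: uniform multiplicative factor on all surviving layer widths."""
    return [max(1, int(round(n * factor))) for n in channel_counts]

reg = resource_weighted_l1(layers, flop_cost)
print("regularizer term added to the training loss:", float(reg))
print("widened widths after shrinking to [11, 23]:", widen([11, 23], factor=1.5))
```

Alternating the shrink and expand steps is what lets the method trade channels between layers while staying under the resource budget.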

Journal ArticleDOI
TL;DR: This paper presents a detailed analysis of Colaboratory regarding hardware resources, performance, and limitations and shows that the performance reached using this cloud service is equivalent to the performance of the dedicated testbeds, given similar resources.
Abstract: Google Colaboratory (also known as Colab) is a cloud service based on Jupyter Notebooks for disseminating machine learning education and research. It provides a runtime fully configured for deep learning and free-of-charge access to a robust GPU. This paper presents a detailed analysis of Colaboratory regarding hardware resources, performance, and limitations. This analysis is performed through the use of Colaboratory for accelerating deep learning for computer vision and other GPU-centric applications. The chosen test-cases are a parallel tree-based combinatorial search and two computer vision applications: object detection/classification and object localization/segmentation. The hardware under the accelerated runtime is compared with a mainstream workstation and a robust Linux server equipped with 20 physical cores. Results show that the performance reached using this cloud service is equivalent to the performance of the dedicated testbeds, given similar resources. Thus, this service can be effectively exploited to accelerate not only deep learning but also other classes of GPU-centric applications. For instance, it is faster to train a CNN on Colaboratory’s accelerated runtime than using 20 physical cores of a Linux server. The performance of the GPU made available by Colaboratory may be enough for several profiles of researchers and students. However, these free-of-charge hardware resources are far from enough to solve demanding real-world problems and are not scalable. The most significant limitation found is the lack of CPU cores. Finally, several strengths and limitations of this cloud service are discussed, which might be useful for helping potential users.

Proceedings ArticleDOI
07 Aug 2018
TL;DR: Chameleon is a controller that dynamically picks the best configurations for existing NN-based video analytics pipelines, demonstrating that compared to a baseline that picks a single optimal configuration offline, Chameleon can achieve 20-50% higher accuracy with the same amount of resources, or achieve the same accuracy with only 30-50% of the resources.
Abstract: Applying deep convolutional neural networks (NN) to video data at scale poses a substantial systems challenge, as improving inference accuracy often requires a prohibitive cost in computational resources. While it is promising to balance resource and accuracy by selecting a suitable NN configuration (e.g., the resolution and frame rate of the input video), one must also address the significant dynamics of the NN configuration's impact on video analytics accuracy. We present Chameleon, a controller that dynamically picks the best configurations for existing NN-based video analytics pipelines. The key challenge in Chameleon is that in theory, adapting configurations frequently can reduce resource consumption with little degradation in accuracy, but searching a large space of configurations periodically incurs an overwhelming resource overhead that negates the gains of adaptation. The insight behind Chameleon is that the underlying characteristics (e.g., the velocity and sizes of objects) that affect the best configuration have enough temporal and spatial correlation to allow the search cost to be amortized over time and across multiple video feeds. For example, using the video feeds of five traffic cameras, we demonstrate that compared to a baseline that picks a single optimal configuration offline, Chameleon can achieve 20-50% higher accuracy with the same amount of resources, or achieve the same accuracy with only 30-50% of the resources (a 2-3X speedup).

Journal ArticleDOI
TL;DR: DeepThings is proposed, a framework for adaptively distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters that employs a scalable Fused Tile Partitioning of convolutional layers to minimize memory footprint while exposing parallelism.
Abstract: Edge computing has emerged as a trend to improve scalability, overhead, and privacy by processing large-scale data, e.g., in deep learning applications locally at the source. In IoT networks, edge devices are characterized by tight resource constraints and often dynamic nature of data sources, where existing approaches for deploying Deep/Convolutional Neural Networks (DNNs/CNNs) can only meet IoT constraints when severely reducing accuracy or using a static distribution that cannot adapt to dynamic IoT environments. In this paper, we propose DeepThings, a framework for adaptively distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters. DeepThings employs a scalable Fused Tile Partitioning (FTP) of convolutional layers to minimize memory footprint while exposing parallelism. It further realizes a distributed work stealing approach to enable dynamic workload distribution and balancing at inference runtime. Finally, we employ a novel work scheduling process to improve data reuse and reduce overall execution latency. Results show that our proposed FTP method can reduce memory footprint by more than 68% without sacrificing accuracy. Furthermore, compared to existing work sharing methods, our distributed work stealing and work scheduling improve throughput by 1.7×–2.2× with multiple dynamic data sources. When combined, DeepThings provides scalable CNN inference speedups of 1.7×–3.5× on 2–6 edge devices with less than 23 MB memory each.
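
A simplified, one-dimensional sketch of the Fused Tile Partitioning idea (layer specs are illustrative, not from the paper): back-project an output tile through a stack of convolutional layers to find the input region a device needs, so that all layers can be fused and executed per tile.

```python
def input_region(out_lo, out_hi, layers):
    """layers: list of (kernel, stride, padding) from first to last conv layer."""
    lo, hi = out_lo, out_hi
    for k, s, p in reversed(layers):
        lo = lo * s - p              # leftmost input index feeding `lo`
        hi = hi * s - p + (k - 1)    # rightmost input index feeding `hi`
        lo = max(lo, 0)              # feature maps start at index 0
    return lo, hi

# Two 3x3 stride-1 layers followed by a 3x3 stride-2 layer (1-D view).
layers = [(3, 1, 1), (3, 1, 1), (3, 2, 1)]
for tile in [(0, 7), (8, 15)]:
    print(f"output rows {tile} need input rows {input_region(*tile, layers)}")
```

The small overlap between the two printed input regions is the redundancy FTP accepts in exchange for letting each edge device compute its tile independently.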

Journal ArticleDOI
TL;DR: The paper reviews the range of services distributed ES systems can provide and the control challenges they introduce, and proposes multi-agent control with agents satisfying Wooldridge’s definition of intelligence as a promising direction for future research.
Abstract: This paper presents an overview of the state of the art control strategies specifically designed to coordinate distributed energy storage (ES) systems in microgrids. Power networks are undergoing a transition from the traditional model of centralised generation towards a smart decentralised network of renewable sources and ES systems, organised into autonomous microgrids. ES systems can provide a range of services, particularly when distributed throughout the power network. The introduction of distributed ES represents a fundamental change for power networks, increasing the network control problem dimensionality and adding long time-scale dynamics associated with the storage systems’ state of charge levels. Managing microgrids with many small distributed ES systems requires new scalable control strategies that are robust to power network and communication network disturbances. This paper reviews the range of services distributed ES systems can provide, and the control challenges they introduce. The focus of this paper is a presentation of the latest decentralised, centralised and distributed multi-agent control strategies designed to coordinate distributed microgrid ES systems. Finally, multi-agent control with agents satisfying Wooldridge’s definition of intelligence is proposed as a promising direction for future research.

Journal ArticleDOI
24 Oct 2018
TL;DR: MadMax is presented: a static program analysis technique to automatically detect gas-focused vulnerabilities with very high confidence and achieves high precision and scalability.
Abstract: Ethereum is a distributed blockchain platform, serving as an ecosystem for smart contracts: full-fledged inter-communicating programs that capture the transaction logic of an account. Unlike programs in mainstream languages, a gas limit restricts the execution of an Ethereum smart contract: execution proceeds as long as gas is available. Thus, gas is a valuable resource that can be manipulated by an attacker to provoke unwanted behavior in a victim's smart contract (e.g., wasting or blocking funds of said victim). Gas-focused vulnerabilities exploit undesired behavior when a contract (directly or through other interacting contracts) runs out of gas. Such vulnerabilities are among the hardest for programmers to protect against, as out-of-gas behavior may be uncommon in non-attack scenarios and reasoning about it is far from trivial. In this paper, we classify and identify gas-focused vulnerabilities, and present MadMax: a static program analysis technique to automatically detect gas-focused vulnerabilities with very high confidence. Our approach combines a control-flow-analysis-based decompiler and declarative program-structure queries. The combined analysis captures high-level domain-specific concepts (such as "dynamic data structure storage" and "safely resumable loops") and achieves high precision and scalability. MadMax analyzes the entirety of smart contracts in the Ethereum blockchain in just 10 hours (with decompilation timeouts in 8% of the cases) and flags contracts with a (highly volatile) monetary value of over $2.8B as vulnerable. Manual inspection of a sample of flagged contracts shows that 81% of the sampled warnings do indeed lead to vulnerabilities, which we report on in our experiment.

Journal ArticleDOI
TL;DR: An SDN-based edge-cloud interplay is presented to handle streaming big data in the IIoT environment, wherein SDN provides efficient middleware support, and a multi-objective evolutionary algorithm using Tchebycheff decomposition for flow scheduling and routing in SDN is proposed.
Abstract: The emergence of the Industrial Internet of Things (IIoT) has paved the way to real-time big data storage, access, and processing in the cloud environment. In IIoT, the big data generated by various devices such as smartphones, wireless body sensors, and smart meters will be on the order of zettabytes in the near future. Hence, relaying this huge amount of data to the remote cloud platform for further processing can lead to severe network congestion. This in turn will result in latency issues which affect the overall QoS for various applications in IIoT. To cope with these challenges, a recent paradigm shift in computing, popularly known as edge computing, has emerged. Edge computing can be viewed as a complement to cloud computing rather than as a competition. The cooperation and interplay among cloud and edge devices can help to reduce energy consumption in addition to maintaining the QoS for various applications in the IIoT environment. However, a large number of migrations among edge devices and cloud servers leads to congestion in the underlying networks. Hence, to handle this problem, SDN, a recent programmable and scalable network paradigm, has emerged as a viable solution. Keeping focus on all the aforementioned issues, in this article, an SDN-based edge-cloud interplay is presented to handle streaming big data in the IIoT environment, wherein SDN provides efficient middleware support. In the proposed solution, a multi-objective evolutionary algorithm using Tchebycheff decomposition for flow scheduling and routing in SDN is presented. The proposed scheme is evaluated with respect to two optimization objectives, that is, the trade-off between energy efficiency and latency, and the trade-off between energy efficiency and bandwidth. The results obtained prove the effectiveness of the proposed flow scheduling scheme in the IIoT environment.
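
For readers unfamiliar with Tchebycheff decomposition, the toy snippet below shows how a weight vector turns a two-objective flow-scheduling choice (invented energy and latency numbers) into a scalar subproblem g(x) = max_i w_i * |f_i(x) - z_i*|, where z* is the best value seen per objective.

```python
def tchebycheff(objectives, weights, ideal):
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))

candidates = {                    # invented (energy, latency) values for two routes
    "route_A": (5.0, 12.0),
    "route_B": (8.0, 7.0),
}
ideal = (5.0, 7.0)                # best value observed per objective
for weights in [(0.8, 0.2), (0.2, 0.8)]:
    best = min(candidates, key=lambda k: tchebycheff(candidates[k], weights, ideal))
    print(f"weights {weights} -> pick {best}")
```

An evolutionary algorithm of the kind described above optimizes many such weighted subproblems in parallel to approximate the whole energy-latency trade-off front.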

Journal ArticleDOI
TL;DR: This paper proposes a novel hybrid network architecture for the smart city by leveraging the strength of emerging Software Defined Networking and blockchain technologies and proposes a Proof-of-Work scheme in the model to ensure security and privacy.

Posted Content
TL;DR: A more comprehensive search space of parallelization strategies for DNNs called SOAP, which includes strategies to parallelize a DNN in the Sample, Operation, Attribute, and Parameter dimensions is defined and FlexFlow, a deep learning framework that uses guided randomized search of the SOAP space to find a fast parallelization strategy for a specific parallel machine is proposed.
Abstract: The computational requirements for training deep neural networks (DNNs) have grown to the point that it is now standard practice to parallelize training. Existing deep learning systems commonly use data or model parallelism, but unfortunately, these strategies often result in suboptimal parallelization performance. In this paper, we define a more comprehensive search space of parallelization strategies for DNNs called SOAP, which includes strategies to parallelize a DNN in the Sample, Operation, Attribute, and Parameter dimensions. We also propose FlexFlow, a deep learning framework that uses guided randomized search of the SOAP space to find a fast parallelization strategy for a specific parallel machine. To accelerate this search, FlexFlow introduces a novel execution simulator that can accurately predict a parallelization strategy's performance and is three orders of magnitude faster than prior approaches that have to execute each strategy. We evaluate FlexFlow with six real-world DNN benchmarks on two GPU clusters and show that FlexFlow can increase training throughput by up to 3.8x over state-of-the-art approaches, even when including its search time, and also improves scalability.

Journal ArticleDOI
TL;DR: A novel distributed deep learning scheme for cyber-attack detection in fog-to-things computing is proposed, and experiments show that deep models are superior to shallow models in detection accuracy, false alarm rate, and scalability.
Abstract: The increase in the number and diversity of smart objects has raised substantial cybersecurity challenges due to the recent exponential rise in the occurrence and sophistication of attacks. Although cloud computing has transformed the world of business in a dramatic way, its centralization hampers the application of distributed services such as security mechanisms for IoT applications. New and emerging IoT applications require novel cybersecurity controls, models, and decisions distributed at the edge of the network. Despite the success of the existing cryptographic solutions in the traditional Internet, factors such as system development flaws, increased attack surfaces, and hacking skills have proven the inevitability of detection mechanisms. Traditional approaches such as classical machine-learning-based attack detection mechanisms have been successful in the last decades, but it has already been proven that they have low accuracy and limited scalability for cyber-attack detection in massively distributed nodes such as IoT. The proliferation of deep learning and advances in hardware technology could pave the way to detecting the current level of sophistication of cyber-attacks in edge networks. The application of deep networks has already been successful in big data areas, and this indicates that fog-to-things computing can be the ultimate beneficiary of the approach for attack detection, because the massive amount of data produced by IoT devices enables deep models to learn better than shallow algorithms. In this article, we propose a novel distributed deep learning scheme for cyber-attack detection in fog-to-things computing. Our experiments show that deep models are superior to shallow models in detection accuracy, false alarm rate, and scalability.

Journal ArticleDOI
TL;DR: In this article, the authors study the mobile edge service performance optimization problem under a long-term cost budget constraint, and apply Lyapunov optimization to decompose it into a series of real-time optimization problems which do not require a priori knowledge such as user mobility.
Abstract: Mobile edge computing is a new computing paradigm, which pushes cloud computing capabilities away from the centralized cloud to the network edge. However, with the sinking of computing capabilities, the new challenge incurred by user mobility arises: since end users typically move erratically, the services should be dynamically migrated among multiple edges to maintain the service performance, i.e., user-perceived latency. Tackling this problem is non-trivial since frequent service migration would greatly increase the operational cost. To address this challenge in terms of the performance-cost tradeoff, in this paper, we study the mobile edge service performance optimization problem under long-term cost budget constraint. To address user mobility which is typically unpredictable, we apply Lyapunov optimization to decompose the long-term optimization problem into a series of real-time optimization problems which do not require a priori knowledge such as user mobility. As the decomposed problem is NP-hard, we first design an approximation algorithm based on Markov approximation to seek a near-optimal solution. To make our solution scalable and amenable to future fifth-generation application scenario with large-scale user devices, we further propose a distributed approximation scheme with greatly reduced time complexity, based on the technique of the best response update. Rigorous theoretical analysis and extensive evaluations demonstrate the efficacy of the proposed centralized and distributed schemes.
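
Below is a bare-bones sketch of the drift-plus-penalty pattern that Lyapunov optimization yields for this kind of problem (not the paper's algorithm): a virtual queue tracks how far the running migration cost exceeds the long-term budget, and each slot picks the action minimizing V*latency + Q*cost without any knowledge of future mobility. All numbers are illustrative.

```python
import random

V = 10.0                  # latency-vs-cost tradeoff knob
budget_per_slot = 1.0     # long-term average migration cost budget
Q = 0.0                   # virtual queue backlog

random.seed(0)
for t in range(5):
    # Two candidate actions: keep the service where it is, or migrate it closer.
    stay    = {"latency": random.uniform(3, 6), "cost": 0.0}
    migrate = {"latency": random.uniform(1, 2), "cost": 3.0}
    action = min((stay, migrate), key=lambda a: V * a["latency"] + Q * a["cost"])
    Q = max(Q + action["cost"] - budget_per_slot, 0.0)    # virtual queue update
    print(f"slot {t}: latency={action['latency']:.2f} cost={action['cost']} Q={Q:.2f}")
```

A large backlog Q makes migrations temporarily expensive, which is how the long-term budget constraint gets enforced without forecasting user movement.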

Journal ArticleDOI
TL;DR: In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown, providing a simple, yet practical asynchronous caching approach.
Abstract: Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus, enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
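
A compact sketch of Q-learning with a linear function approximation, the kind of scalable update described above; the feature map, reward model, and popularity dynamics here are invented stand-ins rather than the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_files, n_feat = 6, 6
w = np.zeros(n_feat)                      # linear Q-function weights
alpha, gamma, eps = 0.05, 0.9, 0.1

def features(popularity, action):
    phi = np.zeros(n_feat)
    phi[list(action)] = popularity[list(action)]    # popularity of cached files
    return phi

def q(popularity, action):
    return w @ features(popularity, action)

actions = [(i, j) for i in range(n_files) for j in range(i + 1, n_files)]  # cache 2 files
popularity = rng.dirichlet(np.ones(n_files))        # current request profile

for t in range(1000):
    a = actions[rng.integers(len(actions))] if rng.random() < eps else \
        max(actions, key=lambda b: q(popularity, b))
    request = rng.choice(n_files, p=popularity)
    reward = 1.0 if request in a else 0.0           # cache hit or miss
    next_pop = rng.dirichlet(10 * popularity + 1)   # slowly drifting popularity
    target = reward + gamma * max(q(next_pop, b) for b in actions)
    w += alpha * (target - q(popularity, a)) * features(popularity, a)
    popularity = next_pop

print("learned to cache:", max(actions, key=lambda b: q(popularity, b)))
```

Because the Q-function is a dot product with a fixed-length feature vector, the per-update cost does not grow with the number of state-action pairs, which is the scalability argument made above.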

Journal ArticleDOI
TL;DR: An in-depth analysis of the requirements of edge computing from the perspective of three selected use cases that are particularly interesting for harnessing the power of the Internet of Things, together with an assessment of the applicability of two LV technologies, containers and unikernels, as platforms for enabling the scalability, security, and manageability required.
Abstract: Lightweight virtualization (LV) technologies have refashioned the world of software development by introducing flexibility and new ways of managing and distributing software. Edge computing complements today's powerful centralized data centers with a large number of distributed nodes that provide virtualization close to the data source and end users. This emerging paradigm offers ubiquitous processing capabilities on a wide range of heterogeneous hardware characterized by different processing power and energy availability. The scope of this article is to present an in-depth analysis on the requirements of edge computing from the perspective of three selected use cases that are particularly interesting for harnessing the power of the Internet of Things. We discuss and compare the applicability of two LV technologies, containers and unikernels, as platforms for enabling the scalability, security, and manageability required by such pervasive applications that soon may be part of our everyday lives. To inspire further research, we identify open problems and highlight future directions to serve as a road map for both industry and academia.

Journal ArticleDOI
TL;DR: This work describes how existing solutions exploit resource elasticity features of cloud computing in stream processing, and presents a gap analysis and future directions for stream processing in heterogeneous environments.

Proceedings ArticleDOI
07 Aug 2018
TL;DR: A two-level DRL system, AuTO, mimicking the Peripheral & Central Nervous Systems in animals, is developed; it is an end-to-end automatic TO system that can collect network information, learn from past decisions, and perform actions to achieve operator-defined goals, while delivering superior performance.
Abstract: Traffic optimizations (TO, e.g. flow scheduling, load balancing) in datacenters are difficult online decision-making problems. Previously, they are done with heuristics relying on operators' understanding of the workload and environment. Designing and implementing proper TO algorithms thus take at least weeks. Encouraged by recent successes in applying deep reinforcement learning (DRL) techniques to solve complex online control problems, we study if DRL can be used for automatic TO without human-intervention. However, our experiments show that the latency of current DRL systems cannot handle flow-level TO at the scale of current datacenters, because short flows (which constitute the majority of traffic) are usually gone before decisions can be made. Leveraging the long-tail distribution of datacenter traffic, we develop a two-level DRL system, AuTO, mimicking the Peripheral & Central Nervous Systems in animals, to solve the scalability problem. Peripheral Systems (PS) reside on end-hosts, collect flow information, and make TO decisions locally with minimal delay for short flows. PS's decisions are informed by a Central System (CS), where global traffic information is aggregated and processed. CS further makes individual TO decisions for long flows. With CS&PS, AuTO is an end-to-end automatic TO system that can collect network information, learn from past decisions, and perform actions to achieve operator-defined goals. We implement AuTO with popular machine learning frameworks and commodity servers, and deploy it on a 32-server testbed. Compared to existing approaches, AuTO reduces the TO turn-around time from weeks to ~100 milliseconds while achieving superior performance. For example, it demonstrates up to 48.14% reduction in average flow completion time (FCT) over existing solutions.

Proceedings Article
01 Sep 2018
TL;DR: In this paper, an adaptive layer-wise sampling method is proposed to accelerate the training of GCNs through constructing the network layer by layer in a top-down passway, where the sampled neighborhoods are shared by different parent nodes and the over expansion is avoided owing to the fixed-size sampling.
Abstract: Graph Convolutional Networks (GCNs) have become a crucial tool on learning representations of graph vertices. The main challenge of adapting GCNs on large-scale graphs is the scalability issue that it incurs heavy cost both in computation and memory due to the uncontrollable neighborhood expansion across layers. In this paper, we accelerate the training of GCNs through developing an adaptive layer-wise sampling method. By constructing the network layer by layer in a top-down passway, we sample the lower layer conditioned on the top one, where the sampled neighborhoods are shared by different parent nodes and the over expansion is avoided owing to the fixed-size sampling. More importantly, the proposed sampler is adaptive and applicable for explicit variance reduction, which in turn enhances the training of our method. Furthermore, we propose a novel and economical approach to promote the message passing over distant nodes by applying skip connections. Intensive experiments on several benchmarks verify the effectiveness of our method regarding the classification accuracy while enjoying faster convergence speed.
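
The numpy fragment below sketches the top-down, layer-wise sampling pattern described above: for each lower layer a single fixed-size node set is drawn from the union of the parents' neighborhoods and shared by all parents. The degree-proportional proposal is a simple stand-in, not the paper's learned adaptive sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_layers(adj, batch_nodes, sizes):
    """adj: dense 0/1 adjacency; sizes: how many nodes to draw for each lower layer."""
    layers, parents = [list(batch_nodes)], list(batch_nodes)
    for n_sample in sizes:
        # Candidate pool: union of the parents' neighborhoods, shared by all parents,
        # so each layer keeps a fixed size instead of expanding per parent.
        cand = np.flatnonzero(adj[parents].sum(axis=0) > 0)
        probs = adj[:, cand].sum(axis=0)
        probs = probs / probs.sum()                   # degree-proportional proposal
        chosen = rng.choice(cand, size=min(n_sample, len(cand)), replace=False, p=probs)
        layers.append(sorted(chosen.tolist()))
        parents = layers[-1]
    return layers    # [batch nodes, sampled layer-1 nodes, sampled layer-2 nodes]

adj = (rng.random((30, 30)) < 0.2).astype(float)
adj = np.maximum(adj, adj.T)                          # undirected
adj = np.maximum(adj, np.eye(30))                     # self-loops keep pools non-empty
print(sample_layers(adj, batch_nodes=[0, 1], sizes=[8, 8]))
```

The full method additionally attaches importance weights to the sampled nodes so the layer-wise estimator stays unbiased and its variance can be reduced explicitly.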

Proceedings ArticleDOI
19 Jul 2018
TL;DR: SUSTain, the method presented in this paper, extracts factor values from integer datasets as scores that are constrained to take values from a small integer set; these scores are easy to interpret: a score of zero indicates no feature contribution and higher scores indicate distinct levels of feature importance.
Abstract: This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, thereby making it hard to interpret the results. Instead, our approach extracts factor values from integer datasets as scores that are constrained to take values from a small integer set. These scores are easy to interpret: a score of zero indicates no feature contribution and higher scores indicate distinct levels of feature importance. At its core, SUSTain relies on: a) a problem partitioning into integer-constrained subproblems, so that they can be optimally solved in an efficient manner; and b) organizing the order of the subproblems' solution, to promote reuse of shared intermediate results. We propose two variants, SUSTain_M and SUSTain_T, to handle both matrix and tensor inputs, respectively. We evaluate SUSTain against several state-of-the-art baselines on both synthetic and real Electronic Health Record (EHR) datasets. Comparing to those baselines, SUSTain shows either significantly better fit or orders of magnitude speedups that achieve a comparable fit (up to 425× faster). We apply SUSTain to EHR datasets to extract patient phenotypes (i.e., clinically meaningful patient clusters). Furthermore, 87% of them were validated as clinically meaningful phenotypes related to heart failure by a cardiologist.
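
A toy sketch of the integer-constrained factorization idea (not the SUSTain algorithm itself): fit X ≈ U Vᵀ where every factor entry must be an integer score in {0,...,3}, updating one entry at a time by trying all allowed values and keeping the best. The paper instead solves its integer subproblems optimally and orders them to reuse intermediate results.

```python
import numpy as np

rng = np.random.default_rng(0)
SCORES = range(4)                                    # allowed integer scores 0..3

def fit_integer_factors(X, rank, iters=30):
    U = rng.integers(0, 4, size=(X.shape[0], rank)).astype(float)
    V = rng.integers(0, 4, size=(X.shape[1], rank)).astype(float)
    for _ in range(iters):
        for M, other, target in ((U, V, X), (V, U, X.T)):
            for i in range(M.shape[0]):
                for r in range(rank):
                    # Try every allowed score for entry (i, r), keep the best fit.
                    best = min(SCORES, key=lambda s: np.sum(
                        (target[i] - (M[i] @ other.T + (s - M[i, r]) * other[:, r])) ** 2))
                    M[i, r] = best
    return U, V

X = rng.integers(0, 4, size=(6, 2)) @ rng.integers(0, 4, size=(2, 5))   # integer data
U, V = fit_integer_factors(X.astype(float), rank=2)
print("reconstruction error:", np.abs(X - U @ V.T).sum())
print("interpretable integer factors:\n", U.astype(int))
```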