
Showing papers on "Overhead (computing)" published in 2018


Posted Content
TL;DR: Horovod is an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow.
Abstract: Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library's API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at this https URL
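The communication pattern the abstract credits for low overhead, ring reduction, can be illustrated with a toy single-process simulation: each worker's gradient is split into N chunks that circulate around a logical ring, so every value is communicated roughly twice regardless of the number of workers. This is only a sketch of the idea under assumed shapes; Horovod itself runs the reduction across GPUs via MPI/NCCL, and none of the names below come from its API.

```python
import numpy as np

def ring_allreduce(grads):
    """Simulate ring all-reduce over a list of equal-length 1-D gradient arrays,
    one per worker. Returns the fully summed gradient for every worker."""
    n = len(grads)
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Phase 1: reduce-scatter. After n-1 steps, worker i owns the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for src, c, payload in sends:
            chunks[(src + 1) % n][c] += payload

    # Phase 2: all-gather. Circulate each reduced chunk so every worker
    # ends up holding all n summed chunks.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for src, c, payload in sends:
            chunks[(src + 1) % n][c] = payload

    return [np.concatenate(c) for c in chunks]

# Usage: 4 simulated workers, each with its own gradient vector.
grads = [np.arange(8.0) * (w + 1) for w in range(4)]
out = ring_allreduce(grads)
assert np.allclose(out[0], sum(grads))
```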

910 citations


Journal ArticleDOI
TL;DR: In this article, a new data-driven model for automatic modulation classification based on long short term memory (LSTM) is proposed, which learns from the time domain amplitude and phase information of the modulation schemes present in the training data without requiring expert features like higher order cyclic moments.
Abstract: This paper looks into the modulation classification problem for a distributed wireless spectrum sensing network. First, a new data-driven model for automatic modulation classification based on long short term memory (LSTM) is proposed. The model learns from the time domain amplitude and phase information of the modulation schemes present in the training data without requiring expert features like higher order cyclic moments. Analyses show that the proposed model yields an average classification accuracy of close to 90% at varying signal-to-noise ratio conditions ranging from 0 dB to 20 dB. Further, we explore the utility of this LSTM model for a variable symbol rate scenario. We show that an LSTM-based model can learn good representations of variable length time domain sequences, which is useful in classifying modulation signals with different symbol rates. The achieved accuracy of 75% on an input sample length of 64, for which it was not trained, substantiates the representation power of the model. To reduce the data communication overhead from distributed sensors, the feasibility of classification using averaged magnitude spectrum data and on-line classification on the low-cost spectrum sensors are studied. Furthermore, quantized realizations of the proposed models are analyzed for deployment on sensors with low processing power.
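As a rough illustration of feeding time-domain amplitude and phase into an LSTM classifier, here is a minimal Keras sketch; the sequence length, layer sizes, and class count are assumptions for illustration, not the authors' architecture.

```python
import numpy as np
import tensorflow as tf

def to_amp_phase(iq):
    """Convert complex IQ samples of shape (batch, time) into (batch, time, 2)
    amplitude/phase features, the input representation described above."""
    return np.stack([np.abs(iq), np.angle(iq)], axis=-1).astype("float32")

seq_len, num_classes = 128, 11   # illustrative values only
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, return_sequences=True, input_shape=(seq_len, 2)),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(to_amp_phase(train_iq), train_labels, ...)
```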

420 citations


Journal ArticleDOI
TL;DR: This paper proposes a new attribute-based data sharing scheme suitable for resource-limited mobile users in cloud computing and is proven secure against adaptively chosen-ciphertext attacks, which is widely recognized as a standard security notion.

407 citations


Proceedings ArticleDOI
07 Aug 2018
TL;DR: Chameleon is a controller that dynamically picks the best configurations for existing NN-based video analytics pipelines, demonstrating that compared to a baseline that picks a single optimal configuration offline, Chameleon can achieve 20-50% higher accuracy with the same amount of resources, or achieve the same accuracy with only 30--50% of the resources.
Abstract: Applying deep convolutional neural networks (NN) to video data at scale poses a substantial systems challenge, as improving inference accuracy often requires a prohibitive cost in computational resources. While it is promising to balance resource and accuracy by selecting a suitable NN configuration (e.g., the resolution and frame rate of the input video), one must also address the significant dynamics of the NN configuration's impact on video analytics accuracy. We present Chameleon, a controller that dynamically picks the best configurations for existing NN-based video analytics pipelines. The key challenge in Chameleon is that in theory, adapting configurations frequently can reduce resource consumption with little degradation in accuracy, but searching a large space of configurations periodically incurs an overwhelming resource overhead that negates the gains of adaptation. The insight behind Chameleon is that the underlying characteristics (e.g., the velocity and sizes of objects) that affect the best configuration have enough temporal and spatial correlation to allow the search cost to be amortized over time and across multiple video feeds. For example, using the video feeds of five traffic cameras, we demonstrate that compared to a baseline that picks a single optimal configuration offline, Chameleon can achieve 20-50% higher accuracy with the same amount of resources, or achieve the same accuracy with only 30--50% of the resources (a 2-3X speedup).
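A toy sketch of the amortized configuration search described above might look as follows; the candidate knobs, the profiling stand-in, and the accuracy target are all hypothetical, and Chameleon's actual controller additionally shares profiles across cameras.

```python
import random

# Candidate knob settings: (resolution, frame rate); cost grows with both.
CONFIGS = [(r, f) for r in (480, 720, 1080) for f in (5, 15, 30)]

def profile(config, segment):
    """Stand-in for running the expensive reference configuration alongside
    `config` on one segment and measuring accuracy and cost."""
    res, fps = config
    accuracy = min(1.0, 0.5 + res / 2160 + fps / 100 + segment.get("bonus", 0))
    cost = (res / 1080) * (fps / 30)
    return accuracy, cost

def choose_config(segment, target_acc=0.8):
    """Pick the cheapest configuration meeting the accuracy target."""
    scored = [(profile(c, segment), c) for c in CONFIGS]
    ok = [(cost, c) for (acc, cost), c in scored if acc >= target_acc]
    return min(ok)[1] if ok else max(scored)[1]

# Profile once every `window` segments, reuse the choice in between so the
# search cost is amortized over time.
window, current = 5, None
for t in range(20):
    segment = {"bonus": 0.05 * random.random()}   # slowly varying scene statistics
    if t % window == 0:
        current = choose_config(segment)
    # run_pipeline(segment, current)  # inference with the cached configuration
```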

344 citations


Posted Content
TL;DR: Federated distillation (FD) is proposed, a distributed model training algorithm whose communication payload size is much smaller than a benchmark scheme, federated learning (FL), particularly when the model size is large.
Abstract: On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples. To enjoy this benefit, inter-device communication overhead should be minimized. To this end, we propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than a benchmark scheme, federated learning (FL), particularly when the model size is large. Moreover, user-generated data samples are likely to become non-IID across devices, which commonly degrades the performance compared to the case with an IID dataset. To cope with this, we propose federated augmentation (FAug), where each device collectively trains a generative model, and thereby augments its local data towards yielding an IID dataset. Empirical studies demonstrate that FD with FAug yields around 26x less communication overhead while achieving 95-98% test accuracy compared to FL.
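The payload reduction comes from exchanging per-label average model outputs instead of full weight vectors. The following numpy sketch illustrates that exchange under assumed shapes; the distillation loss each device would apply is only indicated in a comment, and none of these names come from the paper.

```python
import numpy as np

NUM_LABELS, NUM_CLASSES = 10, 10

def local_logit_summary(logits, labels):
    """Average a device's model outputs per ground-truth label -> (L, C) matrix."""
    summary = np.zeros((NUM_LABELS, NUM_CLASSES))
    for y in range(NUM_LABELS):
        mask = labels == y
        if mask.any():
            summary[y] = logits[mask].mean(axis=0)
    return summary

def aggregate(summaries):
    """Server averages the per-label summaries uploaded by all devices."""
    return np.mean(summaries, axis=0)

# Each device then adds a distillation term against the global summary, e.g.
# loss = ce(own_logits, labels) + beta * mse(own_logits, global_summary[labels]),
# so the per-round payload is an (L x C) matrix rather than the full model.
```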

322 citations


Posted Content
TL;DR: This paper presents a unified framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as periodic-averaging, elastic-averaging, and decentralized SGD, and provides novel convergence guarantees for existing algorithms.
Abstract: Communication-efficient SGD algorithms, which allow nodes to perform local updates and periodically synchronize local models, are highly effective in improving the speed and scalability of distributed SGD. However, a rigorous convergence analysis and comparative study of different communication-reduction strategies remains a largely open problem. This paper presents a unified framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as periodic-averaging, elastic-averaging and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover, this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence with low error floor.
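For concreteness, here is a minimal single-process simulation of periodic-averaging (local) SGD, one of the schemes the framework subsumes: each worker takes `tau` local steps, then the models are averaged. The quadratic objectives and step sizes are made up purely for illustration.

```python
import numpy as np

def local_sgd(workers_x, grad_fn, lr=0.1, tau=5, rounds=20):
    """workers_x: list of parameter vectors, one per worker (toy, single process)."""
    x = [w.copy() for w in workers_x]
    for _ in range(rounds):
        for i in range(len(x)):
            for _ in range(tau):                 # tau local updates, no communication
                x[i] -= lr * grad_fn(x[i], worker=i)
        avg = np.mean(x, axis=0)                 # one synchronization per tau steps
        x = [avg.copy() for _ in x]
    return avg

# Example: each worker sees a quadratic with a different optimum; periodic
# averaging pulls all workers toward a consensus solution.
optima = [np.array([1.0, -1.0]), np.array([-1.0, 1.0]), np.array([0.5, 0.5])]
grad = lambda x, worker: x - optima[worker]
print(local_sgd([np.zeros(2) for _ in optima], grad))
```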

306 citations


Proceedings ArticleDOI
29 May 2018
TL;DR: Chameleon as mentioned in this paper is a hybrid (mixed-protocol) framework for secure function evaluation (SFE) which enables two parties to jointly compute a function without disclosing their private inputs, and supports signed fixed-point numbers.
Abstract: We present Chameleon, a novel hybrid (mixed-protocol) framework for secure function evaluation (SFE) which enables two parties to jointly compute a function without disclosing their private inputs. Chameleon combines the best aspects of generic SFE protocols with the ones that are based upon additive secret sharing. In particular, the framework performs linear operations in the ring $\mathbb{Z}_{2^l}$ using additively secret shared values and nonlinear operations using Yao's Garbled Circuits or the Goldreich-Micali-Wigderson protocol. Chameleon departs from the common assumption of additive or linear secret sharing models where three or more parties need to communicate in the online phase: the framework allows two parties with private inputs to communicate in the online phase under the assumption of a third node generating correlated randomness in an offline phase. Almost all of the heavy cryptographic operations are precomputed in an offline phase which substantially reduces the communication overhead. Chameleon is both scalable and significantly more efficient than the ABY framework (NDSS'15) it is based on. Our framework supports signed fixed-point numbers. In particular, Chameleon's vector dot product of signed fixed-point numbers improves the efficiency of mining and classification of encrypted data for algorithms based upon heavy matrix multiplications. Our evaluation of Chameleon on a 5 layer convolutional deep neural network shows 133x and 4.2x faster executions than Microsoft CryptoNets (ICML'16) and MiniONN (CCS'17), respectively.
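The additive-sharing layer used for the linear operations can be illustrated in a few lines. This toy shows only two-party shares over Z_{2^l} and local addition; it omits the garbled-circuit/GMW parts and the correlated randomness (e.g., multiplication triples) supplied by the third node.

```python
import secrets

L = 64
MOD = 1 << L

def share(x):
    """Split a secret into two additive shares that sum to x mod 2^l."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD          # party 0's share, party 1's share

def reconstruct(s0, s1):
    return (s0 + s1) % MOD

def add_shares(a, b):
    """Linear operations are local: each party adds its own shares, no communication."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD

x, y = 42, 1000
sx, sy = share(x), share(y)
assert reconstruct(*add_shares(sx, sy)) == (x + y) % MOD
```

Multiplication of shared values is where the offline phase comes in: the third node precomputes correlated randomness so the two online parties can multiply without revealing their inputs.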

258 citations


Journal ArticleDOI
TL;DR: A permissioned blockchain framework among the various elements involved to manage the collected vehicle-related data enables trustless, traceable, and privacy-aware post-accident analysis with minimal storage and processing overhead.
Abstract: Today's vehicles are becoming cyber-physical systems that not only communicate with other vehicles but also gather various information from hundreds of sensors within them. These developments help create smart and connected (e.g., self-driving) vehicles that will introduce significant information to drivers, manufacturers, insurance companies, and maintenance service providers for various applications. One such application that is becoming crucial with the introduction of self-driving cars is forensic analysis of traffic accidents. The utilization of vehicle-related data can be instrumental in post-accident scenarios to discover the faulty party, particularly for self-driving vehicles. With the opportunity of being able to access various information in cars, we propose a permissioned blockchain framework among the various elements involved to manage the collected vehicle-related data. Specifically, we first integrate vehicular public key infrastructure (VPKI) to the proposed blockchain to provide membership establishment and privacy. Next, we design a fragmented ledger that will store detailed data related to vehicles such as maintenance information/history, car diagnosis reports, and so on. The proposed forensic framework enables trustless, traceable, and privacy-aware post-accident analysis with minimal storage and processing overhead.

238 citations


Proceedings ArticleDOI
08 Jul 2018
TL;DR: The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time.
Abstract: The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT based DDoS attacks. This paper presents a solution to the detection of botnet activity within consumer IoT devices and networks. A novel application of Deep Learning is used to develop a detection model based on a Bidirectional Long Short Term Memory based Recurrent Neural Network (BLSTM-RNN). Word Embedding is used for text recognition and conversion of attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to a LSTM-RNN for detecting four attack vectors used by the mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research, and is available upon request.

230 citations


Journal ArticleDOI
TL;DR: Through simulations, it is shown that the DPC can achieve almost the same or even higher SE and EE than a conventional power control scheme, with a much lower computation time.
Abstract: In this letter, deep power control (DPC), which is the first transmit power control framework based on a convolutional neural network (CNN), is proposed. In DPC, the transmit power control strategy to maximize either spectral efficiency (SE) or energy efficiency (EE) is learned by means of a CNN. While conventional power control schemes require a considerable number of computations, in DPC, the transmit power of users can be determined using far fewer computations enabling real-time processing. We also propose a form of DPC that can be performed in a distributed manner with local channel state information, allowing the signaling overhead to be greatly reduced. Through simulations, we show that the DPC can achieve almost the same or even higher SE and EE than a conventional power control scheme, with a much lower computation time.

219 citations


Posted Content
TL;DR: This satellite imagery dataset enables research progress pertaining to four key computer vision frontiers and utilizes a novel process for geospatial category detection and bounding box annotation with three stages of quality control.
Abstract: We introduce a new large-scale dataset for the advancement of object detection techniques and overhead object detection research. This satellite imagery dataset enables research progress pertaining to four key computer vision frontiers. We utilize a novel process for geospatial category detection and bounding box annotation with three stages of quality control. Our data is collected from WorldView-3 satellites at 0.3m ground sample distance, providing higher resolution imagery than most public satellite imagery datasets. We compare xView to other object detection datasets in both natural and overhead imagery domains and then provide a baseline analysis using the Single Shot MultiBox Detector. xView is one of the largest and most diverse publicly available object-detection datasets to date, with over 1 million objects across 60 classes in over 1,400 km^2 of imagery.

Journal ArticleDOI
TL;DR: A new hybrid classification method based on Artificial Bee Colony (ABC) and Artificial Fish Swarm (AFS) algorithms is proposed that outperforms in terms of performance metrics and can achieve 99% detection rate and 0.01% false positive rate.

Proceedings ArticleDOI
15 Oct 2018
TL;DR: DeepCache as mentioned in this paper proposes a principled cache design for deep learning inference in continuous mobile vision, which benefits model execution efficiency by exploiting temporal locality in input video streams and propagates regions of reusable results by exploiting the model's internal structure.
Abstract: We present DeepCache, a principled cache design for deep learning inference in continuous mobile vision. DeepCache benefits model execution efficiency by exploiting temporal locality in input video streams. It addresses a key challenge raised by mobile vision: the cache must operate under video scene variation, while trading off among cacheability, overhead, and loss in model accuracy. At the input of a model, DeepCache discovers video temporal locality by exploiting the video's internal structure, for which it borrows proven heuristics from video compression; into the model, DeepCache propagates regions of reusable results by exploiting the model's internal structure. Notably, DeepCache eschews applying video heuristics to model internals which are not pixels but high-dimensional, difficult-to-interpret data. Our implementation of DeepCache works with unmodified deep learning models, requires zero developer's manual effort, and is therefore immediately deployable on off-the-shelf mobile devices. Our experiments show that DeepCache saves inference execution time by 18% on average and up to 47%. DeepCache reduces system energy consumption by 20% on average.
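A crude version of the reuse decision could compare co-located blocks of consecutive frames and mark near-identical ones as cacheable. The block size and threshold below are arbitrary, and DeepCache's real matcher borrows motion-estimation heuristics from video codecs rather than this naive diff.

```python
import numpy as np

def reusable_blocks(prev, curr, block=16, thresh=8.0):
    """Return a boolean grid marking blocks whose mean absolute difference
    between the cached frame and the current frame is small."""
    h, w = curr.shape[:2]
    gh, gw = h // block, w // block
    mask = np.zeros((gh, gw), dtype=bool)
    for by in range(gh):
        for bx in range(gw):
            ys, xs = by * block, bx * block
            diff = np.abs(curr[ys:ys+block, xs:xs+block].astype(float)
                          - prev[ys:ys+block, xs:xs+block].astype(float))
            mask[by, bx] = diff.mean() < thresh
    return mask

# mask = reusable_blocks(prev_gray, curr_gray)
# Cached intermediate results would then be propagated for blocks where mask is True.
```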

Journal ArticleDOI
TL;DR: A system that uses compressive estimation on the uplink to configure precoders and combiners for the downlink in a millimeter wave multiuser multiple-input multiple-output system is developed.
Abstract: Configuring the hybrid precoders and combiners in a millimeter wave multiuser multiple-input multiple-output system is challenging in frequency selective channels. In this paper, we develop a system that uses compressive estimation on the uplink to configure precoders and combiners for the downlink. In the first step, the base station (BS) simultaneously estimates the channels from all the mobile stations on each subcarrier. To reduce the number of measurements required, compressed sensing techniques are developed that exploit common support on the different subcarriers. In the second step, exploiting reciprocity and the channel estimates the BS designs hybrid precoders and combiners. Two algorithms are developed for this purpose, with different performance and complexity tradeoffs: First, a factorization of the purely digital solution; and second, an iterative hybrid design. Extensive numerical experiments evaluate the proposed solutions comparing to the state-of-the-art strategies, and illustrating design tradeoffs in overhead, complexity, and performance.

Proceedings ArticleDOI
02 Jun 2018
TL;DR: This paper investigates widely used DNNs and finds that the major contributors to memory footprint are intermediate layer outputs (feature maps), and introduces a framework for DNN-layer-specific optimizations that significantly reduce this source of main memory pressure on GPUs.
Abstract: Modern deep neural networks (DNNs) training typically relies on GPUs to train complex hundred-layer deep networks. A significant problem facing both researchers and industry practitioners is that, as the networks get deeper, the available GPU main memory becomes a primary bottleneck, limiting the size of networks it can train. In this paper, we investigate widely used DNNs and find that the major contributors to memory footprint are intermediate layer outputs (feature maps). We then introduce a framework for DNN-layer-specific optimizations (e.g., convolution, ReLU, pool) that significantly reduce this source of main memory pressure on GPUs. We find that a feature map typically has two uses that are spread far apart temporally. Our key approach is to store an encoded representation of feature maps for this temporal gap and decode this data for use in the backward pass; the full-fidelity feature maps are used in the forward pass and relinquished immediately. Based on this approach, we present Gist, our system that employs two classes of layer-specific encoding schemes -- lossless and lossy -- to exploit existing value redundancy in DNN training to significantly reduce the memory consumption of targeted feature maps. For example, one insight is that by taking advantage of the computational nature of back propagation from pool to ReLU layer, we can store the intermediate feature map using just 1 bit instead of 32 bits per value. We deploy these mechanisms in a state-of-the-art DNN framework (CNTK) and observe that Gist reduces the memory footprint by up to 2X across 5 state-of-the-art image classification DNNs, with an average of 1.8X with only 4% performance overhead. We also show that further software (e.g., CuDNN) and hardware (e.g., dynamic allocation) optimizations can result in even larger footprint reduction (up to 4.1X).
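The 1-bit observation quoted above can be seen in a simplified ReLU example: the backward pass needs only the sign pattern of the forward activations, which can be stashed as a packed bitmask instead of full-precision values. This sketch omits the pool layer and Gist's actual encoding pipeline.

```python
import numpy as np

def relu_forward(x):
    mask = x > 0
    stored = np.packbits(mask)        # 1 bit per element instead of 32-bit floats
    return x * mask, (stored, mask.shape)

def relu_backward(dy, saved):
    stored, shape = saved
    mask = np.unpackbits(stored, count=int(np.prod(shape))).reshape(shape).astype(bool)
    return dy * mask                  # gradient passes only where the input was positive

x = np.random.randn(4, 8).astype(np.float32)
y, saved = relu_forward(x)
dx = relu_backward(np.ones_like(x), saved)
```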

Journal ArticleDOI
TL;DR: This paper outlines a strategy to extract spatial information from sub-6 GHz and its use in mmWave compressed beam-selection and outlines a structured precoder/combiner design to tailor the training to out-of-band information.
Abstract: Millimeter wave (mmWave) communication is one feasible solution for high data-rate applications like vehicular-to-everything communication and next generation cellular communication. Configuring mmWave links, which can be done through channel estimation or beam-selection, however, is a source of significant overhead. In this paper, we propose using spatial information extracted at sub-6 GHz to help establish the mmWave link. Assuming a fully digital architecture at sub-6 GHz; and an analog architecture at mmWave, we outline a strategy to extract spatial information from sub-6 GHz and its use in mmWave compressed beam-selection. Specifically, we formulate compressed beam-selection as a weighted sparse signal recovery problem, and obtain the weighting information from sub-6 GHz channels. In addition, we outline a structured precoder/combiner design to tailor the training to out-of-band information. We also extend the proposed out-of-band aided compressed beam-selection approach to leverage information from all active subcarriers at mmWave. To simulate multi-band frequency dependent channels, we review the prior work on frequency dependent channel behavior and outline a multi-frequency channel model. The simulation results for achievable rate show that out-of-band aided beam-selection can considerably reduce the training overhead of in-band only beam-selection.

Proceedings Article
03 Jul 2018
TL;DR: This paper proposes the error compensated quantized stochastic gradient descent algorithm to improve the training efficiency, and presents theoretical analysis on the convergence behaviour, and demonstrates its advantage over competitors.
Abstract: Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the error compensated quantized stochastic gradient descent algorithm to improve the training efficiency. Local gradients are quantized to reduce the communication overhead, and accumulated quantization error is utilized to speed up the convergence. Furthermore, we present theoretical analysis on the convergence behaviour, and demonstrate its advantage over competitors. Extensive experiments indicate that our algorithm can compress gradients by a factor of up to two magnitudes without performance degradation.
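The error-compensation idea can be sketched as: quantize the gradient coarsely, remember the residual, and fold it into the next round's gradient before quantizing again. The uniform quantizer below is a placeholder for illustration, not the paper's scheme.

```python
import numpy as np

def quantize(v, levels=4):
    """Placeholder uniform quantizer over the vector's own range."""
    lo, hi = float(v.min()), float(v.max())
    if hi == lo:
        return v.copy()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((v - lo) / step) * step

class ErrorCompensatedQuantizer:
    def __init__(self, levels=4):
        self.levels = levels
        self.residual = None

    def compress(self, grad):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual       # fold in what earlier rounds lost
        q = quantize(corrected, self.levels)   # low-precision payload to communicate
        self.residual = corrected - q          # carry this round's quantization error forward
        return q

ecq = ErrorCompensatedQuantizer()
q = ecq.compress(np.random.randn(1000))        # send q instead of the full-precision gradient
```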

Journal ArticleDOI
TL;DR: This paper proposes an edge computing architecture adequate for massive scale MCS services, placing key MCS features within the reference MEC architecture; the architecture suits both data analytics and real-time MCS scenarios, in line with the 5G vision to integrate a huge number of devices and enable innovative applications requiring low network latency.
Abstract: Mobile crowdsensing (MCS) is a human-driven Internet of Things service empowering citizens to observe the phenomena of individual, community, or even societal value by sharing sensor data about their environment while on the move. Typical MCS service implementations utilize cloud-based centralized architectures, which consume a lot of computational resources and generate significant network traffic, both in mobile networks and toward cloud-based MCS services. Mobile edge computing (MEC) is a natural choice to distribute MCS solutions by moving computation to network edge, since an MEC-based architecture enables significant performance improvements due to the partitioning of problem space based on location, where real-time data processing and aggregation is performed close to data sources. This in turn reduces the associated traffic in mobile core and will facilitate MCS deployments of massive scale. This paper proposes an edge computing architecture adequate for massive scale MCS services by placing key MCS features within the reference MEC architecture. In addition to improved performance, the proposed architecture decreases privacy threats and permits citizens to control the flow of contributed sensor data. It is adequate for both data analytics and real-time MCS scenarios, in line with the 5G vision to integrate a huge number of devices and enable innovative applications requiring low network latency. Our analysis of service overhead introduced by distributed architecture and service reconfiguration at network edge performed on real user traces shows that this overhead is controllable and small compared with the aforementioned benefits. When enhanced by interoperability concepts, the proposed architecture creates an environment for the establishment of an MCS marketplace for bartering and trading of both raw sensor data and aggregated/processed information.

Journal ArticleDOI
TL;DR: In this article, the authors use the vehicle's position (e.g., available via GPS) to query a multipath fingerprint database, which provides prior knowledge of potential pointing directions for reliable beam alignment.
Abstract: Efficient beam alignment is a crucial component in millimeter wave systems with analog beamforming, especially in fast-changing vehicular settings. This paper proposes to use the vehicle's position (e.g., available via GPS) to query a multipath fingerprint database, which provides prior knowledge of potential pointing directions for reliable beam alignment. The approach is the inverse of fingerprinting localization, where the measured multipath signature is compared to the fingerprint database to retrieve the most likely position. The power loss probability is introduced as a metric to quantify misalignment accuracy and is used for optimizing candidate beam selection. Two candidate beam selection methods are developed, where one is a heuristic while the other minimizes the misalignment probability. The proposed beam alignment is evaluated using realistic channels generated from a commercial ray-tracing simulator. Using the generated channels, an extensive investigation is provided, which includes the required measurement sample size to build an effective fingerprint, the impact of measurement noise, the sensitivity to changes in traffic density, and beam alignment overhead comparison with IEEE 802.11ad as the baseline. Using the concept of beam coherence time, which is the duration between two consecutive beam alignments, and parameters of IEEE 802.11ad, the overhead is compared in the mobility context. The results show that while the proposed approach provides increasing rates with larger antenna arrays, IEEE 802.11ad has decreasing rates due to the higher beam training overhead that eats up a large portion of the beam coherence time, which becomes shorter with increasing mobility.

Journal ArticleDOI
TL;DR: A novel identity-based key establishment protocol which employs elliptic curves at its core and has the lowest computational overhead among current secure protocols, especially at the meter side is proposed.
Abstract: Security is fundamental to the operation of smart grid and its advanced metering infrastructure (AMI). Key establishment is the core component in key management schemes and plays a crucial role in satisfying the security requirements. In recent years, many key establishment protocols have been proposed in the context of smart grids. However, their high overhead and complexity have undermined their ability to be practically employed in AMI. In this paper, we propose a novel identity-based key establishment protocol which employs elliptic curves at its core and has the lowest computational overhead among current secure protocols, especially at the meter side. We have given a detailed and comparative analysis of both security and computational burden for the proposed protocol as well as the previous efforts. We have evaluated the resilience of different protocols to well-known attacks by discussion, however, for the proposed protocol, we have also used AVISPA tool to formally verify its security. Compared to the previous solutions, the proposed protocol either is more secure or scores higher in performance benchmarks run on the same hardware.

Journal ArticleDOI
TL;DR: An efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, is proposed and implemented for big data classification; retaining intermediate results in distributed memory strengthens the learning ability of the SELM.
Abstract: As data sets become larger and more complicated, an extreme learning machine (ELM) that runs in a traditional serial environment cannot realize its ability to be fast and effective. Although a parallel ELM (PELM) based on MapReduce to process large-scale data shows more efficient learning speed than identical ELM algorithms in a serial environment, some operations, such as intermediate results stored on disks and multiple copies for each task, are indispensable, and these operations create a large amount of extra overhead and degrade the learning speed and efficiency of the PELMs. In this paper, an efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, is proposed for big data classification. By partitioning the corresponding data sets reasonably, the hidden layer output matrix calculation algorithm, matrix $\mathbf {\hat {U}}$ decomposition algorithm, and matrix $\mathbf {V}$ decomposition algorithm perform most of the computations locally. At the same time, they retain the intermediate results in distributed memory and cache the diagonal matrix as broadcast variables instead of several copies for each task to reduce a large amount of the costs, and these actions strengthen the learning ability of the SELM. Finally, we implement our SELM algorithm to classify large data sets. Extensive experiments have been conducted to validate the effectiveness of the proposed algorithms. As shown, our SELM achieves an $8.71\times$ speedup on a cluster with ten nodes, and reaches a $13.79\times$ speedup with 15 nodes, an $18.74\times$ speedup with 20 nodes, a $23.79\times$ speedup with 25 nodes, a $28.89\times$ speedup with 30 nodes, and a $33.81\times$ speedup with 35 nodes.
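The serial ELM core that the Spark subalgorithms parallelize boils down to a random hidden layer followed by a least-squares solve for the output weights. Here is a small numpy sketch of that core under assumed shapes; it is not the SELM partitioning itself, which distributes the hidden-layer matrix computation and the matrix decompositions across the cluster.

```python
import numpy as np

def elm_train(X, Y, hidden=256, seed=0):
    """X: (n, d) inputs, Y: (n, c) one-hot targets. Returns the trained parameters."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))   # random input weights, never updated
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                      # hidden layer output matrix
    beta = np.linalg.pinv(H) @ Y                # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```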

Proceedings ArticleDOI
01 Jan 2018
TL;DR: SpeedyMurmurs as discussed by the authors is a routing algorithm for decentralized path-based transactions in credit networks that uses embedding-based path discovery and on-demand efficient stabilization to handle the dynamics of a PBT network.
Abstract: Path-based transaction (PBT) networks, which settle payments from one user to another via a path of intermediaries, are a growing area of research. They overcome the scalability and privacy issues in cryptocurrencies like Bitcoin and Ethereum by replacing expensive and slow on-chain blockchain operations with inexpensive and fast off-chain transfers. In the form of credit networks such as Ripple and Stellar, they also enable low-price real-time gross settlements across different currencies. For example, SilentWhispers is a recently proposed fully distributed credit network relying on path-based transactions for secure and in particular private payments without a public ledger. At the core of a decentralized PBT network is a routing algorithm that discovers transaction paths between payer and payee. During the last year, a number of routing algorithms have been proposed. However, the existing ad hoc efforts lack either efficiency or privacy. In this work, we first identify several efficiency concerns in SilentWhispers. Armed with this knowledge, we design and evaluate SpeedyMurmurs, a novel routing algorithm for decentralized PBT networks using efficient and flexible embedding-based path discovery and on-demand efficient stabilization to handle the dynamics of a PBT network. Our simulation study, based on real-world data from the currently deployed Ripple credit network, indicates that SpeedyMurmurs reduces the overhead of stabilization by up to two orders of magnitude and the overhead of routing a transaction by more than a factor of two. Furthermore, using SpeedyMurmurs maintains at least the same success ratio as decentralized landmark routing, while providing lower delays. Finally, SpeedyMurmurs achieves key privacy goals for routing in PBT networks.

Journal ArticleDOI
TL;DR: Simulation results indicate that the secure MPC-based protocol can be a viable privacy-preserving data aggregation mechanism since it not only reduces the overhead with respect to FHE but also almost matches the performance of the Paillier cryptosystem when it is used within a proper sized AMI network.

Proceedings ArticleDOI
24 Jun 2018
TL;DR: The proposed DrAcc achieves high inference accuracy by implementing a ternary weight network using in-DRAM bit operation with simple enhancements, and can be flexibly configured for the best trade-off among performance, power and energy consumption, and DRAM data reuse factors.
Abstract: Modern Convolutional Neural Networks (CNNs) are computation and memory intensive. Thus it is crucial to develop hardware accelerators to achieve high performance as well as power/energy-efficiency on resource limited embedded systems. DRAM-based CNN accelerators exhibit great potentials but face inference accuracy and area overhead challenges. In this paper, we propose DrAcc, a novel DRAM-based processing-in-memory CNN accelerator. DrAcc achieves high inference accuracy by implementing a ternary weight network using in-DRAM bit operation with simple enhancements. The data partition and mapping strategies can be flexibly configured for the best trade-off among performance, power and energy consumption, and DRAM data reuse factors. Our experimental results show that DrAcc achieves 84.8 FPS (frame per second) at 2W and 2.9× power efficiency improvement over the process-near-memory design.
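The ternary weight network the accelerator implements maps each weight to {-α, 0, +α}, which is what makes in-DRAM bit operations sufficient for inference. The thresholding rule below is a common one used purely for illustration and is not necessarily the one DrAcc adopts.

```python
import numpy as np

def ternarize(w, delta_ratio=0.7):
    """Map a weight tensor to {-alpha, 0, +alpha} with a per-layer scale."""
    delta = delta_ratio * np.abs(w).mean()       # threshold below which weights become 0
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    nonzero = np.abs(t) > 0
    alpha = np.abs(w[nonzero]).mean() if nonzero.any() else 0.0
    return alpha * t                             # only {-alpha, 0, +alpha} remain

w = np.random.randn(64, 64).astype(np.float32)
wt = ternarize(w)
```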

Journal ArticleDOI
TL;DR: An interference aware resource allocation for NB-IoT is proposed by formulating the rate maximization problem considering the overhead of control channels, time offset, and repetition factor and it is shown through the simulation results that the cooperative scheme provides up to 8% rate improvement and 17% energy reduction as compared with the non-cooperative scheme.
Abstract: Narrowband Internet of Things (NB-IoT) is the prominent technology that fits the requirements of future IoT networks. However, due to the limited spectrum (i.e., 180 kHz) availability for NB-IoT systems, one of the key issues is how to efficiently use these resources to support massive IoT devices? Furthermore, in NB-IoT, to reduce the computation complexity and to provide coverage extension, the concept of time offset and repetition has been introduced. Considering these new features, the existing resource management schemes are no longer applicable. Moreover, the allocation of frequency band for NB-IoT within LTE band, or as a standalone, might not be synchronous in all the cells, resulting in intercell interference (ICI) from the neighboring cells’ LTE users or NB-IoT users (synchronous case). In this paper, first a theoretical framework for the upper bound on the achievable data rate is formulated in the presence of control channel and repetition factor. From the conducted analysis, it is shown that the maximum achievable data rates are 89.2 Kbps and 92 Kbps for downlink and uplink, respectively. Second, we propose an interference aware resource allocation for NB-IoT by formulating the rate maximization problem considering the overhead of control channels, time offset, and repetition factor. Due to the complexity of finding the globally optimum solution of the formulated problem, a sub-optimal solution with an iterative algorithm based on cooperative approaches is proposed. The proposed algorithm is then evaluated to investigate the impact of repetition factor, time offset and ICI on the NB-IoT data rate, and energy consumption. Furthermore, a detailed comparison between the non-cooperative, cooperative, and optimal scheme (i.e., no repetition) is also presented. It is shown through the simulation results that the cooperative scheme provides up to 8% rate improvement and 17% energy reduction as compared with the non-cooperative scheme.

Journal ArticleDOI
TL;DR: A bio-inspired and trust-based cluster head selection approach for WSN adopted in ITS applications and the results demonstrated that the proposed model achieved longer network lifetime, i.e., nodes are kept alive longer than what LEACH, SEP and DEEC can achieve.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: In this article, the authors focus on collaborative object detection and study the impact of both near-lossless and lossy compression of feature data on its accuracy, and propose a strategy for improving the accuracy under lossy feature compression.
Abstract: Recent studies have shown that the efficiency of deep neural networks in mobile applications can be significantly improved by distributing the computational workload between the mobile device and the cloud. This paradigm, termed collaborative intelligence, involves communicating feature data between the mobile and the cloud. The efficiency of such approach can be further improved by lossy compression of feature data, which has not been examined to date. In this work we focus on collaborative object detection and study the impact of both near-lossless and lossy compression of feature data on its accuracy. We also propose a strategy for improving the accuracy under lossy feature compression. Experiments indicate that using this strategy, the communication overhead can be reduced by up to 70% without sacrificing accuracy.
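As an example of the kind of lossy feature compression studied, the intermediate feature tensor can be min-max quantized to 8 bits on the mobile side and dequantized in the cloud before the rest of the network runs. The bit depth and codec here are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np

def quantize_features(f, bits=8):
    """Min-max quantize a feature tensor; payload is a uint8 tensor plus two floats."""
    lo, hi = float(f.min()), float(f.max())
    scale = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    q = np.round((f - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_features(q, lo, scale):
    """Cloud side reconstructs an approximation before running the tail of the network."""
    return q.astype(np.float32) * scale + lo

f = np.random.randn(1, 32, 32, 256).astype(np.float32)   # example intermediate feature map
q, lo, scale = quantize_features(f)
f_hat = dequantize_features(q, lo, scale)
```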

Journal ArticleDOI
Qiang Ye, Weihua Zhuang, Shan Zhang, A-Long Jin, Xuemin Shen, Xu Li
TL;DR: It is demonstrated that the proposed radio resource slicing framework outperforms the two other resource slicing schemes in terms of low communication overhead, high spectrum utilization, and high aggregate network utility.
Abstract: In this paper, a dynamic radio resource slicing framework is presented for a two-tier heterogeneous wireless network. Through software-defined networking-enabled wireless network function virtualization, radio spectrum resources of heterogeneous wireless networks are partitioned into different bandwidth slices for different base stations (BSs). This framework facilitates spectrum sharing among heterogeneous BSs and achieves differentiated quality-of-service (QoS) provisioning for data service and machine-to-machine service in the presence of network load dynamics. To determine the set of optimal bandwidth slicing ratios and optimal BS-device association patterns, a network utility maximization problem is formulated with the consideration of different traffic statistics and QoS requirements, location distribution for end devices, varying device locations, load conditions in each cell, and intercell interference. For tractability, the optimization problem is transformed to a biconcave maximization problem. An alternative concave search (ACS) algorithm is then designed to solve for a set of partial optimal solutions. Simulation results verify the convergence property and display low complexity of the ACS algorithm. It is demonstrated that the proposed radio resource slicing framework outperforms the two other resource slicing schemes in terms of low communication overhead, high spectrum utilization, and high aggregate network utility.

Journal ArticleDOI
TL;DR: An intelligent control strategy to obtain the solution of the corresponding Hamilton–Jacobi–Bellman equation is established and neural networks are employed to serve as a necessary component to the control system, which exhibits strong online learning ability.
Abstract: In this paper, for achieving the discounted optimal feedback stabilization of a nonlinear overhead crane system, we establish an intelligent control strategy to obtain the solution of the corresponding Hamilton–Jacobi–Bellman equation. Specifically, neural networks are employed to serve as a necessary component to the control system, which exhibits strong online learning ability. A novel updating rule compared to the traditional adaptive critic algorithms is developed, which eliminates the requirement of the initial stabilizing controller and brings in unique advantages to the adaptive critic control design. Stability analysis of the closed-loop system based on the well-known Lyapunov approach and experimental simulation considering the nonlinear overhead dynamics with different case studies are performed to verify the effectiveness of the present control method both in theory and applications.