
Showing papers on "Overhead (computing)" published in 2016


Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this paper, both filter kernels in convolutional layers and weight matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response.
Abstract: Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high-performance hardware is typically indispensable for the application of CNN models due to their high computational complexity, which hinders their wider deployment. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4-6× speed-up and 15-20× compression with merely about one percentage point loss of classification accuracy. With our quantized CNN model, even mobile devices can accurately classify images within one second.
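
For readers unfamiliar with weight quantization, the sketch below quantizes a toy fully-connected weight matrix with plain product quantization (k-means over sub-vectors) and reports the resulting response error and compression. It is a minimal illustration of the general idea, not the paper's response-error-minimizing algorithm, and the layer sizes, segment count, and codebook size are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully-connected layer: 512 inputs, 128 outputs (sizes are illustrative).
W = rng.standard_normal((512, 128)).astype(np.float32)
x = rng.standard_normal((64, 512)).astype(np.float32)   # a batch of layer inputs

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns (codebook, assignments)."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(0)
    return centers, assign

# Product quantization: split the input dimension into segments and quantize
# each segment's column sub-vectors against a small shared codebook.
segments, k = 8, 32
d_sub = W.shape[0] // segments
Wq = np.empty_like(W)
for s in range(segments):
    rows = slice(s * d_sub, (s + 1) * d_sub)
    codebook, assign = kmeans(W[rows, :].T, k)    # 128 sub-vectors of length 64
    Wq[rows, :] = codebook[assign].T

# How much does quantization perturb the layer's response, and how much is saved?
rel_err = np.linalg.norm(x @ W - x @ Wq) / np.linalg.norm(x @ W)
bits_q = W.shape[1] * segments * np.log2(k) + segments * k * d_sub * 32
print(f"relative response error: {rel_err:.3f}, "
      f"compression: {W.size * 32 / bits_q:.1f}x")
```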

902 citations


Journal ArticleDOI
18 Jun 2016
TL;DR: Cnvlutin (CNV) is a value-based approach to hardware acceleration that eliminates most ineffectual operations (multiplications by a zero operand), improving performance and energy over a state-of-the-art accelerator with no accuracy loss.
Abstract: This work observes that a large fraction of the computations performed by Deep Neural Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of the inputs is zero. This observation motivates Cnvlutin (CNV), a value-based approach to hardware acceleration that eliminates most of these ineffectual operations, improving performance and energy over a state-of-the-art accelerator with no accuracy loss. CNV uses hierarchical data-parallel units, allowing groups of lanes to proceed mostly independently enabling them to skip over the ineffectual computations. A co-designed data storage format encodes the computation elimination decisions taking them off the critical path while avoiding control divergence in the data parallel units. Combined, the units and the data storage format result in a data-parallel architecture that maintains wide, aligned accesses to its memory hierarchy and that keeps its data lanes busy. By loosening the ineffectual computation identification criterion, CNV enables further performance and energy efficiency improvements, and more so if a loss in accuracy is acceptable. Experimental measurements over a set of state-of-the-art DNNs for image classification show that CNV improves performance over a state-of-the-art accelerator from 1.24× to 1.55× and by 1.37× on average without any loss in accuracy by removing zero-valued operand multiplications alone. While CNV incurs an area overhead of 4.49%, it improves overall EDP (Energy Delay Product) and ED2P (Energy Delay Squared Product) on average by 1.47× and 2.01×, respectively. The average performance improvements increase to 1.52× without any loss in accuracy with a broader ineffectual identification policy. Further improvements are demonstrated with a loss in accuracy.
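
A software analogue of the zero-skipping observation (only an illustration of why skipping zero-valued operands saves work, not a model of the CNV hardware; all sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Post-ReLU activations: roughly half of the entries are exactly zero.
acts = np.maximum(rng.standard_normal(4096), 0.0)
weights = rng.standard_normal(4096)

# A dense evaluation schedules one MAC per (activation, weight) pair ...
dense_macs = acts.size

# ... whereas a value-based scheme only schedules MACs for non-zero activations.
nonzero = np.flatnonzero(acts)          # positions of "effectual" work
sparse_macs = nonzero.size
result = np.dot(acts[nonzero], weights[nonzero])

assert np.isclose(result, np.dot(acts, weights))   # identical numerical result
print(f"ineffectual MACs skipped: {1 - sparse_macs / dense_macs:.1%}")
```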

687 citations


Journal ArticleDOI
TL;DR: In this paper, the authors make the case that mmWave communication is the only viable approach for high bandwidth connected vehicles and highlight the motivations and challenges associated with using mmWave for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) applications.
Abstract: As driving becomes more automated, vehicles are being equipped with more sensors generating even higher data rates. Radars are used for object detection, visual cameras as virtual mirrors, and LIDARs for generating high resolution depth associated range maps, all to enhance the safety and efficiency of driving. Connected vehicles can use wireless communication to exchange sensor data, allowing them to enlarge their sensing range and improve automated driving functions. Unfortunately, conventional technologies, such as DSRC and 4G cellular communication, do not support the gigabit-per-second data rates that would be required for raw sensor data exchange between vehicles. This article makes the case that mmWave communication is the only viable approach for high bandwidth connected vehicles. The motivations and challenges associated with using mmWave for vehicle-to-vehicle and vehicle-to-infrastructure applications are highlighted. A high-level solution to one key challenge - the overhead of mmWave beam training - is proposed. The critical feature of this solution is to leverage information derived from the sensors or DSRC as side information for the mmWave communication link configuration. Examples and simulation results show that the beam alignment overhead can be reduced by using position information obtained from DSRC.
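
A minimal sketch of how position information (e.g., reported over DSRC) could prune the beam-training search; the codebook granularity and the +/-10 degree uncertainty window are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

def candidate_beams(tx_pos, rx_pos, codebook_angles_deg, position_error_deg=10.0):
    """Return the subset of beam indices worth sounding, given a coarse
    angle-of-departure estimate derived from the two vehicles' positions."""
    dx, dy = rx_pos[0] - tx_pos[0], rx_pos[1] - tx_pos[1]
    aod = np.degrees(np.arctan2(dy, dx))                  # coarse pointing direction
    diff = np.abs((codebook_angles_deg - aod + 180) % 360 - 180)
    return np.flatnonzero(diff <= position_error_deg)

# 64-beam codebook covering 360 degrees; exhaustive training sounds all 64 beams,
# position-aided training only sounds the few beams near the estimated direction.
codebook = np.arange(64) * (360 / 64)
cands = candidate_beams((0.0, 0.0), (80.0, 30.0), codebook)
print(f"beams to sound: {cands.size} of {codebook.size}")
```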

638 citations


Proceedings ArticleDOI
11 Apr 2016
TL;DR: Experiments show that DeepX can allow even large-scale deep learning models to execute efficiently on modern mobile processors and significantly outperform existing solutions, such as cloud-based offloading.
Abstract: Breakthroughs from the field of deep learning are radically changing how sensor data are interpreted to extract the high-level information needed by mobile apps. It is critical that the gains in inference accuracy that deep models afford become embedded in future generations of mobile apps. In this work, we present the design and implementation of DeepX, a software accelerator for deep learning execution. DeepX significantly lowers the device resources (viz. memory, computation, energy) required by deep learning, which currently act as a severe bottleneck to mobile adoption. The foundation of DeepX is a pair of resource control algorithms, designed for the inference stage of deep learning, that: (1) decompose monolithic deep model network architectures into unit-blocks of various types that are then more efficiently executed by heterogeneous local device processors (e.g., GPUs, CPUs); and (2) perform principled resource scaling that adjusts the architecture of deep models to shape the overhead each unit-block introduces. Experiments show that DeepX can allow even large-scale deep learning models to execute efficiently on modern mobile processors and significantly outperform existing solutions, such as cloud-based offloading.

442 citations


Proceedings Article
16 Mar 2016
TL;DR: Ernest is a performance prediction framework for large-scale analytics; evaluation on Amazon EC2 using several workloads shows that its prediction error is low while its training overhead is less than 5% for long-running jobs.
Abstract: Recent workload trends indicate rapid growth in the deployment of machine learning, genomics and scientific workloads on cloud computing infrastructure. However, efficiently running these applications on shared infrastructure is challenging and we find that choosing the right hardware configuration can significantly improve performance and cost. The key to address the above challenge is having the ability to predict performance of applications under various resource configurations so that we can automatically choose the optimal configuration. Our insight is that a number of jobs have predictable structure in terms of computation and communication. Thus we can build performance models based on the behavior of the job on small samples of data and then predict its performance on larger datasets and cluster sizes. To minimize the time and resources spent in building a model, we use optimal experiment design, a statistical technique that allows us to collect as few training points as required. We have built Ernest, a performance prediction framework for large scale analytics and our evaluation on Amazon EC2 using several workloads shows that our prediction error is low while having a training overhead of less than 5% for long-running jobs.
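
As a rough sketch of the modeling approach, the snippet below fits communication/computation features of the kind Ernest describes with non-negative least squares on a handful of small sample runs and extrapolates to a larger cluster. The feature map follows the paper's description; the run times, cluster sizes, and data fractions are made up for illustration.

```python
import numpy as np
from scipy.optimize import nnls

def features(machines, scale):
    """Ernest-style feature map: a constant term, a serial/communication term,
    a log term, and a linear-in-machines term."""
    return np.array([1.0, scale / machines, np.log(machines), machines])

# Hypothetical training runs on small samples of the data (times in seconds).
runs = [  # (machines, fraction of data, measured time)
    (2, 0.125, 30.1), (4, 0.125, 17.2), (4, 0.25, 31.0),
    (8, 0.25, 18.4), (8, 0.5, 33.9), (16, 0.5, 20.7),
]
A = np.array([features(m, s) for m, s, _ in runs])
b = np.array([t for _, _, t in runs])

theta, _ = nnls(A, b)          # non-negative least squares keeps the model physical

# Extrapolate: running time of the full dataset on a larger cluster.
print("predicted time on 64 machines, full data:",
      round(features(64, 1.0) @ theta, 1), "s")
```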

401 citations


Journal ArticleDOI
TL;DR: In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation, and the required communication overhead.
Abstract: This article presents a powerful algorithmic framework for big data optimization, called the block successive upper-bound minimization (BSUM). The BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the block coordinate descent (BCD) method, the convex-concave procedure (CCCP) method, the block coordinate proximal gradient (BCPG) method, the nonnegative matrix factorization (NMF) method, the expectation maximization (EM) method, etc. In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation, and the required communication overhead. Illustrative examples from networking, signal processing, and machine learning are presented to demonstrate the practical performance of the BSUM framework.
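
A minimal sketch of the successive upper-bound idea for its simplest special case (block coordinate proximal gradient on a least-squares objective): each block update exactly minimizes a quadratic upper bound of the objective over that block. The problem sizes and block split below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 60))
b = rng.standard_normal(200)

# f(x) = 0.5 * ||Ax - b||^2, with x split into 3 blocks of 20 variables each.
blocks = [slice(0, 20), slice(20, 40), slice(40, 60)]
x = np.zeros(60)

for it in range(100):
    for blk in blocks:
        # Quadratic upper bound of f in block `blk` around the current x, with
        # curvature L = ||A_blk||_2^2 (the block-Lipschitz constant of the gradient).
        A_blk = A[:, blk]
        grad = A_blk.T @ (A @ x - b)
        L = np.linalg.norm(A_blk, 2) ** 2
        x[blk] -= grad / L            # exact minimizer of the block surrogate

print("BSUM/BCD objective:   ", 0.5 * np.linalg.norm(A @ x - b) ** 2)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("least-squares optimum:", 0.5 * np.linalg.norm(A @ x_star - b) ** 2)
```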

383 citations


Journal ArticleDOI
TL;DR: This work proposes a method for introducing independent random single-qubit gates into the logical circuit in such a way that the effective logical circuit remains unchanged and proves that this randomization tailors the noise into stochastic Pauli errors, which can dramatically reduce error rates while introducing little or no experimental overhead.
Abstract: Quantum computers are poised to radically outperform their classical counterparts by manipulating coherent quantum systems. A realistic quantum computer will experience errors due to the environment and imperfect control. When these errors are even partially coherent, they present a major obstacle to performing robust computations. Here, we propose a method for introducing independent random single-qubit gates into the logical circuit in such a way that the effective logical circuit remains unchanged. We prove that this randomization tailors the noise into stochastic Pauli errors, which can dramatically reduce error rates while introducing little or no experimental overhead. Moreover, we prove that our technique is robust to the inevitable variation in errors over the randomizing gates and numerically illustrate the dramatic reductions in worst-case error that are achievable. Given such tailored noise, gates with significantly lower fidelity, comparable to fidelities realized in current experiments, are sufficient to achieve fault-tolerant quantum computation. Furthermore, the worst-case error rate of the tailored noise can be directly and efficiently measured through randomized benchmarking protocols, enabling a rigorous certification of the performance of a quantum computer.

331 citations


Proceedings Article
16 Mar 2016
TL;DR: The key idea of FlowRadar is to encode per-flow counters with a small memory and constant insertion time at switches, and then to leverage the computing power at the remote collector to perform network-wide decoding and analysis of the flow counters.
Abstract: NetFlow has been a widely used monitoring tool with a variety of applications. NetFlow maintains an active working set of flows in a hash table that supports flow insertion, collision resolution, and flow removal. This is hard to implement in merchant silicon at data center switches, which has limited per-packet processing time. Therefore, many NetFlow implementations and other monitoring solutions have to sample or select a subset of packets to monitor. In this paper, we observe the need to monitor all the flows without sampling in short time scales. Thus, we design FlowRadar, a new way to maintain flows and their counters that scales to a large number of flows with small memory and bandwidth overhead. The key idea of FlowRadar is to encode per-flow counters with a small memory and constant insertion time at switches, and then to leverage the computing power at the remote collector to perform network-wide decoding and analysis of the flow counters. Our evaluation shows that the memory usage of FlowRadar is close to traditional NetFlow with perfect hashing. With FlowRadar, operators can get better views into their networks as demonstrated by two new monitoring applications we build on top of FlowRadar.
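
A toy illustration of the encode-at-switch / decode-at-collector split, in the spirit of invertible Bloom lookup tables (FlowRadar's actual data structure and parameters differ); the table sizes, hash count, and flow IDs below are made up:

```python
import hashlib

class FlowEncoder:
    """Constant-time insertion at the 'switch', peeling-based decoding at the
    'collector'. Illustrative only, not FlowRadar's exact encoding."""

    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.count = [0] * m          # number of flows hashed into the cell
        self.flow_xor = [0] * m       # XOR of flow identifiers
        self.pkt_sum = [0] * m        # sum of packet counters

    def _cells(self, flow_id):
        return [int(hashlib.sha1(f"{i}:{flow_id}".encode()).hexdigest(), 16) % self.m
                for i in range(self.k)]

    def insert(self, flow_id, packets):          # O(k) work per packet/flow update
        for c in self._cells(flow_id):
            self.count[c] += 1
            self.flow_xor[c] ^= flow_id
            self.pkt_sum[c] += packets

    def decode(self):                            # done at the collector
        flows, changed = {}, True
        while changed:
            changed = False
            for c in range(self.m):
                if self.count[c] == 1:           # "pure" cell: exactly one flow left
                    fid, pkts = self.flow_xor[c], self.pkt_sum[c]
                    flows[fid] = pkts
                    for cc in self._cells(fid):  # peel the flow out everywhere
                        self.count[cc] -= 1
                        self.flow_xor[cc] ^= fid
                        self.pkt_sum[cc] -= pkts
                    changed = True
        return flows

enc = FlowEncoder()
for fid, pkts in [(101, 7), (202, 3), (303, 12)]:
    enc.insert(fid, pkts)
print(enc.decode())   # all three flows and their counters recovered exactly
```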

285 citations


Journal ArticleDOI
TL;DR: A general overview of the current low-rank channel estimation approaches is provided, including their basic assumptions, key results, as well as pros and cons on addressing the aforementioned tricky challenges.
Abstract: Massive multiple-input multiple-output is a promising physical layer technology for 5G wireless communications due to its capability of high spectrum and energy efficiency, high spatial resolution, and simple transceiver design. To embrace its potential gains, the acquisition of channel state information is crucial, which unfortunately faces a number of challenges, such as the uplink pilot contamination, the overhead of downlink training and feedback, and the computational complexity. In order to reduce the effective channel dimensions, researchers have been investigating the low-rank (sparse) properties of channel environments from different viewpoints. This paper then provides a general overview of the current low-rank channel estimation approaches, including their basic assumptions, key results, as well as pros and cons on addressing the aforementioned tricky challenges. Comparisons among all these methods are provided for better understanding and some future research prospects for these low-rank approaches are also forecasted.

265 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: FireCaffe is presented, which successfully scales deep neural network training across a cluster of GPUs, and finds that reduction trees are more efficient and scalable than the traditional parameter server approach.
Abstract: Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms are almost always limited by the overhead of communicating between servers, and DNN training is no exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers – Infiniband or Cray interconnects are ideal for this. Second, we consider a number of communication algorithms, and we find that reduction trees are more efficient and scalable than the traditional parameter server approach. Third, we optionally increase the batch size to reduce the total quantity of communication during DNN training, and we identify hyperparameters that allow us to reproduce the small-batch accuracy while training with large batch sizes. When training GoogLeNet and Network-in-Network on ImageNet, we achieve a 47x and 39x speedup, respectively, when training on a cluster of 128 GPUs.
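
A back-of-envelope model of why reduction trees scale better than a single parameter server (purely illustrative numbers, not measurements from the paper): the server's link serializes transfers from every worker, while the tree's critical path grows only logarithmically.

```python
def parameter_server_time(workers, grad_bytes, bw_bytes_per_s):
    """All workers send gradients to one server, which sends the model back:
    the server link serializes 2 * workers transfers of grad_bytes each."""
    return 2 * workers * grad_bytes / bw_bytes_per_s

def reduction_tree_time(workers, grad_bytes, bw_bytes_per_s, fan_in=2):
    """Gradients are summed up a tree and the result broadcast back down:
    about 2 * depth sequential transfers of grad_bytes each."""
    depth, n = 0, 1
    while n < workers:
        n *= fan_in
        depth += 1
    return 2 * depth * grad_bytes / bw_bytes_per_s

# Illustrative numbers: 50 MB of gradients, 5 GB/s links.
for n in (8, 32, 128):
    ps = parameter_server_time(n, 50e6, 5e9)
    tree = reduction_tree_time(n, 50e6, 5e9)
    print(f"{n:4d} workers: parameter server {ps*1e3:7.1f} ms, "
          f"reduction tree {tree*1e3:6.1f} ms")
```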

251 citations


Proceedings ArticleDOI
Yuan Zhifeng, Guanghui Yu, Weimin Li, Yifei Yuan, Wang Xinhui, Jun Xu
15 May 2016
TL;DR: A new type of non-orthogonal multiple access scheme called multi-user shared access (MUSA) is proposed to support IoT and can achieve significant gain in user overloading performance compared to orthogonal systems, while incurring much lower control overhead.
Abstract: Internet of things (IoT) is widely expected to be an important scenario in the fifth generation (5G) wireless network. Major challenges of IoT include the low cost of devices, low energy consumption, low latency, and the ability to support a large number of simultaneous connections. In this article, a new type of non-orthogonal multiple access scheme called multi-user shared access (MUSA) is proposed to support IoT. MUSA adopts a grant-free access strategy to simplify the access procedure significantly and utilizes advanced code-domain non-orthogonal complex spreading to accommodate a massive number of users in the same radio resources. A family of short complex sequences is chosen as the spreading sequences for their ability to enable simple and robust successive interference cancellation at the base station side and to cope with high user load. Simulation results show that MUSA can achieve significant gain in user overloading performance compared to orthogonal systems, while incurring much lower control overhead.

Proceedings ArticleDOI
05 Jul 2016
TL;DR: This paper designs, implements and evaluates MOCA, a protocol for Mobility resilience and Overhead Constrained Adaptation for directional 60 GHz links, and introduces Beam Sounding as a mechanism invoked before each data transmission to estimate the link quality for selected beams, and identify and adapt to link impairments.
Abstract: High directivity of 60 GHz links introduces new link training and adaptation challenges due to both client and environmental mobility. In this paper, we design, implement and evaluate MOCA, a protocol for Mobility resilience and Overhead Constrained Adaptation for directional 60 GHz links. Since mobility-induced link blockage and misalignment cannot be countered with data rate adaptation alone, we introduce Beam Sounding as a mechanism invoked before each data transmission to estimate the link quality for selected beams, and identify and adapt to link impairments. We devise proactive techniques to restore broken directional links with low overhead and design a mechanism to jointly adapt beamwidth and data rate, targeting throughput maximization that incorporates data rate, overhead for beam alignment, and mobility resilience. We implement a programmable node and testbed using software defined radios with commercial 60 GHz transceivers, and conduct an extensive over-the-air measurement study to collect channel traces for various environments. Based on trace based emulations and the IEEE 802.11ad channel model, we evaluate MOCA under a variety of propagation environments and mobility scenarios. Our experiments show that MOCA achieves up to 2x throughput gains compared to a baseline WLAN scheme in a diverse set of operational conditions.

Book ChapterDOI
08 Oct 2016
TL;DR: In this article, the authors proposed a new dataset with one million pairs of street view and overhead images sampled from eleven U.S. cities and explored several deep CNN architectures for cross-domain matching.
Abstract: In this paper we aim to determine the location and orientation of a ground-level query image by matching to a reference database of overhead (e.g. satellite) images. For this task we collect a new dataset with one million pairs of street view and overhead images sampled from eleven U.S. cities. We explore several deep CNN architectures for cross-domain matching – Classification, Hybrid, Siamese, and Triplet networks. Classification and Hybrid architectures are accurate but slow since they allow only partial feature precomputation. We propose a new loss function which significantly improves the accuracy of Siamese and Triplet embedding networks while maintaining their applicability to large-scale retrieval tasks like image geolocalization. This image matching task is challenging not just because of the dramatic viewpoint difference between ground-level and overhead imagery but because the orientation (i.e. azimuth) of the street views is unknown making correspondence even more difficult. We examine several mechanisms to match in spite of this – training for rotation invariance, sampling possible rotations at query time, and explicitly predicting relative rotation of ground and overhead images with our deep networks. It turns out that explicit orientation supervision also improves location prediction accuracy. Our best performing architectures are roughly 2.5 times as accurate as the commonly used Siamese network baseline.
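
For context, the snippet below shows a plain margin-based triplet loss on L2-normalized embeddings, the baseline objective this family of networks starts from; the paper's improved loss, margins, and embedding dimensions are not reproduced here, and the toy vectors are synthetic.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull the matching overhead embedding closer to the street-view anchor
    than any non-matching one, by at least `margin` (margin is illustrative)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

rng = np.random.default_rng(3)
def normed(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

street = normed(rng.standard_normal((16, 128)))                   # ground-level embeddings
aerial_match = normed(street + 0.1 * rng.standard_normal((16, 128)))   # matching overhead views
aerial_other = normed(rng.standard_normal((16, 128)))             # non-matching overhead views
print("loss:", triplet_loss(street, aerial_match, aerial_other))
```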

Journal ArticleDOI
TL;DR: By exploiting the temporal correlation of active user sets, a dynamic compressive sensing (DCS)-based multi-user detection (MUD) to realize both user activity and data detection in several continuous time slots is proposed.
Abstract: Non-orthogonal multiple access (NOMA) can support more users than orthogonal multiple access (OMA) techniques using the same wireless resources, which is expected to support massive connectivity for the Internet of Things in 5G. Furthermore, in order to reduce the transmission latency and signaling overhead, grant-free transmission is highly expected in uplink NOMA systems, where user activity has to be detected. In this letter, by exploiting the temporal correlation of active user sets, we propose a dynamic compressive sensing (DCS)-based multi-user detection (MUD) to realize both user activity and data detection in several continuous time slots. In particular, as the temporal correlation of the active user sets between adjacent time slots exists, we can use the estimated active user set in the current time slot as the prior information to estimate the active user set in the next time slot. Simulation results show that the proposed DCS-based MUD can achieve much better performance than that of the conventional CS-based MUD in NOMA systems.
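
A simplified stand-in for the idea of seeding sparse detection with the previous slot's active-user set: greedy recovery initialized from the prior and then pruned to the strongest users. The spreading matrix, user counts, and noise level below are arbitrary, and this is not the letter's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
n_users, n_res, n_active = 40, 16, 4

# Random complex spreading signatures, one (roughly unit-norm) column per user.
Phi = (rng.standard_normal((n_res, n_users)) +
       1j * rng.standard_normal((n_res, n_users))) / np.sqrt(2 * n_res)

def detect(Phi, y, n_active, prior_support=()):
    """Greedy recovery seeded with the previous slot's active set, then pruned
    to the n_active strongest users so stale prior entries can drop out."""
    support = list(dict.fromkeys(prior_support))
    while len(support) < n_active + 2:              # gather a few extra candidates
        if support:
            coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            resid = y - Phi[:, support] @ coef
        else:
            resid = y
        corr = np.abs(Phi.conj().T @ resid)
        corr[support] = 0
        support.append(int(np.argmax(corr)))
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    keep = np.argsort(np.abs(coef))[-n_active:]
    return {support[i] for i in keep}

qpsk = lambda k: rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], k) / np.sqrt(2)
active1 = rng.choice(n_users, n_active, replace=False)
new_user = next(u for u in range(n_users) if u not in active1)
active2 = np.append(active1[1:], new_user)          # one user leaves, one joins

prior = ()
for slot, active in [(1, active1), (2, active2)]:
    s = np.zeros(n_users, dtype=complex)
    s[active] = qpsk(n_active)
    y = Phi @ s + 0.01 * (rng.standard_normal(n_res) + 1j * rng.standard_normal(n_res))
    est = detect(Phi, y, n_active, prior_support=prior)
    print(f"slot {slot}: true {sorted(active.tolist())}, detected {sorted(est)}")
    prior = tuple(est)                               # temporal correlation: reuse as prior
```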

Journal ArticleDOI
TL;DR: A structured compressive sensing (SCS)-based spatio-temporal joint channel estimation scheme is proposed to reduce the required pilot overhead; it is capable of approaching the optimal oracle least squares estimator.
Abstract: Massive MIMO is a promising technique for future 5G communications due to its high spectrum and energy efficiency. To realize its potential performance gain, accurate channel estimation is essential. However, due to massive number of antennas at the base station (BS), the pilot overhead required by conventional channel estimation schemes will be unaffordable, especially for frequency division duplex (FDD) massive MIMO. To overcome this problem, we propose a structured compressive sensing (SCS)-based spatio-temporal joint channel estimation scheme to reduce the required pilot overhead, whereby the spatio-temporal common sparsity of delay-domain MIMO channels is leveraged. Particularly, we first propose the nonorthogonal pilots at the BS under the framework of CS theory to reduce the required pilot overhead. Then, an adaptive structured subspace pursuit (ASSP) algorithm at the user is proposed to jointly estimate channels associated with multiple OFDM symbols from the limited number of pilots, whereby the spatio-temporal common sparsity of MIMO channels is exploited to improve the channel estimation accuracy. Moreover, by exploiting the temporal channel correlation, we propose a space-time adaptive pilot scheme to further reduce the pilot overhead. Additionally, we discuss the proposed channel estimation scheme in multicell scenario. Simulation results demonstrate that the proposed scheme can accurately estimate channels with the reduced pilot overhead, and it is capable of approaching the optimal oracle least squares estimator.

Book ChapterDOI
19 Sep 2016
TL;DR: This work presents CloudRadar, a system to detect, and hence mitigate, cache-based side-channel attacks in multi-tenant cloud systems, designed as a lightweight patch to existing cloud systems that does not require new hardware support or any hypervisor, operating system, or application modifications.
Abstract: We present CloudRadar, a system to detect, and hence mitigate, cache-based side-channel attacks in multi-tenant cloud systems. CloudRadar operates by correlating two events: first, it exploits signature-based detection to identify when the protected virtual machine (VM) executes a cryptographic application; at the same time, it uses anomaly-based detection techniques to monitor the co-located VMs to identify abnormal cache behaviors that are typical during cache-based side-channel attacks. We show that correlation in the occurrence of these two events offers strong evidence of side-channel attacks. Compared to other work on side-channel defenses, CloudRadar has the following advantages: first, CloudRadar focuses on the root causes of cache-based side-channel attacks and hence is hard to evade using metamorphic attack code, while maintaining a low false positive rate. Second, CloudRadar is designed as a lightweight patch to existing cloud systems, which does not require new hardware support or any hypervisor, operating system, or application modifications. Third, CloudRadar provides real-time protection and can detect side-channel attacks within the order of milliseconds. We demonstrate a prototype implementation of CloudRadar in the OpenStack cloud framework. Our evaluation suggests CloudRadar achieves negligible performance overhead with high detection accuracy.

Journal ArticleDOI
01 Dec 2016
TL;DR: This paper analyzes three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information, and examines how the detection systems behave with a modified version of one of the spy processes.
Abstract: Highlights: Three methods for detecting a class of cache-based side-channel attacks are proposed. A new tool (quickhpc) for probing hardware performance counters at a higher temporal resolution than existing tools is presented. The first method is based on correlation; the other two use machine learning techniques and reach a minimum F-score of 0.93. A smarter attack is devised that is capable of circumventing the first method. In this paper we analyze three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information. Two of the three methods are based on machine learning techniques, and all three can successfully detect an attack in about one fifth of the time required to complete it. We observed no false positives in our test environment, and the overhead caused by the detection systems is negligible. We also analyze how the detection systems behave with a modified version of one of the spy processes. With some optimization we are confident these systems can be used in real-world scenarios.
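
A toy illustration of the correlation-based flavor of detection (the first of the three methods, per the highlights): flag a co-resident process whose cache activity trace tracks the victim's. The traces, counters, and the 0.8 threshold are invented for illustration and are not the paper's data or tuning.

```python
import numpy as np

def correlation_detector(victim_llc_misses, suspect_llc_accesses, threshold=0.8):
    """Flag a potential Prime+Probe-style spy when the co-resident process's
    cache activity tracks the victim's miss pattern."""
    v = (victim_llc_misses - victim_llc_misses.mean()) / victim_llc_misses.std()
    s = (suspect_llc_accesses - suspect_llc_accesses.mean()) / suspect_llc_accesses.std()
    r = float(np.mean(v * s))            # Pearson correlation of the two traces
    return r, r >= threshold

rng = np.random.default_rng(5)
victim = rng.poisson(40, 500).astype(float)      # per-interval LLC-miss counts
spy = victim * 3 + rng.normal(0, 5, 500)         # spy probes what the victim evicts
benign = rng.poisson(120, 500).astype(float)     # unrelated memory-heavy workload

print("spy:   ", correlation_detector(victim, spy))
print("benign:", correlation_detector(victim, benign))
```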

Journal ArticleDOI
TL;DR: It is shown that the distributed scheme is effective for resource allocation and can protect the CUs with limited signaling overhead; the signaling overhead of the centralized and decentralized schemes is also compared.
Abstract: This paper addresses the joint spectrum sharing and power allocation problem for device-to-device (D2D) communications underlaying a cellular network (CN). In the context of orthogonal frequency-division multiple-access systems, with the uplink resources shared with D2D links, both centralized and decentralized methods are proposed. Assuming global channel state information (CSI), the resource allocation problem is first formulated as a nonconvex optimization problem, which is solved using convex approximation techniques. We prove that the approximation method converges to a suboptimal solution and is often very close to the global optimal solution. On the other hand, by exploiting the decentralized network structure with only local CSI at each node, the Stackelberg game model is then adopted to devise a distributed resource allocation scheme. In this game-theoretic model, the base station (BS), which is modeled as the leader, coordinates the interference from the D2D transmission to the cellular users (CUs) by pricing the interference. Subsequently, the D2D pairs, as followers, compete for the spectrum in a noncooperative fashion. Sufficient conditions for the existence of the Nash equilibrium (NE) and the uniqueness of the solution are presented, and an iterative algorithm is proposed to solve the problem. In addition, the signaling overhead is compared between the centralized and decentralized schemes. Finally, numerical results are presented to verify the proposed schemes. It is shown that the distributed scheme is effective for the resource allocation and could protect the CUs with limited signaling overhead.

Proceedings ArticleDOI
18 Jun 2016
TL;DR: The GraphBLAS standard as discussed by the authors defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments.
Abstract: The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
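
To make the "graph algorithms as matrix operations" point concrete, here is breadth-first search written as repeated vector-matrix products over an adjacency matrix; numpy's (+, *) product with a threshold stands in for the Boolean semiring a GraphBLAS implementation would use, and the graph itself is a made-up example.

```python
import numpy as np

# Adjacency matrix of a small directed graph (entry [i, j] = 1 means edge i -> j).
A = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
])

def bfs_levels(A, source):
    """Breadth-first search as repeated vector-matrix products, the style of
    matrix-based graph algorithm the GraphBLAS is designed to express."""
    n = A.shape[0]
    frontier = np.zeros(n, dtype=bool); frontier[source] = True
    visited = frontier.copy()
    level = np.full(n, -1); level[source] = 0
    depth = 0
    while frontier.any():
        depth += 1
        frontier = ((frontier @ A) > 0) & ~visited   # advance the frontier one hop
        level[frontier] = depth
        visited |= frontier
    return level

print(bfs_levels(A, 0))   # -> [0 1 1 2 3]
```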

Journal ArticleDOI
TL;DR: This letter is the first attempt to conflate a machine learning technique with wireless communications and provides insight into the potential of fusion of machine learning and wireless communications.
Abstract: This letter is the first attempt to conflate a machine learning technique with wireless communications. Through interpreting antenna selection (AS) in wireless communications (i.e., an optimization-driven decision) as multiclass-classification learning (i.e., data-driven prediction), and through comparing the learning-based AS using k-nearest neighbors (k-NN) and support vector machine (SVM) algorithms with conventional optimization-driven AS methods in terms of communications performance, computational complexity, and feedback overhead, we provide insight into the potential of fusing machine learning and wireless communications.
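
A minimal sketch of casting antenna selection as multiclass classification: label each channel realization with the antenna an optimization-driven rule would pick, then train a k-NN classifier to predict that label. The feature choice, antenna count, and sample size are illustrative, not the letter's setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
n_tx, n_samples = 4, 2000

# Rayleigh-fading channel gains from each of the n_tx candidate transmit antennas.
H = (rng.standard_normal((n_samples, n_tx)) +
     1j * rng.standard_normal((n_samples, n_tx))) / np.sqrt(2)

# "Optimization-driven" label: the antenna with the largest instantaneous gain.
labels = np.argmax(np.abs(H) ** 2, axis=1)

# Learning-based AS: treat selection as multiclass classification on simple
# channel features (here, per-antenna received power).
features = np.abs(H) ** 2
split = n_samples // 2
clf = KNeighborsClassifier(n_neighbors=5).fit(features[:split], labels[:split])
pred = clf.predict(features[split:])

print("selection accuracy:", np.mean(pred == labels[split:]))
```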

Journal ArticleDOI
TL;DR: 2FLIP provides strong privacy preservation, such that adversaries can never succeed in tracing any vehicle, even with all RSUs compromised, and achieves strong nonrepudiation, in that any anonymous driver can be conditionally traced, even if he is not the only driver of the vehicle.
Abstract: Authentication in a vehicular ad-hoc network (VANET) requires not only secure and efficient authentication with privacy preservation but also applicable flexibility to handle complicated transportation circumstances. In this paper, we propose a Two-Factor LIghtweight Privacy-preserving authentication scheme (2FLIP) to enhance the security of VANET communication. 2FLIP employs a decentralized certificate authority (CA) and biological-password-based two-factor authentication (2FA) to achieve these goals. Based on the decentralized CA, 2FLIP only requires several extremely lightweight hashing processes and a fast message-authentication-code operation for message signing and verification between vehicles. Compared with previous schemes, 2FLIP significantly reduces computation cost by 100–1000 times and decreases communication overhead by 55.24%–77.52%. Furthermore, any certificate revocation list (CRL)-related overhead on vehicles is avoided. 2FLIP makes the scheme resilient to denial-of-service attacks in both computation and memory, whether caused by deliberate invading behaviors or by jammed traffic scenes. The proposed scheme provides strong privacy preservation, such that adversaries can never succeed in tracing any vehicle, even with all RSUs compromised. Moreover, it achieves strong nonrepudiation: any anonymous driver can be conditionally traced, even if he is not the only driver of the vehicle. Extensive simulations reveal that 2FLIP is feasible and has outstanding performance of nearly 0-ms network delay and 0% packet-loss ratio, which is particularly appropriate for real-time emergency reporting applications.
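
To illustrate why hash/MAC-based signing is cheap compared with per-message public-key signatures, here is a generic timestamped-HMAC sign/verify pair; the key handling, decentralized CA, and two-factor login of 2FLIP are not modeled, and the field sizes and freshness window are arbitrary choices.

```python
import hmac, hashlib, os, time

# Hypothetical session key shared between the vehicle's tamper-proof device
# and the verifier after an (unshown) two-factor login; names are illustrative.
session_key = os.urandom(16)

def sign_message(key, payload: bytes) -> bytes:
    """Lightweight signing: a timestamp plus a fast, truncated MAC,
    instead of a per-message public-key signature."""
    ts = int(time.time()).to_bytes(8, "big")
    tag = hmac.new(key, ts + payload, hashlib.sha256).digest()[:8]
    return ts + tag + payload

def verify_message(key, msg: bytes, max_age_s=5) -> bool:
    """Check freshness (replay resistance) and the MAC in constant time."""
    ts, tag, payload = msg[:8], msg[8:16], msg[16:]
    fresh = abs(time.time() - int.from_bytes(ts, "big")) <= max_age_s
    expected = hmac.new(key, ts + payload, hashlib.sha256).digest()[:8]
    return fresh and hmac.compare_digest(tag, expected)

msg = sign_message(session_key, b"emergency braking at (x, y)")
print(verify_message(session_key, msg))   # -> True
```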

Proceedings ArticleDOI
22 May 2016
TL;DR: Binary-level analysis techniques are proposed to significantly reduce the number of possible targets for indirect branches and reconstructed a conservative approximation of target function prototypes by means of use-def analysis at possible callees, providing evidence that strict binary-level CFI can still mitigate advanced attacks, despite the absence of source information or C++ semantics.
Abstract: Current binary-level Control-Flow Integrity (CFI) techniques are weak in determining the set of valid targets for indirect control flow transfers on the forward edge. In particular, the lack of source code forces existing techniques to resort to a conservative address-taken policy that overapproximates this set. In contrast, source-level solutions can accurately infer the targets of indirect calls and thus detect malicious control-flow transfers more precisely. Given that source code is not always available, however, offering similar quality of protection at the binary level is important, but, unquestionably, more challenging than ever: recent work demonstrates powerful attacks such as Counterfeit Object-oriented Programming (COOP), which made the community believe that protecting software against control-flow diversion attacks at the binary level is rather impossible. In this paper, we propose binary-level analysis techniques to significantly reduce the number of possible targets for indirect branches. More specifically, we reconstruct a conservative approximation of target function prototypes by means of use-def analysis at possible callees. We then couple this with liveness analysis at each indirect callsite to derive a many-to-many relationship between callsites and target callees with a much higher precision compared to prior binary-level solutions. Experimental results on popular server programs and on SPEC CPU2006 show that TypeArmor, a prototype implementation of our approach, is efficient - with a runtime overhead of less than 3%. Furthermore, we evaluate to what extent TypeArmor can mitigate COOP and other advanced attacks and show that our approach can significantly reduce the number of targets on the forward edge. Moreover, we show that TypeArmor breaks published COOP exploits, providing concrete evidence that strict binary-level CFI can still mitigate advanced attacks, despite the absence of source information or C++ semantics.
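
A toy version of the callsite/callee matching policy the paper describes: a callsite may only target callees that consume no more arguments than it prepares, with return-value use matched as well. The function names and recovered argument counts below are hypothetical, not output of the actual analysis.

```python
# Hypothetical per-function facts a binary-level analysis might recover:
# how many argument registers a callee actually reads (use-def analysis) and
# how many a callsite prepares (liveness analysis).
callees = {            # function -> (max args consumed, returns a value?)
    "parse_request": (2, True),
    "log_line":      (1, False),
    "handler_a":     (3, True),
}
callsites = {          # indirect callsite -> (args prepared, return value used?)
    "cs1": (3, True),
    "cs2": (1, False),
}

def allowed_targets(prepared, ret_used, callees):
    """Callees consuming no more arguments than prepared, and returning a
    value whenever the callsite expects one, remain valid targets."""
    return [f for f, (consumed, has_ret) in callees.items()
            if consumed <= prepared and (has_ret or not ret_used)]

for cs, (prep, ret) in callsites.items():
    print(cs, "->", allowed_targets(prep, ret, callees))
```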

Journal ArticleDOI
TL;DR: A structured iterative support detection algorithm is proposed that exploits the inherent structured sparsity of user activity naturally existing in NOMA systems to jointly detect user activity and transmitted data in several continuous time slots, and it achieves better performance than conventional solutions.
Abstract: Non-orthogonal multiple access (NOMA) has been regarded as one of the promising key technologies for future 5G systems. In the uplink grant-free NOMA schemes, dynamic scheduling is not required, which can significantly reduce the signaling overhead and transmission latency. However, user activity has to be detected in grant-free NOMA systems, which is challenging in practice. In this letter, by exploiting the inherent structured sparsity of user activity naturally existing in NOMA systems, we propose a low-complexity multi-user detector based on structured compressive sensing to realize joint user activity and data detection. In particular, we propose a structured iterative support detection algorithm by exploiting such structured sparsity, which is able to jointly detect user activity and transmitted data in several continuous time slots. Simulation results show that the proposed scheme can achieve better performance than conventional solutions.

Journal ArticleDOI
01 Sep 2016
TL;DR: This work builds two new layers over Spark, namely a query scheduler and a query executor, and embeds an efficient spatial Bloom filter into LocationSpark's indexes to avoid unnecessary network communication overhead when processing overlapped spatial data.
Abstract: We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operation, spatial-join, and kNN-join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immutable spatial indexes have low overhead with fault tolerance. In addition, we build two new layers over Spark, namely a query scheduler and a query executor. The query scheduler is responsible for mitigating skew in spatial queries, while the query executor selects the best plan based on the indexes and the nature of the spatial queries. Furthermore, to avoid unnecessary network communication overhead when processing overlapped spatial data, we embed an efficient spatial Bloom filter into LocationSpark's indexes. Finally, LocationSpark tracks frequently accessed spatial data, and dynamically flushes less frequently accessed data to disk. We evaluate our system on real workloads and demonstrate that it achieves an order of magnitude performance gain over a baseline framework.

Proceedings Article
14 Mar 2016
TL;DR: This paper addresses the question of controlling in-memory computation by proposing a lightweight unit that manages the operations performed on a memristive array, and it presents a standardized symmetric-key cipher for lightweight security applications as a case study.
Abstract: Realization of logic and storage operations in memristive circuits have opened up a promising research direction of in-memory computing. Elementary digital circuits, e.g., Boolean arithmetic circuits, can be economically realized within memristive circuits with a limited performance overhead as compared to the standard computation paradigms. This paper takes a major step along this direction by proposing a fully-programmable in-memory computing system. In particular, we address, for the first time, the question of controlling the in-memory computation, by proposing a lightweight unit managing the operations performed on a memristive array. Assembly-level programming abstraction is achieved by a natively-implemented majority and complement operator. This platform enables diverse sets of applications to be ported with little effort. As a case study, we present a standardized symmetric-key cipher for lightweight security applications. The detailed system design flow and simulation results with accurate device models are reported validating the approach.
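
The majority-plus-complement programming abstraction is easy to illustrate in software: the textbook mappings below build standard Boolean gates, and a one-bit full adder, from only MAJ and NOT. This shows the abstraction itself, not the paper's memristive implementation or its microcode.

```python
def MAJ(a, b, c):
    """Three-input majority, the native operation assumed by the abstraction."""
    return (a & b) | (a & c) | (b & c)

def NOT(a):
    return a ^ 1

# Standard gates expressed with only majority and complement (textbook mappings).
def AND(a, b):  return MAJ(a, b, 0)
def OR(a, b):   return MAJ(a, b, 1)
def NAND(a, b): return NOT(AND(a, b))
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

# One-bit full adder built from the same primitives.
def full_adder(a, b, cin):
    carry = MAJ(a, b, cin)          # majority is exactly the carry function
    s = XOR(XOR(a, b), cin)
    return s, carry

for bits in [(0, 1, 1), (1, 1, 1)]:
    print(bits, "->", full_adder(*bits))   # (0,1,1) -> (0, 1); (1,1,1) -> (1, 1)
```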

Journal ArticleDOI
TL;DR: This paper discusses the wide range of real-time line monitoring devices that can be used to determine the dynamic thermal rating of an overhead transmission line, with the power system operating normally or during a system contingency.
Abstract: This paper discusses the wide range of real-time line monitoring devices which can be used to determine the dynamic thermal rating of an overhead transmission line with the power system operating normally or during a system contingency. The most common types of real-time monitors are described including those that measure the line clearance, conductor temperature, and weather data in the line right of way. The strengths and weaknesses of the various monitoring methods are evaluated, concluding that some are more effective during system normal and others during system contingency conditions.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a received signal strength indication-based distributed Bayesian localization algorithm based on message passing to solve the approximate inference problem for precision agriculture applications, such as pest management and pH sensing in large farms.
Abstract: In this paper, we propose a received signal strength indication-based distributed Bayesian localization algorithm based on message passing to solve the approximate inference problem. The algorithm is designed for precision agriculture applications, such as pest management and pH sensing in large farms, where greater power efficiency besides communication and computational scalability is needed but location accuracy requirements are less demanding. Communication overhead, which is a key limitation of popular non-Bayesian and Bayesian distributed techniques, is avoided by a message passing schedule, in which outgoing message by each node does not depend on the destination node, and therefore is a fixed size. Fast convergence is achieved by: 1) eliminating the setup phase linked with spanning tree construction, which is frequent in belief propagation schemes and 2) the parallel nature of the updates, since no message needs to be exchanged among nodes during each update, which is called the coupled variables phenomenon in non-Bayesian techniques and accounts for a significant amount of communication overhead. These features make the proposed algorithm highly compatible with realistic wireless sensor network (WSN) deployments, e.g., ZigBee, that are based upon the ad hoc on-demand distance vector, where route request and route reply packets are flooded in the network during route discovery phase.

Journal ArticleDOI
TL;DR: In this article, a SIRMs-based fuzzy controller for transport control of double-pendulum-type systems is presented, where genetic algorithm (GA) is adopted to tune some parameters of the controller.

Proceedings ArticleDOI
07 Nov 2016
TL;DR: A quantitative security criterion is proposed for de-camouflaging complexity measurements and formally analyzed through the demonstration of the equivalence between the existing de-camouflaging strategy and the active learning scheme, and a provably secure camouflaging framework is developed by combining these two techniques.
Abstract: The advancing of reverse engineering techniques has complicated the efforts in intellectual property protection. Proactive methods have been developed recently, among which layout-level IC camouflaging is the leading example. However, existing camouflaging methods are rarely supported by provably secure criteria, which further leads to over-estimation of the security level when countering the latest de-camouflaging attacks, e.g., the SAT-based attack. In this paper, a quantitative security criterion is proposed for de-camouflaging complexity measurements and formally analyzed through the demonstration of the equivalence between the existing de-camouflaging strategy and the active learning scheme. Supported by the new security criterion, two novel camouflaging techniques are proposed, the low-overhead camouflaging cell library and the AND-tree structure, to help achieve exponentially increasing security levels at the cost of linearly increasing performance overhead on the circuit under protection. A provably secure camouflaging framework is then developed by combining these two techniques. Experimental results using the security criterion show that the camouflaged circuits with the proposed framework are of high resilience against the SAT-based attack with negligible performance overhead.

Journal ArticleDOI
TL;DR: A novel sensory data processing framework is proposed, which aims at transmitting desirable sensory data to the mobile users in a fast, reliable, and secure manner and further decreases the storage and processing overhead of the cloud, while enabling mobile users to securely obtain their desired sensory data faster.
Abstract: Taking advantage of the data gathering capability of wireless sensor networks (WSNs) as well as the data storage and processing ability of mobile cloud computing (MCC), WSN–MCC integration is attracting significant attention from both academia and industry. This paper focuses on processing of the sensory data in WSN–MCC integration, by identifying the critical issues concerning WSN–MCC integration and proposing a novel sensory data processing framework, which aims at transmitting desirable sensory data to the mobile users in a fast, reliable, and secure manner. The proposed framework could prolong the WSN lifetime, decrease the storage requirements of the sensors and the WSN gateway, and reduce the traffic load and bandwidth requirement of sensory data transmissions. In addition, the framework is capable of monitoring and predicting the future trend of the sensory data traffic, as well as improving its security. The framework further decreases the storage and processing overhead of the cloud, while enabling mobile users to securely obtain their desired sensory data faster. Analytical and experimental results are presented to demonstrate the effectiveness of the proposed framework.