
Showing papers on "Overhead (computing)" published in 2016


Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this paper, both filter kernels in convolutional layers and weight matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response.
Abstract: Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high-performance hardware is typically indispensable for the application of CNN models due to their high computational complexity, which hinders their wider deployment. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4-6× speed-up and 15-20× compression with merely about one percentage point loss of classification accuracy. With our quantized CNN model, even mobile devices can accurately classify images within one second.
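
For readers unfamiliar with weight quantization, the sketch below quantizes a toy fully-connected weight matrix with plain product quantization (k-means over sub-vectors) and reports the resulting response error and compression. It is a minimal illustration of the general idea, not the paper's response-error-minimizing algorithm, and the layer sizes, segment count, and codebook size are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully-connected layer: 512 inputs, 128 outputs (sizes are illustrative).
W = rng.standard_normal((512, 128)).astype(np.float32)
x = rng.standard_normal((64, 512)).astype(np.float32)   # a batch of layer inputs

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns (codebook, assignments)."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(0)
    return centers, assign

# Product quantization: split the input dimension into segments and quantize
# each segment's column sub-vectors against a small shared codebook.
segments, k = 8, 32
d_sub = W.shape[0] // segments
Wq = np.empty_like(W)
for s in range(segments):
    rows = slice(s * d_sub, (s + 1) * d_sub)
    codebook, assign = kmeans(W[rows, :].T, k)    # 128 sub-vectors of length 64
    Wq[rows, :] = codebook[assign].T

# How much does quantization perturb the layer's response, and how much is saved?
rel_err = np.linalg.norm(x @ W - x @ Wq) / np.linalg.norm(x @ W)
bits_q = W.shape[1] * segments * np.log2(k) + segments * k * d_sub * 32
print(f"relative response error: {rel_err:.3f}, "
      f"compression: {W.size * 32 / bits_q:.1f}x")
```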

902 citations


Journal ArticleDOI
18 Jun 2016
TL;DR: Cnvlutin (CNV) is a value-based approach to hardware acceleration that eliminates most ineffectual operations (multiplications by a zero operand), improving performance and energy over a state-of-the-art accelerator with no accuracy loss.
Abstract: This work observes that a large fraction of the computations performed by Deep Neural Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of the inputs is zero. This observation motivates Cnvlutin (CNV), a value-based approach to hardware acceleration that eliminates most of these ineffectual operations, improving performance and energy over a state-of-the-art accelerator with no accuracy loss. CNV uses hierarchical data-parallel units, allowing groups of lanes to proceed mostly independently enabling them to skip over the ineffectual computations. A co-designed data storage format encodes the computation elimination decisions taking them off the critical path while avoiding control divergence in the data parallel units. Combined, the units and the data storage format result in a data-parallel architecture that maintains wide, aligned accesses to its memory hierarchy and that keeps its data lanes busy. By loosening the ineffectual computation identification criterion, CNV enables further performance and energy efficiency improvements, and more so if a loss in accuracy is acceptable. Experimental measurements over a set of state-of-the-art DNNs for image classification show that CNV improves performance over a state-of-the-art accelerator from 1.24× to 1.55× and by 1.37× on average without any loss in accuracy by removing zero-valued operand multiplications alone. While CNV incurs an area overhead of 4.49%, it improves overall EDP (Energy Delay Product) and ED2P (Energy Delay Squared Product) on average by 1.47× and 2.01×, respectively. The average performance improvements increase to 1.52× without any loss in accuracy with a broader ineffectual identification policy. Further improvements are demonstrated with a loss in accuracy.
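
A software analogue of the zero-skipping observation (only an illustration of why skipping zero-valued operands saves work, not a model of the CNV hardware; all sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Post-ReLU activations: roughly half of the entries are exactly zero.
acts = np.maximum(rng.standard_normal(4096), 0.0)
weights = rng.standard_normal(4096)

# A dense evaluation schedules one MAC per (activation, weight) pair ...
dense_macs = acts.size

# ... whereas a value-based scheme only schedules MACs for non-zero activations.
nonzero = np.flatnonzero(acts)          # positions of "effectual" work
sparse_macs = nonzero.size
result = np.dot(acts[nonzero], weights[nonzero])

assert np.isclose(result, np.dot(acts, weights))   # identical numerical result
print(f"ineffectual MACs skipped: {1 - sparse_macs / dense_macs:.1%}")
```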

687 citations


Journal ArticleDOI
TL;DR: In this paper, the authors make the case that mmWave communication is the only viable approach for high bandwidth connected vehicles and highlight the motivations and challenges associated with using mmWave for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) applications.
Abstract: As driving becomes more automated, vehicles are being equipped with more sensors generating even higher data rates. Radars are used for object detection, visual cameras as virtual mirrors, and LIDARs for generating high resolution depth associated range maps, all to enhance the safety and efficiency of driving. Connected vehicles can use wireless communication to exchange sensor data, allowing them to enlarge their sensing range and improve automated driving functions. Unfortunately, conventional technologies, such as DSRC and 4G cellular communication, do not support the gigabit-per-second data rates that would be required for raw sensor data exchange between vehicles. This article makes the case that mmWave communication is the only viable approach for high bandwidth connected vehicles. The motivations and challenges associated with using mmWave for vehicle-to-vehicle and vehicle-to-infrastructure applications are highlighted. A high-level solution to one key challenge - the overhead of mmWave beam training - is proposed. The critical feature of this solution is to leverage information derived from the sensors or DSRC as side information for the mmWave communication link configuration. Examples and simulation results show that the beam alignment overhead can be reduced by using position information obtained from DSRC.
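
A minimal sketch of how position information (e.g., reported over DSRC) could prune the beam-training search; the codebook granularity and the +/-10 degree uncertainty window are illustrative assumptions, not the paper's protocol:

```python
import numpy as np

def candidate_beams(tx_pos, rx_pos, codebook_angles_deg, position_error_deg=10.0):
    """Return the subset of beam indices worth sounding, given a coarse
    angle-of-departure estimate derived from the two vehicles' positions."""
    dx, dy = rx_pos[0] - tx_pos[0], rx_pos[1] - tx_pos[1]
    aod = np.degrees(np.arctan2(dy, dx))                  # coarse pointing direction
    diff = np.abs((codebook_angles_deg - aod + 180) % 360 - 180)
    return np.flatnonzero(diff <= position_error_deg)

# 64-beam codebook covering 360 degrees; exhaustive training sounds all 64 beams,
# position-aided training only sounds the few beams near the estimated direction.
codebook = np.arange(64) * (360 / 64)
cands = candidate_beams((0.0, 0.0), (80.0, 30.0), codebook)
print(f"beams to sound: {cands.size} of {codebook.size}")
```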

638 citations


Proceedings ArticleDOI
11 Apr 2016
TL;DR: Experiments show that DeepX can allow even large-scale deep learning models to execute efficiently on modern mobile processors and significantly outperform existing solutions, such as cloud-based offloading.
Abstract: Breakthroughs from the field of deep learning are radically changing how sensor data are interpreted to extract the high-level information needed by mobile apps. It is critical that the gains in inference accuracy that deep models afford become embedded in future generations of mobile apps. In this work, we present the design and implementation of DeepX, a software accelerator for deep learning execution. DeepX significantly lowers the device resources (viz. memory, computation, energy) required by deep learning, which currently act as a severe bottleneck to mobile adoption. The foundation of DeepX is a pair of resource control algorithms, designed for the inference stage of deep learning, that: (1) decompose monolithic deep model network architectures into unit-blocks of various types that are then more efficiently executed by heterogeneous local device processors (e.g., GPUs, CPUs); and (2) perform principled resource scaling that adjusts the architecture of deep models to shape the overhead each unit-block introduces. Experiments show that DeepX can allow even large-scale deep learning models to execute efficiently on modern mobile processors and significantly outperform existing solutions, such as cloud-based offloading.

442 citations


Proceedings Article
16 Mar 2016
TL;DR: Ernest is a performance prediction framework for large-scale analytics; evaluation on Amazon EC2 using several workloads shows that its prediction error is low while its training overhead is less than 5% for long-running jobs.
Abstract: Recent workload trends indicate rapid growth in the deployment of machine learning, genomics and scientific workloads on cloud computing infrastructure. However, efficiently running these applications on shared infrastructure is challenging and we find that choosing the right hardware configuration can significantly improve performance and cost. The key to address the above challenge is having the ability to predict performance of applications under various resource configurations so that we can automatically choose the optimal configuration. Our insight is that a number of jobs have predictable structure in terms of computation and communication. Thus we can build performance models based on the behavior of the job on small samples of data and then predict its performance on larger datasets and cluster sizes. To minimize the time and resources spent in building a model, we use optimal experiment design, a statistical technique that allows us to collect as few training points as required. We have built Ernest, a performance prediction framework for large scale analytics and our evaluation on Amazon EC2 using several workloads shows that our prediction error is low while having a training overhead of less than 5% for long-running jobs.
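
As a rough sketch of the modeling approach, the snippet below fits communication/computation features of the kind Ernest describes with non-negative least squares on a handful of small sample runs and extrapolates to a larger cluster. The feature map follows the paper's description; the run times, cluster sizes, and data fractions are made up for illustration.

```python
import numpy as np
from scipy.optimize import nnls

def features(machines, scale):
    """Ernest-style feature map: a constant term, a serial/communication term,
    a log term, and a linear-in-machines term."""
    return np.array([1.0, scale / machines, np.log(machines), machines])

# Hypothetical training runs on small samples of the data (times in seconds).
runs = [  # (machines, fraction of data, measured time)
    (2, 0.125, 30.1), (4, 0.125, 17.2), (4, 0.25, 31.0),
    (8, 0.25, 18.4), (8, 0.5, 33.9), (16, 0.5, 20.7),
]
A = np.array([features(m, s) for m, s, _ in runs])
b = np.array([t for _, _, t in runs])

theta, _ = nnls(A, b)          # non-negative least squares keeps the model physical

# Extrapolate: running time of the full dataset on a larger cluster.
print("predicted time on 64 machines, full data:",
      round(features(64, 1.0) @ theta, 1), "s")
```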

401 citations


Journal ArticleDOI
TL;DR: In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation, and the required communication overhead.
Abstract: This article presents a powerful algorithmic framework for big data optimization, called the block successive upper-bound minimization (BSUM). The BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the block coordinate descent (BCD) method, the convex-concave procedure (CCCP) method, the block coordinate proximal gradient (BCPG) method, the nonnegative matrix factorization (NMF) method, the expectation maximization (EM) method, etc. In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation, and the required communication overhead. Illustrative examples from networking, signal processing, and machine learning are presented to demonstrate the practical performance of the BSUM framework.
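
A minimal sketch of the successive upper-bound idea for its simplest special case (block coordinate proximal gradient on a least-squares objective): each block update exactly minimizes a quadratic upper bound of the objective over that block. The problem sizes and block split below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 60))
b = rng.standard_normal(200)

# f(x) = 0.5 * ||Ax - b||^2, with x split into 3 blocks of 20 variables each.
blocks = [slice(0, 20), slice(20, 40), slice(40, 60)]
x = np.zeros(60)

for it in range(100):
    for blk in blocks:
        # Quadratic upper bound of f in block `blk` around the current x, with
        # curvature L = ||A_blk||_2^2 (the block-Lipschitz constant of the gradient).
        A_blk = A[:, blk]
        grad = A_blk.T @ (A @ x - b)
        L = np.linalg.norm(A_blk, 2) ** 2
        x[blk] -= grad / L            # exact minimizer of the block surrogate

print("BSUM/BCD objective:   ", 0.5 * np.linalg.norm(A @ x - b) ** 2)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("least-squares optimum:", 0.5 * np.linalg.norm(A @ x_star - b) ** 2)
```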

383 citations


Journal ArticleDOI
TL;DR: This work proposes a method for introducing independent random single-qubit gates into the logical circuit in such a way that the effective logical circuit remains unchanged and proves that this randomization tailors the noise into stochastic Pauli errors, which can dramatically reduce error rates while introducing little or no experimental overhead.
Abstract: Quantum computers are poised to radically outperform their classical counterparts by manipulating coherent quantum systems. A realistic quantum computer will experience errors due to the environment and imperfect control. When these errors are even partially coherent, they present a major obstacle to performing robust computations. Here, we propose a method for introducing independent random single-qubit gates into the logical circuit in such a way that the effective logical circuit remains unchanged. We prove that this randomization tailors the noise into stochastic Pauli errors, which can dramatically reduce error rates while introducing little or no experimental overhead. Moreover, we prove that our technique is robust to the inevitable variation in errors over the randomizing gates and numerically illustrate the dramatic reductions in worst-case error that are achievable. Given such tailored noise, gates with significantly lower fidelity, comparable to fidelities realized in current experiments, are sufficient to achieve fault-tolerant quantum computation. Furthermore, the worst-case error rate of the tailored noise can be directly and efficiently measured through randomized benchmarking protocols, enabling a rigorous certification of the performance of a quantum computer.

331 citations


Proceedings Article
16 Mar 2016
TL;DR: The key idea of FlowRadar is to encode per-flow counters with a small memory and constant insertion time at switches, and then to leverage the computing power at the remote collector to perform network-wide decoding and analysis of the flow counters.
Abstract: NetFlow has been a widely used monitoring tool with a variety of applications. NetFlow maintains an active working set of flows in a hash table that supports flow insertion, collision resolution, and flow removal. This is hard to implement in merchant silicon at data center switches, which has limited per-packet processing time. Therefore, many NetFlow implementations and other monitoring solutions have to sample or select a subset of packets to monitor. In this paper, we observe the need to monitor all the flows without sampling in short time scales. Thus, we design FlowRadar, a new way to maintain flows and their counters that scales to a large number of flows with small memory and bandwidth overhead. The key idea of FlowRadar is to encode per-flow counters with a small memory and constant insertion time at switches, and then to leverage the computing power at the remote collector to perform network-wide decoding and analysis of the flow counters. Our evaluation shows that the memory usage of FlowRadar is close to traditional NetFlow with perfect hashing. With FlowRadar, operators can get better views into their networks as demonstrated by two new monitoring applications we build on top of FlowRadar.
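
A toy illustration of the encode-at-switch / decode-at-collector split, in the spirit of invertible Bloom lookup tables (FlowRadar's actual data structure and parameters differ); the table sizes, hash count, and flow IDs below are made up:

```python
import hashlib

class FlowEncoder:
    """Constant-time insertion at the 'switch', peeling-based decoding at the
    'collector'. Illustrative only, not FlowRadar's exact encoding."""

    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.count = [0] * m          # number of flows hashed into the cell
        self.flow_xor = [0] * m       # XOR of flow identifiers
        self.pkt_sum = [0] * m        # sum of packet counters

    def _cells(self, flow_id):
        return [int(hashlib.sha1(f"{i}:{flow_id}".encode()).hexdigest(), 16) % self.m
                for i in range(self.k)]

    def insert(self, flow_id, packets):          # O(k) work per packet/flow update
        for c in self._cells(flow_id):
            self.count[c] += 1
            self.flow_xor[c] ^= flow_id
            self.pkt_sum[c] += packets

    def decode(self):                            # done at the collector
        flows, changed = {}, True
        while changed:
            changed = False
            for c in range(self.m):
                if self.count[c] == 1:           # "pure" cell: exactly one flow left
                    fid, pkts = self.flow_xor[c], self.pkt_sum[c]
                    flows[fid] = pkts
                    for cc in self._cells(fid):  # peel the flow out everywhere
                        self.count[cc] -= 1
                        self.flow_xor[cc] ^= fid
                        self.pkt_sum[cc] -= pkts
                    changed = True
        return flows

enc = FlowEncoder()
for fid, pkts in [(101, 7), (202, 3), (303, 12)]:
    enc.insert(fid, pkts)
print(enc.decode())   # all three flows and their counters recovered exactly
```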

285 citations


Journal ArticleDOI
TL;DR: A general overview of the current low-rank channel estimation approaches is provided, including their basic assumptions, key results, as well as pros and cons on addressing the aforementioned tricky challenges.
Abstract: Massive multiple-input multiple-output is a promising physical layer technology for 5G wireless communications due to its capability of high spectrum and energy efficiency, high spatial resolution, and simple transceiver design. To embrace its potential gains, the acquisition of channel state information is crucial, which unfortunately faces a number of challenges, such as the uplink pilot contamination, the overhead of downlink training and feedback, and the computational complexity. In order to reduce the effective channel dimensions, researchers have been investigating the low-rank (sparse) properties of channel environments from different viewpoints. This paper then provides a general overview of the current low-rank channel estimation approaches, including their basic assumptions, key results, as well as pros and cons on addressing the aforementioned tricky challenges. Comparisons among all these methods are provided for better understanding and some future research prospects for these low-rank approaches are also forecasted.

265 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: FireCaffe is presented, which successfully scales deep neural network training across a cluster of GPUs, and finds that reduction trees are more efficient and scalable than the traditional parameter server approach.
Abstract: Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms are almost always limited by the overhead of communicating between servers, and DNN training is no exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers – Infiniband or Cray interconnects are ideal for this. Second, we consider a number of communication algorithms, and we find that reduction trees are more efficient and scalable than the traditional parameter server approach. Third, we optionally increase the batch size to reduce the total quantity of communication during DNN training, and we identify hyperparameters that allow us to reproduce the small-batch accuracy while training with large batch sizes. When training GoogLeNet and Network-in-Network on ImageNet, we achieve a 47x and 39x speedup, respectively, when training on a cluster of 128 GPUs.
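
A back-of-envelope model of why reduction trees scale better than a single parameter server (purely illustrative numbers, not measurements from the paper): the server's link serializes transfers from every worker, while the tree's critical path grows only logarithmically.

```python
def parameter_server_time(workers, grad_bytes, bw_bytes_per_s):
    """All workers send gradients to one server, which sends the model back:
    the server link serializes 2 * workers transfers of grad_bytes each."""
    return 2 * workers * grad_bytes / bw_bytes_per_s

def reduction_tree_time(workers, grad_bytes, bw_bytes_per_s, fan_in=2):
    """Gradients are summed up a tree and the result broadcast back down:
    about 2 * depth sequential transfers of grad_bytes each."""
    depth, n = 0, 1
    while n < workers:
        n *= fan_in
        depth += 1
    return 2 * depth * grad_bytes / bw_bytes_per_s

# Illustrative numbers: 50 MB of gradients, 5 GB/s links.
for n in (8, 32, 128):
    ps = parameter_server_time(n, 50e6, 5e9)
    tree = reduction_tree_time(n, 50e6, 5e9)
    print(f"{n:4d} workers: parameter server {ps*1e3:7.1f} ms, "
          f"reduction tree {tree*1e3:6.1f} ms")
```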

251 citations


Proceedings ArticleDOI
Yuan Zhifeng, Guanghui Yu, Weimin Li, Yifei Yuan, Wang Xinhui, Jun Xu
15 May 2016
TL;DR: A new type of non-orthogonal multiple access scheme called multi-user shared access (MUSA) is proposed to support IoT and can achieve significant gain in user overloading performance compared to orthogonal systems, while incurring much lower control overhead.
Abstract: Internet of things (IoT) is widely expected to be an important scenario in the fifth generation (5G) wireless network. Major challenges of IoT include the low cost of devices, low energy consumption, low latency, and the ability to support a large number of simultaneous connections. In this article, a new type of non-orthogonal multiple access scheme called multi-user shared access (MUSA) is proposed to support IoT. MUSA adopts a grant-free access strategy to simplify the access procedure significantly and utilizes advanced code-domain non-orthogonal complex spreading to accommodate a massive number of users in the same radio resources. A family of short complex sequences is chosen as the spreading sequences for their ability to enable simple and robust successive interference cancellation at the base station side and to cope with high user load. Simulation results show that MUSA can achieve significant gain in user overloading performance compared to orthogonal systems, while incurring much lower control overhead.

Proceedings ArticleDOI
05 Jul 2016
TL;DR: This paper designs, implements and evaluates MOCA, a protocol for Mobility resilience and Overhead Constrained Adaptation for directional 60 GHz links, and introduces Beam Sounding as a mechanism invoked before each data transmission to estimate the link quality for selected beams, and identify and adapt to link impairments.
Abstract: High directivity of 60 GHz links introduces new link training and adaptation challenges due to both client and environmental mobility. In this paper, we design, implement and evaluate MOCA, a protocol for Mobility resilience and Overhead Constrained Adaptation for directional 60 GHz links. Since mobility-induced link blockage and misalignment cannot be countered with data rate adaptation alone, we introduce Beam Sounding as a mechanism invoked before each data transmission to estimate the link quality for selected beams, and identify and adapt to link impairments. We devise proactive techniques to restore broken directional links with low overhead and design a mechanism to jointly adapt beamwidth and data rate, targeting throughput maximization that incorporates data rate, overhead for beam alignment, and mobility resilience. We implement a programmable node and testbed using software defined radios with commercial 60 GHz transceivers, and conduct an extensive over-the-air measurement study to collect channel traces for various environments. Based on trace based emulations and the IEEE 802.11ad channel model, we evaluate MOCA under a variety of propagation environments and mobility scenarios. Our experiments show that MOCA achieves up to 2x throughput gains compared to a baseline WLAN scheme in a diverse set of operational conditions.

Book ChapterDOI
08 Oct 2016
TL;DR: In this article, the authors proposed a new dataset with one million pairs of street view and overhead images sampled from eleven U.S. cities and explored several deep CNN architectures for cross-domain matching.
Abstract: In this paper we aim to determine the location and orientation of a ground-level query image by matching to a reference database of overhead (e.g. satellite) images. For this task we collect a new dataset with one million pairs of street view and overhead images sampled from eleven U.S. cities. We explore several deep CNN architectures for cross-domain matching – Classification, Hybrid, Siamese, and Triplet networks. Classification and Hybrid architectures are accurate but slow since they allow only partial feature precomputation. We propose a new loss function which significantly improves the accuracy of Siamese and Triplet embedding networks while maintaining their applicability to large-scale retrieval tasks like image geolocalization. This image matching task is challenging not just because of the dramatic viewpoint difference between ground-level and overhead imagery but because the orientation (i.e. azimuth) of the street views is unknown making correspondence even more difficult. We examine several mechanisms to match in spite of this – training for rotation invariance, sampling possible rotations at query time, and explicitly predicting relative rotation of ground and overhead images with our deep networks. It turns out that explicit orientation supervision also improves location prediction accuracy. Our best performing architectures are roughly 2.5 times as accurate as the commonly used Siamese network baseline.
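
For context, the snippet below shows a plain margin-based triplet loss on L2-normalized embeddings, the baseline objective this family of networks starts from; the paper's improved loss, margins, and embedding dimensions are not reproduced here, and the toy vectors are synthetic.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull the matching overhead embedding closer to the street-view anchor
    than any non-matching one, by at least `margin` (margin is illustrative)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

rng = np.random.default_rng(3)
def normed(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

street = normed(rng.standard_normal((16, 128)))                   # ground-level embeddings
aerial_match = normed(street + 0.1 * rng.standard_normal((16, 128)))   # matching overhead views
aerial_other = normed(rng.standard_normal((16, 128)))             # non-matching overhead views
print("loss:", triplet_loss(street, aerial_match, aerial_other))
```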

Journal ArticleDOI
TL;DR: By exploiting the temporal correlation of active user sets, a dynamic compressive sensing (DCS)-based multi-user detection (MUD) to realize both user activity and data detection in several continuous time slots is proposed.
Abstract: Non-orthogonal multiple access (NOMA) can support more users than orthogonal multiple access (OMA) techniques using the same wireless resources, which is expected to support massive connectivity for the Internet of Things in 5G. Furthermore, in order to reduce the transmission latency and signaling overhead, grant-free transmission is highly expected in uplink NOMA systems, where user activity has to be detected. In this letter, by exploiting the temporal correlation of active user sets, we propose a dynamic compressive sensing (DCS)-based multi-user detection (MUD) to realize both user activity and data detection in several continuous time slots. In particular, as the temporal correlation of the active user sets between adjacent time slots exists, we can use the estimated active user set in the current time slot as the prior information to estimate the active user set in the next time slot. Simulation results show that the proposed DCS-based MUD can achieve much better performance than that of the conventional CS-based MUD in NOMA systems.
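
A simplified stand-in for the idea of seeding sparse detection with the previous slot's active-user set: greedy recovery initialized from the prior and then pruned to the strongest users. The spreading matrix, user counts, and noise level below are arbitrary, and this is not the letter's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
n_users, n_res, n_active = 40, 16, 4

# Random complex spreading signatures, one (roughly unit-norm) column per user.
Phi = (rng.standard_normal((n_res, n_users)) +
       1j * rng.standard_normal((n_res, n_users))) / np.sqrt(2 * n_res)

def detect(Phi, y, n_active, prior_support=()):
    """Greedy recovery seeded with the previous slot's active set, then pruned
    to the n_active strongest users so stale prior entries can drop out."""
    support = list(dict.fromkeys(prior_support))
    while len(support) < n_active + 2:              # gather a few extra candidates
        if support:
            coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            resid = y - Phi[:, support] @ coef
        else:
            resid = y
        corr = np.abs(Phi.conj().T @ resid)
        corr[support] = 0
        support.append(int(np.argmax(corr)))
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    keep = np.argsort(np.abs(coef))[-n_active:]
    return {support[i] for i in keep}

qpsk = lambda k: rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], k) / np.sqrt(2)
active1 = rng.choice(n_users, n_active, replace=False)
new_user = next(u for u in range(n_users) if u not in active1)
active2 = np.append(active1[1:], new_user)          # one user leaves, one joins

prior = ()
for slot, active in [(1, active1), (2, active2)]:
    s = np.zeros(n_users, dtype=complex)
    s[active] = qpsk(n_active)
    y = Phi @ s + 0.01 * (rng.standard_normal(n_res) + 1j * rng.standard_normal(n_res))
    est = detect(Phi, y, n_active, prior_support=prior)
    print(f"slot {slot}: true {sorted(active.tolist())}, detected {sorted(est)}")
    prior = tuple(est)                               # temporal correlation: reuse as prior
```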

Journal ArticleDOI
TL;DR: A structured compressive sensing (SCS)-based spatio-temporal joint channel estimation scheme is proposed to reduce the required pilot overhead; it is capable of approaching the optimal oracle least squares estimator.
Abstract: Massive MIMO is a promising technique for future 5G communications due to its high spectrum and energy efficiency. To realize its potential performance gain, accurate channel estimation is essential. However, due to massive number of antennas at the base station (BS), the pilot overhead required by conventional channel estimation schemes will be unaffordable, especially for frequency division duplex (FDD) massive MIMO. To overcome this problem, we propose a structured compressive sensing (SCS)-based spatio-temporal joint channel estimation scheme to reduce the required pilot overhead, whereby the spatio-temporal common sparsity of delay-domain MIMO channels is leveraged. Particularly, we first propose the nonorthogonal pilots at the BS under the framework of CS theory to reduce the required pilot overhead. Then, an adaptive structured subspace pursuit (ASSP) algorithm at the user is proposed to jointly estimate channels associated with multiple OFDM symbols from the limited number of pilots, whereby the spatio-temporal common sparsity of MIMO channels is exploited to improve the channel estimation accuracy. Moreover, by exploiting the temporal channel correlation, we propose a space-time adaptive pilot scheme to further reduce the pilot overhead. Additionally, we discuss the proposed channel estimation scheme in multicell scenario. Simulation results demonstrate that the proposed scheme can accurately estimate channels with the reduced pilot overhead, and it is capable of approaching the optimal oracle least squares estimator.

Book ChapterDOI
19 Sep 2016
TL;DR: This work presents CloudRadar, a system to detect, and hence mitigate, cache-based side-channel attacks in multi-tenant cloud systems, designed as a lightweight patch to existing cloud systems that does not require new hardware support or any hypervisor, operating system, or application modifications.
Abstract: We present CloudRadar, a system to detect, and hence mitigate, cache-based side-channel attacks in multi-tenant cloud systems. CloudRadar operates by correlating two events: first, it exploits signature-based detection to identify when the protected virtual machine (VM) executes a cryptographic application; at the same time, it uses anomaly-based detection techniques to monitor the co-located VMs to identify abnormal cache behaviors that are typical during cache-based side-channel attacks. We show that correlation in the occurrence of these two events offers strong evidence of side-channel attacks. Compared to other work on side-channel defenses, CloudRadar has the following advantages: first, CloudRadar focuses on the root causes of cache-based side-channel attacks and hence is hard to evade using metamorphic attack code, while maintaining a low false positive rate. Second, CloudRadar is designed as a lightweight patch to existing cloud systems, which does not require new hardware support or any hypervisor, operating system, or application modifications. Third, CloudRadar provides real-time protection and can detect side-channel attacks within the order of milliseconds. We demonstrate a prototype implementation of CloudRadar in the OpenStack cloud framework. Our evaluation suggests CloudRadar achieves negligible performance overhead with high detection accuracy.

Journal ArticleDOI
01 Dec 2016
TL;DR: This paper analyzes three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information, and examines how the detection systems behave with a modified version of one of the spy processes.
Abstract: Highlights: Three methods for detecting a class of cache-based side-channel attacks are proposed. A new tool (quickhpc) for probing hardware performance counters at a higher temporal resolution than existing tools is presented. The first method is based on correlation; the other two use machine learning techniques and reach a minimum F-score of 0.93. A smarter attack is devised that is capable of circumventing the first method. In this paper we analyze three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information. Two of the three methods are based on machine learning techniques, and all three can successfully detect an attack in about one fifth of the time required to complete it. We observed no false positives in our test environment, and the overhead caused by the detection systems is negligible. We also analyze how the detection systems behave with a modified version of one of the spy processes. With some optimization we are confident these systems can be used in real-world scenarios.
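
A toy illustration of the correlation-based flavor of detection (the first of the three methods, per the highlights): flag a co-resident process whose cache activity trace tracks the victim's. The traces, counters, and the 0.8 threshold are invented for illustration and are not the paper's data or tuning.

```python
import numpy as np

def correlation_detector(victim_llc_misses, suspect_llc_accesses, threshold=0.8):
    """Flag a potential Prime+Probe-style spy when the co-resident process's
    cache activity tracks the victim's miss pattern."""
    v = (victim_llc_misses - victim_llc_misses.mean()) / victim_llc_misses.std()
    s = (suspect_llc_accesses - suspect_llc_accesses.mean()) / suspect_llc_accesses.std()
    r = float(np.mean(v * s))            # Pearson correlation of the two traces
    return r, r >= threshold

rng = np.random.default_rng(5)
victim = rng.poisson(40, 500).astype(float)      # per-interval LLC-miss counts
spy = victim * 3 + rng.normal(0, 5, 500)         # spy probes what the victim evicts
benign = rng.poisson(120, 500).astype(float)     # unrelated memory-heavy workload

print("spy:   ", correlation_detector(victim, spy))
print("benign:", correlation_detector(victim, benign))
```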

Journal ArticleDOI
TL;DR: It is shown that the distributed scheme is effective for resource allocation and can protect the CUs with limited signaling overhead; the signaling overhead of the centralized and decentralized schemes is also compared.
Abstract: This paper addresses the joint spectrum sharing and power allocation problem for device-to-device (D2D) communications underlaying a cellular network (CN). In the context of orthogonal frequency-division multiple-access systems, with the uplink resources shared with D2D links, both centralized and decentralized methods are proposed. Assuming global channel state information (CSI), the resource allocation problem is first formulated as a nonconvex optimization problem, which is solved using convex approximation techniques. We prove that the approximation method converges to a suboptimal solution and is often very close to the global optimal solution. On the other hand, by exploiting the decentralized network structure with only local CSI at each node, the Stackelberg game model is then adopted to devise a distributed resource allocation scheme. In this game-theoretic model, the base station (BS), which is modeled as the leader, coordinates the interference from the D2D transmission to the cellular users (CUs) by pricing the interference. Subsequently, the D2D pairs, as followers, compete for the spectrum in a noncooperative fashion. Sufficient conditions for the existence of the Nash equilibrium (NE) and the uniqueness of the solution are presented, and an iterative algorithm is proposed to solve the problem. In addition, the signaling overhead is compared between the centralized and decentralized schemes. Finally, numerical results are presented to verify the proposed schemes. It is shown that the distributed scheme is effective for the resource allocation and could protect the CUs with limited signaling overhead.

Proceedings ArticleDOI
18 Jun 2016
TL;DR: The GraphBLAS standard as discussed by the authors defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments.
Abstract: The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
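
To make the "graph algorithms as matrix operations" point concrete, here is breadth-first search written as repeated vector-matrix products over an adjacency matrix; numpy's (+, *) product with a threshold stands in for the Boolean semiring a GraphBLAS implementation would use, and the graph itself is a made-up example.

```python
import numpy as np

# Adjacency matrix of a small directed graph (entry [i, j] = 1 means edge i -> j).
A = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
])

def bfs_levels(A, source):
    """Breadth-first search as repeated vector-matrix products, the style of
    matrix-based graph algorithm the GraphBLAS is designed to express."""
    n = A.shape[0]
    frontier = np.zeros(n, dtype=bool); frontier[source] = True
    visited = frontier.copy()
    level = np.full(n, -1); level[source] = 0
    depth = 0
    while frontier.any():
        depth += 1
        frontier = ((frontier @ A) > 0) & ~visited   # advance the frontier one hop
        level[frontier] = depth
        visited |= frontier
    return level

print(bfs_levels(A, 0))   # -> [0 1 1 2 3]
```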

Journal ArticleDOI
TL;DR: This letter is the first attempt to conflate a machine learning technique with wireless communications and provides insight into the potential of fusion of machine learning and wireless communications.
Abstract: This letter is the first attempt to conflate a machine learning technique with wireless communications. Through interpreting antenna selection (AS) in wireless communications (i.e., an optimization-driven decision) as multiclass-classification learning (i.e., data-driven prediction), and through comparing the learning-based AS using k-nearest neighbors (k-NN) and support vector machine (SVM) algorithms with conventional optimization-driven AS methods in terms of communications performance, computational complexity, and feedback overhead, we provide insight into the potential of fusing machine learning and wireless communications.
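
A minimal sketch of casting antenna selection as multiclass classification: label each channel realization with the antenna an optimization-driven rule would pick, then train a k-NN classifier to predict that label. The feature choice, antenna count, and sample size are illustrative, not the letter's setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
n_tx, n_samples = 4, 2000

# Rayleigh-fading channel gains from each of the n_tx candidate transmit antennas.
H = (rng.standard_normal((n_samples, n_tx)) +
     1j * rng.standard_normal((n_samples, n_tx))) / np.sqrt(2)

# "Optimization-driven" label: the antenna with the largest instantaneous gain.
labels = np.argmax(np.abs(H) ** 2, axis=1)

# Learning-based AS: treat selection as multiclass classification on simple
# channel features (here, per-antenna received power).
features = np.abs(H) ** 2
split = n_samples // 2
clf = KNeighborsClassifier(n_neighbors=5).fit(features[:split], labels[:split])
pred = clf.predict(features[split:])

print("selection accuracy:", np.mean(pred == labels[split:]))
```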

Journal ArticleDOI
TL;DR: 2FLIP provides strong privacy preservation, such that adversaries can never succeed in tracing any vehicle, even with all RSUs compromised, and achieves strong nonrepudiation, in that any anonymous driver can be conditionally traced, even if he is not the only driver of the vehicle.
Abstract: Authentication in a vehicular ad-hoc network (VANET) requires not only secure and efficient authentication with privacy preservation but also applicable flexibility to handle complicated transportation circumstances. In this paper, we propose a Two-Factor LIghtweight Privacy-preserving authentication scheme (2FLIP) to enhance the security of VANET communication. 2FLIP employs a decentralized certificate authority (CA) and biological-password-based two-factor authentication (2FA) to achieve these goals. Based on the decentralized CA, 2FLIP only requires several extremely lightweight hashing processes and a fast message-authentication-code operation for message signing and verification between vehicles. Compared with previous schemes, 2FLIP significantly reduces computation cost by 100–1000 times and decreases communication overhead by 55.24%–77.52%. Furthermore, any certificate revocation list (CRL)-related overhead on vehicles is avoided. 2FLIP makes the scheme resilient to denial-of-service attacks in both computation and memory, whether caused by deliberate invading behaviors or by jammed traffic scenes. The proposed scheme provides strong privacy preservation, such that adversaries can never succeed in tracing any vehicle, even with all RSUs compromised. Moreover, it achieves strong nonrepudiation: any anonymous driver can be conditionally traced, even if he is not the only driver of the vehicle. Extensive simulations reveal that 2FLIP is feasible and has outstanding performance of nearly 0-ms network delay and 0% packet-loss ratio, which is particularly appropriate for real-time emergency reporting applications.
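
To illustrate why hash/MAC-based signing is cheap compared with per-message public-key signatures, here is a generic timestamped-HMAC sign/verify pair; the key handling, decentralized CA, and two-factor login of 2FLIP are not modeled, and the field sizes and freshness window are arbitrary choices.

```python
import hmac, hashlib, os, time

# Hypothetical session key shared between the vehicle's tamper-proof device
# and the verifier after an (unshown) two-factor login; names are illustrative.
session_key = os.urandom(16)

def sign_message(key, payload: bytes) -> bytes:
    """Lightweight signing: a timestamp plus a fast, truncated MAC,
    instead of a per-message public-key signature."""
    ts = int(time.time()).to_bytes(8, "big")
    tag = hmac.new(key, ts + payload, hashlib.sha256).digest()[:8]
    return ts + tag + payload

def verify_message(key, msg: bytes, max_age_s=5) -> bool:
    """Check freshness (replay resistance) and the MAC in constant time."""
    ts, tag, payload = msg[:8], msg[8:16], msg[16:]
    fresh = abs(time.time() - int.from_bytes(ts, "big")) <= max_age_s
    expected = hmac.new(key, ts + payload, hashlib.sha256).digest()[:8]
    return fresh and hmac.compare_digest(tag, expected)

msg = sign_message(session_key, b"emergency braking at (x, y)")
print(verify_message(session_key, msg))   # -> True
```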

Proceedings ArticleDOI
22 May 2016
TL;DR: Binary-level analysis techniques are proposed to significantly reduce the number of possible targets for indirect branches and reconstructed a conservative approximation of target function prototypes by means of use-def analysis at possible callees, providing evidence that strict binary-level CFI can still mitigate advanced attacks, despite the absence of source information or C++ semantics.
Abstract: Current binary-level Control-Flow Integrity (CFI) techniques are weak in determining the set of valid targets for indirect control flow transfers on the forward edge. In particular, the lack of source code forces existing techniques to resort to a conservative address-taken policy that overapproximates this set. In contrast, source-level solutions can accurately infer the targets of indirect calls and thus detect malicious control-flow transfers more precisely. Given that source code is not always available, however, offering similar quality of protection at the binary level is important, but, unquestionably, more challenging than ever: recent work demonstrates powerful attacks such as Counterfeit Object-oriented Programming (COOP), which made the community believe that protecting software against control-flow diversion attacks at the binary level is rather impossible. In this paper, we propose binary-level analysis techniques to significantly reduce the number of possible targets for indirect branches. More specifically, we reconstruct a conservative approximation of target function prototypes by means of use-def analysis at possible callees. We then couple this with liveness analysis at each indirect callsite to derive a many-to-many relationship between callsites and target callees with a much higher precision compared to prior binary-level solutions. Experimental results on popular server programs and on SPEC CPU2006 show that TypeArmor, a prototype implementation of our approach, is efficient - with a runtime overhead of less than 3%. Furthermore, we evaluate to what extent TypeArmor can mitigate COOP and other advanced attacks and show that our approach can significantly reduce the number of targets on the forward edge. Moreover, we show that TypeArmor breaks published COOP exploits, providing concrete evidence that strict binary-level CFI can still mitigate advanced attacks, despite the absence of source information or C++ semantics.
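
A toy version of the callsite/callee matching policy the paper describes: a callsite may only target callees that consume no more arguments than it prepares, with return-value use matched as well. The function names and recovered argument counts below are hypothetical, not output of the actual analysis.

```python
# Hypothetical per-function facts a binary-level analysis might recover:
# how many argument registers a callee actually reads (use-def analysis) and
# how many a callsite prepares (liveness analysis).
callees = {            # function -> (max args consumed, returns a value?)
    "parse_request": (2, True),
    "log_line":      (1, False),
    "handler_a":     (3, True),
}
callsites = {          # indirect callsite -> (args prepared, return value used?)
    "cs1": (3, True),
    "cs2": (1, False),
}

def allowed_targets(prepared, ret_used, callees):
    """Callees consuming no more arguments than prepared, and returning a
    value whenever the callsite expects one, remain valid targets."""
    return [f for f, (consumed, has_ret) in callees.items()
            if consumed <= prepared and (has_ret or not ret_used)]

for cs, (prep, ret) in callsites.items():
    print(cs, "->", allowed_targets(prep, ret, callees))
```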

Journal ArticleDOI
TL;DR: A structured iterative support detection algorithm is proposed that exploits the inherent structured sparsity of user activity naturally existing in NOMA systems to jointly detect user activity and transmitted data in several continuous time slots, and it achieves better performance than conventional solutions.
Abstract: Non-orthogonal multiple access (NOMA) has been regarded as one of the promising key technologies for future 5G systems. In the uplink grant-free NOMA schemes, dynamic scheduling is not required, which can significantly reduce the signaling overhead and transmission latency. However, user activity has to be detected in grant-free NOMA systems, which is challenging in practice. In this letter, by exploiting the inherent structured sparsity of user activity naturally existing in NOMA systems, we propose a low-complexity multi-user detector based on structured compressive sensing to realize joint user activity and data detection. In particular, we propose a structured iterative support detection algorithm by exploiting such structured sparsity, which is able to jointly detect user activity and transmitted data in several continuous time slots. Simulation results show that the proposed scheme can achieve better performance than conventional solutions.

Journal ArticleDOI
01 Sep 2016
TL;DR: This work builds two new layers over Spark, namely a query scheduler and a query executor, and embeds an efficient spatial Bloom filter into LocationSpark's indexes to avoid unnecessary network communication overhead when processing overlapped spatial data.
Abstract: We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operation, spatial-join, and kNN-join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immutable spatial indexes have low overhead with fault tolerance. In addition, we build two new layers over Spark, namely a query scheduler and a query executor. The query scheduler is responsible for mitigating skew in spatial queries, while the query executor selects the best plan based on the indexes and the nature of the spatial queries. Furthermore, to avoid unnecessary network communication overhead when processing overlapped spatial data, we embed an efficient spatial Bloom filter into LocationSpark's indexes. Finally, LocationSpark tracks frequently accessed spatial data, and dynamically flushes less frequently accessed data to disk. We evaluate our system on real workloads and demonstrate that it achieves an order of magnitude performance gain over a baseline framework.

Proceedings Article
14 Mar 2016
TL;DR: This paper addresses the question of controlling in-memory computation by proposing a lightweight unit that manages the operations performed on a memristive array, and it presents a standardized symmetric-key cipher for lightweight security applications as a case study.
Abstract: Realization of logic and storage operations in memristive circuits have opened up a promising research direction of in-memory computing. Elementary digital circuits, e.g., Boolean arithmetic circuits, can be economically realized within memristive circuits with a limited performance overhead as compared to the standard computation paradigms. This paper takes a major step along this direction by proposing a fully-programmable in-memory computing system. In particular, we address, for the first time, the question of controlling the in-memory computation, by proposing a lightweight unit managing the operations performed on a memristive array. Assembly-level programming abstraction is achieved by a natively-implemented majority and complement operator. This platform enables diverse sets of applications to be ported with little effort. As a case study, we present a standardized symmetric-key cipher for lightweight security applications. The detailed system design flow and simulation results with accurate device models are reported validating the approach.
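
The majority-plus-complement programming abstraction is easy to illustrate in software: the textbook mappings below build standard Boolean gates, and a one-bit full adder, from only MAJ and NOT. This shows the abstraction itself, not the paper's memristive implementation or its microcode.

```python
def MAJ(a, b, c):
    """Three-input majority, the native operation assumed by the abstraction."""
    return (a & b) | (a & c) | (b & c)

def NOT(a):
    return a ^ 1

# Standard gates expressed with only majority and complement (textbook mappings).
def AND(a, b):  return MAJ(a, b, 0)
def OR(a, b):   return MAJ(a, b, 1)
def NAND(a, b): return NOT(AND(a, b))
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

# One-bit full adder built from the same primitives.
def full_adder(a, b, cin):
    carry = MAJ(a, b, cin)          # majority is exactly the carry function
    s = XOR(XOR(a, b), cin)
    return s, carry

for bits in [(0, 1, 1), (1, 1, 1)]:
    print(bits, "->", full_adder(*bits))   # (0,1,1) -> (0, 1); (1,1,1) -> (1, 1)
```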

Journal ArticleDOI
TL;DR: This paper discusses the wide range of real-time line monitoring devices that can be used to determine the dynamic thermal rating of an overhead transmission line, with the power system operating normally or during a system contingency.
Abstract: This paper discusses the wide range of real-time line monitoring devices which can be used to determine the dynamic thermal rating of an overhead transmission line with the power system operating normally or during a system contingency. The most common types of real-time monitors are described including those that measure the line clearance, conductor temperature, and weather data in the line right of way. The strengths and weaknesses of the various monitoring methods are evaluated, concluding that some are more effective during system normal and others during system contingency conditions.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a received signal strength indication-based distributed Bayesian localization algorithm based on message passing to solve the approximate inference problem for precision agriculture applications, such as pest management and pH sensing in large farms.
Abstract: In this paper, we propose a received signal strength indication-based distributed Bayesian localization algorithm based on message passing to solve the approximate inference problem. The algorithm is designed for precision agriculture applications, such as pest management and pH sensing in large farms, where greater power efficiency besides communication and computational scalability is needed but location accuracy requirements are less demanding. Communication overhead, which is a key limitation of popular non-Bayesian and Bayesian distributed techniques, is avoided by a message passing schedule, in which outgoing message by each node does not depend on the destination node, and therefore is a fixed size. Fast convergence is achieved by: 1) eliminating the setup phase linked with spanning tree construction, which is frequent in belief propagation schemes and 2) the parallel nature of the updates, since no message needs to be exchanged among nodes during each update, which is called the coupled variables phenomenon in non-Bayesian techniques and accounts for a significant amount of communication overhead. These features make the proposed algorithm highly compatible with realistic wireless sensor network (WSN) deployments, e.g., ZigBee, that are based upon the ad hoc on-demand distance vector, where route request and route reply packets are flooded in the network during route discovery phase.

Journal ArticleDOI
TL;DR: In this article, a SIRMs-based fuzzy controller for transport control of double-pendulum-type systems is presented, where genetic algorithm (GA) is adopted to tune some parameters of the controller.

Proceedings ArticleDOI
07 Nov 2016
TL;DR: A quantitative security criterion is proposed for de-camouflaging complexity measurements and formally analyzed through the demonstration of the equivalence between the existing de-camouflaging strategy and the active learning scheme, and a provably secure camouflaging framework is developed by combining these two techniques.
Abstract: The advancing of reverse engineering techniques has complicated the efforts in intellectual property protection. Proactive methods have been developed recently, among which layout-level IC camouflaging is the leading example. However, existing camouflaging methods are rarely supported by provably secure criteria, which further leads to over-estimation of the security level when countering the latest de-camouflaging attacks, e.g., the SAT-based attack. In this paper, a quantitative security criterion is proposed for de-camouflaging complexity measurements and formally analyzed through the demonstration of the equivalence between the existing de-camouflaging strategy and the active learning scheme. Supported by the new security criterion, two novel camouflaging techniques are proposed, the low-overhead camouflaging cell library and the AND-tree structure, to help achieve exponentially increasing security levels at the cost of linearly increasing performance overhead on the circuit under protection. A provably secure camouflaging framework is then developed by combining these two techniques. Experimental results using the security criterion show that the camouflaged circuits with the proposed framework are of high resilience against the SAT-based attack with negligible performance overhead.

Journal ArticleDOI
TL;DR: A novel sensory data processing framework is proposed, which aims at transmitting desirable sensory data to the mobile users in a fast, reliable, and secure manner and further decreases the storage and processing overhead of the cloud, while enabling mobile users to securely obtain their desired sensory data faster.
Abstract: Taking advantage of the data gathering capability of wireless sensor networks (WSNs) as well as the data storage and processing ability of mobile cloud computing (MCC), WSN–MCC integration is attracting significant attention from both academia and industry. This paper focuses on processing of the sensory data in WSN–MCC integration, by identifying the critical issues concerning WSN–MCC integration and proposing a novel sensory data processing framework, which aims at transmitting desirable sensory data to the mobile users in a fast, reliable, and secure manner. The proposed framework could prolong the WSN lifetime, decrease the storage requirements of the sensors and the WSN gateway, and reduce the traffic load and bandwidth requirement of sensory data transmissions. In addition, the framework is capable of monitoring and predicting the future trend of the sensory data traffic, as well as improving its security. The framework further decreases the storage and processing overhead of the cloud, while enabling mobile users to securely obtain their desired sensory data faster. Analytical and experimental results are presented to demonstrate the effectiveness of the proposed framework.