
Showing papers on "Overhead (computing) published in 2015"


Posted Content
TL;DR: This work presents a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes, and demonstrates on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance.
Abstract: As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. These parameters are tuned to adjust to the HashedNets weight sharing architecture with standard backprop during training. Our hashing procedure introduces no additional memory overhead, and we demonstrate on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance.
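
The hashing trick at the heart of HashedNets can be pictured in a few lines. Below is a minimal sketch (my own illustration, not the authors' implementation, which uses a dedicated low-cost hash function rather than Python's built-in `hash`): every virtual weight W[i, j] is looked up in a small shared parameter vector via a hash of its indices, so a layer with m × n virtual weights needs only O(K) storage.

```python
# Minimal sketch of hashed weight sharing, assuming a fully-connected layer.
import numpy as np

def hashed_layer_forward(x, params, n_in, n_out, seed=0):
    """Forward pass where each virtual weight is a hashed lookup into params."""
    bucket = lambda i, j: hash((seed, i, j)) % params.size  # stand-in hash
    # Materializing W is for clarity only; a real implementation would index
    # params directly inside the matrix multiply to keep memory at O(K).
    W = np.empty((n_in, n_out))
    for i in range(n_in):
        for j in range(n_out):
            W[i, j] = params[bucket(i, j)]
    return x @ W

x = np.random.randn(4, 64)       # batch of 4 inputs
params = np.random.randn(256)    # 256 real parameters back a 64x32 layer
y = hashed_layer_forward(x, params, 64, 32)
print(y.shape)                   # (4, 32)
```

During training, gradients of all virtual weights in a bucket simply accumulate into the one shared parameter, which is why standard backprop applies unchanged.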

1,039 citations


Posted Content
TL;DR: This paper proposes an efficient framework, namely Quantized CNN, to simultaneously speed up the computation and reduce the storage and memory overhead of CNN models.
Abstract: Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computer vision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high computation complexity, which prohibits their further extensions. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4~6x speed-up and 15~20x compression with merely one percentage point loss of classification accuracy. With our quantized CNN model, even mobile devices can accurately classify images within one second.
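
To make the quantization idea concrete, here is a toy sketch. It is mine, not the paper's algorithm: the paper quantizes with the explicit objective of minimizing each layer's response error, whereas this sketch clusters weight sub-vectors with plain k-means and only measures the resulting response error afterwards.

```python
# Toy codebook quantization of a dense layer's weight matrix.
import numpy as np

def kmeans(vectors, k, iters=20):
    """Tiny k-means; returns the codebook and each vector's assignment."""
    rng = np.random.default_rng(0)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        d = ((vectors[:, None, :] - codebook[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            if (assign == c).any():
                codebook[c] = vectors[assign == c].mean(0)
    return codebook, assign

W = np.random.randn(256, 128)             # dense layer weights
subvecs = W.reshape(-1, 4)                # split into 4-d sub-vectors
codebook, assign = kmeans(subvecs, k=64)  # 64-entry codebook
W_q = codebook[assign].reshape(W.shape)   # quantized reconstruction

x = np.random.randn(10, 256)
rel_err = np.linalg.norm(x @ W - x @ W_q) / np.linalg.norm(x @ W)
print(f"relative response error: {rel_err:.3f}")
```

Storage drops from one float per weight to one small codebook plus a short index per sub-vector, which is the source of the 15~20x compression figures.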

753 citations


Journal ArticleDOI
TL;DR: A spatially common sparsity based adaptive channel estimation and feedback scheme for frequency division duplex based massive multi-input multi-output (MIMO) systems, which adapts training overhead and pilot design to reliably estimate and feed back the downlink channel state information (CSI) with significantly reduced overhead.
Abstract: This paper proposes a spatially common sparsity based adaptive channel estimation and feedback scheme for frequency division duplex based massive multi-input multi-output (MIMO) systems, which adapts training overhead and pilot design to reliably estimate and feed back the downlink channel state information (CSI) with significantly reduced overhead. Specifically, a nonorthogonal downlink pilot design is first proposed, which is very different from standard orthogonal pilots. By exploiting the spatially common sparsity of massive MIMO channels, a compressive sensing (CS) based adaptive CSI acquisition scheme is proposed, where the consumed time slot overhead only adaptively depends on the sparsity level of the channels. In addition, a distributed sparsity adaptive matching pursuit algorithm is proposed to jointly estimate the channels of multiple subcarriers. Furthermore, by exploiting the temporal channel correlation, a closed-loop channel tracking scheme is provided, which adaptively designs the nonorthogonal pilot according to the previous channel estimation to achieve an enhanced CSI acquisition. Finally, we generalize the results of the multiple-measurement-vectors case in CS and derive the Cramer–Rao lower bound of the proposed scheme, which guides the design of the nonorthogonal pilot signals for improved performance. Simulation results demonstrate that the proposed scheme outperforms its counterparts, and it is capable of approaching the performance bound.
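
The compressive sensing step can be pictured with a generic greedy sparse-recovery routine. The sketch below uses plain orthogonal matching pursuit on a toy noiseless model (not the paper's distributed sparsity adaptive matching pursuit): a channel with only a few significant entries is recovered from far fewer pilot measurements than unknowns, which is exactly why training overhead can scale with sparsity rather than antenna count.

```python
# Orthogonal matching pursuit on a toy sparse channel y = A h (noiseless).
import numpy as np

def omp(A, y, sparsity):
    residual, support = y.copy(), []
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.conj().T @ residual))))
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = x_s
    return x

n, m, k = 128, 32, 4        # unknowns, pilot measurements, sparsity level
h = np.zeros(n, dtype=complex)
idx = np.random.choice(n, k, replace=False)
h[idx] = np.random.randn(k) + 1j * np.random.randn(k)
A = (np.random.randn(m, n) + 1j * np.random.randn(m, n)) / np.sqrt(m)
h_hat = omp(A, A @ h, sparsity=k)
print(np.linalg.norm(h - h_hat) / np.linalg.norm(h))  # near zero
```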

423 citations


Journal ArticleDOI
01 Feb 2015
TL;DR: This paper proposes write atomic B+-Trees (wB+-Trees), a new type of main-memory B+-Tree that aims to reduce such overhead as much as possible, and replaces Memcached's internal hash index with tree indices.
Abstract: Computer systems in the near future are expected to have Non-Volatile Main Memory (NVMM), enabled by a new generation of Non-Volatile Memory (NVM) technologies, such as Phase Change Memory (PCM), STT-MRAM, and Memristor. The non-volatility property has the promise to persist in-memory data structures for instantaneous failure recovery. However, realizing such promise requires a careful design to ensure that in-memory data structures are in known consistent states after failures. This paper studies persistent in-memory B+-Trees, as B+-Trees are widely used in database and data-intensive systems. While traditional techniques, such as undo-redo logging and shadowing, support persistent B+-Trees, we find that they incur drastic performance overhead because of extensive NVM writes and CPU cache flush operations. PCM-friendly B+-Trees with unsorted leaf nodes help mitigate this issue, but the remaining overhead is still large. In this paper, we propose write atomic B+-Trees (wB+-Trees), a new type of main-memory B+-Trees that aim to reduce such overhead as much as possible. wB+-Tree nodes employ a small indirect slot array and/or a bitmap so that most insertions and deletions do not require the movement of index entries. In this way, wB+-Trees can achieve node consistency either through atomic writes in the nodes or by redo-only logging. We model fast NVM using DRAM on a real machine and model PCM using a cycle-accurate simulator. Experimental results show that compared with previous persistent B+-Tree solutions, wB+-Trees achieve up to 8.8x speedups on DRAM-like fast NVM and up to 27.1x speedups on PCM for insertions and deletions while maintaining good search performance. Moreover, we replaced Memcached's internal hash index with tree indices. Our real machine Memcached experiments show that wB+-Trees achieve up to 3.8X improvements over previous persistent tree structures with undo-redo logging or shadowing.
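
The key structural idea, the indirect slot array, is easy to model. Below is a simplified illustration (mine, not the paper's code): entries are appended to whatever slot is free, so existing entries never move, and the sorted order lives in a tiny indirection array small enough to be rewritten with a single atomic 8-byte NVM store.

```python
# Simplified model of a wB+-Tree leaf with an indirect slot array.
class WBLeaf:
    CAPACITY = 7  # 7 one-byte slot indices plus a count fit in one 8-byte word

    def __init__(self):
        self.entries = [None] * self.CAPACITY  # unsorted key/value storage
        self.slots = []                        # sorted order: indices into entries

    def insert(self, key, value):
        free = self.entries.index(None)        # append; no entry movement
        self.entries[free] = (key, value)
        pos = next((i for i, s in enumerate(self.slots)
                    if self.entries[s][0] > key), len(self.slots))
        # This single assignment stands in for the atomic 8-byte slot update:
        self.slots = self.slots[:pos] + [free] + self.slots[pos:]

    def keys(self):
        return [self.entries[s][0] for s in self.slots]

leaf = WBLeaf()
for k in [42, 7, 19]:
    leaf.insert(k, f"v{k}")
print(leaf.keys())  # [7, 19, 42]
```

Because only the small slot word must change atomically, an insertion needs neither entry shifting nor undo-redo logging for the common case.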

311 citations


Proceedings ArticleDOI
19 Apr 2015
TL;DR: This paper proposes and evaluates a downlink system operation for multi-user mmWave systems based on compressed sensing channel estimation and conjugate analog beamforming, and shows how many compressed sensing measurements are needed to approach the perfect channel knowledge performance.
Abstract: Millimeter wave (mmWave) systems will likely employ directional beamforming with large antenna arrays at both the transmitters and receivers. Acquiring channel knowledge to design these beamformers, however, is challenging due to the large antenna arrays and small signal-to-noise ratio before beamforming. In this paper, we propose and evaluate a downlink system operation for multi-user mmWave systems based on compressed sensing channel estimation and conjugate analog beamforming. Adopting the achievable sum-rate as a performance metric, we show how many compressed sensing measurements are needed to approach the perfect channel knowledge performance. The results illustrate that the proposed algorithm requires an order of magnitude less training overhead compared with traditional lower-frequency solutions, while employing mmWave-suitable hardware. They also show that the number of measurements needs to be optimized to handle the trade-off between the channel estimate quality and the training overhead.

297 citations


Proceedings ArticleDOI
16 Feb 2015
TL;DR: NV-Tree, a consistent and cache-optimized B+Tree variant with reduced CPU cacheline flushes, and NV-Store, a key-value store based on NV-Tree, are implemented and evaluated on an NVDIMM server.
Abstract: Non-volatile memory (NVM) has DRAM-like performance and disk-like persistency, which make it possible to replace both disk and DRAM and build single-level systems. Keeping data consistent in such systems is non-trivial because memory writes may be reordered by the CPU and the memory controller. In this paper, we study the consistency cost for an important and common data structure, the B+Tree. Although memory fence and CPU cacheline flush instructions can order memory writes to achieve data consistency, they introduce a significant overhead (more than 10X slower in performance). Based on our quantitative analysis of consistency cost, we propose NV-Tree, a consistent and cache-optimized B+Tree variant with reduced CPU cacheline flushes. We implement and evaluate NV-Tree and NV-Store, a key-value store based on NV-Tree, on an NVDIMM server. NV-Tree outperforms the state-of-the-art consistent tree structures by up to 12X under write-intensive workloads. NV-Store increases the throughput by up to 4.8X under YCSB workloads compared to Redis.

296 citations


Proceedings Article
12 Aug 2015
TL;DR: This paper presents a method of defending against a broad class of digital side-channel attacks by obfuscating the program at the source level; the authors argue the correctness and security of the compiler transformations and demonstrate that the transformations are safe in the context of a modern processor.
Abstract: Side-channel attacks monitor some aspect of a computer system's behavior to infer the values of secret data. Numerous side-channels have been exploited, including those that monitor caches, the branch predictor, and the memory address bus. This paper presents a method of defending against a broad class of side-channel attacks, which we refer to as digital side-channel attacks. The key idea is to obfuscate the program at the source code level to provide the illusion that many extraneous program paths are executed. This paper describes the technical issues involved in using this idea to provide confidentiality while minimizing execution overhead. We argue about the correctness and security of our compiler transformations and demonstrate that our transformations are safe in the context of a modern processor. Our empirical evaluation shows that our solution is 8.9× faster than prior work (GhostRider [20]) that specifically defends against memory trace-based side-channel attacks.
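
One classic building block behind this style of obfuscation is replacing a secret-dependent branch with straight-line code that computes both outcomes and selects one with a mask, so the executed instruction sequence no longer depends on the secret. The snippet below is a minimal illustration of that flavor of transformation (my example, not the paper's actual compiler pass):

```python
# A secret-dependent branch vs. a branchless, mask-based select (32-bit).

def leaky(secret_bit, a, b):
    if secret_bit:       # execution path depends on the secret
        return a
    return b

def branchless(secret_bit, a, b):
    mask = -int(secret_bit) & 0xFFFFFFFF        # all-zeros or all-ones
    return (a & mask) | (b & ~mask & 0xFFFFFFFF)  # both sides computed

assert branchless(1, 0xAAAA, 0x5555) == 0xAAAA
assert branchless(0, 0xAAAA, 0x5555) == 0x5555
```

The paper's transformations generalize this "execute both paths" illusion to whole programs while keeping the execution overhead manageable.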

227 citations


Posted Content
TL;DR: The Block Successive Upper Bound Minimization (BSUM) method is a powerful algorithmic framework for big data optimization that includes as special cases many well-known methods for analyzing massive data sets, such as Block Coordinate Descent (BCD), the Convex-Concave Procedure (CCCP), the Block Coordinate Proximal Gradient (BCPG) method, Nonnegative Matrix Factorization (NMF), and the Expectation Maximization (EM) method.
Abstract: This article presents a powerful algorithmic framework for big data optimization, called the Block Successive Upper bound Minimization (BSUM). The BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the Block Coordinate Descent (BCD), the Convex-Concave Procedure (CCCP), the Block Coordinate Proximal Gradient (BCPG) method, the Nonnegative Matrix Factorization (NMF), the Expectation Maximization (EM) method and so on. In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation and the required communication overhead. Illustrative examples from networking, signal processing and machine learning are presented to demonstrate the practical performance of the BSUM framework.
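
The simplest member of the BSUM family is exact block coordinate descent, where the "upper bound" minimized at each step is the objective itself restricted to one block of variables. A small numerical sketch of that special case (mine, on a least-squares problem):

```python
# Block coordinate descent on min ||Ax - b||^2, a special case of BSUM.
import numpy as np

def bcd_least_squares(A, b, blocks, iters=50):
    """Exactly minimize over one block at a time, holding the rest fixed."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        for blk in blocks:
            r = b - A @ x + A[:, blk] @ x[blk]   # residual excluding this block
            sol, *_ = np.linalg.lstsq(A[:, blk], r, rcond=None)
            x[blk] = sol
    return x

rng = np.random.default_rng(1)
A, b = rng.standard_normal((40, 10)), rng.standard_normal(40)
blocks = [list(range(0, 5)), list(range(5, 10))]
x = bcd_least_squares(A, b, blocks)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))   # small: converges to the global optimum
```

BSUM's generality comes from allowing each block step to minimize any locally tight upper bound of the objective rather than the objective itself, which is what makes CCCP, BCPG, NMF updates, and EM fit the same template.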

215 citations


Posted Content
TL;DR: In this paper, the authors present FireCaffe, which scales deep neural network training across a cluster of GPUs by selecting network hardware that achieves high bandwidth between GPU servers and using reduction trees to reduce communication overhead.
Abstract: Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms is almost always limited by the overhead of communicating between servers; DNN training is not an exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers -- Infiniband or Cray interconnects are ideal for this. Second, we consider a number of communication algorithms, and we find that reduction trees are more efficient and scalable than the traditional parameter server approach. Third, we optionally increase the batch size to reduce the total quantity of communication during DNN training, and we identify hyperparameters that allow us to reproduce the small-batch accuracy while training with large batch sizes. When training GoogLeNet and Network-in-Network on ImageNet, we achieve a 47x and 39x speedup, respectively, when training on a cluster of 128 GPUs.
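
The reduction-tree argument is easy to see in miniature: pairwise combining finishes in ceil(log2 p) sequential steps, whereas a single parameter server serializes O(p) messages. A toy model of the communication pattern (mine, ignoring overlap and pipelining):

```python
# Binary reduction tree over worker gradients.
import numpy as np

def tree_reduce(grads):
    """Sum the workers' gradients in log2(p) pairwise combining steps."""
    level, steps = list(grads), 0
    while len(level) > 1:
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps

grads = [np.full(4, w) for w in range(8)]  # 8 workers' gradient vectors
total, steps = tree_reduce(grads)
print(total, steps)  # elementwise sum over workers, in ceil(log2(8)) = 3 steps
```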

213 citations


Journal ArticleDOI
TL;DR: MuR-DPA is a public data auditing scheme based on the Merkle hash tree (MHT) that not only incurs much less communication overhead for both update verification and integrity verification of cloud datasets with multiple replicas, but also provides enhanced security against dishonest cloud service providers.
Abstract: Cloud computing that provides elastic computing and storage resource on demand has become increasingly important due to the emergence of “big data”. Cloud computing resources are a natural fit for processing big data streams as they allow big data applications to run at the scale required for handling their complexities (data volume, variety and velocity). With the data no longer under users’ direct control, data security in cloud computing is becoming one of the major concerns in the adoption of cloud computing resources. In order to improve data reliability and availability, storing multiple replicas along with original datasets is a common strategy for cloud service providers. Public data auditing schemes allow users to verify their outsourced data storage without having to retrieve the whole dataset. However, existing data auditing techniques suffer from efficiency and security problems. First, for dynamic datasets with multiple replicas, the communication overhead for update verifications is very large, because each update requires updating of all replicas, where verification for each update requires O(log n) communication complexity. Second, existing schemes cannot provide public auditing and authentication of block indices at the same time. Without authentication of block indices, the server can build a valid proof based on data blocks other than the blocks the client requested to verify. In order to address these problems, in this paper, we present a novel public auditing scheme named MuR-DPA. The new scheme incorporates a novel authenticated data structure (ADS) based on the Merkle hash tree (MHT), which we call MR-MHT. To support full dynamic data updates and authentication of block indices, we include rank and level values in the computation of MHT nodes. In contrast to existing schemes, level values of nodes in MR-MHT are assigned in a top-down order, and all replica blocks for each data block are organized into a same replica sub-tree. Such a configuration allows efficient verification of updates for multiple replicas. Compared to existing integrity verification and public auditing schemes, theoretical analysis and experimental results show that the proposed MuR-DPA scheme can not only incur much less communication overhead for both update verification and integrity verification of cloud datasets with multiple replicas, but also provide enhanced security against dishonest cloud service providers.
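
For readers unfamiliar with MHT-based auditing, the sketch below shows the bare mechanism the scheme builds on: a plain Merkle tree, where a verifier checks one block against the root using only log(n) sibling hashes. This is my own minimal version; MR-MHT additionally stores rank and level values and groups the replicas of each block into one sub-tree. A power-of-two block count is assumed for brevity.

```python
# Bare-bones Merkle hash tree: build, prove membership, verify.
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    levels = [[H(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels                      # levels[-1][0] is the root hash

def prove(levels, idx):
    proof = []
    for level in levels[:-1]:          # collect one sibling hash per level
        sib = idx ^ 1
        proof.append((level[sib], sib < idx))
        idx //= 2
    return proof

def verify(block, proof, root):
    h = H(block)
    for sib_hash, sib_is_left in proof:
        h = H(sib_hash + h) if sib_is_left else H(h + sib_hash)
    return h == root

blocks = [b"blk0", b"blk1", b"blk2", b"blk3"]
levels = build_levels(blocks)
assert verify(b"blk2", prove(levels, 2), levels[-1][0])
```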

196 citations


Journal ArticleDOI
TL;DR: A subgradient-based cost minimization algorithm that converges to the optimal solution in a practical number of iterations and with limited communication overhead is proposed for energy trading between islanded microgrids.
Abstract: In this paper, a distributed convex optimization framework is developed for energy trading between islanded microgrids. More specifically, the problem consists of several islanded microgrids that exchange energy flows by means of an arbitrary topology. Due to scalability issues and in order to safeguard local information on cost functions, a subgradient-based cost minimization algorithm that converges to the optimal solution in a practical number of iterations and with limited communication overhead is proposed. Furthermore, this approach allows for a very intuitive economics interpretation that explains the algorithm iterations in terms of a “supply–demand model” and “market clearing.” Numerical results are given in terms of the convergence rate of the algorithm and the attained costs for different network topologies.
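
The "supply-demand" interpretation can be sketched directly: a shared price is adjusted by a subgradient step on the power-balance violation, and each microgrid re-optimizes only against that price, so no cost function ever leaves its owner. The quadratic costs and numbers below are illustrative stand-ins, not from the paper:

```python
# Subgradient price update as market clearing between microgrids.
import numpy as np

def local_response(price, a, b, lo, hi):
    """Each microgrid privately minimizes a*g^2 + b*g - price*g."""
    return np.clip((price - b) / (2 * a), lo, hi)

demand = 12.0
a = np.array([1.0, 0.5, 2.0])   # hypothetical local cost coefficients
b = np.array([1.0, 2.0, 0.5])
price, step = 0.0, 0.5
for _ in range(200):
    gen = local_response(price, a, b, lo=0.0, hi=10.0)
    price += step * (demand - gen.sum())   # subgradient step = market clearing
print(round(price, 2), gen.round(2), round(gen.sum(), 2))  # supply ≈ demand
```

Only the scalar price and each grid's response cross the network per iteration, which is the sense in which the communication overhead stays limited.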

Proceedings Article
12 Aug 2015
TL;DR: A new approach for designing PSI protocols based on permutation-based hashing is described, which reduces the length of items mapped to bins while ensuring that no collisions occur; the resulting protocol is the first secure PSI protocol that is scalable to the demands and constraints of current real-world settings.
Abstract: Private Set Intersection (PSI) allows two parties to compute the intersection of private sets while revealing nothing more than the intersection itself. PSI needs to be applied to large data sets in scenarios such as measurement of ad conversion rates, data sharing, or contact discovery. Existing PSI protocols do not scale up well, and therefore some applications use insecure solutions instead. We describe a new approach for designing PSI protocols based on permutation-based hashing, which makes it possible to reduce the length of items mapped to bins while ensuring that no collisions occur. We denote this approach as Phasing, for Permutation-based Hashing Set Intersection. Phasing can dramatically improve the performance of PSI protocols whose overhead depends on the length of the representations of input items. We apply Phasing to design a new approach for circuit-based PSI protocols. The resulting protocol is up to 5 times faster than the previously best Sort-Compare-Shuffle circuit of Huang et al. (NDSS 2012). We also apply Phasing to the OT-based PSI protocol of Pinkas et al. (USENIX Security 2014), which is the fastest PSI protocol to date. Together with additional improvements that reduce the computation complexity by a logarithmic factor, the resulting protocol improves run-time by a factor of up to 20 and can also match the communication overhead of the previously best PSI protocol in that respect. The new protocol is only moderately less efficient than an insecure PSI protocol that is currently used by real-world applications, and is therefore the first secure PSI protocol that is scalable to the demands and the constraints of current real-world settings.
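
My reading of the permutation-based hashing idea, in sketch form: an item is split into a bin part and a remainder, the bin index is computed as left XOR f(right), and only the shorter remainder is stored. Because equal remainders force equal f(right), two distinct items landing in the same bin must differ in their stored value, so the shorter representation loses nothing. The 64-bit item size and 256 bins below are arbitrary illustrative choices:

```python
# Sketch of permutation-based hashing ("Phasing") for bin placement.
import hashlib

NUM_BINS = 256                  # power of two; bin index is 8 bits here

def f(right: int) -> int:
    digest = hashlib.sha256(right.to_bytes(8, "big")).digest()
    return digest[0]            # 8-bit hash of the remainder

def place(item: int):
    left = item >> 56           # top 8 bits select the "permuted" bin part
    right = item & ((1 << 56) - 1)
    return left ^ f(right), right   # (bin index, value actually stored)

# Distinct items in the same bin always have distinct stored remainders:
# equal remainders imply equal f(right), hence equal bins only if left was
# equal too, i.e. the items were identical.
b1, r1 = place(0xDEADBEEF12345678)
b2, r2 = place(0xCAFEBABE87654321)
print(b1, hex(r1), b2, hex(r2))
```

Since the stored value is log(#bins) bits shorter than the item, every protocol whose cost scales with item length gets a direct saving.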

Journal ArticleDOI
TL;DR: This article studies the ergodic sum rate of the RS-S and RS-ST schemes with quantized CSIT, where the common message(s) are transmitted via a space and a space-time design, respectively.
Abstract: To enhance the multiplexing gain of two-receiver Multiple-Input-Single-Output Broadcast Channel with imperfect channel state information at the transmitter (CSIT), a class of Rate-Splitting (RS) approaches has been proposed recently, which divides one receiver's message into a common and a private part, and superposes the common message on top of Zero-Forcing precoded private messages. In this paper, with quantized CSIT, we study the ergodic sum rate of two schemes, namely RS-S and RS-ST, where the common message(s) are transmitted via a space and space-time design, respectively. Firstly, we upper-bound the sum rate loss incurred by each scheme relative to Zero-Forcing Beamforming (ZFBF) with perfect CSIT. Secondly, we show that, to maintain a constant sum rate loss, RS-S scheme enables a feedback overhead reduction over ZFBF with quantized CSIT. Such reduction scales logarithmically with the constant rate loss at high Signal-to-Noise-Ratio (SNR). We also find that, compared to RS-S scheme, RS-ST scheme offers a further feedback overhead reduction that scales with the discrepancy between the feedback overhead employed by the two receivers when there are alternating receiver-specific feedback qualities. Finally, simulation results show that both schemes offer a significant SNR gain over conventional single-user/multiuser mode switching when the feedback overhead is fixed.

Posted Content
TL;DR: SparkNet is a framework for training deep networks in Spark; it includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library.
Abstract: Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this reason, leveraging the resources of a cluster to speed up training is an important area of work. However, widely-popular batch-processing computational frameworks like MapReduce and Spark were not designed to support the asynchronous and communication-intensive workloads of existing distributed deep learning systems. We introduce SparkNet, a framework for training deep networks in Spark. Our implementation includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library. Using a simple parallelization scheme for stochastic gradient descent, SparkNet scales well with the cluster size and tolerates very high-latency communication. Furthermore, it is easy to deploy and use with no parameter tuning, and it is compatible with existing Caffe models. We quantify the dependence of the speedup obtained by SparkNet on the number of machines, the communication frequency, and the cluster's communication overhead, and we benchmark our system's performance on the ImageNet dataset.
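
The "simple parallelization scheme" the abstract refers to can be captured in a few lines: each worker runs SGD on its own shard for a fixed number of local steps, then the parameters are averaged and rebroadcast. Below is a toy version of that scheme on least-squares data (my illustration; SparkNet itself applies it to Caffe models over Spark RDDs):

```python
# Periodic parameter averaging across workers, SparkNet-style.
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=5):
    for _ in range(steps):                 # a few local steps between syncs
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
shards = []
for _ in range(4):                         # 4 workers, each with a data shard
    X = rng.standard_normal((100, 5))
    shards.append((X, X @ w_true))

w = np.zeros(5)
for _ in range(20):                        # 20 synchronization rounds
    w = np.mean([local_sgd(w, X, y) for X, y in shards], axis=0)
print(np.linalg.norm(w - w_true))          # approaches 0
```

Communicating only once every few local steps is what makes the scheme tolerate the high-latency communication of batch frameworks like Spark.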

Proceedings ArticleDOI
16 Feb 2015
TL;DR: This paper designs erasure codes that are simultaneously optimal in terms of I/O, storage, and network bandwidth, and builds on top of a class of powerful practical codes, called the product-matrix-MSR codes.
Abstract: Erasure codes, such as Reed-Solomon (RS) codes, are increasingly being deployed as an alternative to data-replication for fault tolerance in distributed storage systems. While RS codes provide significant savings in storage space, they can impose a huge burden on the I/O and network resources when reconstructing failed or otherwise unavailable data. A recent class of erasure codes, called minimum-storage-regeneration (MSR) codes, has emerged as a superior alternative to the popular RS codes, in that it minimizes network transfers during reconstruction while also being optimal with respect to storage and reliability. However, existing practical MSR codes do not address the increasingly important problem of I/O overhead incurred during reconstructions, and are, in general, inferior to RS codes in this regard. In this paper, we design erasure codes that are simultaneously optimal in terms of I/O, storage, and network bandwidth. Our design builds on top of a class of powerful practical codes, called the product-matrix-MSR codes. Evaluations show that our proposed design results in a significant reduction in the number of I/Os consumed during reconstructions (a 5× reduction for typical parameters), while retaining optimality with respect to storage, reliability, and network bandwidth.

Journal ArticleDOI
TL;DR: This work derives a closed-form analytic formula for the M2M traffic throughput, proposes a joint adaptive resource allocation and access barring scheme based on the analytic results, and shows that the proposed scheme exhibits near-optimal performance in terms of capacity.
Abstract: To address random access channel (RACH) congestion and high signaling overhead problems of machine-to-machine (M2M) communication in cellular networks, we propose a new design of a random access procedure that is exclusively engineered for the M2M communication. Our design has two prominent features. One is a fast signaling process that allows M2M user equipment to transmit data right after preamble transmission on a physical RACH to reduce the signaling overhead. The other is a self-optimization feature that allows the cellular system to produce optimal M2M throughput by adaptively changing resource block (RB) composition and an access barring parameter according to the amount of available RBs and the M2M traffic load. We derive a closed-form analytic formula for the M2M traffic throughput and propose a joint adaptive resource allocation and access barring scheme based on the analytic results. By simulation, we show that the proposed scheme exhibits a near-optimal performance in terms of the capacity.
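
The intuition behind coupling access barring to the load can be shown with a slotted-ALOHA-style simplification (mine, not the paper's analytical model): a preamble succeeds when exactly one of the n backlogged devices transmits, so the optimal access probability tracks 1/n.

```python
# Success probability of a contention slot vs. access-barring factor p.
import numpy as np

def success_prob(n, p):
    return n * p * (1 - p) ** (n - 1)   # P(exactly one of n devices sends)

for n in [10, 100, 1000]:
    ps = np.linspace(0.0001, 0.5, 20000)
    best = ps[np.argmax(success_prob(n, ps))]
    print(n, round(best, 4), round(1 / n, 4))  # optimizer sits at ~1/n
```

Adapting the barring factor (and the resource-block split) to the estimated backlog is the self-optimization feature the abstract describes.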

Proceedings ArticleDOI
24 Aug 2015
TL;DR: This paper presents a simulation-driven prediction model that can predict job performance with high accuracy for the Apache Spark platform; the framework is evaluated with four real-life applications and achieves high prediction accuracy.
Abstract: Apache Spark is an open source distributed data processing platform that uses distributed memory abstraction to process large volumes of data efficiently. However, performance of a particular job on the Apache Spark platform can vary significantly depending on the input data type and size, design and implementation of the algorithm, and computing capability, making it extremely difficult to predict performance metrics of a job such as execution time, memory footprint, and I/O cost. To address this challenge, in this paper, we present a simulation-driven prediction model that can predict job performance with high accuracy on the Apache Spark platform. Specifically, as Apache Spark jobs often consist of multiple sequential stages, the presented prediction model simulates the execution of the actual job using only a fraction of the input data, and collects execution traces (e.g., I/O overhead, memory consumption, execution time) to predict job performance for each execution stage individually. We evaluated our prediction framework using four real-life applications on a 13 node cluster, and experimental results show that the model can achieve high prediction accuracy.
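
A trivial stand-in for the extrapolation step (my illustration with hypothetical traced numbers, not the paper's simulator): run a stage on a few small input fractions, fit a simple model of time versus input size, and predict the full-size run per stage.

```python
# Fit stage time vs. input fraction from sampled runs, then extrapolate.
import numpy as np

fractions = np.array([0.01, 0.02, 0.05])      # sampled input fractions
measured_sec = np.array([1.2, 2.1, 4.9])      # hypothetical traced stage times
coeffs = np.polyfit(fractions, measured_sec, 1)  # time ≈ a * fraction + b
print(np.polyval(coeffs, 1.0))                # predicted full-input stage time
```

A linear fit is an assumption; per-stage models are what let non-linear whole-job behavior still be predicted stage by stage.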

Journal ArticleDOI
TL;DR: An accurate and lightweight intrusion detection framework, called AECFV, is designed and implemented to protect vehicular ad hoc networks (VANETs) against the most dangerous attacks that can occur on such networks.

Journal ArticleDOI
TL;DR: A stochastic geometric analysis framework on user mobility is proposed to capture the spatial randomness and various scales of cell sizes in different tiers, and guidelines are provided for optimal tier selection under various user velocities.
Abstract: Horizontal and vertical handoffs are important ramifications of user mobility in multitier heterogeneous wireless networks. They directly impact the signaling overhead and quality of calls. However, they are difficult to analyze due to the irregularly shaped network topologies introduced by multiple tiers of cells. In this paper, a stochastic geometric analysis framework on user mobility is proposed, to capture the spatial randomness and various scales of cell sizes in different tiers. We derive theoretical expressions for the rates of all handoff types experienced by an active user with arbitrary movement trajectory. Furthermore, noting that the data rate of a user depends on the set of cell tiers that it is willing to use, we provide guidelines for optimal tier selection under various user velocities, taking both the handoff rates and the data rate into consideration. Empirical studies using user mobility trace data and extensive simulation are conducted, demonstrating the correctness and usefulness of our analysis.

Posted Content
TL;DR: In this paper, the authors analyze three methods for detecting cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information; two of the three methods are based on machine learning techniques, and all of them can successfully detect an attacker in about one fifth of the time required to complete the attack.
Abstract: In this paper we analyze three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information. Two of the three methods are based on machine learning techniques, and all three of them can successfully detect an attacker in about one fifth of the time required to complete the attack. There were no false positives in our test environment. Moreover, we could not measure a change in the execution time of the processes involved in the attack, meaning there is no perceivable overhead. We also analyze how the detection systems behave with a modified version of one of the spy processes. With some optimization, we are confident these systems can be used in real-world scenarios.

Journal ArticleDOI
TL;DR: A set of distributed algorithms for estimating the electro-mechanical oscillation modes of large power system networks using synchrophasors is presented, along with three communication and computational architectures by which estimators located at the control centers of various utility companies can run local optimization algorithms using local PMU data and thereafter communicate with other estimators to reach a global solution.
Abstract: In this paper, we present a set of distributed algorithms for estimating the electro-mechanical oscillation modes of large power system networks using synchrophasors. With the number of phasor measurement units (PMUs) in the North American grid scaling up to the thousands, system operators are gradually inclining toward distributed cyber-physical architectures for executing wide-area monitoring and control operations. Traditional centralized approaches, in fact, are anticipated to become untenable soon due to various factors such as data volume, security, communication overhead, and failure to adhere to real-time deadlines. To address this challenge, we propose three different communication and computational architectures by which estimators located at the control centers of various utility companies can run local optimization algorithms using local PMU data, and thereafter communicate with other estimators to reach a global solution. Both synchronous and asynchronous communications are considered. Each architecture integrates a centralized Prony-based algorithm with several variants of the alternating direction method of multipliers (ADMM). We discuss the relative advantages and bottlenecks of each architecture using simulations of the IEEE 68-bus and IEEE 145-bus power systems, as well as an Exo-GENI-based software defined network.
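
The flavor of the ADMM-based architectures can be shown with generic consensus ADMM (a sketch of the standard method, not the paper's Prony formulation): each estimator solves a local least-squares sub-problem with only its own data, and the local estimates are driven to agreement through averaging and dual updates.

```python
# Consensus ADMM: local solves + averaging drive all copies to agreement.
import numpy as np

rng = np.random.default_rng(0)
x_true = rng.standard_normal(4)
parts = []
for _ in range(3):                     # 3 estimators with private data
    A = rng.standard_normal((30, 4))
    parts.append((A, A @ x_true))

rho = 1.0
z = np.zeros(4)                        # shared consensus variable
xs = [np.zeros(4) for _ in parts]
us = [np.zeros(4) for _ in parts]      # scaled dual variables
for _ in range(50):
    for i, (A, b) in enumerate(parts):
        xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(4),
                                A.T @ b + rho * (z - us[i]))
    z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
    for i in range(len(parts)):
        us[i] += xs[i] - z
print(np.linalg.norm(z - x_true))      # consensus estimate ≈ ground truth
```

Only the small consensus vector and duals cross control-center boundaries, which is what keeps the communication overhead below that of shipping raw PMU data to one center.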

Journal ArticleDOI
TL;DR: This paper develops an outsourced policy updating method enabling efficient access control with dynamic policy updating for big data in the cloud, and proposes an efficient and secure method that allows the data owner to check whether the cloud server has updated the ciphertexts correctly.
Abstract: Due to the high volume and velocity of big data, it is an effective option to store big data in the cloud, as the cloud has capabilities of storing big data and processing a high volume of user access requests. Attribute-based encryption (ABE) is a promising technique to ensure the end-to-end security of big data in the cloud. However, policy updating has always been a challenging issue when ABE is used to construct access control schemes. A trivial implementation is to let data owners retrieve the data and re-encrypt it under the new access policy, and then send it back to the cloud. This method, however, incurs a high communication overhead and a heavy computation burden on data owners. In this paper, we propose a novel scheme that enables efficient access control with dynamic policy updating for big data in the cloud. We focus on developing an outsourced policy updating method for ABE systems. Our method can avoid the transmission of encrypted data and minimize the computation work of data owners, by making use of the previously encrypted data with old access policies. Moreover, we also propose policy updating algorithms for different types of access policies. Finally, we propose an efficient and secure method that allows the data owner to check whether the cloud server has updated the ciphertexts correctly. The analysis shows that our policy updating outsourcing scheme is correct, complete, secure and efficient.

Journal ArticleDOI
01 Jan 2015
TL;DR: Simulation results show that the Selective 3-Anchor DV-hop algorithm offers the best performance in terms of localization accuracy, mobility, synchronization, and overhead.
Abstract: Localization is a fundamental issue for many applications in wireless sensor networks. Without the need of additional ranging devices, the range-free localization technology is a cost-effective solution for low-cost indoor and outdoor wireless sensor networks. Among range-free algorithms, DV-hop (Distance Vector-hop) has the advantage of being able to localize mobile nodes that have fewer than three neighbour anchors. Based on the original DV-hop algorithm, this paper presents two improved algorithms (Checkout DV-hop and Selective 3-Anchor DV-hop). Checkout DV-hop estimates the mobile node position by using the nearest anchor, while Selective 3-Anchor DV-hop chooses the best 3 anchors to improve localization accuracy. Then, in order to implement these DV-hop based algorithms in network scenarios, a novel DV-hop localization protocol is proposed. This new protocol is presented in detail in this paper, including the format of data payloads, the improved collision reduction method E-CSMA/CA, as well as the parameters used in deciding the end of each DV-hop step. Finally, using our localization protocol, we investigate the performance of typical DV-hop based algorithms in terms of localization accuracy, mobility, synchronization and overhead. Simulation results show that the Selective 3-Anchor DV-hop algorithm offers the best performance compared to Checkout DV-hop and the original DV-hop algorithm.
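
For reference, the original DV-hop estimate that the two variants improve upon works as sketched below (my toy numbers): each anchor converts hop counts into distances using its average hop size, and the node position comes from a linearized least-squares trilateration.

```python
# Classic DV-hop position estimate with three anchors (toy numbers).
import numpy as np

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
hops_between_anchors = np.array([[0, 4, 4], [4, 0, 5], [4, 5, 0]])
node_hops = np.array([2, 3, 3])        # hop counts from the mobile node

# Average hop size per anchor: total distance / total hops to other anchors.
d = np.linalg.norm(anchors[:, None] - anchors[None], axis=2)
hop_size = d.sum(1) / hops_between_anchors.sum(1)
est_dist = node_hops * hop_size        # estimated range to each anchor

# Linearized trilateration: subtract the last anchor's circle equation.
A = 2 * (anchors[:-1] - anchors[-1])
b = (est_dist[-1] ** 2 - est_dist[:-1] ** 2
     + (anchors[:-1] ** 2).sum(1) - (anchors[-1] ** 2).sum(1))
pos = np.linalg.lstsq(A, b, rcond=None)[0]
print(pos)                             # estimated (x, y) of the node
```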

Journal ArticleDOI
TL;DR: This work proposes an analytical model-based approach for quality evaluation of Infrastructure-as-a-Service clouds, considering expected request completion time, rejection probability, and system overhead rate as key quality metrics.
Abstract: Cloud computing is a recently developed technology for complex systems with massive service sharing, which is different from the resource sharing of grid computing systems. In a cloud environment, service requests from users go through numerous provider-specific steps from the instant they are submitted to when the requested service is fully delivered. Quality modeling and analysis of clouds are not easy tasks because of the complexity of the automated provisioning mechanism and the dynamically changing cloud environment. This work proposes an analytical model-based approach for quality evaluation of Infrastructure-as-a-Service clouds by considering expected request completion time, rejection probability, and system overhead rate as key quality metrics. It also features the modeling of different warm-up and cool-down strategies of machines and the ability to identify the optimal balance between system overhead and performance. To validate the correctness of the proposed model, we obtain simulative quality-of-service (QoS) data and conduct a confidence interval analysis. The result can be used to help design and optimize industrial cloud computing systems.

Journal ArticleDOI
TL;DR: Optique overcomes problems in current ontology-based data access systems pertaining to installation overhead, usability, scalability, and scope by integrating a user-oriented query interface, semi-automated managing methods, new query rewriting techniques, and temporal and streaming data processing in one platform.
Abstract: Optique overcomes problems in current ontology-based data access systems pertaining to installation overhead, usability, scalability, and scope by integrating a user-oriented query interface, semi-automated managing methods, new query rewriting techniques, and temporal and streaming data processing in one platform.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: An implementation of StackArmor for x86-64 Linux is presented, with a detailed experimental analysis of the prototype demonstrating that StackArmor offers better security than prior binary- and source-level approaches, at the cost of only modest performance and memory overhead even with full protection.
Abstract: StackArmor is a comprehensive protection technique for stack-based memory error vulnerabilities in binaries. It relies on binary analysis and rewriting strategies to drastically reduce the uniquely high spatial and temporal memory predictability of traditional call stack organizations. Unlike prior solutions, StackArmor can protect against arbitrary stack-based attacks, requires no access to the source code, and offers a policy-driven protection strategy that allows end users to tune the security-performance tradeoff according to their needs. We present an implementation of StackArmor for x86-64 Linux and provide a detailed experimental analysis of our prototype on popular server programs and standard benchmarks (SPEC CPU2006). Our results demonstrate that StackArmor offers better security than prior binary- and source-level approaches, at the cost of only modest performance and memory overhead even with full protection.

Journal ArticleDOI
TL;DR: This paper defines the achievable effective ergodic secrecy rate (ESR), investigates a joint power allocation and training overhead optimization problem for the maximization of effective ESR, and derives a deterministic approximation for the achievableeffective ESR which facilitates the joint optimization.
Abstract: This paper proposes a framework for the artificial noise assisted secure transmission in multiple-input, multiple-output, multiple antenna eavesdropper (MIMOME) wiretap channels in frequency-division duplexed (FDD) systems. We focus on a practical scenario that only the eavesdroppers’ channel distribution information (CDI) is available and the imperfect channel state information (CSI) of the legitimate receiver is acquired through training and analog feedback. By taking explicitly into account the signaling overhead and training power overhead incurred by channel estimation and feedback, we define the achievable effective ergodic secrecy rate (ESR), and investigate a joint power allocation and training overhead optimization problem for the maximization of effective ESR. We first derive a deterministic approximation for the achievable effective ESR which facilitates the joint optimization. Then, efficient iterative algorithms are proposed to solve the considered nonconvex optimization problem. In particular, in the high-SNR regime, a block coordinate descent method (BCDM) is proposed to handle the joint optimization. In the low-SNR regime, we transform the problem into a sequence of geometric programmings (GPs) and locate its Karush–Kuhn–Tucker (KKT) solution using the successive convex approximation (SCA) method. For the general case of SNR, we maximize the lower bound of the achievable effective ESR. Simulation results corroborate the theoretical analysis and illustrate the secrecy performance of the proposed secure transmission scheme.

Journal ArticleDOI
TL;DR: LIBRA is a lightweight strategy for addressing the data skew problem among the reducers of MapReduce applications; it requires no pre-run sampling of the input data and does not prevent the overlap between the map and reduce stages.
Abstract: MapReduce is an effective tool for parallel data processing. One significant issue in practical MapReduce applications is data skew: the imbalance in the amount of data assigned to each task. This causes some tasks to take much longer to finish than others and can significantly impact performance. This paper presents LIBRA, a lightweight strategy to address the data skew problem among the reducers of MapReduce applications. Unlike previous work, LIBRA does not require any pre-run sampling of the input data or prevent the overlap between the map and the reduce stages. It uses an innovative sampling method which can achieve a highly accurate approximation to the distribution of the intermediate data by sampling only a small fraction of the intermediate data during the normal map processing. It allows the reduce tasks to start copying as soon as the chosen sample map tasks (only a small fraction of map tasks which are issued first) complete. It supports the split of large keys when application semantics permit and the total order of the output data. It considers the heterogeneity of the computing resources when balancing the load among the reduce tasks appropriately. LIBRA is applicable to a wide range of applications and is transparent to the users. We implement LIBRA in Hadoop and our experiments show that LIBRA has negligible overhead and can speed up the execution of some popular applications by up to a factor of 4.
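
The core of the sampling-then-partitioning step can be sketched as follows (my simplification of the abstract's description, using continuous key values for clarity; LIBRA additionally splits large duplicate keys when application semantics permit): cut the key range at sample quantiles so each reducer receives a near-equal share of the load.

```python
# Quantile-based range partitioning from a small sample of intermediate keys.
import numpy as np

def balanced_partition(sampled_keys, num_reducers):
    """Pick range boundaries that split the sampled load evenly."""
    keys = np.sort(sampled_keys)
    return [keys[int(len(keys) * i / num_reducers)]
            for i in range(1, num_reducers)]

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=2.0, size=5000)  # heavy-tailed keys
cuts = balanced_partition(skewed, num_reducers=4)
assign = np.searchsorted(cuts, skewed)                  # reducer per key
loads = np.bincount(assign, minlength=4)
print(loads)   # roughly 1250 each despite the skewed key distribution
```

Because the sample comes from the first map tasks, the partition decision is made during normal map processing rather than in a separate pre-run pass.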

Proceedings ArticleDOI
13 Apr 2015
TL;DR: This work introduces ChronoStream, a distributed system specifically designed for elastic stateful stream computation in the cloud that can scale linearly and achieve transparent elasticity and high availability without sacrificing system performance or affecting collocated tenants.
Abstract: We introduce ChronoStream, a distributed system specifically designed for elastic stateful stream computation in the cloud. ChronoStream treats internal state as a first-class citizen and aims at providing flexible elastic support in both vertical and horizontal dimensions to cope with workload fluctuation and dynamic resource reclamation. With a clear separation between application-level computation parallelism and OS-level execution concurrency, ChronoStream enables transparent dynamic scaling and failure recovery by eliminating any network I/O and state-synchronization overhead. Our evaluation on dozens of computing nodes shows that ChronoStream can scale linearly and achieve transparent elasticity and high availability without sacrificing system performance or affecting collocated tenants.

Proceedings ArticleDOI
16 Feb 2015
TL;DR: This paper shows that two families of advanced caching algorithms, Segmented-LRU and Greedy-Dual-Size-Frequency, can be easily implemented with RIPQ, and that these algorithms running on RIPQ increase hit ratios by up to ∼20% over the current FIFO system, incur low overhead, and achieve high throughput.
Abstract: Facebook uses flash devices extensively in its photo-caching stack. The key design challenge for an efficient photo cache on flash at Facebook is its workload: many small random writes are generated by inserting cache-missed content, or updating cache-hit content for advanced caching algorithms. The Flash Translation Layer on flash devices performs poorly with such a workload, lowering throughput and decreasing device lifespan. Existing coping strategies under-utilize the space on flash devices, sacrificing cache capacity, or are limited to simple caching algorithms like FIFO, sacrificing hit ratios. We overcome these limitations with the novel Restricted Insertion Priority Queue (RIPQ) framework that supports advanced caching algorithms with large cache sizes, high throughput, and long device lifespan. RIPQ aggregates small random writes, co-locates similarly prioritized content, and lazily moves updated content to further reduce device overhead. We show that two families of advanced caching algorithms, Segmented-LRU and Greedy-Dual-Size-Frequency, can be easily implemented with RIPQ. Our evaluation on Facebook's photo trace shows that these algorithms running on RIPQ increase hit ratios up to ∼20% over the current FIFO system, incur low overhead, and achieve high throughput.
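
The write-aggregation idea behind RIPQ, as I read the abstract, reduces to this sketch (mine, with arbitrary buffer sizes): insertions are restricted to K priority points, each point buffers small writes in RAM, and a buffer reaches flash only as one large block write.

```python
# Sketch of restricted insertion points aggregating small writes.
class RIPQSketch:
    def __init__(self, num_points=8, block_size=4):
        self.block_size = block_size
        self.points = [[] for _ in range(num_points)]  # RAM buffers
        self.flash_blocks = []                         # large writes only

    def insert(self, item, priority):                  # priority in [0, 1)
        idx = int(priority * len(self.points))         # nearest insertion point
        buf = self.points[idx]
        buf.append(item)
        if len(buf) >= self.block_size:                # full buffer -> one big write
            self.flash_blocks.append((idx, list(buf)))
            buf.clear()

q = RIPQSketch()
for i in range(16):
    q.insert(f"photo{i}", priority=(i % 2) / 2)
print(len(q.flash_blocks))   # 4 large flash writes instead of 16 small ones
```

Approximating the exact priority queue with K discrete insertion points is the trade that turns a Flash-hostile workload of small random writes into a few large sequential ones.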