
Showing papers in "IEEE Transactions on Parallel and Distributed Systems in 2011"


Journal ArticleDOI
TL;DR: To achieve efficient data dynamics, the existing proof of storage models are improved by manipulating the classic Merkle Hash Tree construction for block tag authentication, and an elegant verification scheme is constructed for the seamless integration of these two salient features in the protocol design.
Abstract: Cloud Computing has been envisioned as the next-generation architecture of IT Enterprise. It moves the application software and databases to the centralized large data centers, where the management of the data and services may not be fully trustworthy. This unique paradigm brings about many new security challenges, which have not been well understood. This work studies the problem of ensuring the integrity of data storage in Cloud Computing. In particular, we consider the task of allowing a third party auditor (TPA), on behalf of the cloud client, to verify the integrity of the dynamic data stored in the cloud. The introduction of TPA eliminates the involvement of the client through the auditing of whether his data stored in the cloud are indeed intact, which can be important in achieving economies of scale for Cloud Computing. The support for data dynamics via the most general forms of data operation, such as block modification, insertion, and deletion, is also a significant step toward practicality, since services in Cloud Computing are not limited to archive or backup data only. While prior works on ensuring remote data integrity often lack the support of either public auditability or dynamic data operations, this paper achieves both. We first identify the difficulties and potential security problems of direct extensions with fully dynamic data updates from prior works and then show how to construct an elegant verification scheme for the seamless integration of these two salient features in our protocol design. In particular, to achieve efficient data dynamics, we improve the existing proof of storage models by manipulating the classic Merkle Hash Tree construction for block tag authentication. To support efficient handling of multiple auditing tasks, we further explore the technique of bilinear aggregate signature to extend our main result into a multiuser setting, where TPA can perform multiple auditing tasks simultaneously. Extensive security and performance analysis show that the proposed schemes are highly efficient and provably secure.
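The block-tag authentication in this scheme rests on standard Merkle Hash Tree mechanics: the verifier holds only the root, and each challenged block is checked against it with a logarithmic-size sibling path. The sketch below illustrates that generic mechanism in Python (SHA-256 hashing, toy block layout); it is not the authors' exact construction, which additionally involves homomorphic block tags and bilinear signatures.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return all tree levels, leaves first; odd levels duplicate the last node."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes from the leaf for block `index` up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = index ^ 1
        path.append((level[sib], sib < index))  # (sibling hash, sibling-is-left?)
        index //= 2
    return path

def verify(root, block, path):
    """Recompute the root from a block and its sibling path."""
    node = h(block)
    for sib, sib_is_left in path:
        node = h(sib + node) if sib_is_left else h(node + sib)
    return node == root

blocks = [b"block-%d" % i for i in range(8)]
levels = build_tree(blocks)
root = levels[-1][0]
assert verify(root, blocks[5], auth_path(levels, 5))
# For a dynamic update, the client recomputes the root from the new block
# and the same sibling path, which is the basis for supporting modification,
# insertion, and deletion without re-reading the whole file.
```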

1,422 citations


Journal ArticleDOI
TL;DR: The results indicate that the current clouds need an order of magnitude in performance improvement to be useful to the scientific community, and show which improvements should be considered first to address this discrepancy between offer and demand.
Abstract: Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate the need for maintaining expensive computing facilities by companies and institutes alike. Through the use of virtualization and resource time sharing, clouds serve with a single set of physical resources a large user base with different needs. Thus, clouds have the potential to provide to their owners the benefits of an economy of scale and, at the same time, become an alternative for scientists to clusters, grids, and parallel production environments. However, the current commercial clouds have been built to support web and small database workloads, which are very different from typical scientific computing workloads. Moreover, the use of virtualization and resource time sharing may introduce significant performance penalties for the demanding scientific computing workloads. In this work, we analyze the performance of cloud computing services for scientific computing workloads. We quantify the presence in real scientific computing workloads of Many-Task Computing (MTC) users, that is, of users who employ loosely coupled applications comprising many tasks to achieve their scientific goals. Then, we perform an empirical evaluation of the performance of four commercial cloud computing services including Amazon EC2, which is currently the largest commercial cloud. Last, we compare through trace-based simulation the performance characteristics and cost models of clouds and other scientific computing platforms, for general and MTC-based scientific computing workloads. Our results indicate that the current clouds need an order of magnitude in performance improvement to be useful to the scientific community, and show which improvements should be considered first to address this discrepancy between offer and demand.

915 citations


Journal ArticleDOI
TL;DR: This paper proposes an access control mechanism using ciphertext-policy attribute-based encryption to enforce access control policies with efficient attribute and user revocation capability and demonstrates how to apply the proposed mechanism to securely manage the outsourced data.
Abstract: Some of the most challenging issues in data outsourcing scenario are the enforcement of authorization policies and the support of policy updates. Ciphertext-policy attribute-based encryption is a promising cryptographic solution to these issues for enforcing access control policies defined by a data owner on outsourced data. However, the problem of applying the attribute-based encryption in an outsourced architecture introduces several challenges with regard to the attribute and user revocation. In this paper, we propose an access control mechanism using ciphertext-policy attribute-based encryption to enforce access control policies with efficient attribute and user revocation capability. The fine-grained access control can be achieved by dual encryption mechanism which takes advantage of the attribute-based encryption and selective group key distribution in each attribute group. We demonstrate how to apply the proposed mechanism to securely manage the outsourced data. The analysis results indicate that the proposed scheme is efficient and secure in the data outsourcing systems.

743 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and presents two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS), based on a novel objective function and a variant of it.
Abstract: Traditionally, the primary performance goal of computer systems has focused on reducing the execution time of applications while increasing throughput. This performance goal has been mostly achieved by the development of high-density computer systems. As witnessed recently, these systems provide very powerful processing capability and capacity. They often consist of tens or hundreds of thousands of processors and other resource-hungry devices. The energy consumption of these systems has become a major concern. In this paper, we address the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and present two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS). A number of recent commodity processors are capable of DVS, which enables processors to operate at different voltage supply levels at the expense of sacrificing clock frequencies. In the context of scheduling, this multiple voltage facility implies that there is a trade-off between the quality of schedules and energy consumption. To effectively balance these two performance goals, we have devised a novel objective function and a variant of it. The main difference between the two algorithms is in their measurement of energy consumption. The extensive comparative evaluations conducted as part of this work show that the performance of our algorithms is very compelling in terms of both application completion time and energy consumption.
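The abstract describes the time/energy trade-off but not the objective function itself. As a hypothetical illustration of how a DVS-aware scheduler might weigh the two goals, the sketch below scores each (voltage, frequency) level for a ready task by a weighted sum of normalized execution time and energy; the levels, the power model, and the weight `alpha` are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical DVS levels: (voltage, relative frequency). Dynamic power ~ V^2 * f.
DVS_LEVELS = [(1.2, 1.00), (1.0, 0.80), (0.8, 0.60)]

def energy_and_time(work, voltage, freq):
    """Execution time scales as work / freq; energy as V^2 * f * time."""
    time = work / freq
    return (voltage ** 2) * freq * time, time

def pick_level(work, deadline_slack, alpha=0.5):
    """Score each level by a weighted sum of normalized time and energy.
    `alpha` trades completion time against energy; illustrative only."""
    e_max, t_min = energy_and_time(work, *DVS_LEVELS[0])  # fastest, most energy
    best = None
    for v, f in DVS_LEVELS:
        e, t = energy_and_time(work, v, f)
        if t > deadline_slack:          # would stretch past the task's slack
            continue
        score = alpha * (t / t_min) + (1 - alpha) * (e / e_max)
        if best is None or score < best[0]:
            best = (score, v, f, t, e)
    return best

print(pick_level(work=100.0, deadline_slack=160.0))
# Picks the 0.8x-frequency level: slightly longer runtime, noticeably less energy.
```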

306 citations


Journal ArticleDOI
TL;DR: Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution.
Abstract: In recent years ad hoc parallel data processing has emerged to be one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks which are currently used have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this paper, we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.

296 citations


Journal ArticleDOI
TL;DR: A new scheme for dynamic adaptation of transmission power and contention window (CW) size enhances information dissemination in Vehicular Ad-hoc Networks (VANETs) and features significantly better throughput and lower average end-to-end delay than a similar scheme with static parameters.
Abstract: In this paper, we present a new scheme for dynamic adaptation of transmission power and contention window (CW) size to enhance performance of information dissemination in Vehicular Ad-hoc Networks (VANETs). The proposed scheme incorporates the Enhanced Distributed Channel Access (EDCA) mechanism of 802.11e and uses a joint approach to adapt transmission power at the physical (PHY) layer and quality-of-service (QoS) parameters at the medium access control (MAC) layer. In our scheme, transmission power is adapted based on the estimated local vehicle density to change the transmission range dynamically, while the CW size is adapted according to the instantaneous collision rate to enable service differentiation. In the interest of promoting timely propagation of information, VANET advisories are prioritized according to their urgency and the EDCA mechanism is employed for their dissemination. The performance of the proposed joint adaptation scheme was evaluated using the ns-2 simulator with added EDCA support. Extensive simulations have demonstrated that our scheme features significantly better throughput and lower average end-to-end delay compared with a similar scheme with static parameters.
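As a rough illustration of the two adaptation rules described above, the sketch below scales transmit power down with the estimated local vehicle density and grows or shrinks the contention window with the instantaneous collision rate. Every constant (power bounds, target neighbor count, collision thresholds) is an invented placeholder rather than a value from the paper.

```python
def adapt_tx_power(local_density, p_min=10.0, p_max=33.0, target_neighbors=50):
    """Scale transmit power (dBm) down as the estimated local vehicle density
    grows, so the effective range shrinks in dense traffic (illustrative)."""
    ratio = min(1.0, target_neighbors / max(local_density, 1))
    return p_min + ratio * (p_max - p_min)

def adapt_cw(cw, collision_rate, cw_min=15, cw_max=1023, high=0.3, low=0.1):
    """Grow the contention window when the measured collision rate is high,
    shrink it when the channel is mostly idle (illustrative thresholds)."""
    if collision_rate > high:
        cw = min(cw_max, 2 * cw + 1)
    elif collision_rate < low:
        cw = max(cw_min, (cw - 1) // 2)
    return cw

print(adapt_tx_power(local_density=120))    # dense road -> lower power
print(adapt_cw(cw=31, collision_rate=0.4))  # congested channel -> larger CW
```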

245 citations


Journal ArticleDOI
TL;DR: This paper designs an Energy-Balanced Routing Protocol (EBRP) by constructing a mixed virtual potential field in terms of depth, energy density, and residual energy and shows significant improvements in energy balance, network lifetime, coverage ratio, and throughput as compared to the commonly used energy-efficient routing algorithm.
Abstract: Energy is an extremely critical resource for battery-powered wireless sensor networks (WSN), thus making energy-efficient protocol design a key challenging problem. Most of the existing energy-efficient routing protocols always forward packets along the minimum energy path to the sink to merely minimize energy consumption, which causes an unbalanced distribution of residual energy among sensor nodes, and eventually results in a network partition. In this paper, with the help of the concept of potential in physics, we design an Energy-Balanced Routing Protocol (EBRP) by constructing a mixed virtual potential field in terms of depth, energy density, and residual energy. The goal of this basic approach is to force packets to move toward the sink through the dense energy area so as to protect the nodes with relatively low residual energy. To address the routing loop problem emerging in this basic algorithm, enhanced mechanisms are proposed to detect and eliminate loops. The basic algorithm and loop elimination mechanism are first validated through extensive simulation experiments. Finally, the integrated performance of the full potential-based energy-balanced routing algorithm is evaluated through numerous simulations in a randomly deployed network running event-driven applications; the impact of the parameters on the performance is examined, and guidelines for parameter settings are summarized. Our experimental results show that there are significant improvements in energy balance, network lifetime, coverage ratio, and throughput as compared to the commonly used energy-efficient routing algorithm.
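A minimal sketch of the potential-field idea, assuming a simple weighted combination of depth, energy density, and residual energy and a greedy steepest-descent next-hop rule; the weights and the exact form of the mixed field in EBRP differ, so treat this only as an illustration of how such a field steers packets through energy-rich areas.

```python
def potential(depth, energy_density, residual_energy,
              w_d=0.6, w_ed=0.25, w_e=0.15):
    """Mixed virtual potential: lower means 'closer' to the sink.  Depth pulls
    packets sinkward; the energy terms push them through energy-rich regions.
    Weights are illustrative, not the paper's calibration."""
    return w_d * depth - w_ed * energy_density - w_e * residual_energy

def next_hop(node, neighbors):
    """Forward to the neighbor offering the steepest potential drop."""
    lower = [n for n in neighbors if potential(**n) < potential(**node)]
    return min(lower, key=lambda n: potential(**n), default=None)

me = dict(depth=5, energy_density=0.4, residual_energy=0.7)
nbrs = [dict(depth=4, energy_density=0.9, residual_energy=0.9),
        dict(depth=4, energy_density=0.2, residual_energy=0.1)]
print(next_hop(me, nbrs))  # prefers the shallower, energy-rich neighbor
```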

233 citations


Journal ArticleDOI
TL;DR: A generic and secure framework is proposed to upgrade two-factor authentication to three-factor authentication, which not only significantly improves the information assurance at low cost but also protects client privacy in distributed systems.
Abstract: As part of the security within distributed systems, various services and resources need protection from unauthorized use. Remote authentication is the most commonly used method to determine the identity of a remote client. This paper investigates a systematic approach for authenticating clients by three factors, namely password, smart card, and biometrics. A generic and secure framework is proposed to upgrade two-factor authentication to three-factor authentication. The conversion not only significantly improves the information assurance at low cost but also protects client privacy in distributed systems. In addition, our framework retains several practice-friendly properties of the underlying two-factor authentication, which we believe is of independent interest.

224 citations


Journal ArticleDOI
TL;DR: An efficient distributed algorithm is proposed that produces a collision-free schedule for data aggregation in WSNs and it is theoretically proved that the delay of the aggregation schedule generated by the algorithm is at most 16R + Δ - 14 time slots.
Abstract: Data aggregation is a key functionality in wireless sensor networks (WSNs). This paper focuses on data aggregation scheduling problem to minimize the delay (or latency). We propose an efficient distributed algorithm that produces a collision-free schedule for data aggregation in WSNs. We theoretically prove that the delay of the aggregation schedule generated by our algorithm is at most 16R + Δ - 14 time slots. Here, R is the network radius and Δ is the maximum node degree in the communication graph of the original network. Our algorithm significantly improves the previously known best data aggregation algorithm with an upper bound of delay of 24D + 6Δ + 16 time slots, where D is the network diameter (note that D can be as large as 2R). We conduct extensive simulations to study the practical performances of our proposed data aggregation algorithm. Our simulation results corroborate our theoretical results and show that our algorithms perform better in practice. We prove that the overall lower bound of delay for data aggregation under any interference model is max{log n, R}, where n is the network size. We provide an example to show that the lower bound is (approximately) tight under the protocol interference model when rI = r, where rI is the interference range and r is the transmission range. We also derive the lower bound of delay under the protocol interference model when r < rI < 3r and when rI ≥ 3r.
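To make the improvement concrete, the short calculation below compares the two delay bounds quoted in the abstract for an example network with radius R = 10 and maximum degree Δ = 20, taking the worst case D = 2R for the earlier bound.

```python
def new_bound(R, delta):
    return 16 * R + delta - 14          # bound proved in this paper

def old_bound(D, delta):
    return 24 * D + 6 * delta + 16      # previously best known bound

R, delta = 10, 20                        # example radius and max degree
print(new_bound(R, delta))               # 166 time slots
print(old_bound(2 * R, delta))           # 616 time slots when D = 2R
```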

224 citations


Journal ArticleDOI
TL;DR: hiCUDA, a high-level directive-based language for CUDA programming, is designed; it allows programmers to perform tedious porting tasks in a simpler manner and directly on the sequential code, thus speeding up the porting process.
Abstract: Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner and directly on the sequential code, thus speeding up the porting process. In this paper, we describe the hiCUDA directives as well as the design and implementation of a prototype compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real-world applications that span multiple procedures and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.

217 citations


Journal ArticleDOI
TL;DR: This paper presents an energy-efficient opportunistic routing strategy, denoted as EEOR, and extensive simulations in TOSSIM show that the protocol EEOR performs better than the well-known ExOR protocol in terms of the energy consumption, the packet loss ratio, and the average delivery delay.
Abstract: Opportunistic routing has been shown to improve network throughput by allowing nodes that overhear a transmission and are closer to the destination to participate in forwarding packets, i.e., to join the forwarder list. The nodes in the forwarder list are prioritized, and a lower priority forwarder discards the packet if it has already been forwarded by a higher priority forwarder. One challenging problem is to select and prioritize the forwarder list such that a certain network performance metric is optimized. In this paper, we focus on selecting and prioritizing the forwarder list to minimize the energy consumption of all nodes. We study both cases where the transmission power of each node is fixed or dynamically adjustable. We present an energy-efficient opportunistic routing strategy, denoted as EEOR. Our extensive simulations in TOSSIM show that our protocol EEOR performs better than the well-known ExOR protocol (when adapted to sensor networks) in terms of energy consumption, packet loss ratio, and average delivery delay.
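The selection problem above can be pictured with a small expected-cost model: given a prioritized list of candidate forwarders, each with a delivery probability and an estimated remaining cost to the sink, the sender retransmits until someone in the list receives the packet and the highest-priority receiver carries it on. The sketch below implements that illustrative model (it is not EEOR's actual cost function) and shows why ordering candidates by remaining cost lowers the expectation.

```python
def expected_cost(tx_energy, candidates):
    """Expected energy of one opportunistic transmission, given a prioritized
    forwarder list of (delivery_prob, remaining_cost) pairs.  Illustrative
    model: the sender pays tx_energy per attempt and the packet is then
    handled by the highest-priority forwarder that actually received it."""
    p_none = 1.0      # probability no higher-priority forwarder got the packet
    downstream = 0.0  # expected remaining cost contributed per attempt
    for p, remaining in candidates:
        downstream += p_none * p * remaining
        p_none *= (1.0 - p)
    p_any = 1.0 - p_none
    if p_any == 0.0:
        return float("inf")
    # Retransmit until somebody in the list receives the packet.
    return (tx_energy + downstream) / p_any

good_order = [(0.6, 2.0), (0.8, 5.0)]   # closest-to-sink candidate first
bad_order  = [(0.8, 5.0), (0.6, 2.0)]
print(expected_cost(1.0, good_order), expected_cost(1.0, bad_order))
# The first ordering yields the lower expectation, which is the intuition
# behind prioritizing the forwarder list by remaining cost.
```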

Journal ArticleDOI
TL;DR: This paper presents the Lottery Frame (LoF) estimation scheme, which can achieve high accuracy, low latency, and scalability, and shows its significant advantages, e.g., high accuracy, short processing time, and low overhead, through analysis and simulations.
Abstract: Counting the number of RFID tags (cardinality) is a fundamental problem for large-scale RFID systems. Not only does it satisfy some real application requirements, it also acts as an important aid for RFID identification. Due to the extremely long processing time, slotted ALOHA-based or tree-based arbitration protocols are often impractical for many applications, because tags are usually attached to moving objects and they may have left the reader's interrogation region before being counted. Recently, estimation schemes have been proposed to count the approximate number of tags. Most of them, however, suffer from two scalability problems: time inefficiency and multiple-reading. Without resolving these problems, large-scale RFID systems cannot easily apply the estimation scheme as well as the corresponding identification. In this paper, we present the Lottery Frame (LoF) estimation scheme, which can achieve high accuracy, low latency, and scalability. LoF estimates the number of tags by utilizing collision information. We show the significant advantages, e.g., high accuracy, short processing time, and low overhead, of the proposed LoF scheme through analysis and simulations.
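LoF's core trick, estimating cardinality from which slots of a frame are busy rather than from identifying tags, resembles probabilistic counting. The sketch below uses a geometric "lottery" slot choice and a Flajolet-Martin style correction factor to recover the order of magnitude of the tag population; the real LoF estimator and its constants may differ, so this is only a conceptual illustration.

```python
import random

def geometric_slot():
    """Slot j is chosen with probability 2^-(j+1), like a lottery ticket."""
    j = 0
    while random.random() < 0.5:
        j += 1
    return j

def one_frame(n_tags, frame_size=32):
    """The reader only learns which slots of the frame were non-empty."""
    busy = [False] * frame_size
    for _ in range(n_tags):
        busy[min(geometric_slot(), frame_size - 1)] = True
    return busy

def estimate(n_tags, frames=64, frame_size=32, phi=0.77351):
    """Average the index of the first empty slot over several frames and map
    it back to a cardinality; `phi` is the Flajolet-Martin correction factor,
    used here as a stand-in for LoF's own calibration."""
    total = 0
    for _ in range(frames):
        busy = one_frame(n_tags, frame_size)
        total += busy.index(False) if False in busy else frame_size
    r = total / frames
    return (2 ** r) / phi

random.seed(1)
print(estimate(10000))   # should land in the vicinity of 10,000
```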

Journal ArticleDOI
TL;DR: Techniques for enhancing the memory efficiency of applications on data-parallel architectures are presented, based on the analysis and characterization of memory access patterns in loop bodies; they target vectorization via data transformation to benefit vector-based architectures and algorithmic memory selection for scalar- based architectures.
Abstract: The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, there remain a number of barriers that delay wider adoption of these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4× and 13.5× over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.

Journal ArticleDOI
TL;DR: An energy-efficient framework for clustering-based data collection in wireless sensor networks that integrates an adaptive scheme for enabling/disabling prediction, and shows how sleep/awake scheduling can be applied to design a practical algorithm for data aggregation.
Abstract: For many applications in wireless sensor networks (WSNs), users may want to continuously extract data from the networks for analysis later. However, accurate data extraction is difficult-it is often too costly to obtain all sensor readings, as well as not necessary in the sense that the readings themselves only represent samples of the true state of the world. Clustering and prediction techniques, which exploit spatial and temporal correlation among the sensor data provide opportunities for reducing the energy consumption of continuous sensor data collection. Integrating clustering and prediction techniques makes it essential to design a new data collection scheme, so as to achieve network energy efficiency and stability. We propose an energy-efficient framework for clustering-based data collection in wireless sensor networks by integrating an adaptive scheme for enabling/disabling prediction. Our framework is clustering based. A cluster head represents all sensor nodes in the cluster and collects data values from them. To realize prediction techniques efficiently in WSNs, we present an adaptive scheme to control the prediction used in our framework, analyze the performance tradeoff between reducing communication cost and limiting prediction cost, and design algorithms to exploit the benefit of the adaptive scheme to enable/disable prediction operations. Our framework is general enough to incorporate many advanced features and we show how sleep/awake scheduling can be applied, which applies our framework to the design of a practical data aggregation algorithm: it avoids the need for rampant node-to-node propagation of aggregates, but rather it uses faster and more efficient cluster-to-cluster propagation. To the best of our knowledge, this is the first work to adaptively enable/disable prediction for clustering-based continuous data collection in sensor networks. Our proposed models, analysis, and framework are validated via simulation and comparison with competing techniques.
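The enable/disable decision can be pictured with a toy node-side loop: while prediction is on, a reading is suppressed whenever a simple predictor (here, the last value) is within a tolerance, and prediction is switched off when it has recently been inaccurate. The predictor, thresholds, and window below are illustrative assumptions, not the paper's adaptive scheme, and the cluster head is assumed to mirror the same predictor.

```python
def collect(readings, epsilon=0.5, window=5, tolerance=0.8):
    """Suppress a reading when prediction is enabled and the last-value
    predictor is within `epsilon`; keep prediction enabled only while it has
    been accurate for most of the recent window (all values illustrative)."""
    transmitted, recent_hits = [], []
    predicting, last = False, None
    for r in readings:
        hit = last is not None and abs(r - last) <= epsilon
        if not (predicting and hit):
            transmitted.append(r)                    # send the actual reading
        recent_hits = (recent_hits + [int(hit)])[-window:]
        predicting = sum(recent_hits) / len(recent_hits) >= tolerance
        last = r
    return transmitted

stable = [20.0 + 0.1 * (i % 2) for i in range(20)]   # slowly varying readings
bursty = [20.0 + 5.0 * (i % 2) for i in range(20)]   # erratic readings
print(len(collect(stable)), len(collect(bursty)))    # far fewer packets when stable
```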

Journal ArticleDOI
TL;DR: This paper proposes a novel traceback method for DDoS attacks that is based on entropy variations between normal and DDoS attack traffic, which is fundamentally different from commonly used packet marking techniques and is memory nonintensive, efficiently scalable, robust against packet pollution, and independent of attack traffic patterns.
Abstract: Distributed Denial-of-Service (DDoS) attacks are a critical threat to the Internet. However, the memoryless feature of the Internet routing mechanisms makes it extremely hard to trace back to the source of these attacks. As a result, there is no effective and efficient method to deal with this issue so far. In this paper, we propose a novel traceback method for DDoS attacks that is based on entropy variations between normal and DDoS attack traffic, which is fundamentally different from commonly used packet marking techniques. In comparison to the existing DDoS traceback methods, the proposed strategy possesses a number of advantages - it is memory nonintensive, efficiently scalable, robust against packet pollution, and independent of attack traffic patterns. The results of extensive experimental and simulation studies are presented to demonstrate the effectiveness and efficiency of the proposed method. Our experiments show that accurate traceback is possible within 20 seconds (approximately) in a large-scale attack network with thousands of zombies.
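A minimal sketch of the entropy-variation idea: a router computes the Shannon entropy of the flow mix in each observation window and flags a sharp drop against its long-term baseline, since a DDoS flood concentrates traffic into a few dominant flows. The threshold and flow definition below are illustrative, and the actual traceback step (pushing the query upstream router by router) is omitted.

```python
import math
from collections import Counter

def flow_entropy(packet_flows):
    """Shannon entropy of the flow distribution observed at a router within
    one window; `packet_flows` holds one flow ID per packet."""
    counts = Counter(packet_flows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_drop(baseline_entropy, current_entropy, delta=1.0):
    """Flag a DDoS-like event when entropy falls sharply below the baseline
    (threshold `delta` is illustrative, not the paper's calibration)."""
    return (baseline_entropy - current_entropy) > delta

normal = ["f%d" % (i % 50) for i in range(1000)]   # 50 balanced flows
attack = normal + ["victim"] * 5000                # one flow dominates
print(flow_entropy(normal), flow_entropy(attack))
print(entropy_drop(flow_entropy(normal), flow_entropy(attack)))  # True
```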

Journal ArticleDOI
TL;DR: Several versions of the CRO algorithm, a population-based metaheuristic inspired by the interactions between molecules in a chemical reaction, are proposed for the grid scheduling problem and compared with four other acknowledged metaheuristics on a wide range of instances.
Abstract: Grid computing solves high performance and high-throughput computing problems through sharing resources ranging from personal computers to supercomputers distributed around the world. One of the major problems is task scheduling, i.e., allocating tasks to resources. In addition to Makespan and Flowtime, we also take reliability of resources into account, and task scheduling is formulated as an optimization problem with three objectives. This is an NP-hard problem, and thus, metaheuristic approaches are employed to find the optimal solutions. In this paper, several versions of the Chemical Reaction Optimization (CRO) algorithm are proposed for the grid scheduling problem. CRO is a population-based metaheuristic inspired by the interactions between molecules in a chemical reaction. We compare these CRO methods with four other acknowledged metaheuristics on a wide range of instances. Simulation results show that the CRO methods generally perform better than existing methods and performance improvement is especially significant in large-scale applications.

Journal ArticleDOI
TL;DR: Co-Con is proposed, a novel cluster-level control architecture that coordinates individual power and performance control loops for virtualized server clusters that can simultaneously provide effective control on both application-level performance and underlying power consumption.
Abstract: Today's data centers face two critical challenges. First, various customers need to be assured by meeting their required service-level agreements such as response time and throughput. Second, server power consumption must be controlled in order to avoid failures caused by power capacity overload or system overheating due to increasing high server density. However, existing work controls power and application-level performance separately, and thus, cannot simultaneously provide explicit guarantees on both. In addition, as power and performance control strategies may come from different hardware/software vendors and coexist at different layers, it is more feasible to coordinate various strategies to achieve the desired control objectives than relying on a single centralized control strategy. This paper proposes Co-Con, a novel cluster-level control architecture that coordinates individual power and performance control loops for virtualized server clusters. To emulate the current practice in data centers, the power control loop changes hardware power states with no regard to the application-level performance. The performance control loop is then designed for each virtual machine to achieve the desired performance even when the system model varies significantly due to the impact of power control. Co-Con configures the two control loops rigorously, based on feedback control theory, for theoretically guaranteed control accuracy and system stability. Empirical results on a physical testbed demonstrate that Co-Con can simultaneously provide effective control on both application-level performance and underlying power consumption.

Journal ArticleDOI
TL;DR: The experimental results show that the GPU-CPU coprocessing of Mars on an NVIDIA GTX280 GPU and an Intel quad-core CPU outperformed Phoenix, the state-of-the-art MapReduce on the multicore CPU, with a speedup of up to 72 times and 24 times on average, depending on the applications.
Abstract: We design and implement Mars, a MapReduce runtime system accelerated with graphics processing units (GPUs). MapReduce is a simple and flexible parallel programming paradigm originally proposed by Google, for the ease of large-scale data processing on thousands of CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth. However, GPUs are designed as special-purpose coprocessors and their programming interfaces are less familiar than those on the CPUs to MapReduce programmers. To harness GPUs' power for MapReduce, we developed Mars to run on NVIDIA GPUs, AMD GPUs as well as multicore CPUs. Furthermore, we integrated Mars into Hadoop, an open-source CPU-based MapReduce system. Mars hides the programming complexity of GPUs behind the simple and familiar MapReduce interface, and automatically manages task partitioning, data distribution, and parallelization on the processors. We have implemented six representative applications on Mars and evaluated their performance on PCs equipped with GPUs as well as multicore CPUs. The experimental results show that the GPU-CPU coprocessing of Mars on an NVIDIA GTX280 GPU and an Intel quad-core CPU outperformed Phoenix, the state-of-the-art MapReduce on the multicore CPU, with a speedup of up to 72 times and 24 times on average, depending on the applications. Additionally, integrating Mars into Hadoop enabled GPU acceleration for a network of PCs.

Journal ArticleDOI
In Kyu Park1, Nitin Singhal2, Man Hee Lee1, Sung-Dae Cho2, Chris W Kim3 
TL;DR: This paper construe key factors in design and evaluation of image processing algorithms on the massive parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model and proposes a set of metrics, customized for image processing, to quantitatively evaluate algorithm characteristics.
Abstract: In this paper, we construe key factors in design and evaluation of image processing algorithms on the massive parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model. A set of metrics, customized for image processing, is proposed to quantitatively evaluate algorithm characteristics. In addition, we show that a range of image processing algorithms map readily to CUDA using multiview stereo matching, linear feature extraction, JPEG2000 image encoding, and nonphotorealistic rendering (NPR) as our example applications. The algorithms are carefully selected from major domains of image processing, so they inherently contain a variety of subalgorithms with diverse characteristics when implemented on the GPU. Performance is evaluated in terms of execution time and is compared to the fastest host-only version implemented using OpenMP. It is shown that the observed speedup varies extensively depending on the characteristics of each algorithm. Intensive analysis is conducted to show the appropriateness of the proposed metrics in predicting the effectiveness of an application for parallel implementation.

Journal ArticleDOI
TL;DR: The proposed scheme exploits a novel cryptographic primitive called attribute-based encryption (ABE) and tailors and adapts it for WSNs with respect to both performance and security requirements, and is the first to realize distributed fine-grained data access control for WSNs.
Abstract: Distributed sensor data storage and retrieval have gained increasing popularity in recent years for supporting various applications. While distributed architecture enjoys a more robust and fault-tolerant wireless sensor network (WSN), such architecture also poses a number of security challenges especially when applied in mission-critical applications such as battlefield and e-healthcare. First, as sensor data are stored and maintained by individual sensors and unattended sensors are easily subject to strong attacks such as physical compromise, it is significantly harder to ensure data security. Second, in many mission-critical applications, fine-grained data access control is a must as illegal access to the sensitive data may cause disastrous results and/or be prohibited by the law. Last but not least, sensor nodes usually are resource-constrained, which limits the direct adoption of expensive cryptographic primitives. To address the above challenges, we propose, in this paper, a distributed data access control scheme that is able to enforce fine-grained access control over sensor data and is resilient against strong attacks such as sensor compromise and user collusion. The proposed scheme exploits a novel cryptographic primitive called attribute-based encryption (ABE) and tailors and adapts it for WSNs with respect to both performance and security requirements. The feasibility of the scheme is demonstrated by experiments on real sensor platforms. To the best of our knowledge, this paper is the first to realize distributed fine-grained data access control for WSNs.

Journal ArticleDOI
TL;DR: This work develops a high-performance tracker crawler and, over a narrow window of 12 hours, crawls essentially all of the public BitTorrent Ecosystem's trackers, obtaining peer lists for all referenced torrents.
Abstract: BitTorrent is the most successful open Internet application for content distribution. Despite its importance, both in terms of its footprint in the Internet and the influence it has on emerging P2P applications, the BitTorrent Ecosystem is only partially understood. We seek to provide a nearly complete picture of the entire public BitTorrent Ecosystem. To this end, we crawl five of the most popular torrent-discovery sites over a nine-month period, identifying the 4.6 million torrents and 38,996 trackers that the sites reference. We also develop a high-performance tracker crawler and, over a narrow window of 12 hours, crawl essentially all of the public Ecosystem's trackers, obtaining peer lists for all referenced torrents. Complementing the torrent-discovery site and tracker crawling, we further crawl Azureus and Mainline DHTs for a random sample of torrents. Our resulting measurement data are more than an order of magnitude larger (in terms of number of torrents, trackers, or peers) than any earlier study. Using this extensive data set, we study in-depth the Ecosystem's torrent-discovery, tracker, peer, user behavior, and content landscapes. For peer statistics, the analysis is based on one typical snapshot obtained over 12 hours. We further analyze the fragility of the Ecosystem upon the removal of its most important tracker service.

Journal ArticleDOI
TL;DR: TASA frees the monitored objects from carrying RFID tags, recovers and checks frequent trajectories online by capturing the Received Signal Strength Indicator (RSSI) series of passive RFID tag arrays that objects traverse, and introduces reference tags with known positions.
Abstract: Radio Frequency IDentification (RFID) has attracted considerable attention in recent years for its low cost, general availability, and location sensing functionality. Most existing schemes require the tracked persons to be labeled with RFID tags. This requirement may not be satisfied for some activity sensing applications due to privacy and security concerns and uncertainty of objects to be monitored, e.g., group behavior monitoring in warehouses with privacy limitations, and abnormal customers in banks. In this paper, we propose TASA (Tag-free Activity Sensing using RFID tag Arrays) for location sensing and frequent route detection. TASA frees the monitored objects from carrying RFID tags and recovers and checks frequent trajectories online by capturing the Received Signal Strength Indicator (RSSI) series of passive RFID tag arrays that objects traverse. In order to improve the accuracy of estimated trajectories and accelerate location sensing, TASA introduces reference tags with known positions. With the readings from reference tags, TASA can locate objects more accurately. Extensive experiments show that TASA is an effective approach for certain activity sensing applications.

Journal ArticleDOI
TL;DR: A traffic-aware dynamic routing (TADR) algorithm is proposed to route packets around the congestion areas and scatter the excessive packets along multiple paths consisting of idle and underloaded nodes to alleviate congestion and improve the overall throughput in WSNs.
Abstract: The congestion problem in Wireless Sensor Networks (WSNs) is quite different from that in traditional networks. Most current congestion control algorithms try to alleviate the congestion by reducing the rate at which the source nodes inject packets into the network. However, this traffic control scheme always decreases the throughput and thus violates the fidelity level required by the applications. In this paper, we present a solution that fully exploits the idle or underloaded nodes to alleviate congestion and improve the overall throughput in WSNs. To achieve this goal, a traffic-aware dynamic routing (TADR) algorithm is proposed to route packets around the congestion areas and scatter the excessive packets along multiple paths consisting of idle and underloaded nodes. Utilizing the concept of potential in classical physics, our TADR algorithm is designed through constructing a hybrid virtual potential field using depth and normalized queue length to force the packets to steer clear of obstacles created by congestion and eventually move toward the sink. The simulation results show that the proposed solution improves the overall throughput by around 370 percent as compared to MintRoute, which is one of the benchmark routing protocols. Furthermore, the TADR scheme has low overhead, making it suitable for large-scale, dense sensor networks.

Journal ArticleDOI
TL;DR: This paper considers a category of rogue access points (APs) that pretend to be legitimate APs to lure users to connect to them and proposes a practical timing-based technique that allows the user to avoid connecting to rogue APs.
Abstract: This paper considers a category of rogue access points (APs) that pretend to be legitimate APs to lure users to connect to them. We propose a practical timing-based technique that allows the user to avoid connecting to rogue APs. Our detection scheme is a client-centric approach that employs the round trip time between the user and the DNS server to independently determine whether an AP is a rogue AP without assistance from the WLAN operator. We implemented our detection technique on commercially available wireless cards to evaluate their performance. Extensive experiments have demonstrated the accuracy, effectiveness, and robustness of our approach. The algorithm achieves close to 100 percent accuracy in distinguishing rogue APs from legitimate APs in lightly loaded traffic conditions, and greater than 60 percent accuracy in heavy traffic conditions. At the same time, the detection requires less than 1 second for lightly loaded traffic conditions and tens of seconds for heavy traffic conditions.
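The decision logic can be illustrated in a few lines: collect DNS round-trip times through the access point under test and compare their median with a baseline for known-legitimate APs, since a rogue AP that relays over an extra wireless hop inflates the RTT. The threshold and sample values below are illustrative, not the paper's calibration, and the actual RTT measurement is abstracted away.

```python
import statistics

def classify_ap(rtt_samples_ms, baseline_median_ms, threshold_ms=2.0):
    """Timing-based check in the spirit of the paper: compare the median DNS
    RTT through a suspect AP against a legitimate-AP baseline.  The 2 ms
    threshold is an illustrative value, not the paper's calibration."""
    suspect_median = statistics.median(rtt_samples_ms)
    return "rogue" if suspect_median - baseline_median_ms > threshold_ms else "legitimate"

legit_ap = [3.1, 2.9, 3.4, 3.0, 3.2]   # one wireless hop to the wired network
rogue_ap = [6.8, 7.5, 6.9, 8.1, 7.2]   # relayed through a second wireless hop
print(classify_ap(legit_ap, baseline_median_ms=3.0))
print(classify_ap(rogue_ap, baseline_median_ms=3.0))
```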

Journal ArticleDOI
TL;DR: This paper demonstrates that mixed precision schemes constitute a significant performance gain over native double precision and presents a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employs this scheme as a line relaxation smoother in a GPU-based multigrid solver.
Abstract: We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular, single-precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother, we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, that previously had to be solved on the CPU. The resulting mixed-precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude.
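The mixed-precision principle, doing the expensive solve in single precision and the cheap residual and update in double, can be sketched with classic iterative refinement. In the code below a dense LU solve via NumPy stands in for the paper's GPU multigrid with cyclic-reduction line smoothing, so it illustrates only the precision scheme, not the solver itself.

```python
import numpy as np

def mixed_precision_solve(A, b, iterations=5):
    """Iterative refinement: the costly solve runs in single precision (the
    'accelerator' part), while the residual and the solution update are
    accumulated in double precision."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b)
    for _ in range(iterations):
        r = b - A @ x                                    # double-precision residual
        c = np.linalg.solve(A32, r.astype(np.float32))   # single-precision correction
        x = x + c.astype(np.float64)                     # accumulate in double
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) + 200 * np.eye(200)  # well-conditioned test matrix
b = rng.standard_normal(200)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b))   # residual shrinks toward double-precision accuracy
```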

Journal ArticleDOI
TL;DR: It is shown that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost and it is illustrated that graphics accelerators can make simulations involving large numbers of particles feasible.
Abstract: Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multicore and graphics processors, C++ running on multicore processors, and a VHDL implementation running on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.

Journal ArticleDOI
TL;DR: Experimental results show that the CR/TR-Motion approach can drastically reduce migration overheads compared with the memory-to-memory approach in a LAN, and for a variety of workloads migrated across WANs, the migration downtime is less than 300 milliseconds.
Abstract: Live migration of virtual machines (VM) across physical hosts provides a significant new benefit for administrators of data centers and clusters. Previous memory-to-memory approaches demonstrate the effectiveness of live VM migration in local area networks (LAN), but they would cause a long period of downtime in a wide area network (WAN) environment. This paper describes the design and implementation of a novel approach, namely, CR/TR-Motion, which adopts checkpointing/recovery and trace/replay technologies to provide fast, transparent VM migration for both LAN and WAN environments. With execution trace logged on the source host, a synchronization algorithm is performed to orchestrate the running source and target VMs until they reach a consistent state. CR/TR-Motion can greatly reduce the migration downtime and network bandwidth consumption. Experimental results show that the approach can drastically reduce migration overheads compared with the memory-to-memory approach in a LAN: up to 72.4 percent on application observed downtime, up to 31.5 percent on total migration time, and up to 95.9 percent on the data to synchronize the VM state. The application performance overhead due to migration is kept within 8.54 percent on average. The results also show that for a variety of workloads migrated across WANs, the migration downtime is less than 300 milliseconds.

Journal ArticleDOI
TL;DR: This paper describes an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions, and shows that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
Abstract: In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example stationary versus nonstationary behavior) and fits different models (for example exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions. We apply this method with about 230,000 host availability traces obtained from a real Internet-distributed system, namely SETI@home. We find that about 21 percent of hosts exhibit availability that is a truly random process, and that these hosts can often be modeled accurately with a few distinct distributions from different families. We show that our models are useful and accurate in the context of a scheduling problem that deals with resource brokering. We believe that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
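A rough sketch of the modeling step, assuming SciPy is available: fit a few candidate families (exponential, Weibull, Pareto) to each host's availability durations, keep the best by log-likelihood, and group hosts by their best-fitting family. The paper's method also tests for randomness and stationarity and applies its own clustering; none of that is reproduced here.

```python
import numpy as np
from scipy import stats

CANDIDATES = {
    "exponential": stats.expon,
    "weibull": stats.weibull_min,
    "pareto": stats.pareto,
}

def best_fit(durations):
    """Fit each candidate family to one host's availability durations
    (location pinned at 0, since durations are positive) and return the
    family with the highest log-likelihood."""
    best_name, best_ll = None, -np.inf
    for name, dist in CANDIDATES.items():
        params = dist.fit(durations, floc=0)
        ll = np.sum(dist.logpdf(durations, *params))
        if ll > best_ll:
            best_name, best_ll = name, ll
    return best_name

rng = np.random.default_rng(42)
hosts = {
    "host-a": rng.exponential(scale=5.0, size=500),   # synthetic traces
    "host-b": rng.weibull(0.7, size=500) * 5.0,
}
groups = {}
for host, trace in hosts.items():
    groups.setdefault(best_fit(trace), []).append(host)
print(groups)   # hosts bucketed by the family that best fits their availability
```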

Journal ArticleDOI
TL;DR: This work investigates a new dynamic resource provisioning method for MMOG operation using external data centers as low-cost resource providers and introduces a combined MMOG processor, network, and memory load model that takes into account both the player interaction type and the population size.
Abstract: Today's Massively Multiplayer Online Games (MMOGs) can include millions of concurrent players spread across the world and interacting with each other within a single session. Faced with high resource demand variability and with misfit resource renting policies, the current industry practice is to overprovision for each game tens of self-owned data centers, making the market entry affordable only for big companies. Focusing on the reduction of entry and operational costs, we investigate a new dynamic resource provisioning method for MMOG operation using external data centers as low-cost resource providers. First, we identify in the various types of player interaction a source of short-term load variability, which complements the long-term load variability due to the size of the player population. Then, we introduce a combined MMOG processor, network, and memory load model that takes into account both the player interaction type and the population size. Our model is best used for estimating the MMOG resource demand dynamically, and thus, for dynamic resource provisioning based on the game world entity distribution. We evaluate several classes of online predictors for MMOG entity distribution and propose and tune a neural network-based predictor to deliver good accuracy consistently under real-time performance constraints. We assess using trace-based simulation the impact of the data center policies on the quality of resource provisioning. We find that the dynamic resource provisioning can be much more efficient than its static alternative even when the external data centers are busy, and that data centers with policies unsuitable for MMOGs are penalized by our dynamic resource provisioning method. Finally, we present experimental results showing the real-time parallelization and load balancing of a real game prototype using data center resources provisioned using our method and show its advantage against a rudimentary client threshold approach.

Journal ArticleDOI
Jaehoon Jeong1, Shuo Guo1, Yu Gu1, Tian He1, David H. C. Du1 
TL;DR: This paper presents the first attempt to effectively utilize vehicles' trajectory information in a privacy-preserving manner; the proposed scheme outperforms the existing scheme in terms of both data delivery delay and packet delivery ratio.
Abstract: This paper proposes a Trajectory-Based Data (TBD) Forwarding scheme, tailored for the data forwarding for roadside reports in light-traffic vehicular ad hoc networks. State-of-the-art schemes have demonstrated the effectiveness of their data forwarding strategies by exploiting known vehicular traffic statistics (e.g., densities and speeds). These results are encouraging, however, further improvements can be made by taking advantage of the growing popularity of GPS-based navigation systems. This paper presents the first attempt to effectively utilize vehicles' trajectory information in a privacy-preserving manner. In our design, such trajectory information is combined with the vehicular traffic statistics for a better performance. In a distributed way, each individual vehicle computes its end-to-end expected delivery delay to the Internet access points based on its position on its vehicle trajectory and exchanges this delay with neighboring vehicles to determine the best next-hop vehicle. For the accurate end-to-end delay computation, this paper also proposes a link delay model to estimate the packet forwarding delay on a road segment. Through theoretical analysis and extensive simulation, it is shown that our link delay model provides accurate link delay estimation and our forwarding design outperforms the existing scheme in terms of both the data delivery delay and packet delivery ratio.
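The per-segment delay computation can be illustrated with a much-simplified model: when the expected inter-vehicle gap on a road segment is within radio range the packet is mostly relayed hop by hop, otherwise it is carried at vehicle speed, and the end-to-end expectation is the sum over the segments of a trajectory. All constants below are made up, and the paper's link delay model is considerably more detailed.

```python
import math

def link_delay(length_m, density_per_m, speed_mps,
               radio_range_m=250.0, hop_delay_s=0.02):
    """Illustrative per-road-segment delay: multihop relaying when vehicles
    are dense enough, carry-and-forward otherwise (made-up constants)."""
    gap = 1.0 / max(density_per_m, 1e-9)
    if gap <= radio_range_m:                       # multihop forwarding dominates
        hops = math.ceil(length_m / radio_range_m)
        return hops * hop_delay_s
    return length_m / speed_mps                    # carry-and-forward dominates

def expected_e2e_delay(route_segments):
    """Sum per-segment delays along a vehicle's trajectory to an access point;
    each segment is (length_m, density_per_m, speed_mps)."""
    return sum(link_delay(*seg) for seg in route_segments)

sparse_route = [(2000, 0.002, 15.0), (1500, 0.001, 15.0)]
dense_route  = [(2500, 0.02, 10.0), (1800, 0.03, 10.0)]
print(expected_e2e_delay(sparse_route), expected_e2e_delay(dense_route))
# A forwarder would hand the packet to the neighbor whose trajectory yields
# the smaller expected delay to the Internet access point.
```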