
Showing papers in "IEEE Transactions on Parallel and Distributed Systems in 2011"


Journal ArticleDOI
TL;DR: To achieve efficient data dynamics, the existing proof of storage models are improved by manipulating the classic Merkle Hash Tree construction for block tag authentication, and an elegant verification scheme is constructed for the seamless integration of these two salient features in the protocol design.
Abstract: Cloud Computing has been envisioned as the next-generation architecture of IT Enterprise. It moves the application software and databases to the centralized large data centers, where the management of the data and services may not be fully trustworthy. This unique paradigm brings about many new security challenges, which have not been well understood. This work studies the problem of ensuring the integrity of data storage in Cloud Computing. In particular, we consider the task of allowing a third party auditor (TPA), on behalf of the cloud client, to verify the integrity of the dynamic data stored in the cloud. The introduction of TPA eliminates the involvement of the client through the auditing of whether his data stored in the cloud are indeed intact, which can be important in achieving economies of scale for Cloud Computing. The support for data dynamics via the most general forms of data operation, such as block modification, insertion, and deletion, is also a significant step toward practicality, since services in Cloud Computing are not limited to archive or backup data only. While prior works on ensuring remote data integrity often lack the support of either public auditability or dynamic data operations, this paper achieves both. We first identify the difficulties and potential security problems of direct extensions with fully dynamic data updates from prior works and then show how to construct an elegant verification scheme for the seamless integration of these two salient features in our protocol design. In particular, to achieve efficient data dynamics, we improve the existing proof of storage models by manipulating the classic Merkle Hash Tree construction for block tag authentication. To support efficient handling of multiple auditing tasks, we further explore the technique of bilinear aggregate signature to extend our main result into a multiuser setting, where TPA can perform multiple auditing tasks simultaneously. Extensive security and performance analysis show that the proposed schemes are highly efficient and provably secure.
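The block-tag authentication in this scheme rests on standard Merkle Hash Tree mechanics: the verifier holds only the root, and each challenged block is checked against it with a logarithmic-size sibling path. The sketch below illustrates that generic mechanism in Python (SHA-256 hashing, toy block layout); it is not the authors' exact construction, which additionally involves homomorphic block tags and bilinear signatures.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return all tree levels, leaves first; odd levels duplicate the last node."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes from the leaf for block `index` up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = index ^ 1
        path.append((level[sib], sib < index))  # (sibling hash, sibling-is-left?)
        index //= 2
    return path

def verify(root, block, path):
    """Recompute the root from a block and its sibling path."""
    node = h(block)
    for sib, sib_is_left in path:
        node = h(sib + node) if sib_is_left else h(node + sib)
    return node == root

blocks = [b"block-%d" % i for i in range(8)]
levels = build_tree(blocks)
root = levels[-1][0]
assert verify(root, blocks[5], auth_path(levels, 5))
# For a dynamic update, the client recomputes the root from the new block
# and the same sibling path, which is the basis for supporting modification,
# insertion, and deletion without re-reading the whole file.
```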

1,422 citations


Journal ArticleDOI
TL;DR: The results indicate that the current clouds need an order of magnitude in performance improvement to be useful to the scientific community, and show which improvements should be considered first to address this discrepancy between offer and demand.
Abstract: Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate the need for maintaining expensive computing facilities by companies and institutes alike. Through the use of virtualization and resource time sharing, clouds serve with a single set of physical resources a large user base with different needs. Thus, clouds have the potential to provide to their owners the benefits of an economy of scale and, at the same time, become an alternative for scientists to clusters, grids, and parallel production environments. However, the current commercial clouds have been built to support web and small database workloads, which are very different from typical scientific computing workloads. Moreover, the use of virtualization and resource time sharing may introduce significant performance penalties for the demanding scientific computing workloads. In this work, we analyze the performance of cloud computing services for scientific computing workloads. We quantify the presence in real scientific computing workloads of Many-Task Computing (MTC) users, that is, of users who employ loosely coupled applications comprising many tasks to achieve their scientific goals. Then, we perform an empirical evaluation of the performance of four commercial cloud computing services including Amazon EC2, which is currently the largest commercial cloud. Last, we compare through trace-based simulation the performance characteristics and cost models of clouds and other scientific computing platforms, for general and MTC-based scientific computing workloads. Our results indicate that the current clouds need an order of magnitude in performance improvement to be useful to the scientific community, and show which improvements should be considered first to address this discrepancy between offer and demand.

915 citations


Journal ArticleDOI
TL;DR: This paper proposes an access control mechanism using ciphertext-policy attribute-based encryption to enforce access control policies with efficient attribute and user revocation capability and demonstrates how to apply the proposed mechanism to securely manage the outsourced data.
Abstract: Some of the most challenging issues in data outsourcing scenario are the enforcement of authorization policies and the support of policy updates. Ciphertext-policy attribute-based encryption is a promising cryptographic solution to these issues for enforcing access control policies defined by a data owner on outsourced data. However, the problem of applying the attribute-based encryption in an outsourced architecture introduces several challenges with regard to the attribute and user revocation. In this paper, we propose an access control mechanism using ciphertext-policy attribute-based encryption to enforce access control policies with efficient attribute and user revocation capability. The fine-grained access control can be achieved by dual encryption mechanism which takes advantage of the attribute-based encryption and selective group key distribution in each attribute group. We demonstrate how to apply the proposed mechanism to securely manage the outsourced data. The analysis results indicate that the proposed scheme is efficient and secure in the data outsourcing systems.

743 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and presents two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS), based on a novel objective function and a variant of it.
Abstract: Traditionally, the primary performance goal of computer systems has focused on reducing the execution time of applications while increasing throughput. This performance goal has been mostly achieved by the development of high-density computer systems. As witnessed recently, these systems provide very powerful processing capability and capacity. They often consist of tens or hundreds of thousands of processors and other resource-hungry devices. The energy consumption of these systems has become a major concern. In this paper, we address the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and present two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS). A number of recent commodity processors are capable of DVS, which enables processors to operate at different voltage supply levels at the expense of sacrificing clock frequencies. In the context of scheduling, this multiple voltage facility implies that there is a trade-off between the quality of schedules and energy consumption. To effectively balance these two performance goals, we have devised a novel objective function and a variant of it. The main difference between the two algorithms is in their measurement of energy consumption. The extensive comparative evaluations conducted as part of this work show that the performance of our algorithms is very compelling in terms of both application completion time and energy consumption.
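The abstract describes the time/energy trade-off but not the objective function itself. As a hypothetical illustration of how a DVS-aware scheduler might weigh the two goals, the sketch below scores each (voltage, frequency) level for a ready task by a weighted sum of normalized execution time and energy; the levels, the power model, and the weight `alpha` are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical DVS levels: (voltage, relative frequency). Dynamic power ~ V^2 * f.
DVS_LEVELS = [(1.2, 1.00), (1.0, 0.80), (0.8, 0.60)]

def energy_and_time(work, voltage, freq):
    """Execution time scales as work / freq; energy as V^2 * f * time."""
    time = work / freq
    return (voltage ** 2) * freq * time, time

def pick_level(work, deadline_slack, alpha=0.5):
    """Score each level by a weighted sum of normalized time and energy.
    `alpha` trades completion time against energy; illustrative only."""
    e_max, t_min = energy_and_time(work, *DVS_LEVELS[0])  # fastest, most energy
    best = None
    for v, f in DVS_LEVELS:
        e, t = energy_and_time(work, v, f)
        if t > deadline_slack:          # would stretch past the task's slack
            continue
        score = alpha * (t / t_min) + (1 - alpha) * (e / e_max)
        if best is None or score < best[0]:
            best = (score, v, f, t, e)
    return best

print(pick_level(work=100.0, deadline_slack=160.0))
# Picks the 0.8x-frequency level: slightly longer runtime, noticeably less energy.
```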

306 citations


Journal ArticleDOI
TL;DR: Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution.
Abstract: In recent years ad hoc parallel data processing has emerged to be one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks which are currently used have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this paper, we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.

296 citations


Journal ArticleDOI
TL;DR: A new scheme for dynamic adaptation of transmission power and contention window (CW) size enhances information dissemination in Vehicular Ad-hoc Networks (VANETs) and features significantly better throughput and lower average end-to-end delay than a similar scheme with static parameters.
Abstract: In this paper, we present a new scheme for dynamic adaptation of transmission power and contention window (CW) size to enhance performance of information dissemination in Vehicular Ad-hoc Networks (VANETs). The proposed scheme incorporates the Enhanced Distributed Channel Access (EDCA) mechanism of 802.11e and uses a joint approach to adapt transmission power at the physical (PHY) layer and quality-of-service (QoS) parameters at the medium access control (MAC) layer. In our scheme, transmission power is adapted based on the estimated local vehicle density to change the transmission range dynamically, while the CW size is adapted according to the instantaneous collision rate to enable service differentiation. In the interest of promoting timely propagation of information, VANET advisories are prioritized according to their urgency and the EDCA mechanism is employed for their dissemination. The performance of the proposed joint adaptation scheme was evaluated using the ns-2 simulator with added EDCA support. Extensive simulations have demonstrated that our scheme features significantly better throughput and lower average end-to-end delay compared with a similar scheme with static parameters.
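As a rough illustration of the two adaptation rules described above, the sketch below scales transmit power down with the estimated local vehicle density and grows or shrinks the contention window with the instantaneous collision rate. Every constant (power bounds, target neighbor count, collision thresholds) is an invented placeholder rather than a value from the paper.

```python
def adapt_tx_power(local_density, p_min=10.0, p_max=33.0, target_neighbors=50):
    """Scale transmit power (dBm) down as the estimated local vehicle density
    grows, so the effective range shrinks in dense traffic (illustrative)."""
    ratio = min(1.0, target_neighbors / max(local_density, 1))
    return p_min + ratio * (p_max - p_min)

def adapt_cw(cw, collision_rate, cw_min=15, cw_max=1023, high=0.3, low=0.1):
    """Grow the contention window when the measured collision rate is high,
    shrink it when the channel is mostly idle (illustrative thresholds)."""
    if collision_rate > high:
        cw = min(cw_max, 2 * cw + 1)
    elif collision_rate < low:
        cw = max(cw_min, (cw - 1) // 2)
    return cw

print(adapt_tx_power(local_density=120))    # dense road -> lower power
print(adapt_cw(cw=31, collision_rate=0.4))  # congested channel -> larger CW
```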

245 citations


Journal ArticleDOI
TL;DR: This paper designs an Energy-Balanced Routing Protocol (EBRP) by constructing a mixed virtual potential field in terms of depth, energy density, and residual energy and shows significant improvements in energy balance, network lifetime, coverage ratio, and throughput as compared to the commonly used energy-efficient routing algorithm.
Abstract: Energy is an extremely critical resource for battery-powered wireless sensor networks (WSN), thus making energy-efficient protocol design a key challenging problem. Most of the existing energy-efficient routing protocols always forward packets along the minimum energy path to the sink to merely minimize energy consumption, which causes an unbalanced distribution of residual energy among sensor nodes, and eventually results in a network partition. In this paper, with the help of the concept of potential in physics, we design an Energy-Balanced Routing Protocol (EBRP) by constructing a mixed virtual potential field in terms of depth, energy density, and residual energy. The goal of this basic approach is to force packets to move toward the sink through the dense energy area so as to protect the nodes with relatively low residual energy. To address the routing loop problem emerging in this basic algorithm, enhanced mechanisms are proposed to detect and eliminate loops. The basic algorithm and loop elimination mechanism are first validated through extensive simulation experiments. Finally, the integrated performance of the full potential-based energy-balanced routing algorithm is evaluated through numerous simulations in a randomly deployed network running event-driven applications; the impact of the parameters on the performance is examined, and guidelines for parameter settings are summarized. Our experimental results show that there are significant improvements in energy balance, network lifetime, coverage ratio, and throughput as compared to the commonly used energy-efficient routing algorithm.
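A minimal sketch of the potential-field idea, assuming a simple weighted combination of depth, energy density, and residual energy and a greedy steepest-descent next-hop rule; the weights and the exact form of the mixed field in EBRP differ, so treat this only as an illustration of how such a field steers packets through energy-rich areas.

```python
def potential(depth, energy_density, residual_energy,
              w_d=0.6, w_ed=0.25, w_e=0.15):
    """Mixed virtual potential: lower means 'closer' to the sink.  Depth pulls
    packets sinkward; the energy terms push them through energy-rich regions.
    Weights are illustrative, not the paper's calibration."""
    return w_d * depth - w_ed * energy_density - w_e * residual_energy

def next_hop(node, neighbors):
    """Forward to the neighbor offering the steepest potential drop."""
    lower = [n for n in neighbors if potential(**n) < potential(**node)]
    return min(lower, key=lambda n: potential(**n), default=None)

me = dict(depth=5, energy_density=0.4, residual_energy=0.7)
nbrs = [dict(depth=4, energy_density=0.9, residual_energy=0.9),
        dict(depth=4, energy_density=0.2, residual_energy=0.1)]
print(next_hop(me, nbrs))  # prefers the shallower, energy-rich neighbor
```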

233 citations


Journal ArticleDOI
TL;DR: A generic and secure framework is proposed to upgrade two-factor authentication to three-factor authentication, which not only significantly improves the information assurance at low cost but also protects client privacy in distributed systems.
Abstract: As part of the security within distributed systems, various services and resources need protection from unauthorized use. Remote authentication is the most commonly used method to determine the identity of a remote client. This paper investigates a systematic approach for authenticating clients by three factors, namely password, smart card, and biometrics. A generic and secure framework is proposed to upgrade two-factor authentication to three-factor authentication. The conversion not only significantly improves the information assurance at low cost but also protects client privacy in distributed systems. In addition, our framework retains several practice-friendly properties of the underlying two-factor authentication, which we believe is of independent interest.

224 citations


Journal ArticleDOI
TL;DR: An efficient distributed algorithm is proposed that produces a collision-free schedule for data aggregation in WSNs and it is theoretically proved that the delay of the aggregation schedule generated by the algorithm is at most 16R + Δ - 14 time slots.
Abstract: Data aggregation is a key functionality in wireless sensor networks (WSNs). This paper focuses on data aggregation scheduling problem to minimize the delay (or latency). We propose an efficient distributed algorithm that produces a collision-free schedule for data aggregation in WSNs. We theoretically prove that the delay of the aggregation schedule generated by our algorithm is at most 16R + Δ - 14 time slots. Here, R is the network radius and Δ is the maximum node degree in the communication graph of the original network. Our algorithm significantly improves the previously known best data aggregation algorithm with an upper bound of delay of 24D + 6Δ + 16 time slots, where D is the network diameter (note that D can be as large as 2R). We conduct extensive simulations to study the practical performances of our proposed data aggregation algorithm. Our simulation results corroborate our theoretical results and show that our algorithms perform better in practice. We prove that the overall lower bound of delay for data aggregation under any interference model is max{log n, R}, where n is the network size. We provide an example to show that the lower bound is (approximately) tight under the protocol interference model when rI = r, where rI is the interference range and r is the transmission range. We also derive the lower bound of delay under the protocol interference model when r < rI < 3r and when rI ≥ 3r.
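To make the improvement concrete, the short calculation below compares the two delay bounds quoted in the abstract for an example network with radius R = 10 and maximum degree Δ = 20, taking the worst case D = 2R for the earlier bound.

```python
def new_bound(R, delta):
    return 16 * R + delta - 14          # bound proved in this paper

def old_bound(D, delta):
    return 24 * D + 6 * delta + 16      # previously best known bound

R, delta = 10, 20                        # example radius and max degree
print(new_bound(R, delta))               # 166 time slots
print(old_bound(2 * R, delta))           # 616 time slots when D = 2R
```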

224 citations


Journal ArticleDOI
TL;DR: hiCUDA, a high-level directive-based language for CUDA programming, is designed; it allows programmers to perform tedious porting tasks in a simpler manner and directly on the sequential code, thus speeding up the porting process.
Abstract: Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner and directly on the sequential code, thus speeding up the porting process. In this paper, we describe the hiCUDA directives as well as the design and implementation of a prototype compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real-world applications that span multiple procedures and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.

217 citations


Journal ArticleDOI
TL;DR: This paper presents an energy-efficient opportunistic routing strategy, denoted as EEOR, and extensive simulations in TOSSIM show that the protocol EEOR performs better than the well-known ExOR protocol in terms of the energy consumption, the packet loss ratio, and the average delivery delay.
Abstract: Opportunistic routing has been shown to improve network throughput by allowing nodes that overhear a transmission and are closer to the destination to participate in forwarding packets, i.e., to join the forwarder list. The nodes in the forwarder list are prioritized, and a lower priority forwarder discards the packet if it has already been forwarded by a higher priority forwarder. One challenging problem is to select and prioritize the forwarder list such that a certain network performance metric is optimized. In this paper, we focus on selecting and prioritizing the forwarder list to minimize the energy consumption of all nodes. We study both cases where the transmission power of each node is fixed or dynamically adjustable. We present an energy-efficient opportunistic routing strategy, denoted as EEOR. Our extensive simulations in TOSSIM show that our protocol EEOR performs better than the well-known ExOR protocol (when adapted to sensor networks) in terms of energy consumption, packet loss ratio, and average delivery delay.
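The selection problem above can be pictured with a small expected-cost model: given a prioritized list of candidate forwarders, each with a delivery probability and an estimated remaining cost to the sink, the sender retransmits until someone in the list receives the packet and the highest-priority receiver carries it on. The sketch below implements that illustrative model (it is not EEOR's actual cost function) and shows why ordering candidates by remaining cost lowers the expectation.

```python
def expected_cost(tx_energy, candidates):
    """Expected energy of one opportunistic transmission, given a prioritized
    forwarder list of (delivery_prob, remaining_cost) pairs.  Illustrative
    model: the sender pays tx_energy per attempt and the packet is then
    handled by the highest-priority forwarder that actually received it."""
    p_none = 1.0      # probability no higher-priority forwarder got the packet
    downstream = 0.0  # expected remaining cost contributed per attempt
    for p, remaining in candidates:
        downstream += p_none * p * remaining
        p_none *= (1.0 - p)
    p_any = 1.0 - p_none
    if p_any == 0.0:
        return float("inf")
    # Retransmit until somebody in the list receives the packet.
    return (tx_energy + downstream) / p_any

good_order = [(0.6, 2.0), (0.8, 5.0)]   # closest-to-sink candidate first
bad_order  = [(0.8, 5.0), (0.6, 2.0)]
print(expected_cost(1.0, good_order), expected_cost(1.0, bad_order))
# The first ordering yields the lower expectation, which is the intuition
# behind prioritizing the forwarder list by remaining cost.
```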

Journal ArticleDOI
TL;DR: This paper presents the Lottery Frame (LoF) estimation scheme, which can achieve high accuracy, low latency, and scalability, and shows its significant advantages, e.g., high accuracy, short processing time, and low overhead, through analysis and simulations.
Abstract: Counting the number of RFID tags (cardinality) is a fundamental problem for large-scale RFID systems. Not only does it satisfy some real application requirements, it also acts as an important aid for RFID identification. Due to the extremely long processing time, slotted ALOHA-based or tree-based arbitration protocols are often impractical for many applications, because tags are usually attached to moving objects and they may have left the reader's interrogation region before being counted. Recently, estimation schemes have been proposed to count the approximate number of tags. Most of them, however, suffer from two scalability problems: time inefficiency and multiple-reading. Without resolving these problems, large-scale RFID systems cannot easily apply the estimation scheme as well as the corresponding identification. In this paper, we present the Lottery Frame (LoF) estimation scheme, which can achieve high accuracy, low latency, and scalability. LoF estimates the number of tags by utilizing collision information. We show the significant advantages, e.g., high accuracy, short processing time, and low overhead, of the proposed LoF scheme through analysis and simulations.
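LoF's core trick, estimating cardinality from which slots of a frame are busy rather than from identifying tags, resembles probabilistic counting. The sketch below uses a geometric "lottery" slot choice and a Flajolet-Martin style correction factor to recover the order of magnitude of the tag population; the real LoF estimator and its constants may differ, so this is only a conceptual illustration.

```python
import random

def geometric_slot():
    """Slot j is chosen with probability 2^-(j+1), like a lottery ticket."""
    j = 0
    while random.random() < 0.5:
        j += 1
    return j

def one_frame(n_tags, frame_size=32):
    """The reader only learns which slots of the frame were non-empty."""
    busy = [False] * frame_size
    for _ in range(n_tags):
        busy[min(geometric_slot(), frame_size - 1)] = True
    return busy

def estimate(n_tags, frames=64, frame_size=32, phi=0.77351):
    """Average the index of the first empty slot over several frames and map
    it back to a cardinality; `phi` is the Flajolet-Martin correction factor,
    used here as a stand-in for LoF's own calibration."""
    total = 0
    for _ in range(frames):
        busy = one_frame(n_tags, frame_size)
        total += busy.index(False) if False in busy else frame_size
    r = total / frames
    return (2 ** r) / phi

random.seed(1)
print(estimate(10000))   # should land in the vicinity of 10,000
```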

Journal ArticleDOI
TL;DR: Techniques for enhancing the memory efficiency of applications on data-parallel architectures are presented, based on the analysis and characterization of memory access patterns in loop bodies; they target vectorization via data transformation to benefit vector-based architectures and algorithmic memory selection for scalar- based architectures.
Abstract: The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, there remain a number of barriers that delay wider adoption of these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4× and 13.5× over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.

Journal ArticleDOI
TL;DR: An energy-efficient framework for clustering-based data collection in wireless sensor networks that integrates an adaptive scheme for enabling/disabling prediction, and shows how sleep/awake scheduling can be applied to design a practical algorithm for data aggregation.
Abstract: For many applications in wireless sensor networks (WSNs), users may want to continuously extract data from the networks for analysis later. However, accurate data extraction is difficult-it is often too costly to obtain all sensor readings, as well as not necessary in the sense that the readings themselves only represent samples of the true state of the world. Clustering and prediction techniques, which exploit spatial and temporal correlation among the sensor data provide opportunities for reducing the energy consumption of continuous sensor data collection. Integrating clustering and prediction techniques makes it essential to design a new data collection scheme, so as to achieve network energy efficiency and stability. We propose an energy-efficient framework for clustering-based data collection in wireless sensor networks by integrating an adaptive scheme for enabling/disabling prediction. Our framework is clustering based. A cluster head represents all sensor nodes in the cluster and collects data values from them. To realize prediction techniques efficiently in WSNs, we present an adaptive scheme to control the prediction used in our framework, analyze the performance tradeoff between reducing communication cost and limiting prediction cost, and design algorithms to exploit the benefit of the adaptive scheme to enable/disable prediction operations. Our framework is general enough to incorporate many advanced features and we show how sleep/awake scheduling can be applied, which applies our framework to the design of a practical data aggregation algorithm: it avoids the need for rampant node-to-node propagation of aggregates, but rather it uses faster and more efficient cluster-to-cluster propagation. To the best of our knowledge, this is the first work to adaptively enable/disable prediction for clustering-based continuous data collection in sensor networks. Our proposed models, analysis, and framework are validated via simulation and comparison with competing techniques.
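The enable/disable decision can be pictured with a toy node-side loop: while prediction is on, a reading is suppressed whenever a simple predictor (here, the last value) is within a tolerance, and prediction is switched off when it has recently been inaccurate. The predictor, thresholds, and window below are illustrative assumptions, not the paper's adaptive scheme, and the cluster head is assumed to mirror the same predictor.

```python
def collect(readings, epsilon=0.5, window=5, tolerance=0.8):
    """Suppress a reading when prediction is enabled and the last-value
    predictor is within `epsilon`; keep prediction enabled only while it has
    been accurate for most of the recent window (all values illustrative)."""
    transmitted, recent_hits = [], []
    predicting, last = False, None
    for r in readings:
        hit = last is not None and abs(r - last) <= epsilon
        if not (predicting and hit):
            transmitted.append(r)                    # send the actual reading
        recent_hits = (recent_hits + [int(hit)])[-window:]
        predicting = sum(recent_hits) / len(recent_hits) >= tolerance
        last = r
    return transmitted

stable = [20.0 + 0.1 * (i % 2) for i in range(20)]   # slowly varying readings
bursty = [20.0 + 5.0 * (i % 2) for i in range(20)]   # erratic readings
print(len(collect(stable)), len(collect(bursty)))    # far fewer packets when stable
```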

Journal ArticleDOI
TL;DR: This paper proposes a novel traceback method for DDoS attacks that is based on entropy variations between normal and DDoS attack traffic, which is fundamentally different from commonly used packet marking techniques and is memory nonintensive, efficiently scalable, robust against packet pollution, and independent of attack traffic patterns.
Abstract: Distributed Denial-of-Service (DDoS) attacks are a critical threat to the Internet. However, the memoryless feature of the Internet routing mechanisms makes it extremely hard to trace back to the source of these attacks. As a result, there is no effective and efficient method to deal with this issue so far. In this paper, we propose a novel traceback method for DDoS attacks that is based on entropy variations between normal and DDoS attack traffic, which is fundamentally different from commonly used packet marking techniques. In comparison to the existing DDoS traceback methods, the proposed strategy possesses a number of advantages - it is memory nonintensive, efficiently scalable, robust against packet pollution, and independent of attack traffic patterns. The results of extensive experimental and simulation studies are presented to demonstrate the effectiveness and efficiency of the proposed method. Our experiments show that accurate traceback is possible within 20 seconds (approximately) in a large-scale attack network with thousands of zombies.
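A minimal sketch of the entropy-variation idea: a router computes the Shannon entropy of the flow mix in each observation window and flags a sharp drop against its long-term baseline, since a DDoS flood concentrates traffic into a few dominant flows. The threshold and flow definition below are illustrative, and the actual traceback step (pushing the query upstream router by router) is omitted.

```python
import math
from collections import Counter

def flow_entropy(packet_flows):
    """Shannon entropy of the flow distribution observed at a router within
    one window; `packet_flows` holds one flow ID per packet."""
    counts = Counter(packet_flows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_drop(baseline_entropy, current_entropy, delta=1.0):
    """Flag a DDoS-like event when entropy falls sharply below the baseline
    (threshold `delta` is illustrative, not the paper's calibration)."""
    return (baseline_entropy - current_entropy) > delta

normal = ["f%d" % (i % 50) for i in range(1000)]   # 50 balanced flows
attack = normal + ["victim"] * 5000                # one flow dominates
print(flow_entropy(normal), flow_entropy(attack))
print(entropy_drop(flow_entropy(normal), flow_entropy(attack)))  # True
```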

Journal ArticleDOI
TL;DR: Several versions of the CRO algorithm, a population-based metaheuristic inspired by the interactions between molecules in a chemical reaction, are proposed for the grid scheduling problem and compared with four other acknowledged metaheuristics on a wide range of instances.
Abstract: Grid computing solves high performance and high-throughput computing problems through sharing resources ranging from personal computers to supercomputers distributed around the world. One of the major problems is task scheduling, i.e., allocating tasks to resources. In addition to Makespan and Flowtime, we also take reliability of resources into account, and task scheduling is formulated as an optimization problem with three objectives. This is an NP-hard problem, and thus, metaheuristic approaches are employed to find the optimal solutions. In this paper, several versions of the Chemical Reaction Optimization (CRO) algorithm are proposed for the grid scheduling problem. CRO is a population-based metaheuristic inspired by the interactions between molecules in a chemical reaction. We compare these CRO methods with four other acknowledged metaheuristics on a wide range of instances. Simulation results show that the CRO methods generally perform better than existing methods and performance improvement is especially significant in large-scale applications.

Journal ArticleDOI
TL;DR: Co-Con is proposed, a novel cluster-level control architecture that coordinates individual power and performance control loops for virtualized server clusters that can simultaneously provide effective control on both application-level performance and underlying power consumption.
Abstract: Today's data centers face two critical challenges. First, various customers need to be assured by meeting their required service-level agreements such as response time and throughput. Second, server power consumption must be controlled in order to avoid failures caused by power capacity overload or system overheating due to increasing high server density. However, existing work controls power and application-level performance separately, and thus, cannot simultaneously provide explicit guarantees on both. In addition, as power and performance control strategies may come from different hardware/software vendors and coexist at different layers, it is more feasible to coordinate various strategies to achieve the desired control objectives than relying on a single centralized control strategy. This paper proposes Co-Con, a novel cluster-level control architecture that coordinates individual power and performance control loops for virtualized server clusters. To emulate the current practice in data centers, the power control loop changes hardware power states with no regard to the application-level performance. The performance control loop is then designed for each virtual machine to achieve the desired performance even when the system model varies significantly due to the impact of power control. Co-Con configures the two control loops rigorously, based on feedback control theory, for theoretically guaranteed control accuracy and system stability. Empirical results on a physical testbed demonstrate that Co-Con can simultaneously provide effective control on both application-level performance and underlying power consumption.

Journal ArticleDOI
TL;DR: The experimental results show that the GPU-CPU coprocessing of Mars on an NVIDIA GTX280 GPU and an Intel quad-core CPU outperformed Phoenix, the state-of-the-art MapReduce on the multicore CPU, with a speedup of up to 72 times and 24 times on average, depending on the applications.
Abstract: We design and implement Mars, a MapReduce runtime system accelerated with graphics processing units (GPUs). MapReduce is a simple and flexible parallel programming paradigm originally proposed by Google, for the ease of large-scale data processing on thousands of CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth. However, GPUs are designed as special-purpose coprocessors and their programming interfaces are less familiar than those on the CPUs to MapReduce programmers. To harness GPUs' power for MapReduce, we developed Mars to run on NVIDIA GPUs, AMD GPUs as well as multicore CPUs. Furthermore, we integrated Mars into Hadoop, an open-source CPU-based MapReduce system. Mars hides the programming complexity of GPUs behind the simple and familiar MapReduce interface, and automatically manages task partitioning, data distribution, and parallelization on the processors. We have implemented six representative applications on Mars and evaluated their performance on PCs equipped with GPUs as well as multicore CPUs. The experimental results show that the GPU-CPU coprocessing of Mars on an NVIDIA GTX280 GPU and an Intel quad-core CPU outperformed Phoenix, the state-of-the-art MapReduce on the multicore CPU, with a speedup of up to 72 times and 24 times on average, depending on the applications. Additionally, integrating Mars into Hadoop enabled GPU acceleration for a network of PCs.

Journal ArticleDOI
In Kyu Park1, Nitin Singhal2, Man Hee Lee1, Sung-Dae Cho2, Chris W Kim3 
TL;DR: This paper construe key factors in design and evaluation of image processing algorithms on the massive parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model and proposes a set of metrics, customized for image processing, to quantitatively evaluate algorithm characteristics.
Abstract: In this paper, we construe key factors in design and evaluation of image processing algorithms on the massive parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model. A set of metrics, customized for image processing, is proposed to quantitatively evaluate algorithm characteristics. In addition, we show that a range of image processing algorithms map readily to CUDA using multiview stereo matching, linear feature extraction, JPEG2000 image encoding, and nonphotorealistic rendering (NPR) as our example applications. The algorithms are carefully selected from major domains of image processing, so they inherently contain a variety of subalgorithms with diverse characteristics when implemented on the GPU. Performance is evaluated in terms of execution time and is compared to the fastest host-only version implemented using OpenMP. It is shown that the observed speedup varies extensively depending on the characteristics of each algorithm. Intensive analysis is conducted to show the appropriateness of the proposed metrics in predicting the effectiveness of an application for parallel implementation.

Journal ArticleDOI
TL;DR: The proposed scheme exploits a novel cryptographic primitive called attribute-based encryption (ABE) and tailors and adapts it for WSNs with respect to both performance and security requirements, and is the first to realize distributed fine-grained data access control for WSNs.
Abstract: Distributed sensor data storage and retrieval have gained increasing popularity in recent years for supporting various applications. While distributed architecture enjoys a more robust and fault-tolerant wireless sensor network (WSN), such architecture also poses a number of security challenges especially when applied in mission-critical applications such as battlefield and e-healthcare. First, as sensor data are stored and maintained by individual sensors and unattended sensors are easily subject to strong attacks such as physical compromise, it is significantly harder to ensure data security. Second, in many mission-critical applications, fine-grained data access control is a must as illegal access to the sensitive data may cause disastrous results and/or be prohibited by the law. Last but not least, sensor nodes usually are resource-constrained, which limits the direct adoption of expensive cryptographic primitives. To address the above challenges, we propose, in this paper, a distributed data access control scheme that is able to enforce fine-grained access control over sensor data and is resilient against strong attacks such as sensor compromise and user collusion. The proposed scheme exploits a novel cryptographic primitive called attribute-based encryption (ABE) and tailors and adapts it for WSNs with respect to both performance and security requirements. The feasibility of the scheme is demonstrated by experiments on real sensor platforms. To the best of our knowledge, this paper is the first to realize distributed fine-grained data access control for WSNs.

Journal ArticleDOI
TL;DR: This work develops a high-performance tracker crawler and, over a narrow window of 12 hours, crawls essentially all of the public BitTorrent Ecosystem's trackers, obtaining peer lists for all referenced torrents.
Abstract: BitTorrent is the most successful open Internet application for content distribution. Despite its importance, both in terms of its footprint in the Internet and the influence it has on emerging P2P applications, the BitTorrent Ecosystem is only partially understood. We seek to provide a nearly complete picture of the entire public BitTorrent Ecosystem. To this end, we crawl five of the most popular torrent-discovery sites over a nine-month period, identifying the 4.6 million torrents and 38,996 trackers that the sites reference. We also develop a high-performance tracker crawler and, over a narrow window of 12 hours, crawl essentially all of the public Ecosystem's trackers, obtaining peer lists for all referenced torrents. Complementing the torrent-discovery site and tracker crawling, we further crawl Azureus and Mainline DHTs for a random sample of torrents. Our resulting measurement data are more than an order of magnitude larger (in terms of number of torrents, trackers, or peers) than any earlier study. Using this extensive data set, we study in-depth the Ecosystem's torrent-discovery, tracker, peer, user behavior, and content landscapes. For peer statistics, the analysis is based on one typical snapshot obtained over 12 hours. We further analyze the fragility of the Ecosystem upon the removal of its most important tracker service.

Journal ArticleDOI
TL;DR: TASA frees the monitored objects from carrying RFID tags, recovers and checks frequent trajectories online by capturing the Received Signal Strength Indicator (RSSI) series of passive RFID tag arrays that objects traverse, and introduces reference tags with known positions.
Abstract: Radio Frequency IDentification (RFID) has attracted considerable attention in recent years for its low cost, general availability, and location sensing functionality. Most existing schemes require the tracked persons to be labeled with RFID tags. This requirement may not be satisfied for some activity sensing applications due to privacy and security concerns and uncertainty of objects to be monitored, e.g., group behavior monitoring in warehouses with privacy limitations, and abnormal customers in banks. In this paper, we propose TASA (Tag-free Activity Sensing using RFID tag Arrays) for location sensing and frequent route detection. TASA frees the monitored objects from carrying RFID tags and recovers and checks frequent trajectories online by capturing the Received Signal Strength Indicator (RSSI) series of passive RFID tag arrays that objects traverse. In order to improve the accuracy of estimated trajectories and accelerate location sensing, TASA introduces reference tags with known positions. With the readings from reference tags, TASA can locate objects more accurately. Extensive experiments show that TASA is an effective approach for certain activity sensing applications.

Journal ArticleDOI
TL;DR: A traffic-aware dynamic routing (TADR) algorithm is proposed to route packets around the congestion areas and scatter the excessive packets along multiple paths consisting of idle and underloaded nodes to alleviate congestion and improve the overall throughput in WSNs.
Abstract: The congestion problem in Wireless Sensor Networks (WSNs) is quite different from that in traditional networks. Most current congestion control algorithms try to alleviate the congestion by reducing the rate at which the source nodes inject packets into the network. However, this traffic control scheme always decreases the throughput and thus violates the fidelity level required by the applications. In this paper, we present a solution that fully exploits the idle or underloaded nodes to alleviate congestion and improve the overall throughput in WSNs. To achieve this goal, a traffic-aware dynamic routing (TADR) algorithm is proposed to route packets around the congestion areas and scatter the excessive packets along multiple paths consisting of idle and underloaded nodes. Utilizing the concept of potential in classical physics, our TADR algorithm is designed through constructing a hybrid virtual potential field using depth and normalized queue length to force the packets to steer clear of obstacles created by congestion and eventually move toward the sink. The simulation results show that the proposed solution improves the overall throughput by around 370 percent as compared to MintRoute, which is one of the benchmark routing protocols. Furthermore, the TADR scheme has low overhead, making it suitable for large-scale, dense sensor networks.

Journal ArticleDOI
TL;DR: This paper considers a category of rogue access points (APs) that pretend to be legitimate APs to lure users to connect to them and proposes a practical timing-based technique that allows the user to avoid connecting to rogue APs.
Abstract: This paper considers a category of rogue access points (APs) that pretend to be legitimate APs to lure users to connect to them. We propose a practical timing-based technique that allows the user to avoid connecting to rogue APs. Our detection scheme is a client-centric approach that employs the round trip time between the user and the DNS server to independently determine whether an AP is a rogue AP without assistance from the WLAN operator. We implemented our detection technique on commercially available wireless cards to evaluate their performance. Extensive experiments have demonstrated the accuracy, effectiveness, and robustness of our approach. The algorithm achieves close to 100 percent accuracy in distinguishing rogue APs from legitimate APs in lightly loaded traffic conditions, and greater than 60 percent accuracy in heavy traffic conditions. At the same time, the detection requires less than 1 second for lightly loaded traffic conditions and tens of seconds for heavy traffic conditions.
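The decision logic can be illustrated in a few lines: collect DNS round-trip times through the access point under test and compare their median with a baseline for known-legitimate APs, since a rogue AP that relays over an extra wireless hop inflates the RTT. The threshold and sample values below are illustrative, not the paper's calibration, and the actual RTT measurement is abstracted away.

```python
import statistics

def classify_ap(rtt_samples_ms, baseline_median_ms, threshold_ms=2.0):
    """Timing-based check in the spirit of the paper: compare the median DNS
    RTT through a suspect AP against a legitimate-AP baseline.  The 2 ms
    threshold is an illustrative value, not the paper's calibration."""
    suspect_median = statistics.median(rtt_samples_ms)
    return "rogue" if suspect_median - baseline_median_ms > threshold_ms else "legitimate"

legit_ap = [3.1, 2.9, 3.4, 3.0, 3.2]   # one wireless hop to the wired network
rogue_ap = [6.8, 7.5, 6.9, 8.1, 7.2]   # relayed through a second wireless hop
print(classify_ap(legit_ap, baseline_median_ms=3.0))
print(classify_ap(rogue_ap, baseline_median_ms=3.0))
```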

Journal ArticleDOI
TL;DR: This paper demonstrates that mixed precision schemes constitute a significant performance gain over native double precision and presents a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employs this scheme as a line relaxation smoother in a GPU-based multigrid solver.
Abstract: We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular, single-precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother, we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, that previously had to be solved on the CPU. The resulting mixed-precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude.
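The mixed-precision principle, doing the expensive solve in single precision and the cheap residual and update in double, can be sketched with classic iterative refinement. In the code below a dense LU solve via NumPy stands in for the paper's GPU multigrid with cyclic-reduction line smoothing, so it illustrates only the precision scheme, not the solver itself.

```python
import numpy as np

def mixed_precision_solve(A, b, iterations=5):
    """Iterative refinement: the costly solve runs in single precision (the
    'accelerator' part), while the residual and the solution update are
    accumulated in double precision."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b)
    for _ in range(iterations):
        r = b - A @ x                                    # double-precision residual
        c = np.linalg.solve(A32, r.astype(np.float32))   # single-precision correction
        x = x + c.astype(np.float64)                     # accumulate in double
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) + 200 * np.eye(200)  # well-conditioned test matrix
b = rng.standard_normal(200)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b))   # residual shrinks toward double-precision accuracy
```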

Journal ArticleDOI
TL;DR: It is shown that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost and it is illustrated that graphics accelerators can make simulations involving large numbers of particles feasible.
Abstract: Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multicore and graphics processors, C++ running on multicore processors, and a VHDL implementation running on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.

Journal ArticleDOI
TL;DR: Experimental results show that the CR/TR-Motion approach can drastically reduce migration overheads compared with the memory-to-memory approach in a LAN, and for a variety of workloads migrated across WANs, the migration downtime is less than 300 milliseconds.
Abstract: Live migration of virtual machines (VM) across physical hosts provides a significant new benefit for administrators of data centers and clusters. Previous memory-to-memory approaches demonstrate the effectiveness of live VM migration in local area networks (LAN), but they would cause a long period of downtime in a wide area network (WAN) environment. This paper describes the design and implementation of a novel approach, namely, CR/TR-Motion, which adopts checkpointing/recovery and trace/replay technologies to provide fast, transparent VM migration for both LAN and WAN environments. With execution trace logged on the source host, a synchronization algorithm is performed to orchestrate the running source and target VMs until they reach a consistent state. CR/TR-Motion can greatly reduce the migration downtime and network bandwidth consumption. Experimental results show that the approach can drastically reduce migration overheads compared with the memory-to-memory approach in a LAN: up to 72.4 percent on application observed downtime, up to 31.5 percent on total migration time, and up to 95.9 percent on the data to synchronize the VM state. The application performance overhead due to migration is kept within 8.54 percent on average. The results also show that for a variety of workloads migrated across WANs, the migration downtime is less than 300 milliseconds.

Journal ArticleDOI
TL;DR: This paper describes an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions, and shows that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
Abstract: In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example stationary versus nonstationary behavior) and fits different models (for example exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions. We apply this method with about 230,000 host availability traces obtained from a real Internet-distributed system, namely SETI@home. We find that about 21 percent of hosts exhibit availability that is a truly random process, and that these hosts can often be modeled accurately with a few distinct distributions from different families. We show that our models are useful and accurate in the context of a scheduling problem that deals with resource brokering. We believe that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
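A rough sketch of the modeling step, assuming SciPy is available: fit a few candidate families (exponential, Weibull, Pareto) to each host's availability durations, keep the best by log-likelihood, and group hosts by their best-fitting family. The paper's method also tests for randomness and stationarity and applies its own clustering; none of that is reproduced here.

```python
import numpy as np
from scipy import stats

CANDIDATES = {
    "exponential": stats.expon,
    "weibull": stats.weibull_min,
    "pareto": stats.pareto,
}

def best_fit(durations):
    """Fit each candidate family to one host's availability durations
    (location pinned at 0, since durations are positive) and return the
    family with the highest log-likelihood."""
    best_name, best_ll = None, -np.inf
    for name, dist in CANDIDATES.items():
        params = dist.fit(durations, floc=0)
        ll = np.sum(dist.logpdf(durations, *params))
        if ll > best_ll:
            best_name, best_ll = name, ll
    return best_name

rng = np.random.default_rng(42)
hosts = {
    "host-a": rng.exponential(scale=5.0, size=500),   # synthetic traces
    "host-b": rng.weibull(0.7, size=500) * 5.0,
}
groups = {}
for host, trace in hosts.items():
    groups.setdefault(best_fit(trace), []).append(host)
print(groups)   # hosts bucketed by the family that best fits their availability
```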

Journal ArticleDOI
TL;DR: This work investigates a new dynamic resource provisioning method for MMOG operation using external data centers as low-cost resource providers and introduces a combined MMOG processor, network, and memory load model that takes into account both the player interaction type and the population size.
Abstract: Today's Massively Multiplayer Online Games (MMOGs) can include millions of concurrent players spread across the world and interacting with each other within a single session. Faced with high resource demand variability and with misfit resource renting policies, the current industry practice is to overprovision for each game tens of self-owned data centers, making the market entry affordable only for big companies. Focusing on the reduction of entry and operational costs, we investigate a new dynamic resource provisioning method for MMOG operation using external data centers as low-cost resource providers. First, we identify in the various types of player interaction a source of short-term load variability, which complements the long-term load variability due to the size of the player population. Then, we introduce a combined MMOG processor, network, and memory load model that takes into account both the player interaction type and the population size. Our model is best used for estimating the MMOG resource demand dynamically, and thus, for dynamic resource provisioning based on the game world entity distribution. We evaluate several classes of online predictors for MMOG entity distribution and propose and tune a neural network-based predictor to deliver good accuracy consistently under real-time performance constraints. We assess using trace-based simulation the impact of the data center policies on the quality of resource provisioning. We find that the dynamic resource provisioning can be much more efficient than its static alternative even when the external data centers are busy, and that data centers with policies unsuitable for MMOGs are penalized by our dynamic resource provisioning method. Finally, we present experimental results showing the real-time parallelization and load balancing of a real game prototype using data center resources provisioned using our method and show its advantage against a rudimentary client threshold approach.

Journal ArticleDOI
Jaehoon Jeong1, Shuo Guo1, Yu Gu1, Tian He1, David H. C. Du1 
TL;DR: This paper presents the first attempt to effectively utilize vehicles' trajectory information in a privacy-preserving manner; the proposed scheme outperforms the existing scheme in terms of both data delivery delay and packet delivery ratio.
Abstract: This paper proposes a Trajectory-Based Data (TBD) Forwarding scheme, tailored for the data forwarding for roadside reports in light-traffic vehicular ad hoc networks. State-of-the-art schemes have demonstrated the effectiveness of their data forwarding strategies by exploiting known vehicular traffic statistics (e.g., densities and speeds). These results are encouraging, however, further improvements can be made by taking advantage of the growing popularity of GPS-based navigation systems. This paper presents the first attempt to effectively utilize vehicles' trajectory information in a privacy-preserving manner. In our design, such trajectory information is combined with the vehicular traffic statistics for a better performance. In a distributed way, each individual vehicle computes its end-to-end expected delivery delay to the Internet access points based on its position on its vehicle trajectory and exchanges this delay with neighboring vehicles to determine the best next-hop vehicle. For the accurate end-to-end delay computation, this paper also proposes a link delay model to estimate the packet forwarding delay on a road segment. Through theoretical analysis and extensive simulation, it is shown that our link delay model provides accurate link delay estimation and our forwarding design outperforms the existing scheme in terms of both the data delivery delay and packet delivery ratio.
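The per-segment delay computation can be illustrated with a much-simplified model: when the expected inter-vehicle gap on a road segment is within radio range the packet is mostly relayed hop by hop, otherwise it is carried at vehicle speed, and the end-to-end expectation is the sum over the segments of a trajectory. All constants below are made up, and the paper's link delay model is considerably more detailed.

```python
import math

def link_delay(length_m, density_per_m, speed_mps,
               radio_range_m=250.0, hop_delay_s=0.02):
    """Illustrative per-road-segment delay: multihop relaying when vehicles
    are dense enough, carry-and-forward otherwise (made-up constants)."""
    gap = 1.0 / max(density_per_m, 1e-9)
    if gap <= radio_range_m:                       # multihop forwarding dominates
        hops = math.ceil(length_m / radio_range_m)
        return hops * hop_delay_s
    return length_m / speed_mps                    # carry-and-forward dominates

def expected_e2e_delay(route_segments):
    """Sum per-segment delays along a vehicle's trajectory to an access point;
    each segment is (length_m, density_per_m, speed_mps)."""
    return sum(link_delay(*seg) for seg in route_segments)

sparse_route = [(2000, 0.002, 15.0), (1500, 0.001, 15.0)]
dense_route  = [(2500, 0.02, 10.0), (1800, 0.03, 10.0)]
print(expected_e2e_delay(sparse_route), expected_e2e_delay(dense_route))
# A forwarder would hand the packet to the neighbor whose trajectory yields
# the smaller expected delay to the Internet access point.
```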