
Showing papers in "Concurrency and Computation: Practice and Experience in 2011"


Journal ArticleDOI
TL;DR: This work investigates power‐aware provisioning of virtual machines for real‐time services and proposes several schemes to reduce the power consumption of hard real‐time services, together with power‐aware profitable provisioning of soft real‐time services.
Abstract: Reducing power consumption has been an essential requirement for Cloud resource providers not only to decrease operating costs, but also to improve the system reliability. As Cloud computing emerges as a platform for the Anything as a Service (XaaS) paradigm, modern real-time services also become available through Cloud computing. In this work, we investigate power-aware provisioning of virtual machines for real-time services. Our approach is (i) to model a real-time service as a real-time virtual machine request; and (ii) to provision virtual machines in Cloud data centers using dynamic voltage frequency scaling schemes. We propose several schemes to reduce the power consumption of hard real-time services and power-aware profitable provisioning of soft real-time services. Copyright © 2011 John Wiley & Sons, Ltd.
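The provisioning decision at the heart of such schemes can be illustrated with a small sketch. The following is a minimal example, assuming a hypothetical request type (cycles of work plus a relative deadline) and the common approximation that dynamic power grows roughly cubically with frequency, so the lowest deadline-feasible DVFS level is the energy-minimizing choice; it is not the paper's actual algorithm.

    #include <optional>
    #include <vector>

    // Hypothetical hard real-time VM request: 'cycles' of work to finish
    // within a relative 'deadline' in seconds. Field names are illustrative.
    struct RtVmRequest { double cycles; double deadline; };

    // Pick the lowest available DVFS frequency (Hz) that still meets the
    // deadline. With dynamic power growing roughly as f^3 and execution
    // time as 1/f, energy grows with f^2, so the lowest feasible
    // frequency minimizes energy for the request.
    std::optional<double> pickFrequency(const RtVmRequest& r,
                                        const std::vector<double>& freqs) {
        std::optional<double> best;
        for (double f : freqs)
            if (r.cycles / f <= r.deadline && (!best || f < *best))
                best = f;
        return best;  // empty: no frequency can meet the deadline, reject
    }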

164 citations


Journal ArticleDOI
TL;DR: This work proposes and evaluates a new implementation of SpMV for NVIDIA GPUs based on a new format, ELLPACK‐R, that allows storage of the sparse matrix in a regular manner and shows that significant speedup factors are achieved with GPUs.
Abstract: The sparse matrix vector product (SpMV) is a key operation in engineering and scientific computing and, hence, it has been subjected to intense research for a long time. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devise data formats to store the sparse matrix with the ultimate aim of maximizing the performance. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors. SpMV implementations for NVIDIA GPUs have already appeared on the scene. This work proposes and evaluates a new implementation of SpMV for NVIDIA GPUs based on a new format, ELLPACK-R, that allows storage of the sparse matrix in a regular manner. A comparative evaluation against a variety of storage formats previously proposed has been carried out based on a representative set of test matrices. The results show that, although the performance strongly depends on the specific pattern of the matrix, the implementation based on ELLPACK-R achieves higher overall performance. Moreover, a comparison with standard state-of-the-art superscalar processors reveals that significant speedup factors are achieved with GPUs. Copyright © 2010 John Wiley & Sons, Ltd.
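To make the format concrete, below is a minimal CUDA sketch of SpMV over ELLPACK-R as described in the literature: nonzeros and their column indices are padded to the maximum row length and stored column-major, so consecutive threads (one per row) make coalesced memory accesses, while an auxiliary row-length array keeps threads from traversing the padding. Array names are illustrative.

    // y = A*x with A in ELLPACK-R: val/col are n_rows x max_len,
    // column-major; rl[] holds the true number of nonzeros per row.
    __global__ void spmv_ellr(int n_rows,
                              const float* __restrict__ val,
                              const int*   __restrict__ col,
                              const int*   __restrict__ rl,
                              const float* __restrict__ x,
                              float* __restrict__ y) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n_rows) return;
        float dot = 0.0f;
        for (int k = 0; k < rl[row]; ++k) {
            int idx = k * n_rows + row;      // column-major: coalesced reads
            dot += val[idx] * x[col[idx]];
        }
        y[row] = dot;
    }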

120 citations


Journal ArticleDOI
TL;DR: The paper summarizes the most advanced features of P‐GRADE, such as parameter sweep workflow execution, multi‐grid workflow execution and integration with the DSpace workflow repository, as well as introducing the second generation P‐GRADE portal called WS‐PGRADE.
Abstract: P-GRADE portal is one of the most widely used general-purpose grid portals in Europe. The paper summarizes the most advanced features of P-GRADE, such as parameter sweep workflow execution, multi-grid workflow execution and integration with the DSpace workflow repository. It also shows the NGS P-GRADE portal that extends P-GRADE with the GEMLCA legacy code execution support in Grid systems, as well as with coarse-grain workflow interoperability services. Next, the paper introduces the second generation P-GRADE portal called WS-PGRADE that merges the advanced features of the first generation P-GRADE portals and extends them with new workflow and architecture concepts. Finally, the application-specific science gateway of the CancerGrid project is briefly described to demonstrate that application-specific portals can easily be developed on top of the general-purpose WS-PGRADE portal. Copyright © 2010 John Wiley & Sons, Ltd.

113 citations


Journal ArticleDOI
TL;DR: This paper proposes a framework to integrate WoT and CPS; a case study is presented to demonstrate the advantages of the framework.
Abstract: The recent development of Web-of-Things (WoT) and Cyber–Physical Systems (CPS) raises a new requirement of connecting abstract computational artifacts with the physical world. This requires both new theories and engineering practices that model cyber and physical resources in a unified framework, a challenge that few current approaches are able to tackle. The solution must break the boundary between the cyber world and the physical world by providing a unified infrastructure that permits integrated models addressing issues from both worlds simultaneously. This paper proposes a framework to integrate WoT and CPS. A case study is presented to demonstrate the advantage of the framework. Copyright © 2010 John Wiley & Sons, Ltd.

96 citations


Journal ArticleDOI
Thilina Gunarathne, Tak-Lon Wu, Jong Youl Choi, Seung-Hee Bae, Judy Qiu
TL;DR: This paper presents three pleasingly parallel biomedical applications, and discusses variations in cost among the different platform choices (e.g., Elastic Compute Cloud instance types), highlighting the importance of selecting an appropriate platform based on the nature of the computation.
Abstract: Cloud computing offers exciting new approaches for scientific computing that leverage major commercial players’ hardware and software investments in large-scale data centers. Loosely coupled problems are very important in many scientific fields, and with the ongoing move towards data-intensive computing, they are on the rise. There exist several different approaches to leveraging clouds and cloud-oriented data processing frameworks to perform pleasingly parallel (also called embarrassingly parallel) computations. In this paper, we present three pleasingly parallel biomedical applications: (i) assembly of genome fragments; (ii) sequence alignment and similarity search; and (iii) dimension reduction in the analysis of chemical structures, which are implemented utilizing the cloud infrastructure service-based utility computing models of Amazon Web Services (Seattle, WA, USA) and Microsoft Windows Azure (Microsoft Corp., Redmond, WA, USA), as well as the MapReduce-based data processing frameworks Apache Hadoop (Apache Software Foundation, Los Angeles, CA, USA) and Microsoft DryadLINQ. We review and compare each of these frameworks, performing a comparative study among them based on performance, cost, and usability. High-latency, eventually consistent cloud infrastructure service-based frameworks that rely on off-the-node cloud storage were able to exhibit performance efficiencies and scalability comparable to the MapReduce-based frameworks with local disk-based storage for the applications considered. In this paper, we also analyze variations in cost among the different platform choices (e.g., Elastic Compute Cloud instance types), highlighting the importance of selecting an appropriate platform based on the nature of the computation. Copyright © 2011 John Wiley & Sons, Ltd.

59 citations


Journal ArticleDOI
TL;DR: A comprehensive taxonomy is proposed that characterizes and classifies different software components and high‐level methods that are required for autonomic management of applications in Grids and identifies the areas that require further research initiatives.
Abstract: In Grid computing environments, the availability, performance, and state of resources, applications, services, and data undergo continuous changes during the life cycle of an application. Uncertainty is a fact in Grid environments, which is triggered by multiple factors, including: (1) failures, (2) dynamism, (3) incomplete global knowledge, and (4) heterogeneity. Unfortunately, the existing Grid management methods, tools, and application composition techniques are inadequate to handle these resource, application, and environment behaviors. The aforementioned characteristics impose serious requirements on the Grid programming and runtime systems if they wish to deliver efficient performance to scientific and commercial applications. To overcome the above challenges, the Grid programming and runtime systems must become autonomic or self-managing in accordance with the high-level behavior specified by system administrators. Autonomic systems are inspired by biological systems that deal with similar challenges of complexity, dynamism, heterogeneity, and uncertainty. To this end, we propose a comprehensive taxonomy that characterizes and classifies different software components and high-level methods that are required for autonomic management of applications in Grids. We also survey several representative Grid computing systems that have been developed by various leading research groups in academia and industry. The taxonomy not only highlights the similarities and differences of state-of-the-art technologies utilized in autonomic application management from the perspective of Grid computing, but also identifies the areas that require further research initiatives. We believe that this taxonomy and its mapping to relevant systems would be highly useful for academic and industry researchers who are engaged in the design of autonomic Grid and, more recently, Cloud computing systems. Copyright © 2011 John Wiley & Sons, Ltd.

56 citations


Journal ArticleDOI
TL;DR: This paper develops an implementation of the full hyperspectral unmixing chain on commodity graphics processing units (GPUs), realized using CUDA (Compute Unified Device Architecture) and tested on three different GPU architectures.
Abstract: Hyperspectral imaging instruments are capable of collecting hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. One of the main problems in the analysis of hyperspectral data cubes is the presence of mixed pixels, which arise when the spatial resolution of the sensor is not enough to separate spectrally distinct materials. Hyperspectral unmixing is one of the most popular techniques to analyze hyperspectral data. It comprises two stages: (i) automatic identification of pure spectral signatures (endmembers) and (ii) estimation of the fractional abundance of each endmember in each pixel. The spectral unmixing process is quite expensive in computational terms, mainly due to the extremely high dimensionality of hyperspectral data cubes. Although this process maps nicely to high performance systems such as clusters of computers, these systems are generally expensive and difficult to adapt to real-time data processing requirements introduced by several applications, such as wildland fire tracking, biological threat detection, monitoring of oil spills, and other types of chemical contamination. In this paper, we develop an implementation of the full hyperspectral unmixing chain on commodity graphics processing units (GPUs). The proposed methodology has been implemented using CUDA (Compute Unified Device Architecture) and tested on three different GPU architectures: NVidia Tesla C1060, NVidia GeForce GTX 275, and NVidia GeForce 9800 GX2, achieving near real-time unmixing performance in some configurations tested when analyzing two different hyperspectral images, collected over the World Trade Center complex in New York City and the Cuprite mining district in Nevada. Copyright © 2011 John Wiley & Sons, Ltd.
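As an illustration of why the unmixing chain maps well to GPUs, the sketch below implements only the abundance-estimation stage, under a simple unconstrained least-squares model with a pseudo-inverse of the endmember matrix assumed precomputed on the host; the endmember-extraction stage and the constrained estimators used in practice are omitted, and all names are illustrative.

    // One thread unmixes one pixel: a = pinv(E) * x, where pinvE is the
    // p x bands pseudo-inverse of the endmember matrix (row-major) and
    // cube holds the pixel spectra (n_pixels x bands, row-major).
    __global__ void estimate_abundances(int n_pixels, int bands, int p,
                                        const float* __restrict__ pinvE,
                                        const float* __restrict__ cube,
                                        float* __restrict__ abund) {
        int pix = blockIdx.x * blockDim.x + threadIdx.x;
        if (pix >= n_pixels) return;
        for (int j = 0; j < p; ++j) {
            float a = 0.0f;
            for (int b = 0; b < bands; ++b)
                a += pinvE[j * bands + b] * cube[pix * bands + b];
            abund[pix * p + j] = a;   // fractional abundance of endmember j
        }
    }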

54 citations


Journal ArticleDOI
TL;DR: This article proposes a peer‐to‐peer‐based approach to satisfy the individual delay requirements of subscribers in the presence of bandwidth constraints, and allows subscribers to dynamically adjust the granularity of their subscriptions according to their bandwidth constraints and delay requirements.
Abstract: Current distributed publish/subscribe systems consider all participants to have similar QoS requirements and contribute equally to the system's resources. However, in many real-world applications, the message delay tolerance of individual participants may differ widely. Disseminating messages according to individual delay requirements not only allows for the satisfaction of user-specific needs, but also significantly improves the utilization of the resources that participants contribute to a publish/subscribe system. In this article, we propose a peer-to-peer-based approach to satisfy the individual delay requirements of subscribers in the presence of bandwidth constraints. Our approach allows subscribers to dynamically adjust the granularity of their subscriptions according to their bandwidth constraints and delay requirements. Subscribers maintain the overlay in a decentralized manner, exclusively establishing connections that satisfy their individual delay requirements, and that provide messages exactly meeting their subscription granularity. The evaluations show that for many practical workloads, the proposed publish/subscribe system can scale up to a large number of subscribers and performs robustly in a very dynamic setting. Copyright © 2011 John Wiley & Sons, Ltd. (Spatial indexing can work with any ordered data type with a known domain [5]. Evaluation results are not sensitive to the choice of data type and therefore, similar to [5], only integer data types are considered.)

50 citations


Journal ArticleDOI
TL;DR: The paper summarizes the main issues in the efficient implementation of the interface and explains the optimization problems that need to be (approximately) solved by a good MPI library.
Abstract: The Message-Passing Interface (MPI) standard provides basic means for adaptations of the mapping of MPI process ranks to processing elements to better match the communication characteristics of applications to the capabilities of the underlying systems. The MPI process topology mechanism enables the MPI implementation to rerank processes by creating a new communicator that reflects user-supplied information about the application communication pattern. With the newly released MPI 2.2 version of the MPI standard, the process topology mechanism has been enhanced with new interfaces for scalable and informative user-specification of communication patterns. Applications with relatively static communication patterns are encouraged to take advantage of the mechanism whenever convenient by specifying their communication pattern to the MPI library. Reference implementations of the new mechanism can be expected to be readily available (and come at essentially no cost), but non-trivial implementations pose challenging problems for the MPI implementer. This paper is first and foremost addressed to application programmers wanting to use the new process topology interfaces. It explains the use and the motivation for the enhanced interfaces and the advantages gained even with a straightforward implementation. For the MPI implementer, the paper summarizes the main issues in the efficient implementation of the interface and explains the optimization problems that need to be (approximately) solved by a good MPI library. Copyright © 2010 John Wiley & Sons, Ltd.
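For reference, the scalable specification style the paper discusses is exemplified by MPI_Dist_graph_create_adjacent, one of the distributed graph topology interfaces introduced with MPI 2.2: each process names only its own neighbours, and reorder=1 permits the library to rerank processes to match the machine. A minimal ring example:

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;
        int sources[2]      = {left, right};   // who sends to me
        int destinations[2] = {left, right};   // whom I send to

        MPI_Comm ring;
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       2, sources,      MPI_UNWEIGHTED,
                                       2, destinations, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 1 /* reorder */, &ring);
        // ... communicate on 'ring' using the possibly reranked processes ...
        MPI_Comm_free(&ring);
        MPI_Finalize();
        return 0;
    }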

50 citations


Journal ArticleDOI
TL;DR: A conjunctive precise and fuzzy identity‐based encryption (PFIBE) scheme for secure data sharing on cloud servers is proposed, which allows the encryption of data by specifying a recipient identity (ID) set or a disjunctive normal form (DNF) access control policy over attributes.
Abstract: With more and more enterprises sharing their sensitive data on cloud servers, building a secure cloud environment for data sharing has attracted a lot of attention in both the industry and academic communities. In this paper, we propose a conjunctive precise and fuzzy identity-based encryption (PFIBE) scheme for secure data sharing on cloud servers, which allows the encryption of data by specifying a recipient identity (ID) set or a disjunctive normal form (DNF) access control policy over attributes, so that only the user whose ID belongs to the ID set or attributes satisfy the DNF access control policy can decrypt the corresponding data. Our design goal is to propose a novel encryption scheme, which simultaneously achieves a fine-grained access control, flexibility, high performance, and full key delegation, so as to help enterprise users to enjoy more secure, comprehensive, and flexible services. We achieve this goal by first combining the hierarchical identity-based encryption (HIBE) system and the ciphertext-policy attribute-based encryption (CP-ABE) system, and then marking each user with both an ID and a set of descriptive attributes, finally separating the access control policy into two parts: a recipient ID set and a DNF attribute-based access control policy. Copyright © 2011 John Wiley & Sons, Ltd.
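The non-cryptographic half of the construction, i.e. splitting the access policy into a recipient ID set and a DNF attribute policy, can be sketched as a simple eligibility check. The types and names below are illustrative, and none of the underlying HIBE/CP-ABE mathematics is shown.

    #include <set>
    #include <string>
    #include <vector>

    // A DNF policy is an OR of AND-clauses over attribute names.
    struct DnfPolicy {
        std::vector<std::set<std::string>> clauses;
    };

    bool satisfiesDnf(const std::set<std::string>& attrs, const DnfPolicy& p) {
        for (const auto& clause : p.clauses) {
            bool all = true;
            for (const auto& a : clause)
                if (!attrs.count(a)) { all = false; break; }
            if (all) return true;   // one satisfied conjunction suffices
        }
        return false;
    }

    // A user may decrypt if their ID is in the recipient ID set OR their
    // attributes satisfy the DNF access control policy.
    bool mayDecrypt(const std::string& id, const std::set<std::string>& attrs,
                    const std::set<std::string>& idSet, const DnfPolicy& policy) {
        return idSet.count(id) > 0 || satisfiesDnf(attrs, policy);
    }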

47 citations


Journal ArticleDOI
TL;DR: A non-instance learning-based approach that transforms the ontology mapping problem into a binary classification problem and utilizes machine learning techniques as a solution, demonstrating that the approach can be generalized to different domains without extra training effort.
Abstract: Ontology mapping (OM) seeks to find semantic correspondences between similar elements of different ontologies. OM is critical to achieve semantic interoperability in the World Wide Web. To solve the OM problem, this article proposes a non-instance learning-based approach that transforms the OM problem into a binary classification problem and utilizes machine learning techniques as a solution. As in other machine learning-based approaches, a number of features (i.e. linguistic, structural, and web features) are generated for each mapping candidate. However, in contrast to other learning-based mapping approaches, the features proposed in our approach are generic and do not rely on the existence and sufficiency of instances. Therefore, our approach can be generalized to different domains without extra training efforts. To evaluate our approach, two experiments (i.e. within-task vs cross-task) are implemented and the SVM (support vector machine) algorithm is applied. Experimental results show that our non-instance learning-based OM approach performs well on most of the OAEI benchmark tests when training and testing on the same mapping task, while the results of the approach vary according to the similarity between the training and testing data when training and testing on different mapping tasks. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: It is concluded that some commonly accepted rules for dense linear algebra algorithms may need to be revisited, both at the level of the scheduler and the algorithms (e.g., left‐looking and right‐looking variants).
Abstract: The objective of this paper is to analyze the dynamic scheduling of dense linear algebra algorithms on shared-memory, multicore architectures. Current numerical libraries (e.g., linear algebra package) show clear limitations on such emerging systems mainly because of their coarse granularity tasks. Thus, many numerical algorithms need to be redesigned to better fit the architectural design of the multicore platform. The parallel linear algebra for scalable multicore architectures library developed at the University of Tennessee tackles this challenge by using tile algorithms to achieve a finer task granularity. These tile algorithms can then be represented by directed acyclic graphs, where nodes are the tasks and edges are the dependencies between the tasks. The paramount key to achieve high performance is to implement a runtime environment to efficiently schedule the execution of the directed acyclic graph across the multicore platform. This paper studies the impact on the overall performance of some parameters, both at the level of the scheduler (e.g., window size and locality) and the algorithms (e.g., left-looking and right-looking variants). We conclude that some commonly accepted rules for dense linear algebra algorithms may need to be revisited. Copyright © 2011 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: It is shown that all the lattice‐Boltzmann simulations primarily depend on effects corresponding to simulation geometry and decomposition, and not on the architectural aspects of the GPU, and that the metrics of Efficiency and Utilization are not suitable for memory‐bandwidth‐dependent codes.
Abstract: The lattice-Boltzmann method is well suited for implementation in single-instruction multiple-data (SIMD) environments provided by general purpose graphics processing units (GPGPUs). This paper discusses the integration of these GPGPU programs with OpenMP to create lattice-Boltzmann applications for multi-GPU clusters. In addition to the standard single-phase single-component lattice-Boltzmann method, the performances of more complex multiphase, multicomponent models are also examined. The contributions of various GPU lattice-Boltzmann parameters to the performance are examined and quantified with a statistical model of the performance using Analysis of Variance (ANOVA). By examining single- and multi-GPU lattice-Boltzmann simulations with ANOVA, we show that all the lattice-Boltzmann simulations primarily depend on effects corresponding to simulation geometry and decomposition, and not on the architectural aspects of the GPU. Additionally, using ANOVA we confirm that the metrics of Efficiency and Utilization are not suitable for memory-bandwidth-dependent codes. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A deterministic parallel algorithm that can be used to find frequent items in parallel, i.e. those whose multiplicity is greater than a given threshold, and is therefore useful for processing iceberg queries and in many other contexts of applied mathematics and information theory.
Abstract: We present a deterministic parallel algorithm for the k-majority problem, which can be used to find frequent items in parallel, i.e. those whose multiplicity is greater than a given threshold, and is therefore useful for processing iceberg queries and in many other contexts of applied mathematics and information theory. The algorithm can be used both in the online (stream) context and in the offline setting, the difference being that in the former case we are restricted to a single scan of the input elements, so that verifying the frequent items that have been determined is not allowed (e.g. network traffic streams passing through internet routers), while in the latter a parallel scan of the input can be used to determine the actual k-majority elements. To the best of our knowledge, this is the first parallel algorithm solving the proposed problem. Copyright © 2011 John Wiley & Sons, Ltd.
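For background, the sequential counter-based scheme that k-majority algorithms build on can be sketched as follows: with at most k−1 counters, any item whose multiplicity exceeds n/k is guaranteed to survive the scan. The paper's contribution, parallelizing this kind of computation (local summaries, merging, and an optional verification scan), is not reproduced here.

    #include <cstddef>
    #include <string>
    #include <unordered_map>

    // One Misra-Gries-style update: increment the item's counter if
    // present; otherwise start a counter if fewer than k-1 exist, else
    // decrement every counter and drop those that reach zero.
    void update(std::unordered_map<std::string, long>& counters,
                const std::string& item, std::size_t k) {
        auto it = counters.find(item);
        if (it != counters.end()) { ++it->second; return; }
        if (counters.size() < k - 1) { counters[item] = 1; return; }
        for (auto c = counters.begin(); c != counters.end(); )
            if (--c->second == 0) c = counters.erase(c); else ++c;
    }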

Journal ArticleDOI
TL;DR: A method integrating page counts, semantic snippets, and the number of already displayed search results is proposed that outperforms the existing Web‐based methods by a wide margin and significantly improves the quality of query suggestion against some page-count-based methods.
Abstract: Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, recently Web-based methods have been proposed to solve this problem. Because of the noise and redundancy hidden in the Web data, robustness and accuracy are still challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. Then, the semantic snippets and the number of search results are used to remove noise and redundancy in the Web snippets (‘Web-snippet’ includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human annotated knowledge (e.g., ontologies), and can be applied to Web-related tasks (e.g., query suggestion) easily. A correlation coefficient of 0.851 against the Rubenstein–Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion against some page-count-based methods. Copyright © 2011 John Wiley & Sons, Ltd.
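A flavor of the page-count component can be given with a generic WebJaccard-style co-occurrence score of the kind this line of work builds on. It is illustrative only, not the paper's full measure, which also integrates snippets and the number of already displayed results.

    // h1 = hits("w1"), h2 = hits("w2"), h12 = hits("w1 AND w2") as
    // reported by a search engine; the cutoff c treats very rare
    // co-occurrence as noise and returns zero similarity.
    double webJaccard(double h1, double h2, double h12, double c = 5.0) {
        if (h12 < c) return 0.0;
        return h12 / (h1 + h2 - h12);
    }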

Journal ArticleDOI
TL;DR: This work demonstrates that high performance is possible with current cost‐effective graphics processing units across a wide range of operating conditions and describes how performance will likely evolve in similar architectures.
Abstract: Reed–Solomon coding is a method for generating arbitrary amounts of erasure correction information from original data via matrix–vector multiplication in finite fields. Previous work has shown that modern CPUs are not well-matched to this type of computation, requiring applications that depend on Reed–Solomon coding at high speeds (such as high-performance storage arrays) to use hardware implementations. This work demonstrates that high performance is possible with current cost-effective graphics processing units across a wide range of operating conditions and describes how performance will likely evolve in similar architectures. It describes the characteristics of the graphics processing unit architecture that enable high-speed Reed–Solomon coding. A high-performance practical library, Gibraltar, has been prototyped that performs Reed–Solomon coding on graphics processors in a manner suitable for storage arrays, along with applications with similar data resiliency needs. This library enables variably resilient erasure correcting codes to be used in a broad range of applications. Its performance is compared with that of a widely available CPU implementation, and a rationale for its API is presented. Its practicality is demonstrated through a usage example. Copyright © 2011 John Wiley & Sons, Ltd. (Gibraltar is available online, along with sample applications to test it.)
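The computational core being accelerated is a matrix–vector product over GF(2^8), where addition is XOR and multiplication reduces to log/antilog table lookups. Below is a minimal CPU sketch, not Gibraltar's implementation; the field tables are built from the primitive polynomial 0x11D, a common but here assumed choice.

    #include <cstdint>
    #include <vector>

    static uint8_t gf_log[256], gf_exp[510];

    // Build GF(2^8) log/antilog tables; the exp table is doubled so that
    // gf_log[a] + gf_log[b] never needs a modular reduction.
    static void init_tables() {
        int x = 1;
        for (int i = 0; i < 255; ++i) {
            gf_exp[i] = gf_exp[i + 255] = (uint8_t)x;
            gf_log[x] = (uint8_t)i;
            x <<= 1;
            if (x & 0x100) x ^= 0x11D;   // reduce by the field polynomial
        }
    }

    static inline uint8_t gf_mul(uint8_t a, uint8_t b) {
        return (a && b) ? gf_exp[gf_log[a] + gf_log[b]] : 0;
    }

    // parity = F * data over GF(2^8). Every (row, byte offset) pair is
    // independent, which is what maps so well onto thousands of GPU threads.
    void encode(const std::vector<std::vector<uint8_t>>& F,      // m x n matrix
                const std::vector<std::vector<uint8_t>>& data,   // n data shards
                std::vector<std::vector<uint8_t>>& parity) {     // m parity shards
        std::size_t m = F.size(), n = data.size(), len = data[0].size();
        parity.assign(m, std::vector<uint8_t>(len, 0));
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j)
                for (std::size_t b = 0; b < len; ++b)
                    parity[i][b] ^= gf_mul(F[i][j], data[j][b]);
    }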

Journal ArticleDOI
TL;DR: This work assigned compare/exchange operations to threads in a way that decreases low‐performance global‐memory access and makes efficient use of high‐performance shared memory, which greatly increases the performance of this in‐place, comparison‐based sorting algorithm.
Abstract: State-of-the-art graphics processors provide high processing power and furthermore, the high programmability of GPUs offered by frameworks like CUDA (Compute Unified Device Architecture) increases their usability as high-performance co-processors for general-purpose computing. Sorting is well investigated in Computer Science in general, but (because of this new field of application for GPUs) there is a demand for high-performance parallel sorting algorithms that fit with the characteristics of the modern GPU-architecture. We present a high-performance in-place implementation of Batcher's bitonic sorting networks for CUDA-enabled GPUs. Therefore, we assigned compare/exchange operations to threads in a way that decreases low-performance global-memory access and makes efficient use of high-performance shared memory. This greatly increases the performance of this in-place, comparison-based sorting algorithm. Our implementation outperforms all other algorithms in our tests when sorting 64-bit keys. It is the fastest comparison-based GPU sorting algorithm for 32-bit keys, being only outperformed by (non-comparison-based) radix sort when sorting sequences larger than 2^23. Copyright © 2011 John Wiley & Sons, Ltd.
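The classic global-memory formulation of one bitonic compare/exchange step is sketched below; the paper's contribution is precisely the remapping of these steps so that most of them run out of shared memory, which is omitted here. The sequence length n must be a power of two, with one thread per element.

    // One bitonic step: each element is paired with the one whose index
    // differs in bit j; the subsequence direction is given by bit k.
    __global__ void bitonic_step(unsigned int* keys, int j, int k) {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned int partner = i ^ j;
        if (partner <= i) return;                 // handle each pair once
        bool ascending = ((i & k) == 0);
        unsigned int a = keys[i], b = keys[partner];
        if ((a > b) == ascending) { keys[i] = b; keys[partner] = a; }
    }

    // Host-side driver over all steps of the network:
    //   for (int k = 2; k <= n; k <<= 1)
    //       for (int j = k >> 1; j > 0; j >>= 1)
    //           bitonic_step<<<n / 256, 256>>>(d_keys, j, k);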

Journal ArticleDOI
TL;DR: In tests on 2D cardiac tissues with different cell models it is shown that the GPU implementation runs 20 times faster than a parallel CPU implementation running with 4 threads on a quad-core machine; parts of the code are even accelerated by a factor of 180.
Abstract: The modeling of the electrical activity of the heart is of great medical and scientific interest, because it provides a way to get a better understanding of the related biophysical phenomena, allows the development of new techniques for diagnosis and serves as a platform for drug tests. The cardiac electrophysiology may be simulated by solving a partial differential equation coupled to a system of ordinary differential equations describing the electrical behavior of the cell membrane. The numerical solution is, however, computationally demanding because of the fine temporal and spatial sampling required. The demand for real-time high definition 3D graphics has made the modern graphics processing unit (GPU) a highly parallel, multithreaded, many-core processor with tremendous computational horsepower. This makes the use of GPUs a promising alternative to simulate the electrical activity in the heart. The aim of this work is to study the performance of GPUs for solving the equations underlying the electrical activity in a simple cardiac tissue. In tests on 2D cardiac tissues with different cell models it is shown that the GPU implementation runs 20 times faster than a parallel CPU implementation running with 4 threads on a quad-core machine; parts of the code are even accelerated by a factor of 180. Copyright © 2010 John Wiley & Sons, Ltd.
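The per-cell half of the usual operator splitting is easy to sketch: one thread advances the membrane ODEs of one cell by an explicit Euler step. A two-variable FitzHugh-Nagumo-style model stands in here for the paper's cell models; the stimulus current and the coupled diffusion (PDE) stencil kernel are omitted.

    // Explicit Euler step of a FitzHugh-Nagumo-like membrane model:
    //   dv/dt = v - v^3/3 - w,   dw/dt = eps * (v + a - b*w)
    // v = membrane potential, w = recovery variable, one thread per cell.
    __global__ void ode_step(int n_cells, float dt,
                             float* __restrict__ v,
                             float* __restrict__ w) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_cells) return;
        const float a = 0.7f, b = 0.8f, eps = 0.08f;
        float vi = v[i], wi = w[i];
        v[i] = vi + dt * (vi - vi * vi * vi / 3.0f - wi);
        w[i] = wi + dt * eps * (vi + a - b * wi);
    }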

Journal ArticleDOI
TL;DR: This paper presents an SLA brokering mechanism with risk assessment support, which evaluates the probability of SLA failure and a comparison of its capabilities against similar SLA‐based solutions from the literature.
Abstract: Service level agreements (SLAs) are facilitators for widening the commercial uptake of Grid technology. They provide explicit statements of expectation and obligation between service consumers and providers. However, without the ability to assess the probability that an SLA might fail, commercial uptake will be restricted, since neither party will be willing to agree. Therefore, risk assessment mechanisms are critical to increase confidence in Grid technology usage within the commercial sector. This paper presents an SLA brokering mechanism with risk assessment support, which evaluates the probability of SLA failure. WS-Agreement and risk metrics are used to facilitate SLA creation between service consumers and providers within a typical Grid resource usage scenario. An evaluation is conducted to examine risk models, the performance of the broker's implementation as well as a comparison of its capabilities against similar SLA-based solutions from the literature. Copyright © 2011 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper presents a solution for the two issues, including a novel ontology conversion process and a context‐aware semantic similarity model, by considering the factors of both the context of concepts and relations, and the ontology structure.
Abstract: While many researchers have contributed to the field of semantic similarity models so far, we find that most of the models are designed for the semantic network environment. When applying the semantic similarity model within the semantic-rich ontology environment, two issues are observed: (1) most of the models ignore the context of ontology concepts and (2) most of the models ignore the context of relations. Therefore, in this paper, we present a solution for the two issues, including a novel ontology conversion process and a context-aware semantic similarity model, by considering the factors of both the context of concepts and relations, and the ontology structure. Furthermore, in order to evaluate this model, we compare its performance with that of several existing models in a large-scale knowledge base, and the evaluation result preliminarily proves the technical advantage of our model in ontology environments. Conclusions and future work are described in the final section. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The main task of this paper is to provide a framework that shows how the integration of different state-of-the‐art technologies in an energy control system can bring some interesting benefits, such as status management and anomaly prevention, while maintaining the security of the whole system.
Abstract: Energy distribution systems are becoming increasingly widespread in today's society. Among the elements used to monitor and control these systems are SCADA (Supervisory Control and Data Acquisition) systems. In particular, these control systems and their complexities, together with the emerging use of the Internet and wireless technologies, bring new challenges that must be carefully considered. Examples of such challenges are the particular benefits of the integration of those new technologies, and also the effects they may have on the overall SCADA security. The main task of this paper is to provide a framework that shows how the integration of different state-of-the-art technologies in an energy control system, such as wireless sensor networks, mobile ad hoc networks, and the Internet, can bring some interesting benefits, such as status management and anomaly prevention, while maintaining the security of the whole system. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A strategy for efficiently storing and managing CPSocio‐SLN based on the basic operation set is proposed to support efficient query and maintenance, and the study of its dynamicity can help in understanding its basic characteristics.
Abstract: Cyber Physical Socio Semantic Link network (CPSocio-SLN) is a model and method for self-organizing cyber physical socio resources in Cyber Physical Society (CP-Society). This paper views CPSocio-SLN as an evolution process through a series of operations, and investigates its basic operations, completeness, and dynamicity. A strategy for efficiently storing and managing CPSocio-SLN based on the basic operation set is proposed to support efficient query and maintenance. An approach is suggested for simplifying CPSocio-SLN reasoning by estimating the importance of reasoning at multiple levels. The study of the dynamicity can help understand its basic characteristics. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A probabilistic strategy for temporal constraint management is proposed which utilizes a novel probability‐based temporal consistency model and is demonstrated through a case study on an example scientific workflow process in the authors' scientific workflow system.
Abstract: In scientific workflow systems, it is critical to ensure the timely completion of scientific workflows. Therefore, temporal constraints as a type of QoS (Quality of Service) specification are usually required to be managed in scientific workflow systems. Specifically, temporal constraint management includes two basic tasks: setting temporal constraints at workflow build-time and updating temporal constraints at workflow run-time. For constraint setting, the current work mainly adopts user-specified temporal constraints without considering the system performance. Hence, it may result in frequent temporal violations which deteriorate the overall workflow execution effectiveness. Constraint updating, although not well investigated so far, is in fact of great importance to workflow management tasks such as workflow scheduling and exception handling. In this paper, with a systematic analysis of the above issues, we propose a probabilistic strategy for temporal constraint management which utilizes a novel probability-based temporal consistency model. Specifically for constraint setting, a negotiation process between the client and the service provider is designed to support the setting of coarse-grained temporal constraints and then automatically derive the fine-grained temporal constraints; for constraint updating, the probability time deficit/redundancy propagation process is proposed to update run-time fine-grained temporal constraints when workflow execution is either ahead of or behind the schedule. The effectiveness of our strategy is demonstrated through a case study on an example scientific workflow process in our scientific workflow system. Copyright © 2011 John Wiley & Sons, Ltd. (The initial work was published in the Proceedings of 6th International Conference on Business Process Management (BPM2008), Lecture Notes in Computer Science, vol. 5240, pp. 180–195, September 2008 Milan, Italy.)

Journal ArticleDOI
TL;DR: CDx is, at its core, a real‐time benchmark with a single periodic task, which implements an idealized aircraft collision detection algorithm, and can be configured to use different sets of real‐ time features and comes with a number of workloads.
Abstract: Java is becoming a viable platform for real-time computing. There are production and research real-time Java VMs, as well as applications in both the military and civil sectors. Technological advances and increased adoption of real-time Java contrast significantly with the lack of benchmarks. Existing benchmarks are either synthetic micro-benchmarks, or proprietary, making it difficult to independently verify and repeat reported results. This paper presents the CDx benchmark, a family of open-source implementations of the same application that target different real-time virtual machines. CDx is, at its core, a real-time benchmark with a single periodic task, which implements an idealized aircraft collision detection algorithm. The benchmark can be configured to use different sets of real-time features and comes with a number of workloads. It can be run on standard Java virtual machines, on real-time and Safety Critical Java virtual machines, and a C version is provided to compare with native performance. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper provides a detailed analysis of the performance of two existing parallel implementations of the Berger–Rigoutsos algorithm and develops a new parallel implementation of the Berger–Rigoutsos algorithm and a tiled algorithm that exhibits ideal scalability.
Abstract: Block-structured adaptive mesh refinement (BSAMR) is widely used within simulation software because it improves the utilization of computing resources by refining the mesh only where necessary. For BSAMR to scale onto existing petascale and eventually exascale computers all portions of the simulation need to weak scale ideally. Any portions of the simulation that do not will become a bottleneck at larger numbers of cores. The challenge is to design algorithms that will make it possible to avoid these bottlenecks on exascale computers. One step of existing BSAMR algorithms involves determining where to create new patches of refinement. The Berger–Rigoutsos algorithm is commonly used to perform this task. This paper provides a detailed analysis of the performance of two existing parallel implementations of the Berger–Rigoutsos algorithm and develops a new parallel implementation of the Berger–Rigoutsos algorithm and a tiled algorithm that exhibits ideal scalability. The analysis and computational results up to 98 304 cores are used to design performance models which are then used to predict how these algorithms will perform on 100 M cores. Copyright © 2011 John Wiley & Sons, Ltd.
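The idea behind the tiled alternative to Berger–Rigoutsos can be sketched in a few lines: cut the domain into fixed-size tiles and turn every tile containing a flagged (error-marked) cell into a refinement patch, a purely local decision that needs no global communication, which is the source of its ideal scaling. The function below is an illustrative 2D host-side version, not the authors' implementation.

    #include <vector>

    // Return the indices of all tiles that contain at least one flagged
    // cell; each such tile becomes a refinement patch. flags is an
    // nx x ny raster of error flags, tile is the tile edge length.
    std::vector<int> tilesToRefine(const std::vector<bool>& flags,
                                   int nx, int ny, int tile) {
        std::vector<int> patches;
        int tx = (nx + tile - 1) / tile, ty = (ny + tile - 1) / tile;
        for (int t = 0; t < tx * ty; ++t) {
            int x0 = (t % tx) * tile, y0 = (t / tx) * tile;
            bool flagged = false;
            for (int y = y0; y < y0 + tile && y < ny && !flagged; ++y)
                for (int x = x0; x < x0 + tile && x < nx; ++x)
                    if (flags[y * nx + x]) { flagged = true; break; }
            if (flagged) patches.push_back(t);   // independent per tile
        }
        return patches;
    }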

Journal ArticleDOI
TL;DR: This work investigates the performance of the routines in LAPACK and the Successive Band Reduction toolbox for the reduction of a dense matrix to tridiagonal form on general‐purpose multi‐core processors and modifies the code in the SBR toolbox to accelerate the computation.
Abstract: We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric eigenvalue problem, on general-purpose multi-core processors. In response to the advances of hardware accelerators, we also modify the code in the SBR toolbox to accelerate the computation by off-loading a significant part of the operations to a graphics processor (GPU). The performance results illustrate the parallelism and scalability of these algorithms on current high-performance multi-core and many-core architectures. Copyright © 2010 John Wiley & Sons, Ltd.
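For concreteness, the LAPACK routine for this reduction is xSYTRD, which reduces a symmetric matrix to tridiagonal form via Householder transforms. A minimal call through the LAPACKE C interface is sketched below; the SBR toolbox instead performs the reduction in stages through band form, which is not shown.

    #include <vector>
    #include <lapacke.h>

    // Reduce the symmetric matrix A (n x n, column-major, lower triangle
    // referenced) to tridiagonal form T: d receives the diagonal, e the
    // off-diagonal, tau the Householder reflector scalars.
    void tridiagonalize(std::vector<double>& A, int n,
                        std::vector<double>& d,
                        std::vector<double>& e,
                        std::vector<double>& tau) {
        d.resize(n); e.resize(n - 1); tau.resize(n - 1);
        lapack_int info = LAPACKE_dsytrd(LAPACK_COL_MAJOR, 'L', n,
                                         A.data(), n,
                                         d.data(), e.data(), tau.data());
        (void)info;   // 0 on success; negative values flag bad arguments
    }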

Journal ArticleDOI
TL;DR: This paper proposes an approach to automatically discovering and predicting semantic links in a document set based on a model of document semantic link network (SLN), which supports probabilistic relational reasoning; SLNs and the relevant rules automatically evolve; and, it can adapt to the update of the adopted techniques.
Abstract: Knowing semantic links among resources is the basis of realizing machine intelligence over large-scale resources. Discovering semantic links among resources with limited human interference is a challenging issue. This paper proposes an approach to automatically discovering and predicting semantic links in a document set based on a model of document semantic link network (SLN). The approach has the following advantages: it supports probabilistic relational reasoning; SLNs and the relevant rules automatically evolve; and, it can adapt to the update of the adopted techniques. The approach can support cyber space applications, such as documentation recommendation and relational queries, on large documents. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: Using a pseudospectral method, the most widely used method for simulating the geodynamo, computational requirements needed to run simulations in an ‘Earth‐like’ parameter regime are explored theoretically by approximating operation counts, memory requirements and communication costs in the asymptotic limit of large problem size.
Abstract: The problem of understanding how Earth's magnetic field is generated is one of the foremost challenges in modern science. It is believed to be generated by a dynamo process, where the complex motions of an electrically conducting fluid provide the inductive action to sustain the field against the effects of dissipation. Current dynamo simulations, based on the numerical approximation to the governing equations of magnetohydrodynamics, cannot reach the very rapid rotation rates and low viscosities (i.e. low Ekman number) of Earth due to limitations in available computing power. Using a pseudospectral method, the most widely used method for simulating the geodynamo, computational requirements needed to run simulations in an ‘Earth-like’ parameter regime are explored theoretically by approximating operation counts, memory requirements and communication costs in the asymptotic limit of large problem size. Theoretical scalings are tested using numerical calculations. For asymptotically large problems the spherical transform is shown to be the limiting step within the pseudospectral method; memory requirements and communication costs are asymptotically negligible. Another limitation comes from the parallel implementation; however, this is unlikely to be threatened soon and we conclude that the pseudospectral method will remain competitive for the next decade. Extrapolating numerical results based upon the code analysis shows that simulating a problem characterizing the Earth with Ekman number E = 10⁻⁹ would require at least 13 000 days per magnetic diffusion time with 54 000 available processors, a formidable computational challenge. At E = 10⁻⁸ an allocation of around 350 million CPU hours would compute a single diffusion time, many more CPU hours than are available in current supercomputing allocations but potentially reachable in the next decade. Exploration of the 10⁻⁷ ⩽ E ⩽ 10⁻⁶ regime could be performed at the present time using a substantial share of national supercomputing facilities or a dedicated cluster. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper proposes two constrained range search approaches based on the network Voronoi diagram, namely Region Constrained Range (RCR) and k nearest neighbor Constrained Range (kCR), which make the range search query processing more flexible to satisfy different requirements in a complex environment.
Abstract: Range search is one of the most common queries in the spatial databases and geographic information systems (GIS). Most range search processing depends on the length of the distance that expresses the relative position of the objects of interest in the Euclidean space or road networks. But, in reality, the expected result is normally constrained by other factors (e.g. the number of spatial objects, a pre-defined area, and so forth) rather than the distance alone; hence, range search should be comprehensively discussed in various scenarios. In this paper, we propose two constrained range search approaches based on the network Voronoi diagram, namely Region Constrained Range (RCR) and k nearest neighbor Constrained Range (kCR), which make the range search query processing more flexible to satisfy different requirements in a complex environment. The performance of these approaches is analyzed and evaluated to illustrate that both of them can process constrained range search queries very efficiently. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The results demonstrate that using the GPU, an acceleration of an order of magnitude can be achieved on average with both point sampling and bilinear filtering of the elevation map.
Abstract: In this paper, we present the graphics processing unit (GPU)-based parallel implementation of visibility calculation from multiple viewpoints on raster terrain grids. Two levels of parallelism are introduced in the GPU kernels — parallel traversal of visibility rays from a single viewpoint and parallel processing of viewpoints. The obtained visibility maps are combined in parallel using the selected logical operator. A comparison with multi-threaded CPU implementation is performed to establish the expected speed-ups of viewshed construction when the source and destination types are sets of scattered locations, paths, or regions. The results demonstrate that using the GPU, the acceleration of an order of magnitude can be achieved on average with both point sampling and bilinear filtering of the elevation map. Copyright © 2011 John Wiley & Sons, Ltd.
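The per-ray traversal that the first level of parallelism refers to can be sketched as below: each thread walks one ray outward from the viewpoint, keeping the running maximum slope, and marks a cell visible when its slope is not below that maximum. Point sampling of the elevation map is assumed, the visibility buffer is assumed zero-initialized, and all names are illustrative.

    // One thread per ray: march outward from the viewpoint (vx, vy) at
    // observer height vh, marking visible cells in a point-sampled raster.
    __global__ void viewshed_rays(const float* __restrict__ elev, int nx, int ny,
                                  int vx, int vy, float vh,
                                  unsigned char* __restrict__ visible,
                                  int n_rays) {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        if (r >= n_rays) return;
        float ang = 2.0f * 3.14159265f * r / n_rays;
        float dx = cosf(ang), dy = sinf(ang);
        float eye = elev[vy * nx + vx] + vh;
        float maxSlope = -1e30f;
        for (float t = 1.0f; ; t += 1.0f) {          // step about one cell
            int x = (int)(vx + t * dx), y = (int)(vy + t * dy);
            if (x < 0 || y < 0 || x >= nx || y >= ny) break;
            float slope = (elev[y * nx + x] - eye) / t;
            if (slope >= maxSlope) {                 // not occluded so far
                visible[y * nx + x] = 1;
                maxSlope = slope;
            }
        }
    }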