
Showing papers presented at "Parallel and Distributed Processing Techniques and Applications in 2013"


Proceedings Article
01 Jan 2013
TL;DR: This work presents an extension of SkePU for GPU clusters that requires no modification of the SkePU application source code, and shows the benefit of lazy memory copying in terms of the speedup gained for one level of Strassen's algorithm and for a synthetic matrix sum application.
Abstract: SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. SkePU provides programmability, portability and even performance portability, but until now applications written using SkePU could only run on a single multi-GPU node. We present an extension of SkePU for GPU clusters that requires no modification of the SkePU application source code. With our prototype implementation, we performed two experiments. The first demonstrates scalability with regular algorithms for N-body simulation and electric field calculation over multiple GPU nodes. The results of the second experiment show the benefit of lazy memory copying in terms of the speedup gained for one level of Strassen's algorithm and for a synthetic matrix sum application.
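
The smart-container idea is easiest to see in code. Below is a minimal, self-contained C++ sketch of a map skeleton over a container that copies data to the device only when the device copy is stale; the names (LazyVector, map_skeleton) and the host-side "device" are illustrative stand-ins, not SkePU's actual API.

```cpp
// Hypothetical sketch of a data-parallel "map" skeleton with lazy copying.
// LazyVector and map_skeleton are illustrative, not SkePU's real interface.
#include <cstddef>
#include <iostream>
#include <vector>

// A container that tracks whether its device copy is up to date, so a
// transfer happens only when the data is actually stale on the device.
template <typename T>
class LazyVector {
public:
    explicit LazyVector(std::size_t n) : host_(n), device_valid_(false) {}

    // Called before a device-side skeleton runs: copy only if needed.
    void ensure_on_device() {
        if (!device_valid_) {
            // (a real backend would call cudaMemcpy / clEnqueueWriteBuffer)
            std::cout << "copying " << host_.size() << " elements to device\n";
            device_valid_ = true;
        }
    }

    // A host-side write invalidates the device copy.
    T& operator[](std::size_t i) { device_valid_ = false; return host_[i]; }
    const T& at(std::size_t i) const { return host_[i]; } // read-only access
    std::size_t size() const { return host_.size(); }

private:
    std::vector<T> host_;
    bool device_valid_; // true if the device copy matches the host data
};

// A binary "map" skeleton: applies f element-wise. A real multi-GPU backend
// would launch a kernel; here the loop runs on the host for illustration.
template <typename T, typename F>
void map_skeleton(LazyVector<T>& out, LazyVector<T>& a, LazyVector<T>& b, F f) {
    a.ensure_on_device();
    b.ensure_on_device();
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = f(a.at(i), b.at(i));
}

int main() {
    LazyVector<float> x(4), y(4), sum(4);
    for (std::size_t i = 0; i < 4; ++i) { x[i] = float(i); y[i] = 2.0f * i; }
    map_skeleton(sum, x, y, [](float u, float v) { return u + v; });
    map_skeleton(sum, x, y, [](float u, float v) { return u + v; }); // no re-copy
    std::cout << sum.at(3) << "\n"; // prints 9
}
```

The second skeleton call triggers no transfer because neither input was written between calls; this is the redundant-communication saving the abstract attributes to lazy memory copying.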

14 citations


Proceedings ArticleDOI
22 Jul 2013
Abstract: This research has been supported by MICINN-Spain under contracts TIN2011-28689-C02-01, TIN2012-38876-C02-01 and the Generalitat of Catalunya (2009-SGR-1434).

8 citations


Proceedings Article
09 Apr 2013
TL;DR: A benchmark-driven model is developed for a simple shallow water code on a Cray XE6 system, to explore how deployment choices such as domain decomposition and core affinity affect performance.
Abstract: The complexity of current and emerging architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven model is developed for a simple shallow water code on a Cray XE6 system, to explore how deployment choices such as domain decomposition and core affinity affect performance. The resource sharing present in modern multi-core architectures adds various levels of heterogeneity to the system. Shared resources often include cache, memory, network controllers and in some cases floating point units (as in the AMD Bulldozer), which means that access time depends on the mapping of application tasks and on each core's location within the system. Heterogeneity increases further with the use of hardware accelerators such as GPUs and the Intel Xeon Phi, where many specialist cores are attached to general-purpose cores. This trend towards shared resources and non-uniform cores is expected to continue into the exascale era. The complexity of these systems means that various runtime scenarios are possible, and it has been found that under-populating nodes, altering the domain decomposition and using non-standard task-to-core mappings can dramatically alter performance. Finding this out, however, is often a process of trial and error. To better inform this process, a performance model was developed for a simple regular grid-based kernel code, shallow. The code comprises two distinct types of work: loop-based array updates and nearest-neighbour halo exchanges. Separate performance models were developed for each part, both based on a similar methodology. Application-specific benchmarks were run to measure performance for different problem sizes under different execution scenarios. These results were then fed into a performance model that derives resource usage for a given deployment scenario, interpolating between results as necessary.
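
To make the two-part model concrete, here is a hedged C++ sketch: per-step time is predicted as the sum of a loop-update term and a halo-exchange term, each linearly interpolated from benchmark samples. The sample values and the piecewise-linear form are assumptions for illustration, not the paper's measured data or exact methodology.

```cpp
// Illustrative two-part performance model: predicted runtime =
// array-update time + halo-exchange time, each interpolated from
// application-specific benchmark measurements (values here are made up).
#include <algorithm>
#include <cstdio>
#include <vector>

struct Sample { double size; double seconds; }; // one benchmark point

// Piecewise-linear interpolation between measured benchmark points,
// clamped at the ends of the measured range.
double interp(const std::vector<Sample>& s, double size) {
    if (size <= s.front().size) return s.front().seconds;
    if (size >= s.back().size)  return s.back().seconds;
    auto hi = std::lower_bound(s.begin(), s.end(), size,
        [](const Sample& a, double v) { return a.size < v; });
    auto lo = hi - 1;
    double t = (size - lo->size) / (hi->size - lo->size);
    return lo->seconds + t * (hi->seconds - lo->seconds);
}

int main() {
    // Hypothetical measurements for one deployment scenario.
    std::vector<Sample> update = {{1e4, 0.02}, {1e5, 0.21}, {1e6, 2.3}};
    std::vector<Sample> halo   = {{1e4, 0.01}, {1e5, 0.04}, {1e6, 0.15}};

    double local = 2.5e5; // local subdomain size after decomposition
    double predicted = interp(update, local) + interp(halo, local);
    std::printf("predicted time per step: %.3f s\n", predicted);
}
```

Changing the decomposition changes the local subdomain size and halo length, so re-evaluating the model for each candidate deployment replaces the trial-and-error process the abstract describes.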

2 citations


Proceedings Article
01 Jan 2013

2 citations


Proceedings Article
22 Jul 2013
TL;DR: This paper addresses issues in application resilience, i.e., fault-tolerance to algorithmic errors and to resource allocation failures, and overviews a platform used to deploy, execute, monitor, restart and resume distributed applications on grids and cloud infrastructures in case of unexpected behavior.
Abstract: Distributed computing infrastructures such as grids and clouds support system and network fault-tolerance. They transparently repair and prevent communication and system software errors. They also allow duplication and migration of jobs and data to prevent hardware failures. However, only limited work has been done so far on application resilience, i.e., the ability to resume normal execution after errors and abnormal executions in distributed environments. This paper addresses issues in application resilience, i.e., fault-tolerance to algorithmic errors and to resource allocation failures. It presents solutions for error detection and management, and overviews a platform used to deploy, execute, monitor, restart and resume distributed applications on grid and cloud infrastructures in case of unexpected behavior.
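
As a rough illustration of the restart-and-resume behaviour such a platform provides, the C++ sketch below re-launches a failed job from its last checkpoint a bounded number of times. The ./job command and --resume flag are placeholders, not the paper's platform; a real deployment would also redeploy the task on another grid or cloud node.

```cpp
// Minimal sketch (assumed, not the paper's platform) of monitor/restart
// logic: run a job, detect abnormal exit, and resume it a bounded number
// of times before giving up.
#include <cstdio>
#include <cstdlib>

int main() {
    const int max_restarts = 3;
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        // "./job" and "--resume" are placeholder names for illustration.
        int status = std::system(attempt == 0 ? "./job" : "./job --resume");
        if (status == 0) {
            std::puts("job finished normally");
            return 0;
        }
        std::printf("abnormal exit (status %d), restarting...\n", status);
    }
    std::puts("giving up after repeated failures");
    return 1;
}
```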

Proceedings Article
22 Jul 2013
TL;DR: A way to emulate a precise CPU frequency through DVFS management in virtualized environments is proposed, implemented, and evaluated in the Xen hypervisor.
Abstract: Nowadays, virtualization is present in almost all computing infrastructures. Thanks to VM migration and server consolidation, virtualization helps in reducing power consumption in distributed environments. In addition, Dynamic Voltage and Frequency Scaling (DVFS) allows servers to dynamically adjust the processor frequency (according to the CPU load) so as to consume less energy. We observe that while DVFS is widely used, it still wastes energy. By default, the ondemand governor scales the processor frequency up or down according to the current load and predefined up and down thresholds. However, DVFS scaling policies can only select among the advertised processor frequencies, which form a discrete set. The frequency chosen for a given load is therefore often higher than necessary, which wastes energy. In this paper, we propose a way to emulate a precise CPU frequency through DVFS management in virtualized environments. We implemented and evaluated our prototype in the Xen hypervisor.
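
One way to emulate a frequency between two advertised ones is to time-multiplex the two neighbouring frequencies so that their time-weighted average matches the target. The sketch below shows only that arithmetic; the paper's actual mechanism lives inside Xen's DVFS management, and the period and frequency values here are assumptions.

```cpp
// Sketch of the arithmetic behind emulating a precise average frequency:
// alternate between the two neighbouring advertised frequencies so that
// the time-weighted mean equals the target frequency.
#include <cstdio>

int main() {
    double f_low  = 1.6e9;  // advertised frequency below the target (Hz)
    double f_high = 2.0e9;  // advertised frequency above the target (Hz)
    double f_goal = 1.7e9;  // precise frequency required by the load (Hz)

    // Fraction of each period spent at f_high so that
    // alpha * f_high + (1 - alpha) * f_low == f_goal.
    double alpha = (f_goal - f_low) / (f_high - f_low);

    double period_ms = 100.0; // assumed switching period
    std::printf("%.1f ms at %.2f GHz, %.1f ms at %.2f GHz per period\n",
                alpha * period_ms, f_high / 1e9,
                (1.0 - alpha) * period_ms, f_low / 1e9);
}
```

With these assumed values the split is 25 ms at 2.0 GHz and 75 ms at 1.6 GHz per 100 ms period, averaging the requested 1.7 GHz instead of rounding up to 2.0 GHz and wasting energy.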