
Showing papers presented at "Parallel and Distributed Processing Techniques and Applications" in 2012


Proceedings Article
01 Jan 2012
TL;DR: This paper presents a scheduling algorithm for VLIW architectures with chained functional units, and argues that the high parametrization of the compiler makes it, together with the simulator, useful for hardware/software co-design.
Abstract: In this paper, we present a scheduling algorithm for VLIW architectures with chained functional units. We show how our algorithm can help speed up programs at the instruction level for an architecture called REPLICA, a configurable emulated shared memory (CESM) architecture whose computation model is based on the PRAM model. Since our LLVM-based compiler is parameterizable in the number of different functional units, the number of read and write ports to the register file, etc., we can generate code for different REPLICA architectures that have different functional unit configurations. We show, for a set of different configurations, how our implementation can produce high-quality code, and we argue that the high parametrization of the compiler makes it, together with the simulator, useful for hardware/software co-design.

5 citations
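
To make the idea of scheduling for chained functional units concrete, here is a minimal, hypothetical greedy list scheduler in Python: an operation may be placed in the same long-instruction cycle as its producer (chaining) when a unit of the right type is still free. All names are illustrative; the paper's REPLICA/LLVM compiler additionally models register-file ports and other constraints.

    # Hypothetical greedy list scheduler for a VLIW machine with chained
    # functional units (FUs). Input must be a DAG, and every fu_type used
    # must have capacity in fus_per_cycle, or scheduling cannot finish.
    from collections import defaultdict

    def schedule(ops, fus_per_cycle):
        """ops: list of (name, fu_type, deps); fus_per_cycle: fu_type -> slots."""
        placed = {}                      # op name -> cycle it was scheduled in
        remaining = list(ops)
        cycle = -1
        while remaining:
            cycle += 1
            used = defaultdict(int)      # fu_type -> slots consumed this cycle
            progress = True
            while progress:              # keep chaining ready ops into this cycle
                progress = False
                for op in list(remaining):
                    name, fu, deps = op
                    # Chaining: a dependence placed in THIS cycle is legal,
                    # because a later slot in the long instruction can read
                    # the chained result of an earlier slot.
                    if all(d in placed and placed[d] <= cycle for d in deps) \
                            and used[fu] < fus_per_cycle.get(fu, 0):
                        placed[name] = cycle
                        used[fu] += 1
                        remaining.remove(op)
                        progress = True
        return placed

    ops = [("a", "alu", []), ("b", "alu", []), ("c", "alu", ["a", "b"])]
    print(schedule(ops, {"alu": 2}))   # c waits for cycle 1; with 3 ALUs it chains

With two ALUs, op "c" is pushed to the next cycle; raising the ALU count to three lets it chain into cycle 0 behind its producers, which is the kind of configuration-dependent trade-off the parameterized compiler explores.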



Proceedings Article
01 Jan 2012
TL;DR: 2012 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, July 16-19, 2012
Abstract: 2012 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, July 16-19, 2012

2 citations


Proceedings Article
01 Jan 2012
TL;DR: The main concern behind this work is to provide a more detailed study of how the GA used in the G-Ensemble scheme can be tuned depending on the available computational resources in operational scenarios.
Abstract: The process of weather forecasting produced by numerical weather prediction (NWP) models is complex and not always accurate. Moreover, it is by its very nature a process that has to deal with uncertainties. In previous work, a new weather prediction scheme, Genetic Ensemble (G-Ensemble), was presented, which uses evolutionary computing methods. In particular, it uses Genetic Algorithms (GA) to find the most timely 'optimal' values of the model closure parameters that appear in the physical parametrization schemes coupled with NWP models. The presented scheme showed a significant improvement in weather prediction quality; moreover, the waiting time for an enhanced weather prediction result was reduced by executing a parallel G-Ensemble scheme over HPC platforms. In this work, we test the same scheme with different GA configurations regarding its crossover type and ratio, and by varying its initial population size, in order to obtain better predictions. The main concern behind this work is to provide a more detailed study of how the GA used in the G-Ensemble scheme can be tuned depending on the available computational resources in operational scenarios. Finally, experimental results are discussed for a weather prediction case using historical data from a well-known weather catastrophe: Hurricane Katrina, which struck the Gulf of Mexico in 2005. The obtained results show a significant enhancement in weather prediction quality.

1 citation
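
As a rough illustration of the tuning knobs discussed above (crossover type, crossover ratio, initial population size), the following toy GA in Python evolves a small vector of 'closure parameters' against a stand-in fitness function. It is a sketch under assumed names only; in G-Ensemble the fitness would be the forecast error of an NWP run against observations.

    import random

    def fitness(params):
        # Stand-in objective: distance to a fictional 'good' parameter set.
        target = [0.3, 0.7, 0.5]
        return -sum((p - t) ** 2 for p, t in zip(params, target))

    def crossover(a, b, kind="uniform", ratio=0.5):
        if kind == "one_point":
            cut = random.randrange(1, len(a))
            return a[:cut] + b[cut:]
        # Uniform crossover: take each gene from parent a with probability ratio.
        return [x if random.random() < ratio else y for x, y in zip(a, b)]

    def run_ga(pop_size=20, generations=50, kind="uniform", ratio=0.5):
        pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_size // 2]          # truncation selection
            children = [crossover(random.choice(parents), random.choice(parents),
                                  kind, ratio)
                        for _ in range(pop_size - len(parents))]
            for child in children:                 # small Gaussian mutation
                i = random.randrange(len(child))
                child[i] += random.gauss(0, 0.05)
            pop = parents + children
        return max(pop, key=fitness)

    print(run_ga(pop_size=40, kind="one_point"))

Varying pop_size, kind, and ratio here mirrors the study's question of how much GA quality can be bought per unit of compute in an operational setting.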


Proceedings Article
01 Jul 2012
TL;DR: A generic algorithmic farm skeleton is presented which is able to move worker tasks between processors in a heterogeneous architecture at runtime, guided by a simple dynamic load model, and is suggested to effectively compensate for unpredictable load variations.
Abstract: Demand for multi-processor resources invariably outstrips supply, and users must often share some common provision. Where batch-based, whole-processor allocation proves inflexible, user programs must compete at runtime for the same resources, so the load is changeable and unpredictable. We are exploring a mechanism that balances the runtime load by moving computations between processors to optimize resource use. In this paper, we present a generic algorithmic farm skeleton which is able to move worker tasks between processors in a heterogeneous architecture at runtime, guided by a simple dynamic load model. Our experiments suggest that this mechanism can effectively compensate for unpredictable load variations.

1 citation
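
The following schematic Python sketch conveys the load-model idea: a farmer assigns each task to the worker whose estimated effective load (pending work over remaining capacity) is lowest while external, competing load fluctuates. It is an assumption-laden toy, not the authors' skeleton, which actually migrates running worker tasks between heterogeneous processors.

    import random

    class Worker:
        def __init__(self, name, speed):
            self.name, self.speed = name, speed    # speed: relative capacity
            self.external_load = 0.0               # load from competing users
            self.queue = []

        def effective_load(self):
            # Pending work divided by the capacity left after external load.
            return len(self.queue) / max(self.speed - self.external_load, 0.1)

    def farm(tasks, workers):
        for task in tasks:
            for w in workers:                      # competing demand drifts
                w.external_load = max(0.0, w.external_load + random.gauss(0, 0.2))
            target = min(workers, key=lambda w: w.effective_load())
            target.queue.append(task)
        return {w.name: len(w.queue) for w in workers}

    workers = [Worker("fast", 4.0), Worker("slow", 1.0)]
    print(farm(range(100), workers))   # most tasks should land on "fast"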


Proceedings Article
01 Jan 2012
TL;DR: Multicore processors are here to stay, fulfill Moore's law and might very well revolutionize the computer industry, but the industry is now in a transitional period before the new programming models, ...
Abstract: Multicore processors are here to stay, fulfill Moore's law and might very well revolutionize the computer industry. However, we are now in a transitional period before the new programming models, ...

Proceedings Article
16 Jul 2012
TL;DR: This paper considers the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework and highlights the influence of the pivoting rule, neighborhood size and parallelization granularity on the obtained level of performance.
Abstract: The purpose of this paper is to propose effective parallelization strategies for Local Search algorithms on Graphics Processing Units (GPU). We consider the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework. Three resulting approaches are evaluated and compared on both speedup and solution quality on a state-of-the-art Fermi GPU architecture. Solving instances of the Travelling Salesman Problem ranging from 100 to 3038 cities, we report speedups of up to 8.51 with solution quality similar to the best known sequential implementations and of up to 45.40 with a variable increase in tour length. The proposed experimental study highlights the influence of the pivoting rule, neighborhood size and parallelization granularity on the obtained level of performance.
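
The pivoting rule the study varies can be shown in a few lines. The Python sketch below uses the simpler 2-opt neighborhood rather than the paper's 3-opt, and runs sequentially rather than on a GPU, purely to contrast first-improvement against best-improvement pivoting; all names are illustrative.

    import math, random

    def move_delta(tour, dist, i, j):
        """Cost change of reversing tour[i+1 .. j] (a 2-opt move)."""
        a, b = tour[i], tour[i + 1]
        c, d = tour[j], tour[(j + 1) % len(tour)]
        return dist(a, c) + dist(b, d) - dist(a, b) - dist(c, d)

    def local_search(tour, dist, pivot="best"):
        improved = True
        while improved:
            improved = False
            best = (0.0, None)
            for i in range(len(tour) - 2):
                for j in range(i + 2, len(tour) - (i == 0)):
                    delta = move_delta(tour, dist, i, j)
                    if delta < best[0] - 1e-12:
                        best = (delta, (i, j))
                        if pivot == "first":       # accept the first improvement
                            break
                if pivot == "first" and best[1]:
                    break
            if best[1]:                            # apply the chosen move
                i, j = best[1]
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                improved = True
        return tour

    pts = [(random.random(), random.random()) for _ in range(60)]
    dist = lambda u, v: math.dist(pts[u], pts[v])
    print(local_search(list(range(60)), dist, pivot="first")[:10])

First-improvement applies many cheap moves and suits fine-grained parallel evaluation; best-improvement scans the whole neighborhood per step, which is the cost/quality trade-off the paper measures on the GPU.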

Proceedings Article
01 Jan 2012
TL;DR: The proposed file composition technique is evaluated using a climate simulation code called SCALE; the results show that the elapsed time of file output is approximately 30% shorter than with the original POSIX I/O functions.
Abstract: One of the scalability issues in parallel applications in which each process creates its own file and writes data to it is the scalability of file management, owing to the increasing number of files. To mitigate this issue, a new file aggregation mechanism, called the file composition technique, is proposed. Unlike existing aggregation mechanisms, the file composition technique aggregates multiple files created by parallel processes into a single shared file without changing the code of the file I/O operations. In contrast with the metadata operations in existing aggregation mechanisms, the metadata operations are distributed to each process in order to achieve scalability. The proposed file composition technique is evaluated using a climate simulation code called SCALE. The results show that the elapsed time of file output is approximately 30% shorter than with the original POSIX I/O functions.
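
As a sketch of the composition idea, the snippet below uses mpi4py's MPI-IO so that each rank writes what would have been its private file into its own region of one shared file, computing its offset locally from allgathered sizes so that no central metadata service is involved. This is an assumption-labeled illustration, not the paper's implementation, which composes files without changing the application's POSIX I/O calls.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    payload = ("data from rank %d\n" % rank).encode()  # stand-in file body

    # Exchange sizes so every rank computes its own offset locally --
    # metadata handling is distributed, with no central server.
    sizes = comm.allgather(len(payload))
    offset = sum(sizes[:rank])

    fh = MPI.File.Open(comm, "composed.dat",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    fh.Write_at(offset, payload)       # independent write into this rank's region
    fh.Close()

Run with, e.g., mpiexec -n 4 python compose.py; composed.dat then holds every rank's payload back to back in a single shared file.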