
Showing papers presented at "Parallel and Distributed Processing Techniques and Applications" in 2012


Proceedings Article
01 Jan 2012
TL;DR: This paper presents a scheduling algorithm for VLIW architectures with chained functional units, and argues that the high parametrization of the compiler makes it, together with the simulator, useful for hardware/software co-design.
Abstract: In this paper, we present a scheduling algorithm for VLIW architectures with chained functional units. We show how our algorithm can help speed up programs at the instruction level for an architecture called REPLICA, a configurable emulated shared memory (CESM) architecture whose computation model is based on the PRAM model. Since our LLVM-based compiler is parameterizable in the number of different functional units, the number of read and write ports to the register file, etc., we can generate code for different REPLICA architectures that have different functional unit configurations. We show, for a set of different configurations, how our implementation can produce high-quality code, and we argue that the high parametrization of the compiler makes it, together with the simulator, useful for hardware/software co-design.

5 citations
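
To make the idea of scheduling for chained functional units concrete, here is a minimal, hypothetical greedy list scheduler in Python: an operation may be placed in the same long-instruction cycle as its producer (chaining) when a unit of the right type is still free. All names are illustrative; the paper's REPLICA/LLVM compiler additionally models register-file ports and other constraints.

    # Hypothetical greedy list scheduler for a VLIW machine with chained
    # functional units (FUs). Input must be a DAG, and every fu_type used
    # must have capacity in fus_per_cycle, or scheduling cannot finish.
    from collections import defaultdict

    def schedule(ops, fus_per_cycle):
        """ops: list of (name, fu_type, deps); fus_per_cycle: fu_type -> slots."""
        placed = {}                      # op name -> cycle it was scheduled in
        remaining = list(ops)
        cycle = -1
        while remaining:
            cycle += 1
            used = defaultdict(int)      # fu_type -> slots consumed this cycle
            progress = True
            while progress:              # keep chaining ready ops into this cycle
                progress = False
                for op in list(remaining):
                    name, fu, deps = op
                    # Chaining: a dependence placed in THIS cycle is legal,
                    # because a later slot in the long instruction can read
                    # the chained result of an earlier slot.
                    if all(d in placed and placed[d] <= cycle for d in deps) \
                            and used[fu] < fus_per_cycle.get(fu, 0):
                        placed[name] = cycle
                        used[fu] += 1
                        remaining.remove(op)
                        progress = True
        return placed

    ops = [("a", "alu", []), ("b", "alu", []), ("c", "alu", ["a", "b"])]
    print(schedule(ops, {"alu": 2}))   # c waits for cycle 1; with 3 ALUs it chains

With two ALUs, op "c" is pushed to the next cycle; raising the ALU count to three lets it chain into cycle 0 behind its producers, which is the kind of configuration-dependent trade-off the parameterized compiler explores.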



Proceedings Article
01 Jan 2012
TL;DR: 2012 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, July 16-19, 2012
Abstract: 2012 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, July 16-19, 2012

2 citations


Proceedings Article
01 Jan 2012
TL;DR: The main concern behind this work is to provide a more detailed study of how the GA used in the G-Ensemble scheme can be tuned depending on the available computational resources in operational scenarios.
Abstract: The process of weather forecasting produced by numerical weather prediction (NWP) models is complex and not always accurate. Moreover, it is by its very nature a process that has to deal with uncertainties. In previous work, a new weather prediction scheme, Genetic Ensemble (G-Ensemble), was presented, which uses evolutionary computing methods. In particular, it uses Genetic Algorithms (GA) to find the most timely 'optimal' values of the model closure parameters that appear in the physical parametrization schemes coupled with NWP models. The presented scheme showed a significant improvement in weather prediction quality; moreover, the waiting time for an enhanced weather prediction result was reduced by executing a parallel G-Ensemble scheme over HPC platforms. In this work, we test the same scheme with different GA configurations regarding its crossover type and ratio, and by varying its initial population size, in order to obtain better predictions. The main concern behind this work is to provide a more detailed study of how the GA used in the G-Ensemble scheme can be tuned depending on the available computational resources in operational scenarios. Finally, experimental results are discussed for a weather prediction case using historical data from a well-known weather catastrophe: Hurricane Katrina, which struck the Gulf of Mexico in 2005. The obtained results show a significant enhancement in weather prediction quality.

1 citation
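
As a rough illustration of the tuning knobs discussed above (crossover type, crossover ratio, initial population size), the following toy GA in Python evolves a small vector of 'closure parameters' against a stand-in fitness function. It is a sketch under assumed names only; in G-Ensemble the fitness would be the forecast error of an NWP run against observations.

    import random

    def fitness(params):
        # Stand-in objective: distance to a fictional 'good' parameter set.
        target = [0.3, 0.7, 0.5]
        return -sum((p - t) ** 2 for p, t in zip(params, target))

    def crossover(a, b, kind="uniform", ratio=0.5):
        if kind == "one_point":
            cut = random.randrange(1, len(a))
            return a[:cut] + b[cut:]
        # Uniform crossover: take each gene from parent a with probability ratio.
        return [x if random.random() < ratio else y for x, y in zip(a, b)]

    def run_ga(pop_size=20, generations=50, kind="uniform", ratio=0.5):
        pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_size // 2]          # truncation selection
            children = [crossover(random.choice(parents), random.choice(parents),
                                  kind, ratio)
                        for _ in range(pop_size - len(parents))]
            for child in children:                 # small Gaussian mutation
                i = random.randrange(len(child))
                child[i] += random.gauss(0, 0.05)
            pop = parents + children
        return max(pop, key=fitness)

    print(run_ga(pop_size=40, kind="one_point"))

Varying pop_size, kind, and ratio here mirrors the study's question of how much GA quality can be bought per unit of compute in an operational setting.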


Proceedings Article
01 Jul 2012
TL;DR: A generic algorithmic farm skeleton is presented which is able to move worker tasks between processors in a heterogeneous architecture at runtime, guided by a simple dynamic load model, and is suggested to effectively compensate for unpredictable load variations.
Abstract: Demand for multi-processor resources invariably outstrips supply, and users must often share some common provision. Where batch-based, whole-processor allocation proves inflexible, user programs must compete at runtime for the same resources, so the load is changeable and unpredictable. We are exploring a mechanism that balances the runtime load by moving computations between processors to optimize resource use. In this paper, we present a generic algorithmic farm skeleton which is able to move worker tasks between processors in a heterogeneous architecture at runtime, guided by a simple dynamic load model. Our experiments suggest that this mechanism can effectively compensate for unpredictable load variations.

1 citation
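
The following schematic Python sketch conveys the load-model idea: a farmer assigns each task to the worker whose estimated effective load (pending work over remaining capacity) is lowest while external, competing load fluctuates. It is an assumption-laden toy, not the authors' skeleton, which actually migrates running worker tasks between heterogeneous processors.

    import random

    class Worker:
        def __init__(self, name, speed):
            self.name, self.speed = name, speed    # speed: relative capacity
            self.external_load = 0.0               # load from competing users
            self.queue = []

        def effective_load(self):
            # Pending work divided by the capacity left after external load.
            return len(self.queue) / max(self.speed - self.external_load, 0.1)

    def farm(tasks, workers):
        for task in tasks:
            for w in workers:                      # competing demand drifts
                w.external_load = max(0.0, w.external_load + random.gauss(0, 0.2))
            target = min(workers, key=lambda w: w.effective_load())
            target.queue.append(task)
        return {w.name: len(w.queue) for w in workers}

    workers = [Worker("fast", 4.0), Worker("slow", 1.0)]
    print(farm(range(100), workers))   # most tasks should land on "fast"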


Proceedings Article
01 Jan 2012
TL;DR: Multicore processors are here to stay, fulfill Moore's law and might very well revolutionize the computer industry, but the industry is now in a transitional period before the new programming models, ...
Abstract: Multicore processors are here to stay, fulfill Moore's law and might very well revolutionize the computer industry. However, we are now in a transitional period before the new programming models, ...

Proceedings Article
16 Jul 2012
TL;DR: This paper considers the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework and highlights the influence of the pivoting rule, neighborhood size and parallelization granularity on the obtained level of performance.
Abstract: The purpose of this paper is to propose effective parallelization strategies for Local Search algorithms on Graphics Processing Units (GPU). We consider the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework. Three resulting approaches are evaluated and compared on both speedup and solution quality on a state-of-the-art Fermi GPU architecture. Solving instances of the Travelling Salesman Problem ranging from 100 to 3038 cities, we report speedups of up to 8.51 with solution quality similar to the best known sequential implementations and of up to 45.40 with a variable increase in tour length. The proposed experimental study highlights the influence of the pivoting rule, neighborhood size and parallelization granularity on the obtained level of performance.
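
The pivoting rule the study varies can be shown in a few lines. The Python sketch below uses the simpler 2-opt neighborhood rather than the paper's 3-opt, and runs sequentially rather than on a GPU, purely to contrast first-improvement against best-improvement pivoting; all names are illustrative.

    import math, random

    def move_delta(tour, dist, i, j):
        """Cost change of reversing tour[i+1 .. j] (a 2-opt move)."""
        a, b = tour[i], tour[i + 1]
        c, d = tour[j], tour[(j + 1) % len(tour)]
        return dist(a, c) + dist(b, d) - dist(a, b) - dist(c, d)

    def local_search(tour, dist, pivot="best"):
        improved = True
        while improved:
            improved = False
            best = (0.0, None)
            for i in range(len(tour) - 2):
                for j in range(i + 2, len(tour) - (i == 0)):
                    delta = move_delta(tour, dist, i, j)
                    if delta < best[0] - 1e-12:
                        best = (delta, (i, j))
                        if pivot == "first":       # accept the first improvement
                            break
                if pivot == "first" and best[1]:
                    break
            if best[1]:                            # apply the chosen move
                i, j = best[1]
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                improved = True
        return tour

    pts = [(random.random(), random.random()) for _ in range(60)]
    dist = lambda u, v: math.dist(pts[u], pts[v])
    print(local_search(list(range(60)), dist, pivot="first")[:10])

First-improvement applies many cheap moves and suits fine-grained parallel evaluation; best-improvement scans the whole neighborhood per step, which is the cost/quality trade-off the paper measures on the GPU.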

Proceedings Article
01 Jan 2012
TL;DR: The proposed file composition technique is evaluated using a climate simulation code called SCALE; the results show that the elapsed time of file output is approximately 30% shorter than with the original POSIX I/O functions.
Abstract: One of the scalability issues in parallel applications in which each process creates its own file and writes data to it is the scalability of file management, owing to the increasing number of files. To mitigate this issue, a new file aggregation mechanism, called the file composition technique, is proposed. Unlike existing aggregation mechanisms, the file composition technique aggregates multiple files created by parallel processes into a single shared file without changing the code of the file I/O operations. In contrast with the metadata operations in existing aggregation mechanisms, the metadata operations are distributed to each process in order to achieve scalability. The proposed file composition technique is evaluated using a climate simulation code called SCALE. The results show that the elapsed time of file output is approximately 30% shorter than with the original POSIX I/O functions.
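
As a sketch of the composition idea, the snippet below uses mpi4py's MPI-IO so that each rank writes what would have been its private file into its own region of one shared file, computing its offset locally from allgathered sizes so that no central metadata service is involved. This is an assumption-labeled illustration, not the paper's implementation, which composes files without changing the application's POSIX I/O calls.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    payload = ("data from rank %d\n" % rank).encode()  # stand-in file body

    # Exchange sizes so every rank computes its own offset locally --
    # metadata handling is distributed, with no central server.
    sizes = comm.allgather(len(payload))
    offset = sum(sizes[:rank])

    fh = MPI.File.Open(comm, "composed.dat",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    fh.Write_at(offset, payload)       # independent write into this rank's region
    fh.Close()

Run with, e.g., mpiexec -n 4 python compose.py; composed.dat then holds every rank's payload back to back in a single shared file.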