
Showing papers by "Jesús Labarta" published in 1996


Book ChapterDOI
26 Aug 1996
TL;DR: An environment whose aim is to aid in the development and tuning of message passing applications before actually running them in a real system with a large number of processors is described.
Abstract: This paper describes an environment whose aim is to aid in the development and tuning of message passing applications before actually running them in a real system with a large number of processors. Our objective is not to eliminate tests on real machines but to be able to focus them in a more selective way and thereby minimize their number. The environment presented in this paper consists of three closely integrated tools: an instrumented communication library, a trace driven simulator (Dimemas) and a visualization/analysis tool (Paraver).
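For intuition, here is a minimal sketch of the trace-driven idea behind a simulator like Dimemas: replay a recorded message trace against a simple latency/bandwidth network model to predict per-process completion times. The trace record format, the model constants, and the `replay` function below are illustrative assumptions, not the actual Dimemas input format or algorithm.

```python
# Hypothetical trace record: (rank, op, peer, bytes, cpu_time).
# Communication time is predicted with a simple linear
# latency/bandwidth model, in the spirit of trace-driven simulation.
LATENCY = 25e-6      # assumed network latency (s)
BANDWIDTH = 10e6     # assumed bandwidth (bytes/s)

def replay(trace, nprocs):
    """Predict per-process completion times from a message trace."""
    clock = [0.0] * nprocs          # local clock of each process
    pending = {}                    # (src, dst) -> message arrival time
    # The trace is assumed globally ordered, so a send precedes its recv.
    for rank, op, peer, nbytes, cpu in trace:
        clock[rank] += cpu          # account for local computation
        if op == "send":
            pending[(rank, peer)] = clock[rank] + LATENCY + nbytes / BANDWIDTH
        elif op == "recv":
            arrival = pending.pop((peer, rank))
            clock[rank] = max(clock[rank], arrival)   # block until arrival
    return clock

# Toy two-process trace: rank 0 computes, then sends 1 MB to rank 1.
trace = [(0, "send", 1, 1_000_000, 0.010),
         (1, "recv", 0, 1_000_000, 0.002)]
print(replay(trace, 2))
```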

190 citations


Book ChapterDOI
26 Aug 1996
TL;DR: The design and implementation of a user-level thread package based on the nano-threads programming model, whose goal is to efficiently manage application parallelism at user level, is described.
Abstract: In this paper we describe the design and implementation of a user-level thread package based on the nano-threads programming model, whose goal is to efficiently manage the application parallelism at user level. Nano-thread applications work closely with the operating system to adapt quickly to resource availability.
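As a rough illustration of the user-level threading mechanism, the Python sketch below multiplexes cooperative "threads" (generators) over a ready queue entirely in user space, so context switches never enter the kernel. This is only an analogy for the concept; the nano-threads package itself is implemented at a far lower level.

```python
from collections import deque

# Toy user-level scheduler: "threads" are generators that yield at
# cooperative scheduling points, so switching between them happens
# entirely in user space, with no kernel involvement.
def worker(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                      # cooperative yield point

def run(threads):
    ready = deque(threads)         # ready queue managed in user space
    while ready:
        t = ready.popleft()
        try:
            next(t)                # resume thread until its next yield
            ready.append(t)        # still runnable: back of the queue
        except StopIteration:
            pass                   # thread finished

run([worker("A", 2), worker("B", 3)])
```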

56 citations


Proceedings ArticleDOI
17 Nov 1996
TL;DR: A data distribution tool that automatically derives the data mapping for the arrays and the parallelization strategy for the loops in a Fortran 77 program is described, along with the quality of the solutions generated and the feasibility of the approach in terms of compilation time.
Abstract: This paper describes the design of a data distribution tool which automatically derives the data mapping for the arrays and the parallelization strategy for the loops in a Fortran 77 program. The layout generated can be static or dynamic, and the distribution is one-dimensional BLOCK or CYCLIC. The tool takes into account the control flow statements in the code in order to better estimate the behavior of the program. All the information regarding data movement and parallelism is contained in a single data structure named the Communication-Parallelism Graph (CPG). The CPG is used to model a minimal path problem in which time is the objective function to minimize. It is solved using a general purpose linear programming solver, which finds the optimal solution for the whole problem. The experimental results illustrate the quality of the solutions generated and the feasibility of the approach in terms of compilation time.
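The general shape of such a formulation is a 0-1 linear program: binary variables select one candidate layout per CPG node or phase, and the objective aggregates the estimated times. A generic statement follows (the paper's actual constraint set is CPG-specific and not reproduced here):

```latex
\[
\min_{\mathbf{x}} \; \mathbf{c}^{\mathsf{T}}\mathbf{x}
\quad \text{subject to} \quad
A\mathbf{x} \le \mathbf{b}, \qquad x_i \in \{0,1\},
\]
```

where each $x_i$ indicates whether a candidate distribution is selected and $\mathbf{c}$ collects the estimated communication and computation times.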

30 citations


Book ChapterDOI
26 Aug 1996
TL;DR: A new cooperative caching mechanism, PACA, is presented, along with a caching algorithm, LRU-Interleaved, and an aggressive prefetching algorithm, Full-File-On-Open; the caching algorithm avoids the cache coherence problem with no loss in performance.
Abstract: A new cooperative caching mechanism, PACA, is presented, along with a caching algorithm, LRU-Interleaved, and an aggressive prefetching algorithm, Full-File-On-Open. The caching algorithm is especially targeted at parallel machines running a microkernel-based operating system. It avoids the cache coherence problem with no loss in performance. Compared with another cooperative caching algorithm (N-Chance Forwarding) in this environment, LRU-Interleaved obtains better results. We also evaluate an aggressive prefetching algorithm that greatly increases read performance by taking advantage of the huge caches that cooperative caching offers.
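One plausible reading of "LRU-Interleaved" is sketched below, under stated assumptions: each file block is assigned a single home node by interleaving (block number modulo node count), and each node manages its share with plain LRU. Because every block is cached in exactly one place, no coherence protocol between caches is needed. The class, names, and mapping rule are hypothetical illustrations, not the paper's implementation.

```python
from collections import OrderedDict

class NodeCache:
    """Per-node cache managed with plain LRU (illustrative)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block id -> data, LRU order

    def access(self, block, fetch):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # LRU hit: mark most recent
            return self.blocks[block]
        data = fetch(block)                  # miss: read from disk/server
        self.blocks[block] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

def home_node(block, nnodes):
    return block % nnodes                    # interleaved placement rule

nodes = [NodeCache(capacity=2) for _ in range(4)]

def read(block):
    # A block is only ever cached at its home node, so there is
    # never more than one cached copy to keep coherent.
    return nodes[home_node(block, 4)].access(block, lambda b: f"data{b}")

print(read(5), read(5))   # second access hits node 1's cache
```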

23 citations


Proceedings ArticleDOI
23 Oct 1996
TL;DR: This work presents an approach to automatically derive static or dynamic data distribution strategies for the arrays used in a program using a general purpose linear 0-1 integer programming solver and finds the optimal solution for the problem for one-dimensional array distributions.
Abstract: Physically distributed-memory multiprocessors are becoming popular, and data distribution and loop parallelization are aspects that a parallelizing compiler has to consider in order to obtain efficiency from the system. The cost of accessing local and remote data can differ by one or several orders of magnitude, and this can dramatically affect the performance of the system. It is therefore desirable to free the programmer from having to consider the low-level details of the target architecture, program explicit processes, or specify interprocess communication. We present an approach to automatically derive static or dynamic data distribution strategies for the arrays used in a program. All the information required about data movement and parallelism is contained in a single data structure, called the Communication-Parallelism Graph (CPG). The problem is modeled and solved using a general purpose linear 0-1 integer programming solver, which allows us to find the optimal solution for the problem for one-dimensional array distributions. We also show the feasibility of this approach in terms of compilation time and quality of the solutions generated.
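To make the 0-1 formulation concrete, here is a toy model in the same spirit: choose BLOCK or CYCLIC for each of two program phases so that estimated phase cost plus a redistribution penalty is minimized. The costs, phase names, and remap penalty are invented for illustration, and the sketch uses the PuLP modeling library rather than the solver used in the paper.

```python
import pulp   # pip install pulp

# Invented per-phase costs for each candidate layout.
cost = {("p1", "BLOCK"): 10, ("p1", "CYCLIC"): 14,
        ("p2", "BLOCK"): 12, ("p2", "CYCLIC"): 6}
REMAP = 5     # assumed cost of redistributing the array between phases

prob = pulp.LpProblem("data_distribution", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", cost.keys(), cat="Binary")
r = pulp.LpVariable("remap", cat="Binary")   # 1 if the layout changes

# Exactly one layout per phase.
for p in ("p1", "p2"):
    prob += pulp.lpSum(x[(p, d)] for d in ("BLOCK", "CYCLIC")) == 1

# r must be 1 whenever the two phases pick different layouts.
for d in ("BLOCK", "CYCLIC"):
    prob += x[("p1", d)] - x[("p2", d)] <= r
    prob += x[("p2", d)] - x[("p1", d)] <= r

# Objective: total estimated time, including any remapping penalty.
prob += pulp.lpSum(cost[k] * x[k] for k in cost) + REMAP * r
prob.solve(pulp.PULP_CBC_CMD(msg=False))
for k in cost:
    if x[k].value() == 1:
        print(k)   # here the static all-CYCLIC layout wins (cost 20)
```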

12 citations


Proceedings ArticleDOI
24 Jan 1996
TL;DR: The parallelizing algorithm solves the important problem of deciding which set of transformations to apply in order to maximize the degree of parallelism (the number of parallel loops within a loop nest), and a way of generating efficient transformed code that exploits coarse-grain parallelism on a MIMD system is presented.
Abstract: The paper extends the framework of linear loop transformations by adding a new nonlinear step to the transformation process. The current framework of linear loop transformations cannot identify a significant fraction of parallelism; for this reason, we present a method to complement it with some basic transformations in order to extract the maximum loop parallelism in perfectly nested loops with tight recurrences in the dependence graph. The parallelizing algorithm solves the important problem of deciding which set of transformations to apply in order to maximize the degree of parallelism (the number of parallel loops within a loop nest), and presents a way of generating efficient transformed code that exploits coarse-grain parallelism on a MIMD system.
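A textbook instance of such a linear transformation is loop skewing. For a doubly nested loop with dependence distance vectors $(1,0)$ and $(0,1)$, the unimodular matrix below maps both dependences so that the outer loop carries them all, leaving the inner loop fully parallel (the classic wavefront). This standard example is for orientation only and is not taken from the paper:

```latex
\[
T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
T\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
```

Both transformed distance vectors have a positive first component, so all dependences are carried by the outermost loop and the inner loop can run in parallel.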

5 citations


Journal ArticleDOI
TL;DR: An automatic data distribution method is described which deals with both the alignment and the distribution problems in a single optimization phase, as opposed to solving these two interdependent problems sequentially as done by previous work.
Abstract: This paper describes an automatic data distribution method which deals with both the alignment and the distribution problems in a single optimization phase, as opposed to solving these two interdependent problems sequentially as done by previous work. The core of this work is the Communication-Parallelism Graph, which describes the relationships among array dimensions of the same and different array references regarding communication and parallelism. The overall data distribution problem is then formulated as a linear 0-1 integer programming problem, where the objective function to be minimized is the total execution time. The solution is static in the sense that the layout of the arrays does not change during the execution of the program. We also show the feasibility of this approach in terms of compilation time and quality of the solutions generated.

4 citations


Book ChapterDOI
08 Aug 1996
TL;DR: The main features of the automatic parallelization and data distribution research tool are described, and the performance of the parallelization strategies generated is shown.
Abstract: Shared-memory multiprocessor systems can achieve high performance levels when appropriate work parallelization and data distribution are performed. These two actions are not independent, and decisions have to be taken in a unified way, trying to minimize execution time and data movement costs. The first goal is achieved by parallelizing loops (the main components suitable for parallel execution in scientific codes) and assigning work to processors with good load balancing in mind. The second goal is achieved when data is stored in the cache memories of processors so as to minimize both true and false sharing of cache lines. This paper describes the main features of our automatic parallelization and data distribution research tool and shows the performance of the parallelization strategies it generates. The tool (named PDDT) accepts programs written in Fortran 77 and generates directives of shared-memory programming models (like Power Fortran from SGI or Exemplar from Convex).

4 citations


Book ChapterDOI
07 Oct 1996
TL;DR: A parallel library (PLS) to solve linear systems arising from non-overlapped Domain Decomposition methods is described; preconditioned Krylov subspace iterative methods are considered as linear solvers.
Abstract: In this paper we describe a parallel library (PLS) to solve linear systems arising from non-overlapped Domain Decomposition methods. Preconditioned Krylov subspace iterative methods are considered as linear solvers. PLS has been implemented on top of PVM using FORTRAN 77, together with an additional library which allows the use of dynamic memory allocation.
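As an example of the kind of solver such a library provides, here is a generic preconditioned conjugate gradient in Python/NumPy with a Jacobi (diagonal) preconditioner. It is a self-contained sketch of the textbook algorithm; PLS's actual preconditioners, domain-decomposition structure, and FORTRAN 77/PVM implementation are not reproduced.

```python
import numpy as np

def pcg(A, b, tol=1e-8, maxit=200):
    """Textbook preconditioned conjugate gradient for SPD systems."""
    M_inv = 1.0 / np.diag(A)              # Jacobi preconditioner
    x = np.zeros_like(b)
    r = b - A @ x                         # initial residual
    z = M_inv * r                         # preconditioned residual
    p = z.copy()                          # initial search direction
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)             # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p         # new conjugate direction
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])    # small SPD test matrix
b = np.array([1.0, 2.0])
print(pcg(A, b))                           # ~ [0.0909, 0.6364]
```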

3 citations


Book ChapterDOI
15 Apr 1996
TL;DR: The parallel version of PERMAS not only provides a cost-effective and fast solution for medium-size Finite Element simulations, but also allows extremely large industrial examples to be solved that until now were restricted to large supercomputers at very high cost.
Abstract: The general purpose Finite Element system PERMAS has been ported to highly parallel computer architectures within the scope of the ESPRIT project EUROPORT-1. We reported on the technical and theoretical background at the last HPCN conference. The parallelization rates and scalability achieved with this strategy are shown using both industrially relevant and synthetic scalable examples. The behavior of the parallel version is studied on a parallel machine with a high-speed communication network. The impact of the parallel version is not only a cost-effective and fast solution for medium-size Finite Element simulations, but also that extremely large industrial examples may be solved that until now were restricted to large supercomputers at very high cost. The results are discussed in view of the underlying approach and data structures.

3 citations


Book ChapterDOI
19 Aug 1996
TL;DR: A distributed parallel implementation of a Finite Element simulation used in the ophthalmic optics industry is described; non-overlapped domain decomposition methods are used to perform the parallelization on a cluster of workstations.
Abstract: We describe a distributed parallel implementation of a Finite Element simulation used in the ophthalmic optics industry. We use non-overlapped domain decomposition methods to perform the parallelization on a cluster of workstations. Different numerical techniques were implemented, and the code was tuned with a performance analyzer for distributed parallel programs.