Topic

# Degree of parallelism

About: Degree of parallelism is a(n) research topic. Over the lifetime, 1515 publication(s) have been published within this topic receiving 25546 citation(s).

##### Papers published on a yearly basis

##### Papers

More filters

••

[...]

TL;DR: A novel domain decomposition approach for the parallel finite element solution of equilibrium equations is presented, which exhibits a degree of parallelism that is not limited by the bandwidth of the finite element system of equations.

Abstract: A novel domain decomposition approach for the parallel finite element solution of equilibrium equations is presented. The spatial domain is partitioned into a set of totally disconnected subdomains, each assigned to an individual processor. Lagrange multipliers are introduced to enforce compatibility at the interface nodes. In the static case, each floating subdomain induces a local singularity that is resolved in two phases. First, the rigid body modes are eliminated in parallel from each local problem and a direct scheme is applied concurrently to all subdomains in order to recover each partial local solution. Next, the contributions of these modes are related to the Lagrange multipliers through an orthogonality condition. A parallel conjugate projected gradient algorithm is developed for the solution of the coupled system of local rigid modes components and Lagrange multipliers, which completes the solution of the problem. When implemented on local memory multiprocessors, this proposed method of tearing and interconnecting requires less interprocessor communications than the classical method of substructuring. It is also suitable for parallel/vector computers with shared memory. Moreover, unlike parallel direct solvers, it exhibits a degree of parallelism that is not limited by the bandwidth of the finite element system of equations.

1,228 citations

••

[...]

TL;DR: A massively parallel machine called Anton is described, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems and has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation.

Abstract: The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macro-molecules could in principle provide answers to some of the most important currently outstanding questions in the fields of biology, chemistry, and medicine. A wide range of biologically interesting phenomena, however, occur over timescales on the order of a millisecond---several orders of magnitude beyond the duration of the longest current MD simulations. We describe a massively parallel machine called Anton, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems. The machine, which is scheduled for completion by the end of 2008, is based on 512 identical MD-specific ASICs that interact in a tightly coupled manner using a specialized highspeed communication network. Anton has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation. The remainder of the simulation algorithm is executed by a programmable portion of each chip that achieves a substantial degree of parallelism while preserving the flexibility necessary to accommodate anticipated advances in physical models and simulation methods.

716 citations

••

[...]

TL;DR: This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector and presents an algorithm which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.

Abstract: Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In may cases, actions may be labelled by integral vectors in some iterations domains, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution data to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector. An algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.

587 citations

••

[...]

TL;DR: This paper explores the possibility of using a large-scale array of microprocessors as a computational facility for the execution of massive numerical computations with a high degree of parallelism.

Abstract: This paper explores the possibility of using a large-scale array of microprocessors as a computational facility for the execution of massive numerical computations with a high degree of parallelism. By microprocessor we mean a processor realized on one or a few semiconductor chips that include arithmetic and logical facilities and some memory. The current state of LSI technology makes this approach a feasible and attractive candidate for use in a macrocomputer facility.

548 citations

••

[...]

TL;DR: The CSDF paradigm is an extension of synchronous dataflow that still allows for static scheduling and, thus, a very efficient implementation of an application and it is indicated that CSDF is essential for modelling prescheduled components, like application-specific integrated circuits.

Abstract: We present cycle-static dataflow (CSDF), which is a new model for the specification and implementation of digital signal processing algorithms. The CSDF paradigm is an extension of synchronous dataflow that still allows for static scheduling and, thus, a very efficient implementation of an application. In comparison with synchronous dataflow, it is more versatile because it also supports algorithms with a cyclically changing, but predefined, behavior. Our examples show that this capability results in a higher degree of parallelism and, hence, a higher throughput, shorter delays, and less buffer memory. Moreover, they indicate that CSDF is essential for modelling prescheduled components, like application-specific integrated circuits. Besides introducing the CSDF paradigm, we also derive necessary and sufficient conditions for the schedulability of a CSDF graph. We present and compare two methods for checking the liveness of a graph. The first one checks the liveness of loops, and the second one constructs a single-processor schedule for one iteration of the graph. Once the schedulability is tested, a makespan optimal schedule on a multiprocessor can be constructed. We also introduce the heuristic scheduling method of our graphical rapid prototyping environment (GRAPE).

494 citations