Pipelined parallelization of multi-dimensional loops with multiple data dependencies

Patent

TLDR

The patent proposes a mechanism for folding all the data dependencies in a loop into a single, conservative dependence, so that only one pair of synchronization primitives is needed per loop.
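To illustrate the folding idea, consider a doubly nested loop whose dependencies are uniform distance vectors (d1, d2), with the outer iterations pipelined across threads. The sketch below is an assumption, not the patent's stated algorithm. It expresses the single conservative dependence as a lag `delta`: iteration (i, j) waits until outer iteration i-1 has completed inner iteration j - delta. Every outer iteration waits only on its predecessor, so the waits chain, and outer iteration i-m is known to have completed inner iteration j - m·delta. A carried dependence (d1, d2) is therefore enforced whenever d1·delta ≤ d2. The largest lag that satisfies all carried dependences is the minimum of floor(d2 / d1). The names `fold_dependences` and `covers` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Distance = Tuple[int, int]  # (outer distance d1, inner distance d2); assumed uniform


def fold_dependences(distances: Iterable[Distance]) -> Optional[int]:
    """Fold 2-D distance vectors into one conservative lag (illustrative sketch).

    Returns delta such that "(i, j) waits until outer iteration i-1 has
    finished inner iteration j - delta" enforces every dependence carried
    by the outer loop, or None if the outer loop carries no dependence.
    """
    carried = [(d1, d2) for d1, d2 in distances if d1 > 0]
    if not carried:
        return None
    # Chaining the single wait m times guarantees outer iteration i-m has
    # finished inner iteration j - m*delta, so (d1, d2) is covered iff
    # d1*delta <= d2. Floor division gives the largest such integer delta.
    return min(d2 // d1 for d1, d2 in carried)


def covers(delta: int, distances: Iterable[Distance]) -> bool:
    """Check that the folded lag enforces every outer-loop-carried dependence."""
    return all(d1 * delta <= d2 for d1, d2 in distances if d1 > 0)


deps = [(1, -1), (1, 0), (2, 3), (0, 1)]
delta = fold_dependences(deps)
print(delta, covers(delta, deps))  # -1 True
```

Dependencies with d1 = 0 are satisfied by running the inner loop sequentially within each outer iteration, which is why the sketch ignores them. A negative lag means the predecessor must run ahead. For the vectors above, iteration (i, j) waits for (i-1, j+1).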

Abstract:

A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism requires only one pair of synchronization primitives per loop, does not require complicated, multi-stage compile-time analysis, and considers only the data dependence information in the loop. The low synchronization cost offsets the loss in parallelism caused by the reduced overlap between iterations. Additionally, a novel scheme is presented for implementing the synchronization required to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Each of the iterations executing in parallel has its own iteration vector for synchronization, in which it updates its position in the iteration space. Because there are no sequential updates to the synchronization variable, this method exploits a greater degree of parallelism.
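The iteration-vector scheme can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. Outer iterations of a doubly nested loop are dealt cyclically to a fixed pool of worker threads. Each worker owns one iteration vector (i, j) recording the last iteration it completed, and only that worker ever writes it. Before executing (i, j), a worker waits on the vector of the worker that owns outer iteration i-1 until it has reached (i-1, j - delta), where `delta` is a single folded dependence lag assumed to be given. After executing (i, j), the worker posts its new position, so each iteration performs exactly one wait and one post. The class and function names (`IterationVector`, `pipelined_run`, `stencil`), the cyclic assignment, and the use of Python's `threading.Condition` are hypothetical choices.

```python
import threading
from typing import Callable, List, Tuple

Vec = Tuple[int, int]  # (outer iteration i, inner iteration j)


class IterationVector:
    """Progress record owned by one worker: the last (i, j) it completed."""

    def __init__(self) -> None:
        self.pos: Vec = (-1, -1)  # lexicographically precedes every real iteration
        self.cond = threading.Condition()

    def post(self, i: int, j: int) -> None:
        # Only the owning worker calls post, so each vector has a single writer.
        with self.cond:
            self.pos = (i, j)
            self.cond.notify_all()

    def wait_until(self, i: int, j: int) -> None:
        # Tuple comparison is lexicographic, so this also succeeds once the
        # owner has moved on to a later outer iteration.
        with self.cond:
            self.cond.wait_for(lambda: self.pos >= (i, j))


def pipelined_run(n: int, m: int, delta: int, workers: int,
                  body: Callable[[int, int], None]) -> None:
    """Execute body(i, j) over an n x m iteration space (sketch).

    Outer iterations are dealt cyclically to `workers` threads; inner
    iterations run in order. Each (i, j) performs one wait and one post.
    """
    vecs: List[IterationVector] = [IterationVector() for _ in range(workers)]

    def worker(t: int) -> None:
        for i in range(t, n, workers):
            pred = vecs[(i - 1) % workers]  # owner of outer iteration i-1
            for j in range(m):
                # Inner iteration of outer iteration i-1 that must be complete.
                c = min(j - delta, m - 1)
                if i > 0 and c >= 0:
                    pred.wait_until(i - 1, c)
                body(i, j)
                vecs[t].post(i, j)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def stencil(n: int, m: int, workers: int) -> List[List[int]]:
    """Hypothetical loop with distances (1, 0), (1, -1) and (0, 1); delta = -1."""
    a = [[0] * m for _ in range(n)]

    def body(i: int, j: int) -> None:
        up = a[i - 1][j] if i > 0 else 1
        up_right = a[i - 1][j + 1] if i > 0 and j + 1 < m else 1
        left = a[i][j - 1] if j > 0 else 1
        a[i][j] = up + up_right + left

    pipelined_run(n, m, delta=-1, workers=workers, body=body)
    return a


print(stencil(2, 2, workers=2))  # [[3, 5], [9, 15]]
```

Because of Python's global interpreter lock, this sketch demonstrates the ordering guarantees rather than a speedup. The lexicographic comparison is what removes the need for a global, sequentially updated counter. A worker only needs to know whether the owner of the preceding outer iteration has passed a given position, not how far every other worker has progressed.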

Citations

Patent

Compiling apparatus and method of a multicore device

Ki-seok Kwon, +3 more

TL;DR: An apparatus and method are presented for reducing idle resources in a multicore device and improving the use of its available resources by dividing or combining the tasks in task groups based on execution-time estimates of those task groups.

Patent

Check-hazard instructions for processing vectors

Jeffry E. Gonion, +1 more

TL;DR: A system is presented that determines data dependencies between two vector memory operations, or between two memory operations that use vectors of memory addresses, where the first memory operation occurs before the second in program order.

Patent

Generation of parallel code representations

James J. Radigan

TL;DR: A generated grouped representation of existing source code is used to define regions of the source code that can run in parallel as a set of tasks; the grouped representation can then be converted into a modified representation, such as modified source code or a modified intermediate compiler representation.

Patent

Methods and systems to vectorize scalar computer program loops having loop-carried dependences

Jayashankar Bharadwaj, +3 more

TL;DR: A method and a system are described for converting scalar computer program loops that have loop-carried dependences into vector computer programs, where the first predicate set contains a first set of predicates that cause a variable to be defined in a scalar program loop at or before the variable is defined by the first conditionally executed statement.

Patent

Parallelized execution of instruction sequences based on pre-monitoring

Noam Mizrahi, +3 more

TL;DR: The first thread is invoked to process at least one of the instructions in a second segment, at least partially in parallel with the processing of the first segment by the first hardware thread, in accordance with a specification of register access.

References

Patent

Mechanism to restrict parallelization of loops

Raul E. Silvera, +2 more

TL;DR: The number of threads, chosen from a plurality of threads, used to process iterations of the loop is selected based on a parameter that specifies the minimum number of loop iterations a thread should execute.

Patent

Method and compiler for parallel execution of a program

Hideaki Komatsu, +2 more

TL;DR: A loop in a source program that is to be executed in parallel is located and analyzed; the result of the analysis is used to calculate data dependence vectors, which are ANDed with communication vectors to calculate communication dependence vectors.

Journal Article

Redundant synchronization elimination for DOACROSS loops

Ding-Kai Chen, +1 more

01 May 1999

IEEE Transactions on Parallel and Distri...

TL;DR: This paper proposes an efficient and general algorithm to identify redundant synchronizations in multiply nested DOACROSS loops which may have multiple statements and loop-exit control branches and addresses the issues of enforcing data synchronization in iterations near the boundary of the iteration space.

Patent

Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors

Arun Kejariwal, +6 more

TL;DR: A post-wait control structure is proposed for improving the parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using this post-wait control structure, which can be implemented to include any one of: a single counter to enforce an order of execution, an array indexed by a modulus of a positive integer to track code completion, and/or a set of arrays to track the last code completed by one thread and the current code being executed by another.

Patent

Parallel program execution time with message consolidation

Stephen Ray Shafer, +1 more

TL;DR: The authors propose a method to reduce the execution time of a parallel program by merging messages (also called message combining or message consolidation) after the program has been partitioned and scheduled onto the processing elements (PEs).

Related Papers (5)

Pipelined parallelization with localized self-helper threading

Method and system for converting a single-threaded software program into an application-specific supercomputer

Run-time methods for parallelizing partially parallel loops

Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops

Optimal loop parallelization for maximizing iteration-level parallelism