Patent

Pipelined parallelization of multi-dimensional loops with multiple data dependencies

TL;DR
The patent proposes a mechanism for folding all the data dependences in a loop into a single, conservative dependence, so that each loop needs only one pair of synchronization primitives.
Abstract
A mechanism is described for folding all the data dependences in a loop into a single, conservative dependence, which leads to one pair of synchronization primitives per loop. The mechanism does not require complicated, multi-stage compile-time analysis; it considers only the data dependence information in the loop. The low synchronization cost offsets the loss in parallelism caused by the reduced overlap between iterations. Additionally, a novel scheme is presented for implementing the synchronization required to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Each of the iterations executing in parallel has its own iteration vector for synchronization, in which it updates its position in the iteration space. Because there are no sequential updates to the synchronization variable, this method exploits a greater degree of parallelism.
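The two ideas in the abstract can be illustrated with a small sketch. It is not the patented method: the folding rule, the loop body, and every name in it (`fold_dependences`, `DEPS`, `run_pipelined`, `position`) are assumptions made for illustration. The sketch assumes a two-level loop nest whose outer iterations run as pipelined threads, and folds the distance vectors `(d_outer, d_inner)` into the single dependence `(1, -lag)`: outer iteration `i` may execute inner index `j` once iteration `i - 1` has completed index `j + lag`. By transitivity along the outer loop this covers every original dependence, at the cost of less overlap. Each outer iteration records its position in its own `position` entry, so there is exactly one wait and one post per inner iteration.

```python
import threading

# Distance vectors (outer, inner) of the hypothetical loop body below.
DEPS = [(1, 0), (0, 1), (1, -1)]


def fold_dependences(distance_vectors):
    """Fold all distance vectors into one conservative dependence (1, -lag)."""
    lag = 0
    for d_outer, d_inner in distance_vectors:
        if d_outer < 0 or (d_outer == 0 and d_inner <= 0):
            raise ValueError("distance vectors must be lexicographically positive")
        if d_outer > 0:
            # Carried by the outer loop: the previous row must be far enough ahead.
            lag = max(lag, -d_inner)
        # d_outer == 0 is satisfied by running each row's inner loop in order.
    return (1, -lag)


def make_grid(rows, cols):
    return [[i * cols + j for j in range(cols)] for i in range(rows)]


def body(a, i, j):
    # Reads (i-1, j), (i, j-1), (i-1, j+1): the distances listed in DEPS.
    a[i][j] = a[i - 1][j] + a[i][j - 1] + a[i - 1][j + 1]


def run_sequential(a):
    for i in range(1, len(a)):
        for j in range(1, len(a[0]) - 1):
            body(a, i, j)


def run_pipelined(a, folded):
    lag = -folded[1]
    first_i, last_j = 1, len(a[0]) - 2
    # position[i] belongs to row i alone: the last inner index it has completed.
    position = [0] * len(a)
    # One shared Condition is a Python convenience for blocking and waking;
    # the synchronization state itself is the per-row position entry.
    cond = threading.Condition()

    def worker(i):
        for j in range(1, last_j + 1):
            if i > first_i:
                need = min(j + lag, last_j)  # clamp at the loop boundary
                with cond:  # wait: the single folded dependence
                    cond.wait_for(lambda: position[i - 1] >= need)
            body(a, i, j)
            with cond:  # post: only row i ever writes position[i]
                position[i] = j
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(first_i, len(a))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    reference = make_grid(6, 8)
    run_sequential(reference)
    grid = make_grid(6, 8)
    run_pipelined(grid, fold_dependences(DEPS))
    print("matches sequential result:", grid == reference)
```

Because no row ever updates another row's entry, there is no ordered chain of updates to one shared counter; a waiting row only reads its predecessor's position. A compiled implementation would more likely use per-iteration flags or atomics instead of a lock-protected list.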


Citations
Patent

Compiling apparatus and method of a multicore device

TL;DR: An apparatus and method are presented that reduce idle resources in a multicore device and improve the use of its available resources by dividing or combining the tasks in task groups based on execution time estimates of the task groups.
Patent

Check-hazard instructions for processing vectors

TL;DR: A system is presented that determines data dependencies between two vector memory operations, or between two memory operations that use vectors of memory addresses, where the first memory operation occurs before the second in program order.
Patent

Generation of parallel code representations

TL;DR: A grouped representation generated from existing source code can be used to define regions of the source code that can run in parallel as a set of tasks; the grouped representation can be converted into a modified representation, such as modified source code or a modified intermediate compiler representation.
Patent

Methods and systems to vectorize scalar computer program loops having loop-carried dependences

TL;DR: Methods and systems are described for vectorizing scalar computer program loops that have loop-carried dependences, where a first predicate set contains predicates that cause a variable to be defined in a scalar program loop at or before the variable is defined by a first conditionally executed statement.
Patent

Parallelized execution of instruction sequences based on pre-monitoring

TL;DR: In this paper, the first thread is invoked to process at least one of the instructions in a second segment, at least partially in parallel with processing of the first segment by the first hardware thread, in accordance with a specification of register access.
References
Patent

Mechanism to restrict parallelization of loops

TL;DR: A number of threads is selected from a plurality of threads for processing iterations of a loop, based on a parameter that specifies the minimum number of loop iterations that a thread should execute.
Patent

Method and compiler for parallel execution of a program

TL;DR: A loop in a source program that is to be executed in parallel is located and analyzed; the result of the analysis is used to calculate data dependence vectors, and communication vectors are ANDed to calculate communication dependence vectors.
Journal ArticleDOI

Redundant synchronization elimination for DOACROSS loops

TL;DR: This paper proposes an efficient and general algorithm to identify redundant synchronizations in multiply nested DOACROSS loops which may have multiple statements and loop-exit control branches and addresses the issues of enforcing data synchronization in iterations near the boundary of the iteration space.
Patent

Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors

TL;DR: A post-wait control structure is proposed for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using the post-wait control structure, which can be implemented with a single counter that enforces an order of execution, an array that tracks code completion and is indexed by a modulus of a positive integer, or a set of arrays that track the last code completed by a thread and the code currently being executed by another thread.
Patent

Parallel program execution time with message consolidation

TL;DR: A method is proposed for reducing the execution time of a parallel program by merging messages (also called message combining or message consolidation) after the program has already been partitioned and scheduled onto the processing elements (PEs).