Topic
Degree of parallelism
About: Degree of parallelism is a research topic. Over its lifetime, 1515 publications have been published within this topic, receiving 25546 citations.
Papers published on a yearly basis
Papers
19 Oct 1992
TL;DR: The authors present a staggered distribution scheme for DOACROSS loops that utilizes processors more efficiently, since, relative to the equal distribution approach, it requires fewer processors to attain maximum speedup.
Abstract: The authors present a staggered distribution scheme for DOACROSS loops. The scheme uses heuristics to distribute the loop iterations unevenly among processors in order to mask the delay caused by data dependencies and inter-PE (processing element) communication. Simulation results have shown that this scheme is effective for loops that have a large degree of parallelism among iterations. The scheme, due to its nature, distributes loop iterations among PEs based on architectural characteristics of the underlying organization, i.e. processor speed and communication cost. The maximum speedup attained is very close to the maximum speedup possible for a particular loop even in the presence of inter-PE communication cost. This scheme utilizes processors more efficiently, since, relative to the equal distribution approach, it requires fewer processors to attain maximum speedup. Although this scheme produces an unbalanced distribution among processors, this can be remedied by considering other loops when making the distribution to produce a balanced load among processors.
6 citations
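The staggered idea in the abstract above can be sketched with a toy chunk-size rule. The geometric `shrink` factor and the function name are illustrative assumptions, not the paper's actual heuristics, which account for processor speed and communication cost:

```python
def staggered_distribution(n_iters, n_pes, shrink=0.8):
    """Toy sketch: give each later PE a smaller chunk of DOACROSS
    iterations, so its start-up can overlap the dependency and
    communication delay from its predecessor.
    `shrink` is an assumed tuning knob, not a value from the paper."""
    weights = [shrink ** i for i in range(n_pes)]
    total = sum(weights)
    sizes = [round(n_iters * w / total) for w in weights]
    sizes[-1] += n_iters - sum(sizes)  # absorb rounding error
    chunks, start = [], 0
    for size in sizes:
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

For example, `staggered_distribution(100, 4)` assigns 34, 27, 22, and 17 iterations to the four PEs instead of 25 each.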
01 Jan 2013
TL;DR: This work investigates the performance of a highly parallel Particle Swarm Optimization (PSO) algorithm implemented on the graphics processing unit (GPU) and shows that the GPU offers a high degree of performance and achieves a maximum of 37 times speedup over a sequential implementation when the problem size in terms of tasks is large and many swarms are used.
Abstract: We investigate the performance of a highly parallel Particle Swarm Optimization (PSO) algorithm implemented on the graphics processing unit (GPU). In order to achieve this high degree of parallelism we implement a collaborative multi-swarm PSO algorithm on the GPU which relies on the use of many swarms rather than just one. We choose to apply our PSO algorithm against a real-world application: the task matching problem in a heterogeneous distributed computing environment. Due to the potential for large problem sizes with high dimensionality, the task matching problem proves to be very thorough in testing the GPU’s capabilities for handling PSO. Our results show that the GPU offers a high degree of performance and achieves a maximum of 37 times speedup over a sequential implementation when the problem size in terms of tasks is large and many swarms are used.
6 citations
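The particle update the abstract builds on can be sketched in pure Python for a single swarm; the inertia and acceleration constants are illustrative, and the GPU version described would run many swarms and particles concurrently rather than looping:

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO velocity/position update for a single swarm.
    positions/velocities/pbest: per-particle coordinate lists;
    gbest: best position found by the swarm so far.
    The constants w, c1, c2 are assumed, not taken from the paper."""
    for p in range(len(positions)):
        for d in range(len(positions[p])):
            r1, r2 = random.random(), random.random()
            velocities[p][d] = (w * velocities[p][d]
                                + c1 * r1 * (pbest[p][d] - positions[p][d])
                                + c2 * r2 * (gbest[d] - positions[p][d]))
            positions[p][d] += velocities[p][d]
    return positions, velocities
```

The collaborative multi-swarm variant in the abstract would keep several independent swarms like this and periodically exchange best solutions between them.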
03 Apr 2011
TL;DR: The implementation and optimization of a state-of-the-art Lattice Boltzmann code for computational fluid-dynamics for massively parallel systems using multi-core processors is described and a sustained performance is obtained that is a large fraction of peak performance.
Abstract: We describe the implementation and optimization of a state-of-the-art Lattice Boltzmann code for computational fluid-dynamics for massively parallel systems using multi-core processors. We consider a code describing 2D compressible fluid flows, including thermal and combustion effects. We carefully match and balance the large degree of parallelism of the underlying algorithm with all available parallel resources (inter-node, intra-node, SIMD). We test our code on the prototype of the application-driven AuroraScience machine; our results can be readily ported to virtually any large scale system. We obtain a sustained performance for this ready-for-physics code that is a large fraction of peak performance.
6 citations
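Of the three parallelism levels listed in the abstract, the inter-node one can be illustrated with a minimal, assumed 1-D domain decomposition; the paper's actual layout and its intra-node and SIMD levels are not reproduced here:

```python
def decompose_lattice(nx, ny, n_nodes):
    """Split an nx-by-ny 2D lattice into contiguous column blocks,
    one per node, balanced to within one column in width.
    Returns (x0, x1, y0, y1) half-open bounds per node."""
    base, extra = divmod(nx, n_nodes)
    blocks, x0 = [], 0
    for i in range(n_nodes):
        width = base + (1 if i < extra else 0)
        blocks.append((x0, x0 + width, 0, ny))
        x0 += width
    return blocks
```

Each node would then update its own block every time step and exchange only the boundary columns with its neighbours.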
TL;DR: The fundamental structure of Fastbus and details of its basic operations are presented, and it is shown that both high- and low-speed devices can be accommodated.
6 citations
19 Jun 2007
TL;DR: A new algorithm for applying evolution rules to a multiset of objects, for use in P system implementations on digital devices, that achieves a certain degree of parallelism because a rule can be applied several times in a single step.
Abstract: This paper presents a new algorithm for applying evolution rules to a multiset of objects, intended for P system implementations on digital devices. In each step of the algorithm, at least one evolution rule is removed from the set of active rules; the number of operations executed is therefore bounded, and the worst-case execution time can be known a priori. This is very important, as it allows the number of membranes to be located on each processor in distributed P system implementation architectures to be determined, obtaining optimal times with minimal resources. Although the algorithm is sequential, it reaches a certain degree of parallelism because a rule can be applied several times in a single step. In addition, experimental tests have shown that its execution times are better than those previously published.
6 citations
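The source of parallelism noted in the abstract, a rule applied several times in a single step, can be sketched on a multiset; the function and parameter names are hypothetical, and the active-rule bookkeeping of the actual algorithm is omitted:

```python
from collections import Counter

def apply_rule_maximally(multiset, lhs, rhs):
    """Apply the evolution rule lhs -> rhs as many times as the
    multiset allows in one step: compute the maximum applicability k,
    consume k copies of lhs, add k copies of rhs, and return k."""
    k = min(multiset[obj] // n for obj, n in lhs.items())
    for obj, n in lhs.items():
        multiset[obj] -= k * n
    for obj, n in rhs.items():
        multiset[obj] += k * n
    return k
```

For example, with `Counter({'a': 5, 'b': 3})` and the rule `2a + b -> c`, the rule applies twice in one step, leaving one `a`, one `b`, and two `c` objects.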