Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.


Papers
Proceedings ArticleDOI
19 Oct 1992
TL;DR: The authors present a staggered distribution scheme for DOACROSS loops that utilizes processors more efficiently, since, relative to the equal distribution approach, it requires fewer processors to attain maximum speedup.
Abstract: The authors present a staggered distribution scheme for DOACROSS loops. The scheme uses heuristics to distribute the loop iterations unevenly among processors in order to mask the delay caused by data dependencies and inter-PE (processing element) communication. Simulation results have shown that this scheme is effective for loops that have a large degree of parallelism among iterations. The scheme, due to its nature, distributes loop iterations among PEs based on architectural characteristics of the underlying organization, i.e., processor speed and communication cost. The maximum speedup attained is very close to the maximum speedup possible for a particular loop, even in the presence of inter-PE communication cost. This scheme utilizes processors more efficiently, since, relative to the equal distribution approach, it requires fewer processors to attain maximum speedup. Although this scheme produces an unbalanced distribution among processors, this can be remedied by considering other loops when making the distribution to produce a balanced load among processors.
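The staggered scheme is described only qualitatively above, so the following Python sketch is a hypothetical illustration of uneven chunking rather than the authors' heuristic: later PEs receive progressively larger blocks so that useful computation overlaps the inter-PE communication delay. The function `staggered_chunks` and its parameters (`t_iter`, `t_comm`) are assumptions introduced for illustration.

```python
# Illustrative sketch only: a toy "staggered" (uneven) block distribution for a
# DOACROSS loop. This heuristic is NOT the authors' algorithm; it merely shows
# how unequal chunk sizes can hide a fixed inter-PE communication delay.
# n_iters, n_pes, t_iter (per-iteration compute time) and t_comm (inter-PE
# communication delay) are hypothetical inputs.

def staggered_chunks(n_iters, n_pes, t_iter, t_comm):
    """Return one chunk size per PE, growing by roughly t_comm / t_iter
    iterations per PE so each PE keeps computing while the next PE is
    still waiting for its incoming cross-iteration dependence."""
    extra = max(1, round(t_comm / t_iter))   # iterations needed to cover one delay
    base = max(1, (n_iters - extra * n_pes * (n_pes - 1) // 2) // n_pes)
    chunks = [base + i * extra for i in range(n_pes)]
    # Adjust the last chunk so the total matches n_iters exactly.
    chunks[-1] += n_iters - sum(chunks)
    return [c for c in chunks if c > 0]

if __name__ == "__main__":
    # 100 iterations on 4 PEs, each iteration takes 1 time unit,
    # sending a value to the next PE takes 3 time units.
    print(staggered_chunks(n_iters=100, n_pes=4, t_iter=1.0, t_comm=3.0))
```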

6 citations

Book ChapterDOI
01 Jan 2013
TL;DR: This work investigates the performance of a highly parallel Particle Swarm Optimization (PSO) algorithm implemented on the graphics processing unit (GPU) and shows that the GPU offers a high degree of performance and achieves a maximum of 37 times speedup over a sequential implementation when the problem size in terms of tasks is large and many swarms are used.
Abstract: We investigate the performance of a highly parallel Particle Swarm Optimization (PSO) algorithm implemented on the graphics processing unit (GPU). In order to achieve this high degree of parallelism, we implement a collaborative multi-swarm PSO algorithm on the GPU which relies on the use of many swarms rather than just one. We choose to apply our PSO algorithm to a real-world application: the task matching problem in a heterogeneous distributed computing environment. Due to the potential for large problem sizes with high dimensionality, the task matching problem provides a thorough test of the GPU’s capabilities for handling PSO. Our results show that the GPU offers a high degree of performance and achieves a maximum speedup of 37 times over a sequential implementation when the problem size in terms of tasks is large and many swarms are used.
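For readers unfamiliar with the multi-swarm idea, the following is a minimal CPU-only Python sketch, not the paper's GPU implementation or its task-matching objective: several independent swarms run standard PSO updates, and a simple periodic exchange of the best solution stands in for the collaborative step. The objective (`sphere`), constants, and exchange interval are assumptions for illustration.

```python
# Minimal, CPU-only sketch of a multi-swarm PSO. The paper's version runs on a
# GPU and targets task matching; this toy version minimizes a sphere function.
import random

def sphere(x):                      # toy objective: sum of squares
    return sum(v * v for v in x)

def multi_swarm_pso(dim=10, n_swarms=4, swarm_size=20, iters=200,
                    w=0.7, c1=1.5, c2=1.5, bound=5.0):
    swarms = []
    for _ in range(n_swarms):
        pos = [[random.uniform(-bound, bound) for _ in range(dim)]
               for _ in range(swarm_size)]
        vel = [[0.0] * dim for _ in range(swarm_size)]
        pbest = [p[:] for p in pos]
        pbest_val = [sphere(p) for p in pos]
        g = min(range(swarm_size), key=lambda i: pbest_val[i])
        swarms.append({"pos": pos, "vel": vel, "pbest": pbest,
                       "pbest_val": pbest_val,
                       "gbest": pbest[g][:], "gbest_val": pbest_val[g]})
    for it in range(iters):
        for s in swarms:
            for i in range(swarm_size):
                for d in range(dim):
                    r1, r2 = random.random(), random.random()
                    s["vel"][i][d] = (w * s["vel"][i][d]
                                      + c1 * r1 * (s["pbest"][i][d] - s["pos"][i][d])
                                      + c2 * r2 * (s["gbest"][d] - s["pos"][i][d]))
                    s["pos"][i][d] += s["vel"][i][d]
                val = sphere(s["pos"][i])
                if val < s["pbest_val"][i]:
                    s["pbest_val"][i] = val
                    s["pbest"][i] = s["pos"][i][:]
                    if val < s["gbest_val"]:
                        s["gbest_val"] = val
                        s["gbest"] = s["pos"][i][:]
        # Crude stand-in for the collaborative step: every 20 iterations each
        # swarm adopts the best solution found by any swarm.
        if (it + 1) % 20 == 0:
            best = min(swarms, key=lambda s: s["gbest_val"])
            for s in swarms:
                if s["gbest_val"] > best["gbest_val"]:
                    s["gbest_val"] = best["gbest_val"]
                    s["gbest"] = best["gbest"][:]
    return min(s["gbest_val"] for s in swarms)

if __name__ == "__main__":
    print("best value:", multi_swarm_pso())
```

On a GPU, the same structure maps naturally to parallel hardware because every particle in every swarm performs the same update independently; only the best-value reductions require coordination.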

6 citations

Proceedings ArticleDOI
03 Apr 2011
TL;DR: The implementation and optimization of a state-of-the-art Lattice Boltzmann code for computational fluid-dynamics for massively parallel systems using multi-core processors is described and a sustained performance is obtained that is a large fraction of peak performance.
Abstract: We describe the implementation and optimization of a state-of-the-art Lattice Boltzmann code for computational fluid dynamics for massively parallel systems using multi-core processors. We consider a code describing 2D compressible fluid flows, including thermal and combustion effects. We carefully match and balance the large degree of parallelism of the underlying algorithm with all available parallel resources (inter-node, intra-node, SIMD). We test our code on the prototype of the application-driven AuroraScience machine; our results can be readily ported to virtually any large-scale system. We obtain a sustained performance for this ready-for-physics code that is a large fraction of peak performance.
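The abstract's central point is matching the algorithm's parallelism to the machine's levels of parallelism. The sketch below is an illustrative stand-in, not the paper's code: a toy 2D stencil update replaces the Lattice Boltzmann kernel, row-slab decomposition across processes stands in for inter-node/intra-node work, and NumPy's vectorized array operations stand in for SIMD. The names `update_slab` and `step` and the grid size are assumptions.

```python
# Illustrative sketch only: coarse-grain parallelism via row slabs handled by
# separate processes, fine-grain parallelism via vectorized array operations.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def update_slab(slab):
    """Average each interior cell with its 4 neighbours (toy relaxation step).
    `slab` carries one halo row above and below; only interior rows are returned."""
    return 0.25 * (slab[:-2, 1:-1] + slab[2:, 1:-1] +
                   slab[1:-1, :-2] + slab[1:-1, 2:])

def step(grid, n_workers=4):
    # Split the interior rows into contiguous slabs, each padded with halo rows.
    rows = np.array_split(np.arange(1, grid.shape[0] - 1), n_workers)
    slabs = [grid[r[0] - 1:r[-1] + 2, :] for r in rows if len(r)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(update_slab, slabs))
    new = grid.copy()
    new[1:-1, 1:-1] = np.vstack(results)   # stitch slabs back together
    return new

if __name__ == "__main__":
    g = np.random.rand(256, 256)
    g = step(g)
    print(g.shape)
```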

6 citations

Journal ArticleDOI
TL;DR: The fundamental structure of Fastbus and details of its basic operations are presented, and it is shown that both high- and low-speed devices can be accommodated.

6 citations

19 Jun 2007
TL;DR: A new algorithm for applying evolution rules to a multiset of objects, intended for implementing P systems on digital devices, that reaches a certain degree of parallelism because a single rule can be applied several times in one step.
Abstract: This paper presents a new algorithm for applying evolution rules to a multiset of objects, for use in P system implementations on digital devices. In each step of the algorithm two main actions are carried out, and at least one evolution rule is removed from the set of active rules. The number of operations executed is therefore bounded, and the worst-case execution time can be known a priori. This is important because it allows the number of membranes to be placed on each processor in distributed P system implementation architectures to be determined, so that optimal times are obtained with minimal resources. Although the algorithm is sequential, it reaches a certain degree of parallelism because a rule can be applied several times in a single step. In addition, experimental tests have shown that its execution times are better than those previously published.
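The bounded-iteration idea can be sketched concretely. The following Python fragment is an illustrative reconstruction, not the authors' exact algorithm: each loop iteration applies one active rule the maximum number of times the multiset allows and then removes it from the active set, so the loop runs at most once per rule. The rule and object names are hypothetical.

```python
# Sketch of maximal rule application with a shrinking active-rule set.
from collections import Counter

def apply_rules_maximally(objects, rules):
    """objects: Counter of available objects.
    rules: list of (lhs, rhs) pairs, each a Counter of consumed/produced objects.
    Returns the updated multiset and how many times each rule was applied."""
    objects = Counter(objects)
    applications = []
    active = list(rules)
    while active:
        lhs, rhs = active.pop()            # one rule leaves the active set per step
        # Maximum number of simultaneous applications of this rule.
        times = min((objects[o] // n for o, n in lhs.items()), default=0)
        if times > 0:
            for o, n in lhs.items():
                objects[o] -= n * times
            for o, n in rhs.items():
                objects[o] += n * times
        applications.append(((lhs, rhs), times))
    return objects, applications

if __name__ == "__main__":
    objs = Counter({"a": 5, "b": 3})
    rules = [(Counter({"a": 2}), Counter({"b": 1})),      # 2a -> b
             (Counter({"a": 1, "b": 1}), Counter({"c": 1}))]  # a b -> c
    print(apply_rules_maximally(objs, rules))
```

Because the number of loop iterations never exceeds the number of rules, the worst-case running time is fixed in advance, which is what makes it possible to plan how many membranes each processor can host.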

6 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years:
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75