Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have appeared within this topic, receiving 25,546 citations.


Papers
Book Chapter
TL;DR: This work studies the implementation of Multi-Objective DE (MODE) on the GPU with C-CUDA, evaluates the gain in processing time against the sequential version, and shows that the approach achieves a significant speedup and highly efficient use of processing power.
Abstract: In some applications, evolutionary algorithms may require high computational resources and high processing power, sometimes without producing a satisfactory solution after running for a considerable amount of time. One possible improvement is a parallel approach to reduce the response time. This work studies a parallel multi-objective algorithm, the multi-objective version of Differential Evolution (DE). The generation of trial individuals can be done in parallel, greatly reducing the overall processing time of the algorithm. A novel approach to parallelizing this algorithm is to implement it on Graphics Processing Units (GPUs). These units offer a high degree of parallelism and were initially developed for image rendering. However, NVIDIA has released a framework, named CUDA, which allows developers to use the GPU for general-purpose computing (GPGPU). This work studies the implementation of Multi-Objective DE (MODE) on the GPU with C-CUDA, evaluating the gain in processing time against the sequential version. Benchmark functions are used to validate the implementation and to confirm the efficiency of MODE on the GPU. The results show that the approach achieves a significant speedup and highly efficient use of processing power.

13 citations
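
The abstract above notes that trial individuals can be generated in parallel. A minimal Python sketch of why that holds, assuming the common DE/rand/1/bin variant: each trial reads the current population and writes only its own result, so every index could be handled by a separate GPU thread. This is not the authors' C-CUDA code; the names `de_trial` and `generate_trials` and the parameters `f`, `cr`, and `seed` are hypothetical, and the multi-objective (Pareto-based) selection step is omitted.

```python
import random

def de_trial(population, i, f=0.5, cr=0.9, seed=0):
    """Build the trial vector for individual i (DE/rand/1/bin, assumed variant)."""
    # Per-individual random stream: the result does not depend on the order
    # in which individuals are processed, mimicking one RNG state per thread.
    rng = random.Random(seed * 1_000_003 + i)
    n, dim = len(population), len(population[i])
    # Three distinct donors, all different from the target i (needs n >= 4).
    a, b, c = rng.sample([k for k in range(n) if k != i], 3)
    j_rand = rng.randrange(dim)  # forces at least one mutated component
    trial = []
    for j in range(dim):
        if j == j_rand or rng.random() < cr:
            trial.append(population[a][j] + f * (population[b][j] - population[c][j]))
        else:
            trial.append(population[i][j])
    return trial

def generate_trials(population, **kwargs):
    # Iterations are independent; on a GPU each i would map to one thread.
    return [de_trial(population, i, **kwargs) for i in range(len(population))]
```

Because each call depends only on `population` and `i`, evaluating the indices in any order (or concurrently) yields the same trials.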

Proceedings Article
26 Apr 1992
TL;DR: The authors have parallelized the AMBER molecular dynamics program for the AP1000 highly parallel computer and showed that a problem with 41095 atoms is processed 226 times faster with a 512 processor AP1000 than by a single processor.
Abstract: The authors have parallelized the AMBER molecular dynamics program for the AP1000 highly parallel computer. To obtain a high degree of parallelism and an even load balance between processors for model problems of protein and water molecules, protein amino acid residues and water molecules are distributed to processors randomly. The global interprocessor communication required by this data mapping is done efficiently using the AP1000 broadcast network, which broadcasts atom coordinate data for other processors' reference, and its torus network, which carries the point-to-point communication that accumulates forces for atoms assigned to other processors. Experiments showed that a problem with 41095 atoms is processed 226 times faster with a 512 processor AP1000 than by a single processor.

13 citations
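
The speedup reported above (226 times on 512 processors) can be turned into a parallel efficiency using the standard definition, efficiency = speedup / processors. The helper name `parallel_efficiency` below is hypothetical; only the two figures come from the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors

# Figures taken from the abstract: 226x speedup on 512 processors.
eff = parallel_efficiency(226, 512)
print(f"{eff:.1%}")  # prints 44.1%
```

By this measure, each processor delivers a little under half of what perfect linear scaling would give.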

Proceedings Article
01 Feb 2021
TL;DR: DepGraph prefetches the vertices for the core on-the-fly along the dependency chains between their states and the active vertices' new states, aiming to effectively accelerate the propagations of the active vertices’ new states and also ensure better data locality.
Abstract: Many graph processing systems have been recently developed for many-core processors. However, for iterative graph processing, due to the dependencies between vertices’ states, the propagations of new states of vertices are inherently conducted along graph paths sequentially and are also dependent on each other. Despite years of research effort, existing solutions still severely underutilize many-core processors when propagating the new states of vertices, and so suffer from slow convergence. In this paper, we propose a dependency-driven programmable accelerator, DepGraph, which couples with the core architecture of the many-core processor and can fundamentally alleviate the challenge of dependencies for faster state propagation. Specifically, we build an effective dependency-driven asynchronous execution approach into novel microarchitecture designs for faster state propagations. DepGraph prefetches the vertices for the core on-the-fly along the dependency chains between their states and the active vertices’ new states, aiming to effectively accelerate the propagations of the active vertices’ new states and also ensure better data locality. By transforming the dependency chains along the frequently-used paths into direct ones at runtime and maintaining these calculated direct dependencies as a set of fast shortcuts, called the hub index, DepGraph further accelerates most state propagations. Also, many propagations do not need to wait for the completion of other propagations, which enables more propagations to be effectively conducted along the paths with a higher degree of parallelism. The experimental results show that for iterative graph processing on a simulated 64-core processor, a cutting-edge software graph processing system can achieve 5.0–22.7 times speedup after integrating with our DepGraph while incurring only 0.6% area cost. In comparison with three state-of-the-art hardware solutions, i.e., HATS, Minnow, and PHI, DepGraph improves the performance by 3.0–14.2, 2.2–5.8, and 2.4–10.1 times, respectively.

13 citations

Journal Article
01 Jan 1992
TL;DR: The efficiencies obtained by an implementation on a message-passing multiprocessor demonstrate the suitability of the time-parallel extrapolation method for this type of equation.
Abstract: We consider the problem of solving unsteady partial differential equations on an MIMD machine. Conventional parallel methods use a data-partitioning approach in which the solution grid at each time-step is divided amongst the available processors. The sequential nature of the time integration is, however, retained. The algorithm presented in this paper makes use of a time-parallel approach, whereby several processors may be employed to solve at several time-steps simultaneously. The time-parallel method enables the inherent parallelism of the extrapolation scheme to be efficiently exploited, allowing a significant increase both in accuracy and in the degree of parallelism. The efficiencies obtained by an implementation on a message-passing multiprocessor demonstrate the suitability of the time-parallel extrapolation method for this type of equation.

13 citations
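
The abstract above refers to the inherent parallelism of an extrapolation scheme. As a rough illustration only, the sketch below applies Richardson extrapolation to explicit Euler on a scalar ODE: the coarse and fine solves are independent, so they can run concurrently before being combined. This is not the paper's time-parallel algorithm for PDEs; the function names `euler` and `extrapolated`, the ODE setting, and the use of threads are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def euler(f, y0, t_end, steps):
    """Explicit Euler for y' = f(t, y) on [0, t_end] with a fixed step."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

def extrapolated(f, y0, t_end, steps):
    """Richardson extrapolation of Euler using step sizes h and h/2."""
    # The two solves share no state, so they can be submitted concurrently.
    # Threads here only illustrate the independence; they are not a
    # performance claim for CPython.
    with ThreadPoolExecutor(max_workers=2) as pool:
        coarse = pool.submit(euler, f, y0, t_end, steps)
        fine = pool.submit(euler, f, y0, t_end, 2 * steps)
        y_h, y_h2 = coarse.result(), fine.result()
    # Euler's global error is O(h); this combination cancels the leading term.
    return 2.0 * y_h2 - y_h
```

For y' = -y with y(0) = 1, the combined result is markedly closer to exp(-1) than a plain Euler solve using the same finest step, which is the accuracy gain extrapolation is meant to provide.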

Journal Article
TL;DR: The labelled dependency graph associated with a P system is defined, and this new concept is used for proving some results concerning the maximum number of applications of rules in a single step through the computation of a P system.

13 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75