scispace - formally typeset
Search or ask a question
Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
More filters
Book ChapterDOI
28 Aug 2006
TL;DR: A distributed implementation of a load balancing heuristic for parallel adaptive FEM simulations based on a disturbed diffusion scheme embedded in a learning framework that helps to omit unnecessary computations as well as replace the domain decomposition by an alternative data distribution scheme reducing the communication overhead.
Abstract: Load balancing is an important issue in parallel numerical simulations. However, state-of-the-art libraries addressing this problem show several deficiencies: they are hard to parallelize, focus on small edge-cuts rather than few boundary vertices, and often produce disconnected partitions. We present a distributed implementation of a load balancing heuristic for parallel adaptive FEM simulations. It is based on a disturbed diffusion scheme embedded in a learning framework. This approach incorporates a high degree of parallelism that can be exploited and it computes well-shaped partitions as shown in previous publications. Our focus lies on improving the condition of the involved matrix and solving the resulting linear systems with local accuracy. This helps to omit unnecessary computations as well as allows to replace the domain decomposition by an alternative data distribution scheme reducing the communication overhead, as shown by experiments with our new MPI based implementation.

14 citations

Proceedings ArticleDOI
19 Apr 2010
TL;DR: This paper shows how the efficient parallelization of the tau-leap method for GPUs includes the redefinition of its data structures, the redesign of its algorithm, and the selection of the most appropriate degree of parallelism on a single GPU or multiple GPUs.
Abstract: The Coarse-Grained Monte Carlo (CGMC) method is a multi-scale stochastic mathematical and simulation framework for spatially distributed systems. CGMC simulations are important tools for studying phenomena such as catalysis, crystal growth, surface diffusion, phase transitions on single crystals, and cell membrane receptor dynamics. In parallel CGMC, the tau-leap method is used for parallel simulations that are executed on traditional CPU clusters in a master-slave setting. Unfortunately the communications between master and slaves negatively impact speedup and scalability. In this paper, we explore the potentials of GPUs for the tau-leap method and we present an extensive performance evaluation that leads to the most suitable degree of parallelism for this method under different simulation profiles. We show how the efficient parallelization of the tau-leap method for GPUs includes (1) the redefinition of its data structures, (2) the redesign of its algorithm, and (3) the selection of the most appropriate degree of parallelism (i.e., fine-grained or course-gained) on a single GPU or multiple GPUs. Exceptional performance improvements can thus be achieved for this method.

14 citations

Proceedings ArticleDOI
24 Aug 2015
TL;DR: The proposed compilation flow aims at exposing high degree of parallelism in loop nests in HPC application kernels using polyhedral analysis and generates meta-data to effectively utilize the computational resources in HyperCells.
Abstract: In this paper, we present a compilation flow for HPC kernels on the REDEFINE coarse-grain reconfigurable architecture (CGRA). REDEFINE is a scalable macro-dataflow machine in which the compute elements (CEs) communicate through messages. REDEFINE offers the ability to exploit high degree of coarse-grain and pipeline parallelism. The CEs in REDEFINE are enhanced with reconfigurable macro data-paths called HyperCells that enable exploitation of fine-grain and pipeline parallelism at the level of basic instructions in static dataflow order. Application kernels that exhibit regularity in computations and memory accesses such as affine loop nests benefit from the architecture of HyperCell [1], [2]. The proposed compilation flow aims at exposing high degree of parallelism in loop nests in HPC application kernels using polyhedral analysis and generates meta-data to effectively utilize the computational resources in HyperCells. Memory is explicitly managed through compiler's assistance. We address the compilation challenges such as partitioning with load balancing, mapping and scheduling computations and management of operand data while targeting multiple HyperCells in the REDEFINE architecture. The proposed solution scales well meeting the performance objectives of HPC computing.

14 citations

Journal ArticleDOI
TL;DR: The hardware implementation of a spiking neuron model, which uses a spike time dependent plasticity (STDP) rule that allows synaptic changes by discrete time steps, is described and the serial implementation has been realized.
Abstract: In this paper we describe the hardware implementation of a spiking neuron model, which uses a spike time dependent plasticity (STDP) rule that allows synaptic changes by discrete time steps. For this purpose an integrate-and-fire neuron is used with recurrent local connections. The connectivity of this model has been set to 24 neighbours, so there is a high degree of parallelism. After obtaining good results with the hardware implementation of the model, we proceed to simplify this hardware description, trying to keep the same behaviour. Some experiments using dynamic grading patterns have been used in order to test the learning capabilities of the model. Finally, the serial implementation has been realized.

14 citations

Book ChapterDOI
01 Nov 2000
TL;DR: In this article, the authors present a process algebra approach for the integrated verification of correctness and performance in concurrent systems, which is entirely performed within the Circal process algebra, without any recourse to other formalisms.
Abstract: In this paper we present a process algebra approach for the integrated verification of correctness and performance in concurrent systems. The verification procedure is entirely performed within the Circal process algebra, without any recourse to other formalisms. Performance is characterised in terms of logical properties, which do not incorporate explicit time. Such properties are then interpreted in terms of degree of parallelism and allow the quantitative evaluation of the throughput of the system. The approach has been applied to two four-phase handshaking protocols, which are motivated by the implementation of the AMULET2 asynchronous RISC processor. Both correctness and performance properties are captured in the same verification framework and automatically proved using the Circal System.

14 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
85% related
Scheduling (computing)
78.6K papers, 1.3M citations
83% related
Network packet
159.7K papers, 2.2M citations
80% related
Web service
57.6K papers, 989K citations
80% related
Quality of service
77.1K papers, 996.6K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20221
202147
202048
201952
201870
201775