Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.


Papers
Journal ArticleDOI
TL;DR: In this paper, it is shown that the inertia matrix associated with any open- or closed-loop mechanism is positive definite by finding a simple mathematical expression for the quadratic form expressing the kinetic energy in an associated state space.
Abstract: In this paper, advantage is taken of the problem structure in multibody dynamics simulation when the mechanical system is modeled using a minimal set of generalized coordinates. It is shown that the inertia matrix associated with any open- or closed-loop mechanism is positive definite by finding a simple mathematical expression for the quadratic form expressing the kinetic energy in an associated state space. Based on this result, an algorithm that efficiently solves for second time derivatives of the generalized coordinates is presented. Significant speed-ups accrue due to both the no fill-in factorization of the composite inertia matrix technique and the degree of parallelism attainable with the new algorithm.
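To make the positive-definiteness argument concrete, the sketch below restates it in generic notation (M for the composite inertia matrix, q for the generalized coordinates, tau for the generalized forces); the symbols are illustrative and not taken from the paper. Because the kinetic energy is a strictly positive quadratic form in the generalized velocities, the inertia matrix admits a Cholesky factorization without pivoting, which is what makes an efficient, fill-in-free solve for the second time derivatives possible.

```latex
% Hedged sketch of the argument; notation (M, q, \tau, c) is generic, not the paper's.
\begin{align*}
  T(q,\dot q) &= \tfrac{1}{2}\,\dot q^{\mathsf T} M(q)\,\dot q > 0
    \quad \text{for all } \dot q \neq 0
    \quad\Longrightarrow\quad M(q) \succ 0, \\
  M(q)\,\ddot q &= \tau - c(q,\dot q)
    \qquad \text{(equations of motion in generalized coordinates)}, \\
  M(q) &= L L^{\mathsf T}, \qquad
  \ddot q = L^{-\mathsf T}\bigl(L^{-1}(\tau - c(q,\dot q))\bigr)
    \qquad \text{(Cholesky solve, no pivoting required).}
\end{align*}
```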

20 citations

Journal ArticleDOI
TL;DR: This work implements a novel linear algebra library of auto-tunable codes, built on top of the task-based runtime OmpSs-2 and based on the LASs library, and shows improvements in execution time over other reference libraries.
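As a rough illustration of what a task-based linear algebra routine looks like under OmpSs-2's directive model, the sketch below expresses a blocked vector update as one task per block. The block size, function names, and the kernel itself are assumptions made for this example and are not taken from the LASs library; the `#pragma oss` dependence syntax follows the OmpSs-2 programming model, and a compiler without OmpSs-2 support will simply ignore the pragmas and run the code sequentially.

```c
/* Hedged sketch: a blocked vector update expressed as OmpSs-2 tasks.
 * Block size, names, and the kernel are illustrative assumptions;
 * they are not taken from the LASs library. Built without an OmpSs-2
 * toolchain, the pragmas are ignored and the code runs sequentially. */
#include <stdio.h>

#define N  (1 << 16)
#define BS (1 << 12)   /* block size: the tunable degree-of-parallelism knob */

static void axpy_block(double a, const double *x, double *y, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

void axpy_tasks(double a, const double *x, double *y)
{
    for (int b = 0; b < N; b += BS) {
        /* One task per block; in/inout clauses declare data dependences. */
        #pragma oss task in(x[b;BS]) inout(y[b;BS])
        axpy_block(a, &x[b], &y[b], BS);
    }
    #pragma oss taskwait   /* wait for all block tasks before using y */
}

int main(void)
{
    static double x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }
    axpy_tasks(0.5, x, y);
    printf("y[0] = %.1f\n", y[0]);   /* expect 2.5 */
    return 0;
}
```

The block size plays the role of the auto-tunable parameter: smaller blocks expose a higher degree of parallelism at the cost of more task-management overhead.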

20 citations

Journal ArticleDOI
01 Nov 2013
TL;DR: A GPU-based simulation kernel (gDES) supporting DES is presented, along with three algorithms for high efficiency, including one that increases the degree of parallelism while keeping the number of synchronizations unchanged.
Abstract: The graphic processing unit (GPU) can perform some large-scale simulations in an economical way. However, harnessing the power of a GPU for discrete event simulation (DES) is difficult because of the mismatch between the GPU's synchronous execution mode and DES's asynchronous time advance mechanism. In this paper, we present a GPU-based simulation kernel (gDES) to support DES and propose three algorithms to support high efficiency. Since both limited parallelism and redundant synchronization affect the performance of DES based on a GPU, we propose a breadth-expansion conservative time window algorithm to increase the degree of parallelism while keeping the number of synchronizations unchanged. By using the expansion method, it can import as many 'safe' events as possible. The irregular and dynamic requirement for storing the events leads to uneven and sparse memory usage, thereby causing waste of memory and unnecessary overhead. A memory management algorithm is proposed to store events in a balanced and compact way by using a lightweight stochastic method. When events processed by threads in a warp have different types, the performance of gDES decreases rapidly because of branch divergence. An event redistribution algorithm is proposed that reassigns events of the same type to neighboring threads to reduce the probability of branch divergence. We analyze the superiority of the proposed algorithms and gDES with a series of experiments. Compared to a CPU-based simulator on a multicore platform, gDES can achieve up to 11×, 5×, and 8× speedup in PHOLD, QUEUING NETWORK, and epidemic simulation, respectively.
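The conservative-time-window idea above can be sketched on the CPU side in a few lines: among the pending events, every event whose timestamp lies within a lookahead-bounded window of the minimum timestamp is 'safe' and can be executed in parallel before the next synchronization. The structure, names, and lookahead value below are illustrative assumptions, not the gDES implementation.

```c
/* Hedged sketch of a conservative time window for parallel DES.
 * All names (struct event, select_safe_events, LOOKAHEAD) are illustrative;
 * this only shows the windowing idea: every pending event with timestamp
 * < t_min + LOOKAHEAD cannot be affected by any other pending event, so
 * the whole batch is "safe" to execute in parallel before the next
 * synchronization. */
#include <stdio.h>
#include <stddef.h>
#include <float.h>

#define LOOKAHEAD 1.0   /* minimum delay between an event and any event it schedules */

struct event {
    double timestamp;
    int    type;        /* grouping by type is what event redistribution exploits */
};

/* Copy the safe events (those inside the window) into `safe`,
 * returning how many were selected. */
size_t select_safe_events(const struct event *pending, size_t n,
                          struct event *safe)
{
    double t_min = DBL_MAX;
    for (size_t i = 0; i < n; ++i)
        if (pending[i].timestamp < t_min)
            t_min = pending[i].timestamp;

    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (pending[i].timestamp < t_min + LOOKAHEAD)
            safe[count++] = pending[i];   /* these can run concurrently */
    return count;
}

int main(void)
{
    struct event pending[4] = { {0.2, 0}, {0.9, 1}, {1.6, 0}, {0.5, 1} };
    struct event safe[4];
    size_t n = select_safe_events(pending, 4, safe);
    /* t_min = 0.2 and LOOKAHEAD = 1.0, so 0.2, 0.9 and 0.5 are safe: n == 3 */
    printf("safe events: %zu\n", n);
    return 0;
}
```

Sorting the selected batch by event type before dispatching it to GPU threads is, in spirit, what the event-redistribution step does to keep threads within a warp on the same branch.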

20 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: This paper proposes a protocol to reconfigure the degree of parallelism in parallel SMR on-the-fly; experiments show the gains due to reconfiguration and shed some light on the behavior of parallel and reconfigurable SMR.
Abstract: State Machine Replication (SMR) is a well-known technique to implement fault-tolerant systems. In SMR, servers are replicated and client requests are deterministically executed in the same order by all replicas. To improve performance in multi-processor systems, some approaches have proposed to parallelize the execution of non-conflicting requests. Such approaches perform remarkably well in workloads dominated by non-conflicting requests. Conflicting requests introduce expensive synchronization and result in considerable performance loss. Current approaches to parallel SMR define the degree of parallelism statically. However, it is often difficult to predict the best degree of parallelism for a workload and workloads experience variations that change their best degree of parallelism. This paper proposes a protocol to reconfigure the degree of parallelism in parallel SMR on-the-fly. Experiments show the gains due to reconfiguration and shed some light on the behavior of parallel and reconfigurable SMR.
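As a minimal sketch of the knob being reconfigured, the fragment below hashes non-conflicting requests onto one of `dop` worker queues and lets `dop`, the degree of parallelism, be changed at a batch boundary. It is not the paper's protocol, which must additionally keep all replicas in agreement about when the change takes effect; all names here are illustrative assumptions, and the serialization of conflicting requests is omitted.

```c
/* Hedged sketch: what "degree of parallelism" controls in a parallel SMR
 * executor. Non-conflicting requests are hashed onto one of `dop` worker
 * queues; conflicting requests would be serialized (omitted here). The
 * reconfiguration protocol in the paper coordinates changes to `dop`
 * across replicas; this sketch only shows the local knob being changed
 * at a batch boundary. All names are illustrative assumptions. */
#include <stdio.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_WORKERS 64

static atomic_int dop = 4;   /* current degree of parallelism */

/* Map a non-conflicting request (identified by the key it touches)
 * to one of the currently active workers. */
int pick_worker(uint64_t key)
{
    int d = atomic_load(&dop);
    return (int)(key % (uint64_t)d);
}

/* Called between batches, once every replica has agreed on the new value,
 * so all replicas keep mapping requests to workers the same way. */
void reconfigure(int new_dop)
{
    if (new_dop >= 1 && new_dop <= MAX_WORKERS)
        atomic_store(&dop, new_dop);
}

int main(void)
{
    printf("request 42 -> worker %d\n", pick_worker(42));  /* dop = 4 */
    reconfigure(8);                    /* workload changed: widen the pool */
    printf("request 42 -> worker %d\n", pick_worker(42));  /* dop = 8 */
    return 0;
}
```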

20 citations

Book ChapterDOI
25 Aug 2014
TL;DR: This paper presents a performance assessment of a massively parallel and portable Lattice Boltzmann code based on the Open Computing Language (OpenCL) and the Message Passing Interface (MPI), along with techniques for moving data between accelerators that minimize communication-latency overheads.
Abstract: High performance computing increasingly relies on heterogeneous systems, based on multi-core CPUs tightly coupled to accelerators: GPUs or many-core systems. Programming heterogeneous systems raises new issues: reaching high sustained performance means that one must exploit parallelism at several levels; at the same time, the lack of a standard programming environment has an impact on code portability. This paper presents a performance assessment of a massively parallel and portable Lattice Boltzmann code, based on the Open Computing Language (OpenCL) and the Message Passing Interface (MPI). Exactly the same code runs on standard clusters of multi-core CPUs, as well as on hybrid clusters including accelerators. We consider a state-of-the-art Lattice Boltzmann model that accurately reproduces the thermo-hydrodynamics of a fluid in 2 dimensions. This algorithm has a regular structure suitable for accelerator architectures with a large degree of parallelism, but it is not straightforward to obtain a large fraction of the theoretically available performance. In this work we focus on portability of the code across several heterogeneous architectures while preserving performance, and also on techniques to move data between accelerators that minimize the overheads of communication latencies. We describe the organization of the code and present and analyze performance and scalability results on a cluster of nodes based on NVIDIA K20 GPUs and Intel Xeon Phi accelerators.
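The data-movement technique discussed above boils down to exchanging halo strips between neighbouring MPI ranks whose subdomains live on different accelerators. The fragment below sketches one such exchange with standard OpenCL and MPI calls; buffer offsets, the use of blocking transfers, and the function name are illustrative assumptions, and error handling is omitted. In a tuned code the boundary transfers would be overlapped with the bulk kernel to hide communication latency.

```c
/* Hedged sketch of a halo exchange between accelerators: copy boundary
 * strips from the device to the host, swap them with the two neighbouring
 * MPI ranks, and push the received strips back into the ghost regions.
 * The OpenCL and MPI calls are standard API functions; buffer layout,
 * offsets, and blocking transfers are illustrative, and error checking
 * is omitted for brevity. */
#include <CL/cl.h>
#include <mpi.h>
#include <stdlib.h>

void halo_exchange(cl_command_queue queue, cl_mem field, size_t halo_bytes,
                   size_t send_l_off, size_t send_r_off,  /* innermost boundary strips */
                   size_t recv_l_off, size_t recv_r_off,  /* ghost strips */
                   int left_rank, int right_rank, MPI_Comm comm)
{
    char *send_l = malloc(halo_bytes), *send_r = malloc(halo_bytes);
    char *recv_l = malloc(halo_bytes), *recv_r = malloc(halo_bytes);

    /* Device -> host: read the two boundary strips (blocking here for
     * simplicity; overlapping these transfers with the bulk kernel is
     * what actually hides the communication latency). */
    clEnqueueReadBuffer(queue, field, CL_TRUE, send_l_off, halo_bytes, send_l, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, field, CL_TRUE, send_r_off, halo_bytes, send_r, 0, NULL, NULL);

    /* Exchange strips with the two neighbours along the decomposed dimension. */
    MPI_Sendrecv(send_l, (int)halo_bytes, MPI_BYTE, left_rank,  0,
                 recv_r, (int)halo_bytes, MPI_BYTE, right_rank, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(send_r, (int)halo_bytes, MPI_BYTE, right_rank, 1,
                 recv_l, (int)halo_bytes, MPI_BYTE, left_rank,  1,
                 comm, MPI_STATUS_IGNORE);

    /* Host -> device: write the received halos into the ghost regions. */
    clEnqueueWriteBuffer(queue, field, CL_TRUE, recv_l_off, halo_bytes, recv_l, 0, NULL, NULL);
    clEnqueueWriteBuffer(queue, field, CL_TRUE, recv_r_off, halo_bytes, recv_r, 0, NULL, NULL);

    free(send_l); free(send_r); free(recv_l); free(recv_r);
}
```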

20 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations (85% related)
Scheduling (computing): 78.6K papers, 1.3M citations (83% related)
Network packet: 159.7K papers, 2.2M citations (80% related)
Web service: 57.6K papers, 989K citations (80% related)
Quality of service: 77.1K papers, 996.6K citations (79% related)
Performance Metrics
Number of papers in the topic in previous years:
2022: 1
2021: 47
2020: 48
2019: 52
2018: 70
2017: 75