Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
Journal ArticleDOI
Liang Chen
TL;DR: The key idea is that customer departure times are represented by longest-path distances in directed graphs instead of by the usual recursive equations, which leads to scalable algorithms with a high degree of parallelism that can be implemented on either MIMD or SIMD parallel computers.
Abstract: This paper presents several basic algorithms for the parallel simulation of G/G/1 queueing systems and certain networks of such systems. The coverage includes systems subject to manufacturing or communication blocking, or to loss of customers due to capacity constraints. The key idea is that customer departure times are represented by longest-path distances in directed graphs instead of by the usual recursive equations. This representation leads to scalable algorithms with a high degree of parallelism that can be implemented on either MIMD or SIMD parallel computers.
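
The longest-path view can be made concrete for a FIFO G/G/1 queue: the usual recursion D_i = max(D_{i-1}, A_i) + S_i unrolls into D_i = max_{j<=i} (A_j + S_j + ... + S_i), i.e. the longest source-to-i path in a chain-shaped directed graph whose edge weights are the service times. The sketch below (plain Python, with illustrative names A and S, and a quadratic enumeration rather than the paper's scalable parallel algorithm) only checks that the two formulations agree.

```python
import random

def departures_recursive(arrivals, services):
    """Classical recursion: D[i] = max(D[i-1], A[i]) + S[i]."""
    d, prev = [], 0.0
    for a, s in zip(arrivals, services):
        prev = max(prev, a) + s
        d.append(prev)
    return d

def departures_longest_path(arrivals, services):
    """Longest-path view: D[i] = max over j<=i of (A[j] + S[j] + ... + S[i]).

    Each term is the weight of a path source -> j -> j+1 -> ... -> i in a
    directed graph whose edges carry the service times; the max over paths
    is a (max, +) prefix computation, which is what admits parallelization.
    """
    d = []
    for i in range(len(arrivals)):
        best = max(arrivals[j] + sum(services[j:i + 1]) for j in range(i + 1))
        d.append(best)
    return d

if __name__ == "__main__":
    random.seed(0)
    A = sorted(random.uniform(0, 10) for _ in range(8))   # arrival times
    S = [random.uniform(0.5, 2.0) for _ in range(8)]      # service times
    assert all(abs(x - y) < 1e-9 for x, y in
               zip(departures_recursive(A, S), departures_longest_path(A, S)))
```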

12 citations

Journal ArticleDOI
TL;DR: In this article, the authors describe the algorithms implemented in FDPS to make efficient use of accelerator hardware such as GPGPUs; they construct a detailed performance model and find that the current implementation can achieve good performance even on systems with much smaller memory and communication bandwidth.
Abstract: In this paper, we describe the algorithms we implemented in FDPS to make efficient use of accelerator hardware such as GPGPUs. We have developed FDPS to make it possible for many researchers to develop their own high-performance parallel particle-based simulation programs without spending a large amount of time on parallelization and performance tuning. The basic idea of FDPS is to provide a high-performance implementation of parallel algorithms for particle-based simulations in a "generic" form, so that researchers can define their own particle data structure and interparticle interaction functions and supply them to FDPS. FDPS, compiled with the user-supplied data type and interaction function, provides all the functions necessary for parallelization, and using those functions researchers can write their programs as though they were writing a simple non-parallel program. It has been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the available degree of parallelism of the interaction function and that of the hardware. We have modified the interface of the user-provided interaction function so that accelerators are used more efficiently. We have also implemented new techniques that reduce the amount of work on the CPU side and the amount of communication between the CPU and the accelerator. We have measured the performance of N-body simulations on a system with an NVIDIA Volta GPGPU using FDPS, and the achieved performance is around 27% of the theoretical peak. We have constructed a detailed performance model and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth.
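
FDPS itself is a C++ framework, so the following Python sketch is only a hypothetical illustration of the pattern the abstract describes: the user supplies a particle data type and an interaction function, and the framework applies that function over all targets, batching the work so that there is enough parallelism to keep an accelerator busy (a thread pool stands in for the accelerator here). None of the names below are actual FDPS API calls.

```python
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor

# Hypothetical user-defined particle type (in FDPS this would be a C++ struct).
@dataclass
class Particle:
    x: float
    y: float
    z: float
    mass: float
    ax: float = 0.0
    ay: float = 0.0
    az: float = 0.0

def gravity_kernel(pi, sources, eps2=1e-6):
    """User-supplied interaction: accumulate softened gravity on pi from sources."""
    ax = ay = az = 0.0
    for pj in sources:
        dx, dy, dz = pj.x - pi.x, pj.y - pi.y, pj.z - pi.z
        r2 = dx * dx + dy * dy + dz * dz + eps2
        inv_r3 = pj.mass / (r2 * (r2 ** 0.5))
        ax += dx * inv_r3
        ay += dy * inv_r3
        az += dz * inv_r3
    pi.ax, pi.ay, pi.az = ax, ay, az

def calc_interactions(particles, kernel, n_workers=4):
    """Framework side: apply the user kernel to every target particle.

    A real framework batches many target/source groups so an accelerator
    sees enough work at once; the thread pool is only a stand-in for that.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(lambda p: kernel(p, particles), particles))

if __name__ == "__main__":
    ps = [Particle(i * 0.1, 0.0, 0.0, 1.0) for i in range(16)]
    calc_interactions(ps, gravity_kernel)
```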

12 citations

Proceedings ArticleDOI
09 Mar 2020
TL;DR: A runtime manager for firm real-time applications is presented that generates such mapping segments from pre-computed partial solutions and aims to minimize overall energy consumption without deadline violations.
Abstract: Modern embedded computing platforms consist of a large number of heterogeneous resources, which allows multiple applications to execute on a single device. The number of applications running on the system varies with time, and so does the amount of available resources. This has considerably increased the complexity of analysis and optimization algorithms for runtime mapping of firm real-time applications. To reduce the runtime overhead, researchers have proposed to pre-compute partial mappings at compile time and have the runtime efficiently compute the final mapping. However, most existing solutions only compute a fixed mapping for a given set of running applications, and the mapping is defined for the entire duration of the workload execution. In this work we allow applications to adapt to the amount of available resources by using mapping segments. This way, applications may switch between different configurations with varying degrees of parallelism. We present a runtime manager for firm real-time applications that generates such mapping segments from partial solutions and aims to minimize the overall energy consumption without deadline violations. The proposed algorithm outperforms state-of-the-art approaches on overall energy consumption by up to 13% while incurring an order of magnitude less scheduling overhead.
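
As a hypothetical illustration of the selection step such a runtime manager performs, the sketch below picks, among pre-computed mapping configurations with different degrees of parallelism, the one that fits the currently free resources, meets the deadline, and has the lowest energy (power times execution time). The Mapping fields and all numbers are invented for the example, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Mapping:
    cores: int           # degree of parallelism of this configuration
    exec_time_ms: float  # worst-case execution time under this mapping
    power_w: float       # average power while running

def pick_mapping(candidates, deadline_ms, free_cores) -> Optional[Mapping]:
    """Choose the feasible mapping (fits in the free cores, meets the deadline)
    with the lowest energy = power * time.  Returns None if nothing fits."""
    feasible = [m for m in candidates
                if m.cores <= free_cores and m.exec_time_ms <= deadline_ms]
    return min(feasible, key=lambda m: m.power_w * m.exec_time_ms, default=None)

if __name__ == "__main__":
    # Hypothetical pre-computed partial solutions for one application.
    configs = [Mapping(1, 9.0, 1.0), Mapping(2, 5.0, 2.2), Mapping(4, 3.0, 4.8)]
    print(pick_mapping(configs, deadline_ms=6.0, free_cores=2))  # -> the 2-core mapping
```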

12 citations

Proceedings ArticleDOI
02 Sep 1996
TL;DR: The proposed multithreaded architecture obtains a high degree of parallelism at the server side, allowing the disk controller and the network card controller to work in parallel, and achieves synchronized playback of the video stream at its precise rate on the client side.
Abstract: In this paper we present the design and implementation of a client/server multimedia architecture for supporting video-on-demand applications. We describe in detail the software architecture of the implementation along with the adopted buffering mechanism. The proposed multithreaded architecture obtains, on the one hand, a high degree of parallelism at the server side, allowing both the disk controller and the network card controller to work in parallel. On the other hand, at the client side, it achieves synchronized playback of the video stream at its precise rate, decoupling this process from the reception of data over the network. Additionally, we have derived, from an engineering perspective, some services that a real-time operating system should offer to satisfy the requirements of video-on-demand applications.
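
The client-side decoupling described above is essentially a bounded producer-consumer buffer: one thread fills the buffer as data arrives from the network, and a separate playback thread drains it at the stream's fixed rate. The Python sketch below is a minimal stand-in for that mechanism (the actual system is a native client/server implementation), with invented frame counts and timings.

```python
import queue
import threading
import time

def receiver(buf: queue.Queue, n_frames: int):
    """Network side: push frames into the buffer as they arrive (bursty)."""
    for i in range(n_frames):
        buf.put(f"frame-{i}")          # blocks when the buffer is full
        time.sleep(0.001 * (i % 3))    # simulate irregular network delivery

def player(buf: queue.Queue, n_frames: int, frame_period: float = 1 / 30):
    """Playback side: consume one frame per period, independent of arrival jitter."""
    for _ in range(n_frames):
        frame = buf.get()              # blocks if the buffer runs dry
        time.sleep(frame_period)       # present the frame at the stream's rate

if __name__ == "__main__":
    n = 60
    buf = queue.Queue(maxsize=16)      # bounded buffer decouples the two sides
    t1 = threading.Thread(target=receiver, args=(buf, n))
    t2 = threading.Thread(target=player, args=(buf, n))
    t1.start(); t2.start(); t1.join(); t2.join()
```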

11 citations

Journal ArticleDOI
TL;DR: To find the best preconditioner on a parallel machine, one has to consider the tradeoff between a fast convergence rate and a high degree of parallelism, as well as the architecture of the target parallel computer.
Abstract: This paper presents the results of the Connection Machine implementation of a number of preconditioners for the preconditioned conjugate gradient method. The preconditioners implemented include those based on the incomplete LU factorization, the modified incomplete LU factorization, and the symmetric successive overrelaxation, as well as several polynomial preconditioners and the hierarchical basis preconditioner. Results from numerical experiments show that both the degree of parallelism inherent in a preconditioner and its convergence-rate improvement play important roles in the overall execution-time performance on parallel computers. Factors that affect the performance of the preconditioners are also discussed. We conclude that to find the best preconditioner on a parallel machine, one has to consider the tradeoff between a fast convergence rate and a high degree of parallelism, as well as the architecture of the target parallel computer.
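
The parallelism-versus-convergence tradeoff shows up in the preconditioner application step of PCG: a Jacobi (diagonal) preconditioner is a single elementwise product and thus fully parallel, whereas ILU-type preconditioners usually cut the iteration count more but require sequential triangular solves. The NumPy sketch below shows generic PCG with a pluggable preconditioner; the test matrix and tolerances are illustrative, not taken from the paper.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradient with a user-supplied application
    M_inv(r) ~ M^{-1} r.  Swapping M_inv changes both the iteration count
    and how much parallelism each iteration exposes."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

if __name__ == "__main__":
    n = 200
    # 1-D Laplacian: a symmetric positive definite test matrix.
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    jacobi = 1.0 / np.diag(A)          # Jacobi preconditioner: fully parallel
    x = pcg(A, b, lambda r: jacobi * r)
    print(np.linalg.norm(A @ x - b))   # residual norm after convergence
```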

11 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations (85% related)
Scheduling (computing): 78.6K papers, 1.3M citations (83% related)
Network packet: 159.7K papers, 2.2M citations (80% related)
Web service: 57.6K papers, 989K citations (80% related)
Quality of service: 77.1K papers, 996.6K citations (79% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75