Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
Journal ArticleDOI
Liang Chen
TL;DR: The key idea is that customer departure times are represented by longest-path distances in directed graphs instead of by the usual recursive equations, which leads to scalable algorithms with a high degree of parallelism that can be implemented on either MIMD or SIMD parallel computers.
Abstract: This paper presents several basic algorithms for the parallel simulation of G/G/1 queueing systems and certain networks of such systems. The coverage includes systems subject to manufacturing or communication blocking, or to loss of customers due to capacity constraints. The key idea is that customer departure times are represented by longest-path distances in directed graphs instead of by the usual recursive equations. This representation leads to scalable algorithms with a high degree of parallelism that can be implemented on either MIMD or SIMD parallel computers.
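
The longest-path view can be made concrete for a FIFO G/G/1 queue: the usual recursion D_i = max(D_{i-1}, A_i) + S_i unrolls into D_i = max_{j<=i} (A_j + S_j + ... + S_i), i.e. the longest source-to-i path in a chain-shaped directed graph whose edge weights are the service times. The sketch below (plain Python, with illustrative names A and S, and a quadratic enumeration rather than the paper's scalable parallel algorithm) only checks that the two formulations agree.

```python
import random

def departures_recursive(arrivals, services):
    """Classical recursion: D[i] = max(D[i-1], A[i]) + S[i]."""
    d, prev = [], 0.0
    for a, s in zip(arrivals, services):
        prev = max(prev, a) + s
        d.append(prev)
    return d

def departures_longest_path(arrivals, services):
    """Longest-path view: D[i] = max over j<=i of (A[j] + S[j] + ... + S[i]).

    Each term is the weight of a path source -> j -> j+1 -> ... -> i in a
    directed graph whose edges carry the service times; the max over paths
    is a (max, +) prefix computation, which is what admits parallelization.
    """
    d = []
    for i in range(len(arrivals)):
        best = max(arrivals[j] + sum(services[j:i + 1]) for j in range(i + 1))
        d.append(best)
    return d

if __name__ == "__main__":
    random.seed(0)
    A = sorted(random.uniform(0, 10) for _ in range(8))   # arrival times
    S = [random.uniform(0.5, 2.0) for _ in range(8)]      # service times
    assert all(abs(x - y) < 1e-9 for x, y in
               zip(departures_recursive(A, S), departures_longest_path(A, S)))
```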

12 citations

Journal ArticleDOI
TL;DR: In this article, the authors describe the algorithms implemented in FDPS to make efficient use of accelerator hardware such as GPGPUs; they construct a detailed performance model and find that the current implementation can achieve good performance even on systems with much smaller memory and communication bandwidth.
Abstract: In this paper, we describe the algorithms we implemented in FDPS to make efficient use of accelerator hardware such as GPGPUs. We have developed FDPS to make it possible for many researchers to develop their own high-performance parallel particle-based simulation programs without spending a large amount of time on parallelization and performance tuning. The basic idea of FDPS is to provide a high-performance implementation of parallel algorithms for particle-based simulations in a "generic" form, so that researchers can define their own particle data structure and interparticle interaction functions and supply them to FDPS. FDPS, compiled with the user-supplied data type and interaction function, provides all the functions necessary for parallelization, and using those functions researchers can write their programs as though they were writing a simple non-parallel program. It has been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the available degree of parallelism of the interaction function and that of the hardware. We have modified the interface of the user-provided interaction function so that accelerators are used more efficiently. We have also implemented new techniques that reduce the amount of work on the CPU side and the amount of communication between the CPU and the accelerator. We have measured the performance of N-body simulations on a system with an NVIDIA Volta GPGPU using FDPS, and the achieved performance is around 27% of the theoretical peak. We have constructed a detailed performance model and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth.
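
FDPS itself is a C++ framework, so the following Python sketch is only a hypothetical illustration of the pattern the abstract describes: the user supplies a particle data type and an interaction function, and the framework applies that function over all targets, batching the work so that there is enough parallelism to keep an accelerator busy (a thread pool stands in for the accelerator here). None of the names below are actual FDPS API calls.

```python
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor

# Hypothetical user-defined particle type (in FDPS this would be a C++ struct).
@dataclass
class Particle:
    x: float
    y: float
    z: float
    mass: float
    ax: float = 0.0
    ay: float = 0.0
    az: float = 0.0

def gravity_kernel(pi, sources, eps2=1e-6):
    """User-supplied interaction: accumulate softened gravity on pi from sources."""
    ax = ay = az = 0.0
    for pj in sources:
        dx, dy, dz = pj.x - pi.x, pj.y - pi.y, pj.z - pi.z
        r2 = dx * dx + dy * dy + dz * dz + eps2
        inv_r3 = pj.mass / (r2 * (r2 ** 0.5))
        ax += dx * inv_r3
        ay += dy * inv_r3
        az += dz * inv_r3
    pi.ax, pi.ay, pi.az = ax, ay, az

def calc_interactions(particles, kernel, n_workers=4):
    """Framework side: apply the user kernel to every target particle.

    A real framework batches many target/source groups so an accelerator
    sees enough work at once; the thread pool is only a stand-in for that.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(lambda p: kernel(p, particles), particles))

if __name__ == "__main__":
    ps = [Particle(i * 0.1, 0.0, 0.0, 1.0) for i in range(16)]
    calc_interactions(ps, gravity_kernel)
```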

12 citations

Proceedings ArticleDOI
09 Mar 2020
TL;DR: A runtime manager for firm real-time applications is presented that generates such mapping segments from pre-computed partial solutions and aims to minimize overall energy consumption without deadline violations.
Abstract: Modern embedded computing platforms consist of a large number of heterogeneous resources, which allows multiple applications to execute on a single device. The number of applications running on the system varies with time, and so does the amount of available resources. This has considerably increased the complexity of analysis and optimization algorithms for runtime mapping of firm real-time applications. To reduce the runtime overhead, researchers have proposed to pre-compute partial mappings at compile time and have the runtime efficiently compute the final mapping. However, most existing solutions only compute a fixed mapping for a given set of running applications, and the mapping is defined for the entire duration of the workload execution. In this work we allow applications to adapt to the amount of available resources by using mapping segments. This way, applications may switch between different configurations with varying degrees of parallelism. We present a runtime manager for firm real-time applications that generates such mapping segments from partial solutions and aims to minimize the overall energy consumption without deadline violations. The proposed algorithm outperforms state-of-the-art approaches on overall energy consumption by up to 13% while incurring an order of magnitude less scheduling overhead.
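
As a hypothetical illustration of the selection step such a runtime manager performs, the sketch below picks, among pre-computed mapping configurations with different degrees of parallelism, the one that fits the currently free resources, meets the deadline, and has the lowest energy (power times execution time). The Mapping fields and all numbers are invented for the example, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Mapping:
    cores: int           # degree of parallelism of this configuration
    exec_time_ms: float  # worst-case execution time under this mapping
    power_w: float       # average power while running

def pick_mapping(candidates, deadline_ms, free_cores) -> Optional[Mapping]:
    """Choose the feasible mapping (fits in the free cores, meets the deadline)
    with the lowest energy = power * time.  Returns None if nothing fits."""
    feasible = [m for m in candidates
                if m.cores <= free_cores and m.exec_time_ms <= deadline_ms]
    return min(feasible, key=lambda m: m.power_w * m.exec_time_ms, default=None)

if __name__ == "__main__":
    # Hypothetical pre-computed partial solutions for one application.
    configs = [Mapping(1, 9.0, 1.0), Mapping(2, 5.0, 2.2), Mapping(4, 3.0, 4.8)]
    print(pick_mapping(configs, deadline_ms=6.0, free_cores=2))  # -> the 2-core mapping
```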

12 citations

Proceedings ArticleDOI
02 Sep 1996
TL;DR: The proposed multithreaded architecture obtains a high degree of parallelism at the server side, allowing the disk controller and the network card controller to work in parallel, and achieves synchronized playback of the video stream at its precise rate on the client side.
Abstract: In this paper we present the design and implementation of a client/server multimedia architecture for supporting video-on-demand applications. We describe in detail the software architecture of the implementation along with the adopted buffering mechanism. The proposed multithreaded architecture obtains, on the one hand, a high degree of parallelism at the server side, allowing both the disk controller and the network card controller to work in parallel. On the other hand, at the client side, it achieves synchronized playback of the video stream at its precise rate, decoupling this process from the reception of data over the network. Additionally, we have derived, from an engineering perspective, some services that a real-time operating system should offer to satisfy the requirements of video-on-demand applications.
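
The client-side decoupling described above is essentially a bounded producer-consumer buffer: one thread fills the buffer as data arrives from the network, and a separate playback thread drains it at the stream's fixed rate. The Python sketch below is a minimal stand-in for that mechanism (the actual system is a native client/server implementation), with invented frame counts and timings.

```python
import queue
import threading
import time

def receiver(buf: queue.Queue, n_frames: int):
    """Network side: push frames into the buffer as they arrive (bursty)."""
    for i in range(n_frames):
        buf.put(f"frame-{i}")          # blocks when the buffer is full
        time.sleep(0.001 * (i % 3))    # simulate irregular network delivery

def player(buf: queue.Queue, n_frames: int, frame_period: float = 1 / 30):
    """Playback side: consume one frame per period, independent of arrival jitter."""
    for _ in range(n_frames):
        frame = buf.get()              # blocks if the buffer runs dry
        time.sleep(frame_period)       # present the frame at the stream's rate

if __name__ == "__main__":
    n = 60
    buf = queue.Queue(maxsize=16)      # bounded buffer decouples the two sides
    t1 = threading.Thread(target=receiver, args=(buf, n))
    t2 = threading.Thread(target=player, args=(buf, n))
    t1.start(); t2.start(); t1.join(); t2.join()
```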

11 citations

Journal ArticleDOI
TL;DR: To find the best preconditioner on a parallel machine, one has to consider the tradeoff between a fast convergence rate and a high degree of parallelism, as well as the architecture of the target parallel computer.
Abstract: This paper presents the results of the Connection Machine implementation of a number of preconditioners for the preconditioned conjugate gradient method. The preconditioners implemented include those based on the incomplete LU factorization, the modified incomplete LU factorization, and the symmetric successive overrelaxation, as well as several polynomial preconditioners and the hierarchical basis preconditioner. Results from numerical experiments show that both the degree of parallelism inherent in a preconditioner and its convergence-rate improvement play important roles in the overall execution-time performance on parallel computers. Factors that affect the performance of the preconditioners are also discussed. We conclude that to find the best preconditioner on a parallel machine, one has to consider the tradeoff between a fast convergence rate and a high degree of parallelism, as well as the architecture of the target parallel computer.
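
The parallelism-versus-convergence tradeoff shows up in the preconditioner application step of PCG: a Jacobi (diagonal) preconditioner is a single elementwise product and thus fully parallel, whereas ILU-type preconditioners usually cut the iteration count more but require sequential triangular solves. The NumPy sketch below shows generic PCG with a pluggable preconditioner; the test matrix and tolerances are illustrative, not taken from the paper.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradient with a user-supplied application
    M_inv(r) ~ M^{-1} r.  Swapping M_inv changes both the iteration count
    and how much parallelism each iteration exposes."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

if __name__ == "__main__":
    n = 200
    # 1-D Laplacian: a symmetric positive definite test matrix.
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    jacobi = 1.0 / np.diag(A)          # Jacobi preconditioner: fully parallel
    x = pcg(A, b, lambda r: jacobi * r)
    print(np.linalg.norm(A @ x - b))   # residual norm after convergence
```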

11 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations (85% related)
Scheduling (computing): 78.6K papers, 1.3M citations (83% related)
Network packet: 159.7K papers, 2.2M citations (80% related)
Web service: 57.6K papers, 989K citations (80% related)
Quality of service: 77.1K papers, 996.6K citations (79% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75