Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime of the topic, 1,515 publications have been published, receiving 25,546 citations.


Papers
Book ChapterDOI
23 Sep 1997
TL;DR: A theory and proof rules for the refinement of action systems that communicate via remote procedures based on the data refinement approach are developed and the atomicity refinement of actions is studied.
Abstract: Recently the action systems formalism for parallel and distributed systems has been extended with the procedure mechanism. This gives us a very general framework for describing different communication paradigms for action systems, e.g. remote procedure calls. Action systems come with a design methodology based on the refinement calculus. Data refinement is a powerful technique for refining action systems. In this paper we develop a theory and proof rules for the refinement of action systems that communicate via remote procedures, based on the data refinement approach. The proof rules we develop are compositional, so that modular refinement of action systems is supported. As an example we study in particular the atomicity refinement of actions. This is an important refinement strategy, as it potentially increases the degree of parallelism in an action system.

14 citations
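
The abstract's key claim, that splitting a coarse atomic action into finer-grained ones can raise the degree of parallelism, can be illustrated outside the action-systems formalism. Below is a loose Python analogy, not the paper's method: the classes, locks, and two-counter workload are invented for illustration. Coarse atomicity serializes all updates behind one lock; refining atomicity to per-field locks lets independent updates proceed concurrently.

    import threading

    # Coarse atomicity: one lock guards both counters, so any two
    # updates are serialized even when they touch different fields.
    class CoarseAccount:
        def __init__(self):
            self.lock = threading.Lock()
            self.deposits = 0
            self.withdrawals = 0

        def deposit(self, n):
            with self.lock:
                self.deposits += n

        def withdraw(self, n):
            with self.lock:
                self.withdrawals += n

    # Refined atomicity: independent fields get independent locks, so a
    # deposit and a withdrawal can now run in parallel. The observable
    # behaviour is unchanged, but the degree of parallelism has grown.
    class RefinedAccount:
        def __init__(self):
            self.dep_lock = threading.Lock()
            self.wd_lock = threading.Lock()
            self.deposits = 0
            self.withdrawals = 0

        def deposit(self, n):
            with self.dep_lock:
                self.deposits += n

        def withdraw(self, n):
            with self.wd_lock:
                self.withdrawals += n

Proving that such a refinement preserves the coarse-grained behaviour is exactly what the paper's compositional proof rules address.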

Proceedings ArticleDOI
08 Apr 2013
TL;DR: This paper defines a novel cost model for intra-node parallel dataflow programs with user-defined functions and introduces different batching schemes to reduce the number of output buffers.
Abstract: The performance of intra-node parallel dataflow programs in the context of streaming systems depends mainly on two parameters: the degree of parallelism for each node of the dataflow program and the batching size for each node. In state-of-the-art systems the user has to specify those values manually, and manual tuning of both parameters is necessary to get good performance. However, this process is difficult and time-consuming, even for experts. In this paper we introduce an optimization algorithm that optimizes both parameters automatically. We define a novel cost model for intra-node parallel dataflow programs with user-defined functions. Furthermore, we introduce different batching schemes to reduce the number of output buffers, i.e., main-memory consumption. We implemented our approach on top of the open-source system Storm and ran experiments with different workloads. Our results show a throughput improvement of more than one order of magnitude, while the optimization time is less than a second.

13 citations
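
The abstract does not spell out the optimization algorithm itself, but the shape of the problem, jointly picking a degree of parallelism and a batch size per operator against a cost model, can be sketched. Everything in the sketch below (the cost function, its coefficients, and the search ranges) is an invented stand-in, not the paper's actual model:

    from itertools import product

    def cost(parallelism, batch):
        """Toy per-operator cost model (invented, not the paper's):
        useful work shrinks with parallelism, per-tuple overhead is
        amortized by batching, while coordination and buffer costs grow."""
        work = 1000.0 / parallelism           # ideal division of useful work
        overhead = 5.0 / batch                # per-tuple overhead, amortized
        coordination = 0.5 * parallelism      # more workers, more coordination
        buffers = 0.01 * batch * parallelism  # output buffers cost memory/time
        return work + overhead + coordination + buffers

    # Exhaustive search over both knobs; the paper's contribution is making
    # this choice automatic instead of leaving the tuning to the user.
    best = min(product(range(1, 65), (1, 10, 100, 1000)),
               key=lambda pb: cost(*pb))
    print("best (parallelism, batch):", best)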

Proceedings ArticleDOI
20 Jul 2015
TL;DR: This paper looks at the interplay between data organization and data layout, data-communication options, and overlapping of communication and computation in Lattice Boltzmann Methods, considering as a use case a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation-of-state of a perfect gas.
Abstract: An increasingly large number of scientific applications run on large clusters based on GPU systems. In most cases the large-scale parallelism of the applications uses MPI, widely recognized as the de-facto standard for building parallel applications, while several programming languages are used to express the parallelism available in the application and map it onto the parallel resources available on GPUs. Regular grids and stencil codes are used in a subset of these applications, often corresponding to computational “Grand Challenges”. One such class of applications are Lattice Boltzmann Methods (LB) used in computational fluid dynamics. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism like GPUs. Scalability of these applications on large clusters requires a careful design of processor-to-processor data communications, exploiting all possibilities to overlap communication and computation. This paper looks at these issues, considering as a use case a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation-of-state of a perfect gas. We study in detail the interplay between data organization and data layout, data-communication options, and overlapping of communication and computation. We derive partial models of some performance features and compare them with experimental results for production-grade codes that we run on a large cluster of GPUs.

13 citations
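
The central technique, overlapping halo-exchange communication with computation on interior lattice sites, can be sketched independently of the paper's production code. The sketch below assumes mpi4py and NumPy and a 1-D domain decomposition; the interior and boundary updates are placeholders, not the paper's LB kernels:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    lattice = np.random.rand(1024, 1024)   # local slab of the 2-D lattice
    recv_l, recv_r = np.empty(1024), np.empty(1024)

    # Post the non-blocking halo exchange first ...
    reqs = [comm.Isend(np.ascontiguousarray(lattice[0]), dest=left, tag=0),
            comm.Isend(np.ascontiguousarray(lattice[-1]), dest=right, tag=1),
            comm.Irecv(recv_l, source=left, tag=1),
            comm.Irecv(recv_r, source=right, tag=0)]

    # ... then compute on interior sites that need no remote data,
    # hiding the communication latency behind useful work.
    lattice[1:-1] *= 0.99                  # stand-in for the interior update

    MPI.Request.Waitall(reqs)              # halos have now arrived
    lattice[0] += 0.01 * recv_l            # boundary rows need remote data
    lattice[-1] += 0.01 * recv_r

On GPU clusters the same pattern holds, with the interior update running as GPU kernels and, where available, CUDA-aware MPI moving device buffers directly.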

Journal ArticleDOI
TL;DR: This work designs a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs.
Abstract: Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task-assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Our approach demonstrates performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open-source (e.g., StarPU) libraries in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs. Copyright © 2014 John Wiley & Sons, Ltd.

13 citations
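
Tile algorithms expose parallelism by cutting a matrix into small square tiles and expressing the factorization as a graph of per-tile tasks. The sketch below is a plain sequential tiled Cholesky in NumPy, not the paper's heterogeneous runtime; the comments mark which groups of tasks are mutually independent, which is where the degree of parallelism comes from:

    import numpy as np

    def tiled_cholesky(A, b):
        """Right-looking tiled Cholesky of an n x n SPD matrix, tile size b.
        Executed sequentially here; each loop body is one task, and tasks
        within the same inner loop are independent, so a dynamic scheduler
        (like the paper's) can run them in parallel across CPUs and GPUs."""
        n = A.shape[0]
        nt = n // b
        T = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(nt)]
             for i in range(nt)]
        for k in range(nt):
            T[k][k] = np.linalg.cholesky(T[k][k])               # POTRF task
            for i in range(k + 1, nt):                          # TRSM tasks: independent
                T[i][k] = np.linalg.solve(T[k][k], T[i][k].T).T
            for i in range(k + 1, nt):                          # SYRK/GEMM tasks: independent
                for j in range(k + 1, i + 1):
                    T[i][j] -= T[i][k] @ T[j][k].T
        L = np.zeros_like(A)
        for i in range(nt):
            for j in range(i + 1):
                L[i*b:(i+1)*b, j*b:(j+1)*b] = T[i][j]
        return L

    M = np.random.rand(8, 8)
    A = M @ M.T + 8 * np.eye(8)       # symmetric positive definite test matrix
    L = tiled_cholesky(A, 2)
    print(np.allclose(L @ L.T, A))    # True: the factorization is correct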

Journal ArticleDOI
TL;DR: A simple rule of thumb is given for choosing the degree of parallelism that maximizes the throughput of the hybrid hash join algorithm, and, for the Grace join, asymptotic conditions on the amount of skew under which a limit on parallelism exists are established.

13 citations
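
The paper's actual rule of thumb is not reproduced in this listing. As a generic illustration of why an optimal degree of parallelism exists at all, the toy model below (all coefficients invented) lets skew put a floor under the largest partition and charges each extra worker a fixed overhead, so throughput first rises with parallelism and then falls:

    def throughput(p, work=1000.0, skew=0.2, overhead=2.0):
        """Toy model, not the paper's: the slowest partition receives a
        skewed share of the work, and every worker adds fixed overhead."""
        slowest_share = (1.0 - skew) / p + skew   # skew bounds the split
        elapsed = work * slowest_share + overhead * p
        return work / elapsed

    best_p = max(range(1, 129), key=throughput)
    print("throughput peaks at degree of parallelism p =", best_p)

As skew grows, the optimum shifts toward smaller p, which is the qualitative behaviour behind a limit on useful parallelism.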


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75