Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.

Papers

Journal Article

Parallelizing query optimization on shared-nothing architectures

Immanuel Trummer¹, Christoph Koch¹

École Polytechnique Fédérale de Lausanne¹

01 May 2016

TL;DR: In this paper, the authors present algorithms for parallel query optimization in left-deep and bushy plan spaces, where each worker returns the optimal plan in its partition to the master which determines the globally optimal plan from the partition-optimal plans.

Abstract: Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query evaluation. We show how to parallelize query optimization at a massive scale. We present algorithms for parallel query optimization in left-deep and bushy plan spaces. At optimization start, we divide the plan space for a given query into partitions of equal size that are explored in parallel by worker nodes. At the end of optimization, each worker returns the optimal plan in its partition to the master, which determines the globally optimal plan from the partition-optimal plans. No synchronization or data exchange is required during the actual optimization phase. The amount of data sent over the network, at the start and at the end of optimization, as well as the complexity of serial steps within our algorithms, increases only linearly in the number of workers and in the query size. The time and space complexity of optimization within one partition decreases uniformly in the number of workers. We parallelize single- and multi-objective query optimization over a cluster with 100 nodes in our experiments, using more than 250 concurrent worker threads (Spark executors). Despite high network latency and task assignment overheads, parallelization yields speedups of up to one order of magnitude for large queries whose optimization takes minutes on a single node.
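
The partition-then-merge pattern described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the table cardinalities, the uniform selectivity, the toy cost function, and the round-robin split of left-deep join orders are all assumptions made for illustration, and each worker here still enumerates every permutation before filtering, so the sketch shows only the communication structure (no data exchange during the search, one result per worker at the end), not the reduction in per-worker work that the paper reports.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations

# Hypothetical table cardinalities and join selectivity (not from the paper).
CARD = {"A": 1000, "B": 10, "C": 200, "D": 50}
SELECTIVITY = 0.01


def plan_cost(order):
    """Toy cost model: sum of intermediate result sizes of a left-deep join order."""
    size = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        size = size * CARD[table] * SELECTIVITY
        cost += size
    return cost


def best_in_partition(worker_id, num_workers):
    """Return the cheapest plan among those assigned to this worker.

    Plans are assigned round-robin by enumeration index, so partitions are
    disjoint, nearly equal in size, and need no coordination during search.
    """
    plans = permutations(sorted(CARD))
    mine = (p for i, p in enumerate(plans) if i % num_workers == worker_id)
    return min(mine, key=plan_cost)


def optimize(num_workers):
    """Master: start the workers, then pick the best of the partition-optimal plans."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partition_best = list(
            pool.map(lambda w: best_in_partition(w, num_workers), range(num_workers))
        )
    return min(partition_best, key=plan_cost)


if __name__ == "__main__":
    plan = optimize(num_workers=4)
    print(plan, plan_cost(plan))
```

The sketch assumes there are at least as many plans as workers; with more workers than plans, some partitions would be empty and `min` would raise a `ValueError`.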

9 citations

Journal Article

Balanced scheduling of distributed workflow tasks based on clustering

Dongjin Yu¹, Ying Yuke¹, Lei Zhang¹, Chengfei Liu², Xiaoxiao Sun¹, Hongsheng Zheng¹

Hangzhou Dianzi University¹, Swinburne University of Technology²

08 Jul 2020 - Knowledge-Based Systems

TL;DR: This work proposes the Runtime Balance Clustering Algorithm (RBCA), which employs a backtracking approach to make the runtime of each cluster more balanced, and the Dependency Balance Clustering Algorithm (DBCA), which defines the dependency correlation to measure the similarity between tasks in terms of data dependencies.

Abstract: Distributed computing, such as the Cloud, provides traditional workflow applications with a completely new deployment architecture offering high performance and scalability. However, when executing workflow tasks in a distributed computing environment, significant scheduling overheads are generated. Task clustering is a key technology to optimize process execution. Unreasonable task clustering can lead to the problems of runtime and dependency imbalance, which reduce the degree of parallelism during task execution. In order to solve the problem of runtime imbalance, we propose the Runtime Balance Clustering Algorithm (RBCA), which employs a backtracking approach to make the runtime of each cluster more balanced. In addition, to address the problem of dependency imbalance, we also propose the Dependency Balance Clustering Algorithm (DBCA), which defines the dependency correlation to measure the similarity between tasks in terms of data dependencies. Tasks with high dependency correlation are clustered together so as to avoid dependency imbalance to the greatest extent possible. We conducted extensive experiments on the WorkflowSim platform and compared our algorithms with existing task clustering algorithms. The results show that RBCA and DBCA significantly reduce the execution time of the whole workflow.
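
To make the idea of runtime-balanced clustering concrete, the following Python sketch uses a plain backtracking search to assign task runtimes to a fixed number of clusters so that the largest cluster runtime is as small as possible. It is an illustrative stand-in, not RBCA itself: the abstract does not specify the algorithm's details, and the function name `balanced_clusters`, the objective (minimum largest cluster runtime), and the pruning rules are assumptions. Data dependencies between tasks, which DBCA addresses, are ignored here.

```python
def balanced_clusters(runtimes, k):
    """Assign task runtimes to k clusters, minimising the largest cluster runtime.

    Exhaustive backtracking with two prunings: skip clusters whose current load
    was already tried at this step (symmetric choices), and abandon branches
    that cannot beat the best solution found so far.
    """
    tasks = sorted(runtimes, reverse=True)  # placing long tasks first prunes earlier
    loads = [0] * k
    clusters = [[] for _ in range(k)]
    best = {"makespan": float("inf"), "assignment": None}

    def backtrack(i):
        if i == len(tasks):
            makespan = max(loads)
            if makespan < best["makespan"]:
                best["makespan"] = makespan
                best["assignment"] = [list(c) for c in clusters]
            return
        tried_loads = set()
        for c in range(k):
            if loads[c] in tried_loads:
                continue  # an equally loaded cluster leads to the same outcomes
            tried_loads.add(loads[c])
            if loads[c] + tasks[i] >= best["makespan"]:
                continue  # this branch cannot improve on the best known solution
            loads[c] += tasks[i]
            clusters[c].append(tasks[i])
            backtrack(i + 1)
            clusters[c].pop()
            loads[c] -= tasks[i]

    backtrack(0)
    return best["assignment"], best["makespan"]


if __name__ == "__main__":
    assignment, makespan = balanced_clusters([8, 7, 6, 5, 4], k=2)
    print(assignment, makespan)
```

The search is exponential in the number of tasks, so this form is only practical for small inputs; integer runtimes are used so that undoing a placement restores the cluster load exactly.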

9 citations

Proceedings Article

XL-STaGe: A cross-layer scalable tool for graph generation, evaluation and implementation

Pedro Campos¹, Nizar Dahir¹, Colin Bonney¹, Martin A. Trefzer¹, Andy M. Tyrrell¹, Gianluca Tempesti¹

University of York¹

01 Jul 2016

TL;DR: This paper presents XL-STaGe, a cross-layer tool for traffic-inclusive directed acyclic graph generation and implementation, which consists of a set of processes that generate the task-graphs as well as a detailed process model for each node in each graph.

Abstract: This paper presents XL-STaGe, a cross-layer tool for traffic-inclusive directed acyclic graph generation and implementation. In contrast to other graph-generation tools, which focus on high-level DAG models, XL-STaGe consists of a set of processes that generate the task-graphs as well as a detailed process model for each node in each graph. The tool is highly customizable, with many parameters that can be tuned to meet the user's requirements and control the topology, connection density, degree of parallelism and duration of the task-graph. Moreover, two use cases are presented: a high-level one, which illustrates the benefit of the developed tool in application mapping, and a circuit-level one, which verifies the accuracy of the XL-STaGe process models when implemented in hardware.

9 citations

Journal Article

Fast induced sorting suffixes on a multicore machine

Bin Lao¹, Ge Nong¹, Wai Hong Chan², Yi Pan³

Sun Yat-sen University¹, University of Hong Kong², Georgia State University³

01 Jul 2018 - The Journal of Supercomputing

TL;DR: This article presents pSAIS, a parallel variant of SAIS for a multicore machine treated as a shared-memory parallel model; pSAIS has a high degree of parallelism and achieves the best average time and space performance among all the parallel algorithms compared.

Abstract: Sorting the suffixes of an input string is a fundamental task in many applications such as data compression, genome alignment, and full-text search. The induced sorting (IS) method has been successfully applied to design a number of state-of-the-art suffix sorting algorithms. In particular, the SAIS algorithm designed by the IS method is not only linear in time but also fast in practice. However, parallelizing SAIS remains a challenge because the IS process in the algorithm is inherently sequential and is the performance bottleneck of the whole algorithm. This article presents our attempt to design a parallel variant of SAIS, called pSAIS, on a multicore machine, which is treated as a shared-memory parallel model. By a combined use of multithreading and pipelining, the inducing process is accelerated by fully utilizing the machine's parallel computing power. An experimental study is conducted to evaluate the performance of pSAIS against other existing parallel suffix sorting algorithms on a set of realistic inputs with varying sizes and alphabets. The experimental results show that our program for pSAIS has a high degree of parallelism and achieves the best average time and space performance among all the parallel algorithms compared. While pSAIS is designed for quickly building big suffix arrays on a multicore machine, our study may give some hints for extending the induced sorting method to GPUs for constructing small suffix arrays even faster.

9 citations

Journal Article

A quantitative study of parallel scientific applications with explicit communication

Robert Cypher¹, Alex Ho², Smaragda Konstantinidou¹, Paul Messina³

Johns Hopkins University¹, IBM², California Institute of Technology³

01 Mar 1996 - The Journal of Supercomputing

TL;DR: The goal is to quantify the floating point, memory, I/O, and communication requirements of highly parallel scientific applications that perform explicit communication and develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications.

Abstract: This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O, and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers of processors, we develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications. The contribution of our paper is that it provides quantitative data about real parallel scientific applications in a manner that is largely independent of the specific machine on which the application was run. Such data, which are clearly very valuable to an architect who is designing a new parallel computer, were not previously available. For example, the majority of research papers in interconnection networks have used simulated communication loads consisting of fixed-size messages. Our data, which show that using such simulated loads is unrealistic, can be used to generate more realistic communication loads.

9 citations

Performance

Metrics

Papers: 1,515

Citations: 27,447

No. of papers in the topic in previous years
Year	Papers
2022	1
2021	47
2020	48
2019	52
2018	70
2017	75
