Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have appeared within this topic, receiving 25,546 citations.


Papers
Journal ArticleDOI
01 May 2016
TL;DR: In this paper, the authors present algorithms for parallel query optimization in left-deep and bushy plan spaces, where each worker returns the optimal plan in its partition to the master which determines the globally optimal plan from the partition-optimal plans.
Abstract: Data processing systems offer an ever-increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query evaluation. We show how to parallelize query optimization at a massive scale. We present algorithms for parallel query optimization in left-deep and bushy plan spaces. At optimization start, we divide the plan space for a given query into partitions of equal size that are explored in parallel by worker nodes. At the end of optimization, each worker returns the optimal plan in its partition to the master, which determines the globally optimal plan from the partition-optimal plans. No synchronization or data exchange is required during the actual optimization phase. The amount of data sent over the network, at the start and at the end of optimization, as well as the complexity of serial steps within our algorithms, increases only linearly in the number of workers and in the query size. The time and space complexity of optimization within one partition decreases uniformly in the number of workers. We parallelize single- and multi-objective query optimization over a cluster with 100 nodes in our experiments, using more than 250 concurrent worker threads (Spark executors). Despite high network latency and task assignment overheads, parallelization yields speedups of up to one order of magnitude for large queries whose optimization takes minutes on a single node.

9 citations
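The partition-then-merge scheme described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy cost model (`plan_cost`), the round-robin split of left-deep join orders, and the names `worker` and `optimize` are hypothetical and are not the paper's partitioning method. Workers run sequentially in this sketch; since they share no state, they could equally be dispatched to separate processes or nodes.

```python
from itertools import permutations


def plan_cost(order, card, sel):
    """Toy cost model (assumption): sum of intermediate result sizes
    of a left-deep join order."""
    size = card[order[0]]
    joined = [order[0]]
    total = 0.0
    for table in order[1:]:
        size *= card[table]
        for other in joined:
            # Apply a join selectivity if a predicate links the two tables.
            size *= sel.get(frozenset((other, table)), 1.0)
        joined.append(table)
        total += size
    return total


def worker(partition_id, n_workers, tables, card, sel):
    """Explore only the join orders assigned to this partition and
    return the partition-optimal (cost, order) pair, or None if empty."""
    best = None
    for i, order in enumerate(permutations(tables)):
        if i % n_workers != partition_id:  # round-robin split (assumption)
            continue
        candidate = (plan_cost(order, card, sel), order)
        if best is None or candidate < best:
            best = candidate
    return best


def optimize(tables, card, sel, n_workers):
    """Master step: collect partition-optimal plans, keep the global best."""
    results = [worker(p, n_workers, tables, card, sel) for p in range(n_workers)]
    return min(r for r in results if r is not None)


if __name__ == "__main__":
    tables = ["A", "B", "C", "D"]
    card = {"A": 1000, "B": 10, "C": 500, "D": 50}
    sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
    print(optimize(tables, card, sel, n_workers=4))
```

Because each worker only reports its best plan and the master takes a minimum, the result does not depend on the number of workers, and no communication is needed while the partitions are being explored.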

Journal ArticleDOI
TL;DR: This work proposes the Runtime Balance Clustering Algorithm (RBCA), which employs a backtracking approach to make the runtime of each cluster more balanced, and the Dependency Balance Clustering Algorithm (DBCA), which defines the dependency correlation to measure the similarity between tasks in terms of data dependencies.
Abstract: Distributed computing, such as the Cloud, provides traditional workflow applications with a completely new deployment architecture offering high performance and scalability. However, when executing the workflow tasks in a distributed computing environment, significant scheduling overheads are generated. Task clustering is a key technology to optimize process execution. Unreasonable task clustering can lead to the problems of runtime and dependency imbalance, which reduce the degree of parallelism during task execution. In order to solve the problem of runtime imbalance, we propose the Runtime Balance Clustering Algorithm (RBCA), which employs a backtracking approach to make the runtime of each cluster more balanced. In addition, to address the problem of dependency imbalance, we also propose the Dependency Balance Clustering Algorithm (DBCA), which defines the dependency correlation to measure the similarity between tasks in terms of data dependencies. The tasks with high dependency correlation are clustered together so as to avoid dependency imbalance to the greatest extent possible. We conducted extensive experiments on the WorkflowSim platform and compared our algorithms with the existing task clustering algorithms. The results show that RBCA and DBCA significantly reduce the execution time of the whole workflow.

9 citations
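To make the runtime-imbalance problem in the abstract above concrete, here is a small sketch that groups independent tasks into a fixed number of clusters with roughly equal total runtime. It is not RBCA: the paper uses backtracking, whereas this sketch swaps in the simpler greedy longest-processing-time heuristic. The function name `balance_clusters` and the task names are hypothetical.

```python
import heapq


def balance_clusters(runtimes, k):
    """Greedy longest-processing-time clustering (a stand-in, not RBCA).

    runtimes: dict mapping task name -> estimated runtime
    k: number of clusters
    Returns (clusters, loads), where loads[i] is the total runtime of clusters[i].
    """
    clusters = [[] for _ in range(k)]
    # Min-heap of (current load, cluster index): the lightest cluster is on top.
    heap = [(0.0, i) for i in range(k)]
    # Place the longest tasks first; ties are broken by task name.
    for task, runtime in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        clusters[idx].append(task)
        heapq.heappush(heap, (load + runtime, idx))
    loads = [sum(runtimes[t] for t in cluster) for cluster in clusters]
    return clusters, loads


if __name__ == "__main__":
    runtimes = {"t1": 8, "t2": 7, "t3": 6, "t4": 5, "t5": 4}
    clusters, loads = balance_clusters(runtimes, k=2)
    print(clusters, loads)
```

The greedy heuristic is not guaranteed to find the most balanced grouping, which is one reason a search-based approach such as backtracking can do better at a higher computational cost.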

Proceedings ArticleDOI
01 Jul 2016
TL;DR: This paper presents XL-STaGe, a cross-layer tool for traffic-inclusive directed acyclic graph generation and implementation, which consists of a set of processes that generate the task-graphs as well as a detailed process model for each node in each graph.
Abstract: This paper presents XL-STaGe, a cross-layer tool for traffic-inclusive directed acyclic graph generation and implementation. In contrast to other graph-generation tools, which focus on high-level DAG models, XL-STaGe consists of a set of processes that generate the task-graphs as well as a detailed process model for each node in each graph. The tool is highly customizable, with many parameters that can be tuned to meet the user's requirements to control the topology, connection density, degree of parallelism and duration of the task-graph. Moreover, two use cases are presented: a high-level one, which illustrates the benefit of the developed tool in application mapping, and a circuit-level one, which verifies the accuracy of the XL-STaGe process models when implemented in hardware.

9 citations
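As a rough illustration of how parameters such as connection density and degree of parallelism can shape a generated task-graph, the sketch below builds a random layered DAG. It is a hypothetical generator, not XL-STaGe: the function `layered_dag` and its parameters `n_layers`, `width`, `density`, and `seed` are assumptions made for this example. The layer width bounds how many tasks can run concurrently.

```python
import random


def layered_dag(n_layers, width, density, seed=0):
    """Generate a random layered DAG (illustrative only).

    n_layers: number of layers (a proxy for graph depth)
    width:    maximum nodes per layer (upper bound on parallelism)
    density:  probability of an edge between nodes in adjacent layers
    Returns (layers, edges), with edges as (source, target) node ids.
    """
    rng = random.Random(seed)
    layers, edges, next_id = [], [], 0
    for _ in range(n_layers):
        size = rng.randint(1, width)
        layers.append(list(range(next_id, next_id + size)))
        next_id += size
    # Edges only go from one layer to the next, so the graph is acyclic.
    for upper, lower in zip(layers, layers[1:]):
        for v in lower:
            preds = [u for u in upper if rng.random() < density]
            if not preds:
                # Guarantee every non-root node has at least one predecessor.
                preds = [rng.choice(upper)]
            edges.extend((u, v) for u in preds)
    return layers, edges


if __name__ == "__main__":
    layers, edges = layered_dag(n_layers=4, width=3, density=0.5, seed=42)
    print(layers)
    print(edges)
```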

Journal ArticleDOI
TL;DR: This article presents pSAIS, a parallel variant of SAIS for a multicore machine treated as a shared-memory parallel model, which has a high degree of parallelism and achieves the best average time and space performance among the parallel algorithms compared.
Abstract: Sorting the suffixes of an input string is a fundamental task in many applications such as data compression, genome alignment, and full-text search. The induced sorting (IS) method has been successfully applied to design a number of state-of-the-art suffix sorting algorithms. In particular, the SAIS algorithm designed by the IS method is not only linear in time but also fast in practice. However, the parallelization of SAIS remains a challenge because the IS process in the algorithm is inherently sequential and is the performance bottleneck of the whole algorithm. This article presents our attempt to design a parallel variant of SAIS, called pSAIS, on a multicore machine, which is treated as a shared-memory parallel model. By a combined use of multithreading and pipelining, the inducing process is accelerated by fully utilizing the machine's parallel computing power. An experimental study is conducted to evaluate the performance of pSAIS against the other existing parallel suffix sorting algorithms, on a set of realistic inputs with varying sizes and alphabets. The experimental results show that our program for pSAIS has a high degree of parallelism and achieves the best average time and space performance among all the parallel algorithms in comparison. While pSAIS is designed for quickly building big suffix arrays on a multicore machine, our study may give some hints for extending the induced sorting method to GPUs for constructing small suffix arrays even faster.

9 citations
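For readers unfamiliar with the task, the sketch below shows what suffix sorting produces. It is a deliberately naive construction that sorts suffixes by direct comparison, and it is neither SAIS nor pSAIS: it does not use induced sorting and is far slower than the linear-time algorithms discussed above. The function name `suffix_array` is chosen for this example.

```python
def suffix_array(s):
    """Naive suffix array: starting positions of all suffixes of s,
    listed in lexicographic order of the suffixes."""
    return sorted(range(len(s)), key=lambda i: s[i:])


if __name__ == "__main__":
    text = "banana"
    for i in suffix_array(text):
        print(i, text[i:])
```

For "banana" the suffixes in sorted order are "a", "ana", "anana", "banana", "na", "nana", so the suffix array is [5, 3, 1, 0, 4, 2].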

Journal ArticleDOI
TL;DR: The goal is to quantify the floating point, memory, I/O, and communication requirements of highly parallel scientific applications that perform explicit communication and develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications.
Abstract: This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O, and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers of processors, we develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications. The contribution of our paper is that it provides quantitative data about real parallel scientific applications in a manner that is largely independent of the specific machine on which the application was run. Such data, which are clearly very valuable to an architect who is designing a new parallel computer, were not previously available. For example, the majority of research papers in interconnection networks have used simulated communication loads consisting of fixed-size messages. Our data, which show that using such simulated loads is unrealistic, can be used to generate more realistic communication loads.

9 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75