Topic
Degree of parallelism
About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published on this topic, receiving 25,546 citations.
Papers published on a yearly basis
Papers
07 Dec 2011
TL;DR: This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP), and proposes a methodology that can adaptively adjust the thread-level parallelism in mapping a NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages.
Abstract: Dynamic programming (DP) is an important computational method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. In general, DP is classified into four categories based on the characteristics of the optimization equation. Because applications that are classified in the same category of DP have similar program behavior, the research community has sought to propose general solutions for parallelizing each category of DP. However, most existing studies focus on running DP on CPU-based parallel systems rather than on accelerating DP algorithms on the graphics processing unit (GPU). This paper presents the GPU acceleration of an important category of DP problems called nonserial polyadic dynamic programming (NPDP). In NPDP applications, the degree of parallelism varies significantly in different stages of computation, making it difficult to fully utilize the compute power of hundreds of processing cores in a GPU. To address this challenge, we propose a methodology that can adaptively adjust the thread-level parallelism in mapping a NPDP problem onto the GPU, thus providing sufficient and steady degrees of parallelism across different compute stages. We realize our approach in a real-world NPDP application -- the optimal matrix parenthesization problem. Experimental results demonstrate our method can achieve a speedup of 13.40 over the previously published GPU algorithm.
20 citations
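The varying degree of parallelism the paper describes can be seen directly in the wavefront structure of NPDP. A minimal sketch (plain Python, not the paper's GPU code) using the optimal matrix parenthesization problem: all cells on the same anti-diagonal (same chain length) are independent, so the number of concurrently computable cells shrinks from n-1 down to 1 as the stages progress, which is exactly the imbalance the proposed adaptive thread mapping addresses.

```python
# Sketch of the NPDP wavefront in the matrix parenthesization problem.
# Cells with the same chain length form one "stage" and are independent;
# the stage size (the available degree of parallelism) shrinks each stage.

def matrix_chain_cost(dims):
    """dims[i] x dims[i+1] are the dimensions of matrix i; n matrices total."""
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # one stage per chain length
        stage = [(i, i + length - 1) for i in range(n - length + 1)]
        # len(stage) == n - length + 1: the degree of parallelism here
        for i, j in stage:                  # each cell is independent
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]

print(matrix_chain_cost([10, 30, 5, 60]))  # classic textbook instance: 4500
```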
TL;DR: The method developed, called the distributed parallel integration evaluation model (DPIEM), models the workflow in the distributed enterprise based on three integration scenarios and minimizes the integrated tasks' total cost by adding as many parallel servers per task as possible.
Abstract: Distribution has become an increasingly common characteristic of modern service and production companies. Enterprises nowadays rely on distribution of their operations for provision of their supplies and labor, and for selling their products in dynamic global markets. Much of today's enterprises' efforts to cope with global markets are directed towards finding effective means of collaboration among their operations and partners. This research proposes a model for assisting distributed enterprises in modeling their operations by optimizing and integrating their workflow to accomplish the collaborative objective. The method developed, called the distributed parallel integration evaluation model (DPIEM), models the workflow in the distributed enterprise based on three integration scenarios. DPIEM minimizes the integrated tasks' total cost by adding as many parallel servers per task as possible. The method was tested on a case of distributed assembly of two part-types. A total of eight scenarios for the case were analyzed, yielding the recommended number of parallel servers per integrated task. For comparison, each scenario was also simulated in the TIE parallel-computer environment. The TIE simulation results corroborate the DPIEM recommendation based on the lowest total cost for the case analyzed.
20 citations
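The trade-off DPIEM navigates can be illustrated with a hypothetical toy model (my own simplification, not the DPIEM formulation): each task's delay cost falls as its work is split across more parallel servers, while every added server incurs a fixed operating cost, so there is a finite server count that minimizes total cost per task.

```python
# Hypothetical cost model (assumed, not DPIEM itself): pick the number of
# parallel servers for a task that minimizes delay cost + server cost.

def best_server_count(work, delay_cost_rate, server_cost, max_servers=16):
    """Return (servers, total_cost) over 1..max_servers servers."""
    def total_cost(s):
        # delay cost shrinks with more servers; server cost grows linearly
        return delay_cost_rate * (work / s) + server_cost * s
    return min(((s, total_cost(s)) for s in range(1, max_servers + 1)),
               key=lambda pair: pair[1])

# Example: 100 units of work, $4 per unit-time of delay, $25 per server.
s, c = best_server_count(100, 4.0, 25.0)
print(s, c)  # 4 servers minimize total cost: 100.0 + 100.0 = 200.0
```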
IBM
TL;DR: In this article, the maximum supported degree of parallelism for sort operations in a multi-processor computing environment is determined by an allocation module that allocates a minimum number of sort files to each data source that participates in the parallel sort.
Abstract: An apparatus, system, and method for determining the maximum supported degree of parallel sort operations in a multi-processor computing environment. An allocation module allocates a minimum number of sort files to a sort operation for each data source that participates in the parallel sort. The allocation module attempts to allocate sort files of one-half the sort operation data source file size, and iteratively reduces the sort file size requests in response to determinations that sort files of the requested size are not available. After allocation, a parallel operation module determines whether there is sufficient virtual storage to execute the sort operations in parallel. If there is not, the parallel operations module collapses the two smallest sort operations, thereby reducing the degree of parallelism by one, and repeats the request. The parallel operation module repeats the process until the sorts are executed or the process fails for lack of virtual storage.
20 citations
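The collapse strategy the abstract describes can be sketched as follows (assumed logic for illustration, not IBM's implementation): start with one sort operation per data source, and whenever the set of concurrent sorts does not fit in available virtual storage, merge the two smallest operations, reducing the degree of parallelism by one each iteration.

```python
# Sketch of the described degree-reduction loop: merge the two smallest
# sort operations until the remaining set fits in virtual storage.

import heapq

def plan_parallel_sorts(source_sizes, fits):
    """source_sizes: work size per data source; fits(ops) -> True when the
    listed sort operations can execute concurrently in virtual storage."""
    ops = list(source_sizes)
    heapq.heapify(ops)
    while len(ops) > 1 and not fits(ops):
        a = heapq.heappop(ops)      # collapse the two smallest sorts...
        b = heapq.heappop(ops)
        heapq.heappush(ops, a + b)  # ...into one, lowering parallelism by 1
    return sorted(ops)

# Example: storage only supports 3 concurrent sorts.
print(plan_parallel_sorts([40, 10, 30, 20], lambda ops: len(ops) <= 3))
# -> [30, 30, 40]: the 10 and 20 sorts were collapsed into one
```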
01 Jan 2004
TL;DR: This work describes the use and implementation of skeletons in a distributed computation environment, with the Java-based system Lithium as the reference implementation, and proposes three optimizations based on an asynchronous, optimized RMI interaction mechanism, including improvements to the collection of results and work-load balancing.
Abstract: Skeletons are common patterns of parallelism, such as farm and pipeline, that can be abstracted and offered to the application programmer as programming primitives. We describe the use and implementation of skeletons in a distributed computation environment, with the Java-based system Lithium as our reference implementation. Our main contribution is a set of optimization techniques based on an asynchronous, optimized RMI interaction mechanism, which we integrated into the macro data flow (MDF) evaluation technology of Lithium. In detail, we show three different optimizations: 1) a lookahead mechanism that allows multiple tasks to be processed concurrently at each single server, thereby increasing the overall degree of parallelism, 2) a lazy task-binding technique that reduces interactions between remote servers and the task dispatcher, and 3) dynamic improvements based on process monitoring that optimize the collection of results and the work-load balancing. We report experimental results that demonstrate the improvements achieved by the proposed optimizations on various testbeds, including heterogeneous environments.
20 citations
22 Jul 1999
TL;DR: In this paper, a method and apparatus are provided for computing degrees of parallelism for parallel operations in a computer system, based on a set of factors such as a target degree, the current workload, and a requested degree.
Abstract: A method and apparatus are provided for computing degrees of parallelism for parallel operations in a computer system. The degree of parallelism for a given parallel operation is computed based on a set of factors. The set of factors includes a target degree of parallelism that represents a desired total amount of parallelism in the computer system, a current workload and a requested degree of parallelism.
20 citations
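A minimal sketch of how those three factors might combine (a hypothetical formula for illustration, not the patented computation): the granted degree of parallelism is capped by both the requested degree and the headroom left under the system-wide target after accounting for the current workload, with serial execution as the floor.

```python
# Hypothetical combination of the three factors named in the abstract:
# requested degree, target total parallelism, and current workload.

def compute_dop(requested, target_total, current_workload):
    """Grant min(requested, remaining headroom), never below 1 (serial)."""
    headroom = max(target_total - current_workload, 0)
    return max(1, min(requested, headroom))

print(compute_dop(requested=8, target_total=32, current_workload=28))  # 4
print(compute_dop(requested=8, target_total=32, current_workload=40))  # 1
```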