Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.


Papers
Journal ArticleDOI
TL;DR: Unification is a basic component of Prolog processing, but its parallel processing has not been well studied because the number of arguments, which corresponds to the degree of unification parallelism, is small, and a consistency check is necessary after a parallel unification operation.
Abstract: Unification is a basic component of Prolog processing. However, its parallel processing has not been well studied because the number of arguments, which corresponds to the degree of unification parallelism, is small, and a consistency check operation is necessary after a parallel unification operation. To address these issues, we have implemented the following ideas: (1) enhancing the degree of parallelism by decomposing a compound term into a functor and its arguments at compile time; (2) allocating the decomposed unification processing to multiple processor units (PUs) at run time; (3) decreasing the number of consistency checks by compile-time clustering and reducing the overhead by embedding the consistency check operations into the unification processing; and (4) stopping the operations of the other processors if the unification fails. To clarify the effect, we have developed and evaluated a Prolog processor on a multiprocessor system. The results show that, statically, (1) the decomposition of compound terms makes the average number of arguments 3.2 even after clustering; and that, dynamically, (1) unification parallelism yields a 41 percent speedup, and the effect is evident at a small number of processors; (2) the compile-time clustering makes the consistency check unnecessary; (3) stopping the processors running in parallel attains a 0.5 to 6 percent (and 10 percent for some problems) performance improvement; and (4) the processing of the clause head occupies 60 to 70 percent of dynamic microsteps and is an important target of parallel processing.

7 citations
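The decomposition idea described above, splitting a compound term into its functor and arguments so that the argument unifications can run on separate processor units, with a consistency check merging the resulting bindings afterward, can be sketched in a few lines. The following Python sketch is only a loose, simplified illustration of that idea (no occurs check, no propagation of bindings between arguments) and does not reproduce the paper's hardware implementation; the term representation and helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical term representation: a compound term is a tuple
# (functor, arg1, ..., argN); variables are strings starting with '?';
# anything else is an atom.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def unify_arg(a, b):
    """Unify one argument pair; return a substitution dict, or None on failure."""
    if is_var(a):
        return {a: b}
    if is_var(b):
        return {b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        return unify_compound(a, b)
    return {} if a == b else None

def consistent_merge(substs):
    """Consistency check after the parallel unifications: merge the per-argument
    substitutions, failing if a variable received two different bindings."""
    merged = {}
    for s in substs:
        if s is None:
            return None
        for var, val in s.items():
            if var in merged and merged[var] != val:
                return None        # conflicting bindings: unification fails
            merged[var] = val
    return merged

def unify_compound(t1, t2):
    """Decompose both terms into functor + arguments, unify the argument pairs
    in parallel, then run the consistency check on the partial results."""
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None                # functor or arity mismatch
    with ThreadPoolExecutor() as pool:     # stands in for the multiple PUs
        substs = list(pool.map(unify_arg, t1[1:], t2[1:]))
    return consistent_merge(substs)

# Example: unify f(X, g(Y), a) with f(b, g(c), a)
print(unify_compound(('f', '?X', ('g', '?Y'), 'a'),
                     ('f', 'b', ('g', 'c'), 'a')))   # {'?X': 'b', '?Y': 'c'}
```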

Proceedings ArticleDOI
26 Jul 2010
TL;DR: The goal of this talk is to recall one of the most important theories for understanding loops - the decomposition of Karp, Miller, and Winograd (1967) for systems of uniform recurrence equations - and its connections with two different developments on loops: the theory of transformation and parallelization of (nested) DO loops and the theory of ranking functions for proving the termination of (imperative) programs with WHILE loops.
Abstract: Loops are a fundamental control structure in programming languages. Being able to analyze, transform, and optimize loops is a key feature for compilers to handle repetitive schemes with a complexity proportional to the program size and not to the number of operations it describes. This is true for the generation of optimized software as well as for the generation of hardware, for both sequential and parallel execution. The goal of this talk is to recall one of the most important theories for understanding loops - the decomposition of Karp, Miller, and Winograd (1967) for systems of uniform recurrence equations - and its connections with two different developments on loops: the theory of transformation and parallelization of (nested) DO loops and the theory of ranking functions for proving the termination of (imperative) programs with WHILE loops. Other connections, which will not be covered, include reachability problems in vector addition systems and Petri nets.

7 citations
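One of the connections mentioned above, ranking functions for WHILE-loop termination, can be made concrete with a small example: a ranking function is an expression over the program variables that is bounded below and strictly decreases on every iteration, which proves the loop terminates. The loop below is a generic textbook-style illustration chosen here, not an example taken from the talk.

```python
def gcd_by_subtraction(a, b):
    """Terminates because r(a, b) = a + b is a ranking function: it is a
    positive integer and strictly decreases on every iteration, since the
    larger of the two (still positive) operands shrinks."""
    assert a > 0 and b > 0
    while a != b:
        r_before = a + b                 # value of the ranking function
        if a > b:
            a = a - b
        else:
            b = b - a
        assert 0 < a + b < r_before      # bounded below and strictly decreasing
    return a

print(gcd_by_subtraction(12, 18))        # -> 6
```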

Journal ArticleDOI
TL;DR: This work employs a per-application predictive power manager that autonomously controls the power states of the cores with the goal of energy efficiency, and allows the applications to lend their idle cores for a short time period to expedite other critical applications.
Abstract: We present a scalable Dynamic Power Management (DPM) scheme where malleable applications may change their degree of parallelism at run time depending upon the workload and performance constraints. We employ a per-application predictive power manager that autonomously controls the power states of the cores with the goal of energy efficiency. Furthermore, our DPM allows applications to lend their idle cores for a short time period to expedite other critical applications. In this way, it allows for application-level scalability while aiming at overall system energy optimization. Compared to state-of-the-art centralized and distributed power management approaches, we achieve up to 58 percent (on average approximately 15 to 20 percent) ED2P reduction.

7 citations
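The control loop sketched below is a purely illustrative model of the behaviour described in the abstract: each malleable application predicts its near-term core demand, releases cores it does not need, and the freed cores are lent to a critical application for the current epoch. All class and function names are hypothetical and unrelated to the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class App:
    name: str
    allocated_cores: int
    recent_load: list = field(default_factory=list)   # utilization samples in [0, 1]
    critical: bool = False

    def predicted_demand(self):
        """Naive predictor: average recent utilization scaled to the core count."""
        if not self.recent_load:
            return self.allocated_cores
        avg = sum(self.recent_load) / len(self.recent_load)
        return max(1, round(avg * self.allocated_cores))

def dpm_step(apps):
    """One power-management epoch: shrink each app to its predicted demand,
    then lend the freed (idle) cores to a critical app for this epoch."""
    lending_pool = 0
    for app in apps:
        demand = app.predicted_demand()
        idle = app.allocated_cores - demand
        if idle > 0:
            lending_pool += idle            # these cores can be power-gated or lent
            app.allocated_cores = demand
    for app in apps:
        if app.critical and lending_pool > 0:
            app.allocated_cores += lending_pool   # borrow idle cores temporarily
            lending_pool = 0
    return apps

apps = [App("video", 8, [0.3, 0.4, 0.35]),
        App("solver", 4, [0.9, 0.95, 1.0], critical=True)]
for a in dpm_step(apps):
    print(a.name, a.allocated_cores)      # video shrinks, solver borrows the idle cores
```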

Proceedings Article
24 Aug 2007
TL;DR: This paper deals with the implementation of the Fast Fourier Transform on a novel graphics architecture offered recently by NVIDIA, and takes into consideration memory reference locality issues, that are crucial when pursuing a high degree of parallelism.
Abstract: The growing computational power of modern graphics processing units is making them very suitable for general-purpose computing. These commodity processors generally operate as parallel SIMD platforms and, among other factors, the effectiveness of the codes depends on proper exploitation of the underlying memory hierarchy. This paper deals with the implementation of the Fast Fourier Transform on a novel graphics architecture recently offered by NVIDIA. The implementation takes into consideration memory reference locality issues, which are crucial when pursuing a high degree of parallelism, that is, good occupancy of the processing elements. The proposed implementation has been tested and compared to the manufacturer's own implementation.

7 citations
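The degree of parallelism an FFT offers comes from its stage structure: in a radix-2 decomposition of length n there are log2(n) stages, and within each stage the n/2 butterflies are mutually independent, while the memory access pattern changes from stage to stage, which is why reference locality and occupancy dominate GPU performance. The NumPy sketch below only illustrates that per-stage independence; it is not related to the authors' or NVIDIA's implementation.

```python
import numpy as np

def fft_radix2(x):
    """Iterative radix-2 FFT. Each of the log2(n) stages performs n/2
    independent butterflies; that per-stage independence is the intrinsic
    degree of parallelism a GPU implementation tries to exploit."""
    x = np.asarray(x, dtype=complex)
    n = x.size
    assert n and (n & (n - 1)) == 0, "length must be a power of two"
    # bit-reversal permutation so the butterflies can be applied stage by stage
    bits = n.bit_length() - 1
    rev = np.array([int(format(i, f'0{bits}b')[::-1], 2) for i in range(n)])
    x = x[rev]
    size = 2
    while size <= n:
        half = size // 2
        tw = np.exp(-2j * np.pi * np.arange(half) / size)   # twiddle factors
        x = x.reshape(-1, size)                 # each row: one independent block
        odd = x[:, half:] * tw                  # n/2 butterflies, all independent
        x = np.hstack([x[:, :half] + odd, x[:, :half] - odd]).reshape(-1)
        size *= 2
    return x

sig = np.random.rand(8)
print(np.allclose(fft_radix2(sig), np.fft.fft(sig)))   # -> True
```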

Proceedings ArticleDOI
09 Nov 2010
TL;DR: This paper presents a systematic methodology for identifying independent operations in algorithms and hence quantifying the intrinsic degree of parallelism, based on dataflow modeling and subsequent eigen-decomposition of the dataflow graphs; the method is capable of providing insight into architectural characteristics in early design stages.
Abstract: Algorithmic complexity analysis and dataflow models play significant roles in the concurrent optimization of both algorithms and architectures, which is now a new design paradigm referred to as Algorithm/Architecture Co-exploration. One of the essential complexity metrics is the parallelism, revealing the number of operations that can be concurrently executed. Inspired by principal component analysis (PCA), which is capable of transforming random variables into uncorrelated ones and hence of dependency analysis, this paper presents a systematic methodology for identifying independent operations in algorithms and hence quantifying the intrinsic degree of parallelism, based on dataflow modeling and subsequent eigen-decomposition of the dataflow graphs. Our quantified degree of parallelism is platform-independent and is capable of providing insight into architectural characteristics in early design stages. Starting from different dataflows derived from signal flow graphs in basic signal processing algorithms, the case study on the DCT shows that the proposed method is capable of quantitatively characterizing algorithmic parallelism, potentially facilitating design space exploration in early system design stages, especially for parallel processing platforms.

7 citations
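To make the idea of quantifying intrinsic parallelism from a dataflow graph concrete, the sketch below uses a deliberately simpler stand-in for the paper's PCA-inspired eigen-decomposition: an ASAP levelization of a small, hypothetical dataflow graph. Operations that land on the same level have no mutual dependence and could execute concurrently, so the widest level gives a rough platform-independent degree of parallelism; this illustrates the goal of the analysis, not the paper's actual method.

```python
import numpy as np

# A tiny, hypothetical dataflow graph written as an adjacency matrix:
# dep[i, j] = 1 means operation j consumes the result of operation i.
ops = ["add0", "add1", "sub0", "sub1", "mul0", "mul1"]
dep = np.zeros((6, 6), dtype=int)
dep[0, 4] = 1   # mul0 needs add0
dep[1, 4] = 1   # mul0 needs add1
dep[2, 5] = 1   # mul1 needs sub0
dep[3, 5] = 1   # mul1 needs sub1

def asap_levels(dep):
    """Assign each operation the earliest level at which all of its
    producers have already executed (ASAP scheduling)."""
    n = dep.shape[0]
    level = np.zeros(n, dtype=int)
    changed = True
    while changed:
        changed = False
        for j in range(n):
            preds = np.nonzero(dep[:, j])[0]
            if preds.size:
                new = level[preds].max() + 1
                if new > level[j]:
                    level[j] = new
                    changed = True
    return level

levels = asap_levels(dep)
width = np.bincount(levels)                    # operations per level
print(dict(zip(ops, levels)))                  # level of each operation
print("degree of parallelism:", width.max())   # widest level -> 4 here
```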


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years:
Year: Papers
2022: 1
2021: 47
2020: 48
2019: 52
2018: 70
2017: 75