scispace - formally typeset
Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
01 Jan 1997
TL;DR: This work proposes multidimensional piecewise regular arrays: arrays of loosely connected subarrays of lower dimensionality where two different clock rates are used, and introduces a method for developing pipestructures for spreading the shared data between distinct computations and for gathering partial results in the case of a reduction operator.
Abstract: Regular arrays, particularly systolic arrays, have been the subject of continuous interest for the past 15 years. One reason is that they present an excellent example of the unity between hardware and software, especially for application-specific computations. This results in a cost-effective implementation of systolic algorithms in hardware, in VLSI chips or on FPGAs. To the present time, systolic/regular arrays have primarily been considered as 2-D structures. The chief purposes of this work are: (i) to develop methods to transform an algorithm into a form that fits the 3-D physical construction of the processor and is easy to fabricate; (ii) to find ways of increasing the available degree of parallelism and thus improve scalability and latency. For this purpose, we propose multidimensional piecewise regular arrays: arrays of loosely connected subarrays of lower dimensionality where two different clock rates are used. One, a high clock rate, is used inside subarrays, e.g. inside VLSI chips, and the other, a low clock rate, is used in the interconnection part of subarrays. These properties permit easy physical realization of large n-D arrays, as the n-D array is formed from (n-1)-D subarrays that are connected to each other only by edges using a low clock rate. Thus, 3-D arrays that consist of 2-D arrays are easily fabricated, e.g. using multichip modules, wafer-scale integration, etc. While several of the approaches that we use to achieve our aims have been considered in the literature, they have unfortunately been studied separately and without a unified approach. We combine our approach with commonly used synthesis methods for regular arrays: with space-time transformations on polytopes. The approach we propose can be used for all associative and commutative problems. The thesis presents the synthesis of a large variety of new, higher-dimensional arrays. The two main issues involved, in addition to the existing methods in the polytope model, are: (1) in order to achieve a higher degree of parallelism, and to decrease latency, we increase the dimensionality of the source representation of the program by partitioning the range of indices; (2) we introduce a method for developing pipestructures (an extension of pipelines) for spreading the shared data between distinct computations and for gathering partial results in the case of a reduction operator. As an example, we consider template matching on systolic arrays. A 2-D mesh of linear arrays --- conventional systolic arrays for 1-D convolution --- that exploits two different clock rates is presented.
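The 1-D systolic convolution the abstract uses as its running example can be sketched as a functional model, assuming a simple linear array in which each cell holds one weight; the function name and layout are illustrative, not from the thesis, and the cycle-accurate two-clock-rate behavior is not modeled.

```python
# Functional model of what a linear systolic array computes for 1-D
# convolution: each of the n cells holds one weight, samples stream
# past, and chained partial sums yield one output per step. Note this
# computes the sliding dot product (correlation); a textbook
# convolution would flip the kernel first.
def systolic_convolve(signal, weights):
    n_cells = len(weights)
    stream = signal + [0] * (n_cells - 1)  # zero-pad the tail
    outputs = []
    for t in range(len(signal)):
        # Cell k contributes weights[k] times the sample aligned with
        # it; the chained sum is what the last cell emits at this step.
        outputs.append(sum(weights[k] * stream[t + k]
                           for k in range(n_cells)))
    return outputs

print(systolic_convolve([1, 2, 3, 4], [1, 0, -1]))  # → [-2, -2, 3, 4]
```

In the thesis's scheme, a 2-D mesh of such linear arrays would run this inner computation at the high clock rate and exchange edge values between subarrays at the low one.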

5 citations

Proceedings ArticleDOI
09 Nov 2011
TL;DR: A novel parallel state justification tool, GACO, utilizing Ant Colony Optimization (ACO) on Graphical Processing Units (GPU), capable of launching a large number of artificial ants to search for the target state.
Abstract: In this paper, we propose a novel parallel state justification tool, GACO, utilizing Ant Colony Optimization (ACO) on Graphical Processing Units (GPU). With the high degree of parallelism supported by the GPU, GACO is capable of launching a large number of artificial ants to search for the target state. A novel parallel simulation technique, utilizing partitioned navigation tracks as guides during the search, is proposed to achieve extremely high computation efficiency for state justification. We present the results on a GPU platform from NVIDIA (a GeForce GTX 285 graphics card) that demonstrate a speedup of up to 228× compared to deterministic methods and a speedup of up to 40× over previous state-of-the-art heuristic based serial tools.
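The ACO search loop the abstract builds on can be illustrated with a generic, CPU-only sketch on a toy stage-wise path problem; this shows the pheromone/reinforcement cycle only, not GACO's GPU kernels or partitioned navigation tracks, and all names and parameter values are illustrative.

```python
import random

# Generic ant colony optimization: many "ants" pick one option per
# stage, weighted by pheromone; the best path found so far is
# reinforced each iteration while pheromone evaporates elsewhere.
# Illustrative only -- not GACO's state-justification internals.
def aco_min_path(costs, n_ants=50, n_iters=30, evap=0.5, seed=0):
    rng = random.Random(seed)
    n_stages, n_opts = len(costs), len(costs[0])
    pher = [[1.0] * n_opts for _ in range(n_stages)]
    best_path, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            path, cost = [], 0.0
            for s in range(n_stages):
                # Roulette-wheel selection proportional to pheromone.
                r = rng.random() * sum(pher[s])
                acc, choice = 0.0, n_opts - 1
                for j in range(n_opts):
                    acc += pher[s][j]
                    if r <= acc:
                        choice = j
                        break
                path.append(choice)
                cost += costs[s][choice]
            if cost < best_cost:
                best_path, best_cost = path, cost
        for s in range(n_stages):  # evaporate, then reinforce best
            for j in range(n_opts):
                pher[s][j] *= evap
            pher[s][best_path[s]] += 1.0
    return best_path, best_cost

path, cost = aco_min_path([[3, 1, 4], [1, 5, 9], [2, 6, 5]])
print(path, cost)
```

The GPU angle in the paper is that the inner ant loop is embarrassingly parallel: each ant's walk is independent, so thousands can be launched as concurrent threads.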

5 citations

Book ChapterDOI
23 Mar 1998
TL;DR: It is shown that a keystream generator built as a word-wide non-linear-feedback shift register can offer both a high degree of parallelism and the hardware simplicity and flexible security of an iterated design.
Abstract: We explore the problem of designing a stream cipher that is fast in software yet may be efficiently implemented in hardware. We show that a keystream generator built as a word-wide non-linear-feedback shift register can offer both a high degree of parallelism and the hardware simplicity and flexible security of an iterated design. WAKE-ROFB is shown to be an example of this topology. A modified non-linear mixing function is proposed for WAKE-ROFB which makes it better suited to hardware implementation. The high degree of parallelism allows efficient implementation on processors having instruction-level parallelism, and leads naturally to high-speed pipelined hardware implementations. The recommended variant runs at 340 Mbps on a 266MHz Pentium II and 270 Mbps on a 100MHz TriMedia VLIW CPU, while a 2000 gate hardware implementation of the same cipher achieves 200 Mbps from a 50MHz clock. A higher speed variant achieves 600 Mbps, 340 Mbps and 400 Mbps respectively with some loss of security, while needing slightly less hardware.
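The word-wide NLFSR topology described above can be sketched as a toy generator; to be clear, this is not WAKE-ROFB, the mixing function is an arbitrary unvetted placeholder, and the sketch has no security value whatsoever.

```python
# Toy word-wide non-linear-feedback shift register (NLFSR) keystream
# generator, illustrating the topology only: word-sized stages and a
# non-linear mixing function fed back each clock. NOT WAKE-ROFB; all
# constants are arbitrary and the design is insecure.
MASK = 0xFFFFFFFF  # 32-bit words

def mix(a, b):
    # Add, rotate-left by 7, XOR: a simple placeholder non-linear mix.
    t = (a + b) & MASK
    return (((t << 7) | (t >> 25)) & MASK) ^ a

def keystream(state, n_words):
    # state: list of 32-bit words acting as the register stages.
    out = []
    for _ in range(n_words):
        fb = mix(state[0], state[-1])
        state = state[1:] + [fb]  # shift out one word, feed back fb
        out.append(fb)
    return out

print([hex(w) for w in keystream([0x12345678, 0x9ABCDEF0,
                                  0x0F1E2D3C, 0x4B5A6978], 4)])
```

Because each feedback word depends only on stages already in the register, several such updates can proceed per clock in hardware, which is the source of the parallelism and pipelining the abstract highlights.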

5 citations

Journal ArticleDOI
TL;DR: SAccO (Scalable Accelerator platform Osnabruck) is a novel framework for implementing data-intensive applications using scalable and portable reconfigurable hardware accelerators based on standard PCs and PCI-Express extension cards featuring Field-Programmable Gate Arrays (FPGAs) and memory.

5 citations

Proceedings ArticleDOI
09 Mar 2003
TL;DR: It is shown that, in the case where the function involves a Fourier transform, the degree of parallelism in the program generated by automatic differentiation can be increased, leading to a rich set of automatic parallelization strategies that are not available when employing a black-box automatic parallelization approach.
Abstract: For functions given in the form of a computer program, automatic differentiation is an efficient technique to accurately evaluate the derivatives of that function. Starting from a given computer program, automatic differentiation generates another program for the evaluation of the original function and its derivatives in a fully mechanical way. While the efficiency of this black box approach is already high as compared to numerical differentiation based on divided differences, automatic differentiation can be applied even more efficiently by taking into account high-level knowledge about the given computer program. We show that, in the case where the function involves a Fourier transform, the degree of parallelism in the program generated by automatic differentiation can be increased leading to a rich set of automatic parallelization strategies that are not available when employing a black box automatic parallelization approach. Experiments of the new automatic parallelization approach are reported on a SunFire 6800 server using up to 20 processors.
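The contrast the abstract draws with divided differences can be illustrated by forward-mode automatic differentiation via dual numbers, which propagates exact derivatives through a program mechanically; this generic sketch is not the paper's FFT-aware system, and all names are illustrative.

```python
import math

# Forward-mode automatic differentiation via dual numbers: each value
# carries (v, dv) and arithmetic propagates exact derivatives by the
# chain rule -- no truncation error, unlike divided differences.
class Dual:
    def __init__(self, v, dv=0.0):
        self.v, self.dv = v, dv
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.dv + o.dv)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.dv * o.v + self.v * o.dv)
    __rmul__ = __mul__
    def sin(self):
        return Dual(math.sin(self.v), math.cos(self.v) * self.dv)

def f(x):
    return x * x + 3 * x.sin()  # f(x) = x^2 + 3 sin x

x = Dual(2.0, 1.0)  # seed dx/dx = 1
y = f(x)
print(y.v, y.dv)  # f(2) and f'(2) = 2*2 + 3*cos(2)
```

The paper's point is that high-level knowledge (here, that a subcomputation is a Fourier transform, whose derivative structure is known in closed form) exposes parallelism that this purely mechanical, operation-by-operation propagation cannot see.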

5 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
85% related
Scheduling (computing)
78.6K papers, 1.3M citations
83% related
Network packet
159.7K papers, 2.2M citations
80% related
Web service
57.6K papers, 989K citations
80% related
Quality of service
77.1K papers, 996.6K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75