scispace - formally typeset
Topic

Degree of parallelism

About: Degree of parallelism is a research topic. Over the lifetime, 1515 publications have been published within this topic receiving 25546 citations.


Papers
01 Jan 1997
TL;DR: This work proposes multidimensional piecewise regular arrays: arrays of loosely connected subarrays of lower dimensionality where two different clock rates are used, and introduces a method for developing pipestructures for spreading the shared data between distinct computations and for gathering partial results in the case of a reduction operator.
Abstract: Regular arrays, particularly systolic arrays, have been the subject of continuous interest for the past 15 years. One reason is that they present an excellent example of the unity between hardware and software, especially for application-specific computations. This results in a cost-effective implementation of systolic algorithms in hardware, in VLSI chips or on FPGAs. To the present time, systolic/regular arrays have primarily been considered as 2-D structures. The chief purposes of this work are: (i) to develop methods to transform an algorithm into a form that fits the 3-D physical construction of the processor and is easy to fabricate; (ii) to find ways of increasing the available degree of parallelism and thus improve scalability and latency. For this purpose, we propose multidimensional piecewise regular arrays: arrays of loosely connected subarrays of lower dimensionality where two different clock rates are used. One, a high clock rate, is used inside subarrays, e.g. inside VLSI chips, and the other, a low clock rate, is used in the interconnection part of subarrays. These properties permit easy physical realization of large n-D arrays, as the n-D array is formed from (n-1)-D subarrays that are connected to each other only by edges using a low clock rate. Thus, 3-D arrays that consist of 2-D arrays are easily fabricated, e.g. using multichip modules, wafer-scale integration, etc. While several of the approaches that we use to achieve our aims have been considered in the literature, they have unfortunately been studied separately and without a unified approach. We combine our approach with commonly used synthesis methods for regular arrays: with space-time transformations on polytopes. The approach we propose can be used for all associative and commutative problems. The thesis presents the synthesis of a large variety of new, higher-dimensional arrays. The two main issues involved, in addition to the existing methods in the polytope model, are: (1) in order to achieve a higher degree of parallelism, and to decrease latency, we increase the dimensionality of the source representation of the program by partitioning the range of indices; (2) we introduce a method for developing pipestructures (an extension of pipelines) for spreading the shared data between distinct computations and for gathering partial results in the case of a reduction operator. As an example, we consider template matching on systolic arrays. A 2-D mesh of linear arrays --- conventional systolic arrays for 1-D convolution --- that exploits two different clock rates is presented.
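The 1-D systolic convolution the abstract uses as its running example can be sketched as a functional model, assuming a simple linear array in which each cell holds one weight; the function name and layout are illustrative, not from the thesis, and the cycle-accurate two-clock-rate behavior is not modeled.

```python
# Functional model of what a linear systolic array computes for 1-D
# convolution: each of the n cells holds one weight, samples stream
# past, and chained partial sums yield one output per step. Note this
# computes the sliding dot product (correlation); a textbook
# convolution would flip the kernel first.
def systolic_convolve(signal, weights):
    n_cells = len(weights)
    stream = signal + [0] * (n_cells - 1)  # zero-pad the tail
    outputs = []
    for t in range(len(signal)):
        # Cell k contributes weights[k] times the sample aligned with
        # it; the chained sum is what the last cell emits at this step.
        outputs.append(sum(weights[k] * stream[t + k]
                           for k in range(n_cells)))
    return outputs

print(systolic_convolve([1, 2, 3, 4], [1, 0, -1]))  # → [-2, -2, 3, 4]
```

In the thesis's scheme, a 2-D mesh of such linear arrays would run this inner computation at the high clock rate and exchange edge values between subarrays at the low one.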

5 citations

Proceedings ArticleDOI
09 Nov 2011
TL;DR: A novel parallel state justification tool, GACO, utilizing Ant Colony Optimization (ACO) on Graphical Processing Units (GPU), capable of launching a large number of artificial ants to search for the target state.
Abstract: In this paper, we propose a novel parallel state justification tool, GACO, utilizing Ant Colony Optimization (ACO) on Graphical Processing Units (GPU). With the high degree of parallelism supported by the GPU, GACO is capable of launching a large number of artificial ants to search for the target state. A novel parallel simulation technique, utilizing partitioned navigation tracks as guides during the search, is proposed to achieve extremely high computation efficiency for state justification. We present the results on a GPU platform from NVIDIA (a GeForce GTX 285 graphics card) that demonstrate a speedup of up to 228× compared to deterministic methods and a speedup of up to 40× over previous state-of-the-art heuristic based serial tools.
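The ACO search loop the abstract builds on can be illustrated with a generic, CPU-only sketch on a toy stage-wise path problem; this shows the pheromone/reinforcement cycle only, not GACO's GPU kernels or partitioned navigation tracks, and all names and parameter values are illustrative.

```python
import random

# Generic ant colony optimization: many "ants" pick one option per
# stage, weighted by pheromone; the best path found so far is
# reinforced each iteration while pheromone evaporates elsewhere.
# Illustrative only -- not GACO's state-justification internals.
def aco_min_path(costs, n_ants=50, n_iters=30, evap=0.5, seed=0):
    rng = random.Random(seed)
    n_stages, n_opts = len(costs), len(costs[0])
    pher = [[1.0] * n_opts for _ in range(n_stages)]
    best_path, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            path, cost = [], 0.0
            for s in range(n_stages):
                # Roulette-wheel selection proportional to pheromone.
                r = rng.random() * sum(pher[s])
                acc, choice = 0.0, n_opts - 1
                for j in range(n_opts):
                    acc += pher[s][j]
                    if r <= acc:
                        choice = j
                        break
                path.append(choice)
                cost += costs[s][choice]
            if cost < best_cost:
                best_path, best_cost = path, cost
        for s in range(n_stages):  # evaporate, then reinforce best
            for j in range(n_opts):
                pher[s][j] *= evap
            pher[s][best_path[s]] += 1.0
    return best_path, best_cost

path, cost = aco_min_path([[3, 1, 4], [1, 5, 9], [2, 6, 5]])
print(path, cost)
```

The GPU angle in the paper is that the inner ant loop is embarrassingly parallel: each ant's walk is independent, so thousands can be launched as concurrent threads.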

5 citations

Book ChapterDOI
23 Mar 1998
TL;DR: It is shown that a keystream generator built as a word-wide non-linear-feedback shift register can offer both a high degree of parallelism and the hardware simplicity and flexible security of an iterated design.
Abstract: We explore the problem of designing a stream cipher that is fast in software yet may be efficiently implemented in hardware. We show that a keystream generator built as a word-wide non-linear-feedback shift register can offer both a high degree of parallelism and the hardware simplicity and flexible security of an iterated design. WAKE-ROFB is shown to be an example of this topology. A modified non-linear mixing function is proposed for WAKE-ROFB which makes it better suited to hardware implementation. The high degree of parallelism allows efficient implementation on processors having instruction-level parallelism, and leads naturally to high-speed pipelined hardware implementations. The recommended variant runs at 340 Mbps on a 266MHz Pentium II and 270 Mbps on a 100MHz TriMedia VLIW CPU, while a 2000 gate hardware implementation of the same cipher achieves 200 Mbps from a 50MHz clock. A higher speed variant achieves 600 Mbps, 340 Mbps and 400 Mbps respectively with some loss of security, while needing slightly less hardware.
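The word-wide NLFSR topology described above can be sketched as a toy generator; to be clear, this is not WAKE-ROFB, the mixing function is an arbitrary unvetted placeholder, and the sketch has no security value whatsoever.

```python
# Toy word-wide non-linear-feedback shift register (NLFSR) keystream
# generator, illustrating the topology only: word-sized stages and a
# non-linear mixing function fed back each clock. NOT WAKE-ROFB; all
# constants are arbitrary and the design is insecure.
MASK = 0xFFFFFFFF  # 32-bit words

def mix(a, b):
    # Add, rotate-left by 7, XOR: a simple placeholder non-linear mix.
    t = (a + b) & MASK
    return (((t << 7) | (t >> 25)) & MASK) ^ a

def keystream(state, n_words):
    # state: list of 32-bit words acting as the register stages.
    out = []
    for _ in range(n_words):
        fb = mix(state[0], state[-1])
        state = state[1:] + [fb]  # shift out one word, feed back fb
        out.append(fb)
    return out

print([hex(w) for w in keystream([0x12345678, 0x9ABCDEF0,
                                  0x0F1E2D3C, 0x4B5A6978], 4)])
```

Because each feedback word depends only on stages already in the register, several such updates can proceed per clock in hardware, which is the source of the parallelism and pipelining the abstract highlights.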

5 citations

Journal ArticleDOI
TL;DR: SAccO (Scalable Accelerator platform Osnabruck) is a novel framework for implementing data-intensive applications using scalable and portable reconfigurable hardware accelerators based on standard PCs and PCI-Express extension cards featuring Field-Programmable Gate Arrays (FPGAs) and memory.

5 citations

Proceedings ArticleDOI
09 Mar 2003
TL;DR: It is shown that, in the case where the function involves a Fourier transform, the degree of parallelism in the program generated by automatic differentiation can be increased, leading to a rich set of automatic parallelization strategies that are not available when employing a black-box automatic parallelization approach.
Abstract: For functions given in the form of a computer program, automatic differentiation is an efficient technique to accurately evaluate the derivatives of that function. Starting from a given computer program, automatic differentiation generates another program for the evaluation of the original function and its derivatives in a fully mechanical way. While the efficiency of this black box approach is already high as compared to numerical differentiation based on divided differences, automatic differentiation can be applied even more efficiently by taking into account high-level knowledge about the given computer program. We show that, in the case where the function involves a Fourier transform, the degree of parallelism in the program generated by automatic differentiation can be increased leading to a rich set of automatic parallelization strategies that are not available when employing a black box automatic parallelization approach. Experiments of the new automatic parallelization approach are reported on a SunFire 6800 server using up to 20 processors.
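The contrast the abstract draws with divided differences can be illustrated by forward-mode automatic differentiation via dual numbers, which propagates exact derivatives through a program mechanically; this generic sketch is not the paper's FFT-aware system, and all names are illustrative.

```python
import math

# Forward-mode automatic differentiation via dual numbers: each value
# carries (v, dv) and arithmetic propagates exact derivatives by the
# chain rule -- no truncation error, unlike divided differences.
class Dual:
    def __init__(self, v, dv=0.0):
        self.v, self.dv = v, dv
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.dv + o.dv)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, self.dv * o.v + self.v * o.dv)
    __rmul__ = __mul__
    def sin(self):
        return Dual(math.sin(self.v), math.cos(self.v) * self.dv)

def f(x):
    return x * x + 3 * x.sin()  # f(x) = x^2 + 3 sin x

x = Dual(2.0, 1.0)  # seed dx/dx = 1
y = f(x)
print(y.v, y.dv)  # f(2) and f'(2) = 2*2 + 3*cos(2)
```

The paper's point is that high-level knowledge (here, that a subcomputation is a Fourier transform, whose derivative structure is known in closed form) exposes parallelism that this purely mechanical, operation-by-operation propagation cannot see.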

5 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
85% related
Scheduling (computing)
78.6K papers, 1.3M citations
83% related
Network packet
159.7K papers, 2.2M citations
80% related
Web service
57.6K papers, 989K citations
80% related
Quality of service
77.1K papers, 996.6K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75