
Showing papers on "Degree of parallelism" published in 1986


Journal ArticleDOI
01 May 1986
TL;DR: The automatic generation of programs with global parallelism, i.e. parallelism introduced at the top of the program structure hierarchy, seems to be a promising possibility; such algorithms suit a shared-memory MIMD computational model.
Abstract: This paper discusses the class of algorithms having global parallelism, i.e. those in which parallelism is introduced at the top of the program structure hierarchy. Such algorithms have performance advantages in a shared-memory, MIMD computational model. A programming environment consisting of FORTRAN, enhanced by some pre-processed macros, has been built to aid in writing programs for such algorithms for the Denelcor HEP multiprocessor. Applications of from tens to hundreds of FORTRAN statements have been written and tested in this environment. A few parallelism constructs suffice to yield understandable programs with a high degree of parallelism. The automatic generation of programs with global parallelism seems to be a promising possibility.
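As a rough illustration of the "few parallelism constructs" mentioned above, the sketch below mimics the fork-all-workers-then-barrier style of global parallelism using Python threads; the HEP environment itself used pre-processed FORTRAN macros, and all names here are hypothetical.

```python
# Illustrative sketch only: the paper's environment used FORTRAN macros on the
# Denelcor HEP; this Python analogue mimics the flavor of "a few parallelism
# constructs" (fork all workers at the top, barrier-synchronize between phases).
import threading

NUM_WORKERS = 4                      # hypothetical degree of parallelism
barrier = threading.Barrier(NUM_WORKERS)
data = list(range(16))
partial = [0] * NUM_WORKERS

def worker(wid):
    # Phase 1: each worker processes its strip of the data.
    chunk = data[wid::NUM_WORKERS]
    partial[wid] = sum(x * x for x in chunk)
    barrier.wait()                   # global synchronization point
    # Phase 2: one worker combines the results.
    if wid == 0:
        print("sum of squares:", sum(partial))

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
```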

51 citations


Journal ArticleDOI
TL;DR: A parallel algorithm to generate the permutations of at most k out of n objects, which achieves the best possible speedup for any given k and can easily be modified to generate combinations.
Abstract: In this paper we present a parallel algorithm to generate the permutations of at most k out of n objects. The architecture consists of a linear processor array and a selector. When a single processor array is available, a parallel algorithm to generate permutations is presented which achieves the best possible speedup for any given k. This algorithm can also easily be modified to generate combinations. When multiple processor arrays are available, a parallel scheme is proposed to speed up the generation by fully utilizing these processor arrays. The degree of parallelism is related to the number of available processor arrays.
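The sketch below is not the paper's linear-array/selector hardware algorithm, only an illustration of the multiple-array idea: generation of k-out-of-n permutations is split across several "processor arrays" by partitioning on the first element, so each partition is independent work.

```python
# Hedged sketch: partition the k-permutations of n objects by first element
# and hand each partition to one worker ("processor array").
from itertools import permutations
from multiprocessing import Pool

N, K = 5, 3
objects = list(range(N))

def perms_with_first(first):
    rest = [x for x in objects if x != first]
    return [(first,) + p for p in permutations(rest, K - 1)]

if __name__ == "__main__":
    with Pool(processes=N) as pool:          # one worker per partition
        groups = pool.map(perms_with_first, objects)
    all_perms = [p for g in groups for p in g]
    assert len(all_perms) == N * (N - 1) * (N - 2)   # 5*4*3 = 60
```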

36 citations


Journal ArticleDOI
01 Mar 1986
TL;DR: A parallel multigrid algorithm for solving elliptic partial differential equations is developed and evaluated, and a V-cycle multigrid method is altered to increase the degree of parallelism.
Abstract: A parallel multigrid algorithm for solving elliptic partial differential equations is developed and evaluated. A V-cycle multigrid method is altered to increase the degree of parallelism. A numerical analysis of the resulting concurrent-iteration multigrid algorithm is performed; its architectural implications are considered; highly parallel systems without shared memory are examined (including mesh-connected arrays, mesh-shuffle-connected systems, permutation networks, and direct VLSI embeddings); and the results of numerical experiments are presented in tables and graphs. 30 references.
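To fix ideas about what is being altered, here is a minimal sketch of a standard 1-D V-cycle (textbook multigrid with a weighted-Jacobi smoother and injection restriction); it is not the paper's concurrent-iteration variant, which relaxes on several levels simultaneously to raise the degree of parallelism.

```python
# Minimal 1-D V-cycle for -u'' = f with zero boundary values.
import numpy as np

def smooth(u, f, h, iters=2, w=2.0/3.0):
    for _ in range(iters):           # weighted Jacobi (w = 2/3 damps well)
        u[1:-1] = (1 - w) * u[1:-1] + w * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h):
    u = smooth(u, f, h)
    if len(u) <= 3:                  # coarsest grid: smoothing is exact here
        return u
    rc = residual(u, f, h)[::2].copy()          # restrict to the coarse grid
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)  # coarse-grid correction
    e = np.zeros_like(u)
    e[::2] = ec                                  # prolong: copy even points,
    e[1:-1:2] = 0.5 * (e[:-2:2] + e[2::2])       # interpolate odd points
    return smooth(u + e, f, h)

n = 33
u = v_cycle(np.zeros(n), np.ones(n), 1.0 / (n - 1))   # one cycle of -u'' = 1
```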

24 citations


Journal ArticleDOI
Wolfram Büttner1
TL;DR: This work improves upon earlier approaches by placing (pure) AC unification on a firm theoretical basis and presenting algorithms which fully exploit the properties of the underlying mathematical structure; the high degree of parallelism becomes apparent.
Abstract: In a recent paper, A. Herold and J. Siekmann generalize 'pure' AC unification to terms containing additional function symbols. Generalized AC unification thus attains practical relevance for a broad range of applications. Pure AC unification is used as a basic mechanism, and it is this key role that has motivated our research. We have improved upon earlier approaches by placing (pure) AC unification on a firm theoretical basis and presenting algorithms which fully exploit the properties of the underlying mathematical structure. In particular, the high degree of parallelism inherent in AC unification will become apparent. Our algorithms have been designed for parallel hardware but still yield significant improvements over earlier algorithms when used in sequential mode.
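The following sketch is illustrative only and not the paper's algorithm: for a single AC symbol f, matching f(x, y, a) against the ground term f(a, b, c, d) reduces to distributing the leftover argument multiset {b, c, d} among the variables {x, y}; each distribution is an independent candidate solution, which is where the natural parallelism lies.

```python
# Toy AC matching for one AC symbol: distribute leftover arguments to variables.
from itertools import product
from collections import Counter

def ac_match(pattern_consts, pattern_vars, subject_consts):
    if Counter(pattern_consts) - Counter(subject_consts):
        return []                                # a pattern constant is missing
    items = list((Counter(subject_consts) - Counter(pattern_consts)).elements())
    solutions = []
    # Each leftover argument goes to one variable; every assignment is
    # independently checkable, hence embarrassingly parallel in principle.
    for assignment in product(range(len(pattern_vars)), repeat=len(items)):
        sub = {v: [] for v in pattern_vars}
        for arg, v_idx in zip(items, assignment):
            sub[pattern_vars[v_idx]].append(arg)
        if all(sub[v] for v in pattern_vars):    # each variable gets >= 1 argument
            solutions.append({v: tuple(sorted(a)) for v, a in sub.items()})
    return solutions

# f(x, y, a) =?= f(a, b, c, d): x and y share {b, c, d}
print(ac_match(['a'], ['x', 'y'], ['a', 'b', 'c', 'd']))
```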

19 citations


Proceedings Article
01 Jan 1986
TL;DR: This paper introduces a new model of pipeline architecture: the Data Synchronized Pipeline Architecture (DSPA), based on an independent sequencing of the functional units that allows a high degree of parallelism in the pipeline, even in the case of unforeseeable behaviors of some resource.
Abstract: To satisfy the growing need for computing power, a high degree of parallelism will be necessary in future supercomputers. Up to the late 1970s, supercomputers were either multiprocessors (SIMD-MIMD) or pipelined monoprocessors. Future industrial realizations should combine these two levels of parallelism. In a multiprocessor, classical pipeline controls become inefficient because the interdependent behaviors of the processing elements cannot be foreseen either at compile time or at decode time. In this paper, we introduce a new model of pipeline architecture: the Data Synchronized Pipeline Architecture (DSPA). Based on an independent sequencing of the functional units, this model allows a high degree of parallelism in the pipeline, even in the case of unforeseeable behaviors of some resource.
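The toy sketch below is only a software analogue of the data-synchronized idea, not the DSPA design itself: each functional unit sequences itself independently and blocks solely on operand availability, so no central pipeline controller has to foresee resource latencies.

```python
# Functional units as independent threads coupled only by data queues.
import threading, queue

q_in, q_mul, q_out = queue.Queue(), queue.Queue(), queue.Queue()

def load_unit():
    for x in range(5):
        q_in.put(x)                 # unpredictable latencies would be fine here
    q_in.put(None)                  # end-of-stream token

def multiply_unit():
    while (x := q_in.get()) is not None:
        q_mul.put(x * x)
    q_mul.put(None)

def store_unit():
    while (y := q_mul.get()) is not None:
        q_out.put(y)

units = [threading.Thread(target=f) for f in (load_unit, multiply_unit, store_unit)]
for u in units: u.start()
for u in units: u.join()
print(list(q_out.queue))            # [0, 1, 4, 9, 16]
```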

10 citations


Journal ArticleDOI
TL;DR: The Data Synchronized Pipeline Architecture (DSPA) as mentioned in this paper allows a high degree of parallelism in the pipeline, even in the case of unforeseeable behaviors of some resource.

8 citations


Journal ArticleDOI
TL;DR: It is shown that modifications to the algorithm can be made based on the use of a high degree of parallelism, yielding an efficient structure which relieves constraints on high-speed execution and can operate at speeds meeting real-time requirements.
Abstract: Architecture elements suitable for VLSI implementation and real-time operation in movement-compensated video (MCV) processors are presented. The algorithm used in the video processor is based on motion estimation and compensation techniques. An overview of the algorithm is given with emphasis placed on one of the key functions used in the prediction, the two-dimensional interpolator. A VLSI implementation is presented which incorporates design techniques of pipelining, parallelism, and module replication. Furthermore, it is shown that modifications to the algorithm can be made based on the use of a high degree of parallelism yielding an efficient structure which relieves constraints for high-speed execution. The operations then rely on a simpler one-dimensional interpolator to form one of the building blocks of the two-dimensional interpolator. It is indicated that the parallel structure which is formed with these building blocks can be implemented on two circuits and that it can operate at speeds meeting real-time requirements.
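To illustrate the building-block idea, the sketch below assumes half-pel bilinear interpolation, a common case in motion compensation (the paper's exact interpolator may differ): a 2-D interpolation separates into two passes of a 1-D interpolator, so the hardware block only needs to average adjacent samples.

```python
# Separable 2-D interpolation built from a 1-D half-sample interpolator.
import numpy as np

def interp_1d(row):
    # 1-D half-sample interpolator: average of each pair of neighbours.
    return 0.5 * (row[:-1] + row[1:])

def interp_2d(block):
    # Horizontal pass on every row, then vertical pass on every column.
    h = np.apply_along_axis(interp_1d, 1, block)
    return np.apply_along_axis(interp_1d, 0, h)

block = np.arange(16.0).reshape(4, 4)
print(interp_2d(block))   # 3x3 grid of half-pel values between the 4x4 samples
```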

7 citations


Book ChapterDOI
01 Jan 1986
TL;DR: Speedup and efficiency of some simple parallel multigrid algorithms for a class of bus-coupled systems are investigated, and it is shown that all systems are equally suitable if the tasks are sufficiently large.
Abstract: Speedup and efficiency of some simple parallel multigrid algorithms for a class of bus-coupled systems are investigated. We consider some basic multigrid methods (V-cycle, W-cycle) with regular grid generation and without local refinements. Our bus-coupled systems consist of many independent processors, each with its own local memory. A typical example of our abstract bus concept is a ring bus. The investigation of such systems is restricted to hierarchical orthogonal systems. Simple orthogonal bus systems, tree structures and mixed types are included in our general model. It can be shown that all systems are equally suitable if the tasks are sufficiently large. However, the smaller the degree of parallelism of an algorithm, the clearer the differences in performance among the various systems become. We classify the most powerful systems as well as systems with lower performance but better technical properties. Complexity investigations enabled us to evaluate the different systems; these investigations are complemented by simulations based on the different parallel algorithms.
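A back-of-the-envelope model in the spirit of this analysis (the cost formula and constants are my assumptions, not the authors' model) shows why coarse grids expose the bus: once a level has fewer points than processors, the per-level bus cost dominates, and that is where the systems differ.

```python
# Toy per-cycle cost: relaxation work ceil(n/p) plus a fixed bus cost per level.
from math import ceil

def v_cycle_time(levels, p, bus_cost):
    t = 0
    for l in range(levels):
        n = 4 ** (levels - l)                 # points on level l (2-D grid assumed)
        t += ceil(n / p) + bus_cost           # relaxation work + bus traffic
    return t

levels, p = 8, 256
t1 = v_cycle_time(levels, 1, 0)
for bus_cost in (1, 10, 100):
    tp = v_cycle_time(levels, p, bus_cost)
    print(f"bus_cost={bus_cost:3d}: speedup={t1/tp:7.1f}, efficiency={t1/(p*tp):.2f}")
```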

6 citations


John R. Rice1
01 Jan 1986
TL;DR: This paper examines the potential of parallel computation methods for partial differential equations (PDEs) and concludes that dramatically increased software support is needed for the general scientific and engineering community to exploit.
Abstract: This paper examines the potential of parallel computation methods for partial differential equations (PDEs). We start by observing that linear algebra is not the right model for PDE methods, and that data structures should be based on the physical geometry. We observe that there is a naturally high level of parallelism in the physical world to be exploited. An analysis is made showing there is a natural level of granularity or degree of parallelism which depends on the accuracy needed and the complexity of the PDE problem. It is noted that the granularity leads to the use of superelements and that computational efficiency suggests that these should be of higher accuracy. We discuss the inherent complexity of parallel methods and parallel machines and conclude that dramatically increased software support is needed for the general scientific and engineering community to exploit the power of highly parallel machines. The paper ends with a brief taxonomy of methods for PDEs; the classification is based on the method's use of three basic procedures: Partitioning, Discretization and Iteration. [To appear as a chapter in Taxonomy of Parallel Algorithms, Gannon and Jamieson, MIT Press, 1987. This work supported in part by Air Force Office of Scientific Research grant AFOSR-84-0385.]

3 citations


Proceedings ArticleDOI
09 Jun 1986
TL;DR: Owechko et al. discuss techniques for representing bipolar and complex data in the PRIMO optical matrix multiplier using square-law detectors, showing that by utilizing a bias and properly sequencing the data the detection nonlinearity can be fully compensated.
Abstract: The PRIMO optical processor is an analog outer product processor based on 1-D arrays of electrooptic modulators. PRIMO is capable of performing matrix-matrix multiplication, convolution, correlation, and other linear computational algorithms. In this paper we discuss techniques for representing bipolar and complex data in PRIMO using square-law detectors. It is shown that by utilizing a bias and properly sequencing the data, the square-law detection nonlinearity can be fully compensated. The solution to many important linear problems such as singular value decomposition, Fourier transformation, ambiguity function generation, adaptive array problems, etc. can be cast as a series of matrix-vector or matrix-matrix multiplications. Optical processors have great potential for such problems because of their high degree of parallelism and interconnection. Various architectures which utilize 2-D spatial light modulators (SLMs) or 1-D acousto-optic Bragg cells have been proposed for matrix-matrix or matrix-vector multiplication. Architectures which utilize 2-D SLMs are attractive because of the high potential degree of parallelism. The principal obstacle to implementation of such systems at this time is the lack of a suitable commercially available high-speed, high-resolution, and uniform 2-D SLM. In response to the present research status of high-performance 2-D SLMs, architectures which are built on the more mature technology of 1-D Bragg cells have been developed. Bragg cells have had a successful history in optical signal processing as real-time correlators and spectrum analyzers, and have many positive attributes for use in matrix multiplication. They also, however, have some disadvantages, such as close tolerances for alignment in multicell systems and the need for coherent illumination.
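A quick numerical check of the bias-and-sequencing idea (a standard trick; PRIMO's exact encoding may differ): a square-law detector reports (b + x)^2, so taking two properly sequenced exposures and subtracting leaves a term that is exactly linear in the bipolar value x, since (b+x)^2 - (b-x)^2 = 4bx.

```python
# Bias + sequencing cancels the square-law nonlinearity exactly.
def detect(amplitude):
    return amplitude ** 2            # square-law detector

bias = 10.0
for x in (-2.5, 0.0, 3.0):           # bipolar data to recover
    i_plus = detect(bias + x)        # exposure carrying +x
    i_minus = detect(bias - x)       # exposure carrying -x (sequenced second)
    recovered = (i_plus - i_minus) / (4 * bias)   # (b+x)^2 - (b-x)^2 = 4bx
    print(x, recovered)              # recovered equals x exactly
```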

Journal ArticleDOI
TL;DR: Low external data access rates indicate that the H-R bus-connected multiprocessor system operates with high efficiency even under high multiplicity, and a quantitative evaluation of the locality in data access indicates that under sufficient access locality, even high external data access rates do not unduly impair efficiency.
Abstract: One of the most serious problems in a highly multiple parallel processing system involves memory access contention. In the hierarchically structured access mechanism, the number of simultaneous access paths increases with degree of parallelism. Consequently, it is considered as a configuration suited to a highly multiple parallel system. This paper uses a queue model to analyze the hierarchical routing bus (H-R bus), which is an access mechanism with a hierarchical structure. The paper further makes a quantitative evaluation of the locality in data access. Based on the derived theoretical expressions and measured results obtained by executing several test programs on the H-R bus-connected parallel computer, the H-R bus performance is evaluated. The results obtained in this paper are as follows: (1) Low external data access rates (i.e., external data access time/processing time) indicate that the H-R bus-connected multiprocessor system operates with high efficiency even under high multiplicity. (2) Under sufficient access locality, even high external data access rates do not unduly impair efficiency. Although the above properties have qualitatively been known before, this paper derives theoretical expressions for the properties, supported by experiments.
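The paper's queueing analysis is not reproduced here; as a loose illustration of finding (1), a simple saturation bound (my construction, not the authors' model) shows how efficiency decays with the external access rate r = access time / processing time and the multiplicity p: a processor computes a fraction 1/(1+r) of the time, while a single shared bus caps efficiency at 1/(p*r) once concurrent requests saturate it.

```python
# Toy efficiency bound for p processors sharing one bus.
def efficiency(p, r):
    return min(1.0 / (1.0 + r), 1.0 / (p * r)) if r > 0 else 1.0

for r in (0.01, 0.05, 0.2):
    print(f"r={r}: " + ", ".join(f"p={p}: {efficiency(p, r):.2f}" for p in (4, 16, 64)))
```

Access locality, in this picture, lowers the effective r seen by the upper levels of the hierarchy, which is why even high nominal access rates need not impair efficiency.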

Book ChapterDOI
17 Sep 1986
TL;DR: This paper presents a parallel algorithm for the direct display of solid objects represented by Constructive Solid Geometry by using an adaptive technique that ensures a high degree of parallelism.
Abstract: This paper presents a parallel algorithm for the direct display of solid objects represented by Constructive Solid Geometry. The algorithm overcomes many of the limitations of previous approaches by using an adaptive technique that ensures a high degree of parallelism. Performance estimates have been obtained by simulation, and a parallel architecture is proposed.
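A conceptual sketch of the adaptive-subdivision idea follows (the paper's algorithm and primitive tests are more involved): a screen region that a primitive fully covers or entirely misses is finished immediately, while ambiguous regions are split into four, and the resulting subtasks are independent, hence parallelizable.

```python
# Adaptive quadtree subdivision against a single circle primitive.
def classify(region, circle):
    (x0, y0, x1, y1), (cx, cy, r) = region, circle
    # Nearest and farthest points of the box relative to the circle centre.
    nx, ny = min(max(cx, x0), x1), min(max(cy, y0), y1)
    fx = x0 if cx - x0 > x1 - cx else x1
    fy = y0 if cy - y0 > y1 - cy else y1
    if (fx - cx) ** 2 + (fy - cy) ** 2 <= r * r:
        return "in"                           # whole box inside the circle
    if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
        return "out"                          # box misses the circle entirely
    return "split"

def render(region, circle, depth=0, tasks=None):
    tasks = [] if tasks is None else tasks
    c = classify(region, circle)
    if c != "split" or depth == 4:            # resolved, or fine enough for per-pixel work
        tasks.append((region, c))
    else:                                     # ambiguous: subdivide into four subtasks
        x0, y0, x1, y1 = region
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        for sub in ((x0, y0, mx, my), (mx, y0, x1, my),
                    (x0, my, mx, y1), (mx, my, x1, y1)):
            render(sub, circle, depth + 1, tasks)   # independent, parallelizable
    return tasks

print(len(render((0, 0, 64, 64), (32, 32, 20))))
```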

Book ChapterDOI
01 Jan 1986
TL;DR: The Massively Parallel Processor is a highly parallel scientific computer which was originally intended for image processing and analysis applications but it is also suitable for a large range of other scientific applications.
Abstract: The Massively Parallel Processor (MPP) [1,2] is a highly parallel scientific computer which was originally intended for image processing and analysis applications but it is also suitable for a large range of other scientific applications. Currently the highest degree of parallelism is achieved with the SIMD type of parallel computer architecture. With this scheme a single program sequence unit broadcasts a sequence of instructions to a large number of slave Processing Elements (PE’s). All PE’s perform the same function at the same time but on different data elements; in this way a whole data structure such as a matrix can be manipulated with a single instruction. The alternative highly parallel organization, the MIMD type, is to have an instruction unit with every PE. This scheme is much more flexible but also much more complex and expensive.
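As a loose software analogue of the SIMD scheme described above (NumPy stands in for the PE array; the MPP itself was a 128x128 grid of bit-serial PEs), each "instruction" below applies to every element of a matrix at once, with no per-element loop, and conditional work is expressed by masking PEs rather than branching.

```python
# One broadcast operation manipulates a whole data structure, SIMD-style.
import numpy as np

a = np.arange(16).reshape(4, 4)
b = np.ones((4, 4), dtype=int)
c = a + b          # one "instruction", executed on all elements in lockstep
mask = a % 2 == 0  # SIMD machines mask PEs instead of branching per element
d = np.where(mask, c, 0)
print(d)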

Proceedings ArticleDOI
John R. Rice1
02 Nov 1986
TL;DR: It is shown that linear algebra does not give the best data structures for exploiting parallelism in solving PDEs; the data structures should be based on the physical geometry, and there is a naturally high level of parallelism to be exploited.
Abstract: This paper examines the potential of parallel computation methods for partial differential equations (PDEs). We first observe that linear algebra does not give the best data structures for exploiting parallelism in solving PDEs; the data structures should be based on the physical geometry. There is a naturally high level of parallelism in the physical world to be exploited, and we show there is a natural level of granularity or degree of parallelism which depends on the accuracy needed and the complexity of the PDE problem. We discuss the inherent complexity of parallel methods and parallel machines and conclude that dramatically increased software support is needed for the general scientific and engineering community to exploit the power of highly parallel machines. [This work supported in part by Air Force Office of Scientific Research grant AFOSR-84-0385.]
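A sketch of the geometry-based partitioning idea (illustrative only; Rice's "superelements" are higher-accuracy elements, not plain grid blocks): the unknowns of a PDE grid are grouped by physical subdomain rather than by matrix row, and each subdomain becomes one unit of parallel work, with granularity set by how much boundary data neighbours must exchange.

```python
# Partition a 2-D grid into subdomains that follow the physical geometry.
import numpy as np

n, blocks = 16, 4                       # 16x16 grid, 4x4 array of subdomains
grid = np.zeros((n, n))
s = n // blocks
subdomains = [grid[i*s:(i+1)*s, j*s:(j+1)*s]       # views, not copies
              for i in range(blocks) for j in range(blocks)]
# Each subdomain can be discretized/relaxed independently; only its boundary
# values must be exchanged with neighbours, which fixes the granularity.
for k, sd in enumerate(subdomains):
    sd[...] = k                          # stand-in for per-subdomain work
print(grid)
```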

Book ChapterDOI
01 Jan 1986
TL;DR: The justification for developing optical computers is based on the assertion that optics is capable of performing better than electronics in some respects, but the areas in which optics can outperform electronics must represent critical limitations preventing electronic computers from providing reasonable solutions to a broad class of problems of interest.
Abstract: The justification for developing optical computers is based on the assertion that optics is capable of performing better than electronics in some respects. Furthermore, the areas in which optics can outperform electronics must represent critical limitations preventing electronic computers from providing reasonable solutions to a broad class of problems of interest. Parallel processing capability is certainly a desirable property of optics; however, its importance as an advantageous feature compared to electronics has perhaps been overemphasized, since there is no fundamental limitation to the degree of parallelism that can be achieved electronically. Already there are projects in progress to implement electronic systems with hundreds of thousands of electronic parallel processing elements. Global communication capability, on the other hand, is a property of optics that is clearly very difficult to duplicate electronically. One of the reasons that optics can provide global communication is the fact that optical systems are configurable in three dimensions. For instance, optics can be used to optically interconnect a large number of processing units in a plane, with light propagating in the third dimension and the interconnection pattern itself being specified externally to the plane of the processors.