Journal Article•DOI•

Analysis of Programs for Parallel Processing

A. J. Bernstein
01 Oct 1966 • IEEE Transactions on Electronic Computers (IEEE) • Vol. 15, Iss. 5, pp. 757-763
TL;DR: A set of conditions are described which determine whether or not two successive portions of a given program can be performed in parallel and still produce the same results.
Abstract: A set of conditions are described which determine whether or not two successive portions of a given program can be performed in parallel and still produce the same results. The conditions are general and can be applied to sections of the program of arbitrary size. The conditions are interesting because of the light they shed on the structure of programs amenable to parallel processing and the memory organization of a multi-computer system.
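
The conditions described here have come to be known as Bernstein's conditions: two sections can run in parallel when neither writes a location the other reads or writes. A minimal sketch in Python (ours; summarizing each section by its read and write sets is an assumption made for illustration, not the paper's notation):

    def can_run_in_parallel(reads1, writes1, reads2, writes2):
        """Bernstein-style check on two program sections.

        The sections are taken to be safely parallelizable when the write
        set of each is disjoint from the read set of the other, and the
        two write sets are disjoint.
        """
        return (writes1.isdisjoint(reads2) and
                writes2.isdisjoint(reads1) and
                writes1.isdisjoint(writes2))

    # S1 computes a = x + y, S2 computes b = 2 * x: no conflict.
    print(can_run_in_parallel({"x", "y"}, {"a"}, {"x"}, {"b"}))   # True
    # S1 computes a = x + y, S2 assigns x = 5: write/read conflict.
    print(can_run_in_parallel({"x", "y"}, {"a"}, set(), {"x"}))   # False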
Citations

Journal Article•DOI•
TL;DR: A hierarchical model of computer organizations is developed, based on a tree model using request/service type resources as nodes, which indicates that saturation develops when the fraction of task time spent locked out approaches 1/n, where n is the number of processors.
Abstract: A hierarchical model of computer organizations is developed, based on a tree model using request/service type resources as nodes. Two aspects of the model are distinguished: logical and physical. General parallel- or multiple-stream organizations are examined as to type and effectiveness, especially regarding intrinsic logical difficulties. The overlapped simplex processor (SISD) is limited by data dependencies. Branching has a particularly degenerative effect. The parallel processors [single-instruction stream-multiple-data stream (SIMD)] are analyzed. In particular, a nesting type explanation is offered for Minsky's conjecture: the performance of a parallel processor increases as log M instead of M (the number of data stream processors). Multiprocessors (MIMD) are subjected to a saturation syndrome based on general communications lockout. Simplified queuing models indicate that saturation develops when the fraction of task time spent locked out (L/E) approaches 1/n, where n is the number of processors. Resource sharing in multiprocessors can be used to avoid several other classic organizational problems.
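
The saturation condition can be illustrated numerically under a simple interpretation (ours, not the paper's model): if each of n processors spends a fraction rho = L/E of its task time locked out on a shared resource, that resource is busy roughly n*rho of the time, so useful parallelism stops growing once rho approaches 1/n. A hedged sketch:

    def effective_processors(n, rho):
        """Crude upper bound on useful parallelism: with each of n
        processors locked out a fraction rho of the time, throughput
        cannot exceed the shared resource's capacity 1/rho."""
        return min(n, 1.0 / rho) if rho > 0 else n

    rho = 0.1   # fraction of task time spent locked out (L/E)
    for n in (2, 4, 8, 16):
        print(n, effective_processors(n, rho))
    # useful parallelism grows with n until n nears 1/rho = 10, then flattens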

1,982 citations


Cites background from "Analysis of Programs for Parallel Processing"

  • ...Bernstein [9] has developed three (rather strong) sufficient conditions for two programs to operate in parallel based on the referencing of partitions of storage…

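In the notation that has since become standard (our restatement, not the wording of either paper), writing R_i and W_i for the sets of storage locations read and written by program section P_i, the three conditions quoted above are

    W_1 \cap R_2 = \varnothing, \qquad R_1 \cap W_2 = \varnothing, \qquad W_1 \cap W_2 = \varnothing .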

Journal Article•DOI•
TL;DR: The structure of Petri nets, their markings and execution, several examples of Petri net models of computer hardware and software, and research into the analysis of Petri nets are presented, as are the use of the reachability tree and the decidability and complexity of some Petri net problems.
Abstract: Over the last decade, the Petri net has gained increased usage and acceptance as a basic model of systems of asynchronous concurrent computation. This paper surveys the basic concepts and uses of Petri nets. The structure of Petri nets, their markings and execution, several examples of Petri net models of computer hardware and software, and research into the analysis of Petri nets are presented, as are the use of the reachability tree and the decidability and complexity of some Petri net problems. Petri net languages, models of computation related to Petri nets, and some extensions and subclasses of the Petri net model are also briefly discussed.
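
As a minimal illustration of the basic model surveyed here (places, transitions, a marking, and the firing rule), the following sketch is ours and covers only ordinary place/transition nets with unit arc weights:

    class PetriNet:
        """Ordinary Petri net: every transition has input and output
        places, and all arcs have weight one."""

        def __init__(self, transitions, marking):
            # transitions: {name: (input_places, output_places)}
            # marking: {place: token_count}
            self.transitions = transitions
            self.marking = dict(marking)

        def enabled(self, t):
            inputs, _ = self.transitions[t]
            return all(self.marking.get(p, 0) >= 1 for p in inputs)

        def fire(self, t):
            if not self.enabled(t):
                raise ValueError(f"transition {t} is not enabled")
            inputs, outputs = self.transitions[t]
            for p in inputs:
                self.marking[p] -= 1
            for p in outputs:
                self.marking[p] = self.marking.get(p, 0) + 1

    # Toy producer/consumer: 'produce' puts a token in the buffer, 'consume' removes it.
    net = PetriNet({"produce": ({"idle"}, {"buffer", "idle"}),
                    "consume": ({"buffer"}, set())},
                   {"idle": 1, "buffer": 0})
    net.fire("produce")
    net.fire("consume")
    print(net.marking)  # {'idle': 1, 'buffer': 0}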

1,184 citations

Journal Article•DOI•
TL;DR: It is shown that two variations of each type of race exist: feasible general races and data races, and that locating feasible races is an NP-hard problem, implying that only the apparent races can be detected in practice.
Abstract: In shared-memory parallel programs that use explicit synchronization, race conditions result when accesses to shared memory are not properly synchronized. Race conditions are often considered to be manifestations of bugs, since their presence can cause the program to behave unexpectedly. Unfortunately, there has been little agreement in the literature as to precisely what constitutes a race condition. Two different notions have been implicitly considered: one pertaining to programs intended to be deterministic (which we call general races) and the other to nondeterministic programs containing critical sections (which we call data races). However, the differences between general races and data races have not yet been recognized. This paper examines these differences by characterizing races using a formal model and exploring their properties. We show that two variations of each type of race exist: feasible general races and data races capture the intuitive notions desired for debugging, while apparent races capture less accurate notions implicitly assumed by most dynamic race detection methods. We also show that locating feasible races is an NP-hard problem, implying that only the apparent races, which are approximations to feasible races, can be detected in practice. The complexity of dynamically locating apparent races depends on the type of synchronization used by the program. Apparent races can be exhaustively located efficiently only for weak types of synchronization that are incapable of implementing mutual exclusion. This result has important implications since we argue that debugging general races requires exhaustive race detection and is inherently harder than debugging data races (which requires only partial race detection). Programs containing data races can therefore be efficiently debugged by locating certain easily identifiable races. In contrast, programs containing general races require more complex debugging techniques.
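
To make the data-race notion concrete, here is a small, hedged illustration (ours, not the paper's): two threads update a shared counter without synchronization, so the conflicting accesses are unordered and updates can be lost; guarding the update with a lock orders the accesses and removes the race. Whether lost updates actually show up depends on the interpreter and scheduler.

    import threading

    counter = 0
    lock = threading.Lock()

    def unsynchronized(n):
        global counter
        for _ in range(n):
            counter += 1        # unordered read-modify-write: a data race

    def synchronized(n):
        global counter
        for _ in range(n):
            with lock:          # mutual exclusion orders the conflicting accesses
                counter += 1

    def run(worker, n=100_000):
        global counter
        counter = 0
        threads = [threading.Thread(target=worker, args=(n,)) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return counter

    print(run(unsynchronized))  # may fall short of 200000 when updates are lost
    print(run(synchronized))    # always 200000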

471 citations


Cites background from "Analysis of Programs for Parallel Processing"

  • ...Bernstein’s conditions state that atomic execution is guaranteed if shared variables that are read and modified by the critical section are not modified by any other concurrently executing section of code [3]....


References
Proceedings Article•DOI•
12 Nov 1963
TL;DR: Parallel processing is not so mysterious a concept as the dearth of algorithms which explicitly use it might suggest, but any or all of the processes can be performed simultaneously, if conflicts arising from multiple access to common storage can be resolved.
Abstract: Parallel processing is not so mysterious a concept as the dearth of algorithms which explicitly use it might suggest. As a rule of thumb, if N processes are performed and the outcome is independent of the order in which their steps are executed, provided that within each process the order of steps is preserved, then any or all of the processes can be performed simultaneously, if conflicts arising from multiple access to common storage can be resolved. All the elements of a matrix sum may be evaluated in parallel. The ith summand of all elements of a matrix product may be computed simultaneously. In an internal merge sort all strings in any pass may be created at the same time. All the coroutines of a separable program may be run concurrently.
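
Following the matrix-sum example above, a minimal sketch (ours) in which every element of C = A + B is computed by an independent task; this is safe under the stated rule of thumb because each task only reads A and B and writes a distinct element of C:

    from concurrent.futures import ThreadPoolExecutor

    def parallel_matrix_sum(A, B):
        """Compute C = A + B with one independent task per element.
        Each task writes a distinct C[i][j], so the tasks commute."""
        rows, cols = len(A), len(A[0])
        C = [[0] * cols for _ in range(rows)]

        def add_element(i, j):
            C[i][j] = A[i][j] + B[i][j]

        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(add_element, i, j)
                       for i in range(rows) for j in range(cols)]
            for f in futures:
                f.result()   # propagate any exception
        return C

    A = [[1, 2], [3, 4]]
    B = [[10, 20], [30, 40]]
    print(parallel_matrix_sum(A, B))  # [[11, 22], [33, 44]]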

178 citations

Proceedings Article•DOI•
04 Dec 1962
TL;DR: The SOLOMON (Simultaneous Operation Linked Ordinal Modular Network), a parallel network computer, is a new system involving the interconnections and programming, under the supervision of a central control unit, of many identical processing elements in an arrangement that can simulate directly the problem being solved.
Abstract: The SOLOMON (Simultaneous Operation Linked Ordinal Modular Network), a parallel network computer, is a new system involving the interconnections and programming, under the supervision of a central control unit, of many identical processing elements (as few or as many as a given problem requires), in an arrangement that can simulate directly the problem being solved.
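
As a rough, hedged sketch of this style of machine (ours, not the SOLOMON design): a central control unit broadcasts one instruction at a time, and every processing element that is currently active applies it to its own local data in lockstep.

    def simd_broadcast(program, local_data, active):
        """Toy single-instruction-stream model: each operation in
        'program' is broadcast to all processing elements, and only the
        elements flagged active apply it to their local datum."""
        data = list(local_data)
        for op in program:
            for pe in range(len(data)):
                if active[pe]:
                    data[pe] = op(data[pe])
        return data

    # Four processing elements, one masked off (a crude activity bit).
    values = [1, 2, 3, 4]
    mask = [True, True, False, True]
    program = [lambda x: x + 10, lambda x: x * 2]
    print(simd_broadcast(program, values, mask))  # [22, 24, 3, 28]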

161 citations

Journal Article•DOI•
TL;DR: The use is discussed of a fast core memory of, say, 32000 words as a slave to a slower core memory in such a way that in practical cases the effective access time is nearer that of the fast memory than that of the slow memory.
Abstract: The use is discussed of a fast core memory of, say, 32000 words as a slave to a slower core memory of, say, one million words in such a way that in practical cases the effective access time is nearer that of the fast memory than that of the slow memory.
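
The effective-access-time claim can be illustrated with the standard weighted-average model (our sketch; the numbers are illustrative, not taken from the paper): if a fraction h of references hit the fast slave memory, the effective access time is h*t_fast + (1 - h)*t_slow.

    def effective_access_time(hit_ratio, t_fast, t_slow):
        """Weighted average of fast (slave) and slow (main) access times."""
        return hit_ratio * t_fast + (1 - hit_ratio) * t_slow

    # Illustrative figures: 0.2 us slave store, 2.0 us main core store.
    for h in (0.8, 0.95, 0.99):
        print(h, round(effective_access_time(h, 0.2, 2.0), 3))
    # 0.8 -> 0.56 us, 0.95 -> 0.29 us, 0.99 -> 0.218 us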

117 citations

Journal Article•DOI•
TL;DR: Two statements are suggested which allow a programmer writing in a procedure-oriented language to indicate sections of program which are to be executed in parallel, and should be particularly effective for use with computing devices capable of attaining some degree of compute-compute overlap.
Abstract: Two statements are suggested which allow a programmer writing in a procedure-oriented language to indicate sections of program which are to be executed in parallel. The statements are DO TOGETHER and HOLD. These serve partly as brackets in establishing a range of parallel operation and partly to define each parallel path within this range. DO TOGETHERs may be nested. The statements should be particularly effective for use with computing devices capable of attaining some degree of compute-compute overlap. Computers with parallel processing capabilities are seldom used to full advantage. In some systems, more than one single program is processed with simultaneity while in others, different portions of a single program are processed in parallel. In the former, inefficiency often results because the mix of individual programs, each written for sole occupancy of a computer, is unlikely to demand equal loading of each parallel element. In the latter case, the distribution of program functions to hardware elements is frequently left to computer logic (e.g. input-output commands are sent to a special processor, floating-point arithmetic commands to another, and so forth). The following is directed toward better utilization of computer systems which process a single program by performing functionally different portions with separate computing elements. The Bull Gamma 60 and the CDC 6000 series are representative of this class. Procedure-oriented languages developed for serial computation have serious limitations when used to express problem solutions involving parallelism since the control statements (GO TO, DO, FOR, IF, etc.) define a single serial path for the computation. Two statements are suggested as possible additions to these languages (ALGOL, FORTRAN, COBOL, etc.) to facilitate the efficient application of parallel computers. The statements provide the analyst with a tool for stating which procedures may be executed in parallel. They also provide the compiler designer with a language element that…
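
A hedged analogy in a modern language (ours; only the DO TOGETHER and HOLD statements themselves are the paper's): the bracketed range of parallel operation maps naturally onto a fork/join of the separately declared paths.

    import threading

    def do_together(*paths):
        """Rough analogue of a DO TOGETHER range: start every declared
        parallel path, then wait for all of them before leaving the range."""
        threads = [threading.Thread(target=path) for path in paths]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    results = {}
    do_together(lambda: results.update(io=sum(range(1000))),
                lambda: results.update(fp=2.0 ** 20))
    print(results)  # both paths have finished before control reaches this point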

29 citations

Proceedings Article•DOI•
J. R. Ball, R. C. Bollinger, T. A. Jeeves, R. C. McReynolds, D. H. Shaffer
04 Dec 1962
TL;DR: The task of evaluating the feasibility of the SOLOMON computer has led to investigations of problems which primarily involve elementary, simultaneous computations of an iterative nature, and the solution of linear systems and the maintenance of real-time multi-dimensional control and surveillance situations have been considered.
Abstract: The SOLOMON computer has a novel design which is intended to give it unusual capabilities in certain areas of computation. The arithmetic unit of SOLOMON is constructed with a large number of simple processing elements suitably interconnected, and hence differs from that of a conventional computer by being capable of a basically parallel operation. The initial development and study of this new computer has led to considerable scientific and engineering inquiry in three problem areas: (1) the design, development, and construction of the hardware necessary to make SOLOMON a reality; (2) the identification and investigation of numerical problems which most urgently need the unusual features of SOLOMON; and (3) the generation of computational techniques which make the most effective use of SOLOMON's particular parallel construction. This paper is an early report on some work which has been done in the second and third of these areas.

SOLOMON has certain inherent speed advantages as a consequence of its design. In the first place, computers conventionally require two memory cycles per simple command: one cycle to obtain the instruction and one cycle to obtain the operand. Although SOLOMON has the same basic requirement, it handles up to 1024 operands with each instruction. Consequently, the time per operand spent in obtaining the instruction is negligible. This results in increasing the speed by a factor of two. In the second place, the fact that the processing elements handle 1024 operands at once greatly increases the effective speed. The factor is not 1024, however. Since the processors are serial-by-bit they require n memory references to add an n-bit word. If n is taken nominally to be 32, then the resulting net speed advantage is 1024/n, that is 1024/32 = 32. These two factors result in a fundamental speed increase on the order of 64 to 1 for comparable memory cycle times.

In addition to these concrete factors, there are other factors whose contribution to speed cannot be as easily measured. Among these are (i) the advantages due to the intercommunication paths between the processing elements, (ii) the advantage of using variable word length operations, (iii) the net effect resulting from either eliminating conventional indexing operations or else superseding them by mode operations, and (iv) the loss in effectiveness resulting from the inability of utilizing all processors in every operation. The net speed advantage can only be determined by detailed analysis of individual specific problems.

The task of evaluating the feasibility of the SOLOMON computer has led to investigations of problems which primarily involve elementary, simultaneous computations of an iterative nature. In particular, the solution of linear systems and the maintenance of real-time multi-dimensional control and surveillance situations have been considered. Within these very broad areas two special problems have been rather thoroughly studied and are presented here to demonstrate the scope and application of SOLOMON. The first of these is a problem from partial differential equations, namely the discrete analogue of Dirichlet's problem on a rectangular grid. The second is the real-time problem of satellite tracking and the computations which attend it. These problems are discussed here individually and are followed by a brief summation of other work.
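
The speed-advantage arithmetic in this abstract can be reproduced directly (our restatement of the numbers given above):

    operands_per_instruction = 1024   # operands handled per broadcast instruction
    word_bits = 32                    # nominal word length of the serial-by-bit PEs

    instruction_fetch_factor = 2      # fetch cost amortized over 1024 operands
    parallel_factor = operands_per_instruction / word_bits   # 1024 / 32 = 32

    net_speed_advantage = instruction_fetch_factor * parallel_factor
    print(net_speed_advantage)        # 64.0, the "order of 64 to 1" figure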

20 citations