
Showing papers on "Parallel algorithm published in 1979"


Journal ArticleDOI
TL;DR: A parallel algorithm which uses n² processors to find the connected components of an undirected graph with n vertices in time O(log²n); the algorithm can also be used to find the transitive closure of a symmetric Boolean matrix.
Abstract: We present a parallel algorithm which uses n² processors to find the connected components of an undirected graph with n vertices in time O(log²n). An O(log²n) time bound also can be achieved using only n⌈n/⌈log₂n⌉⌉ processors. The algorithm can be used to find the transitive closure of a symmetric Boolean matrix. We assume that the processors have access to a common memory. Simultaneous access to the same location is permitted for fetch instructions but not for store instructions.
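The core step, repeated Boolean squaring of the adjacency matrix, can be sketched sequentially; this is a minimal illustration of the idea (the function names and sequential loops are mine, not the paper's), where each squaring round corresponds to one shallow parallel step on n² processors:

```python
def transitive_closure(adj):
    """Transitive closure of a symmetric Boolean matrix by repeated
    Boolean squaring.  The paper assigns each squaring to n^2 (or
    fewer) processors; here the loops run sequentially."""
    n = len(adj)
    # Start from the reflexive closure: every vertex reaches itself.
    a = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for _ in range(max(1, (n - 1).bit_length())):  # ceil(log2 n) rounds
        a = [[any(a[i][k] and a[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return a

def connected_components(adj):
    """Label each vertex with the smallest vertex it can reach."""
    n = len(adj)
    reach = transitive_closure(adj)
    return [min(j for j in range(n) if reach[i][j]) for i in range(n)]
```

Each round doubles the path length captured, so ⌈log₂n⌉ rounds suffice; with each Boolean matrix square itself taking O(log n) parallel time, the total is O(log²n).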

266 citations


Journal ArticleDOI
TL;DR: Overall conclusions indicate that the parallel algorithm is always much faster and sometimes has better convergence characteristics than the classical trapezoidal integration algorithm.
Abstract: The numerical method presented in this paper permits the solution of differential equations by trapezoidal integration in a time of order log₂ T, where T is the number of discrete time steps required for the solution. The number of required parallel processors is T/2. Linear and nonlinear examples are presented. The nonlinear example corresponds to a small stability problem. The classical trapezoidal integration algorithm is compared to the new parallel trapezoidal algorithm in terms of solution time requirements. Also, for the nonlinear example the comparison includes the number of iterations and convergence characteristics. Overall conclusions indicate that the parallel algorithm is always much faster and sometimes has better convergence characteristics. Potential limitations of the method are also discussed.
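The parallel-in-time idea can be illustrated on a linear test equation: each trapezoidal step is an affine map of the state, and affine maps compose associatively, so a parallel prefix scan can combine all T of them in O(log₂ T) depth on T/2 processors. A sketch under that framing (the fold below is sequential, and the names are illustrative, not from the paper):

```python
def trapezoid_affine_steps(lam, f, h, T):
    """For y' = lam*y + f(t), the trapezoidal rule gives the affine
    update y_{k+1} = c*y_k + d_k.  Return the T per-step maps (c, d_k)."""
    c = (1 + h * lam / 2) / (1 - h * lam / 2)
    return [(c, (h / 2) * (f(k * h) + f((k + 1) * h)) / (1 - h * lam / 2))
            for k in range(T)]

def compose(g, f_):
    """(g o f) for affine maps (c, d): apply f first, then g."""
    cg, dg = g
    cf, df = f_
    return (cg * cf, cg * df + dg)

def solve(y0, steps):
    """Fold all step maps into one and apply it to y0.  Because compose
    is associative, a parallel prefix scan could do this in log2 T depth;
    here we fold sequentially."""
    acc = (1.0, 0.0)  # identity map
    for s in steps:
        acc = compose(s, acc)
    c, d = acc
    return c * y0 + d
```

For y' = -y with y(0) = 1 and h = 0.01 over 100 steps, the fold reproduces e⁻¹ to the second-order accuracy of the trapezoidal rule.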

120 citations


Proceedings ArticleDOI
29 Oct 1979
TL;DR: This work describes in detail how to program the cube-connected-cycles for efficiently solving a large class of problems, which includes Fast-Fourier-Transform, sorting, permutations, and derived algorithms, and the CCC can also be used as a general purpose parallel processor.
Abstract: We introduce a network of processing elements, the cube-connected-cycles (CCC), complying with the present technological constraints of VLSI design. By combining the principles of parallelism and pipelining, the CCC can emulate the cube-connected machine with no significant degradation of performance but with a much more compact structure. We describe in detail how to program the CCC for efficiently solving a large class of problems, which includes Fast-Fourier-Transform, sorting, permutations, and derived algorithms. The CCC can also be used as a general purpose parallel processor.
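The CCC wiring itself is compact to state: node (l, w), with w a corner of the n-cube and l a position on that corner's n-node cycle, has exactly three neighbors. A small sketch (my notation, not the authors'):

```python
def ccc_neighbors(l, w, n):
    """Neighbors of node (l, w) in an order-n cube-connected-cycles
    network: each of the 2^n hypercube corners is replaced by an
    n-node cycle, and node l of the cycle carries the cube edge in
    dimension l.  Every node therefore has degree 3, independent of n,
    which is what makes the layout VLSI-friendly."""
    return [
        ((l - 1) % n, w),   # previous node on the local cycle
        ((l + 1) % n, w),   # next node on the local cycle
        (l, w ^ (1 << l)),  # hypercube edge in dimension l
    ]
```

The constant degree is the point: the CCC keeps the hypercube's O(log)-diameter routing while every processing element has a fixed number of ports.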

79 citations


Journal ArticleDOI
TL;DR: This work considers the problem of triangulating a sparse matrix in a parallel processing system, asking how the rows and columns should be reordered to minimize the completion time when an unrestricted number of processors is used, and what the minimum completion time and schedule are when the number of processors is fixed.
Abstract: We consider the problem of triangulating a sparse matrix in a parallel processing system and attempt to answer the following questions: 1) How should the rows and columns of the matrix be reordered in order to minimize the completion time of the parallel triangulation process if an unrestricted number of processors is used? 2) If the number of processors is fixed, what is the minimum completion time and how should the parallel operations be scheduled? Implementation of the parallel algorithm is discussed and experimental results are given.

47 citations


Proceedings ArticleDOI
J. Soukup
25 Jun 1979
TL;DR: A new router which develops all connections simultaneously, as connected irregularly shaped areas that grow and retract in an amoeba-like manner; because the cell map is scanned sequentially, data handling and storage are vastly simplified.
Abstract: The paper describes a new router which develops all connections simultaneously. Routes do not exist as lines, but rather as connected irregularly shaped areas which grow and retract in an amoeba-like manner. It is as if some routes are being rerouted, but it is all done at once. Because the cell map is scanned sequentially, the data handling and storage is vastly simplified.

47 citations


Journal ArticleDOI
TL;DR: A parallel algorithm for the solution of the general tridiagonal system is presented, based on an efficient implementation of Cramer's rule, in which the only divisions are by the determinant of the matrix.
Abstract: A parallel algorithm for the solution of the general tridiagonal system is presented. The method is based on an efficient implementation of Cramer's rule, in which the only divisions are by the determinant of the matrix. Therefore, the algorithm is defined without pivoting for any nonsingular system. O(n) storage is required for n equations and O(log n) operations are required on a parallel computer with n processors. O(n) operations are required on a sequential computer. Experimental results are presented from both the CDC 7600 and CRAY-1 computers.
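The determinant at the heart of such Cramer's-rule schemes satisfies a three-term recurrence over leading principal minors; rewritten as a product of 2x2 matrices it becomes a prefix problem, computable in O(log n) parallel depth. A sequential sketch of the recurrence (illustrative, not the paper's code):

```python
def tridiag_det(a, b, c):
    """Determinant of an n x n tridiagonal matrix with sub-diagonal
    a[1..n-1] (a[0] is an unused placeholder), diagonal b[0..n-1] and
    super-diagonal c[0..n-2], via the minor recurrence
        theta_i = b[i]*theta_{i-1} - a[i]*c[i-1]*theta_{i-2}.
    Note the recurrence is division-free: in the full solver the only
    divisions are by this final determinant."""
    n = len(b)
    th_prev, th = 1, b[0]
    for i in range(1, n):
        th_prev, th = th, b[i] * th - a[i] * c[i - 1] * th_prev
    return th
```

For the matrix [[2,1,0],[1,2,1],[0,1,2]] the recurrence yields 2, 3, 4, matching the determinant 4.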

35 citations


Journal ArticleDOI
TL;DR: It is shown that the highly variable frequencies of the fragments of a given type, e.g., augmented atom or bonded pair, may be compensated for by employing several levels of description, the frequently occurring characteristics being delineated in some detail in the final screen set while the less common features are described in more general terms.
Abstract: Atom-by-atom matching of a substructural query against a file of connection tables, a subgraph isomorphism search, belongs to the class of problems known as NP-complete, for which no efficient algorithms are known, and thus may require excessive amounts of computer time if many trial structures need to be compared with the query. Such searches are accordingly feasible only if the number of matches can be reduced by the rapid and inexpensive elimination of that large portion of the file not satisfying certain minimal requirements in the query. We use the term screen set to describe the group of structural characteristics which is used to carry out this initial partitioning. For even quite small files of compounds the total number of potential screens is very large indeed, and hence strict criteria must be used to determine which features should be selected for use in the screen set. Lynch and his co-workers showed that the highly variable frequencies of the fragments of a given type, e.g., augmented atom or bonded pair, may be compensated for by employing several levels of description, the frequently occurring characteristics being delineated in some detail in the final screen set while the less common features are described in more general terms. In this way, a balance may be achieved between the proliferation of low-incidence fragments of superfluous specificity and the small number of high-incidence, low-precision fragments.
Also, the occurrences of the resultant screen set members will become much less disparate than if a single level of description were to be employed, in accordance with simple considerations of information theory. The move toward screen equifrequency may, however, be lessened by the need to describe frequent characteristics at the more general levels, as well as in detail, to allow easy query encoding, since otherwise the union of many highly specific features may be required in order to describe a more general feature common to all of them. This suggests the need for a hierarchically ordered screen set in which there are well-marked relationships between the fragments at different levels of description. A second problem is that the screenout performance of a screen set cannot be predicted accurately from fragment incidence data, since it is found that the incidences of the screen set members are not independent of one another. Thus an analysis of the co-assignment frequencies of pairs of screens showed that the association between fragments of a given type increased with the type size; however, the study concluded that, in practice, no consideration need be given to such fragment associations as long as the screen set members were not too large. Additionally, iterative fragmentation procedures introduce very strong associations between a fragment and its immediate parent, i.e., the fragment obtained in the previous iteration of the algorithm from which the new fragment has been derived. It is clear that if the two incidences are not dissimilar the filial fragment is redundant and should not be included in the screen set. A theoretical description of such
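At search time the screening stage described above reduces to a subset test per file entry; only survivors go on to the expensive atom-by-atom match. A toy sketch with hypothetical fragment labels (the labels and function names are mine, not from the paper):

```python
def passes_screen(query_screens, molecule_screens):
    """A molecule can possibly match the substructure query only if it
    contains every screen (fragment descriptor) assigned to the query,
    so a cheap subset test eliminates most of the file before the
    costly subgraph-isomorphism search."""
    return query_screens <= molecule_screens

def screen_file(query_screens, file_screens):
    """Return indices of file entries surviving the screen stage."""
    return [i for i, s in enumerate(file_screens)
            if passes_screen(query_screens, s)]
```

In a real system the per-molecule screen sets would be stored as fixed-length bit vectors, so the subset test is a handful of word-wide AND/compare operations.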

29 citations


25 Sep 1979
TL;DR: An (N/lg N)-processor version of the machine can solve the problem of constructing minimum spanning trees in time proportional to N lg N and is an improvement over existing algorithms in several ways.
Abstract: This report consists of two papers describing various aspects of a new tree-structured parallel computer. The first paper, 'A tree machine for searching problems' by J. L. Bentley and H. T. Kung, describes the basic architecture of the machine. A set of N elements can be maintained on an N-processor version of the machine such that insertions, deletions, queries and updates can all be processed in 2 lg N time units. The queries can be very complex, including problems arising in ordered set manipulation, data bases, and statistics. The machine is pipelined so that M successive operations can be performed in M-1 + 2 lg N time units. The paper studies both the basic machine structure and a VLSI implementation of the machine. The second paper, 'A parallel algorithm for constructing minimum spanning trees' by J. L. Bentley, shows how an (N/lg N)-processor version of the machine can solve the problem of constructing minimum spanning trees in time proportional to N lg N. This algorithm is an improvement over existing algorithms in several ways. (Author)
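Bentley's tree-machine MST algorithm is not reproduced in the report excerpt, but the parallel structure that MST computations expose can be seen in a Borůvka-style method (a different algorithm, shown here only for illustration): in each round every component selects its cheapest outgoing edge, all selections can happen concurrently, and the component count at least halves, bounding the number of rounds by log₂ N. This sketch runs the rounds sequentially:

```python
def boruvka_mst(n, edges):
    """Total MST weight of a connected graph on n vertices.
    edges is a list of (weight, u, v) with distinct weights."""
    parent = list(range(n))
    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total, used = 0, 0
    while used < n - 1:
        # Each component's cheapest outgoing edge (parallel in spirit).
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                for r in (ru, rv):
                    if r not in cheapest or w < cheapest[r][0]:
                        cheapest[r] = (w, ru, rv)
        if not cheapest:
            break  # graph is disconnected
        for w, ru, rv in set(cheapest.values()):
            ru, rv = find(ru), find(rv)
            if ru != rv:
                parent[ru] = rv
                total += w
                used += 1
    return total
```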

14 citations



Proceedings ArticleDOI
01 Dec 1979
TL;DR: The proposed algorithms are based on a new idea of group conjugacy and they can be considered parallel extensions of conjugate direction methods for locating the minimum point of a strictly convex quadratic function.
Abstract: In this paper, parallel algorithms are proposed for locating the minimum point of a strictly convex quadratic function. The proposed algorithms are based on a new idea of group conjugacy and they can be considered parallel extensions of conjugate direction methods.
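For context, the sequential baseline being extended is conjugate-direction minimization of f(x) = 0.5 x'Ax - b'x, which reaches the minimizer of an n-dimensional strictly convex quadratic in at most n steps. A plain conjugate-gradient sketch (the classical method, not the paper's group-conjugate algorithm):

```python
def conjugate_gradient(A, b, tol=1e-12):
    """Minimize f(x) = 0.5*x'Ax - b'x for symmetric positive-definite A
    (given as nested lists), i.e. solve Ax = b, by conjugate directions."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]          # residual b - Ax for x = 0
    p = r[:]          # first search direction
    for _ in range(n):
        rr = dot(r, r)
        if rr < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)                     # exact line minimization
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r, r) / rr                       # keeps p A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    return x
```

The group-conjugacy idea in the paper generalizes this so that several such directions can be processed per parallel step rather than one at a time.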

9 citations


Journal ArticleDOI
TL;DR: Singling out the functional scheme (FS) as an independent component of interpretation turns out to be expedient not only for the reasons just stated but also when proceeding to the practical realization of languages of parallel algorithms.
Abstract: As follows from the definition of FS, the following three components are important on their own account when we investigate the methods of specifying computable functions and the processes computing their values: the functional scheme, the interpretation, and the computation model.

Journal ArticleDOI
TL;DR: A fast metalgorithm for adaptive quadrature on a MIMD parallel computer whose speedup is at least a constant times M/log M using a total of M processors.
Abstract: We describe a fast metalgorithm for adaptive quadrature on a MIMD (Multiple Instruction, Multiple Data) parallel computer and show that its speedup is at least a constant times M/log M using a total of M processors.
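Adaptive quadrature suits a MIMD task queue because each refined subinterval is an independent task. A sequential adaptive-Simpson sketch (the metalgorithm schedules such tasks across M processors; this code shows only the subdivision logic, and is not the paper's implementation):

```python
def adaptive_simpson(f, a, b, eps=1e-8):
    """Integrate f over [a, b] by adaptive Simpson quadrature.  The two
    recursive calls work on disjoint subintervals, so each could be
    handed to a different processor as an independent task."""
    def simpson(a, b):
        m = (a + b) / 2
        return m, (b - a) / 6 * (f(a) + 4 * f(m) + f(b))
    def rec(a, m, b, whole, eps):
        lm, left = simpson(a, m)
        rm, right = simpson(m, b)
        if abs(left + right - whole) <= 15 * eps:
            # Accept, with the standard Richardson correction term.
            return left + right + (left + right - whole) / 15
        return (rec(a, lm, m, left, eps / 2) +
                rec(m, rm, b, right, eps / 2))
    m, whole = simpson(a, b)
    return rec(a, m, b, whole, eps)
```

The log M factor in the speedup bound reflects contention for the shared interval queue, not the integration arithmetic itself.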

ReportDOI
01 Jan 1979
TL;DR: This dissertation demonstrates the implementation and evaluation of parallel algorithms on C.mmp, a multiprocessor computer system, to discover and measure the major sources that perturbed the performance of the parallel algorithm.
Abstract: This dissertation demonstrates the implementation and evaluation of parallel algorithms on C.mmp, a multiprocessor computer system. Initial attempts to demonstrate the performance of a simple parallel algorithm yielded unexpectedly large performance degradations from the theoretical calculations. This unexpected result spawned a study of the C.mmp system to discover and measure the major sources that perturbed the performance of the parallel algorithm. The performance study was conducted at several levels: basic hardware measurements; runtime performance of Hydra, C.mmp's operating system; and overall performance of a particular application, a parallel rootfinding algorithm. The results of this study identified six major sources of performance perturbation. The six sources, in order of importance, were: variations in the compute time to perform the repetitive calculation; memory contention caused by finite memory bandwidth; bottlenecks in the operating system's scheduling processes; variations in the individual processor speeds; interrupts associated with I/O device service routines; and variations in the individual memory bank speeds.


Proceedings ArticleDOI
15 May 1979
TL;DR: The load flow problem is treated as a minimization problem and is solved using a parallel nongradient optimization procedure similar to the one suggested by Chazan and Miranker; a speed-up nearly equal to q is possible.
Abstract: The Load Flow problem is treated as a minimization problem and is solved using a parallel nongradient optimization procedure similar to the one suggested by Chazan and Miranker. The algorithm is described and test case results are presented. A speed-up nearly equal to q is possible if a parallel computer with q processors is used for the solution of the problem.

Journal ArticleDOI
D.A. Zein, C.W. Ho
TL;DR: An algorithm for solving simultaneous complex algebraic equations is implemented which has the advantage of leaving the circuit-matrix structure invariant in the d.c., a.c. and transient cases.
Abstract: The small-signal frequency-domain analysis of linear and nonlinear electronic circuits has been implemented in a general purpose c.a.d. program in APL. The techniques used include a parallel pivoting and ordering scheme, and APL-oriented parallel algorithms to enhance speed and save storage. An algorithm for solving simultaneous complex algebraic equations is implemented which has the advantage of leaving the circuit-matrix structure invariant in the d.c., a.c. and transient cases. A highly nonlinear circuit of 450 branches, analysed at 40 frequency points, took approximately 3 min of c.p.u. time on the IBM 370/168 machine, and the program and data fitted in less than 112K bytes of storage. The turn-around time was approximately 8 min.
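The abstract does not give the complex-equation algorithm, but a standard device with the stated property, keeping the real circuit-matrix sparsity pattern across d.c., a.c. and transient runs, is to embed the complex system (R + jI)z = p + jq in an all-real block system. A sketch under that assumption (the embedding is a textbook technique, and the names are mine, not the paper's):

```python
def complex_to_real_system(R, I_, p, q):
    """Embed (R + jI)(x + jy) = p + jq as the real block system
        [[R, -I], [I, R]] [x; y] = [p; q].
    Each block repeats the sparsity pattern of the original real
    matrix, so one ordering/factorization path can serve the d.c.
    (I = 0), a.c. and transient analyses alike."""
    n = len(R)
    A = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            A[i][j] = R[i][j]
            A[i][n + j] = -I_[i][j]
            A[n + i][j] = I_[i][j]
            A[n + i][n + j] = R[i][j]
    return A, list(p) + list(q)
```

For the 1x1 case (1 + 2j)z = 3 + 4j this produces [[1, -2], [2, 1]] with right-hand side [3, 4], whose solution (2.2, -0.4) matches z = 2.2 - 0.4j.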

Journal ArticleDOI
TL;DR: It is shown in two examples — matrix multiplication and the knapsack problem — that L Systems can characterize essential features of the control and data structure of algorithms.
Abstract: The parallel program schemata of Karp and Miller are considered for unbounded parallelism and their main decidability result is extended to this case. The complexity of these schemata is investigated and a scheduling algorithm is presented whose sequential time complexity is polynomial. The scheduling algorithm is applied to a schema representing the Strassen-Winograd algorithm for multiplication of 2x2 matrices, where it finds a faster computation than the "obvious" one. Finally it is shown in two examples, matrix multiplication and the knapsack problem, that L systems can characterize essential features of the control and data structure of algorithms. This observation gives rise to the concept of a parallel program schema over a 0L system. By reduction to the membership problem for 0L systems, questions like "are certain operations executed simultaneously?" and "can certain data conflicts arise?" can be decided for parallel algorithms represented by this class of schemata.


Book ChapterDOI
08 Oct 1979
TL;DR: Selected applications of nets by other authors are discussed, showing a range of uses of the models and the modifications to nets adopted for specific applications, including models of a parallel algorithm for lexical analysis, net augmentation to detect and correct errors, an approach to error correction in distributed systems with no central control, and modeling aids for the design of properly functioning systems.
Abstract: Selected applications of nets by other authors are discussed, to show a range of use of the models, and to show the modifications to nets that have been adopted for specific applications. The topics include models of a parallel algorithm for lexical analysis, net augmentation to detect and correct errors, an approach to error correction in distributed systems with no central control, and modeling aids to design of properly functioning systems.

Journal ArticleDOI
TL;DR: A new algorithm based on matrix representation is developed and is compared with the associative version of the Ford and Fulkerson labeling method, showing that the ratio of the labeling algorithm to the new algorithm is about 3 for a dense network with 5 nodes.
Abstract: Application of associative processors to the solution of the maximal flow problem is investigated. To take maximum advantage of the capability of associative processors, a new algorithm based on matrix representation is developed. The new algorithm is then compared with the associative version of the Ford and Fulkerson labeling method. The comparison is made on the total associative memory access time required for problem solution by each algorithm running on an associative processor. Results show that the ratio of the labeling algorithm to the new algorithm is about 3 for a dense network with 5 nodes. This ratio increases as the number of nodes increases, and decreases as the density of the network decreases.
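The comparison baseline is Ford-Fulkerson augmentation over a dense capacity matrix, the representation that suits an associative processor, which can examine a whole row in one memory access. A breadth-first labeling sketch (my sequential implementation, not the paper's associative version):

```python
from collections import deque

def max_flow(cap, s, t):
    """Maximum s-t flow by Ford-Fulkerson with breadth-first labeling.
    cap is a dense capacity matrix, mutated into the residual network."""
    n = len(cap)
    flow = 0
    while True:
        # Labeling phase: BFS for an augmenting path in the residual net.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path: flow is maximal
        # Augment along the labeled path by its bottleneck capacity.
        bottleneck, v = float('inf'), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
```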

Journal ArticleDOI
TL;DR: This paper presents a potentially parallel iterative algorithm for the solution of the unconstrained N-stage decision problem of dynamic programming, based on the use of variable-metric minimization techniques to develop a quadratic approximation to the cost function at each stage.
Abstract: This paper presents a potentially parallel iterative algorithm for the solution of the unconstrained N-stage decision problem of dynamic programming. The basis of the algorithm is the use of variable-metric minimization techniques to develop a quadratic approximation to the cost function at each stage. The algorithm is applied to various problems, and comparisons with other algorithms are made.