
Showing papers on "Sequential algorithm published in 1988"


ReportDOI
01 Jan 1988
TL;DR: This thesis describes and evaluates techniques for speeding up unification, including an extension of Chang's static data-dependency analysis (SDDA), and ways in which these techniques may be applied to the Berkeley PLM machine.
Abstract: Unification, the fundamental operation in the Prolog logic programming language, can take up to 50% of the execution time of a typical Prolog system. One approach to speeding up the unification operation is to perform it on parallel hardware. Although it has been shown that, in general, there is no parallel algorithm for unification that is better than the best sequential algorithm, there is a substantial subset of unification which may be done in parallel. Identifying these subsets involves gathering data using an extension of Chang's static data-dependency analysis (SDDA), then using that data to schedule the components of a unification for parallel execution. Improvements to the information gathered by SDDA may be achieved through procedure splitting, a source-level transformation of the program. This thesis describes and evaluates the above-mentioned techniques and their implementation. Results are compared to other techniques for speeding up unification. Ways in which these techniques may be applied to the Berkeley PLM machine are also described.
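The sequential operation the thesis targets can be sketched as follows. This is a minimal Robinson-style unification over a toy term representation (an illustrative assumption, not the PLM implementation or Chang's SDDA); the argument-pair loop is the part a parallel scheduler would distribute.

```python
# Minimal sketch of sequential first-order unification.
# Terms are ('var', name) or ('fn', functor, [args]); this encoding is
# an assumption for illustration, not the thesis's representation.

def walk(term, subst):
    """Follow variable bindings until a non-variable or unbound variable."""
    while term[0] == 'var' and term[1] in subst:
        term = subst[term[1]]
    return term

def unify(a, b, subst=None):
    """Return a substitution unifying a and b, or None on failure."""
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if x[0] == 'var':
            subst[x[1]] = y          # no occurs-check, as in most Prologs
        elif y[0] == 'var':
            subst[y[1]] = x
        elif x[1] == y[1] and len(x[2]) == len(y[2]):
            # Argument pairs are the units a parallel scheduler would
            # try to unify concurrently when SDDA shows independence.
            stack.extend(zip(x[2], y[2]))
        else:
            return None              # functor or arity mismatch
    return subst
```

Unifying f(X, b) with f(a, Y) binds X to a and Y to b; the two argument pairs are independent, which is exactly the kind of subset that can be done in parallel.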

21 citations


01 Jan 1988
TL;DR: An algorithm that runs in O(log m log n) time and uses mn processors on a CRCW PRAM, where m and n are the lengths of the strings, is presented; the problem of finding the largest common submatrix of two matrices is also considered and shown to be NP-hard.
Abstract: We consider the problem of determining in parallel the cost of converting a source string to a destination string by a sequence of insert, delete and transform operations. Each operation has an integer cost in some fixed range. We present an algorithm that runs in O(log m log n) time and uses mn processors on a CRCW PRAM, where m and n are the lengths of the strings. The best known sequential algorithm [MP83] runs in time O(n²/log n) for strings of length n, indicating that our parallel algorithm (with time-processor product equal to O(mn log m log n)) is nearly optimal. An instance of the edit distance problem is represented as a graph. The algorithm finds the shortest path in the graph using a path-doubling method with efficient pruning due to the structure of the problem. Extensions of the algorithm solve approximate string matching and local best-fit problems. The problem of finding the largest common submatrix of two matrices is considered and shown to be NP-hard. Finally we present an algorithm for exact two-dimensional pattern matching that runs in O(log n) time for an n×n search matrix.
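The sequential baseline the paper measures against is the classic dynamic-programming edit distance with insert, delete and transform (substitute) operations. A minimal sketch with configurable integer costs (the parameter names are illustrative):

```python
# Wagner-Fischer dynamic programming for edit distance: O(mn) time,
# the sequential point of comparison for the parallel algorithm.

def edit_distance(src, dst, ins=1, dele=1, sub=1):
    m, n = len(src), len(dst)
    # d[i][j] = minimum cost of converting src[:i] to dst[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete src[i-1]
                          d[i][j - 1] + ins,       # insert dst[j-1]
                          d[i - 1][j - 1] + cost)  # transform (or match)
    return d[m][n]
```

The graph view in the abstract corresponds to shortest paths over this grid, which is what the path-doubling method parallelizes.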

20 citations


Journal ArticleDOI
TL;DR: In this paper, the authors presented an algorithm for convolving a k*k window of weighting coefficients with an n*n image matrix on a pyramid computer of O(n²) processors.
Abstract: An algorithm for convolving a k*k window of weighting coefficients with an n*n image matrix on a pyramid computer of O(n²) processors in time O(log n + k²), excluding the time to load the image matrix, is presented. If k = Ω(√(log n)), which is typical in practice, the algorithm has a processor-time product O(n²k²) which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite state, independent of the values of n and k. Thus, for convolving two (0, 1)-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state.

18 citations


Journal ArticleDOI
TL;DR: A fast algorithm for preemptive scheduling of n independent jobs on m uniform machines is developed, along with a parallel version of this algorithm for a Concurrent Read Exclusive Write (CREW) shared-memory computer.

12 citations


Journal ArticleDOI
TL;DR: A normalization method is introduced to fix the positions of the broadcast sources so that the derived design can be further transformed by retimings into a systolic array.
Abstract: When a sequential algorithm is directly mapped into an array of processing elements, data broadcasts are quite likely required and their source locations vary during the computation. The authors introduce a normalization method to fix the positions of the broadcast sources so that the derived design can be further transformed by retimings into a systolic array. The method is fully illustrated in designing systolic arrays for enumeration sort, solving simultaneous linear equations, and transitive closure.
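Enumeration sort, the first of the paper's three illustrations, is a natural systolic candidate because every element's final position can be computed by independent comparisons. The sequential form being mapped looks roughly like this (a generic sketch, not the paper's array design):

```python
# Enumeration (rank) sort: each element's output position is the count
# of elements that must precede it, with ties broken by original index.
# Every rank computation is independent, which is what makes the
# algorithm attractive for a systolic array.

def enumeration_sort(a):
    n = len(a)
    out = [None] * n
    for i in range(n):
        rank = sum(1 for j in range(n)
                   if a[j] < a[i] or (a[j] == a[i] and j < i))
        out[rank] = a[i]
    return out
```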

12 citations


Journal ArticleDOI
TL;DR: This work defines and discusses an objective measure of the effect of parallelism on a sequential algorithm, known as the potential parallel factor (PPF), which is applied to parallel versions of the unification algorithms of Yasuura and Jaffar.
Abstract: Parallel unification algorithms are not nearly so numerous or well-developed as sequential ones. In order to estimate the improvement in efficiency which may be expected, we define and discuss an objective measure of the effect of parallelism on a sequential algorithm. This measure, known as the potential parallel factor (PPF), is applied to parallel versions of the unification algorithms of Yasuura and Jaffar. The PPFs for these algorithms are measured on a variety of running Prolog programs to estimate what increase in speed may be expected in a Prolog environment from the use of parallelism. Other potential uses of parallelism may be evaluated by different applications of our general methods and techniques.
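The abstract does not spell out how the PPF is computed. As an illustrative stand-in only (an assumption, not the paper's definition), a common way to bound the benefit of parallelism is the work/depth ratio of a task dependency graph: total sequential steps divided by the length of the longest dependency chain.

```python
# Hypothetical work/depth-style parallelism factor for a task DAG.
# This is a stand-in for the paper's PPF, whose exact definition is
# not given in the abstract above.

def parallel_factor(deps):
    """deps maps each task to the list of tasks it depends on (a DAG)."""
    depth = {}
    def longest(t):
        if t not in depth:
            depth[t] = 1 + max((longest(d) for d in deps[t]), default=0)
        return depth[t]
    critical_path = max(longest(t) for t in deps)
    return len(deps) / critical_path   # work divided by depth
```

A fully sequential chain of tasks gives a factor of 1 (no benefit), while fully independent tasks give a factor equal to the task count.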

11 citations


Proceedings ArticleDOI
25 Oct 1988
TL;DR: A scheme of extracting edge information from parallel spatial frequency bands using the formalism of a Gaussian pyramid to create an integrated image of most significant edges of different scales is presented.
Abstract: We present a scheme for extracting edge information from parallel spatial frequency bands. From these we create an integrated image of the most significant edges at different scales. The frequency bands are realized using the formalism of a Gaussian pyramid in which the levels represent a bank of spatial lowpass filters. The integrated edge image is created by a top-down algorithm, starting from the smallest version of the image. The sequential algorithm uses mutual edge information from two consecutive levels to control the processing in the lower one. This edge detection algorithm constitutes an image-dependent nonuniform processing scheme. Computational results show that only 20%-50% of the operations are needed to create an edge pyramid, compared to the number required in the regular scheme. The proposed generic scheme of image-dependent processing can also be implemented with operators other than edge detectors to exploit the advantages inherent in biological processing of images.
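The pyramid construction underlying the frequency bands can be sketched minimally: each level lowpass-filters and subsamples the one below. For brevity a 2x2 box average stands in for the Gaussian kernel of the usual Burt-Adelson construction (an assumption, not the paper's exact filter):

```python
# Build a simple image pyramid: each coarser level averages 2x2 blocks
# of the finer one. A box filter stands in for the Gaussian lowpass.

def pyramid(image, levels):
    """image: 2^k x 2^k grid (list of lists); returns levels, finest first."""
    result = [image]
    for _ in range(levels - 1):
        prev = result[-1]
        n = len(prev) // 2
        result.append([[(prev[2*r][2*c] + prev[2*r][2*c+1] +
                         prev[2*r+1][2*c] + prev[2*r+1][2*c+1]) / 4.0
                        for c in range(n)] for r in range(n)])
    return result
```

The top-down algorithm in the abstract then walks this list from the coarsest level back toward the finest, using edges found at one level to restrict where the next level is processed.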

11 citations


Book
01 Feb 1988
TL;DR: This work describes a simple interpreter-based fcp implementation of the algorithm, analyzes its performance under Logix, and includes initial measurements of its speedup on the parallel implementation of fcp.
Abstract: We describe a simple or-parallel execution algorithm for PROLOG that naturally collects all solutions to a goal. For a large class of programs the algorithm has O(log n) overhead and exhibits O(n/(log n)²) parallel speedup over the standard sequential algorithm. Its constituent parallel processes are independent, and hence the algorithm is suitable for implementation on non-shared-memory parallel computers. The algorithm can be implemented directly in Flat Concurrent PROLOG. We describe a simple interpreter-based fcp implementation of the algorithm, analyze its performance under Logix, and include initial measurements of its speedup on the parallel implementation of fcp. The implementation is easily extended. We show an extension that performs parallel demand-driven search. We define two parallel variants of cut, cut-clause and cut-goal, and describe their implementation. We discuss the execution of the algorithm on a parallel computer, and describe implementations of it that perform centralized and distributed dynamic load balancing. Since the fcp implementation of the algorithm relies on full test unification, the algorithm does not seem to have a similarly natural implementation in ghc or parlog.

7 citations


Proceedings ArticleDOI
20 Apr 1988
TL;DR: It is feasible to generalize techniques for mapping sequential algorithms onto a neural model of a parallel distributed processor and implement a neural compiler for sequential algorithms, according to this paper.
Abstract: Teledyne Brown Engineering has developed techniques for mapping sequential algorithms onto a neural model of a parallel distributed processor. Any sequential algorithm (including NP algorithms) can be mapped onto a neural network. This paper discusses some practical considerations for implementation of sequential-to-parallel mappings (SPM). It is feasible to generalize these techniques and implement a neural compiler for sequential algorithms. Neural networks (the interconnection matrix) generated by the neural compiler will be implemented in a fixed, holographic, optical computer.

7 citations


Journal ArticleDOI
TL;DR: An algorithm for simultaneous order identification and parameter estimation of a linear, discrete MIMO system with unknown observability indices is presented, considered as a multivariable extension of the conventional loss-function tests used to detect the order of SISO systems.

6 citations


Book ChapterDOI
03 Oct 1988
TL;DR: A mathematical model for analysing the speedup behaviour of a parallel k-processor backtracking algorithm compared with sequential backtracking is studied, and it is shown that in the case of sufficiently unbalanced distributions, superlinear speedups will occur on average.
Abstract: A mathematical model for analysing the speedup behaviour of a parallel k-processor backtracking algorithm compared with sequential backtracking is studied. The essential parameter of a problem class, which is incorporated in the model, is the distribution of solutions in the corresponding backtracking trees. Under the model assumptions it is shown that, in the case of sufficiently unbalanced distributions, superlinear speedups will occur on average. Further, a result is shown indicating that in the case of restricted classes of CNF-formulas, unbalanced distributions of solutions actually occur.
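The superlinear effect can be seen in a toy version of such a model (an illustrative assumption, not the paper's exact analysis): sequential backtracking scans leaves left to right until the first solution, while k processors each scan a contiguous block of leaves and the search stops as soon as any processor succeeds. When solutions cluster in a late block, the speedup exceeds k.

```python
# Toy speedup model for k-processor backtracking over a list of leaves
# (True = solution). Assumes at least one solution exists.

def speedup(leaves, k):
    """Ratio of sequential search time to k-processor parallel time."""
    seq = next(i for i, s in enumerate(leaves) if s) + 1
    size = len(leaves) // k          # contiguous block per processor
    par = min(
        next((i + 1 for i, s in enumerate(leaves[p*size:(p+1)*size]) if s),
             float('inf'))
        for p in range(k))
    return seq / par
```

With 16 leaves, 4 processors, and the only solution at position 13, the sequential search takes 13 steps while processor 3 finds it in 1 step: a speedup of 13 on 4 processors.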

Proceedings ArticleDOI
25 Sep 1988
TL;DR: In this article, a sequential decision rule is described to discriminate probability distributions of VF from ventricular tachycardia (VT) and supraventricular tachycardia (SVT).
Abstract: Ventricular fibrillation (VF) must be accurately detected by an automatic implantable cardioverter-defibrillator and must also be discriminated from ventricular tachycardia (VT) and supraventricular tachycardia (SVT). A sequential decision rule is described to discriminate the probability distributions of VF from those of VT and SVT. Intracardiac signals are first converted to binary sequences by comparison with a threshold. Probability distributions of threshold-crossing intervals are determined. The sequential test calculates a log-likelihood and compares that with preset detection thresholds. The thresholds are set so as to result in the desired test accuracy. Essentially, the sequential algorithm trades off the time to reach a decision (the number of sequential decision steps) against accuracy. In a study of 170 electrograms from humans, 95.3% of VF signals are classified in 3 s, 97.6% in 5 s, and 100% in 7 s. The sequential algorithm offers ease of implementation for implantable devices and excellent performance.
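The decision rule described is a Wald-style sequential test: accumulate a log-likelihood ratio over threshold-crossing intervals and stop as soon as it crosses a preset bound. A minimal sketch (the interval categories, distributions, and thresholds below are illustrative assumptions, not the paper's data):

```python
# Sequential log-likelihood ratio test over threshold-crossing
# intervals: decide 'VF' when the accumulated ratio exceeds log_a,
# 'VT' when it falls below log_b, otherwise keep observing.

import math

def sequential_test(intervals, p_vf, p_vt, log_a, log_b):
    """Return (decision, steps); decision is 'VF', 'VT' or 'undecided'."""
    llr = 0.0
    for step, x in enumerate(intervals, 1):
        llr += math.log(p_vf[x] / p_vt[x])
        if llr >= log_a:
            return 'VF', step
        if llr <= log_b:
            return 'VT', step
    return 'undecided', len(intervals)
```

Raising the magnitude of log_a and log_b increases accuracy but also the number of steps before a decision, which is the trade-off the abstract describes.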

Proceedings ArticleDOI
D.A. Field1, K. Yarnall
14 Nov 1988
TL;DR: A software package for generation of tetrahedral finite-element mesh, whose kernel is a robust three-dimensional Delaunay triangulation algorithm, was ported to a CRAY X-MP for vector processing, reducing the total execution time for the critical subroutines of the kernel by a factor of six.
Abstract: A software package for generation of tetrahedral finite-element mesh, whose kernel is a robust three-dimensional Delaunay triangulation algorithm, was ported to a CRAY X-MP for vector processing. The total execution time for the critical subroutines of the kernel decreased by a factor of six over scalar mode on the CRAY X-MP. The kernel is characterized by simple data structures and O(N²) arithmetic operation counts, N being the number of finite-element nodes. Although the kernel is essentially a sequential algorithm, its simple data structures allow for key uses of vector processing and for streamlining sequential processing.

Journal ArticleDOI
TL;DR: It is shown in the paper that regularization problems, such as the smoothest velocity field computation and the computation of the minimum dilatation velocity field, can be solved with a parallel algorithm or a fast sequential algorithm.
Abstract: The computation of the velocity field along image curves belongs to the class of ill-posed problems (in the sense of Hadamard). Local measurements of image pattern changes are usually insufficient to determine the velocity field uniquely. Therefore regularization techniques are applied, yielding solutions that are robust against noise and that are correct for a limited class of curve velocity fields. It is shown in the paper that regularization problems, such as the smoothest velocity field computation and the computation of the minimum-dilatation velocity field, can be solved with a parallel algorithm or a fast sequential algorithm. This follows from the block-tridiagonal structure to which these variational techniques give rise.
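The fast sequential algorithm follows from the block-tridiagonal structure: such systems can be solved in linear time by forward elimination and back substitution. As a scalar stand-in for the block case (blocks of size 1, an illustrative simplification), this is the Thomas algorithm:

```python
# Thomas algorithm: O(n) solver for a tridiagonal linear system.
# a = sub-diagonal (a[0] unused), b = main diagonal,
# c = super-diagonal (c[n-1] unused), d = right-hand side.

def thomas(a, b, c, d):
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The block version replaces each scalar division by a small matrix solve; the parallel alternative mentioned in the abstract typically uses cyclic reduction instead of this strictly sequential sweep.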

Dissertation
01 Jan 1988
TL;DR: Parallel solutions for two classes of linear programs are presented; it is shown that a linear improvement in performance is possible, and a variation of the decomposed simplex algorithm that runs 2 times faster than the original is discovered.
Abstract: Parallel solutions for two classes of linear programs are presented. First we parallelized the two-phase revised simplex algorithm and showed that it is possible to get a linear improvement in performance. The simplex algorithm is the best-known algorithm for solving linear programs, and we claim our result is the best that can be achieved. Next we study the parallelization of the decomposed simplex algorithm. One of our new parallel algorithms achieves a 2P-fold performance improvement over the decomposed simplex algorithm using P processors. Meanwhile, we discovered a particular variation of the decomposed simplex algorithm which can run 2 times faster than the original one. The new parallel algorithm linearly speeds up this fast sequential algorithm. As in any parallel program, unbalanced processor load causes the performance of the parallel decomposed simplex algorithm to drop significantly when the size of the input data is not a multiple of the number of available processors. To remove this limitation, we invented a load-balance technique called Loop Spreading that evenly distributes parallel tasks on multiple processors without a drop in performance even when the size of the input data is not a multiple of the number of processors. Loop Spreading is a general technique that can be used automatically by a compiler to balance processor load in any language that supports parallel loop constructs.
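The balancing idea behind Loop Spreading can be sketched simply (the name is the thesis's; this particular splitting is an assumption about the technique, not its published form): instead of giving every processor ⌈N/P⌉ iterations and leaving the remainder lopsided, spread N iterations so that chunk sizes differ by at most one.

```python
# Distribute n loop iterations over p processors so that per-processor
# chunk sizes differ by at most one, even when p does not divide n.

def spread(n, p):
    """Return per-processor (start, stop) half-open iteration ranges."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for i in range(p):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges
```

For 10 iterations on 4 processors this yields chunks of 3, 3, 2 and 2, so no single processor carries the whole remainder.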

Book ChapterDOI
01 Jan 1988
TL;DR: The appearance of multiprocessor computer systems and local computer networks gives wide latitude for constructing optimization techniques using parallel iterations, including simultaneous (due to many processors) computations (trials) of values of the function to be optimized at several points in a parameter space.

Abstract: The appearance of multiprocessor computer systems and local computer networks gives wide latitude for constructing optimization techniques using parallel iterations, including simultaneous (due to many processors) computations (trials) of values of the function to be optimized at several points in a parameter space. Each trial appearing in such a parallel iteration could be performed on a separate processing unit using the same (shared or copied) program.

Book ChapterDOI
01 Jan 1988
TL;DR: In this work, a parallel stratagem is detailed which gives about a fourfold speedup over a sequential scheme for a six-link robot-arm.
Abstract: The fast real-time control of a rigid open-link robot-arm necessitates that the forward kinematics and inverse dynamics problems be solved in as short a time as possible. The solution of the Newton-Euler equations sequentially, although fast compared with the Lagrange-Euler approach, may still not be fast enough for the real-time determination of applied joint torques and the efficient feedback control of the non-linear effects. In this work, a parallel stratagem is detailed which gives about a fourfold speedup over a sequential scheme for a six-link robot-arm. For an n-link robot-arm, the stratagem uses 2n processing elements, with 2 processing elements assigned to each link. The processors are arranged in two layers, with n processors in each layer. In general, the top layer computes the angular velocity terms and the bottom layer computes the angular acceleration terms.