
Showing papers in "IEEE Transactions on Parallel and Distributed Systems in 1990"


Journal ArticleDOI
TL;DR: Programming assistance and automation concepts are applied to Hypertool, a program development tool for message-passing systems that performs scheduling and inserts communication primitives automatically, thereby increasing productivity and eliminating synchronization errors.
Abstract: Programming assistance, automation concepts, and their application to a message-passing system program development tool called Hypertool are discussed. Hypertool performs scheduling and handles communication primitive insertion automatically, thereby increasing productivity and eliminating synchronization errors. Two algorithms, based on the critical-path method, are presented for scheduling processes statically. Hypertool also generates performance estimates and other program quality measures to help programmers improve their algorithms and programs. >

700 citations
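
As an illustration of the kind of critical-path-driven static scheduling described above, here is a minimal list-scheduling sketch in Python: tasks are prioritized by their critical-path length (bottom level) and greedily placed on the processor that can start them earliest. The task graph, weights, and processor count are invented, and communication costs, which Hypertool does account for, are ignored here.

```python
# Sketch of list scheduling driven by critical-path (bottom-level) priorities.
# Task graph, weights, and processor count are illustrative only; communication
# delays are ignored.

def bottom_level(tasks, succ, weight):
    """Longest path from each task to an exit task, including its own weight."""
    memo = {}
    def bl(t):
        if t not in memo:
            memo[t] = weight[t] + max((bl(s) for s in succ.get(t, [])), default=0)
        return memo[t]
    return {t: bl(t) for t in tasks}

def list_schedule(tasks, succ, weight, num_procs):
    preds = {t: [] for t in tasks}
    for t, ss in succ.items():
        for s in ss:
            preds[s].append(t)
    prio = bottom_level(tasks, succ, weight)
    remaining = {t: len(preds[t]) for t in tasks}
    ready = [t for t in tasks if remaining[t] == 0]
    finish, proc_free, schedule = {}, [0.0] * num_procs, []
    while ready:
        t = max(ready, key=lambda x: prio[x])        # most critical ready task first
        ready.remove(t)
        est = max((finish[p] for p in preds[t]), default=0.0)   # predecessors done
        p = min(range(num_procs), key=lambda i: max(proc_free[i], est))
        start = max(proc_free[p], est)
        finish[t] = start + weight[t]
        proc_free[p] = finish[t]
        schedule.append((t, p, start, finish[t]))
        for s in succ.get(t, []):                    # release successors
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return schedule

# A small fork-join graph scheduled on 2 processors.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
weight = {"a": 2, "b": 3, "c": 1, "d": 2}
print(list_schedule(list(weight), succ, weight, 2))
```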


Journal ArticleDOI
TL;DR: The author examines the questions of whether there are efficient algorithms for software spin-waiting given hardware support for atomic instructions, or whether more complex kinds of hardware support are needed for performance.
Abstract: The author examines the questions of whether there are efficient algorithms for software spin-waiting given hardware support for atomic instructions, or whether more complex kinds of hardware support are needed for performance. He considers the performance of a number of software spin-waiting algorithms. Arbitration for control of a lock is in many ways similar to arbitration for control of a network connecting a distributed system. He applies several of the static and dynamic arbitration methods originally developed for networks to spin locks. A novel method is proposed for explicitly queueing spinning processors in software by assigning each a unique number when it arrives at the lock. Control of the lock can then be passed to the next processor in line with minimal effect on other processors. >

683 citations
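
The explicit-queueing idea described above (each arriving processor draws a unique number and spins on its own flag) can be sketched as follows. This is a single-threaded Python model of the arbitration logic only, not a usable lock: in a real implementation the ticket would be obtained with an atomic fetch-and-increment and each flag would sit in its own cache line.

```python
# Logical model of an array-based queue lock: each arrival takes a unique slot
# and spins on a private flag; release hands the lock to the next slot.
# In hardware, next_slot would be advanced with an atomic fetch-and-increment.

class QueueLockModel:
    def __init__(self, num_slots):
        self.flags = [False] * num_slots
        self.flags[0] = True            # the first arrival may enter immediately
        self.next_slot = 0
        self.n = num_slots

    def acquire(self):
        my_slot = self.next_slot % self.n   # the arriving processor's unique number
        self.next_slot += 1                 # atomic in a real implementation
        while not self.flags[my_slot]:      # spin only on this processor's flag
            pass
        return my_slot

    def release(self, my_slot):
        self.flags[my_slot] = False
        self.flags[(my_slot + 1) % self.n] = True   # pass the lock to the next in line

# Single-threaded trace: two critical sections hand off in FIFO order.
lock = QueueLockModel(4)
s = lock.acquire()      # slot 0 enters at once
lock.release(s)         # grants slot 1
s = lock.acquire()      # slot 1 enters without disturbing other slots
lock.release(s)
print("lock model handed off through slots 0 and 1")
```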


Journal ArticleDOI
TL;DR: This paper presents an efficient algorithm for dynamic scheduling of tasks in real-time systems; it focuses its attention on a small subset of tasks with the shortest deadlines and is shown to be very effective when the maximum allowable scheduling overhead is fixed.
Abstract: Efficient scheduling algorithms based on heuristic functions are developed for scheduling a set of tasks on a multiprocessor system. The tasks are characterized by worst-case computation times, deadlines, and resource requirements. Starting with an empty partial schedule, each step of the search extends the current partial schedule by including one of the tasks yet to be scheduled. The heuristic functions used in the algorithm actively direct the search for a feasible schedule, i.e. they help choose the task that extends the current partial schedule. Two scheduling algorithms are evaluated by simulation. To extend the current partial schedule, one of the algorithms considers, at each step of the search, all the tasks that are yet to be scheduled as candidates. The second focuses its attention on a small subset of tasks with the shortest deadlines. The second algorithm is shown to be very effective when the maximum allowable scheduling overhead is fixed. This algorithm is hence appropriate for dynamic scheduling in real-time systems. >

349 citations
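
A much-simplified, resource-free sketch of the second (deadline-window) strategy is given below; the window size k, the laxity-style heuristic, and the task set are all invented, and the backtracking of the published algorithm is replaced by a simple failure return.

```python
# Greedy sketch of deadline-window scheduling on identical processors.
# tasks: (name, worst_case_time, deadline); resources and backtracking omitted.

def myopic_schedule(tasks, num_procs, k=3):
    proc_free = [0.0] * num_procs
    unscheduled = sorted(tasks, key=lambda t: t[2])     # order by deadline
    schedule = []
    while unscheduled:
        window = unscheduled[:k]                        # only the k earliest deadlines
        name, wcet, deadline = min(window, key=lambda t: t[2] - t[1])  # smallest laxity
        unscheduled.remove((name, wcet, deadline))
        p = min(range(num_procs), key=lambda i: proc_free[i])
        start = proc_free[p]
        finish = start + wcet
        if finish > deadline:
            return None                 # infeasible here; the real algorithm backtracks
        proc_free[p] = finish
        schedule.append((name, p, start, finish))
    return schedule

print(myopic_schedule([("t1", 2, 4), ("t2", 1, 3), ("t3", 3, 9), ("t4", 2, 8)], num_procs=2))
```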


Journal ArticleDOI
TL;DR: An innovative approach is presented to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement.
Abstract: An innovative approach is presented to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement. The approach is based on broadcast communication over a local area network, such as an Ethernet or a token ring, and on two novel protocols, the Trans protocol, which provides efficient reliable broadcast communication, and the Total protocol, which with high probability promptly places a total order on messages and achieves distributed agreement even in the presence of fail-stop, omission, timing, and communication faults. Reliable distributed operations, such as locking, update, and commitment, typically require only a single broadcast message rather than the several tens of messages required by current algorithms. >

272 citations


Journal ArticleDOI
TL;DR: The authors focus on developing efficient and implementable methods for recursive data structures and present interference analysis tools and parallelization techniques for imperative programs that contain dynamically updatable trees and directed acyclic graphs.
Abstract: A study is made of the problem of estimating interference in an imperative language with dynamic data structures. The authors focus on developing efficient and implementable methods for recursive data structures. In particular, they present interference analysis tools and parallelization techniques for imperative programs that contain dynamically updatable trees and directed acyclic graphs. The analysis methods are based on a regular-expression-like representation of the relationship between accessible nodes in the data structure. The authors have implemented their analysis, and they present some concrete examples that have been processed by this system. >

241 citations


Journal ArticleDOI
TL;DR: IPS, a performance measurement system for parallel and distributed programs, is now in its second implementation; IPS-2 extends the basic system with new instrumentation techniques, an interactive and graphical user interface, and new automatic guidance analysis techniques.
Abstract: IPS, a performance measurement system for parallel and distributed programs, is currently running on its second implementation. IPS's model of parallel programs uses knowledge about the semantics of a program's structure to provide two important features. First, IPS provides a large amount of performance data about the execution of a parallel program, and this information is organized so that access to it is easy and intuitive. Secondly, IPS provides performance analysis techniques that help to guide the programmer automatically to the location of program bottlenecks. The first implementation of IPS was a testbed for the basic design concepts, providing experience with a hierarchical program and measurement model, interactive program analysis, and automatic guidance techniques. It was built on the Charlotte distributed operating system. The second implementation, IPS-2, extends the basic system with new instrumentation techniques, an interactive and graphical user interface, and new automatic guidance analysis techniques. This implementation runs on 4.3BSD UNIX systems, on the VAX, DECstation, Sun 4, and Sequent Symmetry multiprocessor. >

202 citations


Journal ArticleDOI
TL;DR: Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time, including recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges and minimum spanning trees in undirected graphs.
Abstract: The transitive closure problem in O(1) time is solved by a new method that is far different from the conventional solution method. On processor arrays with reconfigurable bus systems, two O(1) time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed on a three-dimensional n*n*n processor array with a reconfigurable bus system, and the other is designed on a two-dimensional n^2*n^2 processor array with a reconfigurable bus system, where n is the number of vertices in the graph. Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time. These problems include recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs. >

184 citations
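
For reference, the computation those constant-time algorithms perform is ordinary transitive closure; a plain Warshall-style version (O(n^3) work, with none of the reconfigurable-bus machinery) looks like this, using a made-up adjacency matrix.

```python
# Warshall-style transitive closure of an undirected graph given as a 0/1
# adjacency matrix (self-loops included). Purely sequential reference code.

def transitive_closure(adj):
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach

# A path 0-1-2 plus an isolated vertex 3.
adj = [[1, 1, 0, 0],
       [1, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 1]]
print(transitive_closure(adj))   # vertices 0, 1, 2 become mutually reachable
```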


Journal ArticleDOI
TL;DR: Using depth-first search, the authors develop and analyze the performance of a routing scheme for hypercube multicomputers in the presence of an arbitrary number of faulty components and derive an exact expression for the probability of routing messages by way of optimal paths from the source node to an obstructed node.
Abstract: Using depth-first search, the authors develop and analyze the performance of a routing scheme for hypercube multicomputers in the presence of an arbitrary number of faulty components. They derive an exact expression for the probability of routing messages by way of optimal paths (of length equal to the Hamming distance between the corresponding pair of nodes) from the source node to an obstructed node. The obstructed node is defined as the first node encountered by the message that finds no optimal path to the destination node. It is noted that the probability of routing messages over an optimal path between any two nodes is a special case of the present results and can be obtained by replacing the obstructed node with the destination node. Numerical examples are given to illustrate the results, and they show that, in the presence of component failures, depth-first search routing can route a message to its destination by means of an optimal path with a very high probability. >

149 citations
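
The depth-first routing idea above can be sketched directly: hypercube nodes are bit strings, preferred hops flip a bit in which the current node still differs from the destination, and the search backtracks around faulty nodes. The fault set and cube size below are made up.

```python
def dfs_route(src, dst, n, faulty, visited=None):
    """Depth-first routing sketch in an n-dimensional hypercube.

    Nodes are integers 0..2**n - 1; neighbors differ in exactly one bit.
    Preferred dimensions are those in which the current node differs from the
    destination (they shorten the Hamming distance); other dimensions are
    tried only if needed, and the search backtracks around faulty nodes.
    """
    if visited is None:
        visited = {src}
    if src == dst:
        return [dst]
    diff = src ^ dst
    preferred = [d for d in range(n) if diff >> d & 1]
    others = [d for d in range(n) if not diff >> d & 1]
    for d in preferred + others:
        nxt = src ^ (1 << d)
        if nxt in faulty or nxt in visited:
            continue
        visited.add(nxt)
        rest = dfs_route(nxt, dst, n, faulty, visited)
        if rest is not None:
            return [src] + rest
    return None   # dead end: backtrack

# 4-cube with two faulty nodes; route from node 0 to node 15.
print(dfs_route(0b0000, 0b1111, 4, faulty={0b0001, 0b0010}))
```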


Journal ArticleDOI
TL;DR: An empirical study of program characteristics that are important to writers of parallelizing compilers, especially in the area of data dependence analysis and program transformations, finds that nonzero coefficients of loop indexes in most subscripts are simple, allowing an exact real-valued test to be as accurate as an exact integer-valued test for one-dimensional or two-dimensional arrays.
Abstract: Some results are reported from an empirical study of program characteristics that are important to writers of parallelizing compilers, especially in the area of data dependence analysis and program transformations. The state of the art in data dependence analysis and some parallel execution techniques are examined. The major findings are as follows. Many subscripts contain symbolic terms with unknown values. A few methods of determining their values at compile time are evaluated. Array references with coupled subscripts appear quite frequently; these subscripts must be handled simultaneously in a dependence test, rather than being handled separately as in current test algorithms. Nonzero coefficients of loop indexes in most subscripts are found to be simple: they are either 1 or -1. This allows an exact real-valued test to be as accurate as an exact integer-valued test for one-dimensional or two-dimensional arrays. Dependencies with uncertain distance are found to be rather common, and one of the main reasons is the frequent appearance of symbolic terms with unknown values. >

135 citations
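
As a concrete illustration of the kind of single-subscript dependence test these findings bear on, here is a small GCD-plus-bounds (Banerjee-style) check for two one-dimensional references; when the loop-index coefficients are +1 or -1, as the study found is usual, the real-valued bounds check is exact. The coefficients and loop bounds below are invented.

```python
from math import gcd

def may_depend(a1, c1, a2, c2, lo, hi):
    """Test whether references A[a1*i + c1] and A[a2*j + c2] can touch the
    same element for some i, j in [lo, hi]: a GCD test followed by a
    real-valued bounds check. Illustrative only."""
    # Dependence requires an integer solution of a1*i - a2*j = c2 - c1.
    rhs = c2 - c1
    g = gcd(abs(a1), abs(a2))
    if g != 0 and rhs % g != 0:
        return False                       # GCD test: no integer solution exists
    # Bounds check: extreme values that a1*i - a2*j can take over the region.
    lo_val = min(a1 * lo, a1 * hi) - max(a2 * lo, a2 * hi)
    hi_val = max(a1 * lo, a1 * hi) - min(a2 * lo, a2 * hi)
    return lo_val <= rhs <= hi_val

# A[i+1] written, A[i] read, i in 1..100: dependence exists (distance 1).
print(may_depend(1, 1, 1, 0, 1, 100))      # True
# A[2*i] versus A[2*i+1]: never the same element.
print(may_depend(2, 0, 2, 1, 1, 100))      # False
```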


Journal ArticleDOI
TL;DR: An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described and validated against both detailed simulation and actual execution on a commercial multiprocessor.
Abstract: An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7%, with a standard deviation of 1.5%; the maximum error is about 10%. >

131 citations
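
A deterministic toy version of the task-graph half of the method is sketched below: series composition adds times and parallel composition takes the maximum branch, assuming enough processors and no queueing delay (the published method does model contention through a queueing network). The graph and service demands are invented.

```python
# A task is a leaf with a service demand, or a series/parallel composition of
# subgraphs. Series time adds; parallel time is the maximum branch.

def exec_time(node):
    kind = node[0]
    if kind == "task":
        return node[1]
    times = [exec_time(child) for child in node[1:]]
    return sum(times) if kind == "series" else max(times)

# (setup ; (phase_a || phase_b) ; combine), all demands illustrative.
graph = ("series",
         ("task", 2.0),
         ("parallel", ("task", 5.0), ("task", 3.0)),
         ("task", 1.0))
print(exec_time(graph))   # 2 + max(5, 3) + 1 = 8.0
```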


Journal ArticleDOI
TL;DR: The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined and a user-transparent checkpointing and recovery scheme using private caches is presented, which prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols.
Abstract: The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented. >
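
The checkpoint-and-rollback idea, stripped of the cache-based mechanism, checkpoint identifiers, and recovery stacks described above, reduces to the following sketch; the task state and checkpoint interval are invented.

```python
import copy

# Periodically snapshot the computation state; on a detected transient fault,
# restart from the last snapshot. Purely illustrative.

class RecoverableTask:
    def __init__(self, state):
        self.state = state
        self.checkpoint_state = copy.deepcopy(state)

    def checkpoint(self):
        self.checkpoint_state = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self.checkpoint_state)

task = RecoverableTask({"i": 0, "partial_sum": 0})
for i in range(1, 11):
    task.state["i"], task.state["partial_sum"] = i, task.state["partial_sum"] + i
    if i % 5 == 0:
        task.checkpoint()                 # commit progress every 5 steps
fault_detected = True                     # pretend a transient fault hit here
if fault_detected:
    task.rollback()                       # recover from the last checkpoint
print(task.state)                         # {'i': 10, 'partial_sum': 55}
```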

Journal ArticleDOI
TL;DR: A novel algorithm, called the lambda test, is presented for an efficient and accurate data dependence analysis of multidimensional array references that combines the efficiency and the accuracy of both approaches.
Abstract: A novel algorithm, called the lambda test, is presented for an efficient and accurate data dependence analysis of multidimensional array references. It extends the numerical methods to allow all dimensions of array references to be tested simultaneously. Hence, it combines the efficiency and the accuracy of both approaches. This algorithm has been implemented in Parafrase, a Fortran program parallelization restructurer developed at the University of Illinois at Urbana-Champaign. Some experimental results are presented to show its effectiveness. >

Journal ArticleDOI
TL;DR: Real-time distributed systems are modeled by a timed transition model (TTM), and decision procedures are provided for checking a small but important class of properties (specified in real-time temporal logic) that includes invariance, precedence, eventuality and real-time response specifications.
Abstract: Real-time distributed systems are modeled by a timed transition model (TTM). For any finite-state TTM, decision procedures are provided for checking a small but important class of properties (specified in real-time temporal logic). The procedures are linear in the size of the system reachability graph. The class of properties includes invariance, precedence, eventuality and real-time response specifications. >

Journal ArticleDOI
TL;DR: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied.
Abstract: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied. Two types of move are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support such a parallel cost evaluation. A novel tree broadcasting strategy is presented for the hypercube that is used extensively in the algorithm for updating cell locations in the parallel environment. A dynamic parallel annealing schedule is proposed that estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control. The performance on an Intel iPSC-2/D4/MX hypercube is reported. >

Journal ArticleDOI
TL;DR: A performance analysis of this protocol is presented, showing that it commits with high probability under realistic operating conditions, without invoking the termination protocol, if N is sufficiently large.
Abstract: A general protocol for atomic broadcast in networks is presented. The protocol tolerates loss, duplication, reordering, delay of messages, and network partitioning in an arbitrary network of fail-stop sites (i.e. no Byzantine site behavior is tolerated). The protocol is based on majority-consensus decisions to commit on a unique ordering of received broadcast messages. Under normal operating conditions, the protocol requires three phases to complete and approximately 4N/V messages, where N is the number of sites. This overhead is distributed among the messages for which the delivery decision is made, and the heavier the broadcast message traffic, the lower the overhead per broadcast message becomes. Under abnormal operating conditions, a decentralized termination protocol (also presented) is invoked. A performance analysis of this protocol is presented, showing that this protocol commits with high probability under realistic operating conditions without invoking the termination protocol if N is sufficiently large. The protocol retains its efficiency in wide-area networks where broadcast communication media are unavailable. >

Journal ArticleDOI
TL;DR: A simple model of parallel computation which is capable of explaining speedups greater than n on n processors is presented and several of the contradictory previous results relating to parallel speedup are resolved by using the model.
Abstract: A simple model of parallel computation which is capable of explaining speedups greater than n on n processors is presented. Necessary and sufficient conditions for these exceptional speedups are derived from the model. Several of the contradictory previous results relating to parallel speedup are resolved by using the model. >
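
One of the classic sources of such speedups, the growth of effective memory per processor, can be shown with a back-of-the-envelope calculation; the operation count, miss rates, and costs below are invented and are not taken from the paper's model.

```python
# Toy illustration of speedup greater than n: on one processor the working set
# misses in cache, while n processors together hold their partitions entirely
# in cache. All numbers are invented.

def run_time(ops, miss_rate, hit_cost=1.0, miss_cost=20.0):
    return ops * (miss_rate * miss_cost + (1 - miss_rate) * hit_cost)

ops = 1_000_000
t_serial = run_time(ops, miss_rate=0.10)            # working set too big for one cache
n = 8
t_parallel = run_time(ops / n, miss_rate=0.01)      # partitions now fit, far fewer misses
speedup = t_serial / t_parallel
print(speedup, speedup > n)                          # about 19.5, which exceeds n = 8
```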

Journal ArticleDOI
TL;DR: It is found that square machines are not the best form for semigroup computations, and an O(N^(1/8))-time algorithm is derived on an N^(5/8)*N^(3/8) rectangular 2-MCCMB.
Abstract: Semigroup and prefix computations on two-dimensional mesh-connected computers with multiple broadcasting (2-MCCMBs) are studied. Previously, only square 2-MCCMBs with N processing elements were considered for semigroup computations of N data items, and O(N^(1/6)) time was required. It is found that square machines are not the best form for semigroup computations, and an O(N^(1/8))-time algorithm is derived on an N^(5/8)*N^(3/8) rectangular 2-MCCMB. This time complexity can be further reduced to O(N^(1/9)) if fewer processing elements are used. Parallel algorithms for prefix computations with the same time complexities are derived. >
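
For reference, the prefix problem being solved can be written as a log-step (Hillis/Steele-style) scan, shown below for sums and maxima; the mesh-with-broadcast-buses mapping that yields the sublinear running times above is not modeled.

```python
# Inclusive scan: after step k, element i holds the combination of the 2**k
# items ending at i. All updates within a step are conceptually parallel.

def inclusive_scan(data, op=lambda a, b: a + b):
    x = list(data)
    step = 1
    while step < len(x):
        nxt = x[:]                       # a step's updates happen "in parallel"
        for i in range(step, len(x)):
            nxt[i] = op(x[i - step], x[i])
        x, step = nxt, step * 2
    return x

print(inclusive_scan([3, 1, 4, 1, 5, 9, 2, 6]))        # running sums
print(inclusive_scan([3, 1, 4, 1, 5, 9, 2, 6], max))   # running maxima (a semigroup op)
```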

Journal ArticleDOI
TL;DR: The analyses show that using implicit lookahead can significantly improve the lookahead ratios of RR and PP system simulations, which is correlated with other performance measures of more direct interest, such as speedup.
Abstract: Lookahead is the ability of a process to predict its future behavior. The feasibility of implicit lookahead for non-FCFS stochastic queuing systems is demonstrated. Several lookahead exploiting techniques are proposed for round-robin (RR) system simulations. An algorithm that generates lookahead in O(1) time is described. Analytical models and experiments are constructed to evaluate these techniques. A lookahead technique for preemptive priority (PP) systems is evaluated using an analytical model. The performance metric for these techniques is the lookahead ratio, which is correlated with other performance measures of more direct interest, such as speedup. The analyses show that using implicit lookahead can significantly improve the lookahead ratios of RR and PP system simulations. >

Journal ArticleDOI
TL;DR: In this paper, the question of whether prefetching blocks of the file into the block cache can effectively reduce the overall execution time of a parallel computation, even under favorable assumptions, is considered, and experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor.
Abstract: The question of whether prefetching blocks of the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment. >
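
A toy version of the prefetching experiment is easy to sketch: a small LRU block cache run over a sequential access pattern, with and without one-block-ahead prefetching. The cache size, access pattern, and policy are invented and far simpler than the testbed's.

```python
from collections import OrderedDict

def hit_ratio(accesses, cache_blocks, prefetch):
    """Hit ratio of a tiny LRU block cache, optionally prefetching block b+1
    whenever block b is referenced (one-block lookahead). Illustrative only."""
    cache, hits = OrderedDict(), 0

    def insert(b):
        cache[b] = True
        cache.move_to_end(b)
        if len(cache) > cache_blocks:
            cache.popitem(last=False)         # evict the least recently used block

    for b in accesses:
        if b in cache:
            hits += 1
            cache.move_to_end(b)
        else:
            insert(b)                         # demand fetch on a miss
        if prefetch and b + 1 not in cache:
            insert(b + 1)                     # fetch the next block ahead of need

    return hits / len(accesses)

seq = list(range(100))                        # sequential scan of 100 file blocks
print("no prefetch:", hit_ratio(seq, cache_blocks=8, prefetch=False))  # 0.0
print("prefetch   :", hit_ratio(seq, cache_blocks=8, prefetch=True))   # 0.99
```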

Journal ArticleDOI
TL;DR: The authors have designed and developed a location-independent-invocation (LII) mechanism that combines finding with invocation, using temporal location information, and show how LII can be achieved in a large and dynamic environment in which objects are supported by neither the operating system nor the programming language.
Abstract: The problems of finding objects in large and wide-area networks, where objects may change their location in volatile memory as well as on stable storage, are presented. The authors discuss possible solutions and describe those adopted in the Hermes system (a corporate-wide, real-life office application). They have designed and developed a location-independent-invocation (LII) mechanism that combines finding with invocation, using temporal location information. The mechanism also updates the system's knowledge of an object's location as a side effect of invocation and object migration. Assumptions about object mobility indicate that objects are likely to be found within a few propagations of an invocation. If they cannot be found in this way, stable-storage and name services are used to locate the object. The major contribution of this work is to show how LII can be achieved in a large and dynamic environment in which objects are supported by neither the operating system nor the programming language. >

Journal ArticleDOI
TL;DR: The use of feedback control schemes in multiprocessor systems is proposed to control tree saturation, reducing degradation of memory requests that are not directed to the hot spot and thereby increasing overall system performance.
Abstract: Using feedback control schemes in multiprocessor systems is proposed. In a multiprocessor, individual processors do not have complete control over, nor information about, the overall state of the system. The potential exists, then, for the processors to unknowingly interact in such a way as to degrade the performance of the system. An example of this is the problem of tree saturation caused by hot-spot accesses in multiprocessors using multistage interconnection networks. Tree saturation degrades the performance of all processors in the system, including those not participating in the hot-spot activity. Feedback schemes can be used to control tree saturation, reducing degradation of memory requests that are not directed to the hot spot, thereby increasing overall system performance. As a companion to feedback schemes, damping schemes are also considered. Simulation studies show that feedback schemes can improve overall system performance significantly and with relatively little hardware cost in many cases. Damping schemes in conjunction with feedback are shown to improve performance further. >
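
The flavor of the feedback idea can be sketched with a small fluid-style simulation: the hot memory module's queue length is fed back to the processors, which back off their hot-spot issue rate when the queue exceeds a threshold. All rates, gains, and thresholds below are invented.

```python
# Toy discrete-time model of feedback throttling of hot-spot requests.
# demand is each processor's desired hot-spot issue rate per step; the hot
# module serves serve_per_step requests per step. Parameters are illustrative.

def simulate(steps=20, procs=64, demand=0.05, serve_per_step=2.0,
             threshold=8.0, gain=0.5):
    queue, allowed = 0.0, demand
    for t in range(steps):
        arrivals = procs * allowed                   # hot-spot requests issued this step
        queue = max(0.0, queue + arrivals - serve_per_step)
        if queue > threshold:
            allowed *= gain                          # feedback: back off
        else:
            allowed = min(demand, allowed / gain)    # recover toward full demand
        print(f"step {t:2d}  queue {queue:5.1f}  allowed rate {allowed:.4f}")

simulate()
```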

Journal ArticleDOI
TL;DR: The use of Petri nets for defining a general static analysis framework for Ada tasking is advocated and the design and implementation of tools that make up the tasking-oriented toolkit for the Ada language (TOTAL) are defined and discussed.
Abstract: The use of Petri nets for defining a general static analysis framework for Ada tasking is advocated. The framework has evolved into a collection of tools that have proven to be a very valuable platform for experimental research. The design and implementation of tools that make up the tasking-oriented toolkit for the Ada language (TOTAL) are defined and discussed. Modeling and query/analysis methods and tools are discussed. Example Ada tasking programs are used to demonstrate the utility of each tool individually as well as the way the tools integrate. TOTAL is divided into two major subsystems, the front-end translator subsystem (FETS) and the back-end information display subsystem (BIDS). Three component tools that make up FETS are defined. Examples demonstrate the way these tools integrate in order to perform the translation of Ada source to Petri-net format. The BIDS subsystem and, in particular, the use of tools and techniques to support user-directed, but transparent, searches of Ada-net reachability graphs are discussed. >

Journal ArticleDOI
TL;DR: In this article, an experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented, where detailed implementations of parallel fast Fourier transform (FFT) programs are used to examine the performance of the prototype of the PASM (Partitionable SIMD and MIMD) parallel processing systems.
Abstract: An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computationally intensive algorithms such as the FFT. >

Journal ArticleDOI
TL;DR: An analytic model is presented for modeling pipelined data-parallel computation on multicomputers; it uses timed Petri nets to describe data pipelining operations, and its predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer.
Abstract: The basic concept of pipelined data-parallel algorithms is introduced by contrasting the algorithms with other styles of computation and by a simple example (a pipeline image distance transformation algorithm). Pipelined data-parallel algorithms are a class of algorithms which use pipelined operations and data level partitioning to achieve parallelism. Applications which involve data parallelism and recurrence relations are good candidates for this kind of algorithm. The computations are ideal for distributed-memory multicomputers. By controlling the granularity through data partitioning and overlapping the operations through pipelining, it is possible to achieve a balanced computation on multicomputers. An analytic model is presented for modeling pipelined data-parallel computation on multicomputers. The model uses timed Petri nets to describe data pipelining operations. As a case study, the model is applied to a pipelined matrix multiplication algorithm. Predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer. >
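
The pipelined data-parallel style can be illustrated with a prefix-sum recurrence: iterations are grouped into blocks, block k of every input stream is assigned to virtual processor k, and independent streams flow through the processors in pipeline fashion. Sizes and data are invented, and the timed Petri-net model is not reproduced.

```python
# Pipelined data-parallel sketch: at step t, "processor" k works on block k of
# stream t - k, carrying the partial sum from the previous block.

def pipeline_prefix(streams, block):
    n_blocks = (len(streams[0]) + block - 1) // block
    results = [[0.0] * len(s) for s in streams]
    carries = {}                                   # (stream, block) -> carry out
    for step in range(len(streams) + n_blocks - 1):
        for k in range(n_blocks):                  # processors active this step
            s = step - k                           # which stream processor k holds
            if 0 <= s < len(streams):
                carry = carries.get((s, k - 1), 0.0)
                for i in range(k * block, min((k + 1) * block, len(streams[s]))):
                    carry += streams[s][i]
                    results[s][i] = carry
                carries[(s, k)] = carry
    return results

print(pipeline_prefix([[1, 2, 3, 4], [10, 20, 30, 40]], block=2))
# [[1, 3, 6, 10], [10, 30, 60, 100]]
```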

Journal ArticleDOI
TL;DR: The performance of processor sharing and first come first serve is studied with two classes of jobs, and for the case in which a specific number of processors is statically assigned to each of the classes.
Abstract: Models for two processor sharing policies called task scheduling processor sharing and job scheduling processor sharing are developed and analyzed. The first policy schedules each task independently and allows parallel execution of an individual program, whereas the second policy schedules each job as a unit, thereby not allowing parallel execution of an individual program. It is found that task scheduling performs better than job scheduling for most system parameter values. The performance of task scheduling processor sharing is compared to a first come first serve policy. First come first serve performs better than processor sharing over a wide range of system parameters. Processor sharing performs best when the task service time variability is high. The performance of processor sharing and first come first serve is also studied with two classes of jobs, and for the case in which a specific number of processors is statically assigned to each of the classes. >
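
A two-job worked example makes the qualitative comparison concrete: on one processor with both jobs present from time 0, FCFS wins when service demands are similar, while processor sharing protects a short job stuck behind a long one (the high-variability case). The demands below are invented.

```python
# Two-job, single-processor worked example: both jobs arrive at time 0 and are
# served in list order under FCFS; under processor sharing they split the
# processor equally until the shorter one finishes. Demands are invented.

def fcfs_finish_times(demands):
    t, finish = 0.0, []
    for d in demands:
        t += d
        finish.append(t)
    return finish

def ps_finish_times(demands):
    short, long_ = sorted(demands)
    f_short = 2 * short                   # both jobs share the processor equally
    f_long = f_short + (long_ - short)    # the long job then runs alone
    return [f_short, f_long]

for demands in ([6.0, 4.0], [9.5, 0.5]):  # similar demands vs highly variable demands
    f_fcfs, f_ps = fcfs_finish_times(demands), ps_finish_times(demands)
    print(demands, "FCFS mean", sum(f_fcfs) / 2, " PS mean", sum(f_ps) / 2)
```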

Journal ArticleDOI
TL;DR: The authors present an extension to the work of I. Suzuki and T. Kasami, where a mutual exclusion algorithm uses a message called a token to transfer the privilege of entering a critical region among the participating sites by guaranteeing regeneration of only one token in the network.
Abstract: The authors present an extension to the work of I. Suzuki and T. Kasami (see Proc. 3rd Int. Conf. Distributed Computing Syst., p.365-70 (1982)), where a mutual exclusion algorithm uses a message called a token to transfer the privilege of entering a critical region among the participating sites. The proposed algorithm checks whether the token is lost during a network failure and regenerates it if necessary. The mutual exclusion requirement is satisfied by guaranteeing regeneration of only one token in the network. Failures in a computer network are classified into three types: processor failure, communication controller failure, and communication link failure. To detect failures, a time-out mechanism based on message delay is used. The execution of the algorithm is described for each type of failure; each site follows a rather simple execution procedure. Sites are not required to monitor the failure of other sites or communication links. >
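
The underlying Suzuki-Kasami-style token mechanism that the extension builds on can be sketched as follows (single address space, no failures, so the regeneration machinery that is the paper's actual contribution is not modeled); the site count and request order are invented.

```python
# Tiny model of token-based mutual exclusion: a site broadcasts a numbered
# REQUEST, and the token holder passes the token on when it leaves the
# critical section. Failure detection and token regeneration are omitted.

class Site:
    def __init__(self, sid, n):
        self.sid = sid
        self.rn = [0] * n          # highest request number seen from each site

class Token:
    def __init__(self, n):
        self.ln = [0] * n          # request number last served for each site
        self.queue = []            # sites waiting for the token

def broadcast_request(sites, requester):
    r = sites[requester]
    r.rn[requester] += 1
    for s in sites:                # every site records the new request number
        s.rn[requester] = max(s.rn[requester], r.rn[requester])

def release(token, sites, holder):
    token.ln[holder] = sites[holder].rn[holder]      # holder's request now served
    for s in sites:                                  # enqueue outstanding requesters
        if s.rn[s.sid] == token.ln[s.sid] + 1 and s.sid not in token.queue:
            token.queue.append(s.sid)
    return token.queue.pop(0) if token.queue else holder

sites = [Site(i, 3) for i in range(3)]
token, holder = Token(3), 0
broadcast_request(sites, 2)              # site 2 asks for the critical section
broadcast_request(sites, 1)              # so does site 1
holder = release(token, sites, holder)   # token passes to a waiting site (site 1 here)
print("token now at site", holder)
```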

Journal ArticleDOI
TL;DR: The authors introduce a family of networks, called banyan-hypercubes (BHs), that are a synthesis of banyans and hypercubes; BHs have better diameters and average distances than hypercubes and thus have better communication capabilities.
Abstract: The authors introduce a family of networks that are a synthesis of banyans and hypercubes and are called banyan-hypercubes (BHs). They combine the advantageous features of banyans and hypercubes and thus have better communication capabilities. The networks can be viewed as consisting of interconnected hypercubes. It is shown that many hypercube features can be incorporated into BHs with regard to routing, embedding of rings and meshes, and partitioning, and that improvements over the hypercube are obtained. In particular, it is shown that BHs have better diameters and average distances than hypercubes, and that they embed pyramids and multiple pyramids with dilation cost 1. An optimal routing algorithm for BHs and an efficient partitioning strategy are presented. >

Journal ArticleDOI
TL;DR: A methodology for designing pipelined data-parallel algorithms on multicomputers is studied and various properties of grouping are studied, and methods for generating communication-efficient grouping are given.
Abstract: For pt.I see ibid., p.470-85. A methodology for designing pipelined data-parallel algorithms on multicomputers is studied. The design procedure starts with a sequential algorithm which can be expressed as a nested loop with constant loop-carried dependencies. The procedure's main focus is on partitioning the loop by grouping related iterations together. Grouping is necessary to balance the communication overhead with the available parallelism and to produce pipelined execution patterns, which result in pipelined data-parallel computations. The grouping should satisfy dependence relationships among the iterations and also allow the granularity to be controlled. Various properties of grouping are studied, and methods for generating communication-efficient grouping are given. Given a grouping and an assignment of the groups to the processors, an analytic model is combined with the grouping results to describe the behavior and to estimate the performance of the resultant parallel program. Expressions characterizing the performance are derived. >

Journal ArticleDOI
TL;DR: The Swarm logic is used to verify the correctness of a program for labeling connected equal-intensity regions of a digital image to demonstrate an assertional programming logic which relies upon proof of programwide properties, e.g. global invariants and progress properties.
Abstract: A proof system for a shared dataspace programming notation called Swarm (a programming logic similar in style to that of UNITY) is specified. Relevant aspects of the Swarm language and model are overviewed. To illustrate the proof system, the Swarm logic is used to verify the correctness of a program for labeling connected equal-intensity regions of a digital image. Like UNITY, the Swarm proof system uses an assertional programming logic which relies upon proof of programwide properties, e.g. global invariants and progress properties. The Swarm logic is defined in terms of the same logical relations as UNITY (unless, ensures, and leads-to), but several of the concepts are reformulated to accommodate Swarm's distinctive features. >

Journal ArticleDOI
TL;DR: Several issues concerning the design of an I/O system for a multiprocessor such as a hypercube, including the problem of mapping specific data structures such as matrices onto the disks, are examined.
Abstract: Several issues concerning the design of an I/O (input/output) system for a multiprocessor such as a hypercube are examined. A methodology is proposed for connecting the I/O processors to such a system for efficient I/O access. The effect of I/O communication on the multiprocessor network is analyzed. Different disk organizations that can be employed within such a system are evaluated to see which organization has a better performance. It is observed that parallelism in serving an I/O request plays a dominant role in the scientific workload. The problem of mapping specific data structures such as matrices onto the disks so that the data can be accessed efficiently is considered. >
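
The matrix-mapping question can be illustrated with a standard skewed block layout (not necessarily the paper's scheme): block (i, j) goes to disk (i + j) mod D, so the blocks of any row and of any column land on distinct disks and can be read in parallel. D and the block grid are invented.

```python
# Skewed block-to-disk mapping: disk_of(i, j) = (i + j) mod D. Any single row
# or column of blocks then spans all D disks, giving full I/O parallelism.

D = 4                    # number of I/O processors / disks (illustrative)
blocks = 4               # matrix partitioned into a 4 x 4 grid of blocks

def disk_of(i, j):
    return (i + j) % D

row = [disk_of(2, j) for j in range(blocks)]
col = [disk_of(i, 2) for i in range(blocks)]
print("row 2 blocks on disks", row)   # [2, 3, 0, 1] -> all different
print("col 2 blocks on disks", col)   # [2, 3, 0, 1] -> all different
```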