
Showing papers in "IEEE Transactions on Parallel and Distributed Systems in 1990"


Journal ArticleDOI
TL;DR: Programming assistance and automation concepts are applied to Hypertool, a program development tool for message-passing systems that performs scheduling and inserts communication primitives automatically, thereby increasing productivity and eliminating synchronization errors.
Abstract: Programming assistance, automation concepts, and their application to a message-passing system program development tool called Hypertool are discussed. Hypertool performs scheduling and handles communication primitive insertion automatically, thereby increasing productivity and eliminating synchronization errors. Two algorithms, based on the critical-path method, are presented for scheduling processes statically. Hypertool also generates performance estimates and other program quality measures to help programmers improve their algorithms and programs. >

700 citations
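
As an illustration of the kind of critical-path-driven static scheduling described above, here is a minimal list-scheduling sketch in Python: tasks are prioritized by their critical-path length (bottom level) and greedily placed on the processor that can start them earliest. The task graph, weights, and processor count are invented, and communication costs, which Hypertool does account for, are ignored here.

```python
# Sketch of list scheduling driven by critical-path (bottom-level) priorities.
# Task graph, weights, and processor count are illustrative only; communication
# delays are ignored.

def bottom_level(tasks, succ, weight):
    """Longest path from each task to an exit task, including its own weight."""
    memo = {}
    def bl(t):
        if t not in memo:
            memo[t] = weight[t] + max((bl(s) for s in succ.get(t, [])), default=0)
        return memo[t]
    return {t: bl(t) for t in tasks}

def list_schedule(tasks, succ, weight, num_procs):
    preds = {t: [] for t in tasks}
    for t, ss in succ.items():
        for s in ss:
            preds[s].append(t)
    prio = bottom_level(tasks, succ, weight)
    remaining = {t: len(preds[t]) for t in tasks}
    ready = [t for t in tasks if remaining[t] == 0]
    finish, proc_free, schedule = {}, [0.0] * num_procs, []
    while ready:
        t = max(ready, key=lambda x: prio[x])        # most critical ready task first
        ready.remove(t)
        est = max((finish[p] for p in preds[t]), default=0.0)   # predecessors done
        p = min(range(num_procs), key=lambda i: max(proc_free[i], est))
        start = max(proc_free[p], est)
        finish[t] = start + weight[t]
        proc_free[p] = finish[t]
        schedule.append((t, p, start, finish[t]))
        for s in succ.get(t, []):                    # release successors
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return schedule

# A small fork-join graph scheduled on 2 processors.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
weight = {"a": 2, "b": 3, "c": 1, "d": 2}
print(list_schedule(list(weight), succ, weight, 2))
```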


Journal ArticleDOI
TL;DR: The author examines the questions of whether there are efficient algorithms for software spin-waiting given hardware support for atomic instructions, or whether more complex kinds of hardware support are needed for performance.
Abstract: The author examines the questions of whether there are efficient algorithms for software spin-waiting given hardware support for atomic instructions, or whether more complex kinds of hardware support are needed for performance. He considers the performance of a number of software spin-waiting algorithms. Arbitration for control of a lock is in many ways similar to arbitration for control of a network connecting a distributed system. He applies several of the static and dynamic arbitration methods originally developed for networks to spin locks. A novel method is proposed for explicitly queueing spinning processors in software by assigning each a unique number when it arrives at the lock. Control of the lock can then be passed to the next processor in line with minimal effect on other processors. >

683 citations
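
The explicit-queueing idea described above (each arriving processor draws a unique number and spins on its own flag) can be sketched as follows. This is a single-threaded Python model of the arbitration logic only, not a usable lock: in a real implementation the ticket would be obtained with an atomic fetch-and-increment and each flag would sit in its own cache line.

```python
# Logical model of an array-based queue lock: each arrival takes a unique slot
# and spins on a private flag; release hands the lock to the next slot.
# In hardware, next_slot would be advanced with an atomic fetch-and-increment.

class QueueLockModel:
    def __init__(self, num_slots):
        self.flags = [False] * num_slots
        self.flags[0] = True            # the first arrival may enter immediately
        self.next_slot = 0
        self.n = num_slots

    def acquire(self):
        my_slot = self.next_slot % self.n   # the arriving processor's unique number
        self.next_slot += 1                 # atomic in a real implementation
        while not self.flags[my_slot]:      # spin only on this processor's flag
            pass
        return my_slot

    def release(self, my_slot):
        self.flags[my_slot] = False
        self.flags[(my_slot + 1) % self.n] = True   # pass the lock to the next in line

# Single-threaded trace: two critical sections hand off in FIFO order.
lock = QueueLockModel(4)
s = lock.acquire()      # slot 0 enters at once
lock.release(s)         # grants slot 1
s = lock.acquire()      # slot 1 enters without disturbing other slots
lock.release(s)
print("lock model handed off through slots 0 and 1")
```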


Journal ArticleDOI
TL;DR: This paper presents an efficient algorithm for dynamic scheduling of tasks in real-time systems; it focuses its attention on a small subset of tasks with the shortest deadlines and is shown to be very effective when the maximum allowable scheduling overhead is fixed.
Abstract: Efficient scheduling algorithms based on heuristic functions are developed for scheduling a set of tasks on a multiprocessor system. The tasks are characterized by worst-case computation times, deadlines, and resource requirements. Starting with an empty partial schedule, each step of the search extends the current partial schedule by including one of the tasks yet to be scheduled. The heuristic functions used in the algorithm actively direct the search for a feasible schedule, i.e. they help choose the task that extends the current partial schedule. Two scheduling algorithms are evaluated by simulation. To extend the current partial schedule, one of the algorithms considers, at each step of the search, all the tasks that are yet to be scheduled as candidates. The second focuses its attention on a small subset of tasks with the shortest deadlines. The second algorithm is shown to be very effective when the maximum allowable scheduling overhead is fixed. This algorithm is hence appropriate for dynamic scheduling in real-time systems. >

349 citations
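
A much-simplified, resource-free sketch of the second (deadline-window) strategy is given below; the window size k, the laxity-style heuristic, and the task set are all invented, and the backtracking of the published algorithm is replaced by a simple failure return.

```python
# Greedy sketch of deadline-window scheduling on identical processors.
# tasks: (name, worst_case_time, deadline); resources and backtracking omitted.

def myopic_schedule(tasks, num_procs, k=3):
    proc_free = [0.0] * num_procs
    unscheduled = sorted(tasks, key=lambda t: t[2])     # order by deadline
    schedule = []
    while unscheduled:
        window = unscheduled[:k]                        # only the k earliest deadlines
        name, wcet, deadline = min(window, key=lambda t: t[2] - t[1])  # smallest laxity
        unscheduled.remove((name, wcet, deadline))
        p = min(range(num_procs), key=lambda i: proc_free[i])
        start = proc_free[p]
        finish = start + wcet
        if finish > deadline:
            return None                 # infeasible here; the real algorithm backtracks
        proc_free[p] = finish
        schedule.append((name, p, start, finish))
    return schedule

print(myopic_schedule([("t1", 2, 4), ("t2", 1, 3), ("t3", 3, 9), ("t4", 2, 8)], num_procs=2))
```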


Journal ArticleDOI
TL;DR: An innovative approach is presented to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement.
Abstract: An innovative approach is presented to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement. The approach is based on broadcast communication over a local area network, such as an Ethernet or a token ring, and on two novel protocols, the Trans protocol, which provides efficient reliable broadcast communication, and the Total protocol, which with high probability promptly places a total order on messages and achieves distributed agreement even in the presence of fail-stop, omission, timing, and communication faults. Reliable distributed operations, such as locking, update, and commitment, typically require only a single broadcast message rather than the several tens of messages required by current algorithms. >

272 citations


Journal ArticleDOI
TL;DR: The authors focus on developing efficient and implementable methods for recursive data structures and present interference analysis tools and parallelization techniques for imperative programs that contain dynamically updatable trees and directed acyclic graphs.
Abstract: A study is made of the problem of estimating interference in an imperative language with dynamic data structures. The authors focus on developing efficient and implementable methods for recursive data structures. In particular, they present interference analysis tools and parallelization techniques for imperative programs that contain dynamically updatable trees and directed acyclic graphs. The analysis methods are based on a regular-expression-like representation of the relationship between accessible nodes in the data structure. The authors have implemented their analysis, and they present some concrete examples that have been processed by this system. >

241 citations


Journal ArticleDOI
TL;DR: IPS, a performance measurement system for parallel and distributed programs, is now in its second implementation; IPS-2 extends the basic system with new instrumentation techniques, an interactive and graphical user interface, and new automatic guidance analysis techniques.
Abstract: IPS, a performance measurement system for parallel and distributed programs, is currently running on its second implementation. IPS's model of parallel programs uses knowledge about the semantics of a program's structure to provide two important features. First, IPS provides a large amount of performance data about the execution of a parallel program, and this information is organized so that access to it is easy and intuitive. Secondly, IPS provides performance analysis techniques that help to guide the programmer automatically to the location of program bottlenecks. The first implementation of IPS was a testbed for the basic design concepts, providing experience with a hierarchical program and measurement model, interactive program analysis, and automatic guidance techniques. It was built on the Charlotte distributed operating system. The second implementation, IPS-2, extends the basic system with new instrumentation techniques, an interactive and graphical user interface, and new automatic guidance analysis techniques. This implementation runs on 4.3BSD UNIX systems, on the VAX, DECstation, Sun 4, and Sequent Symmetry multiprocessor. >

202 citations


Journal ArticleDOI
TL;DR: Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time, including recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges and minimum spanning trees in undirected graphs.
Abstract: The transitive closure problem in O(1) time is solved by a new method that is far different from the conventional solution method. On processor arrays with reconfigurable bus systems, two O(1) time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed on a three-dimensional n*n*n processor array with a reconfigurable bus system, and the other is designed on a two-dimensional n^2*n^2 processor array with a reconfigurable bus system, where n is the number of vertices in the graph. Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time. These problems include recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs. >

184 citations
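
For reference, the computation those constant-time algorithms perform is ordinary transitive closure; a plain Warshall-style version (O(n^3) work, with none of the reconfigurable-bus machinery) looks like this, using a made-up adjacency matrix.

```python
# Warshall-style transitive closure of an undirected graph given as a 0/1
# adjacency matrix (self-loops included). Purely sequential reference code.

def transitive_closure(adj):
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach

# A path 0-1-2 plus an isolated vertex 3.
adj = [[1, 1, 0, 0],
       [1, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 1]]
print(transitive_closure(adj))   # vertices 0, 1, 2 become mutually reachable
```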


Journal ArticleDOI
TL;DR: Using depth-first search, the authors develop and analyze the performance of a routing scheme for hypercube multicomputers in the presence of an arbitrary number of faulty components and derive an exact expression for the probability of routing messages by way of optimal paths from the source node to an obstructed node.
Abstract: Using depth-first search, the authors develop and analyze the performance of a routing scheme for hypercube multicomputers in the presence of an arbitrary number of faulty components. They derive an exact expression for the probability of routing messages by way of optimal paths (of length equal to the Hamming distance between the corresponding pair of nodes) from the source node to an obstructed node. The obstructed node is defined as the first node encountered by the message that finds no optimal path to the destination node. It is noted that the probability of routing messages over an optimal path between any two nodes is a special case of the present results and can be obtained by replacing the obstructed node with the destination node. Numerical examples are given to illustrate the results, and they show that, in the presence of component failures, depth-first search routing can route a message to its destination by means of an optimal path with a very high probability. >

149 citations
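
The depth-first routing idea above can be sketched directly: hypercube nodes are bit strings, preferred hops flip a bit in which the current node still differs from the destination, and the search backtracks around faulty nodes. The fault set and cube size below are made up.

```python
def dfs_route(src, dst, n, faulty, visited=None):
    """Depth-first routing sketch in an n-dimensional hypercube.

    Nodes are integers 0..2**n - 1; neighbors differ in exactly one bit.
    Preferred dimensions are those in which the current node differs from the
    destination (they shorten the Hamming distance); other dimensions are
    tried only if needed, and the search backtracks around faulty nodes.
    """
    if visited is None:
        visited = {src}
    if src == dst:
        return [dst]
    diff = src ^ dst
    preferred = [d for d in range(n) if diff >> d & 1]
    others = [d for d in range(n) if not diff >> d & 1]
    for d in preferred + others:
        nxt = src ^ (1 << d)
        if nxt in faulty or nxt in visited:
            continue
        visited.add(nxt)
        rest = dfs_route(nxt, dst, n, faulty, visited)
        if rest is not None:
            return [src] + rest
    return None   # dead end: backtrack

# 4-cube with two faulty nodes; route from node 0 to node 15.
print(dfs_route(0b0000, 0b1111, 4, faulty={0b0001, 0b0010}))
```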


Journal ArticleDOI
TL;DR: An empirical study of program characteristics that are important to writers of parallelizing compilers, especially in the area of data dependence analysis and program transformations, finds that nonzero coefficients of loop indexes in most subscripts are simple, allowing an exact real-valued test to be as accurate as an exact integer-valued test for one-dimensional or two-dimensional arrays.
Abstract: Some results are reported from an empirical study of program characteristics that are important to writers of parallelizing compilers, especially in the area of data dependence analysis and program transformations. The state of the art in data dependence analysis and some parallel execution techniques are examined. The major findings are as follows. Many subscripts contain symbolic terms with unknown values. A few methods of determining their values at compile time are evaluated. Array references with coupled subscripts appear quite frequently; these subscripts must be handled simultaneously in a dependence test, rather than being handled separately as in current test algorithms. Nonzero coefficients of loop indexes in most subscripts are found to be simple: they are either 1 or -1. This allows an exact real-valued test to be as accurate as an exact integer-valued test for one-dimensional or two-dimensional arrays. Dependencies with uncertain distance are found to be rather common, and one of the main reasons is the frequent appearance of symbolic terms with unknown values. >

135 citations
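
As a concrete illustration of the kind of single-subscript dependence test these findings bear on, here is a small GCD-plus-bounds (Banerjee-style) check for two one-dimensional references; when the loop-index coefficients are +1 or -1, as the study found is usual, the real-valued bounds check is exact. The coefficients and loop bounds below are invented.

```python
from math import gcd

def may_depend(a1, c1, a2, c2, lo, hi):
    """Test whether references A[a1*i + c1] and A[a2*j + c2] can touch the
    same element for some i, j in [lo, hi]: a GCD test followed by a
    real-valued bounds check. Illustrative only."""
    # Dependence requires an integer solution of a1*i - a2*j = c2 - c1.
    rhs = c2 - c1
    g = gcd(abs(a1), abs(a2))
    if g != 0 and rhs % g != 0:
        return False                       # GCD test: no integer solution exists
    # Bounds check: extreme values that a1*i - a2*j can take over the region.
    lo_val = min(a1 * lo, a1 * hi) - max(a2 * lo, a2 * hi)
    hi_val = max(a1 * lo, a1 * hi) - min(a2 * lo, a2 * hi)
    return lo_val <= rhs <= hi_val

# A[i+1] written, A[i] read, i in 1..100: dependence exists (distance 1).
print(may_depend(1, 1, 1, 0, 1, 100))      # True
# A[2*i] versus A[2*i+1]: never the same element.
print(may_depend(2, 0, 2, 1, 1, 100))      # False
```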


Journal ArticleDOI
TL;DR: An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described and validated against both detailed simulation and actual execution on a commercial multiprocessor.
Abstract: An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7%, with a standard deviation of 1.5%; the maximum error is about 10%. >

131 citations
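
A deterministic toy version of the task-graph half of the method is sketched below: series composition adds times and parallel composition takes the maximum branch, assuming enough processors and no queueing delay (the published method does model contention through a queueing network). The graph and service demands are invented.

```python
# A task is a leaf with a service demand, or a series/parallel composition of
# subgraphs. Series time adds; parallel time is the maximum branch.

def exec_time(node):
    kind = node[0]
    if kind == "task":
        return node[1]
    times = [exec_time(child) for child in node[1:]]
    return sum(times) if kind == "series" else max(times)

# (setup ; (phase_a || phase_b) ; combine), all demands illustrative.
graph = ("series",
         ("task", 2.0),
         ("parallel", ("task", 5.0), ("task", 3.0)),
         ("task", 1.0))
print(exec_time(graph))   # 2 + max(5, 3) + 1 = 8.0
```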


Journal ArticleDOI
TL;DR: The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined and a user-transparent checkpointing and recovery scheme using private caches is presented, which prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols.
Abstract: The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented. >
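
The checkpoint-and-rollback idea, stripped of the cache-based mechanism, checkpoint identifiers, and recovery stacks described above, reduces to the following sketch; the task state and checkpoint interval are invented.

```python
import copy

# Periodically snapshot the computation state; on a detected transient fault,
# restart from the last snapshot. Purely illustrative.

class RecoverableTask:
    def __init__(self, state):
        self.state = state
        self.checkpoint_state = copy.deepcopy(state)

    def checkpoint(self):
        self.checkpoint_state = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self.checkpoint_state)

task = RecoverableTask({"i": 0, "partial_sum": 0})
for i in range(1, 11):
    task.state["i"], task.state["partial_sum"] = i, task.state["partial_sum"] + i
    if i % 5 == 0:
        task.checkpoint()                 # commit progress every 5 steps
fault_detected = True                     # pretend a transient fault hit here
if fault_detected:
    task.rollback()                       # recover from the last checkpoint
print(task.state)                         # {'i': 10, 'partial_sum': 55}
```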

Journal ArticleDOI
TL;DR: A novel algorithm, called the lambda test, is presented for an efficient and accurate data dependence analysis of multidimensional array references that combines the efficiency and the accuracy of both approaches.
Abstract: A novel algorithm, called the lambda test, is presented for an efficient and accurate data dependence analysis of multidimensional array references. It extends the numerical methods to allow all dimensions of array references to be tested simultaneously. Hence, it combines the efficiency and the accuracy of both approaches. This algorithm has been implemented in Parafrase, a Fortran program parallelization restructurer developed at the University of Illinois at Urbana-Champaign. Some experimental results are presented to show its effectiveness. >

Journal ArticleDOI
TL;DR: Real-time distributed systems are modeled by a timed transition model (TTM), and decision procedures are provided for checking a small but important class of properties (specified in real-time temporal logic) that includes invariance, precedence, eventuality and real-time response specifications.
Abstract: Real-time distributed systems are modeled by a timed transition model (TTM). For any finite-state TTM, decision procedures are provided for checking a small but important class of properties (specified in real-time temporal logic). The procedures are linear in the size of the system reachability graph. The class of properties includes invariance, precedence, eventuality and real-time response specifications. >

Journal ArticleDOI
TL;DR: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied.
Abstract: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied. Two types of move are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support such a parallel cost evaluation. A novel tree broadcasting strategy is presented for the hypercube that is used extensively in the algorithm for updating cell locations in the parallel environment. A dynamic parallel annealing schedule is proposed that estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control. The performance on an Intel iPSC-2/D4/MX hypercube is reported. >

Journal ArticleDOI
TL;DR: A performance analysis of this protocol is presented, showing that it commits with high probability under realistic operating conditions, without invoking the termination protocol, if N is sufficiently large.
Abstract: A general protocol for atomic broadcast in networks is presented. The protocol tolerates loss, duplication, reordering, delay of messages, and network partitioning in an arbitrary network of fail-stop sites (i.e. no Byzantine site behavior is tolerated). The protocol is based on majority-consensus decisions to commit on a unique ordering of received broadcast messages. Under normal operating conditions, the protocol requires three phases to complete and approximately 4N/V messages, where N is the number of sites. This overhead is distributed among the messages for which the delivery decision is made, and the heavier the broadcast message traffic, the lower the overhead per broadcast message becomes. Under abnormal operating conditions, a decentralized termination protocol (also presented) is invoked. A performance analysis of this protocol is presented, showing that this protocol commits with high probability under realistic operating conditions without invoking the termination protocol if N is sufficiently large. The protocol retains its efficiency in wide-area networks where broadcast communication media are unavailable. >

Journal ArticleDOI
TL;DR: A simple model of parallel computation which is capable of explaining speedups greater than n on n processors is presented and several of the contradictory previous results relating to parallel speedup are resolved by using the model.
Abstract: A simple model of parallel computation which is capable of explaining speedups greater than n on n processors is presented. Necessary and sufficient conditions for these exceptional speedups are derived from the model. Several of the contradictory previous results relating to parallel speedup are resolved by using the model. >
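
One of the classic sources of such speedups, the growth of effective memory per processor, can be shown with a back-of-the-envelope calculation; the operation count, miss rates, and costs below are invented and are not taken from the paper's model.

```python
# Toy illustration of speedup greater than n: on one processor the working set
# misses in cache, while n processors together hold their partitions entirely
# in cache. All numbers are invented.

def run_time(ops, miss_rate, hit_cost=1.0, miss_cost=20.0):
    return ops * (miss_rate * miss_cost + (1 - miss_rate) * hit_cost)

ops = 1_000_000
t_serial = run_time(ops, miss_rate=0.10)            # working set too big for one cache
n = 8
t_parallel = run_time(ops / n, miss_rate=0.01)      # partitions now fit, far fewer misses
speedup = t_serial / t_parallel
print(speedup, speedup > n)                          # about 19.5, which exceeds n = 8
```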

Journal ArticleDOI
TL;DR: It is found that square machines are not the best form for semigroup computations, and an O(N^(1/8))-time algorithm is derived on an N^(5/8)*N^(3/8) rectangular 2-MCCMB.
Abstract: Semigroup and prefix computations on two-dimensional mesh-connected computers with multiple broadcasting (2-MCCMBs) are studied. Previously, only square 2-MCCMBs with N processing elements were considered for semigroup computations of N data items, and O(N^(1/6)) time was required. It is found that square machines are not the best form for semigroup computations, and an O(N^(1/8))-time algorithm is derived on an N^(5/8)*N^(3/8) rectangular 2-MCCMB. This time complexity can be further reduced to O(N^(1/9)) if fewer processing elements are used. Parallel algorithms for prefix computations with the same time complexities are derived. >
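
For reference, the prefix problem being solved can be written as a log-step (Hillis/Steele-style) scan, shown below for sums and maxima; the mesh-with-broadcast-buses mapping that yields the sublinear running times above is not modeled.

```python
# Inclusive scan: after step k, element i holds the combination of the 2**k
# items ending at i. All updates within a step are conceptually parallel.

def inclusive_scan(data, op=lambda a, b: a + b):
    x = list(data)
    step = 1
    while step < len(x):
        nxt = x[:]                       # a step's updates happen "in parallel"
        for i in range(step, len(x)):
            nxt[i] = op(x[i - step], x[i])
        x, step = nxt, step * 2
    return x

print(inclusive_scan([3, 1, 4, 1, 5, 9, 2, 6]))        # running sums
print(inclusive_scan([3, 1, 4, 1, 5, 9, 2, 6], max))   # running maxima (a semigroup op)
```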

Journal ArticleDOI
TL;DR: The analyses show that using implicit lookahead can significantly improve the lookahead ratios of RR and PP system simulations, which is correlated with other performance measures of more direct interest, such as speedup.
Abstract: Lookahead is the ability of a process to predict its future behavior. The feasibility of implicit lookahead for non-FCFS stochastic queuing systems is demonstrated. Several lookahead exploiting techniques are proposed for round-robin (RR) system simulations. An algorithm that generates lookahead in O(1) time is described. Analytical models and experiments are constructed to evaluate these techniques. A lookahead technique for preemptive priority (PP) systems is evaluated using an analytical model. The performance metric for these techniques is the lookahead ratio, which is correlated with other performance measures of more direct interest, such as speedup. The analyses show that using implicit lookahead can significantly improve the lookahead ratios of RR and PP system simulations. >

Journal ArticleDOI
TL;DR: In this paper, the question of whether prefetching blocks of the file into the block cache can effectively reduce the overall execution time of a parallel computation, even under favorable assumptions, is considered, and experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor.
Abstract: The question of whether prefetching blocks of the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment. >
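
A toy version of the prefetching experiment is easy to sketch: a small LRU block cache run over a sequential access pattern, with and without one-block-ahead prefetching. The cache size, access pattern, and policy are invented and far simpler than the testbed's.

```python
from collections import OrderedDict

def hit_ratio(accesses, cache_blocks, prefetch):
    """Hit ratio of a tiny LRU block cache, optionally prefetching block b+1
    whenever block b is referenced (one-block lookahead). Illustrative only."""
    cache, hits = OrderedDict(), 0

    def insert(b):
        cache[b] = True
        cache.move_to_end(b)
        if len(cache) > cache_blocks:
            cache.popitem(last=False)         # evict the least recently used block

    for b in accesses:
        if b in cache:
            hits += 1
            cache.move_to_end(b)
        else:
            insert(b)                         # demand fetch on a miss
        if prefetch and b + 1 not in cache:
            insert(b + 1)                     # fetch the next block ahead of need

    return hits / len(accesses)

seq = list(range(100))                        # sequential scan of 100 file blocks
print("no prefetch:", hit_ratio(seq, cache_blocks=8, prefetch=False))  # 0.0
print("prefetch   :", hit_ratio(seq, cache_blocks=8, prefetch=True))   # 0.99
```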

Journal ArticleDOI
TL;DR: The authors have designed and developed a location-independent-invocation (LII) mechanism that combines finding with invocation, using temporal location information, and show how LII can be achieved in a large and dynamic environment in which objects are supported by neither the operating system nor the programming language.
Abstract: The problems of finding objects in large and wide-area networks, where objects may change their location in volatile memory as well as on stable storage, are presented. The authors discuss possible solutions and describe those adopted in the Hermes system (a corporate-wide, real-life office application). They have designed and developed a location-independent-invocation (LII) mechanism that combines finding with invocation, using temporal location information. The mechanism also updates the system's knowledge of an object's location as a side effect of invocation and object migration. Assumptions about object mobility indicate that objects are likely to be found within a few propagations of an invocation. If they cannot be found in this way, stable-storage and name services are used to locate the object. The major contribution of this work is to show how LII can be achieved in a large and dynamic environment in which objects are supported by neither the operating system nor the programming language. >

Journal ArticleDOI
TL;DR: The use of feedback control schemes in multiprocessor systems is proposed to control tree saturation, reducing degradation of memory requests that are not directed to the hot spot and thereby increasing overall system performance.
Abstract: Using feedback control schemes in multiprocessor systems is proposed. In a multiprocessor, individual processors do not have complete control over, nor information about, the overall state of the system. The potential exists, then, for the processors to unknowingly interact in such a way as to degrade the performance of the system. An example of this is the problem of tree saturation caused by hot-spot accesses in multiprocessors using multistage interconnection networks. Tree saturation degrades the performance of all processors in the system, including those not participating in the hot-spot activity. Feedback schemes can be used to control tree saturation, reducing degradation of memory requests that are not directed to the hot spot, thereby increasing overall system performance. As a companion to feedback schemes, damping schemes are also considered. Simulation studies show that feedback schemes can improve overall system performance significantly and with relatively little hardware cost in many cases. Damping schemes in conjunction with feedback are shown to improve performance further. >
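
The flavor of the feedback idea can be sketched with a small fluid-style simulation: the hot memory module's queue length is fed back to the processors, which back off their hot-spot issue rate when the queue exceeds a threshold. All rates, gains, and thresholds below are invented.

```python
# Toy discrete-time model of feedback throttling of hot-spot requests.
# demand is each processor's desired hot-spot issue rate per step; the hot
# module serves serve_per_step requests per step. Parameters are illustrative.

def simulate(steps=20, procs=64, demand=0.05, serve_per_step=2.0,
             threshold=8.0, gain=0.5):
    queue, allowed = 0.0, demand
    for t in range(steps):
        arrivals = procs * allowed                   # hot-spot requests issued this step
        queue = max(0.0, queue + arrivals - serve_per_step)
        if queue > threshold:
            allowed *= gain                          # feedback: back off
        else:
            allowed = min(demand, allowed / gain)    # recover toward full demand
        print(f"step {t:2d}  queue {queue:5.1f}  allowed rate {allowed:.4f}")

simulate()
```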

Journal ArticleDOI
TL;DR: The use of Petri nets for defining a general static analysis framework for Ada tasking is advocated and the design and implementation of tools that make up the tasking-oriented toolkit for the Ada language (TOTAL) are defined and discussed.
Abstract: The use of Petri nets for defining a general static analysis framework for Ada tasking is advocated. The framework has evolved into a collection of tools that have proven to be a very valuable platform for experimental research. The design and implementation of tools that make up the tasking-oriented toolkit for the Ada language (TOTAL) are defined and discussed. Modeling and query/analysis methods and tools are discussed. Example Ada tasking programs are used to demonstrate the utility of each tool individually as well as the way the tools integrate. TOTAL is divided into two major subsystems, the front-end translator subsystem (FETS) and the back-end information display subsystem (BIDS). Three component tools that make up FETS are defined. Examples demonstrate the way these tools integrate in order to perform the translation of Ada source to Petri-net format. The BIDS subsystem and, in particular, the use of tools and techniques to support user-directed, but transparent, searches of Ada-net reachability graphs are discussed. >

Journal ArticleDOI
TL;DR: In this article, an experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented, where detailed implementations of parallel fast Fourier transform (FFT) programs are used to examine the performance of the prototype of the PASM (Partitionable SIMD and MIMD) parallel processing systems.
Abstract: An experimental analysis of the architecture of an SIMD/MIMD parallel processing system is presented. Detailed implementations of parallel fast Fourier transform (FFT) programs were used to examine the performance of the prototype of the PASM (Partitionable SIMD/MIMD) parallel processing system. Detailed execution-time measurements using specialized timing hardware were made for the complete FFT and for components of SIMD, MIMD, and barrier-synchronized MIMD implementations. The component measurements isolated the effects of floating-point arithmetic operations, interconnection network transfer operations, and program control overhead. The measurements allow an accurate extrapolation of the execution time, speedup, and efficiency of the MIMD, SIMD, and barrier-synchronized MIMD programs to a full 1024-processor PASM system. This constitutes one of the first results of this kind, in which controlled experiments on fixed hardware were used to make comparisons of these fundamental modes of computing. Overall, the experimental results demonstrate the value of mixed-mode SIMD/MIMD computing and its suitability for computationally intensive algorithms such as the FFT. >

Journal ArticleDOI
TL;DR: An analytic model is presented for modeling pipelined data-parallel computation on multicomputers; it uses timed Petri nets to describe data pipelining operations, and its predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer.
Abstract: The basic concept of pipelined data-parallel algorithms is introduced by contrasting the algorithms with other styles of computation and by a simple example (a pipeline image distance transformation algorithm). Pipelined data-parallel algorithms are a class of algorithms which use pipelined operations and data level partitioning to achieve parallelism. Applications which involve data parallelism and recurrence relations are good candidates for this kind of algorithm. The computations are ideal for distributed-memory multicomputers. By controlling the granularity through data partitioning and overlapping the operations through pipelining, it is possible to achieve a balanced computation on multicomputers. An analytic model is presented for modeling pipelined data-parallel computation on multicomputers. The model uses timed Petri nets to describe data pipelining operations. As a case study, the model is applied to a pipelined matrix multiplication algorithm. Predicted results match closely with the measured performance on a 64-node NCUBE hypercube multicomputer. >
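
The pipelined data-parallel style can be illustrated with a prefix-sum recurrence: iterations are grouped into blocks, block k of every input stream is assigned to virtual processor k, and independent streams flow through the processors in pipeline fashion. Sizes and data are invented, and the timed Petri-net model is not reproduced.

```python
# Pipelined data-parallel sketch: at step t, "processor" k works on block k of
# stream t - k, carrying the partial sum from the previous block.

def pipeline_prefix(streams, block):
    n_blocks = (len(streams[0]) + block - 1) // block
    results = [[0.0] * len(s) for s in streams]
    carries = {}                                   # (stream, block) -> carry out
    for step in range(len(streams) + n_blocks - 1):
        for k in range(n_blocks):                  # processors active this step
            s = step - k                           # which stream processor k holds
            if 0 <= s < len(streams):
                carry = carries.get((s, k - 1), 0.0)
                for i in range(k * block, min((k + 1) * block, len(streams[s]))):
                    carry += streams[s][i]
                    results[s][i] = carry
                carries[(s, k)] = carry
    return results

print(pipeline_prefix([[1, 2, 3, 4], [10, 20, 30, 40]], block=2))
# [[1, 3, 6, 10], [10, 30, 60, 100]]
```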

Journal ArticleDOI
TL;DR: The performance of processor sharing and first come first serve is studied with two classes of jobs, and for the case in which a specific number of processors is statically assigned to each of the classes.
Abstract: Models for two processor sharing policies called task scheduling processor sharing and job scheduling processor sharing are developed and analyzed. The first policy schedules each task independently and allows parallel execution of an individual program, whereas the second policy schedules each job as a unit, thereby not allowing parallel execution of an individual program. It is found that task scheduling performs better than job scheduling for most system parameter values. The performance of task scheduling processor sharing is compared to a first come first serve policy. First come first serve performs better than processor sharing over a wide range of system parameters. Processor sharing performs best when the task service time variability is high. The performance of processor sharing and first come first serve is also studied with two classes of jobs, and for the case in which a specific number of processors is statically assigned to each of the classes. >
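
A two-job worked example makes the qualitative comparison concrete: on one processor with both jobs present from time 0, FCFS wins when service demands are similar, while processor sharing protects a short job stuck behind a long one (the high-variability case). The demands below are invented.

```python
# Two-job, single-processor worked example: both jobs arrive at time 0 and are
# served in list order under FCFS; under processor sharing they split the
# processor equally until the shorter one finishes. Demands are invented.

def fcfs_finish_times(demands):
    t, finish = 0.0, []
    for d in demands:
        t += d
        finish.append(t)
    return finish

def ps_finish_times(demands):
    short, long_ = sorted(demands)
    f_short = 2 * short                   # both jobs share the processor equally
    f_long = f_short + (long_ - short)    # the long job then runs alone
    return [f_short, f_long]

for demands in ([6.0, 4.0], [9.5, 0.5]):  # similar demands vs highly variable demands
    f_fcfs, f_ps = fcfs_finish_times(demands), ps_finish_times(demands)
    print(demands, "FCFS mean", sum(f_fcfs) / 2, " PS mean", sum(f_ps) / 2)
```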

Journal ArticleDOI
TL;DR: The authors present an extension to the work of I. Suzuki and T. Kasami, where a mutual exclusion algorithm uses a message called a token to transfer the privilege of entering a critical region among the participating sites by guaranteeing regeneration of only one token in the network.
Abstract: The authors present an extension to the work of I. Suzuki and T. Kasami (see Proc. 3rd Int. Conf. Distributed Computing Syst., p.365-70 (1982)), where a mutual exclusion algorithm uses a message called a token to transfer the privilege of entering a critical region among the participating sites. The proposed algorithm checks whether the token is lost during a network failure and regenerates it if necessary. The mutual exclusion requirement is satisfied by guaranteeing regeneration of only one token in the network. Failures in a computer network are classified into three types: processor failure, communication controller failure, and communication link failure. To detect failures, a time-out mechanism based on message delay is used. The execution of the algorithm is described for each type of failure; each site follows a rather simple execution procedure. Sites are not required to monitor the failure of other sites or communication links. >
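
The underlying Suzuki-Kasami-style token mechanism that the extension builds on can be sketched as follows (single address space, no failures, so the regeneration machinery that is the paper's actual contribution is not modeled); the site count and request order are invented.

```python
# Tiny model of token-based mutual exclusion: a site broadcasts a numbered
# REQUEST, and the token holder passes the token on when it leaves the
# critical section. Failure detection and token regeneration are omitted.

class Site:
    def __init__(self, sid, n):
        self.sid = sid
        self.rn = [0] * n          # highest request number seen from each site

class Token:
    def __init__(self, n):
        self.ln = [0] * n          # request number last served for each site
        self.queue = []            # sites waiting for the token

def broadcast_request(sites, requester):
    r = sites[requester]
    r.rn[requester] += 1
    for s in sites:                # every site records the new request number
        s.rn[requester] = max(s.rn[requester], r.rn[requester])

def release(token, sites, holder):
    token.ln[holder] = sites[holder].rn[holder]      # holder's request now served
    for s in sites:                                  # enqueue outstanding requesters
        if s.rn[s.sid] == token.ln[s.sid] + 1 and s.sid not in token.queue:
            token.queue.append(s.sid)
    return token.queue.pop(0) if token.queue else holder

sites = [Site(i, 3) for i in range(3)]
token, holder = Token(3), 0
broadcast_request(sites, 2)              # site 2 asks for the critical section
broadcast_request(sites, 1)              # so does site 1
holder = release(token, sites, holder)   # token passes to a waiting site (site 1 here)
print("token now at site", holder)
```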

Journal ArticleDOI
TL;DR: The authors introduce a family of networks, called banyan-hypercubes (BHs), that are a synthesis of banyans and hypercubes; BHs have better diameters and average distances than hypercubes and thus have better communication capabilities.
Abstract: The authors introduce a family of networks that are a synthesis of banyans and hypercubes and are called banyan-hypercubes (BHs). They combine the advantageous features of banyans and hypercubes and thus have better communication capabilities. The networks can be viewed as consisting of interconnected hypercubes. It is shown that many hypercube features can be incorporated into BHs with regard to routing, embedding of rings and meshes, and partitioning, and that improvements over the hypercube are obtained. In particular, it is shown that BHs have better diameters and average distances than hypercubes, and that they embed pyramids and multiple pyramids with dilation cost 1. An optimal routing algorithm for BHs and an efficient partitioning strategy are presented. >

Journal ArticleDOI
TL;DR: A methodology for designing pipelined data-parallel algorithms on multicomputers is studied and various properties of grouping are studied, and methods for generating communication-efficient grouping are given.
Abstract: For pt.I see ibid., p.470-85. A methodology for designing pipelined data-parallel algorithms on multicomputers is studied. The design procedure starts with a sequential algorithm which can be expressed as a nested loop with constant loop-carried dependencies. The procedure's main focus is on partitioning the loop by grouping related iterations together. Grouping is necessary to balance the communication overhead with the available parallelism and to produce pipelined execution patterns, which result in pipelined data-parallel computations. The grouping should satisfy dependence relationships among the iterations and also allow the granularity to be controlled. Various properties of grouping are studied, and methods for generating communication-efficient grouping are given. Given a grouping and an assignment of the groups to the processors, an analytic model is combined with the grouping results to describe the behavior and to estimate the performance of the resultant parallel program. Expressions characterizing the performance are derived. >

Journal ArticleDOI
TL;DR: The Swarm logic is used to verify the correctness of a program for labeling connected equal-intensity regions of a digital image to demonstrate an assertional programming logic which relies upon proof of programwide properties, e.g. global invariants and progress properties.
Abstract: A proof system for a shared dataspace programming notation called Swarm (a programming logic similar in style to that of UNITY) is specified. Relevant aspects of the Swarm language and model are overviewed. To illustrate the proof system, the Swarm logic is used to verify the correctness of a program for labeling connected equal-intensity regions of a digital image. Like UNITY, the Swarm proof system uses an assertional programming logic which relies upon proof of programwide properties, e.g. global invariants and progress properties. The Swarm logic is defined in terms of the same logical relations as UNITY (unless, ensures, and leads-to), but several of the concepts are reformulated to accommodate Swarm's distinctive features. >

Journal ArticleDOI
TL;DR: Several issues concerning the design of an I/O system for a multiprocessor such as a hypercube, including the problem of mapping specific data structures such as matrices onto the disks, are examined.
Abstract: Several issues concerning the design of an I/O (input/output) system for a multiprocessor such as a hypercube are examined. A methodology is proposed for connecting the I/O processors to such a system for efficient I/O access. The effect of I/O communication on the multiprocessor network is analyzed. Different disk organizations that can be employed within such a system are evaluated to see which organization has a better performance. It is observed that parallelism in serving an I/O request plays a dominant role in the scientific workload. The problem of mapping specific data structures such as matrices onto the disks so that the data can be accessed efficiently is considered. >
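
The matrix-mapping question can be illustrated with a standard skewed block layout (not necessarily the paper's scheme): block (i, j) goes to disk (i + j) mod D, so the blocks of any row and of any column land on distinct disks and can be read in parallel. D and the block grid are invented.

```python
# Skewed block-to-disk mapping: disk_of(i, j) = (i + j) mod D. Any single row
# or column of blocks then spans all D disks, giving full I/O parallelism.

D = 4                    # number of I/O processors / disks (illustrative)
blocks = 4               # matrix partitioned into a 4 x 4 grid of blocks

def disk_of(i, j):
    return (i + j) % D

row = [disk_of(2, j) for j in range(blocks)]
col = [disk_of(i, 2) for i in range(blocks)]
print("row 2 blocks on disks", row)   # [2, 3, 0, 1] -> all different
print("col 2 blocks on disks", col)   # [2, 3, 0, 1] -> all different
```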