
Showing papers on "Distributed memory published in 1990"


Proceedings ArticleDOI
01 May 1990
TL;DR: A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced and is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization.
Abstract: Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.
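The buffering that release consistency permits can be illustrated with a toy simulation (a sketch only, not the paper's formal model; all names here are invented): ordinary writes may linger in a per-processor buffer, and only a release operation forces them into shared memory.

```python
# Toy release-consistency sketch: ordinary writes are buffered per
# processor and become globally visible only at a release.

class Processor:
    def __init__(self, memory):
        self.memory = memory       # shared backing store (a dict)
        self.buffer = {}           # pending ordinary writes

    def write(self, addr, value):
        self.buffer[addr] = value  # ordinary write: may be delayed

    def read(self, addr):
        # a processor always sees its own buffered writes first
        return self.buffer.get(addr, self.memory.get(addr))

    def release(self):
        # release: all earlier writes must complete before it does
        self.memory.update(self.buffer)
        self.buffer.clear()

memory = {}
p1, p2 = Processor(memory), Processor(memory)

p1.write("data", 42)
assert p2.read("data") is None   # still buffered on p1
p1.release()                     # flush before handing off the lock
assert p2.read("data") == 42     # now globally visible
```

For a program whose shared accesses are properly guarded by acquire/release pairs, this behaves indistinguishably from sequential consistency, which is the paper's equivalence result.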

1,169 citations


Proceedings ArticleDOI
01 May 1990
TL;DR: This work re-define weak ordering as a contract between software and hardware, where software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent to at least the software that obeys those constraints.
Abstract: A memory model for a shared memory, multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater performance potential. Weak ordering was first defined by Dubois, Scheurich and Briggs in terms of a set of rules for hardware that have to be made visible to software. The central hypothesis of this work is that programmers prefer to reason about sequentially consistent memory, rather than having to think about weaker memory, or even write buffers. Following this hypothesis, we re-define weak ordering as a contract between software and hardware. By this contract, software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent to at least the software that obeys those constraints. We illustrate the power of the new definition with a set of software constraints that forbid data races and an implementation for cache-coherent systems that is not allowed by the old definition.

473 citations


Proceedings ArticleDOI
01 Feb 1990
TL;DR: This paper focuses on the design and use of Munin's memory coherence mechanisms, and compares the approach to previous work in this area.
Abstract: We are developing Munin, a system that allows programs written for shared memory multiprocessors to be executed efficiently on distributed memory machines. Munin attempts to overcome the architectural limitations of shared memory machines, while maintaining their advantages in terms of ease of programming. Our system is unique in its use of loosely coherent memory, based on the partial order specified by a shared memory parallel program, and in its use of type-specific memory coherence. Instead of a single memory coherence mechanism for all shared data objects, Munin employs several different mechanisms, each appropriate for a different class of shared data object. These type-specific mechanisms are part of a runtime system that accepts hints from the user or the compiler to determine the coherence mechanism to be used for each object. This paper focuses on the design and use of Munin's memory coherence mechanisms, and compares our approach to previous work in this area.
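Munin's type-specific coherence can be sketched as a dispatch table (names and protocol descriptions here are illustrative, not Munin's actual interface): each shared object carries a hint from the user or compiler, and the runtime selects a coherence mechanism from it.

```python
# Sketch of type-specific coherence dispatch in the spirit of Munin.
# The hint names and protocol summaries are hypothetical.

PROTOCOLS = {
    "read_mostly":       "replicate eagerly, invalidate rarely",
    "write_shared":      "buffer writes, merge diffs at synchronization",
    "producer_consumer": "propagate updates to consumers' copies",
    "conventional":      "write-invalidate (default)",
}

class SharedObject:
    def __init__(self, name, hint="conventional"):
        self.name = name
        # the hint comes from the user or the compiler, as in the paper
        self.protocol = PROTOCOLS.get(hint, PROTOCOLS["conventional"])

grid  = SharedObject("grid",  hint="write_shared")
flags = SharedObject("flags", hint="read_mostly")
assert "diffs" in grid.protocol
assert "replicate" in flags.protocol
```

The point of the design is that no single mechanism suits every sharing pattern; the table makes the per-class choice explicit.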

455 citations


Proceedings ArticleDOI
01 Aug 1990
TL;DR: A general formulation of atomic snapshot memory is introduced: a shared memory partitioned into words written (updated) by individual processes, or instantaneously read (scanned) in its entirety.
Abstract: This paper introduces a general formulation of atomic snapshot memory, a shared memory partitioned into words written (updated) by individual processes, or instantaneously read (scanned) in its entirety. The paper presents three wait-free implementations of atomic snapshot memory. A preliminary version of this paper appeared in Proceedings of the 9th Annual ACM Symposium on Principles of Distributed Computing (Quebec City, Quebec), ACM, New York, 1990, pp. 1-14. The full version appeared in Journal of the ACM, Vol. 40, No. 4, September 1993, pp. 873-890.
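The double-collect idea that underlies snapshot implementations can be sketched as follows (a simplified, lock-free rather than wait-free variant; the paper's wait-free constructions are considerably more involved):

```python
# Double-collect snapshot sketch: each register carries a sequence
# number; a scan that reads identical (seq, value) pairs twice in a
# row must be a consistent instantaneous view of the whole memory.

class Register:
    def __init__(self):
        self.seq, self.value = 0, None

    def update(self, value):
        self.seq += 1
        self.value = value

def collect(regs):
    return [(r.seq, r.value) for r in regs]

def scan(regs):
    while True:
        a = collect(regs)
        b = collect(regs)
        if a == b:                 # nothing changed between collects
            return [v for _, v in b]

regs = [Register() for _ in range(3)]
regs[0].update("x")
regs[2].update("y")
assert scan(regs) == ["x", None, "y"]
```

This version can starve a scanner under continuous updates; eliminating that possibility is exactly what makes the wait-free constructions interesting.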

358 citations


Journal ArticleDOI
TL;DR: It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications, and some limitations of distributed shared memory are noted.
Abstract: Four basic algorithms for implementing distributed shared memory are compared. Conceptually, these algorithms extend local virtual address spaces to span multiple hosts connected by a local area network, and some of them can easily be integrated with the hosts' virtual memory systems. The merits of distributed shared memory and the assumptions made with respect to the environment in which the shared memory algorithms are executed are described. The algorithms are then described, and a comparative analysis of their performance in relation to application-level access behavior is presented. It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications. Two particularly interesting extensions of the basic algorithms are described, and some limitations of distributed shared memory are noted.
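The simplest of the four algorithms, a central server that owns every shared page, can be sketched as follows (names are hypothetical; the other variants instead migrate or replicate pages among the hosts, trading message count for coherence complexity):

```python
# Central-server DSM sketch: every read and write crosses the
# "network" to a single server that owns all shared pages.

class CentralServer:
    def __init__(self):
        self.pages = {}

    def handle(self, request, page, value=None):
        if request == "read":
            return self.pages.get(page)
        self.pages[page] = value          # request == "write"

class Client:
    def __init__(self, server):
        self.server = server              # stands in for the LAN link

    def read(self, page):
        return self.server.handle("read", page)

    def write(self, page, value):
        self.server.handle("write", page, value)

server = CentralServer()
a, b = Client(server), Client(server)
a.write(0, "hello")
assert b.read(0) == "hello"               # trivially coherent
```

Coherence is trivial here because there is exactly one copy; the paper's analysis shows when the extra messages make migration or replication the better choice.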

250 citations


Proceedings ArticleDOI
Ragunathan Rajkumar1
01 Jan 1990
TL;DR: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed; the underlying priority considerations for a shared-memory synchronization protocol are studied, and priority assignments to be used by the protocol are derived.
Abstract: A priority-based synchronization protocol that explicitly uses shared-memory primitives is defined and analyzed. A solution that has been proposed for bounding and minimizing synchronization delays in real-time systems is briefly reviewed. The waiting times introduced by synchronization requirements in multiple-processor environments are identified, and a set of goals for priority-based multiprocessor synchronization protocols is derived. The underlying priority considerations for a shared-memory synchronization protocol are studied, and the priority assignments to be used by the protocol are derived.

247 citations


Journal ArticleDOI
TL;DR: This work examines the effectiveness of optimizations aimed at allowing distributed memory machines to efficiently compute inner loops over globally defined data structures, targeting loops in which some array references are made through a level of indirection.

194 citations


Proceedings ArticleDOI
01 Jan 1990
TL;DR: Slow memory is presented as a memory that allows the effects of writes to propagate slowly through the system, eliminating the need for costly consistency maintenance protocols that limit concurrency.
Abstract: The use of weakly consistent memories in distributed shared memory systems to combat unacceptable network delay and to allow such systems to scale is proposed. Proposed memory correctness conditions are surveyed, and how they are related by a weakness hierarchy is demonstrated. Multiversion and messaging interpretations of memory are introduced as means of systematically exploring the space of possible memories. Slow memory is presented as a memory that allows the effects of writes to propagate slowly through the system, eliminating the need for costly consistency maintenance protocols that limit concurrency. Slow memory possesses a valuable locality property and supports a reduction from traditional atomic memory. Thus slow memory is as expressive as atomic memory. This expressiveness is demonstrated by two exclusion algorithms and a solution to M.J. Fischer and A. Michael's (1982) dictionary problem on slow memory.
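Slow memory's guarantee, roughly that a writer's updates to a given location reach each reader in order but possibly after different delays, can be sketched with explicit per-reader channels (an illustrative model with invented names, not the paper's formal definition):

```python
# Slow-memory sketch for a single writer: each (reader, location)
# pair has a FIFO channel, so updates to one location arrive in
# order, but different readers may lag by different amounts.

from collections import defaultdict, deque

class SlowMemory:
    def __init__(self, n_readers):
        self.channels = defaultdict(deque)   # (reader, loc) -> pending
        self.views = [dict() for _ in range(n_readers)]
        self.n = n_readers

    def write(self, loc, value):
        for r in range(self.n):
            self.channels[(r, loc)].append(value)

    def propagate_one(self, reader, loc):
        q = self.channels[(reader, loc)]
        if q:
            self.views[reader][loc] = q.popleft()

    def read(self, reader, loc):
        return self.views[reader].get(loc)

m = SlowMemory(2)
m.write("x", 1)
m.write("x", 2)
m.propagate_one(0, "x")            # reader 0 has seen only the first write
assert m.read(0, "x") == 1
m.propagate_one(1, "x")
m.propagate_one(1, "x")            # reader 1 is fully caught up
assert m.read(1, "x") == 2         # order per location is preserved
```

Because no reader ever observes a writer's updates out of order, algorithms such as the paper's exclusion and dictionary constructions can still be built on top of it.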

183 citations


Proceedings ArticleDOI
01 Feb 1990
TL;DR: A new programming environment for distributed memory architectures is presented, providing a global name space and allowing direct access to remote parts of data values and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes is presented.
Abstract: Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece “owned” by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. This paper presents a new programming environment for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. We describe the analysis and program transformations required to implement this environment, and report the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes.
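The core translation such a global-name-space environment must perform can be sketched as follows (assuming a block distribution; the function names are invented): a reference to global element i resolves to an owning processor and a local index, and a non-local reference becomes a compiler-generated message.

```python
# Owner-computes address translation sketch for an n-element array
# block-distributed over p processors.

def owner_of(i, n, p):
    """Return (processor, local index) for global index i."""
    block = (n + p - 1) // p           # ceiling(n / p) elements each
    return i // block, i % block

assert owner_of(0, 100, 4) == (0, 0)
assert owner_of(99, 100, 4) == (3, 24)

def access(me, i, n, p, local_store, send):
    proc, off = owner_of(i, n, p)
    if proc == me:
        return local_store[off]        # direct local access
    return send(proc, off)             # remote access via a message

# Fake "machine": processor q holds elements q*25 .. q*25+24.
stores = {q: [q * 25 + j for j in range(25)] for q in range(4)}
send = lambda proc, off: stores[proc][off]
assert access(0, 60, 100, 4, stores[0], send) == 60   # remote fetch
assert access(2, 60, 100, 4, stores[2], send) == 60   # local fetch
```

Hiding this translation behind ordinary subscript syntax is precisely what distinguishes the environment from explicit message-passing code.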

174 citations


Proceedings ArticleDOI
01 Apr 1990
TL;DR: It is shown that for a wide range of plausible overhead values, dynamic scheduling is superior to static scheduling, and within the class of static schedulers, a simple “run to completion” scheme is preferable to a round-robin approach.
Abstract: Existing work indicates that the commonly used “single queue of runnable tasks” approach to scheduling shared memory multiprocessors can perform very poorly in a multiprogrammed parallel processing environment. A more promising approach is the class of “two-level schedulers” in which the operating system deals solely with allocating processors to jobs while the individual jobs themselves perform task dispatching on those processors. In this paper we compare two basic varieties of two-level schedulers. Those of the first type, static, make a single decision per job regarding the number of processors to allocate to it. Once the job has received its allocation, it is guaranteed to have exactly that number of processors available to it whenever it is active. The other class of two-level scheduler, dynamic, allows each job to acquire and release processors during its execution. By responding to the varying parallelism of the jobs, the dynamic scheduler promises higher processor utilizations at the cost of potentially greater scheduling overhead and more complicated application level task control policies. Our results, obtained via simulation, highlight the tradeoffs between the static and dynamic approaches. We investigate how the choice of policy is affected by the cost of switching a processor from one job to another. We show that for a wide range of plausible overhead values, dynamic scheduling is superior to static scheduling. Within the class of static schedulers, we show that, in most cases, a simple “run to completion” scheme is preferable to a round-robin approach. Finally, we investigate different techniques for tuning the allocation decisions required by the dynamic policies and quantify their effects on performance. We believe our results are directly applicable to many existing shared memory parallel computers, which for the most part currently employ a simple “single queue of tasks” extension of basic sequential machine schedulers.
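The static/dynamic trade-off can be made concrete with a back-of-the-envelope calculation (a hypothetical workload; this toy deliberately ignores the switching overhead that the paper's simulations quantify):

```python
# Toy utilization comparison: a job's parallelism varies per step.
# A static allocation holds its peak demand for the whole run;
# a dynamic one acquires and releases processors each step.

parallelism = [4, 1, 1, 4, 2]              # runnable tasks per step

static_alloc = max(parallelism)            # fixed for the job's lifetime
static_busy = sum(min(p, static_alloc) for p in parallelism)
static_held = static_alloc * len(parallelism)

dynamic_held = sum(parallelism)            # exactly what is needed
dynamic_busy = static_busy                 # same useful work done

assert static_busy == 12 and static_held == 20
assert static_busy / static_held == 0.6    # 40% held-but-idle
assert dynamic_busy / dynamic_held == 1.0  # no idle held processors
```

The paper's result is that even after charging realistic switching costs against the dynamic scheme, it usually still wins.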
We plan to validate our results in future work through implementation and experimentation on such a system.

169 citations


Patent
24 Oct 1990
TL;DR: In this paper, a method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system is presented.
Abstract: A method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system. Processors 20 and 120 are reset without causing cache memory controllers 24 and 124 to reset.

Journal ArticleDOI
TL;DR: An optical volume memory based on the two-photon effect which allows for high density and parallel access and has the advantages of having high capacity and throughput which may overcome the disadvantages of current memories.
Abstract: The advent of optoelectronic computers and highly parallel electronic processors has brought about a need for storage systems with enormous memory capacity and memory bandwidth. These demands cannot be met with current memory technologies (i.e., semiconductor, magnetic, or optical disk) without having the memory system completely dominate the processors in terms of the overall cost, power consumption, volume, and weight. As a solution, we propose an optical volume memory based on the two-photon effect which allows for high density and parallel access. In addition, the two-photon 3-D memory system has the advantages of having high capacity and throughput which may overcome the disadvantages of current memories.

Journal ArticleDOI
01 May 1990
TL;DR: Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writeable data is not cached.
Abstract: This paper presents an empirical evaluation of two memory-efficient directory methods for maintaining coherent caches in large shared memory multiprocessors. Both directory methods are modifications of a scheme proposed by Censier and Feautrier [5] that does not rely on a specific interconnection network and can be readily distributed across interleaved main memory. The schemes considered here overcome the large amount of memory required for tags in the original scheme in two different ways. In the first scheme each main memory block is sectored into sub-blocks for which the large tag overhead is shared. In the second scheme a limited number of large tags are stored in an associative cache and shared among a much larger number of main memory blocks. Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writeable data is not cached. The large block sizes required by the sectored scheme, however, promote enough false sharing that its performance is markedly worse than that of the tag cache.
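The second scheme, a limited pool of directory tags shared among many memory blocks, can be sketched as follows (details such as the eviction policy are invented for illustration): when the pool is full, a victim entry is reclaimed after invalidating its sharers' cached copies.

```python
# Tag-cache directory sketch: only `capacity` blocks can have a
# directory entry at once; tracking a new block may evict a victim.

class TagCacheDirectory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}                  # block -> set of sharer ids

    def record_sharer(self, block, cpu, invalidate):
        if block not in self.entries and len(self.entries) >= self.capacity:
            # reclaim the oldest entry (FIFO here, purely illustrative)
            victim, sharers = next(iter(self.entries.items()))
            invalidate(victim, sharers)    # its cached copies must go
            del self.entries[victim]
        self.entries.setdefault(block, set()).add(cpu)

invalidated = []
d = TagCacheDirectory(capacity=2)
inv = lambda blk, cpus: invalidated.append((blk, sorted(cpus)))
d.record_sharer("A", 0, inv)
d.record_sharer("B", 1, inv)
d.record_sharer("C", 2, inv)               # pool full: "A" is evicted
assert invalidated == [("A", [0])]
```

The scheme bets that the working set of actively shared blocks is small, so evictions, and the invalidations they force, stay rare.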

Proceedings ArticleDOI
01 Jun 1990
TL;DR: An algorithm is presented for finding the earliest point in a program at which a block of data can be prefetched, based on the control and data dependencies in the program; such a method is an integral part of more general memory management algorithms.
Abstract: Memory hierarchies are used by multiprocessor systems to reduce large memory access times. It is necessary to automatically manage such a hierarchy, to obtain effective memory utilization. In this paper, we discuss the various issues involved in obtaining an optimal memory management strategy for a memory hierarchy. We present an algorithm for finding the earliest point in a program that a block of data can be prefetched. This determination is based on the control and data dependencies in the program. Such a method is an integral part of more general memory management algorithms. We demonstrate our method's potential by using static analysis to estimate the performance improvement afforded by our prefetching strategy and to analyze the reference patterns in a set of Fortran benchmarks. We also study the effectiveness of prefetching in a realistic shared-memory system using an RTL-level simulator and real codes. This differs from previous studies by considering prefetching benefits in the presence of network contention.
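The dependence test behind such an algorithm can be sketched in a drastically simplified setting (straight-line code and data dependencies only; the paper also accounts for control dependencies): a prefetch of a[i] may be hoisted to just after the last statement that writes a variable the address computation uses.

```python
# Earliest-prefetch-point sketch over straight-line code.

def earliest_prefetch_point(stmts, addr_uses):
    """stmts: list of sets of variables each statement defines.
    addr_uses: variables the prefetched address depends on.
    Returns the statement index after which the prefetch may issue."""
    point = 0
    for k, defs in enumerate(stmts):
        if defs & addr_uses:
            point = k + 1          # must wait for this definition
    return point

#   s0: i = ...   s1: j = ...   s2: i = i + 1   s3: x = b[j]
stmts = [{"i"}, {"j"}, {"i"}, {"x"}]
assert earliest_prefetch_point(stmts, {"i"}) == 3   # a[i]: after s2
assert earliest_prefetch_point(stmts, {"j"}) == 2   # b[j]: after s1
```

The gap between that point and the actual use is the latency the prefetch can hide.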

Journal ArticleDOI
TL;DR: A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory.
Abstract: The problem of rollback recovery in distributed shared virtual environments, in which the shared memory is implemented in software in a loosely coupled distributed multicomputer system, is examined. A user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique are presented for implementing recoverable distributed shared virtual memory. The checkpointing scheme can be integrated with the memory coherence protocol for managing the shared virtual memory. The twin-page disk design allows checkpointing to proceed in an incremental fashion without an explicit undo at the time of recovery. The recoverable distributed shared virtual memory allows the system to restart computation from a checkpoint without a global restart.
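The twin-page idea can be sketched as follows (an illustrative reduction to a single page; the names are invented): each logical page has two disk slots plus a bit recording which slot belongs to the last checkpoint, so committing is a bit flip and recovery needs no undo log.

```python
# Twin-page sketch: writes always go to the non-committed twin, so
# the committed twin is an intact checkpoint at every instant.

class TwinPage:
    def __init__(self):
        self.slots = [None, None]
        self.committed = 0                   # slot owned by checkpoint

    def write(self, value):
        self.slots[1 - self.committed] = value   # checkpoint untouched

    def checkpoint(self):
        self.committed = 1 - self.committed      # atomic flip, no copy

    def recover(self):
        return self.slots[self.committed]

p = TwinPage()
p.write("v1")
p.checkpoint()
p.write("v2")                    # then a crash before the next checkpoint
assert p.recover() == "v1"       # roll back with no explicit undo
```

A full design would also carry the committed contents forward into the working twin so that checkpoints stay incremental, which is what the paper's disk layout arranges.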

Journal ArticleDOI
TL;DR: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied.
Abstract: A discussion is presented of two ways of mapping the cells in a two-dimensional area of a chip onto processors in an n-dimensional hypercube such that both small and large cell moves can be applied. Two types of move are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support such a parallel cost evaluation. A novel tree broadcasting strategy is presented for the hypercube that is used extensively in the algorithm for updating cell locations in the parallel environment. A dynamic parallel annealing schedule is proposed that estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control. The performance on an Intel iPSC-2/D4/MX hypercube is reported.

Patent
Alaiwan Haissam1
27 Feb 1990
TL;DR: In this paper, the authors propose a message passing mechanism for a plurality of processors interconnected by a shared intelligent memory for secure passing of messages between tasks operated on said processors, where each processor includes serving means for getting the messages to the task operated by each processor.
Abstract: In the environment of a plurality of processors interconnected by a shared intelligent memory, a mechanism for the secure passing of messages between tasks operated on said processors is provided. Inter-task message passing is provided by shared intelligent memory for storing the messages transmitted by sending tasks. Further, each processor includes serving means for getting the messages to be sent to the task operated by said each processor. The passing of messages from a processor to the shared intelligent memory and from the latter to another processor is made, using a set of high-level microcoded commands. A process is provided using the message passing mechanism together with redundancies built into the shared memory, to ensure fault-tolerant message passing in which the tasks operated primarily on a processor are automatically replaced by back-up tasks executed on another processor if the first processor fails.

Journal ArticleDOI
01 May 1990
TL;DR: The architecture has been simulated in detail and the paper presents some of the key measurements that have been used to substantiate the architectural decisions.
Abstract: PLUS is a multiprocessor architecture tailored to the fast execution of a single multithreaded process; its goal is to accelerate the execution of CPU-bound applications. PLUS supports shared memory and efficient synchronization. Memory access latency is reduced by non-demand replication of pages with hardware-supported coherence between replicated pages. The architecture has been simulated in detail and the paper presents some of the key measurements that have been used to substantiate our architectural decisions. The current implementation of PLUS is also described.

Patent
08 Feb 1990
TL;DR: In this paper, the active and backup processors are coupled asynchronously with some hardware assist functions comprising a memory change detector which captures memory changes in the memory of the active processor and a mirroring control circuit which causes the memory changes when committed by establish recovery point signals generated by the active processors.
Abstract: A checkpointing mechanism implemented in a data processing system comprising a dual processor configuration gives the system a fault tolerance capability while minimizing the complexity of both the software and the hardware. The active and backup processors are coupled asynchronously with some hardware assist functions, comprising a memory change detector which captures the memory changes in the memory of the active processor, and a mirroring control circuit which, once those changes are committed by establish-recovery-point signals generated by the active processor, causes them to be dumped into the memory of the backup processor, so that the backup processor can resume the operations of the active processor from the last established recovery point. The active and backup processors may each be connected to a dedicated memory and recovery point storing means, or to a memory including two dual sides shared by all the processors for storing data structures and recovery points.

01 Jan 1990
TL;DR: This paper examines the design of a highly efficient, reliable, machine-independent protocol used by the remote memory server to communicate with the client machines, and outlines the algorithms and data structures employed by the remote memory server to efficiently locate the data stored on the server.
Abstract: This paper describes a new model for constructing distributed systems called the Remote Memory Model. The remote memory model consists of several client machines, one or more dedicated machines called remote memory servers, and a communication channel interconnecting them. In the remote memory model, client machines share the memory resources located on the remote memory server. Client machines that exhaust their local memory move portions of their address space to the remote memory server and retrieve pieces as needed. Because the remote memory server uses a machine-independent protocol to communicate with client machines, the remote memory server can support multiple heterogeneous client machines simultaneously. This paper describes the remote memory model and discusses the advantages and issues of systems that use this model. It examines the design of a highly efficient, reliable, machine-independent protocol used by the remote memory server to communicate with the client machines. It also outlines the algorithms and data structures employed by the remote memory server to efficiently locate the data stored on the server. Finally, it presents measurements of a prototype implementation that clearly demonstrate the viability and competitive performance of the remote memory model.

Journal ArticleDOI
01 Jun 1990
TL;DR: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without having to take into account the low-level characteristics of the target distributed computer to program the algorithm.
Abstract: The goal of the Pandore system is to allow the execution of parallel algorithms on DMPCs (Distributed Memory Parallel Computers) without having to take into account the low-level characteristics of the target distributed computer to program the algorithm. No explicit process definition and interprocess communications are needed. Parallelization is achieved through logical data organization. The Pandore system provides the user with a means of specifying data partitioning and data distribution over a domain of virtual processors for each parallel step of his algorithm. At compile time, Pandore splits the original program into parallel processes. Each process will execute some appropriate parts of the original code, according to the given data decomposition. In order to achieve a correct utilization of the data structures distributed over the processors, the Pandore system provides an execution scheme based on a communication layer, which is an abstraction of a message-passing architecture. This intermediate level is then implemented using the effective primitives of the real architecture (in our specific case, an Intel iPSC/2).

Proceedings ArticleDOI
01 May 1990
TL;DR: A new model of asynchronous shared memory parallel computation is introduced, and it is shown that this model fulfils all the listed requirements and also analyzes in this model the complexity of several fundamental parallel algorithms.
Abstract: The contributions of this paper are twofold. First, we outline criteria by which any model of asynchronous shared memory parallel computation can be judged. Previous models are considered with respect to these factors. Next, we introduce a new model, and show that this model fulfils all the listed requirements. We also analyze in our model the complexity of several fundamental parallel algorithms.

01 Oct 1990
TL;DR: The paper presents a new programming environment, Kali, which provides a global name space and allows direct access to remote data values and a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing.
Abstract: Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language are described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.

Proceedings ArticleDOI
08 Apr 1990
TL;DR: ASPAR (Automatic and Symbolic PARallelization), which consists of a source-to-source parallelizer and a set of interactive graphic tools, is described; it is implemented for the language C and designed for easy modification for other languages such as Fortran.
Abstract: This paper describes ASPAR (Automatic and Symbolic PARallelization) which consists of a source-to-source parallelizer and a set of interactive graphic tools. While the issues of data dependency have already been explored and used in many parallel computer systems such as vector and shared memory machines, distributed memory parallel computers require, in addition, explicit data decomposition. New symbolic analysis and data-dependency analysis methods are used to determine an explicit data decomposition scheme. Automatic parallelization models using high level communications are also described in this paper. The target applications are of the “regular-mesh" type typical of many scientific calculations. The system has been implemented for the language C, and is designed for easy modification for other languages such as Fortran.

Patent
08 Nov 1990
TL;DR: In this article, a multiprocessor system linked by a fiber optic ring network uses some of the bandwidth of the ring network as a shared memory resource, which can carry message packets from one processor to another or network memory packets which circulate indefinitely on the network.
Abstract: A multiprocessor system linked by a fiber optic ring network uses some of the bandwidth of the ring network as a shared memory resource. Data slots are defined on the network which can carry message packets from one processor to another or network memory packets which circulate indefinitely on the network. One use of these network memory packets is as a lock management system for controlling concurrent access to a shared database by the multiple processors. The network memory packets are treated as lock entities. A processor indicates that it wants to procure a lock entity by circulating a packet, having a first network memory type, around the network. If no conflicting packets are detected when the circulated packet returns, the type of the slot is changed to a second network memory type indicating a procured lock entity.

Journal ArticleDOI
TL;DR: This paper uses caching and issues such as address space structure and page replacement schemes to define a taxonomy of DSM efforts and examines three DSM efforts in detail, namely: IVY, Clouds and MemNet.
Abstract: Two possible modes of Input/Output (I/O) are "sequential" and "random-access", and there is an extremely strong conceptual link between I/O and communication. Sequential communication, typified in the I/O setting by magnetic tape, is typified in the communication setting by a stream, e.g., a UNIX pipe. Random-access communication, typified in the I/O setting by a drum or disk device, is typified in the communication setting by shared memory. In this paper, we study and survey the extension of the random-access model to distributed computer systems. A Distributed Shared Memory (DSM) is a memory area shared by processes running on computers connected by a network. DSM provides direct system support of the shared memory programming model. When assisted by hardware, it can also provide a low-overhead interprocess communication (IPC) mechanism to software. Shared pages are migrated on demand between the hosts. Since computer network latency is typically much larger than that of a shared bus, caching in DSM is necessary for performance. We use caching and issues such as address space structure and page replacement schemes to define a taxonomy. Based on the taxonomy we examine three DSM efforts in detail, namely: IVY, Clouds and MemNet.

Proceedings Article
01 Sep 1990
TL;DR: This paper studies a number of hash-based join algorithms for general purpose multiprocessor computers with shared memory where the amount of memory allocated to the join operation is proportional to the number of processors assigned to the operation and a global hash table is built in this shared memory.
Abstract: This paper studies a number of hash-based join algorithms for general purpose multiprocessor computers with shared memory, where the amount of memory allocated to the join operation is proportional to the number of processors assigned to the operation and a global hash table is built in this shared memory. The concurrent update and access to this global hash table is studied. The elapsed time and total processing time for these algorithms are analyzed. The results indicate that hybrid hash join, which outperforms other hash-based algorithms in uniprocessor systems, does not always perform best. A simpler algorithm, hash-based nested-loops join, performs better in terms of elapsed time when both relations are of similar sizes.
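The hash-based nested-loops join the paper favors can be sketched sequentially (in the paper the hash table is global in shared memory and the build and probe loops are split among processors; this single-threaded version shows just the two phases):

```python
# Hash join sketch: build a hash table on one relation, probe it
# with the other. Tuples are (join_key, payload) pairs.

from collections import defaultdict

def hash_join(build_rel, probe_rel):
    table = defaultdict(list)            # stands in for the global table
    for key, payload in build_rel:       # build phase
        table[key].append(payload)
    return [(key, b, p)                  # probe phase
            for key, p in probe_rel
            for b in table[key]]

R = [(1, "r1"), (2, "r2")]
S = [(2, "s1"), (2, "s2"), (3, "s3")]
assert hash_join(R, S) == [(2, "r2", "s1"), (2, "r2", "s2")]
```

The paper's concern is what happens when many processors update and probe that table concurrently, which is where the elapsed-time comparison between the algorithms comes from.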

Journal ArticleDOI
TL;DR: A probabilistic protocol is presented that solves the Processor Identity Problem for asynchronous processors that communicate through a common shared memory, and simplifies shared memory processor design by eliminating the need to encode processor identifiers in system hardware or software structures.

Journal ArticleDOI
TL;DR: This work describes in detail the use of the one-sided Jacobi rotation as opposed to the rotation used in the “Hestenes” algorithm, a difference the authors perceive to have been widely misunderstood.

Patent
10 Apr 1990
TL;DR: In this paper, a linear block code error detection scheme is implemented with each shared memory, wherein the effect of random memory faults is sufficiently detected such that the inherent fault tolerance of a pair-spare architecture is not compromised.
Abstract: A highly reliable data processing system using the pair-spare architecture obviates the need for separate memory arrays for each processor. A single memory is shared between each pair of processors wherein a linear block code error detection scheme is implemented with each shared memory, wherein the effect of random memory faults is sufficiently detected such that the inherent fault tolerance of a pair-spare architecture is not compromised.