
Showing papers on "Memory management published in 1991"


Journal ArticleDOI
B. Nitzberg1, V. Lo1
TL;DR: An overview of distributed shared memory issues covers memory coherence, design choices, and implementation methods, and algorithms that support process synchronization and memory management are discussed.
Abstract: An overview of distributed shared memory (DSM) issues is presented. Memory coherence, design choices, and implementation methods are included. The discussion of design choices covers structure and granularity, coherence semantics, scalability, and heterogeneity. Implementation issues concern data location and access, the coherence protocol, replacement strategy, and thrashing. Algorithms that support process synchronization and memory management are discussed. >

524 citations


Patent
04 Sep 1991
TL;DR: In this paper, the authors propose having the operating system manage commit I/O so that an application can request commits frequently; frequent commits reduce the chance that data must be written to back-up memory or storage between commits and also reduce the chance of overloading the I/O subsystem.
Abstract: A computer system for processing and committing data comprises a processor, an external storage device such as DASD or tape coupled to the processor, and a working memory such as RAM. An application program updates data in the working memory and then requests that the data be committed, i.e. written to the external storage device. In response, an operating system function determines which data or blocks have been changed and supplies to an I/O service an identification of the changed data or blocks to cause the I/O service to write the changed data or blocks to the external storage device. Thus, the application program is not burdened with the management of the I/O. The operating system permits the program to continue with other processing while the data is being written from the working memory to the external storage device. As a result, the program need not wait while the data is written to the external storage. Also, because little time is required of the program in the commit process, the program can frequently request commits. With frequent commits, there is less chance that the data will have been written to back-up memory or back-up storage (due to an overload of the working memory) between commits, and as a result, time will not be required to read the data from the back-up memory or storage into the working memory en route to the application program's external storage. Also, the frequent commits reduce the chance of overloading the I/O subsystem.
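As a rough illustration of the commit flow the abstract describes (the application updates working memory, the operating system identifies the changed blocks and hands them to an I/O service, and the application continues while the write proceeds), here is a minimal C sketch; all structure names, sizes, and functions are assumptions for illustration, not details from the patent.

```c
/* Hedged sketch of the commit flow described above: the application updates
 * blocks in working memory and requests a commit; the OS finds the changed
 * blocks and passes them to an I/O service without blocking the caller.
 * All names and sizes are illustrative, not taken from the patent.        */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NBLOCKS 64

struct block { char data[4096]; bool changed; };
static struct block working_memory[NBLOCKS];

/* Stand-in for the I/O service: a real system would queue an asynchronous
 * write to DASD or tape and return immediately.                           */
static void io_write_async(int block_no, const void *data, size_t len)
{
    (void)data;
    printf("queued block %d (%zu bytes) for external storage\n", block_no, len);
}

/* Application side: update a block and mark it changed. */
void update_block(int block_no, const char *src, size_t len)
{
    for (size_t i = 0; i < len && i < sizeof working_memory[block_no].data; i++)
        working_memory[block_no].data[i] = src[i];
    working_memory[block_no].changed = true;
}

/* Commit request: only changed blocks are handed to the I/O service, and
 * the application is free to continue while the writes complete.          */
void commit(void)
{
    for (int i = 0; i < NBLOCKS; i++) {
        if (working_memory[i].changed) {
            io_write_async(i, working_memory[i].data,
                           sizeof working_memory[i].data);
            working_memory[i].changed = false;
        }
    }
}
```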

458 citations


Proceedings ArticleDOI
01 Apr 1991
TL;DR: This work surveys several user-level algorithms that make use of page-protection techniques, and analyzes their common characteristics in an attempt to answer the question, "What virtual-memory primitives should the operating system provide to user processes, and how well do today's operating systems provide them?"
Abstract: Memory Management Units (MMUs) are traditionally used by operating systems to implement disk-paged virtual memory. Some operating systems allow user programs to specify the protection level (inaccessible, read-only, read-write) of pages, and allow user programs to handle protection violations, but these mechanisms are not always robust, efficient, or well-matched to the needs of applications. We survey several user-level algorithms that make use of page-protection techniques, and analyze their common characteristics, in an attempt to answer the question, "What virtual-memory primitives should the operating system provide to user processes, and how well do today's operating systems provide them?"
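For readers unfamiliar with the mechanism, the sketch below shows the basic page-protection pattern such user-level algorithms rely on, written against modern POSIX interfaces (mmap, mprotect, SIGSEGV with SA_SIGINFO) rather than any particular system discussed in the paper; the handler here simply re-enables access and counts the fault.

```c
/* Minimal sketch of the user-level page-protection pattern surveyed in the
 * paper, using POSIX mmap/mprotect and a SIGSEGV handler.  Assumes a
 * modern POSIX system; calling mprotect from a handler is common practice
 * in these algorithms but not strictly guaranteed async-signal-safe.      */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static char  *page;                     /* the protected page              */
static size_t page_size;
static volatile sig_atomic_t faults;    /* protection violations observed  */

static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    /* A real DSM or collector would inspect info->si_addr to decide what
     * to do; here we just unprotect the single page and resume the store. */
    if ((char *)info->si_addr >= page && (char *)info->si_addr < page + page_size) {
        faults++;
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
    } else {
        _exit(1);                       /* a genuine crash, not our page   */
    }
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_sigaction = on_fault;
    sa.sa_flags     = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    page = mmap(NULL, page_size, PROT_NONE,          /* start inaccessible */
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return 1;

    page[0] = 42;                       /* faults once; handler re-enables access */
    printf("faults taken: %d, value: %d\n", (int)faults, page[0]);
    return 0;
}
```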

233 citations


Patent
Keizo Aoyagi1
25 Apr 1991
TL;DR: In this article, a central processing unit saves data from each storage area of the main memory into an auxiliary memory during a normal operation of a computer system, and sets a flag corresponding to the storage area, from which the data is saved, in a state indicating the end of a save operation.
Abstract: A main memory has a plurality of divided storage areas. A central processing unit saves data from each storage area of the main memory into an auxiliary memory during a normal operation of a computer system, and sets a flag corresponding to each storage area, from which the data is saved, in a state indicating the end of a save operation. In addition, when data stored in the main memory is updated, the central processing unit changes the flag into a state indicating an incomplete save state. When the computer system must be stopped, the central processing unit saves data, of the data stored in the main memory, only from a storage area for which the flag indicates an incomplete save state into the auxiliary memory, thereby shortening the time required for save processing.
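The flag-per-area bookkeeping described above can be pictured with a short C sketch; the area count, structure fields, and the save_area callback are purely illustrative assumptions, not details from the patent.

```c
/* Illustrative sketch of the per-area save flags described in the patent;
 * the area count, field names and the save_area() callback are assumptions. */
#include <stdbool.h>
#include <stddef.h>

#define AREAS 16

struct area {
    void  *base;
    size_t len;
    bool   saved;   /* true => copy in auxiliary memory is up to date */
};

static struct area areas[AREAS];

/* Background save during normal operation: mark each area as saved. */
void background_save(void (*save_area)(const struct area *))
{
    for (int i = 0; i < AREAS; i++) {
        save_area(&areas[i]);
        areas[i].saved = true;
    }
}

/* Any update to main memory puts the corresponding flag back into the
 * "incomplete save" state.                                              */
void on_update(int area_index)
{
    areas[area_index].saved = false;
}

/* At shutdown, only areas whose flag indicates an incomplete save are
 * written, shortening the save time as the abstract describes.          */
void save_on_stop(void (*save_area)(const struct area *))
{
    for (int i = 0; i < AREAS; i++) {
        if (!areas[i].saved) {
            save_area(&areas[i]);
            areas[i].saved = true;
        }
    }
}
```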

180 citations


Patent
Kazuya Shinjo1, Eiji Ishibashi1
22 Mar 1991
TL;DR: In a computer system, when the system is first booted in a normal mode, the main memory data present in main memory immediately after the system boot is stored as backup data in a backup memory or the like, as mentioned in this paper.
Abstract: In a computer system, when the system is first booted in a normal mode, main memory data stored in a main memory immediately after the system is booted is stored as backup data in a backup memory or the like. A backup flag representing whether or not the backup data can be restored is set, and the system is rebooted. When the system is next booted in the normal mode, the backup data stored in the backup memory or the like is restored as the main memory data in the main memory. The backup flag is automatically reset in a maintenance mode.

164 citations


Journal ArticleDOI
TL;DR: The results show that there are memory management policies implemented in the system that can improve the performance of programs written using the simpler uniform memory access (UMA) programming model, and there appears to be no single policy that can be considered the best over a set of test applications.
Abstract: Non-uniformity of memory access is an almost inevitable feature of memory architecture in shared memory multiprocessor designs that can scale to large numbers of processors. One implication of NUMA architectures is that the placement and movement of code and data become crucial to performance. As memory architectures become more complex and the nonuniformity becomes less well hidden, systems software must assume a larger role in providing memory management support for the programmer. This paper investigates the role of the operating system. We take an experimental approach to evaluating a wide range of memory management policies. The target NUMA environment is BBN's GP-1000 multiprocessor. Extensive local modifications have been made to the memory management subsystem of BBN's nX operating system to support multiple policy implementations. Policy comparisons are based on the measured performance of real parallel applications. Our results show that there are memory management policies implemented in our system that can improve the performance of programs written using the simpler uniform memory access (UMA) programming model. While achieving the level of performance of a highly tuned NUMA program is still a difficult problem, some examples come close. There appears to be no single policy that can be considered the best over our set of test applications. Investigations into the contributions made by individual policy features toward overall behavior of the workload provide some insight into the design of a set of effective policies.

117 citations


Journal ArticleDOI
TL;DR: The architecture of the Hector multiprocessor, which exploits current microprocessor technology to produce a machine with a good cost/performance tradeoff, is described, and its interconnection backplane is a key design feature that can accommodate future technology.
Abstract: The architecture of the Hector multiprocessor, which exploits current microprocessor technology to produce a machine with a good cost/performance tradeoff, is described. A key design feature of Hector is its interconnection backplane, which can accommodate future technology because it uses simple hardware with short critical paths in logic circuits and short lines in the interconnection network. The system is reliable and flexible and can be realized at a relatively low cost. The hierarchical structure results in a fast backplane and a bandwidth that increases linearly with the number of processors. Hector scales efficiently to larger sizes and faster processors. >

116 citations


Journal ArticleDOI
TL;DR: Three examples of star-coupled structures are introduced, one of which exhibits optical self-routing, and the complexity of the communication subsystem is reduced since intermediate buffering and routing of packets are eliminated.
Abstract: A multiple-instruction multiple-data (MIMD) distributed memory parallel computer system environment is considered. Media access control protocols that maintain good performance with high capacity optical channels are investigated. Three examples of star-coupled structures are introduced, one of which exhibits optical self-routing. Self-routing single-step optically interconnected communication structures can be designed through the incorporation of agile laser diode sources and wavelength tunable optical filters in a wavelength-division multiple-access environment. Intermediary latencies typical of MIMD distributed memory systems are eliminated. The degree and diameter of the resulting structures are dramatically reduced, and the complexity of the communication subsystem is reduced since intermediate buffering and routing of packets are eliminated. >

103 citations


Proceedings ArticleDOI
20 May 1991
TL;DR: A simple owner protocol for implementing a causal distributed shared memory (DSM) is presented, and it is argued that this implementation is more efficient than comparable coherent DSM implementations.
Abstract: A simple owner protocol for implementing a causal distributed shared memory (DSM) is presented, and it is argued that this implementation is more efficient than comparable coherent DSM implementations. Moreover, it is shown that writing programs for causal memory is no more difficult than writing programs for atomic shared memory. >

100 citations


Journal ArticleDOI
TL;DR: Paradigm (parallel distributed global memory), a shared-memory multicomputer architecture that is being developed to show that one can build a large-scale machine using high-performance microprocessors, is discussed and some results to date are summarized.
Abstract: Paradigm (parallel distributed global memory), a shared-memory multicomputer architecture that is being developed to show that one can build a large-scale machine using high-performance microprocessors, is discussed. The Paradigm architecture allows a parallel application program to execute any of its tasks on any processor in the machine, with all the tasks in a single address space. The focus is on novel design techniques that support scalability. The key performance issues are identified, and some results to date from this work and experience with the VMP architecture design on which it is based are summarized. >

90 citations


Journal ArticleDOI
TL;DR: This work proposes a set of primitives for managing distributed shared memory and presents an implementation of these primitives in the context of an object‐based operating system as well as on top of Unix.
Abstract: Shared memory is a simple yet powerful paradigm for structuring systems. Recently, there has been an interest in extending this paradigm to non-shared memory architectures as well. For example, the virtual address spaces for all objects in a distributed object-based system could be viewed as constituting a global distributed shared memory. We propose a set of primitives for managing distributed shared memory. We present an implementation of these primitives in the context of an object-based operating system as well as on top of Unix.

Journal ArticleDOI
TL;DR: A discussion is presented of the use of dynamic storage schemes to improve parallel memory performance during three important classes of data accesses: vector accesses in which multiple strides are used to access a single vector, block accesses, and constant-geometry FFT accesses.
Abstract: A discussion is presented of the use of dynamic storage schemes to improve parallel memory performance during three important classes of data accesses: vector accesses in which multiple strides are used to access a single vector, block accesses, and constant-geometry FFT accesses. The schemes investigated are based on linear address transformations, also known as XOR schemes. It has been shown that this class of schemes can be implemented more efficiently in hardware and has more flexibility than schemes based on row rotations or other techniques. Several analytical results are shown. These include: quantitative analysis of buffering effects in pipelined memory systems; design rules for storage schemes that provide conflict-free access using multiple strides, blocks, and FFT access patterns; and an analysis of the effects of memory bank cycle time on storage scheme capabilities. >
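As a toy illustration of the XOR (linear address transformation) class of schemes the abstract refers to, the C sketch below derives a bank number by XOR-ing two address bit fields; the particular field choice is a generic example, not one of the conflict-free designs developed in the paper.

```c
/* Toy XOR (linear) storage scheme: the bank number is the XOR of the low
 * log2(NBANKS) address bits with the next-higher bit field.  This shows
 * the class of schemes the paper studies, not its specific designs.      */
#include <stdio.h>

#define NBANKS     8            /* power of two                           */
#define BANK_BITS  3

static unsigned bank_of(unsigned addr)
{
    unsigned low  =  addr               & (NBANKS - 1);
    unsigned high = (addr >> BANK_BITS) & (NBANKS - 1);
    return low ^ high;          /* linear transformation over GF(2)       */
}

int main(void)
{
    /* A stride-8 vector access: with simple interleaving (addr % NBANKS)
     * every element would hit bank 0; the XOR scheme spreads them out.   */
    for (unsigned i = 0; i < 8; i++) {
        unsigned addr = i * 8;
        printf("element %u -> addr %2u -> bank %u\n", i, addr, bank_of(addr));
    }
    return 0;
}
```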

Journal ArticleDOI
TL;DR: The goal is to design an efficient memory management protocol which guarantees that the memory consumption of parallel simulation is of the same order as that of sequential simulation; an optimal algorithm called artificial rollback is proposed.
Abstract: Recently there has been a great deal of interest in performance evaluation of parallel simulation. Most work is devoted to the time complexity and assumes that the amount of memory available for parallel simulation is unlimited. This paper studies the space complexity of parallel simulation. Our goal is to design an efficient memory management protocol which guarantees that the memory consumption of parallel simulation is of the same order as sequential simulation. (Such an algorithm is referred to as optimal.) First, we derive the relationships among the space complexities of sequential simulation, Chandy-Misra simulation [2], and Time Warp simulation [7]. We show that Chandy-Misra may consume more storage than sequential simulation, or vice versa. Then we show that Time Warp never consumes less memory than sequential simulation. Then we describe cancelback, an optimal Time Warp memory management protocol proposed by Jefferson. Although cancelback is considered to be a complete solution for the storage management problem in Time Warp, some efficiency issues in implementing this algorithm must be considered. We propose an optimal algorithm called artificial rollback. We show that this algorithm is easy to implement and analyze. An implementation of artificial rollback is given, which is integrated with processor scheduling to adjust the memory consumption rate based on the amount of free storage available in the system.

Patent
20 Mar 1991
TL;DR: In this paper, a method of managing the memory of a CM multiprocessor computer system is described, where the data and stack pages of a process are transferred to the coupled memory region of the CPU module to which the process is assigned, when the pages are called for by the process.
Abstract: A method of managing the memory of a CM multiprocessor computer system is disclosed. A CM multiprocessor computer system includes: a plurality of CPU modules 11a . . . 11n to which processes are assigned; one or more optional global memories 13a . . . 13n; a storage medium 15a, 15b . . . 15n; and a global interconnect 12. Each of the CPU modules 11a . . . 11n includes a processor 21 and a coupled memory 23 accessible by the local processor without using the global interconnect 12. Processors have access to remote coupled memory regions via the global interconnect 12. Memory is managed by transferring, from said storage medium, the data and stack pages of a process to be run to the coupled memory region of the CPU module to which the process is assigned, when the pages are called for by the process. Other pages are transferred to global memory, if available. At prescribed intervals, the free memory of each coupled memory region and global memory is evaluated to determine if it is below a threshold. If below the threshold, a predetermined number of pages of the memory region are scanned. Infrequently used pages are placed on the end of a list of pages that can be replaced with pages stored in the storage medium. Pages associated with processes that are terminating are placed at the head of the list of replacement pages.
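A hedged, C-style sketch of the periodic scan the abstract describes follows; the region and page structures, the threshold, and the list operations are all illustrative assumptions rather than details from the patent.

```c
/* Sketch of the periodic free-memory check described above: if a region's
 * free pages fall below a threshold, a fixed number of pages are scanned;
 * infrequently used pages go to the tail of the replacement list, pages of
 * terminating processes go to the head.  All names are illustrative.      */
#include <stdbool.h>
#include <stddef.h>

struct page   { bool referenced; bool terminating; struct page *next; };
struct region { size_t free_pages, threshold, scan_count;
                struct page *pages; size_t npages; };

/* Replacement list: head = replace first, tail = replace last. */
static struct page *repl_head, *repl_tail;

static void push_tail(struct page *p)
{
    p->next = NULL;
    if (repl_tail) repl_tail->next = p; else repl_head = p;
    repl_tail = p;
}

static void push_head(struct page *p)
{
    p->next = repl_head;
    repl_head = p;
    if (!repl_tail) repl_tail = p;
}

void periodic_scan(struct region *r)
{
    if (r->free_pages >= r->threshold)
        return;                          /* enough free memory            */
    size_t scanned = 0;
    for (size_t i = 0; i < r->npages && scanned < r->scan_count; i++, scanned++) {
        struct page *p = &r->pages[i];
        if (p->terminating)
            push_head(p);                /* reclaim these first           */
        else if (!p->referenced)
            push_tail(p);                /* infrequently used             */
        else
            p->referenced = false;       /* give it another interval      */
    }
}
```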

Patent
27 Dec 1991
TL;DR: In this paper, an apparatus for memory management in network systems provides added margins of reliability for the receipt of vital maintenance operations protocol (MOP) and station management packets (SMP), in addition, additional overflow allocations of buffers are assigned for receipt of critical system packets which otherwise would typically be discarded in the event of a highly congested system.
Abstract: An apparatus for memory management in network systems provides added margins of reliability for the receipt of vital maintenance operations protocol (MOP) and station management packets (SMP). In addition, additional overflow allocations of buffers are assigned for receipt of critical system packets which otherwise would typically be discarded in the event of a highly congested system. Thus, if a MOP or a SMP packet is received from the network when the allocated space for storing these types of packets in full, the packets are stored in the overflow allocations, and thus the critical packets are not lost.

01 Sep 1991
TL;DR: A memory management toolkit for language implementors that offers efficient and flexible generation scavenging garbage collection and includes auxiliary components that ease implementation of garbage collection for programming languages.
Abstract: We describe a memory management toolkit for language implementors. It offers efficient and flexible generation scavenging garbage collection. In addition to providing a core of language-independent algorithms and data structures, the toolkit includes auxiliary components that ease implementation of garbage collection for programming languages. We have detailed designs for Smalltalk and Modula-3 and are confident the toolkit can be used with a wide variety of languages. The toolkit approach is itself novel, and our design includes a number of additional innovations in flexibility, efficiency, accuracy, and cooperation between the compiler and the collector.

Journal ArticleDOI
TL;DR: An object-oriented database approach for a direct access and memory management system which covers the needs of storing compound (multimedia) documents in a multi-user and distributed environment.
Abstract: This paper describes an object-oriented database approach for a direct access and memory management system which covers the needs of storing compound (multimedia) documents in a multi-user and distributed environment. Document components may be distributed over several physical locations, documents may share components, and multi-user access is supported. In addition, the model allows much of the document semantics to be represented, giving the opportunity to define and access components favourably through their properties. Rapid access and manipulation (using, for example, associative methods) as well as encapsulation of data to minimize side effects are also considered.

Patent
10 Jun 1991
TL;DR: In this paper, the authors propose a memory system with security against unauthorized access of the contents of the memory system, consisting of a first alterable memory, a second non-alterable memory and a data bus.
Abstract: A memory system, having security against unauthorized accessing of the contents of the memory system, comprises a first alterable memory (6), a second non-alterable memory (14, 16) and a data bus (5) for allowing external access to data stored in the memory system during a test mode of operation. The first alterable memory (6) comprises an options register (10) having a security bit (SEC) which, when programmed to an active state, prevents external access to the data stored in the first alterable memory during the test mode. The first alterable memory (6) further comprises a first data memory (8) having at least one security byte (VALSEC) which, when programmed to a predetermined state, prevents external access to the data stored in both the first alterable memory (6) and the second non-alterable memory (14, 16) during the test mode.

Patent
20 Mar 1991
TL;DR: In this paper, an adaptive memory management method for coupled memory multiprocessor computer systems is described, where the most referenced data and stack pages are placed in the coupled memory of the processor to which a specific process is assigned and lesser referenced pages are located in global memory or coupled memory regions of other processors.
Abstract: An adaptive memory management method for coupled memory multiprocessor computer systems is disclosed. In a coupled memory multiprocessor system all the data and stack pages of processes assigned to individual multiprocessors are, preferably, located in a memory region coupled to the assigned processor. When this becomes impossible, some data and stack pages are assigned to global memory or memory regions coupled to other processors. The present invention is a method of making certain that the most referenced data and stack pages are located in the coupled memory of the processor to which a specific process is assigned and lesser referenced pages are located in global memory or the coupled memory region of other processors. This result is accomplished by sampling the memory references made by the processors of the computer system and causing the most recently referenced pages in each coupled memory region to be maintained at the head of an active page list. References to remote data and stack pages are stored in a remote page hash table. Remote pages are pages stored in global memory or in coupled memory other than the coupled memory of the processor to which the process owning the pages is assigned. Any remote data and stack pages referenced more frequently than pages stored in a processor's coupled memory region are transferred to the processor's coupled memory region. If a processor's coupled memory region is tight, pages are transferred from the processor's coupled memory region to global memory or to the coupled memory region of another processor.

Patent
08 Apr 1991
TL;DR: In this paper, a variable-length directory containing descriptors of disk records is used to locate a selected record in non-volatile memory and a table, ordered sequentially by record number, is used in order to quickly locate a record.
Abstract: A method is described for managing data records stored in non-volatile memory in a disk drive system with cache memory. A variable-length directory containing descriptors of disk records is used to locate a selected record in non-volatile memory. A table, ordered sequentially by record number, is used to quickly locate a record in non-volatile memory without having to perform a time-consuming search. In order to efficiently utilize space in non-volatile memory, a list is kept of free space for storing record descriptors. After an initial nominal allocation, additional free space is allocated only when required, thus further increasing the efficiency of use of non-volatile memory.
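The record-number-ordered lookup table the abstract mentions can be illustrated with a short C sketch; the descriptor fields and the fixed example table are assumptions made for demonstration only.

```c
/* Sketch of the lookup structure described above: a table of record
 * descriptors ordered by record number, searched with binary search so a
 * record in non-volatile memory can be located without a linear scan.
 * Field names and the example table contents are assumptions.            */
#include <stdio.h>
#include <stdlib.h>

struct descriptor {
    unsigned record_no;     /* key: record number                          */
    unsigned nvm_offset;    /* where the record lives in non-volatile memory */
    unsigned length;
};

static int cmp(const void *a, const void *b)
{
    const struct descriptor *x = a, *y = b;
    return (x->record_no > y->record_no) - (x->record_no < y->record_no);
}

/* Table kept sorted by record number, as in the patent. */
static struct descriptor table[] = {
    { 3, 0, 512 }, { 7, 512, 256 }, { 12, 768, 1024 },
};

struct descriptor *find_record(unsigned record_no)
{
    struct descriptor key = { record_no, 0, 0 };
    return bsearch(&key, table, sizeof table / sizeof table[0],
                   sizeof table[0], cmp);
}

int main(void)
{
    struct descriptor *d = find_record(7);
    if (d) printf("record %u at offset %u, length %u\n",
                  d->record_no, d->nvm_offset, d->length);
    return 0;
}
```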

Journal ArticleDOI
TL;DR: Recently implemented parallel system address-tracing methods based on several metrics are surveyed and the issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted.
Abstract: Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately. >

Proceedings ArticleDOI
01 Aug 1991
TL;DR: MTOOL as mentioned in this paper augments a program with low-overhead instrumentation which perturbs the program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks.
Abstract: This paper describes MTOOL, a software tool for analyzing performance losses in shared memory parallel programs. MTOOL augments a program with low-overhead instrumentation which perturbs the program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks. After running the instrumented version of the parallel program, the programmer can use MTOOL's window-based user interface to view memory, synchronization, and compute time bottlenecks at increasing levels of detail, from a whole program level down to the level of individual procedures, loops and synchronization objects. An initial implementation of MTOOL runs on Silicon Graphics multiprocessors and is in use by several groups at Stanford.

Patent
09 Sep 1991
TL;DR: In this article, a plurality of processing units, each having a local memory connected thereto, are disclosed, and a write sense controller is also connected to each of the processing units to transmit a memory write word into a shared portion of local memory over a reflective memory line.
Abstract: A plurality of processing units, each having a local memory connected thereto, is disclosed. A write sense controller is also connected to each of the processing units to transmit a memory write word into a shared portion of local memory over a reflective memory line. Other write sense controllers receive the memory words from the reflective memory bus and cause them to be written into corresponding storage locations in the shared partitions of their local memories.

Journal ArticleDOI
TL;DR: An optimal algorithm is described for performing the communication defined by exchanging the bits of the node address with those of the local address, a pattern that arises in both matrix transposition and bit reversal for the fast Fourier transform.
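To make the address permutation concrete, here is a small C sketch (with illustrative, assumed field widths) that swaps the node-address field and local-address field of a global element index, the pattern this entry refers to.

```c
/* Sketch of the address permutation mentioned above: a global element index
 * is split into (node, local) fields and the two fields are exchanged, so
 * element (n, l) must be sent to node l, local address n.  The equal bit
 * widths below are assumptions for illustration.                           */
#include <stdio.h>

#define FIELD_BITS 4                     /* 16 nodes, 16 words per node    */
#define FIELD_MASK ((1u << FIELD_BITS) - 1)

static unsigned exchange(unsigned global)
{
    unsigned node  = (global >> FIELD_BITS) & FIELD_MASK;
    unsigned local =  global               & FIELD_MASK;
    return (local << FIELD_BITS) | node;  /* swapped fields                */
}

int main(void)
{
    unsigned g = (3u << FIELD_BITS) | 5u;        /* node 3, local 5        */
    unsigned d = exchange(g);
    printf("element at node 3, local 5 moves to node %u, local %u\n",
           d >> FIELD_BITS, d & FIELD_MASK);
    return 0;
}
```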

Journal ArticleDOI
02 Dec 1991
TL;DR: The authors introduce a formalism which allows computer architecture to be treated as a formal optimization problem and apply it to the design of shared memory parallel machines, resulting in a surprisingly good price/performance ratio even when compared with distributed memory machines.
Abstract: The authors introduce a formalism which allows computer architecture to be treated as a formal optimization problem. They apply this to the design of shared memory parallel machines. Present computers of this type support the programming model of a shared memory, but simultaneous access to the shared memory by several processors is in many situations processed sequentially. Asymptotically good solutions for this problem are offered by theoretical computer science. The authors modify these constructions from an engineering standpoint and improve the price/performance ratio by roughly a factor of 6. The resulting machine has a surprisingly good price/performance ratio even when compared with distributed memory machines. For almost all access patterns of all processors into the shared memory, access is as fast as the access of only a single processor. The re-engineered machine is based on the Fluent Machine. >

Patent
09 Jan 1991
TL;DR: In this article, a data processing system handles memory management exceptions caused by a faulting vector instruction in a vector processor by halting the execution of the vector instruction being executed when the exception occurred.
Abstract: A data processing system handles memory management exceptions caused by a faulting vector instruction in a vector processor by halting the execution of the faulting vector instruction being executed when the exception occurred and by setting the state information for the vector processor to acknowledge the presence of the exception and to include information about the suboperation of the vector instruction being executed when the exception occurred. The scalar processor is not interrupted at this time, however. Any other vector instructions executing simultaneously with the faulting vector instruction are allowed to continue so long as those instructions do not require data from the faulting instruction. The faulting, partially completed vector instruction resumes execution after the operating system has processed the memory management exception.

Patent
19 Mar 1991
TL;DR: In this paper, an inhibiting device is provided for the switching from the second mode to the first mode, thereby ensuring that the contents of the internal ROM cannot be read-out.
Abstract: A single-chip microcomputer connectable to an external memory for expanding the address space, and having a first mode of operation in which the available memory region is both the region of an internal ROM and the external memory, and having a second mode of operation in which the available memory region is the region of the external memory only. An inhibiting device is provided for inhibiting the switching from the second mode to the first mode, thereby ensuring that the contents of the internal ROM cannot be read-out.

Patent
03 Apr 1991
TL;DR: In this paper, a digital computer system is described which is capable of processing 2 or more computer instructions in parallel and which has the capability of generating compounding tag information for those instructions, the tag information being associated with instructions for the purpose of indicating groups of instructions which are to be concurrently executed.
Abstract: A digital computer system is described which is capable of processing 2 or more computer instructions in parallel and which has the capability of generating compounding tag information for those instructions, the compounding tag information being associated with instructions for the purpose of indicating groups of instructions which are to be concurrently executed. A compounding tag has a value which indicates the size of the group of instructions which are to be concurrently executed. The computer system includes a hierarchically arranged memory which provides instructions to a CPU for execution. The instructions are compounded in the memory, and provision is made in the memory for storage of their compounding tags. In the event of modification of an instruction in memory, the invention provides for reduction of the value of the compounding tags for the modified instruction and instructions which are capable of being compounded with the modified instruction, or for generation of new tag values for the modified instruction and instructions which are adjacent to it in memory.

Patent
Kogure Masayuki1
13 May 1991
TL;DR: A memory allocation system stores, in the control information of the file that holds an executable program, the memory volume required to initialize the program, obtained when the program is translated, assembled or compiled; this information is read when the program is loaded so that memory can be allocated statically, with dynamic allocation used when that is insufficient.
Abstract: A memory allocation system is made up of: a unit for storing, in the control information of the file that stores an executable program, information about the memory volume required at the time of initializing the program, obtained when the program is translated, assembled or compiled; a unit for reading the information about the required memory volume from the control information of the file when the executable program is loaded; a unit for statically allocating a memory commensurate with the read memory volume; and a unit for dynamically allocating a memory when the memory availability is deficient for executing the program.
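A minimal C sketch of this load-time policy is given below; the header field, pool layout, and function names are assumptions made for illustration, not details from the patent.

```c
/* Sketch of the allocation policy described above: the memory requirement
 * recorded in the executable's control information is allocated statically
 * at load time; if that proves insufficient during execution, memory is
 * allocated dynamically.  All names are assumptions; alignment is ignored
 * for brevity.                                                            */
#include <stdio.h>
#include <stdlib.h>

struct exe_header { size_t initial_memory; };   /* recorded at compile time */

static void  *static_pool;
static size_t pool_size, pool_used;

/* Load time: allocate the volume recorded in the control information. */
int load_program(const struct exe_header *hdr)
{
    static_pool = malloc(hdr->initial_memory);
    pool_size   = hdr->initial_memory;
    pool_used   = 0;
    return static_pool != NULL;
}

/* Run time: satisfy requests from the static pool, falling back to dynamic
 * allocation when the pre-allocated volume is exhausted.                  */
void *program_alloc(size_t n)
{
    if (pool_used + n <= pool_size) {
        void *p = (char *)static_pool + pool_used;
        pool_used += n;
        return p;
    }
    return malloc(n);   /* dynamic allocation when availability is deficient */
}

int main(void)
{
    struct exe_header hdr = { 4096 };
    if (!load_program(&hdr)) return 1;
    void *a = program_alloc(1024);      /* served from the static pool */
    void *b = program_alloc(8192);      /* falls back to malloc        */
    printf("static %p, dynamic %p\n", a, b);
    free(b);
    free(static_pool);
    return 0;
}
```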

Journal ArticleDOI
TL;DR: Under certain workload assumptions, results show that placement algorithms that are strongly biased toward local frame allocation but are able to borrow remote frames can reduce the number of page faults over strictly local allocation.