Showing papers on "Memory management published in 1987"

PDF

Open Access

Journal Article•DOI•

Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures

[...]

Richard F. Rashid¹, Avadis Tevanian¹, Michael Young¹, David B. Golub¹, Robert V. Baron¹, David L. Black¹, William J Bolosky¹, Jonathan Chew¹ - Show less +4 more•Institutions (1)

Carnegie Mellon University¹

01 Oct 1987

TL;DR: The design and implementation of virtual memory management within the CMU Mach Operating System and the experiences gained by the Mach kernel group in porting that system to a variety of architectures are described.

...read moreread less

Abstract: This paper describes the design and implementation of virtual memory management within the CMU Mach Operating System and the experiences gained by the Mach kernel group in porting that system to a variety of architectures. As of this writing, Mach runs on more than half a dozen uniprocessors and multiprocessors including the VAX family of uniprocessors and multiprocessors, the IBM RT PC, the SUN 3, the Encore MultiMax, the Sequent Balance 21000 and several experimental computers. Although these systems vary considerably in the kind of hardware support for memory management they provide, the machine-dependent portion of Mach virtual memory consists of a single code module and its related header file. This separation of software memory management from hardware support has been accomplished without sacrificing system performance. In addition to improving portability, it makes possible a relatively unbiased examination of the pros and cons of various hardware memory management schemes, especially as they apply to the support of multiprocessors.

...read moreread less

356 citations

Journal Article•DOI•

The duality of memory and communication in the implementation of a multiprocessor operating system

[...]

Michael Young¹, Avadis Tevanian¹, Richard F. Rashid¹, David B. Golub¹, Jeffrey L. Eppinger¹ - Show less +1 more•Institutions (1)

Carnegie Mellon University¹

01 Nov 1987

TL;DR: The relationship between memory and communication in Mach is examined as it relates to overall performance, applicability of Mach to new multiprocessor architectures, and the structure of application programs.

...read moreread less

Abstract: Mach is a multiprocessor operating system being implemented at Carnegie-Mellon University. An important component of the Mach design is the use of memory objects which can be managed either by the kernel or by user programs through a message interface. This feature allows applications such as transaction management systems to participate in decisions regarding secondary storage management and page replacement.This paper explores the goals, design and implementation of Mach and its external memory management facility. The relationship between memory and communication in Mach is examined as it relates to overall performance, applicability of Mach to new multiprocessor architectures, and the structure of application programs.

...read moreread less

274 citations

Proceedings Article•DOI•

A model for hierarchical memory

[...]

Alok Aggarwal¹, Bowen Alpern¹, Ashok K. Chandra¹, Marc Snir¹•Institutions (1)

IBM¹

01 Jan 1987

TL;DR: An algorithm that uses LRU policy at the successive “levels” of the memory hierarchy is shown to be optimal for arbitrary memory access time.

...read moreread less

Abstract: In this paper we introduce the Hierarchical Memory Model (HMM) of computation. It is intended to model computers with multiple levels in the memory hierarchy. Access to memory location x is assumed to take time ⌈ log x ⌉. Tight lower and upper bounds are given in this model for the time complexity of searching, sorting, matrix multiplication and FFT. Efficient algorithms in this model utilize locality of reference by bringing data into fast memory and using them several times before returning them to slower memory. It is shown that the circuit simulation problem has inherently poor locality of reference. The results are extended to HMM's where memory access time is given by an arbitrary (nondecreasing) function. Tight upper and lower bounds are obtained for HMM's with polynomial memory access time; the algorithms for searching, FFT and matrix multiplication are shown to be optimal for arbitrary memory access time. On-line memory management algorithms for the HMM model are also considered. An algorithm that uses LRU policy at the successive “levels” of the memory hierarchy is shown to be optimal.

...read moreread less

269 citations

Patent•

Multinode reconfigurable pipeline computer

[...]

Daniel M. Nosenchuck¹, Michael G. Littman¹•Institutions (1)

Princeton University¹

13 Nov 1987

TL;DR: In this article, a multinode parallel-processing computer is made up of a plurality of inner-connected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors and Special Purpose Processors.

...read moreread less

Abstract: A multinode parallel-processing computer is made up of a plurality of innerconnected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors, Special Purpose Processors, etc. The reconfigurable pipeline of each node is connected to a multiplane memory by a Memory-ALU switch NETwork (MASNET). The reconfigurable pipeline includes three (3) basic substructures formed from functional units which have been found to be sufficient to perform the bulk of all calculations. The MASNET controls the flow of signals from the memory planes to the reconfigurable pipeline and vice versa. the nodes are connectable together by an internode data router (hyperspace router) so as to form a hypercube configuration. The capability of the nodes to conditionally configure the pipeline at each tick of the clock, without requiring a pipeline flush, permits many powerful algorithms to be implemented directly.

...read moreread less

187 citations

Journal Article•DOI•

Line (Block) Size Choice for CPU Cache Memories

[...]

Alan Jay Smith¹•Institutions (1)

University of California, Berkeley¹

01 Sep 1987-IEEE Transactions on Computers

TL;DR: In this article, the authors examined the cache miss ratio as a function of line size, and found that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat.

...read moreread less

Abstract: The line (block) size of a cache memory is one of the parameters that most strongly affects cache performance. In this paper, we study the factors that relate to the selection of a cache line size. Our primary focus is on the cache miss ratio, but we also consider influences such as logic complexity, address tags, line crossers, I/O overruns, etc. The behavior of the cache miss ratio as a function of line size is examined carefully through the use of trace driven simulation, using 27 traces from five different machine architectures. The change in cache miss ratio as the line size varies is found to be relatively stable across workloads, and tables of this function are presented for instruction caches, data caches, and unified caches. An empirical mathematical fit is obtained. This function is used to extend previously published design target miss ratios to cover line sizes from 4 to 128 bytes and cache sizes from 32 bytes to 32K bytes; design target miss ratios are to be used to guide new machine designs. Mean delays per memory reference and memory (bus) traffic rates are computed as a function of line and cache size, and memory access time parameters. We find that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat. Longer line sizes are suitable for mainframes because of the higher bandwidth to main memory.

...read moreread less

180 citations

Proceedings Article•DOI•

The 5 minute rule for trading memory for disc accesses and the 10 byte rule for trading memory for CPU time

[...]

Jim Gray, Franco Putzolu

01 Dec 1987

TL;DR: If an item is accessed frequently enough, it should be main memory resident, and the results depend on current price ratios of processors, memory and disc accesses, and hence the constants in the rules are changing.

...read moreread less

Abstract: If an item is accessed frequently enough, it should be main memory resident. For current technology, “frequently enough” means about every five minutes.Along a similar vein, one can frequently trade memory space for CPU time. For example, bits can be packed in a byte at the expense of extra instructions to extract the bits. It makes economic sense to spend ten bytes of main memory to save one instruction per second.These results depend on current price ratios of processors, memory and disc accesses. These ratios are changing and hence the constants in the rules are changing.

...read moreread less

149 citations

Patent•

Software emulation of bank-switched memory using a virtual DOS monitor and paged memory management

[...]

Gary Allen Stimac, William Caldwell Crosswy, Stephen Bruce Preston, James Steven Flannigan

03 Aug 1987

TL;DR: In this paper, a virtual DOS monitor uses the paging hardware of a processor such as the Intel 80386 microprocessor in conjunction with its Virtual-8086 mode of operation to emulate expanded memory using extended memory.

...read moreread less

Abstract: A virtual DOS monitor uses the paging hardware of a processor such as the Intel 80386 microprocessor in conjunction with its Virtual-8086 mode of operation to emulate expanded memory using extended memory. Support for application programs which access expanded memory is thereby provided without the need for additional memory boards or other hardware.

...read moreread less

143 citations

Journal Article•DOI•

Discrete Optimization Problem in Local Networks and Data Alignment

[...]

Fiol, Yebra, Alegre, Valero¹•Institutions (1)

Polytechnic University of Puerto Rico¹

01 Jun 1987-IEEE Transactions on Computers

TL;DR: This paper presents the solution of the following optimization problem that appears in the design of double-loop structures for local networks and also in data memory, allocation and data alignment in SIMD processors.

...read moreread less

Abstract: This paper presents the solution of the following optimization problem that appears in the design of double-loop structures for local networks and also in data memory, allocation and data alignment in SIMD processors.

...read moreread less

132 citations

Journal Article•DOI•

Vector Access Performance in Parallel Memories Using a Skewed Storage Scheme

[...]

David T. Harper¹, J. Robert Jump²•Institutions (2)

University of Texas at Dallas¹, Rice University²

01 Dec 1987-IEEE Transactions on Computers

TL;DR: The skewing scheme evaluated here does not eliminate all memory conflicts but it does improve the average performance of vector access over interleaved systems for a wide range of strides.

...read moreread less

Abstract: The degree to which high-speed vector processors approach their peak performance levels is closely tied to the amount of interference they encounter while accessing vectors in memory. In this paper we present an evaluation of a storage scheme that reduces the average memory access time in a vector-oriented architecture. A skewing scheme is used to map vector components into parallel memory modules such that, for most vector access patterns, the number of memory conflicts is reduced over that observed in interleaved parallel memory systems. Address and data buffers are used locally in each module so that transient nonuniformities which occur in some access patterns do not degrade performance. Previous investigations into skewing techniques have attempted to provide conflict-free access for a limited subset of access patterns. The goal of this investigation is different. The skewing scheme evaluated here does not eliminate all memory conflicts but it does improve the average performance of vector access over interleaved systems for a wide range of strides. It is shown that little extra hardware is required to implement the skewing scheme. Also, far fewer restrictions are placed on the number of memory modules in the system than are present in other proposed schemes.

...read moreread less

88 citations

Debugging Fortran on a shared memory machine

[...]

Todd R. Allen¹, David Padua•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Jan 1987

59 citations

Proceedings Article•

Data prefetching in shared memory multiprocessors.

[...]

Roland Lun Lee¹, Pen-Chung Yew¹, Duncan H. Lawrie¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Jan 1987

TL;DR: Using the multiprocessor cache model for comparison, data prefetching is found to be more effective than caches in addressing the memory access bottleneck.

...read moreread less

Abstract: The trace driven simulation of 16 numerical subroutines is used to compare instruction lookahead and data prefetching with private caches in shared memory multiprocessors with hundreds or thousands of processors and memory modules interconnected with a pipelined network. These multiprocessors are characterized by long memory access delays that create a memory access bottleneck. Using the multiprocessor cache model for comparison, data prefetching is found to be more effective than caches in addressing the memory access bottleneck. 5 refs., 6 figs.

...read moreread less

Proceedings Article•DOI•

Memory access patterns of parallel scientific programs

[...]

F. Darema-Rogers¹, G. F. Pfister¹, Kimming So¹•Institutions (1)

IBM¹

01 May 1987

TL;DR: It is found that, even though the shared data comprise the largest portion of the data in the application program, on the average a small fraction of the memory references are to shared data.

...read moreread less

Abstract: A parallel simulator, PSIMUL, has been used to collect information on the memory access patterns and synchronization overheads of several scientific applications. The parallel simulation method we use is very efficient and it allows us to simulate execution of an entire application program, amounting to hundreds of millions of instructions. We present our measurements on the memory access characteristics of these applications; particularly our observations on shared and private data, their frequency of access and locality. We have found that, even though the shared data comprise the largest portion of the data in the application program, on the average a small fraction of the memory references are to shared data. The low averages do not preclude bursts of traffic to shared memory nor does it rule out positive benefits from caching shared data. We also discuss issues of synchronization overheads and their effect on performance.

...read moreread less

Book Chapter•DOI•

Concurrent garbage collection on stock hardware

[...]

Stephen C. North¹, John Reppy¹•Institutions (1)

Bell Labs¹

14 Sep 1987

TL;DR: It is believed that the modification of Brooks' forwarding pointers can successfully off-load much of the cost of memory management to, otherwise unused, dead-time.

...read moreread less

Abstract: We have demonstrated a practical design for memory management in a concurrent system running on stack hardware. Under our modification of Brooks' forwarding pointers, the only runtime costs, owing to storage reclamation, incurred by user processes are an extra level of indirection when accessing object contents and the need to scavenge when updating mutable objects. Therefore, we believe that our system can successfully off-load much of the cost of memory management to, otherwise unused, dead-time.

...read moreread less

Journal Article•DOI•

A Survey of Microprocessor Architectures for Memory Management

[...]

Furht¹, Milutinovic²•Institutions (2)

University of Miami¹, Purdue University²

01 Mar 1987-IEEE Computer

TL;DR: The article discusses the capabilities of current microprocessors to support virtual memory, which includes abilities to recognize an address fault, to abort the execution of the current instruction and save necessary information, and to restore the saved state and resume normal processing.

...read moreread less

Abstract: This article presents an overview of current 16- and 32-bit microprocessor architectures that support memory management. The authors define the basic requirements for a processor to support memory management and introduce hierarchically organized memory. They describe several address translation schemes, such as paging, segmentation, and combined paging/segmentation and discuss their implementation in current microprocessors. They give special emphasis to the application of associative cache memory, and analyze and compare single-level and multi-level address mapping schemes. Futhermore, the article discusses the capabilities of current microprocessors to support virtual memory, which includes abilities to recognize an address fault, to abort the execution of the current instruction and save necessary information, and to restore the saved state and resume normal processing. The authors evaluate two methods to restart the interrupted instruction, instruction restart and instruction continuation, and discuss their implementation in current microprocessors. They also discuss protection and security issues, and evaluate two protection schemes, hierarchical and nonhierarchical.

...read moreread less

Patent•

Semantic network machine for artificial intelligence computer

[...]

Shigeru Oyanagi¹, Sumikazu Fujita¹, Sadao Nakamura¹•Institutions (1)

Toshiba¹

19 Nov 1987

TL;DR: In this paper, a semantic network machine is applied to an artificial intelligence computer for performing inferential retrieval with respect to a knowledge base, where a sub associative memory is connected in parallel with the main memory and stores specific knowledge data including "is-a" attributes of the knowledge data.

...read moreread less

Abstract: A semantic network machine is applied to an artificial intelligence computer for performing inferential retrieval with respect to a knowledge base. A main associative memory stores the knowledge base consisting of knowledge data arranged to form a semantic network. Each knowledge data consists of a set of an object, an attribute, and a value. A sub associative memory is connected in parallel with the main memory and stores specific knowledge data including "is-a" attributes of the knowledge data. In an inferential retrieval mode, when a question associated with a given object is input, an initial retrieval condition for retrieving, from the knowledge base, knowledge data necessary for answering the question is defined. While the main memory is accessed using the initial retrieval condition, the sub memory is also accessed simultaneously to read out data representing another object concept associated with the object concept included in the question by the "is-a" attribute from the sub memory in a parallel manner. The readout data is temporarily stored in a buffer memory. When retrieval in the main memory fails, the initial retrieval condition is updated using the data stored in the buffer memory to generate a secondary initial retrieval condition supplied to the main memory, which is then successively accessed using the updated condition.

...read moreread less

Book•

Memory Performance of Prolog Architectures

[...]

Evan Tick¹•Institutions (1)

Stanford University¹

31 Dec 1987

TL;DR: A comparison between Prolog-10 and WAM and the Consistency Problem, and the importance of locking in Broadcast Caches and Coherent Cache Measurements.

...read moreread less

Abstract: 1. Introduction.- 1.1. What is Prolog?.- 1.2. Why Prolog?.- 1.2.1. Reduced Instruction Set Architectures.- 1.2.2. Parallel Logic Programming Languages.- 1.2.3. Lisp.- 1.3. Previous Work.- 1.3.1. Architectures.- 1.3.2. Benchmarking.- 1.3.3. Memory Organization.- 1.4. Book Outline.- 2. Prolog Architectures.- 2.1. Canonical Prolog Architectures.- 2.1.1. CIF Data Encoding.- 2.1.2. Naive and Traditional Prolog CIFs.- 2.1.3. Register-Based CIF.- 2.1.4. Other CIF Metrics: Stability.- 2.1.5. Summary.- 2.2. Environment Stacking Architectures.- 2.2.1. DEC-10 Prolog Abstract Machine.- 2.2.2. Warren Abstract Machine.- 2.2.3. Comparison Between Prolog-10 and WAM.- 2.2.4. Lcode Architecture.- 2.3. Restricted AND-Parallel Prolog Architecture.- 2.4. Summary.- 3. Prolog Architecture Measurements.- 3.1. Methodology.- 3.1.1. Compiler.- 3.1.2. Assembler.- 3.1.3. Emulator.- 3.1.4. Simulators.- 3.2. Benchmarks.- 3.3. WAM Referencing Characteristics.- 3.3.1. Data Referencing.- 3.3.2. Instruction Referencing.- 3.4. CIF Referencing Characteristics.- 3.5. PWAM Referencing Characteristics.- 3.6. Summary.- 4. Uniprocessor Memory Organizations.- 4.1. Memory Model.- 4.2. Data Referencing.- 4.2.1. Choice Point Buffer.- 4.2.2. Stack Buffer.- 4.2.3. Environment Stack Buffer.- 4.2.4. Copyback Cache.- 4.2.5. Smart Cache.- 4.2.6. Comparison of Data Memories.- 4.3. Instruction Referencing.- 4.3.1. Instruction Buffer.- 4.3.2. Instruction Caches.- 4.4. Local Memory Configurations.- 4.5. Main Memory Design.- 4.5.1. General Queueing Model.- 4.5.2. Memory Bus Model.- 4.5.3. Copyback I/D Cache System.- 4.5.4. Stack and Instruction Buffer System.- 4.6. Summary.- 5. Multiprocessor Memory Organizations.- 5.1. Memory Model.- 5.2. The Consistency Problem.- 5.2.1. Broadcast Cache Coherency.- 5.2.2. Locking in Broadcast Caches.- 5.2.3. Hybrid Cache Coherency.- 5.3. Coherent Cache Measurements.- 5.4. Shared Memory Design.- 5.4.1. Shared Memory and Bus Queueing Models.- 5.4.2. Measurements.- 5.5. Summary.- 6. Conclusions and Future Research.- 6.1. Conclusions.- 6.2. Future Research.- Appendix A. Glossary of Notation.- Appendix B. Lcode Instruction Set Summary.- Appendix C Local Memory Management Algorithms.- References.

...read moreread less

Proceedings Article•DOI•

Distributed shared memory in a loosely coupled distributed system

[...]

B. D. Fleisch¹•Institutions (1)

University of California, Los Angeles¹

01 Aug 1987

TL;DR: The development and performance validation of an architecture for distributed shared memory in a loosely coupled distributed computing environment and metrics which will be used to measure its performance are described.

...read moreread less

Abstract: This work outlines the development and performance validation of an architecture for distributed shared memory in a loosely coupled distributed computing environment. This distributed shared memory may be used for communication and data exchange between communicants on different computing sites; the mechanism will operate transparently and in a distributed manner. This paper describes the architecture of this mechanism and metrics which will be used to measure its performance. We also discuss a number of issues related to the overall design and what research contribution such an implementation can provide to the computer science field.

...read moreread less

Patent•

System for loading initial program loader routine into secondary computer without bootstrap ROM

[...]

Dieter Kopp¹, Thomas Hormann¹, Uwe Ackermann¹•Institutions (1)

Alcatel-Lucent¹

20 Nov 1987

TL;DR: In this article, the primary computer loads the initial program loader into the shared memory, sends a restart instruction to a central processing unit of the secondary computer, and then controls a memory access logic circuit such that all program memory access instruction are routed to that portion of the common memory holding the program loader.

...read moreread less

Abstract: A secondary computer is connected to a primary computer via a shared memory. The initial program loader for the secondary computer is stored in a mass memory of the primary computer rather than in a bootstrap memory. The primary computer loads the initial program loader into the shared memory, sends a restart instruction to a central processing unit of the secondary computer, and then controls a memory access logic circuit such that all program memory access instruction are routed to that portion of the shared memory holding the initial program loader until the secondary computer has read out the last instruction of the initial program loader. The central processing unit then sends to the memory access logic circuit an acknowledge signal which causes the latter to route all further program memory access instructions to a program memory of the secondary computer.

...read moreread less

Journal Article•DOI•

Algorithms for an Advanced Fault Simulation System in MOTIS

[...]

Chi-Yuan Lo¹, H.N. Nham, A.K. Bose•Institutions (1)

Bell Labs¹

01 Mar 1987-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: Efficient algorithms to perform fault simulation are discussed in terms of fault list manipulation and primitive evaluation and special emphasis is on an innovative fast unit delay fault simulation algorithm that achieves an additional 33-39-percent improvement in speed and 20-28- percent improvement in memory usage.

...read moreread less

Abstract: In this paper, we will present algorithms developed for an advanced fault simulation system in the MOTIS simulation environment. In particular, the algorithm to perform fault modeling and collapsing is first reviewed. Efficient algorithms to perform fault simulation are discussed in terms of fault list manipulation and primitive evaluation. The simulator realizes a speed gain factor of 787 to 2088 over serial fault simulation. Special emphasis is on an innovative fast unit delay fault simulation algorithm that achieves an additional 33-39-percent improvement in speed and 20-28-percent improvement in memory usage.

...read moreread less

Proceedings Article•DOI•

Parallel discrete event simulation: a shared memory approach

[...]

Daniel A. Reed¹, Allen D. Malony¹, Bradley D. McCredie¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 May 1987

TL;DR: These experiments show that the Chandy-Misra approach to distributed simulation is not a viable approach to parallel simulation of queueing network models, and there is little prospect that they can be reduced to acceptable levels.

...read moreread less

Abstract: With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

...read moreread less

Patent•

Self-configuring memory management system with on card circuitry for non-contentious allocation of reserved memory space among expansion cards

[...]

Jonathan Fitch¹, Ronald R Hochsprung¹•Institutions (1)

Apple Inc.¹

13 Mar 1987

TL;DR: In this article, a printed circuit board card adapted to fit into a slot and make electrical connections with cooperating terminals in the slot, the slot being disposed on the main circuit board of a personal computer system, including a CPU, memory, a 32-bit address bus with control signals associated therewith, and input/output circuitry.

...read moreread less

Abstract: A printed circuit board card adapted to fit into a slot and make electrical connections with cooperating terminals in the slot, the slot being disposed on the main circuit board of a personal computer system, the main circuit board including a CPU, memory, a 32-bit address bus with control signals associated therewith, and input/output circuitry. The slot is coupled to the 32-bit address bus, being substantially a NUBUS bus, and the slot includes distinct identification line means which provide the slot with an identification number (distinct number) in the computer system. The card includes a decoder means which is coupled to the slot to receive the identification number; the decoder means has memory reservation means which causes 256 megabytes of memory space to be reserved for the card in the slot, such that, where the slot number is X, the 256 megabytes of reserved memory space begins at location $X000 0000 and ends at locations $XFFF FFFF.

...read moreread less

Proceedings Article•DOI•

Performance of complex queries in Main Memory Database Systems

[...]

Dina Bitton¹, Maria Hanrahan², Carolyn Turbyfill¹•Institutions (2)

Cornell University¹, IBM²

03 Feb 1987

TL;DR: This paper describes and evaluates the main memory database structures and query processing algorithms implemented in this prototype of a Main Memory Database System (MMDBS) that was designed to support complex interactive queries and identifies strategies that exploit memory residence effectively.

...read moreread less

Abstract: Memory residence can buy both functionality and performance for a database management system. In this paper, we present a description and a benchmark of an experimental implementation of a Main Memory Database System (MMDBS) that was designed to support complex interactive queries. We describe and evaluate the main memory database structures and query processing algorithms implemented in this prototype. Our measurements and analysis, focused on aggregates and joins, include both memory requirements and response time, since there is a clear trade-off between space and time in the design of a MMDBS. In contrast to conventional Disk-based Database Systems (DDBS's), we found that an MMDBS can efficiently execute complex relational queries. We identify strategies that exploit memory residence effectively. We also identified a number of performance problems related to query optimization in main memory and memory management for MMDBS's.

...read moreread less

Studies in Prolog architectures

[...]

E. Tick

01 Jul 1987

TL;DR: This dissertation provides previously unavailable information concerning the memory-referencing characteristics of logic programming languages executing on hierarchical memory organizations, thus contributing to processor memory design.

...read moreread less

Abstract: This dissertation addresses the problem of how logic programs can be made to execute at high speeds. Prolog, chosen as a representative logic programming language, differs from procedural languages in that it is applicative, nondeterminate and uses unification as its primary operation. Program performance is directly related to memory performance because high-speed processors are ultimately limited by memory bandwidth and architectures that require less bandwidth have greater potential for high performance. This dissertation reports the dynamic data and instruction referencing characteristics of both sequential and parallel Prolog architectures and corresponding uniprocessor and multiprocessor memory-hierarchy performance tradeoffs. Initially, a family of canonical architectures, corresponding closely to Prolog, is defined from the principles of ideal machine architectures of Flynn, and is then refined into the realizable Warren Abstract Machine (WAM) architecture. The memory-referencing behavior of these architectures is examined by tracing memory references during emulation of a set of Prolog benchmarks. Measurements of the canonical architectures indicate the upper memory-performance bounds of sequential execution. Measurements of the WAM provide frequencies of memory references and indicate that the WAM approaches the performance of the canonical Prolog architectures on current hosts. Two-level memory hierarchies for both sequential (WAM) and parallel (PWAM) Prolog architectures are modeled. PWAM is the Restricted-AND Parallel architecture of Hermenegildo. Local memory designs are simulated using memory traces, whereas main memory designs are analyzed with queueing models. The results show that small buffers (256 words or less) can significantly reduce Prolog's memory bandwidth requirement, primarily by capturing shallow backtracking information. Larger, more general local memories, such as caches, are necessary in high-performance systems to further reduce memory traffic. Local memory consistency protocols for a shared memory PWAM multiprocessor are analyzed. Measurements indicate that the memory-referencing overheads of exploiting Restricted-AND Parallelism are minor. These results show, however, that as few as eight high-performance processing elements can saturate a shared bus. With emerging bus technology and properly interleaved shared-memory, limited-size multiprocessors of this type have great potential for cost-effective speedups. This dissertation provides previously unavailable information concerning the memory-referencing characteristics of logic programming languages executing on hierarchical memory organizations, thus contributing to processor memory design.

...read moreread less

Graph transformation algorithms for array memory optimization in applicative languages

[...]

James Mcgraw, John Elvin Ranelletti

01 Jan 1987

TL;DR: A compile-time graph algorithm strategy, for use with the SISAL programming language, that identifies cases where preallocating memory storage locations for array aggregates is possible, and indicates that array memory allocation for some applicative language programs can be efficient.

...read moreread less

Abstract: In applicative language implementations, the potential copying of array elements can severely restrict the efficiency of the run-time code. In many instances, copy avoidance can be achieved through memory preallocation. This PhD dissertation presents a compile-time graph algorithm strategy, for use with the SISAL programming language, that identifies cases where preallocating memory storage locations for array aggregates is possible. The preallocation actions to be taken are specified by providing intermediate language graph transformations that designate the run-time creation of memory buffers prior to array creation. The preallocation analysis algorithms predict a significant savings in the execution time of selected sample programs: more than 90% of the array copying operations can be removed from existing unoptimized implementations. The results indicate that array memory allocation for some applicative language programs can be efficient.

...read moreread less

Patent•

System for transferring data between an interleaved main memory and an I/O device at high speed

[...]

Jun-ichi Kihara¹, Hiroyuki Kaneko¹•Institutions (1)

Toshiba¹

27 Aug 1987

TL;DR: In this paper, the memory read request is inhibited in response to a request inhibit instruction generated by a request-inhibit generating section, which is canceled when memory interleaved data on the basis of memory read requests for the same memory bank are input to a buffer in the apparatus.

...read moreread less

Abstract: An input/output channel apparatus includes a system bus controller for generating a memory read request and outputting a memory address. Generation of the memory read request is inhibited in response to a request inhibit instruction generated by a request-inhibit generating section. When a memory bank other than one accessed in response to the immediately preceding memory read request is accessed, the request-inhibit instruction is generated. The request-inhibit instruction is canceled when memory interleaved data on the basis of the memory read requests for the same memory bank are input to a buffer in the apparatus.

...read moreread less

Patent•

Memory management unit

[...]

Norio Nakagawa¹, Katsuaki Takagi¹, Tuneo Funabashi¹•Institutions (1)

Hitachi¹

17 Feb 1987

TL;DR: In this article, serial communication between memory management units is used to reduce the overhead of communication between the memory management unit and the operating system, so that the overhead can be reduced and memory management can be made correctly.

...read moreread less

Abstract: In a multiprocessor system having a hierachal memory device employing a virtual memory system, serial communication means which makes it possible for memory management units, which are disposed for CPUs, respectively, to communicate with one another, so that any change of common memory management information can be exchanged directly between the memory management units As a result, it is not necessary for each CPU to inform the memory management unit of any change of the memory management information by an operating system, so that the overhead of communication between CPUs can be reduced and memory management can be made correctly without applying any load to the operating system even when any change occurs in the memory management information

...read moreread less

Proceedings Article•DOI•

High performance integrated Prolog processor IPP

[...]

Shigeo Abe¹, Tadaaki Bandoh¹, Shinichiro Yamaguchi¹, Kenichi Kurosawa¹, K. Kiriyama¹ - Show less +1 more•Institutions (1)

Hitachi¹

01 Jun 1987

TL;DR: An integrated Prolog processor (IPP) and its optimized compiler are now being developed and new functions such as indexing by the optimal argument and global register assignment across determinate built-in predicates are introduced.

...read moreread less

Abstract: To realize the highest performance possible for a sequential processor, and to realize utilization of a large amount of existing software, an integrated Prolog processor (IPP) and its optimized compiler are now being developed.A tagged architecture under constraints of a general purpose computer and a memory management strategy to achieve a high performance are discussed and then an IPP architecture is presented. Based on the Prolog instruction set, which is an extension of Warren's, the Prolog compiler introduces new functions such as indexing by the optimal argument and global register assignment across determinate built-in predicates.The performance of the IPP for the append program is 1 million logical inferences per second, which is the highest possible for a sequential processor. In the 8-queen program a considerable speed-up is obtained by the new functions.

...read moreread less

Patent•

Memory failure detection apparatus

[...]

Richard A. Lemay¹, David A. Wallace¹•Institutions (1)

Honeywell¹

27 Feb 1987

TL;DR: In this paper, a memory failure detection apparatus is described, which is used with a large capacity memory that is organized in banks of memory, and with which error correction circuitry is used to correct correctable errors and provide an indication of same.

...read moreread less

Abstract: Memory failure detection apparatus is disclosed which is used with a large capacity memory that is organized in banks of memory, and with which error correction circuitry is used to correct correctable errors and provide an indication of same. The detection apparatus is responsive to the error indications and to a bank select addressing signal to provide and store error counts for a bank or banks of memory located on each memory board. A system processor periodically reads the error counts and responds to same to provide a maintenance message indicating that a specific memory board is to be replaced.

...read moreread less

Proceedings Article•

KL1 Execution Model for PIM Cluster with Shared Memory.

[...]

Masatoshi Sato, Hajime Shimizu, Akira Matsumoto, Kazuaki Rokusawa, Atsuhiro Goto - Show less +1 more

01 Jan 1987

Proceedings Article•

Parallel garbage collection on a virtual memory system

[...]

Santosh G. Abraham, Janak H. Patel

01 Jan 1987

TL;DR: An architecture and an associated algorithm for parallel garbage collection on a virtual memory system for virtual memory systems where both the list processor and the garbage collector have private memories is presented.

...read moreread less

Abstract: Since most artificial intelligence applications are programmed in list processing languages, it is important to design architectures to support efficient garbage collection. This paper presents an architecture and an associated algorithm for parallel garbage collection on a virtual memory system. All the previously proposed parallel algorithms attempt to collect cells released by the list processor during the garbage collection cycle. We do not attempt to collect such cells. As a consequence, the list processor incurs little overhead in the proposed scheme, since it need not synchronize with the collector. Most parallel algorithms are designed for shared memory machines which have certain implicit synchronization functions on variable access. The proposed algorithm is designed for virtual memory systems where both the list processor and the garbage collector have private memories. The enforcement of coherence between the two private memories can be expensive and is not necessary in our scheme. 15 refs., 3 figs.

...read moreread less