Showing papers on "Memory management published in 1996"

PDF

Open Access

Journal Article•DOI•

TreadMarks: shared memory computing on networks of workstations

[...]

Cristiana Amza¹, Alan L. Cox¹, Sandhya Dwarkadas¹, P. Keleher¹, Honghui Lu¹, Ramakrishnan Rajamony¹, Weimin Yu¹, Willy Zwaenepoel¹ - Show less +4 more•Institutions (1)

Rice University¹

01 Feb 1996-IEEE Computer

TL;DR: This work discusses the experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory system, which allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory.

...read moreread less

Abstract: Shared memory facilitates the transition from sequential to parallel processing. Since most data structures can be retained, simply adding synchronization achieves correct, efficient programs for many applications. We discuss our experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory system. DSM allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory. We illustrate a DSM system consisting of N networked workstations, each with its own memory. The DSM software provides the abstraction of a globally shared memory, in which each processor can access any data item without the programmer having to worry about where the data is or how to obtain its value.

...read moreread less

917 citations

Proceedings Article•DOI•

Memory Bandwidth Limitations of Future Microprocessors

[...]

Doug Burger¹, James R. Goodman¹, Alain Kagi¹•Institutions (1)

University of Wisconsin-Madison¹

01 May 1996

TL;DR: It is predicted that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips, and pin bandwidth limitations will make more complex on-chip caches cost-effective.

...read moreread less

Abstract: This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a decomposition of execution time, we show that for modern processors that employ aggressive memory latency tolerance techniques, wasted cycles due to insufficient bandwidth generally exceed those due to raw memory latencies. Given the importance of maximizing memory bandwidth, we calculate effective pin bandwidth, then estimate optimal effective pin bandwidth. We measure these quantities by determining the amount by which both caches and minimal-traffic caches filter accesses to the lower levels of the memory hierarchy. We see that there is a gap that can exceed two orders of magnitude between the total memory traffic generated by caches and the minimal-traffic caches---implying that the potential exists to increase effective pin bandwidth substantially. We decompose this traffic gap into four factors, and show they contribute quite differently to traffic reduction for different benchmarks. We conclude that, in the short term, pin bandwidth limitations will make more complex on-chip caches cost-effective. For example, flexible caches may allow individual applications to choose from a range of caching policies. In the long term, we predict that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips.

...read moreread less

376 citations

Book•

ARM System Architecture

[...]

Steve Furber

15 Jan 1996

TL;DR: This book helps the reader to get started with the ARM chip and get programs running under emulation and discusses assembly level programming, particularly for the ARM.

...read moreread less

Abstract: From the Publisher: Features allows the reader to get started with the ARM chip and get programs running under emulation discusses assembly level programming, particularly for the ARM provides information on general computer architecture (processor design, caches, memory management) with detailed illustrations based on ARM chips details the architecture development process covers embedded system design principles and case studies

...read moreread less

338 citations

Journal Article•DOI•

The Nexus Approach to Integrating Multithreading and Communication

[...]

Ian Foster¹, Carl Kesselman², Steven Tuecke¹•Institutions (2)

Argonne National Laboratory¹, California Institute of Technology²

25 Aug 1996-Journal of Parallel and Distributed Computing

TL;DR: This paper proposes an approach based on global pointer and remote service request mechanisms, and explains how these mechanisms support dynamic communication structures, asynchronous messaging, dynamic thread creation and destruction, and a global memory model via interprocessor references.

...read moreread less

298 citations

Journal Article•DOI•

ARB: a hardware mechanism for dynamic reordering of memory references

[...]

Manoj Franklin¹, Gurindar S. Sohi²•Institutions (2)

Clemson University¹, University of Wisconsin-Madison²

01 May 1996-IEEE Transactions on Computers

TL;DR: In this paper, an Address Resolution Buffer (ARB) is proposed for dynamic reordering of memory references in the sequential instruction stream, which supports disambiguation of memory reference addresses in a decentralized manner.

...read moreread less

Abstract: To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references-especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardware mechanism, called an Address Resolution Buffer (ARB), for performing dynamic reordering of memory references. The ARB supports the following features: (1) dynamic memory disambiguation in a decentralized manner, (2) multiple memory references per cycle, (3) out-of-order execution of memory references, (4) unresolved loads and stores, (5) speculative loads and stores, and (6) memory renaming. The paper presents the results of a simulation study that we conducted to verify the efficacy of the ARB for a superscalar processor. The paper also shows the ARB's application in a multiscalar processor.

...read moreread less

266 citations

Proceedings Article•DOI•

Static detection of dynamic memory errors

[...]

David Evans¹•Institutions (1)

Massachusetts Institute of Technology¹

01 May 1996

TL;DR: In this paper, the authors introduce annotations to make certain assumptions explicit at interface points, which can detect a broad class of errors including misuses of null pointers, uses of dead storage, memory leaks, and dangerous aliasing.

...read moreread less

Abstract: Many important classes of bugs result from invalid assumptions about the results of functions and the values of parameters and global variables. Using traditional methods, these bugs cannot be detected efficiently at compile-time, since detailed cross-procedural analyses would be required to determine the relevant assumptions. In this work, we introduce annotations to make certain assumptions explicit at interface points. An efficient static checking tool that exploits these annotations can detect a broad class of errors including misuses of null pointers, uses of dead storage, memory leaks, and dangerous aliasing. This technique has been used successfully to fix memory management problems in a large program.

...read moreread less

256 citations

Proceedings Article•DOI•

Missing the Memory Wall: The Case for Processor/Memory Integration

[...]

Ashley Saulsbury¹, Fong Pong², Andreas Nowatzyk²•Institutions (2)

Swedish Institute of Computer Science¹, Sun Microsystems²

01 May 1996

TL;DR: It is shown that processor memory integration can be used to build competitive, scalable and cost-effective MP systems and results from execution driven uni- and multi-processor simulations show that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor.

...read moreread less

Abstract: Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest a lot of power and chip area to bridge the widening gap between CPU and main memory speeds. Yet, many large applications do not operate well on these systems and are limited by the memory subsystem performance.This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and complexity. Based on a design study using the next generation 0.25µm, 256Mbit dynamic random-access memory (DRAM) process and on the analysis of existing machines, we show that processor memory integration can be used to build competitive, scalable and cost-effective MP systems.We present results from execution driven uni- and multi-processor simulations showing that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor. In this system, small direct mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache.

...read moreread less

235 citations

Patent•

Transparent Local And Distributed Memory Management System

[...]

Blaine Garst, Ali T. Ozer, Bertrand Serlet, Trey Redwood City Matteson

31 Jan 1996

TL;DR: In this paper, an autorelease pool is created at the beginning of a new duty cycle, which retains the newly allocated memory space during the duty cycle and is automatically disposed of at the end of a duty cycle.

...read moreread less

Abstract: The present invention discloses a system for transparent local and distributed memory management. The invention overcomes the prior art's requirement of keeping track of whether a memory space allocated to a new object or a new program or data structure can be reclaimed. According to the present invention an autorelease pool is created at the beginning of a new duty cycle. The autorelease pool retains the newly allocated memory space during the duty cycle. The autorelease pool is automatically disposed of at the end of the duty cycle. As a result of disposing the autorelease pool, the newly allocated memory space is reclaimed (i.e., deallocated). The present invention is useful in distributed networks where different programming conventions on remote and local machines made the prior art's memory management task particularly difficult. The present invention is also useful in an object-oriented programming environment.

...read moreread less

188 citations

Patent•

Dynamic memory allocation in a computer using a bit map index

[...]

Douglas James McMahon¹, George Buzsaki¹•Institutions (1)

Oracle Corporation¹

24 May 1996

TL;DR: A dynamic memory allocator as mentioned in this paper assigns portions of memory into a large number of slots that include zero or more memory blocks of equal size, not currently in use in the computer.

...read moreread less

Abstract: A dynamic memory allocator in a computer assigns portions of memory into a large number of slots that include zero or more memory blocks of equal size. Free lists identify memory blocks, corresponding to a slot size, not currently in use in the computer. Software programs generate requests, including a size, for a memory block. The size of the requests are rounded up to the nearest slot size. To allocate a memory block, the free lists are searched, using a bit map index or a hierarchical bit map index, to identify an available memory block to accommodate the memory block requested. The dynamic memory allocator handles large block allocations different from small block allocations. A virtual memory allocator stores a plurality of pointers to identify one or more virtual pages of memory for allocation to the dynamic memory allocator.

...read moreread less

175 citations

Patent•

Method and system for controlling power consumption in a computer system

[...]

Kenneth Reneris¹•Institutions (1)

Microsoft¹

12 Mar 1996

TL;DR: In this article, a portable, software-controlled system for managing power consumption in a computer system is presented, which is integrated with the operating system of the computer and is extensible to any add-on devices that are installed into the computer system.

...read moreread less

Abstract: A portable, software-controlled system for managing power consumption in a computer system. The power management system is integrated with the operating system of the computer system and is extensible to any add-on devices that are installed into the computer system. Upon the detection of a power down condition indicating that the computer system should be powered down, the power management system may verify that the computer system can be powered down without causing any of the devices that are connected to the computer to lose application data. If all of the devices agree that the computer system can be powered down, then each device has its state saved into memory and is powered down. Next, the state of each processor is saved into memory and power to the processors is disabled. In order to suspend the computer system, power to the memory is maintained, allowing each device state and processor state to be restored upon reboot. The computer system may be hibernated by writing all of the active memory (including each device state and processor state) to a secondary storage area and then powering off the entire computer system, including memory.

...read moreread less

173 citations

Patent•

Remote checkpoint memory system and protocol for fault-tolerant computer system

[...]

Jack J. Stiffler

27 Nov 1996

TL;DR: In this paper, a mechanism for maintaining a consistent, periodically updated state in main memory without constraining normal computer operation is provided, thereby enabling a computer system to recover from faults without loss of data or processing continuity.

...read moreread less

Abstract: A mechanism for maintaining a consistent, periodically updated state in main memory without constraining normal computer operation is provided, thereby enabling a computer system to recover from faults without loss of data or processing continuity. In this invention, a first computer includes a processor and input/output elements connected to a main memory subsystem including a primary element. A second computer has a remote checkpoint memory element, which may include one or more buffer memories and a shadow memory, which is connected to the main memory subsystem of the first computer. During normal processing, an image of data written to the primary memory element is captured by the remote checkpoint memory element. When a new checkpoint is desired (thereby establishing a consistent state in main memory to which all executing applications can safely return following a fault), the data previously captured is used to establish a new checkpointed state in the second computer. In case of failure of the first computer, the second computer can be restarted to operate from the last checkpoint established for the first computer. This structure and protocol can guarantee a consistent state in main memory, thus enabling fault-tolerant operation.

...read moreread less

Patent•

Unified memory computer architecture with dynamic graphics memory allocation

[...]

Michael J. K. Nielsen¹, Zahid S. Hussain¹•Institutions (1)

Nielsen Holdings N.V.¹

13 Sep 1996

TL;DR: In this paper, a computer system includes a memory controller, a unified system memory, and memory clients each having access to the system memory via the memory controller and translation hardware is included for mapping virtual addresses of pixel buffers to physical memory locations.

...read moreread less

Abstract: A computer system provides dynamic memory allocation for graphics. The computer system includes a memory controller, a unified system memory, and memory clients each having access to the system memory via the memory controller. Memory clients can include a graphics rendering engine, a CPU, an image processor, a data compression/expansion device, an input/output device, a graphics back end device. The computer system provides read/write access to the unified system memory, through the memory controller, for each of the memory clients. Translation hardware is included for mapping virtual addresses of pixel buffers to physical memory locations in the unified system memory. Pixel buffers are dynamically allocated as tiles of physically contiguous memory. Translation hardware is implemented in each of the computational devices, which are included as memory clients in the computer system, including primarily the rendering engine.

...read moreread less

Journal Article•DOI•

Memory channel network for PCI

[...]

R.B. Gillett

01 Feb 1996-IEEE Micro

TL;DR: MC implements a form of virtual shared memory that permits applications to completely bypass the operating system and perform cluster communication directly from the user level, and drops communication latency and overhead by up to three orders of magnitude.

...read moreread less

Abstract: A memory-based networking approach provides clusters of computers up to 1,000 times the communication performance of conventional networks, with no compromise in cost or reliability. The memory channel for PCI's performance gains are the result of a system design approach that exploits natural cluster constraints to define a memory-based network. MC implements a form of virtual shared memory that permits applications to completely bypass the operating system and perform cluster communication directly from the user level. The hardware's simple and powerful communication model supports error handling at almost no cost or complexity to the application; guaranteed ordering under errors is the key innovation. The end result: Real-world cluster communication latency dropped by up to two orders of magnitude, and overhead by up to three orders of magnitude. These improvements elevate a lowly set of standard PCI computers running Unix into an impressive, highly available, parallel computing system.

...read moreread less

Patent•

Storage system with a flash memory module

[...]

Hiroshi Sukegawa¹•Institutions (1)

Toshiba¹

25 Jun 1996

TL;DR: In this article, a memory module is composed of a plurality of memory blocks arranged in units of in-unison erase blocks, and some of the memory blocks are used as management tables storing management information for managing address allocation of each memory block and the number of rewrites.

...read moreread less

Abstract: In a storage system using flash EEPROMs, a memory module is composed of a plurality of memory blocks arranged in units of in-unison erase blocks. Some of the memory blocks are used as management tables storing management information for managing address allocation of each memory block and the number of rewrites for each memory block. A memory controller reads the management information from the management table at the system start-up, and based on the management information, controls the read/write access to each memory block.

...read moreread less

Patent•

DRAM with high bandwidth interface that uses packets and arbitration

[...]

Craig Hansen¹, Timothy B. Robinson¹, Alan G. Corry¹•Institutions (1)

MicroUnity¹

23 Feb 1996

TL;DR: In this paper, the authors propose a memory chip for storage and retrieval of data transmitted as streams of data at sustained peak data transfer rates. The memory chip includes a memory device and an interface capable of achieving high bandwidth throughput.

...read moreread less

Abstract: A memory chip for storage and retrieval of data transmitted as streams of data at sustained peak data transfer rates. The memory chip includes a memory device and an interface capable of achieving high bandwidth throughput. The memory device decodes, arbitrates between, and executes memory access commands, and generates memory access responses. The interface includes a data path, and a number of memory controllers. The interface receives and transmits input and output data streams, and the memory controllers control the flow of the input and output data streams within the memory chip. A packet buffer is coupled between the data path and the memory device. The packet buffer provides for temporary storage of memory access commands, response information, and forwarding data.

...read moreread less

Patent•

Secure memory management unit for microprocessor

[...]

Richard J. Takahashi¹, Daniel N. Heer¹•Institutions (1)

VLSI Technology¹

20 Sep 1996

TL;DR: In this paper, a secure embedded memory management unit for a microprocessor is used for encrypted instruction and data transfer from an external memory, where all of the processing takes place on buses internal to the chip, detection of clear unencrypted instructions and data is prevented.

...read moreread less

Abstract: A secure embedded memory management unit for a microprocessor is used for encrypted instruction and data transfer from an external memory. Physical security is obtained by embedding the direct memory access controller on the same chip with a microprocessor core, an internal memory, and an encryption/decryption logic. Data transfer to and from an external memory takes place between the external memory and the memory controller of the memory management unit. All firmware to and from the external memory is handled on a page-by-page basis. Since all of the processing takes place on buses internal to the chip, detection of clear unencrypted instructions and data is prevented.

...read moreread less

Patent•

Mirrored memory dual controller disk storage system

[...]

Marvin D. Nelson¹, Barry J. Oldfield¹, Mark D. Petersen¹•Institutions (1)

Hewlett-Packard¹

29 Apr 1996

TL;DR: In this article, a disk storage control system includes dual controllers having real-time, synchronous, mirrored memory therebetween to provide immediate, accurate, and reliable failover in the event of controller failure.

...read moreread less

Abstract: A disk storage control system includes dual controllers having real-time, synchronous, mirrored memory therebetween to provide immediate, accurate, and reliable failover in the event of controller failure. Non-volatile random access memory provides retention of data during a loss of power and during the manipulation of hardware for purposes of repair. A communication path is established within the mirrored memory between the controllers to monitor and coordinate their activities. The state of the mirrored memory is continuously monitored for accuracy of the mirror and failure detection. Concurrent and ready access by a host computer to the same disk storage control data set from each controller is provided without need for extra manipulation or extra direct memory access (DMA) activity to satisfy host requests. Accordingly, either controller can provide immediate and reliable failover control for the disk storage system. Furthermore, either controller can be hot swapped in the event of failure without the need for preparatory intervention. Finally, a secondary controller can recover a mirror image from a failed stand alone controller memory to provide continued operations thereby so long as the mirrored memory was not the failing component.

...read moreread less

Patent•

Apparatus and method for preventing theft of computer devices

[...]

Wayne W. Chou, Laszlo Elteto, Joseph M. Kulinets, Joseph LaRussa

19 Jul 1996

TL;DR: In this paper, the authors present an approach and method for discouraging computer theft by requiring that a password or other unique information be supplied to the computer before the computer BIOS routines can be completely executed.

...read moreread less

Abstract: Apparatus and method for discouraging computer theft The apparatus and method requires that a password or other unique information be supplied to the computer before the computer BIOS routines can be completely executed A BIOS memory storing the BIOS routines includes a security routine which will determine whether or not the required password entered by the user, or a known quantity read from an externally connected memory device is present The security function stored within the BIOS memory also includes an administration function which permits the computer to be either placed in a locked state, thereby requiring password or the known quantity read from an externally connected memory device to be present each time the computer is booted up The administration function also permits an unlock state which permits the computer boot up process to complete without entering any password or externally supplied quantity The external memory location is consulted during each boot up sequence, to determine whether the computer has been placed in the locked or in the unlocked state If the security depends upon the supply of the known quantity from an externally connected memory device, the computer will be inoperable to anyone not in possession of the external memory device In the event that the external memory location bearing the locked or unlocked code is removed, the security function assumes the computer to be in the locked state, thus frustrating avoidance of the locked state by tampering with the external memory

...read moreread less

Patent•

Method and apparatus for allocating shared memory resources among a plurality of queues each having a threshold value therefor

[...]

Nanying Yin¹•Institutions (1)

Nortel¹

22 Apr 1996

TL;DR: In this article, a system for allocating shared memory resources among a plurality of queues and discarding incoming data as necessary is presented, where a threshold value is generated for each queue indicating a maximum amount of data to be stored in the associated queue.

...read moreread less

Abstract: A system for allocating shared memory resources among a plurality of queues and discarding incoming data as necessary. The shared memory resources are monitored to determine a number of available memory buffers in the shared memory. A threshold value is generated for each queue indicating a maximum amount of data to be stored in the associated queue. Threshold values are updated in response to changes in the number of available memory buffers.

...read moreread less

Patent•

Method and system for fast recovery of a primary store database using selective recovery by data type

[...]

Ingvar Nilsson¹, Zoran Todorovic¹•Institutions (1)

Ericsson¹

06 Feb 1996

TL;DR: In this paper, the authors present an electronic data storage and processing system where non-persistent memory such as random access memory (RAM) stores a database with a first memory section storing semi-permanent data and a second memory section stored transient types of data.

...read moreread less

Abstract: The present invention provides an electronic data storage and processing system where non-persistent memory such as random access memory (RAM) stores a database with a first memory section storing semi-permanent data and a second memory section storing transient types of data. A third memory section in RAM may be used to buffer database transactions relating to the semi-permanent data stored in the first memory section of RAM. At periodic and appropriate checkpoint time intervals, the semi-permanent data currently stored in the first section of RAM are copied or "dumped" onto persistent (disk) memory. Only those database transactions that affect the semi-permanent data stored in the first section of RAM occurring after the most recent checkpoint "dump" are logged onto persistent memory. Database transactions affecting the transient type of data stored in the second portion of RAM are not logged. A recovery processor recovers from a system failure by reloading semi-permanent data is from the persistent memory into the first section of RAM and executing the log. However, in one embodiment, the recovery processor may leave the data in the second section of RAM in the state in which that data exists after the system failure. Considerable time is saved by not logging transient database transactions or executing a log for those transactions when recovering from a system failure.

...read moreread less

Lock-free data structures

[...]

John David Valois

01 Jan 1996

TL;DR: This thesis presents lock-free data structures, algorithms, and memory management techniques for several common abstract data types that are as efficient, if not more so, than conventional approaches, and thus provide a practical alternative to using spin locks.

...read moreread less

Abstract: Data structures which are shared among concurrent processes require some sort of synchronization in order to avoid becoming corrupted by conflicting updates and to ensure that the processes see correct results. This can be accomplished through mutual exclusion; guaranteeing a process exclusive access while performing critical operations on the data structure. While well understood, this approach can have detrimental effects on performance in an asynchronous environment where processes can suffer unpredictable delays. An alternative approach is to avoid the use of mutual exclusion through the use of simple synchronization primitives such as Compare-and-Swap. Such lock-free data structures can be immune from performance degradation due to slow processes. Universal methods for constructing lock-free data structures for any abstract data type are known, but the resulting implementations are much less efficient than using conventional techniques for mutual exclusion such as spin locks. In this thesis, we present lock-free data structures, algorithms, and memory management techniques for several common abstract data types. Our techniques result in implementations that are as efficient, if not more so, than conventional approaches, and thus provide a practical alternative to using spin locks. We demonstrate the efficiency of our techniques experimentally, and we also show how standard axiomatic formal proof methods can be adapted for the verification of our algorithms.

...read moreread less

Proceedings Article•DOI•

Power exploration for data dominated video applications

[...]

Sven Wuytack¹, Francky Catthoor¹, Lode Nachtergaele¹, H. De Man¹•Institutions (1)

Katholieke Universiteit Leuven¹

12 Aug 1996

TL;DR: This formalized methodology is based on the observation that for this type of application the power consumption is dominated by the memory architecture and the first exploration stage should be to come up with an optimized memory organisation.

...read moreread less

Abstract: In this paper we present our power exploration methodology for data dominated video applications. This formalized methodology is based on the observation that for this type of application the power consumption is dominated by the memory architecture. Hence, the first exploration stage should be to come up with an optimized memory organisation. Other important observations are that the power consumption of the address generators is of the same magnitude as that of the data-paths and that the address generators are better optimized using specialized techniques.

...read moreread less

Proceedings Article•DOI•

Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors

[...]

Mark Horowitz¹, Margaret Martonosi², Todd C. Mowry³, Michael D. Smith⁴•Institutions (4)

Stanford University¹, Princeton University², University of Toronto³, Harvard University⁴

01 May 1996

TL;DR: It is demonstrated that the runtime overhead of invoking the informing mechanism on the Alpha 21164 and MIPS R10000 processors is generally small enough to provide considerable flexibility to hardware and software designers, and that the cache coherence application has improved performance compared to other current solutions.

...read moreread less

Abstract: Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism for observing and reacting to memory behavior directly. To fill this need, we propose a new class of memory operations called informing memory operations, which essentially consist of a memory operation combined (either implicitly or explicitly) with a conditional branch-and-link operation that is taken only if the reference suffers a cache miss. We describe two different implementations of informing memory operations---one based on a cache-outcome condition code and another based on low-overhead traps---and find that modern in-order-issue and out-of-order-issue superscalar processors already contain the bulk of the necessary hardware support. We describe how a number of software-based memory optimizations can exploit informing memory operations to enhance performance, and look at cache coherence with fine-grained access control as a case study. Our performance results demonstrate that the runtime overhead of invoking the informing mechanism on the Alpha 21164 and MIPS R10000 processors is generally small enough to provide considerable flexibility to hardware and software designers, and that the cache coherence application has improved performance compared to other current solutions. We believe that the inclusion of informing memory operations in future processors may spur even more innovative performance optimizations.

...read moreread less

Patent•

MPEG decoder frame memory interface which is reconfigurable for different frame store architectures

[...]

Srinivasa R. Malladi¹, Surya Varansi², Vanya Amla¹•Institutions (2)

LSI Corporation¹, Avago Technologies²

14 Jun 1996

TL;DR: In this paper, the authors present an MPEG decoder system and a method for decoding frames of a video sequence, which includes various slave devices which access a single external memory, wherein these slave devices include reconstruction logic or motion compensation logic, a reference frame buffer, display logic, prefetch buffer, and host bitstream logic, among others.

...read moreread less

Abstract: A frame memory interface architecture which is easily adaptable to interface to any of a plurality of frame memory storage architectures. In the preferred embodiment, the present invention comprises an MPEG decoder system and method for decoding frames of a video sequence. The MPEG decoder includes various slave devices which access a single external memory, wherein these slave devices include reconstruction logic or motion compensation logic, a reference frame buffer, display logic, a prefetch buffer, and host bitstream logic, among others. Each of the slave devices is capable of storing or retrieving data to/from the memory according to different frame storage formats, such as a scan line format, a tiled format, and a skewed tile format, among others. The frame memory interface is easily re-configurable to each of these different formats, thus providing improved efficiency according to the present invention. The slave device then generates a request to the memory controller. In response to the request, the memory controller reads the memory transfer values stored by the slave device and sets up an address generation process based on the memory transfer values. The memory controller then generates addresses to the memory according to this address generation process to perform the memory transfer based on the memory transfer values.

...read moreread less

Journal Article•DOI•

A high performance memory allocator for object-oriented systems

[...]

J.M. Chang¹, Edward F. Gehringer²•Institutions (2)

Illinois Institute of Technology¹, North Carolina State University²

01 Mar 1996-IEEE Transactions on Computers

TL;DR: A simple hardware design for buddy-system allocation that takes advantage of the speed of a pure combinational-logic implementation and uses memory more efficiently than the standard software approach is presented.

...read moreread less

Abstract: Object-oriented programming languages tend to allocate and deallocate blocks of memory very frequently. The growing popularity of these languages increases the importance of high-performance memory allocation. For speed and simplicity in memory allocation, the buddy system has been the method of choice for nearly three decades. A software realization incurs the overhead of internal fragmentation and of memory traffic due to splitting and coalescing memory blocks. This paper presents a simple hardware design for buddy-system allocation that takes advantage of the speed of a pure combinational-logic implementation. Two binary trees formed by anding and oring propagate information about the allocation status of blocks and subblocks. They implement a nonbacktracking search for the address of the first free block that is large enough to satisfy a request. Although the buddy system may allocate a block that is much larger than the requested size, the logic that finds a free block can be augmented by a "bit-flipper" to relinquish the unused portion at the end of the block. This effectively eliminates internal fragmentation. Simulation results show that the buddy system modified in this way uses less memory in most, though not all, programs than the unmodified buddy. Hence, the hardware buddy-system allocator is faster and uses memory more efficiently than the standard software approach.

...read moreread less

Proceedings Article•DOI•

MGS: A Multigrain Shared Memory System

[...]

Donald Yeung¹, John Kubiatowicz¹, Anant Agarwal¹•Institutions (1)

Massachusetts Institute of Technology¹

01 May 1996

TL;DR: This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS, and finds that unmodified shared memory applications can exploit multigrain sharing.

...read moreread less

Abstract: Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. We call these systems Distributed Scalable Shared-memory Multiprocessors (DSSMPs).This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS. Multigrain shared memory enables the collaboration of hardware and software shared memory, and is effective at exploiting a form of locality called multigrain locality. The system provides efficient support for fine-grain cache-line sharing, and resorts to coarse-grain page-level sharing only when locality is violated. A framework for characterizing application performance on DSSMPs is also introduced.Using MGS, an in-depth study of several shared memory applications is conducted to understand the behavior of DSSMPs. We find that unmodified shared memory applications can exploit multigrain sharing. Keeping the number of processors fixed, applications execute up to 85% faster when each DSSMP node is a multiprocessor as opposed to a uniprocessor. We also show that tightly-coupled multiprocessors hold a significant performance advantage over DSSMPs on unmodified applications. However, a best-effort implementation of a kernel from one of the applications allows a DSSMP to almost match the performance of a tightly-coupled multiprocessor.

...read moreread less

Patent•

Method and apparatus for consolidated buffer handling for computer device input/output

[...]

David Frank Harrison¹, Russell T. Williams¹, Thomas Eugene Saulpaugh¹•Institutions (1)

Apple Inc.¹

03 May 1996

TL;DR: In this paper, the I/O preparation table is set which defines the addresses and the type of addresses of the buffer, the size of the data to be transferred, the page size of buffer, and flags defining data flow and type.

...read moreread less

Abstract: A computer has a device driver and an operating system that call a consolidated buffer service routine to coordinate the transfer of data between a main memory and an external device. The consolidated buffer service routine includes a memory preparation service routine and a memory checking service routine. The memory preparation service routine coordinates data transfers between the external device and the memory with the operating system and a data cache, and with other data transfers. The memory preparation service routine ensures that the buffer remains assigned to the memory ranges until the memory checking service routine relinquishes the buffer. Before calling the memory preparation service routine, an I/O preparation table is set which defines the addresses and the type of addresses of the buffer, the size of the data to be transferred, the page size of the buffer, and flags defining data flow and type.

...read moreread less

Patent•

A sectored virtual memory management system and translation look-aside buffer (TLB) for the same

[...]

Alan H. Karp¹, Rajiv Gupta¹•Institutions (1)

Hewlett-Packard¹

30 May 1996

TL;DR: In this article, a memory management system is described which divides each virtual page into two or more sectors, each of these sectors can then be individually loaded into memory in order to reduce bandwidth consumed loading virtual pages into a physical memory.

...read moreread less

Abstract: A memory management system is described which divides each virtual page into two or more sectors. Each of these sectors can then be individually loaded into memory in order to reduce bandwidth consumed loading virtual pages into a physical memory. A TLB (10) for this system includes a plurality of TLB entries (12). Each TLB entry includes a variable physical page number (PPN FIELD) (18) and a variable length presence field (20). Each bit of the presence field indicates whether a corresponding sector is present in physical memory. The TLB entry (12) also includes a page size field (22), which indicates the size of the corresponding virtual page. This size field also indirectly controls the number of sectors within that page and, thus, the number of presence bits required. As the page size grows the number of bits required to store the physical page number reduces. These unused bits are then consumed by additional presence bits so that all the bits in the TLB entry are used for all page sizes and number of sectors.

...read moreread less

Patent•

Memory management for an MPEG2 compliant decoder

[...]

Dennis P. Cheney¹, Mark Louis Ciacelli¹, Steven B. Herndon¹, John David Myers¹, Chuck H. Ngai¹ - Show less +1 more•Institutions (1)

IBM¹

19 Mar 1996

TL;DR: Disclosed as discussed by the authors is a digital signal decoder system for receiving compressed encoded digitized video signals and transmitting decompressed decoded digital video signals with a minimum of DRAM demand through the use of a Spill Buffer.

...read moreread less

Abstract: Disclosed is a digital signal decoder system for receiving compressed encoded digitized video signals and transmitting decompressed decoded digital video signals. This is accomplished with a minimum of DRAM demand through the use of a Spill Buffer.

...read moreread less

Patent•

Computer system using software controlled power management method with respect to the main memory according to a program's main memory utilization states

[...]

Hideki Yoshida¹•Institutions (1)

Toshiba¹

29 Nov 1996

TL;DR: In this article, the authors present a power management method capable of realizing software control of the power supply with respect to the main memory according to a program's main memory utilization state, which is suitable for the power saving in a portable information terminal.

...read moreread less

Abstract: A computer system and its power management method capable of realizing a software control of the power supply with respect to the main memory according to a program's main memory utilization state, which are suitable for the power saving in a portable information terminal. In a computer system having a main memory device formed by a plurality of memory banks, a power source for supplying a power to operate the main memory device, and a processor for executing programs and managing allocation and release of memory regions on the main memory device with respect to the programs, a memory power management function determines any unused memory bank in which all memory regions are currently unused according to a state of the allocation and release of the memory regions on the main memory devices with respect to the programs managed by the processor, and selectively stops a power supply from the power source to the unused memory bank while supplying the power from the power source to remaining memory banks other than the unused memory bank.

...read moreread less

Collapse