
Showing papers on "Memory management" published in 1998


Patent
19 Aug 1998
TL;DR: A transformation identification system for identifying transformations of cell addresses between different memory device topologies is proposed, minimizing the memory space and time required for storing and computing defect data while offering a flexible, user-friendly interface and a simplified transformation procedure.
Abstract: A transformation identification system identifies transformations of cell addresses between different memory device topologies. It minimizes the memory space and time required for storing and computing defect data, and its flexible approach offers a user-friendly interface and simplifies the transformation procedure.

921 citations


Book
11 Feb 1998
TL;DR: This book discusses the role of the device driver, the kernel's classes of devices and modules, and more, including how mounting and unmounting work.
Abstract: Preface.
Chapter 1. An Introduction to Device Drivers The Role of the Device Driver Splitting the Kernel Classes of Devices and Modules Security Issues Version Numbering License Terms Joining the Kernel Development Community Overview of the Book.
Chapter 2. Building and Running Modules Kernel Modules Versus Applications Compiling and Loading The Kernel Symbol Table Initialization and Shutdown Using Resources Automatic and Manual Configuration Doing It in User Space Backward Compatibility Quick Reference.
Chapter 3. Char Drivers The Design of scull Major and Minor Numbers File Operations The file Structure open and release scull's Memory Usage A Brief Introduction to Race Conditions read and write Playing with the New Devices The Device Filesystem Backward Compatibility Quick Reference.
Chapter 4. Debugging Techniques Debugging by Printing Debugging by Querying Debugging by Watching Debugging System Faults Debuggers and Related Tools.
Chapter 5. Enhanced Char Driver Operations ioctl Blocking I/O poll and select Asynchronous Notification Seeking a Device Access Control on a Device File Backward Compatibility Quick Reference.
Chapter 6. Flow of Time Time Intervals in the Kernel Knowing the Current Time Delaying Execution Task Queues Kernel Timers Backward Compatibility Quick Reference.
Chapter 7. Getting Hold of Memory The Real Story of kmalloc Lookaside Caches get_free_page and Friends vmalloc and Friends Boot-Time Allocation Backward Compatibility Quick Reference.
Chapter 8. Hardware Management I/O Ports and I/O Memory Using I/O Ports Using Digital I/O Ports Using I/O Memory Backward Compatibility Quick Reference.
Chapter 9. Interrupt Handling Overall Control of Interrupts Preparing the Parallel Port Installing an Interrupt Handler Implementing a Handler Tasklets and Bottom-Half Processing Interrupt Sharing Interrupt-Driven I/O Race Conditions Backward Compatibility Quick Reference.
Chapter 10. Judicious Use of Data Types Use of Standard C Types Assigning an Explicit Size to Data Items Interface-Specific Types Other Portability Issues Linked Lists Quick Reference.
Chapter 11. kmod and Advanced Modularization Loading Modules on Demand Intermodule Communication Version Control in Modules Backward Compatibility Quick Reference.
Chapter 12. Loading Block Drivers Registering the Driver The Header File blk.h Handling Requests: A Simple Introduction Handling Requests: The Detailed View How Mounting and Unmounting Works The ioctl Method Removable Devices Partitionable Devices Interrupt-Driven Block Drivers Backward Compatibility Quick Reference.
Chapter 13. mmap and DMA Memory Management in Linux The mmap Device Operation The kiobuf Interface Direct Memory Access and Bus Mastering Backward Compatibility Quick Reference.
Chapter 14. Network Drivers How snull Is Designed Connecting to the Kernel The net_device Structure in Detail Opening and Closing Packet Transmission Packet Reception The Interrupt Handler Changes in Link State The Socket Buffers MAC Address Resolution Custom ioctl Commands Statistical Information Multicasting Backward Compatibility Quick Reference.
Chapter 15. Overview of Peripheral Buses The PCI Interface A Look Back: ISA PC/104 and PC/104+ Other PC Buses SBus NuBus External Buses Backward Compatibility Quick Reference.
Chapter 16. Physical Layout of the Kernel Source Booting the Kernel Before Booting The init Process The kernel Directory The fs Directory The mm Directory The net directory ipc and lib include and arch Drivers.
Glossary.
Index

549 citations


Journal ArticleDOI
TL;DR: This work discusses the main additions Titanium makes to Java (immutable classes, multidimensional arrays, an explicitly parallel SPMD model of computation with a global address space, and zone-based memory management) and reports progress on the development of the system.
Abstract: Titanium is a language and system for high-performance parallel scientific computing. Titanium uses Java as its base, thereby leveraging the advantages of that language and allowing us to focus attention on parallel computing issues. The main additions to Java are immutable classes, multidimensional arrays, an explicitly parallel SPMD model of computation with a global address space, and zone-based memory management. We discuss these features and our design approach, and report progress on the development of Titanium, including our current driving application: a three-dimensional adaptive mesh refinement parallel Poisson solver. © 1998 John Wiley & Sons, Ltd.

433 citations



Proceedings ArticleDOI
01 May 1998
TL;DR: It is shown that on a suite of allocation-intensive C programs, regions are competitive with malloc/free and sometimes substantially faster and that regions support safe memory management with low overhead.
Abstract: Much research has been devoted to studies of and algorithms for memory management based on garbage collection or explicit allocation and deallocation. An alternative approach, region-based memory management, has been known for decades, but has not been well-studied. In a region-based system each allocation specifies a region, and memory is reclaimed by destroying a region, freeing all the storage allocated therein. We show that on a suite of allocation-intensive C programs, regions are competitive with malloc/free and sometimes substantially faster. We also show that regions support safe memory management with low overhead. Experience with our benchmarks suggests that modifying many existing programs to use regions is not difficult.
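For readers unfamiliar with the approach, a minimal region (arena) allocator can be sketched in a few lines of C. This is an illustration of the general technique, not the system benchmarked in the paper; the chunk size and alignment are assumptions.

```c
#include <stdlib.h>

/* Minimal region ("arena") allocator sketch -- illustrative only. */
typedef struct chunk {
    struct chunk *next;
    size_t used, cap;
    char data[];                              /* storage follows the header */
} Chunk;

typedef struct { Chunk *head; } Region;

static Chunk *chunk_new(size_t cap) {
    Chunk *c = malloc(sizeof(Chunk) + cap);
    if (c) { c->next = NULL; c->used = 0; c->cap = cap; }
    return c;
}

void region_init(Region *r) { r->head = NULL; }

/* Allocate from the current chunk; grab a new chunk when it is full. */
void *region_alloc(Region *r, size_t n) {
    n = (n + 7) & ~(size_t)7;                 /* assumed 8-byte alignment */
    if (!r->head || r->head->cap - r->head->used < n) {
        size_t cap = n > 4096 ? n : 4096;     /* assumed default chunk size */
        Chunk *c = chunk_new(cap);
        if (!c) return NULL;
        c->next = r->head;
        r->head = c;
    }
    void *p = r->head->data + r->head->used;
    r->head->used += n;
    return p;
}

/* Destroying the region reclaims every allocation made in it at once. */
void region_destroy(Region *r) {
    for (Chunk *c = r->head; c; ) {
        Chunk *next = c->next;
        free(c);
        c = next;
    }
    r->head = NULL;
}
```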

212 citations


Journal ArticleDOI
TL;DR: This work proposes a novel scheme called dynamic threshold (DT) that combines the simplicity of ST and the adaptivity of PO, and uses computer simulation to compare the loss performance of DT, ST, and PO.
Abstract: In shared-memory packet switches, buffer management schemes can improve overall loss performance, as well as fairness, by regulating the sharing of memory among the different output port queues. Of the conventional schemes, static threshold (ST) is simple but does not adapt to changing traffic conditions, while pushout (PO) is highly adaptive but difficult to implement. We propose a novel scheme called dynamic threshold (DT) that combines the simplicity of ST and the adaptivity of PO. The key idea is that the maximum permissible length, for any individual queue at any instant of time, is proportional to the unused buffering in the switch. A queue whose length equals or exceeds the current threshold value may accept no more arrivals. An analysis of the DT algorithm shows that a small amount of buffer space is (intentionally) left unallocated, and that the remaining buffer space becomes equally distributed among the active output queues. We use computer simulation to compare the loss performance of DT, ST, and PO. DT control is shown to be more robust to uncertainties and changes in traffic conditions than ST control.
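The admission rule is compact enough to state in code. The C sketch below restates it under assumed names, with ALPHA as the proportionality constant; it is not the authors' simulator.

```c
/* Sketch of the dynamic-threshold (DT) admission rule.  Buffer size,
 * port count, and ALPHA are illustrative assumptions. */
#define NPORTS  8
#define BUFSIZE 10000          /* total shared buffer, in cells */
#define ALPHA   1.0            /* threshold = ALPHA * unused buffer space */

static int qlen[NPORTS];       /* current length of each output queue */
static int total_occupancy;    /* sum of all queue lengths */

/* Accept a cell for output port p only while its queue length is below
 * the current dynamic threshold; otherwise drop it. */
int dt_admit(int p) {
    double threshold = ALPHA * (BUFSIZE - total_occupancy);
    if (total_occupancy >= BUFSIZE || qlen[p] >= threshold)
        return 0;              /* drop */
    qlen[p]++;
    total_occupancy++;
    return 1;                  /* accept */
}

/* Called when a cell departs from output port p. */
void dt_depart(int p) {
    if (qlen[p] > 0) { qlen[p]--; total_occupancy--; }
}
```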

211 citations


Patent
09 Oct 1998
TL;DR: In this article, a memory device with multiple clock domains is presented; the different domains are sequentially turned on as needed to limit the power consumed, and the turn-on time is overlapped with the latency of the memory access to make the power control transparent to the user accessing the memory core.
Abstract: A memory device with multiple clock domains. Separate clocks to different portions of the control circuitry create different clock domains. The different domains are sequentially turned on as needed to limit the power consumed. The turn on time of the domains is overlapped with the latency for the memory access to make the power control transparent to the user accessing the memory core. The memory device can dynamically switch between a fast and a slow clock depending upon the needed data bandwidth. The data bandwidth across the memory interface can be monitored by the memory controller, and when it drops below a certain threshold, a slower clock can be used. The clock speed can be dynamically increased as the bandwidth demand increases.

166 citations


01 Jan 1998
TL;DR: The authors propose a zero copy message transfer with a pin-down cache technique which reuses the pinned-down area to decrease the number of calls to pin-down and release primitives.
Abstract: The overhead of copying data through the central processor by a message passing protocol limits data transfer bandwidth. If the network interface directly transfers the user's memory to the network by issuing DMA, such data copies may be eliminated. Since the DMA facility accesses the physical memory address space, user virtual memory must be pinned down to a physical memory location before the message is sent or received. If each message transfer involves pin-down and release kernel primitives, message transfer bandwidth will decrease since those primitives are quite expensive. The authors propose a zero copy message transfer with a pin-down cache technique which reuses the pinned-down area to decrease the number of calls to pin-down and release primitives. The proposed facility has been implemented in the PM low-level communication library on the RWC PC Cluster II, consisting of 64 Pentium Pro 200 MHz CPUs connected by a Myricom Myrinet network, and running NetBSD. The PM achieves 108.8 MBytes/sec for a 100% pin-down cache hit ratio and 78.7 MBytes/sec when all transfers miss the pin-down cache. The MPI library has been implemented on top of PM. According to the NAS Parallel Benchmarks results, an application still obtains better performance even when the pin-down cache miss ratio is very high.
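A rough sketch of a pin-down cache in C is shown below, with mlock/munlock standing in for the kernel pin-down and release primitives; the table size, lookup, and eviction policy are illustrative assumptions rather than the PM implementation.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Pin-down cache sketch: reuse already-pinned regions so that the
 * expensive pin/unpin kernel calls are not made on every transfer. */
#define CACHE_SLOTS 64

typedef struct { void *addr; size_t len; int valid; } PinEntry;
static PinEntry cache[CACHE_SLOTS];

/* Return nonzero if [addr, addr+len) is already pinned (cache hit). */
static int lookup(void *addr, size_t len) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].addr == addr && cache[i].len >= len)
            return 1;
    return 0;
}

/* Ensure the buffer is pinned before DMA; only pin on a cache miss. */
int ensure_pinned(void *addr, size_t len) {
    if (lookup(addr, len))
        return 0;                      /* hit: no kernel call needed */
    if (mlock(addr, len) != 0)         /* miss: pin the region */
        return -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].valid) {
            cache[i] = (PinEntry){ addr, len, 1 };
            return 0;
        }
    }
    /* Cache full: evict slot 0 (a real cache would choose a victim more
     * carefully, e.g. least recently used) and release its pinning. */
    munlock(cache[0].addr, cache[0].len);
    cache[0] = (PinEntry){ addr, len, 1 };
    return 0;
}
```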

140 citations


Proceedings ArticleDOI
01 Oct 1998
TL;DR: Comparing several virtual memory designs, including combinations of hierarchical and inverted page tables on hardware-managed and software-managed translation lookaside buffers (TLBs), shows that systems are fairly sensitive to TLB size and that VM overhead is roughly twice what was thought.
Abstract: Virtual memory is a staple in modern systems, though there is little agreement on how its functionality is to be implemented on either the hardware or software side of the interface. The myriad of design choices and incompatible hardware mechanisms suggests potential performance problems, especially since increasing numbers of systems (even embedded systems) are using memory management. A comparative study of the implementation choices in virtual memory should therefore aid system-level designers. This paper compares several virtual memory designs, including combinations of hierarchical and inverted page tables on hardware-managed and software-managed translation lookaside buffers (TLBs). The simulations show that systems are fairly sensitive to TLB size; that interrupts already account for a large portion of memory-management overhead and can become a significant factor as processors execute more concurrent instructions; and that if one includes the cache misses inflicted on applications by the VM system, the total VM overhead is roughly twice what was thought (10--20% rather than 5--10%).

130 citations


Patent
Kyle R. Johns1
17 Jul 1998
TL;DR: The virtual frame buffer controller maintains a data structure, called a pointer list, to keep track of the physical memory location and compression state of each block of pixels in the virtual buffer as discussed by the authors.
Abstract: A virtual frame buffer controller in a computer's display system manages accesses to a display image stored in discrete compressed and uncompressed blocks distributed in physical memory. The controller maps conventional linear pixel addresses of a virtual frame buffer to pixel locations within blocks stored at arbitrary places in physical memory. The virtual frame buffer controller maintains a data structure, called a pointer list, to keep track of the physical memory location and compression state of each block of pixels in the virtual frame buffer. The virtual frame buffer controller initiates a decompression process to decompress a block when a pixel request maps to a pixel in a compressed block. The block remains decompressed until physical memory needs to be reclaimed to free up memory. A software driver for the virtual frame buffer controller performs memory management functions, including adding to a free memory list when the virtual frame buffer requires more memory and reclaiming memory previously allocated to a block of pixels whose state has changed from a compressed to an uncompressed state, or from a decompressed back to a compressed state.
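A minimal sketch of the pointer-list data structure follows, assuming 8x8 pixel blocks and invented field names; the patent's actual layout is not reproduced here.

```c
#include <stdint.h>

/* Illustrative "pointer list": one entry per pixel block records where
 * the block lives in physical memory and whether it is currently
 * compressed.  Block size, frame buffer dimensions, and field names are
 * assumptions. */
#define BLOCK_W 8
#define BLOCK_H 8
#define FB_W    1024          /* virtual frame buffer width in pixels  */
#define FB_H    768           /* virtual frame buffer height in pixels */

typedef struct {
    uint32_t phys_addr;       /* physical location of the pixel block    */
    uint8_t  compressed;      /* 1 = stored compressed, 0 = decompressed */
} BlockEntry;

static BlockEntry pointer_list[(FB_H / BLOCK_H) * (FB_W / BLOCK_W)];

/* Map a linear pixel address (x, y) in the virtual frame buffer to the
 * pointer-list entry for the block that contains it. */
BlockEntry *lookup_block(uint32_t x, uint32_t y) {
    uint32_t bx = x / BLOCK_W, by = y / BLOCK_H;
    return &pointer_list[by * (FB_W / BLOCK_W) + bx];
}
```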

127 citations


Journal ArticleDOI
TL;DR: The surprisingly large design freedom available for the basic problem is explored in-depth and the outline of a systematic solution methodology is proposed and the efficiency of the methodology is illustrated on a real-life motion estimation application.
Abstract: Efficient use of an optimized custom memory hierarchy to exploit temporal locality in the data accesses can have a very large impact on the power consumption in data dominated applications. In the past, experiments have demonstrated that this task is crucial in a complete low-power memory management methodology. But effective formalized techniques to deal with this specific task have not been addressed yet. In this paper, the surprisingly large design freedom available for the basic problem is explored in-depth and the outline of a systematic solution methodology is proposed. The efficiency of the methodology is illustrated on a real-life motion estimation application. The results obtained for this application show power reductions of about 85% for the memory subsystem compared to the case without a custom memory hierarchy. These large gains justify that data reuse and memory hierarchy decisions should be taken early in the design flow.

Journal ArticleDOI
TL;DR: This survey considers the memory management designs of a sampling of six recent processors, focusing primarily on their architectural differences, and hints at optimizations that someone designing or porting system software might want to consider.
Abstract: Here, we consider the memory management designs of a sampling of six recent processors, focusing primarily on their architectural differences, and hint at optimizations that someone designing or porting system software might want to consider. We selected examples from the most popular commercial microarchitectures: the MIPS R10000, Alpha 21164, PowerPC 604, PA-8000, UltraSPARC-I, and Pentium II. This survey describes how each processor architecture supports the common features of virtual memory: address space protection, shared memory, and large address spaces.

Journal ArticleDOI
01 May 1998
TL;DR: A technique of partial data expansion is presented that, with the help of algebraic techniques from the polytope model, leaves the performance of the parallelization process untouched.
Abstract: This article deals with automatic parallelization of static control programs. During the parallelization process the removal of memory-related dependences is usually performed by translating the original program into single assignment form. This total data expansion has a very high memory cost. We present a technique of partial data expansion which, with the help of algebraic techniques from the polytope model, leaves the performance of the parallelization process untouched.

Patent
23 Jun 1998
TL;DR: A storage management system for a Redundant Array of Independent Disks (RAID) data storage system and an AutoRAID memory transaction manager for a disk array controller are disclosed in this paper.
Abstract: A storage management system for a Redundant Array of Independent Disks (RAID) data storage system and an AutoRAID memory transaction manager for a disk array controller are disclosed. The disk array controller enables a consistent, coherent memory image of the data storage space to all processors across hot-plug interfaces. To external processes seeking to read or write data, the memory image looks the same across the hot-plug interface. The disk array controller has two identical controllers, each with its own non-volatile memory, to maintain redundant images of disk array storage space. A hot-plug interface interconnects the two controllers. Each controller has an AutoRAID memory transaction manager that enables sharing of cyclic redundancy check (CRC)-protected memory transactions over the hot-plug interface between the two controllers. The AutoRAID memory transaction managers also have transaction queues which facilitate ordered execution of the memory transactions regardless of which controller originated the transactions. The AutoRAID transaction manager includes first and second bus interfaces, a mirror entity, and a local memory interface. Mirrored read and write transactions are handled atomically across the hot-plug interface.

Proceedings ArticleDOI
01 Oct 1998
TL;DR: This work investigated how to build an allocator that is not only fast and memory efficient but also scales well on SMP machines, and designed and prototyped a new allocator, called LKmalloc, targeted for both traditional applications and server applications.
Abstract: Prior work on dynamic memory allocation has largely neglected long-running server applications, for example, web servers and mail servers. Their requirements differ from those of one-shot applications like compilers or text editors. We investigated how to build an allocator that is not only fast and memory efficient but also scales well on SMP machines. We found that it is not sufficient to focus on reducing lock contention - higher speedups require a reduction in cache misses and bus traffic. We then designed and prototyped a new allocator, called LKmalloc, targeted for both traditional applications and server applications. LKmalloc uses several subheaps, each one with a separate set of free lists and memory arena. A thread always allocates from the same subheap but can free a block belonging to any subheap. A thread is assigned to a subheap by hashing on its thread ID. We compared its performance with several other allocators on a server-like, simulated workload and found that it indeed scales well, is quite fast, and uses memory more efficiently.
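The subheap-selection idea can be sketched as follows. This is an illustration, not LKmalloc itself: malloc/free stand in for the per-subheap free lists and arena, and the thread-ID hash is an assumption.

```c
#include <pthread.h>
#include <stdlib.h>

#define NSUBHEAPS 16

typedef struct {
    pthread_mutex_t lock;
    /* per-subheap free lists and memory arena would live here */
} SubHeap;

static SubHeap subheaps[NSUBHEAPS];

void subheaps_init(void) {
    for (int i = 0; i < NSUBHEAPS; i++)
        pthread_mutex_init(&subheaps[i].lock, NULL);
}

/* Select the caller's subheap by hashing its thread ID (the cast is an
 * illustrative shortcut; pthread_t is opaque in general). */
static SubHeap *my_subheap(void) {
    unsigned long id = (unsigned long)pthread_self();
    return &subheaps[(id >> 4) % NSUBHEAPS];
}

/* A thread always allocates from its own subheap... */
void *lk_alloc(size_t n) {
    SubHeap *h = my_subheap();
    pthread_mutex_lock(&h->lock);
    void *p = malloc(n);            /* stand-in for the subheap arena */
    pthread_mutex_unlock(&h->lock);
    return p;
}

/* ...but may free a block belonging to any subheap.  A real allocator
 * would find the owning subheap from the block header and return the
 * block to that subheap's free list. */
void lk_free(void *p) {
    free(p);
}
```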

Patent
05 Feb 1998
TL;DR: The SEmulation system as discussed by the authors provides four modes of operation: (1) Software Simulation, (2) Simulation via Hardware Acceleration, (3) In-Circuit Emulation (ICE), and (4) Post-Simulation Analysis.
Abstract: The SEmulation system provides four modes of operation: (1) Software Simulation, (2) Simulation via Hardware Acceleration, (3) In-Circuit Emulation (ICE), and (4) Post-Simulation Analysis. At a high level, the present invention may be embodied in each of the above four modes or various combinations of these modes. At the core of these modes is a software kernel which controls the overall operation of this system. The main control loop of the kernel executes the following steps: initialize system, evaluate active test-bench processes/components, evaluate clock components, detect clock edge, update registers and memories, propagate combinational components, advance simulation time, and continue the loop as long as active test-bench processes are present. The Memory Mapping aspect of the invention provides a structure and scheme where the numerous memory blocks associated with the user's design are mapped into the SRAM memory devices in the Simulation system instead of inside the logic devices, which are used to configure and model the user's design. The Memory Mapping or Memory Simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the Simulation system, and (3) the FPGA logic devices which contain the configured and programmed user design that is being debugged.
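The kernel's main control loop, as listed in the abstract, might be rendered roughly as follows; the step functions are hypothetical placeholders, stubbed out so the sketch compiles.

```c
#include <stdbool.h>

/* Placeholder steps standing in for the operations named in the abstract. */
static void initialize_system(void) {}
static void evaluate_testbench_components(void) {}
static void evaluate_clock_components(void) {}
static bool clock_edge_detected(void) { return false; }
static void update_registers_and_memories(void) {}
static void propagate_combinational_components(void) {}
static void advance_simulation_time(void) {}
static bool testbench_active(void) { return false; }

void kernel_main_loop(void) {
    initialize_system();
    do {
        evaluate_testbench_components();
        evaluate_clock_components();
        if (clock_edge_detected()) {          /* only on a clock edge... */
            update_registers_and_memories();  /* ...update state elements */
            propagate_combinational_components();
        }
        advance_simulation_time();
    } while (testbench_active());             /* loop while test-bench processes remain */
}
```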

Patent
18 Feb 1998
TL;DR: In this article, a co-processor is adapted to use virtual memory with a host processor, and a host memory is coupled to the host processor to implement the virtual memory; the virtual memory table is stored in two or more non-contiguously addressable regions of the host memory.
Abstract: The present invention relates to a method, apparatus and system for managing virtual memory, in which a co-processor ( 224 ) is adapted to use virtual memory with a host processor ( 202 ). A host memory ( 203 ) is coupled to the host processor ( 202 ) to implement the virtual memory. The co-processor ( 224 ) includes a virtual-physical memory mapping device ( 915 ) for interrogating a virtual memory table and for mapping one or more virtual memory addresses ( 880 ) requested by the co-processor ( 224 ) into corresponding physical addresses ( 873 ) in the host memory ( 203 ). The virtual memory table is stored in two or more non-contiguously addressable regions of the host memory ( 203 ), and is preferably a page table. The memory mapping device ( 915 ) further includes a multiple-entry translation lookaside buffer ( 889 ) for caching virtual-to-physical address mappings ( 872 ), where entries in the buffer ( 889 ) are replaced on a least recently used replacement basis. The memory mapping device ( 915 ) also includes devices ( 901 ) for comparing, replacing, singly invalidating and multiply invalidating one or more entries of the translation lookaside buffer ( 889 ). It also includes a hashing device ( 892 ) for, upon an occurrence of a miss in the translation lookaside buffer ( 889 ), hashing a virtual memory address ( 880 ) using a hash function to produce an index into the virtual memory table.
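A rough C sketch of the translation path described above is given below; the TLB size, hash function, LRU bookkeeping, and the stubbed page-table walk are all illustrative assumptions, not the patented design.

```c
#include <stdint.h>

#define TLB_ENTRIES 16
#define PAGE_SHIFT  12
#define PT_BUCKETS  1024

typedef struct { uint32_t vpn, pfn; int valid; unsigned age; } TlbEntry;
static TlbEntry tlb[TLB_ENTRIES];
static unsigned now;

/* Simple multiplicative hash of the virtual page number. */
static uint32_t hash_vpn(uint32_t vpn) {
    return (vpn * 2654435761u) % PT_BUCKETS;
}

/* Placeholder for fetching a mapping from the page table in host memory. */
static uint32_t page_table_walk(uint32_t bucket) { return bucket; }

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT, off = vaddr & ((1u << PAGE_SHIFT) - 1);
    int victim = 0;
    now++;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {   /* TLB hit */
            tlb[i].age = now;
            return (tlb[i].pfn << PAGE_SHIFT) | off;
        }
        if (!tlb[i].valid || tlb[i].age < tlb[victim].age)
            victim = i;                            /* track least recently used */
    }
    /* TLB miss: hash the VPN to index the page table, then install the
     * mapping over the least recently used entry. */
    uint32_t pfn = page_table_walk(hash_vpn(vpn));
    tlb[victim] = (TlbEntry){ vpn, pfn, 1, now };
    return (pfn << PAGE_SHIFT) | off;
}
```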

Patent
31 Jul 1998
TL;DR: In this paper, each memory request is processed in part by a plurality of stages, and a request buffer is used to hold each memory request during its processing.
Abstract: A method for processing multiple memory requests in a pipeline. Each memory request is processed in part by a plurality of stages. In a first stage, the memory request is decoded. In a second stage, the address information for the memory request is processed. In a third stage, the data for the memory request is transferred. A request buffer is used to hold each of the memory requests during the processing of each of the memory requests.

Patent
Yuichi Kishida1
18 May 1998
TL;DR: In this article, a memory system having a plurality of banks that form independent interleave groups is described; when a memory error is detected in operating system resident space, the group having the error is interchanged with another group that has not yet had any error.
Abstract: In a memory system having a plurality of banks which form interleave groups for independently forming an interleave, when a memory error is detected in an operating system resident space, the group having the error is interchanged with another group that has not had any error yet. After a group interchange, a page having the error is also deallocated. When a determination is made that the group interchange causes deterioration of performance, a bank deallocation can also be executed. As the criterion for this determination, it is possible to employ a policy under which a bank is deallocated when the capacity of the bank containing the erroneous sub-bank is equal to or less than a predetermined fraction of the total memory capacity and the interleaving factor after the bank deallocation would be less than that of the interchange partner.

Proceedings ArticleDOI
01 Oct 1998
TL;DR: It is shown that using the register allocation's coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program, and speedups from using CCM may be sizable.
Abstract: Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level of success. The primary limit on the compiler's ability to improve memory behavior is its imperfect knowledge about the run-time behavior of the program. The compiler cannot completely predict runtime access patterns. There is an exception to this rule. During the register allocation phase, the compiler often must insert substantial amounts of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than about other memory operations in the program. Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs---a small compiler-controlled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierarchy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traffic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such memories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips. This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocation's coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program.

Proceedings ArticleDOI
01 Jun 1998
TL;DR: Measurements and analysis show that by using available global resources, cooperative prefetching can obtain significant speedups for I/O-bound programs, and that for a graphics rendering application, the PGMS system achieves a speedup of 4.9 over a non-prefetching version of the same program and a 3.1-fold improvement over that program using local-disk prefetching alone.
Abstract: This paper presents cooperative prefetching and caching --- the use of network-wide global resources (memories, CPUs, and disks) to support prefetching and caching in the presence of hints of future demands. Cooperative prefetching and caching effectively unites disk-latency reduction techniques from three lines of research: prefetching algorithms, cluster-wide memory management, and parallel I/O. When used together, these techniques greatly increase the power of prefetching relative to a conventional (non-global-memory) system. We have designed and implemented PGMS, a cooperative prefetching and caching system, under the Digital Unix operating system running on a 1.28 Gb/sec Myrinet-connected cluster of DEC Alpha workstations. Our measurements and analysis show that by using available global resources, cooperative prefetching can obtain significant speedups for I/O-bound programs. For example, for a graphics rendering application, our system achieves a speedup of 4.9 over a non-prefetching version of the same program, and a 3.1-fold improvement over that program using local-disk prefetching alone.

Patent
14 Jul 1998
TL;DR: In this article, an automated, portable, and time-conservative memory test system is presented for identifying test parameters, including type, control line configuration, depth, width, access time, and burst features, of any one of a wide variety of synchronous memories including SDRAMs and SGRAMs, whether an IC chip, bank, board, or module, without requiring hardware modifications or additions to the memory device being identified and without requiring storage of test patterns or characterizing data in the memory device.
Abstract: An automated, portable, and time-conservative memory test system for identifying test parameters, including type, control line configuration, depth, width, access time, and burst features, of any one of a wide variety of synchronous memories including SDRAMs and SGRAMs, whether an IC chip, bank, board, or module, without requiring hardware modifications or additions to the memory device being identified, and without requiring storage of test patterns or characterizing data in the memory device. The tester comprises a 32-bit RISC CPU (80) in electrical communication with address/data/control bus (82) and with processor clock (84) which provides timing for the CPU. ROM (90) provides non-volatile storage for all memory test system operating software programs, and RAM (93) provides temporary and intermediate storage for software programs.

Patent
12 Aug 1998
TL;DR: In this article, an application programming interface (API) enables application programs in a multitasking operating environment to control the allocation of physical memory in a virtual memory system, such that a piece of code or data will remain in physical memory.
Abstract: An application programming interface (API) enables application programs in a multitasking operating environment to control the allocation of physical memory in a virtual memory system. One API function enables applications to designate a soft page lock for code and data. The operating system ensures that the designated code and data is in physical memory when the application has the focus. When the application loses the focus, the pages associated with the code or data are released. When the application regains the focus, the operating system re-loads the pages into physical memory before the application begins to execute. The operating system is allowed to override the soft page lock where necessary. Another API enables applications to designate code or data that should have high priority access to physical memory, without using a lock. This API enables the application to specifically control the likelihood that a piece of code or data will remain in physical memory by assigning a priority to the code or data that defines its priority relative to the priority of other code or data contending for the same physical memory.
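A hypothetical usage sketch follows. The function names and signatures are invented for illustration only and are not the actual API; the stubs merely stand in for calls into the operating system.

```c
#include <stddef.h>

/* Invented stand-ins for the two facilities described above. */
static int SoftPageLock(void *addr, size_t len) {
    (void)addr; (void)len; return 0;          /* stub for the OS call */
}
static int SetMemoryPriority(void *addr, size_t len, int prio) {
    (void)addr; (void)len; (void)prio; return 0;
}

static char mix_buffer[64 * 1024];   /* latency-sensitive data */
static char art_cache[256 * 1024];   /* useful but less critical data */

void setup_memory_hints(void) {
    /* Soft lock: keep the mixing buffer resident while the application
     * has the focus; the OS releases the pages when focus is lost and
     * may override the lock when it must. */
    SoftPageLock(mix_buffer, sizeof mix_buffer);

    /* No lock: just raise the art cache's priority relative to other
     * code and data contending for the same physical memory. */
    SetMemoryPriority(art_cache, sizeof art_cache, /*priority=*/1);
}
```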

Journal ArticleDOI
TL;DR: The authors present the software mechanisms of virtual memory from a hardware perspective and then describe several hardware examples and how they support virtual memory software.
Abstract: Virtual memory was developed to automate the movement of program code and data between main memory and secondary storage to give the appearance of a single large store. This technique greatly simplified the programmer's job, particularly when program code and data exceeded the main memory's size. Virtual memory has now become widely used, and most modern processors have hardware to support it. Unfortunately, there has not been much agreement on the form that this support should take. The result of this lack of agreement is that hardware mechanisms are often completely incompatible. Thus, designers and porters of system level software have two somewhat unattractive choices: they can write software to fit many different architectures or they can insert layers of software to emulate a particular hardware interface. The authors present the software mechanisms of virtual memory from a hardware perspective and then describe several hardware examples and how they support virtual memory software. Their focus is to show the diversity of virtual memory support and, by implication, how this diversity complicates the design and porting of OSs. The authors introduce basic virtual memory technologies and then compare memory management designs in three commercial microarchitectures. They show the diversity of virtual memory support and, by implication, how this diversity can complicate and compromise system operations.

Patent
03 Aug 1998
TL;DR: In this article, a stand-alone memory interface is presented to provide a menu showing multiple ways in which the user's request can be physically configured by varying the number of rows of memory, number of blocks of memory and the column multiplexing factor of the memory array.
Abstract: A compiler methodology including a stand-alone memory interface that provides a user-specified memory device with a required number of words of memory and a required number of bits per word. The stand-alone memory interface is a tool that provides a menu showing multiple ways in which the user's request can be physically configured by varying the number of rows of memory, the number of blocks of memory, and the column multiplexing factor of the memory array. From this menu the user selects the memory configuration that best meets the user's requirements and is provided with either various models or representations (views) of the selected memory configuration or a GDS-format data file. The views can be used to design large-scale integrated circuits in which the memory device is embedded, while the data file is used to generate photomasks for making the memory device as an integrated circuit.

Patent
24 Jun 1998
TL;DR: In this article, the memory management of complex objects is located in an automatically generated client stub routine for a remote procedure call and the interface description language (IDL) for the Remote Procedure Call is extended to incorporate the duration idea for out parameters.
Abstract: Memory for complex objects is maintained in pools of dynamic memory on a “per-duration” basis. Each duration is assigned its own area or areas of the heap, and all the memory allocation for a specific duration comes from those assigned areas of the heap. Memory allocation for a complex object is performed with respect to a single duration and, hence, memory is allotted for the complex object from the corresponding memory pool. When a duration is terminated, the memory allocated for its corresponding heap is freed, thereby releasing memory for all the complex object using the memory from the memory pool for that duration. Management of other resources for complex objects such as opening and closing files may also be duration-based. In one aspect, the memory management of complex objects is located in an automatically generated client stub routine for a remote procedure call. Accordingly, the interface description language (IDL) for the remote procedure call is extended to incorporate the duration idea for out parameters.
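The per-duration idea can be sketched as a simple pool in C; this is an illustration of the concept, not the patented implementation, and the list-of-allocations representation is an assumption.

```c
#include <stdlib.h>

/* Every allocation is charged to a duration; ending the duration frees
 * everything allocated under it. */
typedef struct alloc_node {
    struct alloc_node *next;
    /* payload follows the header (alignment is assumed adequate here) */
} AllocNode;

typedef struct { AllocNode *allocations; } Duration;

void duration_begin(Duration *d) { d->allocations = NULL; }

/* Allocate n bytes charged to duration d. */
void *duration_alloc(Duration *d, size_t n) {
    AllocNode *node = malloc(sizeof(AllocNode) + n);
    if (!node) return NULL;
    node->next = d->allocations;
    d->allocations = node;
    return node + 1;                 /* payload starts after the header */
}

/* Ending the duration releases every object allocated under it. */
void duration_end(Duration *d) {
    for (AllocNode *n = d->allocations; n; ) {
        AllocNode *next = n->next;
        free(n);
        n = next;
    }
    d->allocations = NULL;
}
```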

Proceedings ArticleDOI
16 Apr 1998
TL;DR: Results suggest that an L2 cache achieves the original advantage of the pull architecture --- stemming the growth of local texture memory --- while at the same time stemming the current explosion in demand for texture bandwidth between host memory and the accelerator.
Abstract: Traditional graphics hardware architectures implement what we call the push architecture for texture mapping. Local memory is dedicated to the accelerator for fast local retrieval of texture during rasterization, and the application is responsible for managing this memory. The push architecture has a bandwidth advantage, but disadvantages of limited texture capacity, escalation of accelerator memory requirements (and therefore cost), and poor memory utilization. The push architecture also requires the programmer to solve the bin-packing problem of managing accelerator memory each frame. More recently graphics hardware on PC-class machines has moved to an implementation of what we call the pull architecture. Texture is stored in system memory and downloaded by the accelerator as needed. The pull architecture has advantages of texture capacity, stems the escalation of accelerator memory requirements, and has good memory utilization. It also frees the programmer from accelerator texture memory management. However, the pull architecture suffers escalating requirements for bandwidth from main memory to the accelerator. In this paper we propose multi-level texture caching to provide the accelerator with the bandwidth advantages of the push architecture combined with the capacity advantages of the pull architecture. We have studied the feasibility of 2-level caching and found the following: (1) significant re-use of texture between frames; (2) L2 caching requires significantly less memory than the push architecture; (3) L2 caching requires significantly less bandwidth from host memory than the pull architecture; (4) L2 caching enables implementation of smaller L1 caches that would otherwise bandwidth-limit accelerators on the workloads in this paper. Results suggest that an L2 cache achieves the original advantage of the pull architecture --- stemming the growth of local texture memory --- while at the same time stemming the current explosion in demand for texture bandwidth between host memory and the accelerator.

Patent
09 Feb 1998
TL;DR: In this article, a hierarchical bitmap-based memory manager maintains an entry for each memory block in a memory heap, and each bitmap entry contains a multi-bit value that represents an allocation state of the corresponding memory block.
Abstract: A hierarchical bitmap-based memory manager maintains a hierarchical bitmap having an entry for each memory block in a memory heap. Each bitmap entry contains a multi-bit value that represents an allocation state of the corresponding memory block. The memory manager manages allocation, deallocation, and reallocation of the memory blocks, and tracks the changes in allocation state via the hierarchical bitmap. Using a two-bit value, the bitmap can represent at most four different allocation states of the corresponding memory block, including a “free” state, a “sub-allocated” state in which the corresponding memory block is itself an allocated set of smaller memory blocks, a “continue” state in which the corresponding memory block is allocated and part of, but not last in, a larger allocation of plural blocks, and a “last” state in which the corresponding memory block is allocated and last in an allocation of one or more memory blocks.
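A single-level sketch of the two-bit-per-block bitmap in C is shown below (the patent uses a hierarchy of such bitmaps; the block count and encoding values here are assumptions).

```c
#include <stdint.h>
#include <stddef.h>

/* Two bits per block encode the four allocation states described above. */
enum { FREE = 0, SUBALLOC = 1, CONT = 2, LAST = 3 };

#define NBLOCKS 1024
static uint8_t bitmap[NBLOCKS / 4];      /* 2 bits per block */

static int get_state(size_t i) { return (bitmap[i / 4] >> ((i % 4) * 2)) & 3; }
static void set_state(size_t i, int s) {
    bitmap[i / 4] &= ~(3 << ((i % 4) * 2));
    bitmap[i / 4] |= s << ((i % 4) * 2);
}

/* Allocate a run of nblocks contiguous blocks: mark all but the final
 * block as CONT ("continue") and the final block as LAST; return the
 * index of the first block, or -1 if no free run is large enough. */
long alloc_blocks(size_t nblocks) {
    if (nblocks == 0) return -1;
    size_t run = 0;
    for (size_t i = 0; i < NBLOCKS; i++) {
        run = (get_state(i) == FREE) ? run + 1 : 0;
        if (run == nblocks) {
            size_t first = i - nblocks + 1;
            for (size_t j = first; j < i; j++) set_state(j, CONT);
            set_state(i, LAST);
            return (long)first;
        }
    }
    return -1;
}

/* Free an allocation starting at block 'first' by walking to LAST. */
void free_blocks(size_t first) {
    size_t i = first;
    while (get_state(i) == CONT) { set_state(i, FREE); i++; }
    if (get_state(i) == LAST) set_state(i, FREE);
}
```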

Patent
20 May 1998
TL;DR: In this article, a method and an apparatus of establishing multiple direct memory access connections between a peripheral and a main memory of a computer system is presented, where each of the access connection is managed in an improved manner such that one or more of the multiple access connections are non-real-time connections, but real-time operations may be performed on the data carried by the nonreal time connections.
Abstract: Accordingly, the present invention provides a method and an apparatus of establishing multiple direct memory access connections between a peripheral and a main memory of a computer system. Each of the multiple direct memory access connection is managed in an improved manner such that one or more of the multiple direct memory access connections are non-real-time connections, but real-time operations may be performed on the data carried by the non-real time connections. In another aspect of the present invention, a driver may be implemented on the computer system to facilitate the establishment and maintenance of the multiple direct memory access connections. The present inventions reduce arbitration and system interrupt latencies and reduces the management burden of the direct memory access connections on a central processing unit of the computer system.

Patent
07 Aug 1998
TL;DR: In this paper, a memory clock control system and method facilitates power reduction on a dynamic basis by detecting memory access request loading from a number of memory access devices, such as video and graphics engines.
Abstract: A memory clock control system and method facilitates power reduction on a dynamic basis by detecting memory access request loading from a number of memory access devices, such as video and graphics engines. Based on the detected memory access requirements, the system and method adaptively varies a memory clock frequency in response to determining the desired memory usage at a given point in time. The memory clock is varied based on the priority of a given memory access engine, such that the clock is kept or increased to a higher rate for high priority engines such as real-time processing engines to facilitate high performance video capture.