
Showing papers on "Memory management" published in 2001


Journal ArticleDOI
TL;DR: A survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems, covering a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity.
Abstract: We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with memory addressing related issues.
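
The survey's architecture-independent code transformations operate purely at the source level. As a hedged illustration of the genre (my sketch, not an example drawn from the paper), loop tiling restructures a traversal so that each small tile of data stays in the cache while it is being worked on:

#include <stddef.h>

#define N     1024
#define BLOCK 32   /* tile edge chosen to fit the cache; machine-dependent */

/* Transpose via cache-sized tiles: src is read row-by-row and dst is
 * written column-by-column, but only within one BLOCK x BLOCK tile at a
 * time, so both arrays stay cache-resident during the inner loops. */
void transpose_tiled(double dst[N][N], const double src[N][N])
{
    for (size_t ii = 0; ii < N; ii += BLOCK)
        for (size_t jj = 0; jj < N; jj += BLOCK)
            for (size_t i = ii; i < ii + BLOCK; i++)
                for (size_t j = jj; j < jj + BLOCK; j++)
                    dst[j][i] = src[i][j];
}

The same rewrite trades area, performance, and power differently depending on the target, which is why the survey treats such transformations separately from the architecture-specific techniques.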

405 citations


Proceedings ArticleDOI
22 Jun 2001
TL;DR: A compiler-controlled dynamic on-chip scratch-pad memory (SPM) management framework that uses both loop and data transformations is proposed that indicates significant reductions in data transfer activity between SPM and off-chip memory.
Abstract: Optimizations aimed at improving the efficiency of on-chip memories are extremely important. We propose a compiler-controlled dynamic on-chip scratch-pad memory (SPM) management framework that uses both loop and data transformations. Experimental results obtained using a generic cost model indicate significant reductions in data transfer activity between SPM and off-chip memory.
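
A minimal sketch of the kind of code such a framework would generate (the staging pattern and SPM size here are assumptions, not details from the paper): the compiler tiles the loop and inserts explicit copies so the working set lives in the scratch-pad rather than going off-chip on every access:

#include <string.h>

#define SPM_WORDS 256               /* assumed scratch-pad capacity */
static int spm[SPM_WORDS];          /* stands in for the on-chip SPM */

/* Scale an off-chip array in SPM-sized tiles; the two memcpy calls model
 * the compiler-scheduled transfers between off-chip memory and the SPM. */
void scale_array(int *offchip, int n, int factor)
{
    for (int base = 0; base < n; base += SPM_WORDS) {
        int len = (n - base < SPM_WORDS) ? n - base : SPM_WORDS;
        memcpy(spm, offchip + base, len * sizeof(int));   /* stage in */
        for (int i = 0; i < len; i++)
            spm[i] *= factor;                             /* compute in SPM */
        memcpy(offchip + base, spm, len * sizeof(int));   /* stage out */
    }
}

The loop and data transformations the paper proposes aim precisely at reducing how often such staging transfers are needed.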

296 citations


Patent
05 Oct 2001
TL;DR: In this article, a gaming machine with a non-volatile memory storage device and gaming software that allows the dynamic allocation and de-allocation of memory locations in a non-volatile RAM is described.
Abstract: A disclosed gaming machine provides a gaming machine with a non-volatile memory storage device and gaming software that allows the dynamic allocation and de-allocation of memory locations in a non-volatile memory. The non-volatile memory storage devices interface to an industry standard peripheral component interface (PCI) bus commonly used in the computer industry, allowing communication between a master gaming controller and the non-volatile memory. The master gaming controller executes software for a non-volatile memory allocation system that enables the dynamic allocation and de-allocation of non-volatile memory locations. In addition, the non-volatile memory allocation system enables a non-volatile memory file system. With the non-volatile memory file system, critical data stored in the non-volatile memory may be accessed and modified using operating system utilities such as word processors, graphic utilities and compression utilities.

283 citations


Patent
10 Jul 2001
TL;DR: In this article, a two-tier memory controller system with a first memory controller coupled to the bus and a second tier of memory controllers or RAM personality modules that translate between the first controller and a particular type of memory module is described.
Abstract: A memory controller capable of supporting heterogeneous memory configurations enables seamless communications between a bus and memory modules having different characteristics. Thus, owners of computer systems need no longer replace entire memory arrays to take advantage of new memory modules; some memory modules may be upgraded to a new type while other memory modules of an older type remain. The memory controller receives memory requests from multiple processors and bus masters, identifies a memory module and memory access parameters for each request, accesses the memory, and returns the resulting data (during a read request) or stores the data (during a write request). In some systems, the memory controller of the present invention is a two-tier memory controller system having a first memory controller coupled to the bus and to a second tier of memory controllers, or RAM personality modules, that translate between the first memory controller and a particular type of memory module. Typically, a protocol representative of a typical clocked synchronous dynamic random access memory (SDRAM) is used between the tiers, although another protocol could be used. From the perspective of the processor bus or host bus coupled to the front end of the first memory controller, the entire memory controller system behaves as a single memory controller. From the perspective of memory, the back end of the RAM personality module is seen as a memory controller designed specifically to be configured for that memory type. Consequently, while the front end of the RAM personality module can be standardized across the system, compatible with the back end of the first memory controller, in most embodiments of the present invention the back end of the RAM personality module differs among the controller modules in the second tier, according to the variety of the memory modules in the memory system.
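
A software analogy may make the tiering concrete (the interface below is invented for illustration; the patent describes hardware): the first-tier controller sees one uniform front end, and each RAM personality module hides the specifics of its memory type behind it:

#include <stdint.h>

/* One RAM personality module: a uniform front end over a specific type. */
struct ram_personality {
    const char *type;                         /* e.g. "SDRAM" */
    uint32_t  (*read)(uint64_t addr);
    void      (*write)(uint64_t addr, uint32_t data);
};

/* First-tier controller: route each request to the module whose address
 * range contains it; bounds[i] is the end address of module i. */
uint32_t controller_read(const struct ram_personality *mods,
                         const uint64_t *bounds, int nmods, uint64_t addr)
{
    for (int i = 0; i < nmods; i++)
        if (addr < bounds[i])
            return mods[i].read(addr);
    return 0;                                 /* unmapped address */
}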

262 citations


Proceedings ArticleDOI
06 Aug 2001
TL;DR: An analytic model is developed that approximates the idle time of DRAM chips using an exponential distribution, and is validated against trace-driven simulations to show that the simple policy of immediately transitioning a DRAM chip to a lower power state when it becomes idle is superior to more sophisticated policies that try to predict DRAM chip idle time.
Abstract: The increasing importance of energy efficiency has produced a multitude of hardware devices with various power management features. This paper investigates memory controller policies for manipulating DRAM power states in cache-based systems. We develop an analytic model that approximates the idle time of DRAM chips using an exponential distribution, and validate our model against trace-driven simulations. Our results show that, for our benchmarks, the simple policy of immediately transitioning a DRAM chip to a lower power state when it becomes idle is superior to more sophisticated policies that try to predict DRAM chip idle time.
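
The shape of that result follows from the model's memorylessness; a sketch of the reasoning, with notation assumed here rather than taken from the paper: let idle periods be $T \sim \mathrm{Exp}(\lambda)$, so $\Pr[T > t] = e^{-\lambda t}$. With active and low-power draws $P_{\mathrm{hi}} > P_{\mathrm{lo}}$ and a transition energy cost $E_{\mathrm{tr}}$, powering down pays off once the idle period exceeds the break-even time

\[
  t_{\mathrm{be}} = \frac{E_{\mathrm{tr}}}{P_{\mathrm{hi}} - P_{\mathrm{lo}}} .
\]

A policy that waits $\tau$ before transitioning learns nothing from the wait, because

\[
  \Pr[T > \tau + t_{\mathrm{be}} \mid T > \tau] = \Pr[T > t_{\mathrm{be}}],
\]

so the decision it faces after waiting is statistically identical to the one it faced at the start of the idle period, and the extra $P_{\mathrm{hi}}\,\tau$ consumed while waiting is pure loss. Under an exponential idle-time model, immediate transition therefore dominates any fixed-threshold predictor, consistent with the paper's simulation results.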

241 citations


Patent
Carl A. Waldspurger
25 Jul 2001
TL;DR: In this paper, the presence of redundant copies of pages is determined by hashing page contents and performing full content comparisons only when two or more pages hash to the same key; the shared copy is then preferably marked copy-on-write, and sharing is preferably dynamic.
Abstract: A computer system has one or more software contexts that share use of a memory that is divided into units such as pages. In the preferred embodiment of the invention, the contexts are, or include, virtual machines running on a common hardware platform. The contents, as opposed to merely the addresses or page numbers, of virtual memory pages that are accessible to one or more contexts are examined. If two or more context pages are identical, then their memory mappings are changed to point to a single, shared copy of the page in the hardware memory, thereby freeing the memory space taken up by the redundant copies. The shared copy is then preferably marked copy-on-write. Sharing is preferably dynamic, whereby the presence of redundant copies of pages is preferably determined by hashing page contents and performing full content comparisons only when two or more pages hash to the same key.
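
A minimal sketch of the hash-then-compare step (data structures assumed for illustration; the patent does not prescribe this layout): pages are keyed by a hash of their contents, and the expensive byte-for-byte comparison runs only on a key collision:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096
#define TABLE_SIZE 65536

static const uint8_t *table[TABLE_SIZE];   /* hash key -> candidate page */

/* FNV-1a over the whole page; any content hash would do. */
static uint32_t hash_page(const uint8_t *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < PAGE_SIZE; i++)
        h = (h ^ page[i]) * 16777619u;
    return h % TABLE_SIZE;
}

/* Return an identical, already-tracked page to share with, or NULL.
 * Only a matching key triggers the full content comparison. */
const uint8_t *find_shareable(const uint8_t *page)
{
    uint32_t key = hash_page(page);
    const uint8_t *cand = table[key];
    if (cand && cand != page && memcmp(cand, page, PAGE_SIZE) == 0)
        return cand;                        /* caller remaps and marks COW */
    table[key] = page;                      /* remember for future matches */
    return NULL;
}

On a match, both mappings are pointed at the shared copy and marked copy-on-write, so a later store by either context quietly re-privatizes the page.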

214 citations


Journal ArticleDOI
TL;DR: This architecture is the first of its kind to employ real-time main-memory content compression at a performance competitive with the best the market has to offer.
Abstract: Several technologies are leveraged to establish an architecture for a low-cost, high-performance memory controller and memory system that more than double the effective size of the installed main memory without significant added cost. This architecture is the first of its kind to employ real-time main-memory content compression at a performance competitive with the best the market has to offer. A large low-latency shared cache exists between the processor bus and a content-compressed main memory. High-speed, low-latency hardware performs real-time compression and decompression of data traffic between the shared cache and the main memory. Sophisticated memory management hardware dynamically allocates main-memory storage in small sectors to accommodate storing the variable-sized compressed data without the need for "garbage" collection or significant wasted space due to fragmentation. Though the main-memory compression ratio is limited to the range 1:1-64:1, typical ratios range between 2:1 and 6:1, as measured in "real-world" system applications.
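
The sector-based allocation can be pictured as a free list of fixed-size sectors (a hedged sketch; the sector size and layout here are assumptions, not the paper's parameters). A compressed block of any size occupies a chain of whole sectors, so freeing it returns sectors to the pool with no compaction or garbage collection pass:

#include <stddef.h>
#include <string.h>

#define SECTOR_BYTES 256
#define NSECTORS     1024

struct sector {
    struct sector *next;              /* next in block chain or free list */
    unsigned char  data[SECTOR_BYTES];
};

static struct sector pool[NSECTORS];
static struct sector *free_list;

void pool_init(void)
{
    for (int i = 0; i < NSECTORS; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Store `len` bytes of compressed data as a chain of sectors; returns the
 * chain head, or NULL if the pool is exhausted. */
struct sector *store_block(const unsigned char *buf, size_t len)
{
    struct sector *head = NULL, **link = &head;
    while (len > 0) {
        struct sector *s = free_list;
        if (!s) return NULL;          /* out of sectors; caller handles */
        free_list = s->next;
        size_t n = len < SECTOR_BYTES ? len : SECTOR_BYTES;
        memcpy(s->data, buf, n);
        buf += n;  len -= n;
        s->next = NULL;
        *link = s;  link = &s->next;
    }
    return head;
}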

195 citations


Patent
26 Dec 2001
TL;DR: In this paper, a self-test controller is used to generate a sequence of generated memory addresses for performing memory access operations associated with the memory test algorithm having an associated memory cell physical access pattern.
Abstract: A memory self-test system is provided comprising a self-test controller operable in self-test mode to generate a sequence of generated memory addresses for performing memory access operations associated with the memory test algorithm having an associated memory cell physical access pattern. A programmable re-mapper is operable to re-map the sequence of generated memory addresses derived from the self-test instruction to a sequence of re-mapped memory addresses. The programmable re-mapper performs this re-mapping in response to programmable mapping selection data. The re-mapping of the generated memory addresses to re-mapped memory addresses ensures that the memory cell accesses performed during execution of the memory self-test are consistent with the associated memory cell physical access pattern regardless of the particular implementation of the memory array.

191 citations


Patent
15 Feb 2001
TL;DR: In this paper, the adaptive memory arbitration scheme introduces a flexible method of adjustable priority-weighting which permits selected devices to transact a programmable number of consecutive memory accesses without those devices losing request priority.
Abstract: A computer system includes an adaptive memory arbiter for prioritizing memory access requests, including a self-adjusting, programmable request-priority ranking system. The memory arbiter adapts during every arbitration cycle, reducing the priority of any request which wins memory arbitration. Thus, a memory request initially holding a low priority ranking may gradually advance in priority until that request wins memory arbitration. Such a scheme prevents lower-priority devices from becoming “memory-starved.” Because some types of memory requests (such as refresh requests and memory reads) inherently require faster memory access than other requests (such as memory writes), the adaptive memory arbiter additionally integrates a nonadjustable priority structure into the adaptive ranking system which guarantees faster service to the most urgent requests. Also, the adaptive memory arbitration scheme introduces a flexible method of adjustable priority-weighting which permits selected devices to transact a programmable number of consecutive memory accesses without those devices losing request priority.
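
A hedged sketch of the adaptive ranking (field names and widths are invented; the patent describes hardware): the fixed urgency class is compared first, ties are broken by the adaptive rank, the winner drops to the bottom, and waiting requesters drift upward so none can starve:

#define NDEV 8

struct request_state {
    int urgency;   /* fixed class: e.g. refresh/read above write */
    int rank;      /* adaptive: raised while waiting, cleared on a win */
};

static struct request_state req[NDEV];

/* One arbitration cycle over the devices with pending requests;
 * returns the winning device, or -1 if nothing is pending. */
int arbitrate(const int pending[NDEV])
{
    int win = -1;
    for (int d = 0; d < NDEV; d++) {
        if (!pending[d]) continue;
        if (win < 0 ||
            req[d].urgency > req[win].urgency ||
            (req[d].urgency == req[win].urgency &&
             req[d].rank > req[win].rank))
            win = d;
    }
    if (win >= 0) {
        req[win].rank = 0;                       /* winner loses priority */
        for (int d = 0; d < NDEV; d++)
            if (d != win && pending[d])
                req[d].rank++;                   /* losers advance */
    }
    return win;
}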

170 citations


Proceedings Article
01 Jan 2001
TL;DR: The general problem of analyzing resource usage is formalized as a resource usage analysis problem, and a type-based method is proposed as a solution.
Abstract: It is an important criterion of program correctness that a program accesses resources in a valid manner. For example, a memory region that has been allocated should eventually be deallocated, and after the deallocation, the region should no longer be accessed. A file that has been opened should eventually be closed. So far, most of the methods to analyze this kind of property have been proposed in rather specific contexts (like studies of memory management and verification of usage of lock primitives), and it was not clear what the essence of those methods was or how methods proposed for individual problems are related. To remedy this situation, we formalize a general problem of analyzing resource usage as a resource usage analysis problem, and propose a type-based method as a solution to the problem.

167 citations


Patent
30 Oct 2001
TL;DR: In this article, a technique for writing data to memory cells of a phase change memory device, placing the memory cells in a state that is shared in common among the cells, is described.
Abstract: A technique includes, in response to a request to write data to memory cells of a phase change memory device, placing the memory cells in a state that is shared in common among the memory cells. Also, in response to this request, the data is written to the memory cells.

Proceedings ArticleDOI
01 May 2001
TL;DR: This work generalises type annotations that make the structure of a program's regions more explicit in a region type system whose main novelty is the use of existentially quantified abstract regions to represent pointers to objects whose region is partially or totally unknown.
Abstract: Region-based memory management systems structure memory by grouping objects in regions under program control. Memory is reclaimed by deleting regions, freeing all objects stored therein. Our compiler for C with regions, RC, prevents unsafe region deletions by keeping a count of references to each region. Using type annotations that make the structure of a program's regions more explicit, we reduce the overhead of reference counting from a maximum of 27% to a maximum of 11% on a suite of realistic benchmarks. We generalise these annotations in a region type system whose main novelty is the use of existentially quantified abstract regions to represent pointers to objects whose region is partially or totally unknown. A distribution of RC is available at http://www.cs.berkeley.edu/~dgay/rc.tar.gz.
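
The reference-counting discipline RC enforces can be summarized in a few lines (API names below are mine, not RC's actual interface): each region counts the pointers into it, and deletion is only legal at count zero:

#include <assert.h>
#include <stdlib.h>

struct region {
    char *base, *next, *end;   /* bump allocator over one block */
    long  refs;                /* pointers into this region */
};

struct region *region_new(size_t bytes)
{
    struct region *r = malloc(sizeof *r + bytes);
    if (!r) return NULL;
    r->base = r->next = (char *)(r + 1);
    r->end  = r->base + bytes;
    r->refs = 0;
    return r;
}

void *region_alloc(struct region *r, size_t n)
{
    if (r->next + n > r->end) return NULL;   /* region full */
    void *p = r->next;
    r->next += n;
    return p;
}

/* The compiler-inserted counting: pointer stores into or out of a region
 * adjust its count.  RC's type annotations exist to elide these updates. */
void region_ref(struct region *r)   { r->refs++; }
void region_unref(struct region *r) { r->refs--; }

void region_delete(struct region *r)
{
    assert(r->refs == 0);      /* an unsafe deletion is caught here */
    free(r);
}

The 27%-to-11% overhead reduction the paper reports comes from using the type annotations to prove many of those count updates redundant.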

Proceedings ArticleDOI
03 Jul 2001
TL;DR: A non-blocking FIFO queue algorithm for multiprocessor shared memory systems that deals with the pointer recycling problem, an inconsistency problem that all non- blocking algorithms based on the compare-and-swap synchronisation primitive have to address.
Abstract: A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple and fast, and scales very well in both symmetric and non-symmetric multiprocessor shared memory systems. Experiments on a 64-node SUN Enterprise 10000 (a symmetric multiprocessor system) and on a 64-node SGI Origin 2000 (a cache-coherent non-uniform memory access multiprocessor system) indicate that our algorithm considerably outperforms the best of the known alternatives on both multiprocessors at any level of multiprogramming. This work introduces two new, simple algorithmic mechanisms. The first lowers the contention on key variables used by the concurrent enqueue and/or dequeue operations, which consequently results in the good performance of the algorithm; the second deals with the pointer recycling problem, an inconsistency problem that all non-blocking algorithms based on the compare-and-swap synchronisation primitive have to address. In our construction we selected to use compare-and-swap since compare-and-swap is an atomic primitive that scales well under contention and either is supported by modern multiprocessors or can be implemented efficiently on them.
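
The pointer recycling problem the paper mentions is the classic ABA hazard: a location is read, the node is freed and reallocated, and a later compare-and-swap still succeeds because the bit pattern matches. One standard defence, shown here as a generic sketch (not the paper's own mechanism), packs a version tag next to the pointer so a single CAS validates both:

#include <stdatomic.h>
#include <stdint.h>

/* A node index and a version tag share one 64-bit word.  Recycling a node
 * bumps the tag, so a stale CAS fails even if the index repeats. */
typedef union {
    uint64_t raw;
    struct { uint32_t index; uint32_t tag; };
} tagged_t;

static _Atomic uint64_t queue_head;

/* Try to swing the head from the previously observed value `seen` to
 * `new_index`; returns nonzero on success. */
int swing_head(tagged_t seen, uint32_t new_index)
{
    tagged_t want = { .index = new_index, .tag = seen.tag + 1 };
    return atomic_compare_exchange_strong(&queue_head, &seen.raw, want.raw);
}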

Journal ArticleDOI
TL;DR: The design of the Impulse architecture and how an Impulse memory system can be used in a variety of ways to improve the performance of memory-bound applications are described and the effectiveness of these optimizations are demonstrated.
Abstract: Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data structures in memory. As a result, they can control how their data is accessed and cached, which can improve cache and bus utilization. The Impulse design does not require any modification to processor, cache, or bus designs since all the functionality resides at the memory controller. As a result, Impulse can be adopted in conventional systems without major system changes. We describe the design of the Impulse architecture and how an Impulse memory system can be used in a variety of ways to improve the performance of memory-bound applications. Impulse can be used to dynamically create superpages cheaply, to dynamically recolor physical pages, to perform strided fetches, and to perform gathers and scatters through indirection vectors. Our performance results demonstrate the effectiveness of these optimizations in a variety of scenarios. Using Impulse can speed up a range of applications from 20 percent to over a factor of 5. Alternatively, Impulse can be used by the OS for dynamic superpage creation; the best policy for creating superpages using Impulse outperforms previously known superpage creation policies.
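
For example, the gather/scatter remapping turns sparse indexed accesses into a dense stream; here is the effect sketched in plain C (Impulse performs the indirection in the memory controller, so the processor and cache see only the dense buffer):

/* Gather A[idx[0..n-1]] into a dense buffer that caches and prefetches
 * well; scatter is the inverse.  Under Impulse the traversal of idx and
 * A happens at the controller, not over the processor bus. */
void gather(double *dense, const double *A, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        dense[i] = A[idx[i]];
}

void scatter(double *A, const double *dense, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        A[idx[i]] = dense[i];
}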

Patent
David Craddock, Charles S. Graham, Ian David Judd, Renato J. Recio, Timothy J. Schimke
24 Sep 2001
TL;DR: In this paper, a mechanism for initiating and completing one or more I/O transactions using memory semantic messages is described, which is more akin to a memory copy than the simple transmission of a message.
Abstract: A mechanism for initiating and completing one or more I/O transactions using memory semantic messages is disclosed. Memory semantic messages are transmitted by means of a remote direct memory access (RDMA) operation; they are more akin to a memory copy than the simple transmission of a message.

Patent
02 Nov 2001
TL;DR: Memory management systems and methods that may be employed, for example, to provide efficient management of memory for network systems are discussed in this paper, where they utilize a multi-layer queue management structure to manage buffer/cache memory in an integrated fashion.
Abstract: Memory management systems and methods that may be employed, for example, to provide efficient management of memory for network systems. The disclosed systems and methods may utilize a multi-layer queue management structure to manage buffer/cache memory in an integrated fashion. The disclosed systems and methods may be implemented as part of an information management system, such as a network processing system that is operable to process information communicated via a network environment, and that may include a network processor operable to process network-communicated information and a memory management system operable to reference the information based upon a connection status associated with the content.

30 Nov 2001
TL;DR: This document informally describes the current design for the Titanium language in the form of a set of changes to Java, version 1.0.
Abstract: Titanium is a dialect of Java for large-scale scientific computing. The primary goal is a language that has high performance on large scale multiprocessors, including massively parallel processors and workstation clusters with one or more processors per node. Secondary goals include safety, portability, and support for building complex data structures. The main additions to Java are immutable classes, multi-dimensional arrays, an explicitly parallel SPMD model, and zone-based memory management. This document informally describes our current design for the Titanium language. It is in the form of a set of changes to Java, version 1.0.

Journal ArticleDOI
Maurice V. Wilkes
TL;DR: Since 1980, the memory gap has been increasing steadily, and during the last ten years, processors have been improving in speed by 60% per annum, whereas DRAM memory access has been improving at barely 10%.
Abstract: The first main memories to be used on digital computers were constructed using a technology much slower than that used for the logic circuits, and it was taken for granted that there would be a memory gap. Mercury delay line memories spent a lot of their time waiting for the required word to come round and were very slow indeed. CRT (Williams Tube) memories and the core memories that followed them were much better. By the early 1970s semiconductor memories were beginning to appear. This did not result in memory performance catching up fully with processor performance, although in the 1970s it came close. It might have been expected that from that point memories and processors would scale together, but this did not happen. This was because of significant differences in the DRAM semiconductor technology used for memories compared with the technology used for circuits. The memory gap makes itself felt when a cache miss occurs and the missing word must be supplied from main memory. It thus only affects users whose programs do not fit into the L2 cache. As far as a workstation user is concerned, the most noticeable effect of an increased memory gap is to make the observed performance more dependent on the application area than it would otherwise be. Since 1980, the memory gap has been increasing steadily. During the last ten years, processors have been improving in speed by 60% per annum, whereas DRAM memory access has been improving at barely 10%. It may thus be said that, while the memory gap is not at present posing a major problem, the writing is on the wall. On an Alpha 21264 667 MHz workstation (XP1000) in 2000, a cache miss cost about 128 clock cycles. This may be compared with the 8-32 clock cycles in the minicomputers and workstations of 1990 [1]. If the memory latency remains unchanged, the number of cycles of processor idle time is doubled with each doubling of speed of the processor. A factor of four will bring us to about 500 clock cycles.
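
The closing estimate is simple unit arithmetic, restated here for clarity (no new data):

\[
  \frac{128\ \text{cycles}}{667\ \mathrm{MHz}} \approx 192\ \mathrm{ns}\ \text{per miss},
  \qquad
  192\ \mathrm{ns} \times (4 \times 667\ \mathrm{MHz}) \approx 512\ \text{cycles}.
\]

With main-memory latency fixed near 192 ns, every doubling of the clock doubles the cycles lost per miss, so a factor of four in processor speed turns the Alpha's 128-cycle miss into roughly 500 cycles.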

Journal ArticleDOI
TL;DR: Results show that the hardware compression of main memory has a negligible penalty compared to an uncompressed main memory, and for memory-starved applications it increases performance significantly, and the memory content of an application can usually be compressed by a factor of 2.
Abstract: A novel memory subsystem called Memory Expansion Technology (MXT) has been built for fast hardware compression of main-memory content. This allows a memory expansion to present a "real" memory larger than the physically available memory. This paper provides an overview of the memory-compression architecture, its OS support under Linux and Windows®, and an analysis of the performance impact of memory compression. Results show that the hardware compression of main memory has a negligible penalty compared to an uncompressed main memory, and for memory-starved applications it increases performance significantly. We also show that the memory content of an application can usually be compressed by a factor of 2.

Proceedings ArticleDOI
21 Oct 2001
TL;DR: Three simple, but significant improvements to the OoCS (Out-of-Core Simplification) algorithm of P. Lindstrom (2000) are proposed which increase the quality of approximations and extend the applicability of the algorithm to an even larger class of compute systems.
Abstract: The authors propose three simple, but significant improvements to the OoCS (Out-of-Core Simplification) algorithm of P. Lindstrom (2000) which increase the quality of approximations and extend the applicability of the algorithm to an even larger class of compute systems. The original OoCS algorithm has memory complexity that depends on the size of the output mesh, but no dependency on the size of the input mesh. That is, it can be used to simplify meshes of arbitrarily large size, but the complexity of the output mesh is limited by the amount of memory available. Our first contribution is a version of OoCS that removes the dependency of having enough memory to hold (even) the simplified mesh. With our new algorithm, the whole process is made essentially independent of the available memory on the host computer. Our new technique uses disk instead of main memory, but it is carefully designed to avoid costly random accesses. Our two other contributions improve the quality of the approximations generated by OoCS. We propose a scheme for preserving surface boundaries which does not use connectivity information, and a scheme for constraining the position of the "representative vertex" of a grid cell to an optimal position inside the cell.

Patent
18 Apr 2001
TL;DR: In this paper, the authors present a computer system adapted to efficiently execute binary translated code, in which foreign code is stored in a foreign virtual memory space, translated to acquire binary translation code, which is then executed in a host VM.
Abstract: The present invention relates to a computer system adapted to efficiently execute binary translated code. In accordance with the present invention, foreign code is stored in a foreign virtual memory space, translated to acquire binary translated code, which is stored in a host virtual memory space and then executed. The host computer system isolates each virtual memory configuration into separate processes referred to as a virtual machine while enabling multiple virtual machines to exist simultaneously. Execution may switch from one virtual machine to another merely by switching to a new page table, where each page table describes the memory configuration of a virtual machine. Common system level resources are shared by the virtual machines under the control of a virtual memory manager.

Patent
Thomas M. Deneau
24 Apr 2001
TL;DR: In this paper, a virtual memory page replacement method is described for use in a computer system, wherein the method is designed to help maintain paged memory coherence within the multiprocessor computer system.
Abstract: A computer system including a first processor, a second processor in communication with the first processor, a memory coupled to the first and second processors (i.e., a shared memory) and including multiple memory locations, and a storage device coupled to the first processor. The first and second processors implement virtual memory using the memory. The first processor maintains a first set of page tables and a second set of page tables in the memory. The first processor uses the first set of page tables to access the memory locations within the memory. The second processor uses the second set of page tables, maintained by the first processor, to access the memory locations within the memory. A virtual memory page replacement method is described for use in the computer system, wherein the virtual memory page replacement method is designed to help maintain paged memory coherence within the multiprocessor computer system.

Patent
28 Sep 2001
TL;DR: In this paper, a technique for resynchronizing a plurality of memory segments in a redundant memory system after a hot-plug event is presented, where a refresh counter in each memory cartridge is disabled to generate a first refresh request to the corresponding memory segments, and after waiting a period of time to insure that regardless of what state each memory segment is in when the first request is initiated all cycles have been completely executed, each refresh counter is reenabled, thereby generating a second refresh request.
Abstract: A technique for resynchronizing a memory system. More specifically, a technique for resynchronizing a plurality of memory segments in a redundant memory system after a hot-plug event. After a memory cartridge is hot-plugged into a system, the memory cartridge is synchronized with the operational memory cartridges such that the memory system can operate in lock step. A refresh counter in each memory cartridge is disabled to generate a first refresh request to the corresponding memory segments in the memory cartridge. After waiting a period of time to insure that regardless of what state each memory cartridge is in when the first refresh request is initiated all cycles have been completely executed, each refresh counter is re-enabled, thereby generating a second refresh request. The generation of the second refresh request to each of the memory segments provides synchronous operation of each of the memory cartridges.

Proceedings ArticleDOI
23 Apr 2001
TL;DR: The potential for addressing bandwidth limitations by increasing global cache reuse is explored; that is, reusing data across the whole program and over the entire data collection, via a two-step global strategy.
Abstract: Reusing data in cache is critical to achieving high performance on modern machines because it reduces the impact of the latency and bandwidth limitations of direct memory access. To date, most studies of software memory hierarchy management have focused on the latency problem. However, today's machines are increasingly limited by insufficient memory bandwidth; on these machines, latency-oriented techniques are inadequate because they do not seek to minimize the total memory traffic over the whole program. This paper explores the potential for addressing bandwidth limitations by increasing global cache reuse, that is, reusing data across the whole program and over the entire data collection. To this end, the paper explores a two-step global strategy. The first step fuses computations on the same data to enable the caching of repeated accesses. The second step groups data used by the same computation to bring about contiguous access to memory. While the first step reduces the frequency of memory accesses, the second step improves their efficiency. The paper demonstrates the effectiveness of this strategy and shows how to automate it in a production compiler.
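
The first step can be seen in miniature below (my example, not one from the paper): fusing two loops that traverse the same array halves the memory traffic to A, because the second use now hits in cache:

/* Before fusion: A is streamed from memory twice. */
void unfused(const double *A, double *B, double *C, int n)
{
    for (int i = 0; i < n; i++) B[i] = A[i] * 2.0;
    for (int i = 0; i < n; i++) C[i] = A[i] + 1.0;
}

/* After fusion: each A[i] is loaded once and reused while still cached. */
void fused(const double *A, double *B, double *C, int n)
{
    for (int i = 0; i < n; i++) {
        double a = A[i];
        B[i] = a * 2.0;
        C[i] = a + 1.0;
    }
}

The paper's second step, grouping data used by the same computation, then makes the remaining accesses contiguous so each cache line fetched is fully used.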

Book ChapterDOI
Lars Arge
28 Aug 2001
TL;DR: In this paper, recent advances in the development of worst-case I/O-efficient external memory data structures are surveyed.
Abstract: Many modern applications store and process datasets much larger than the main memory of even state-of-the-art high-end machines. Thus massive and dynamically changing datasets often need to be stored in data structures on external storage devices, and in such cases the Input/Output (or I/O) communication between internal and external memory can become a major performance bottleneck. In this paper we survey recent advances in the development of worst-case I/O-efficient external memory data structures.

Proceedings ArticleDOI
29 May 2001
TL;DR: A memory management algorithm (ECQF-MMA) for replenishing the cache and find a bound on the size of the SRAM is described and analyzed.
Abstract: Packet switches contain packet buffers to hold packets during times of congestion. The capacity of a high-performance router is often dictated by the speed of its packet buffers. This is particularly true for a shared memory switch where the memory needs to operate at N times the line rate, where N is the number of ports in the system. Even input-queued switches must be able to buffer packets at the rate at which they arrive. Therefore, as the link rates increase, memory bandwidth requirements grow. With today's DRAM technology and for an OC192c (10 Gb/s) link, it is barely possible to write packets to (read packets from) memory at the rate at which they arrive (depart). As link rates increase, the problem will get harder. There are several techniques for building faster packet buffers, based on ideas from computer architecture such as memory interleaving and banking. While not directly applicable to packet switches, they form the basis of several techniques in use today. We consider one particular packet buffer architecture consisting of large, slow, low-cost DRAMs coupled with a small, fast SRAM "buffer". We describe and analyze a memory management algorithm (ECQF-MMA) for replenishing the cache and find a bound on the size of the SRAM.
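
The bandwidth arithmetic behind the opening claim, restated (the port count below is illustrative, not from the paper): with $N$ ports at line rate $R$, every packet is written into and later read out of the shared memory, so the buffer must sustain

\[
  B_{\mathrm{write}} = NR, \qquad B_{\mathrm{total}} = 2NR .
\]

At OC192c, $R = 10\ \mathrm{Gb/s}$, so even $N = 16$ ports require $160\ \mathrm{Gb/s}$ of write bandwidth alone, which is why the paper backs a small fast SRAM with large slow DRAM and lets ECQF-MMA decide what to stage where.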

Patent
Steven C. Woo, Pradeep Batra
30 Jul 2001
TL;DR: In this paper, a hardware memory controller receives memory instructions in terms of a logical address space, and the memory controller maps the logical address spaces to physical memory in a way that reduces the number of memory devices that are being used.
Abstract: A memory system includes physical memory devices or ranks of memory devices that can be set to reduced power modes. In one embodiment, a hardware memory controller receives memory instructions in terms of a logical address space. In response to the relative usages of different addresses within the logical address space, the memory controller maps the logical address space to physical memory in a way that reduces the number of memory devices that are being used. Other memory devices are then set to reduced power modes. In another embodiment, an operating system maintains a free page list indicating portions of physical memory that are not currently allocated. The operating system periodically sorts this list by group, where each group corresponds to a set or rank of memory devices. The groups are sorted in order from those receiving the heaviest usage to those receiving the lightest usage. When allocating memory, the memory is allocated from the sorted page list so that memory is preferentially allocated from those memory devices that are already receiving the highest usage.
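
A sketch of the second embodiment's free-list policy (data layout assumed; the patent does not give code): free pages are grouped by memory rank, groups are kept sorted hottest-first, and allocation draws from the busiest rank so lightly used ranks can sit in a reduced power mode:

#include <stdlib.h>

#define NRANKS 4

struct rank_group {
    int  rank;     /* which set/rank of memory devices */
    long usage;    /* recent access count for this rank */
    long nfree;    /* free pages remaining in this rank */
};

static struct rank_group groups[NRANKS];

static int hotter_first(const void *a, const void *b)
{
    const struct rank_group *x = a, *y = b;
    return (y->usage > x->usage) - (y->usage < x->usage);
}

/* Allocate one page: prefer the heaviest-used rank with pages to spare. */
int alloc_page_rank(void)
{
    qsort(groups, NRANKS, sizeof groups[0], hotter_first);
    for (int i = 0; i < NRANKS; i++)
        if (groups[i].nfree > 0) {
            groups[i].nfree--;
            return groups[i].rank;
        }
    return -1;     /* no free pages in any rank */
}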

Patent
Lance W. Russell
04 Oct 2001
TL;DR: In this paper, the authors describe a shared memory multi-computer environment, where a local shared memory network is provided between local nodes and global shared memory networks are provided between nodes and one or more remote nodes.
Abstract: Systems and methods of processing packets in a shared memory multi-computer environment are described. A local shared memory network is provided between local nodes and a global shared memory network is provided between the local nodes and one or more remote nodes. In this way, local nodes may communicate through standard network interfaces while using shared memory as the physical transport medium. In addition, a multi-computer system may be addressed externally and internally as individual nodes over the local shared memory network. A multi-computer system also may be addressed externally and internally as a single node over the global shared memory network.

Journal ArticleDOI
TL;DR: This work presents a solution for efficiently mapping arbitrary C code with pointers and malloc/free into hardware and presents an implementation based on the SUIF framework along with case studies such as the realization of a video filter and an ATM segmentation engine.
Abstract: One of the greatest challenges in a C/C++-based design methodology is efficiently mapping C/C++ models into hardware. Many networking and multimedia applications implemented in hardware or mixed hardware/software systems now use complex data structures stored in multiple memories, so many C/C++ features that were originally designed for software applications are now making their way into hardware. Such features include dynamic memory allocation and pointers for managing data. We present a solution for efficiently mapping arbitrary C code with pointers and malloc/free into hardware. Our solution, which fits current memory management methodologies, instantiates an application-specific hardware memory allocator coupled with a memory architecture. Our work also supports the resolution of pointers without restriction on the data structures. We present an implementation based on the SUIF framework along with case studies such as the realization of a video filter and an ATM segmentation engine.
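
When profiling shows that a data structure only ever allocates one object size, the synthesized allocator can collapse to a free stack of fixed slots, which maps directly onto hardware; a hedged software sketch of that degenerate case (my example, not the paper's SUIF-based implementation):

#include <stddef.h>

#define SLOT_BYTES 64
#define NSLOTS     128

static unsigned char heap[NSLOTS][SLOT_BYTES];
static int free_slots[NSLOTS];     /* stack of free slot indices */
static int top;                    /* number of free slots */

void allocator_init(void)
{
    for (int i = 0; i < NSLOTS; i++)
        free_slots[i] = i;
    top = NSLOTS;
}

/* malloc/free specialized to one size: O(1), no headers, no search,
 * i.e. a structure a hardware allocator can implement directly. */
void *fixed_alloc(void)
{
    return top > 0 ? heap[free_slots[--top]] : NULL;
}

void fixed_free(void *p)
{
    free_slots[top++] = (int)((unsigned char (*)[SLOT_BYTES])p - heap);
}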

Patent
13 Feb 2001
TL;DR: In this article, a system and method for managing real memory usage comprising: a compressed memory device driver for receiving real-memory usage information from the compressed memory hardware controller, the information including a characterization of the real memory usage state; and a compression management subsystem for monitoring the memory usage and initiating memory allocation and memory recovery in accordance with the memory usage state, the subsystem including a mechanism for adjusting memory usage thresholds for controlling memory state changes.
Abstract: In a computer system having an operating system and a compressed main memory defining a physical memory and a real memory characterized as an amount of main memory as seen by a processor, and including a compressed memory hardware controller device for controlling processor access to the compressed main memory, there is provided a system and method for managing real memory usage comprising: a compressed memory device driver for receiving real memory usage information from the compressed memory hardware controller, the information including a characterization of the real memory usage state; and a compression management subsystem for monitoring the memory usage and initiating memory allocation and memory recovery in accordance with the memory usage state, the subsystem including a mechanism for adjusting memory usage thresholds for controlling memory state changes. Such a system and method is implemented in software operating such that control of the real memory usage in the computer system is transparent to the operating system.