
Showing papers on "Memory management published in 1995"


Journal ArticleDOI
TL;DR: In this paper, the authors consider how individual human memory systems are linked into group memory systems through processes such as directory updating (learning who knows what in the group), information allocation (assigning memory items to group members), and retrieval coordination (planning how to find items in a way that takes advantage of who knows what).
Abstract: Several of the design factors that must be considered in linking computers together into networks are also relevant to the ways in which individual human memory systems are linked into group memory systems. These factors include directory updating (learning who knows what in the group), information allocation (assigning memory items to group members), and retrieval coordination (planning how to find items in a way that takes advantage of who knows what). When these processes operate effectively in a group, the group's transactive memory is likely to be effective.

664 citations


Proceedings ArticleDOI
03 Dec 1995
TL;DR: The objective is to use a single, unified, but distributed memory management algorithm at the lowest level of the operating system, managing memory globally so that all system- and higher-level software, including VM, file systems, transaction systems, and user applications, can benefit from available cluster memory.
Abstract: Advances in network and processor technology have greatly changed the communication and computational power of local-area workstation clusters. However, operating systems still treat workstation clusters as a collection of loosely-connected processors, where each workstation acts as an autonomous and independent agent. This operating system structure makes it difficult to exploit the characteristics of current clusters, such as low-latency communication, huge primary memories, and high-speed processors, in order to improve the performance of cluster applications. This paper describes the design and implementation of global memory management in a workstation cluster. Our objective is to use a single, unified, but distributed memory management algorithm at the lowest level of the operating system. By managing memory globally at this level, all system- and higher-level software, including VM, file systems, transaction systems, and user applications, can benefit from available cluster memory. We have implemented our algorithm in the OSF/1 operating system running on an ATM-connected cluster of DEC Alpha workstations. Our measurements show that on a suite of memory-intensive programs, our system improves performance by a factor of 1.5 to 3.5. We also show that our algorithm has a performance advantage over others that have been proposed in the past.
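The eviction policy the abstract describes can be sketched as a global LRU decision: on a page fault, the victim is the cluster-wide oldest page, and a locally old page can spill into an idle peer's memory instead of going to disk. This is a toy model under our own assumptions; the class and function names are hypothetical and do not reflect the paper's actual implementation in OSF/1:

```python
class Node:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.pages = {}            # page_id -> last_access_time

    def full(self):
        return len(self.pages) >= self.capacity

def global_fault(nodes, faulting, page_id, now):
    """Bring page_id into `faulting`; if it is full, evict the cluster-wide
    LRU page, migrating a local victim to idle remote memory before disk."""
    evicted = None
    if faulting.full():
        # Find the oldest page across ALL nodes, not just the local one.
        victim_node, victim_page = min(
            ((n, p) for n in nodes for p in n.pages),
            key=lambda np: np[0].pages[np[1]])
        if victim_node is faulting:
            # Local page is globally oldest: push it to the emptiest peer
            # so remote idle memory acts as a backing store before disk.
            peer = max((n for n in nodes if n is not faulting),
                       key=lambda n: n.capacity - len(n.pages))
            if not peer.full():
                peer.pages[victim_page] = victim_node.pages[victim_page]
        evicted = (victim_node.name, victim_page)
        del victim_node.pages[victim_page]
    faulting.pages[page_id] = now
    return evicted
```

The key point the sketch captures is that the eviction decision consults every node's pages, which is what lets idle cluster memory absorb an active node's working set.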

418 citations


Patent
24 Jul 1995
TL;DR: In this article, a flash memory component coupled to a computer system bus for storing non-volatile code and data is presented, where the contents of a portion of the flash memory can be replaced, modified, updated, or reprogrammed without the need for removing and/or replacing any computer system hardware components.
Abstract: A computer system wherein a portion of code/data stored in a non-volatile memory device can be dynamically modified or updated without removing any covers or parts from the computer system. The computer system of the preferred embodiment includes a flash memory component coupled to a computer system bus for storing non-volatile code and data. Using the present invention, the contents of a portion of the flash memory may be replaced, modified, updated, or reprogrammed without the need for removing and/or replacing any computer system hardware components. The flash memory device used in the preferred embodiment contains four separately erasable/programmable non-symmetrical blocks of memory. One of these four blocks may be electronically locked to prevent erasure or modification of its contents once it is installed. This configuration allows the processing logic of the computer system to update or modify any selected block of memory without affecting the contents of other blocks. One memory block contains a normal BIOS. An electronically protected flash memory area is used for storage of a recovery BIOS which is used for recovery operations. The present invention also includes hardware for selecting one of the two available update modes: normal or recovery. Thus, using a mode selection apparatus, either a normal system BIOS or a recovery BIOS may be activated.

362 citations


Patent
06 Jun 1995
TL;DR: In this article, the authors describe a computer system having a plurality of processors and memory, organized as scalable nodes of multiple like processor memory elements, each with communication paths to the other elements within its node and an external path to another like scalable node of the system.
Abstract: A computer system having a plurality of processors and memory including a plurality of scalable nodes having multiple like processor memory elements. Each of the processor memory elements has a plurality of communication paths for communication within a node to other like processor memory elements within the node. Each of the processor memory elements also has a communication path for communication external to the node to another like scalable node of the computer system.

324 citations


Proceedings ArticleDOI
A. Goldberg1, J. Trotter1
02 Oct 1995
TL;DR: This paper describes how to combine simple hardware support and sampling techniques to obtain empirical data on memory system behavior without appreciably perturbing system performance.
Abstract: Fueled by higher clock rates and superscalar technologies, growth in processor speed continues to outpace improvement in memory system performance. Reflecting this trend, architects are developing increasingly complex memory hierarchies to mask the speed gap, compiler writers are adding locality enhancing transformations to better utilize complex memory hierarchies, and applications programmers are recoding their algorithms to exploit memory systems. All of these groups need empirical data on memory system behavior to guide their optimizations. This paper describes how to combine simple hardware support and sampling techniques to obtain such data without appreciably perturbing system performance. The idea is implemented in the Mprof prototype that profiles data stall cycles, first level cache misses, and second level misses on the Sun Sparc 10/41.

249 citations


Patent
07 Jun 1995
TL;DR: In this paper, a television electronic program guide intelligent memory management system and method automatically deletes the stored program information that is least valuable at that moment, as free memory space is needed by the system.
Abstract: A television electronic program guide intelligent memory management system and method automatically deletes the least valuable stored program information at that moment as free memory space is needed by the system. In advance of a program schedule update, the system executes a two-level memory "housekeeping" operation in which the system first scans the memory to identify obsolete schedule information. If, after this sweep, there is insufficient memory available for the next update, the system performs a second-level memory "triage" operation wherein schedule information is automatically prioritized in accordance with pre-defined rules for assessing the current value of the information to each viewer based on program air time, channel and other variables relating to program utility. The system then deletes schedule information in ascending order of value, starting with the least valuable information, and continues until enough space is available in memory to store the schedule update.
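The two-level "housekeeping" then "triage" scheme can be sketched as below. The entry fields and the single scalar `value` are illustrative assumptions; the patent derives value from air time, channel, and other utility variables:

```python
# Hedged sketch of the two-level reclamation scheme: level 1 sweeps
# obsolete (already-aired) entries; level 2 deletes remaining entries
# in ascending order of value until enough memory is free.

def reclaim(entries, needed, now):
    """entries: dicts with 'air_time', 'size', 'value'.
    Returns (surviving entries, bytes freed)."""
    kept = [e for e in entries if e['air_time'] > now]    # level-1 sweep
    freed = sum(e['size'] for e in entries) - sum(e['size'] for e in kept)
    if freed < needed:
        kept.sort(key=lambda e: e['value'])               # level-2 triage
        while kept and freed < needed:
            freed += kept.pop(0)['size']                  # least valuable first
    return kept, freed
```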

221 citations


Proceedings ArticleDOI
01 Dec 1995
TL;DR: The bare minimum amount of local memory that programs require to run without delay is measured by using the Value Reuse Profile, which contains the dynamic value reuse information of a program's execution, and by assuming the existence of efficient memory systems.
Abstract: As processor performance continues to improve, more emphasis must be placed on the performance of the memory system. In this paper, a detailed characterization of data cache behavior for individual load instructions is given. We show that by selectively applying cache line allocation according to the characteristics of individual load instructions, overall performance can be improved for both the data cache and the memory system. This approach can improve some aspects of memory performance by as much as 60 percent on existing executables.
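Selective per-load allocation can be illustrated with a minimal sketch: loads whose profiled behavior shows no reuse are marked "no-allocate" so a miss bypasses the cache instead of displacing a useful line. The class name and the externally supplied `no_allocate` set are our assumptions; the paper derives the decision from measured load characteristics:

```python
class SelectiveCache:
    """Toy direct-mapped cache with per-load-PC allocation control."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}               # index -> tag
        self.no_allocate = set()      # PCs of loads that bypass the cache

    def load(self, pc, addr):
        index, tag = addr % self.num_lines, addr // self.num_lines
        if self.lines.get(index) == tag:
            return 'hit'
        if pc in self.no_allocate:
            return 'bypass'           # miss, but do not displace a line
        self.lines[index] = tag       # normal allocate-on-miss policy
        return 'miss'
```

In the test below, a bypassing load to a conflicting address leaves the earlier line in place, so the next reference to it still hits.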

221 citations


Patent
12 May 1995
TL;DR: In this paper, a data storage subsystem includes a plurality of data storage elements configured into at least two redundancy groups, each redundancy group including n+m of the storage elements, and a cache memory connected to the redundancy groups and a host processor interface.
Abstract: A data storage subsystem dynamically maps a virtual data storage device image presented to associated processors to physical data storage devices used to implement the data storage subsystem. Multiple destage memory elements are concurrently active to increase an aggregate destage data transfer rate and to allow data to be stored on various memory elements appropriate to the type of data contained in each stored virtual object. An open logical cylinder list is used to maintain data integrity among multiple open destage memory elements. Memory elements are also selected to function as archive memory. The data storage subsystem includes a plurality of data storage elements configured into at least two redundancy groups, each redundancy group including n+m of the data storage elements, and a cache memory connected to the redundancy groups and a host processor interface. The data storage subsystem stores data indicative of the amount of available memory space on each of the open logical cylinders, and little-used data records are migrated or transferred from other memory elements to the archive memory elements to maintain sufficient available memory space.

181 citations


Patent
13 Jul 1995
TL;DR: In this article, the authors propose an apparatus and method for dynamically adjusting the power/performance characteristics of a memory subsystem by dynamically tracking the behavior of the memory subsystem and predicting the probability that the next event will have certain characteristics, such as whether it will result in a memory cycle that requires the attention of a cache memory.
Abstract: An apparatus and method for dynamically adjusting the power/performance characteristics of a memory subsystem. Since the memory subsystem access requirements are heavily dependent on the application being executed, static methods of enabling or disabling the individual memory system components (as are used in prior art) are less than optimal from a power consumption perspective. By dynamically tracking the behavior of the memory subsystem, the invention predicts the probability that the next event will have certain characteristics, such as whether it will result in a memory cycle that requires the attention of a cache memory, whether that memory cycle will result in a cache memory hit, and whether a DRAM page hit in main memory will occur if the requested data is not in one of the levels of cache memory. Based on these probabilities, the invention dynamically enables or disables components of the subsystem. By intelligently adjusting the state of these components, significant power savings are achieved without degradation in performance.

179 citations


Proceedings ArticleDOI
01 Jun 1995
TL;DR: This work improves upon the Tofte/Talpin region-based scheme for compile-time memory management and reduces memory requirements significantly, in some cases asymptotically.
Abstract: Static memory management replaces runtime garbage collection with compile-time annotations that make all memory allocation and deallocation explicit in a program. We improve upon the Tofte/Talpin region-based scheme for compile-time memory management[TT94]. In the Tofte/Talpin approach, all values, including closures, are stored in regions. Region lifetimes coincide with lexical scope, thus forming a runtime stack of regions and eliminating the need for garbage collection. We relax the requirement that region lifetimes be lexical. Rather, regions are allocated late and deallocated as early as possible by explicit memory operations. The placement of allocation and deallocation annotations is determined by solving a system of constraints that expresses all possible annotations. Experiments show that our approach reduces memory requirements significantly, in some cases asymptotically.
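The core idea, regions with explicit, non-lexical lifetimes, can be illustrated roughly as below. The API names are hypothetical and stand in for the allocation and deallocation annotations a compiler would insert; this is not Tofte/Talpin's or the paper's actual machinery:

```python
class RegionHeap:
    """Toy region allocator: values live in regions, and a whole region
    is freed at once by an explicit operation, as early as possible."""

    def __init__(self):
        self.regions = {}
        self.next_id = 0

    def new_region(self):
        rid = self.next_id               # regions allocated late, on demand
        self.next_id += 1
        self.regions[rid] = []
        return rid

    def alloc(self, rid, value):
        self.regions[rid].append(value)  # bump allocation into the region
        return value

    def free_region(self, rid):
        del self.regions[rid]            # explicit early deallocation

    def live_count(self):
        return sum(len(vs) for vs in self.regions.values())
```

Freeing a region as soon as its last value is dead, rather than at the end of a lexical scope, is exactly the relaxation that yields the paper's memory savings.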

147 citations


Patent
14 Jul 1995
TL;DR: In this paper, a massively parallel data processing system is described, where each node has at least one processor, a memory for storing data, a processor bus that couples the processor to the memory, and a remote memory access controller coupled to the processor bus.
Abstract: A massively parallel data processing system is disclosed. This data processing system includes a plurality of nodes, with each node having at least one processor, a memory for storing data, a processor bus that couples the processor to the memory, and a remote memory access controller coupled to the processor bus. The remote memory access controller detects and queues processor requests for remote memory, processes and packages the processor requests into request packets, forwards the request packets to the network through a router that corresponds to that node, receives and queues request packets received from the network, recovers the memory request from the request packet, manipulates local memory in accordance with the request, generates an appropriate response packet acceptable to the network and forwards the response packet to the requesting node.

Patent
13 Nov 1995
TL;DR: In this paper, a memory management system employed in a settop terminal which utilizes memory available at the headend of a CATV system through a bidirectional CATV network to augment the memory resident within the settop terminals.
Abstract: The present invention comprises a memory management system employed in a settop terminal which utilizes memory available at the headend of a CATV system through a bidirectional CATV network to augment the memory resident within the settop terminal. The system includes a memory management unit that monitors the software application running on the settop terminal microprocessor, pre-fetches blocks of the program from the headend and stores these blocks in resident memory. The memory management unit manages the limited pool of settop terminal memory by dividing it into segments large enough to hold a single program block. Program blocks are fetched from the headend as needed by the microprocessor, and segments of memory containing program blocks not likely to be used are reused. The system provides sufficient read-ahead capability to ensure that the microprocessor has enough executable code to process at all times. The location of the memory is completely transparent to the microprocessor.

Journal ArticleDOI
22 Jan 1995
TL;DR: A novel scalable shared memory multiprocessor architecture that features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity.
Abstract: We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity. A software layer manages cache space allocation at a page granularity, similar to distributed virtual shared memory (DVSM) systems, leaving simpler hardware to maintain shared memory coherence at a cache line granularity. By reducing the hardware complexity, the machine cost and development time are reduced. We call the resulting hybrid hardware and software multiprocessor architecture Simple COMA. Preliminary results indicate that the performance of Simple COMA is comparable to that of more complex contemporary all hardware designs.

Patent
07 Jun 1995
TL;DR: In this paper, a mechanism for maintaining a consistent state in main memory without constraining normal computer operation is provided, thereby enabling a computer system to recover from faults without loss of data or processing continuity.
Abstract: A mechanism for maintaining a consistent state in main memory without constraining normal computer operation is provided, thereby enabling a computer system to recover from faults without loss of data or processing continuity. In a typical computer system, a processor and input/output elements are connected to a main memory via a memory bus. A shadow memory element, which includes a buffer memory and a main storage element, is also attached to this memory bus. During normal processing, data written to primary memory is also captured by the buffer memory of the shadow memory element. When a checkpoint is desired (thereby establishing a consistent state in main memory to which all executing applications can safely return following a fault), the data previously captured in the buffer memory is then copied to the main storage element of the shadow memory element. This structure and protocol can guarantee a consistent state in main memory, thus enabling fault-tolerant operation.
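The capture-then-commit protocol can be sketched as follows. The names are ours; in the real system the buffer captures writes transparently on the memory bus rather than through explicit calls:

```python
class ShadowMemory:
    """Toy model of the shadow-memory checkpoint scheme: writes are
    mirrored into a buffer, and a checkpoint commits the buffer to
    shadow storage, which always holds the last consistent state."""

    def __init__(self):
        self.main = {}      # live main memory
        self.buffer = {}    # writes captured since the last checkpoint
        self.shadow = {}    # last consistent (checkpointed) state

    def write(self, addr, value):
        self.main[addr] = value
        self.buffer[addr] = value        # captured off the memory bus

    def checkpoint(self):
        self.shadow.update(self.buffer)  # commit captured writes
        self.buffer.clear()

    def recover(self):
        self.main = dict(self.shadow)    # roll back after a fault
        self.buffer.clear()
```

Because only the buffer-to-shadow copy happens at checkpoint time, normal writes proceed unconstrained, which is the property the abstract emphasizes.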

Patent
12 Jan 1995
TL;DR: Disclosed as mentioned in this paper is a software generation system (SGS) based memory error detection system which may be utilized to detect various memory access errors, such as array dimension violations, dereferencing of invalid pointers, accessing freed memory, reading uninitialized memory, and automated detection of memory leaks.
Abstract: Disclosed is a software generation system (SGS) based memory error detection system which may be utilized to detect various memory access errors, such as array dimension violations, dereferencing of invalid pointers, accessing freed memory, reading uninitialized memory, and automated detection of memory leaks. Error checking commands and additional information are inserted at read-time into a parse tree associated with the source code file being tested, and serve to initiate and facilitate run-time error detection processes. Wrapper functions may be provided for initiating error checking processes for associated library functions. A pointer check table maintains pointer information, including valid range information, for each pointer that is utilized to monitor the use and modification of the respective pointers. A memory allocation structure records allocation information, including a chain list of all pointers that point to the memory region and an initialization status for each byte in the memory region, for each region of memory. The chain list is utilized to monitor the deallocation of the associated memory region, as well as to detect when there is a memory leak. The initialization status is used to ensure that a region of uninitialized memory is not accessed. A data flow analysis algorithm minimizes the number of pointer checks that have to be performed and allows certain read-time errors to be detected.
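The allocation-table checks can be approximated in a few lines. The explicit checker API here is illustrative only: the patented system injects equivalent checks by instrumenting the parse tree, which is not reproduced in this sketch:

```python
class MemChecker:
    """Toy run-time checker: a table of live allocations is consulted on
    every access, catching bad accesses, invalid frees, and leaks."""

    def __init__(self):
        self.live = {}       # base address -> size
        self.errors = []

    def malloc(self, base, size):
        self.live[base] = size

    def free(self, base):
        if base not in self.live:
            self.errors.append(('invalid-free', base))
        else:
            del self.live[base]

    def access(self, addr):
        # Valid iff addr falls inside some live allocation.
        if not any(b <= addr < b + s for b, s in self.live.items()):
            self.errors.append(('bad-access', addr))

    def leaks(self):
        return sorted(self.live)   # anything still live at exit leaked
```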

Book ChapterDOI
07 Aug 1995
TL;DR: Using the algorithm, the traditional mark-sweep garbage collector employed by the Mjolner run-time system for the object-oriented BETA programming language was replaced by a non-disruptive one, with only negligible time and storage overheads.
Abstract: We present an implementation of the Train Algorithm, an incremental collection scheme for reclamation of mature garbage in generation-based memory management systems. To the best of our knowledge, this is the first Train Algorithm implementation ever. Using the algorithm, the traditional mark-sweep garbage collector employed by the Mjolner run-time system for the object-oriented BETA programming language was replaced by a non-disruptive one, with only negligible time and storage overheads.

Proceedings ArticleDOI
25 Apr 1995
TL;DR: Novel techniques for efficient simulation of memory in SimICS, an instruction level simulator developed at SICS, are described, yielding a memory simulation scheme that supports a range of features for use in computer architecture research, program profiling, and debugging.
Abstract: We describe novel techniques used for efficient simulation of memory in SimICS, an instruction level simulator developed at SICS. The design has focused on efficiently supporting the simulation of multiprocessors, analyzing complex memory hierarchies and running large binaries with a mixture of system level and user level code. A software caching mechanism (the Simulator Translation Cache, STC) improves the performance of interpreted memory operations by reducing the number of calls to complex memory simulation code. Major data structures are allocated lazily to reduce the size of the simulator process. A well defined internal interface to generic memory simulation simplifies user extensions. Leveraging on a flexible interpreter based on threaded code allows runtime selection of statistics gathering, memory profiling, and cache simulation with low overhead. The result is a memory simulation scheme that supports a range of features for use in computer architecture research, program profiling, and debugging.

Patent
21 Apr 1995
TL;DR: In this article, a memory management method for a microkernel architecture and the microkernel itself feature template regions which are defined by the micro kernel in the memory, as special objects.
Abstract: A memory management method for a microkernel architecture and the microkernel itself feature template regions which are defined by the microkernel in the memory, as special objects. In the memory management method, after the microkernel is loaded into the memory of a data processing system, it begins creating task containers in the memory. It does this by forming template regions as special objects in the memory, the template regions having a set of attributes. Then, when the microkernel forms a task in the memory, it does so by mapping the template region into the task. The microkernel defines a virtual address space for the task based upon the template region. Later, when the microkernel conducts virtual memory operations on the template regions, the effect of the virtual memory operations is manifested in the task by means of the mapping relationship. In this manner, a single template region can be mapped into multiple tasks, simultaneously. By directing virtual memory operations to the template region on which they will take effect, the sharing of the virtual memory operations is much easier to accomplish since the changes are made to a template region, not to the mapping of the template region within each task.

Patent
Souichi Kobayashi1
11 Sep 1995
TL;DR: In this paper, the least significant bits of an address of a to-be-accessed memory, of a number corresponding to a minimum specified range of a plurality of to-be-controlled memory areas each specified in an arbitrary size in advance, are masked by mask bits.
Abstract: A novel data processing system is disclosed. Least significant bits of an address of a to-be-accessed memory of a number corresponding to a minimum specified range of a plurality of to-be-controlled memory areas each specified in an arbitrary size in advance are masked by mask bits. The access address with a predetermined number of least significant bits thereof masked is compared with each head address of a plurality of the memory areas to be controlled. It is decided in which of the memory areas to be controlled the access address is included. The memory access is controlled by access control data set for each memory area to be controlled. Further, the plurality of memory areas to be controlled are arranged in the order of priority. The to-be-controlled memory areas of higher priority are removed from the whole of the memory areas, whereby discontinuous memory areas are treated as a single memory area to be controlled.
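The masked-compare lookup can be modeled roughly as below, assuming power-of-two area sizes and an area list sorted by descending priority; both the field layout and the first-match-wins rule are assumptions based on the abstract:

```python
def find_area(addr, areas):
    """areas: list of (head, size_log2, control) tuples sorted by
    descending priority; each area's size is 2**size_log2 bytes.
    Returns the access control data for addr, or None."""
    for head, size_log2, control in areas:
        mask = ~((1 << size_log2) - 1)   # clear the low address bits
        if (addr & mask) == head:        # masked compare with head address
            return control               # highest-priority match wins
    return None
```

Because higher-priority areas are checked first, a small locked area carved out of a larger one behaves as if it were removed from the larger area, matching the abstract's treatment of discontinuous areas.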

Journal ArticleDOI
TL;DR: A novel technique founded on data-flow analysis is presented, which allows one to address the problem of background memory size evaluation for a given nonprocedural algorithm specification operating on multidimensional signals with affine indexes.
Abstract: Memory cost is responsible for a large amount of the chip and/or board area of customized video and image processing system realizations. In this paper, we present a novel technique, founded on data-flow analysis, which allows one to address the problem of background memory size evaluation for a given nonprocedural algorithm specification, operating on multidimensional signals with affine indexes. Most of the target applications are characterized by a huge number of signals, so a new polyhedral data-flow model operating on groups of scalar signals is proposed. These groups are obtained by a novel analytical partitioning technique, allowing one to select a desired granularity, depending on the application complexity. The method incorporates a way to tradeoff memory size with computational and controller complexity.

Proceedings ArticleDOI
09 May 1995
TL;DR: The authors present scheduling techniques for optimum data memory compaction, together with a suboptimum scheduling selection criterion which can be used for SA and non-SA schedulers.
Abstract: For the design of complex digital signal processing systems, block diagram oriented synthesis of real time software for programmable target processors has become an important design aid. The synthesis approach discussed in the paper is based on multirate block diagrams with scalable synchronous dataflow (SSDF) semantics. For this class of dataflow graphs the authors present scheduling techniques for optimum data memory compaction. These techniques can be employed to map signals of a block diagram onto a minimum data memory space. In order to formalize the data memory compaction problem, they first derive appropriate implementation measures. Based on these implementation measures it can be shown that optimum data memory compaction consists of optimum scheduling as well as optimum memory allocation. For the class of single appearance (SA) block diagrams with SSDF semantics, scheduling can be reduced to an integer linear programming (ILP) problem. Due to the computational complexity of ILP, the authors also present a suboptimum scheduling selection criterion, which can be used for SA and non-SA schedulers.

Patent
12 Oct 1995
TL;DR: In this paper, a plurality of semiconductor memory modules (21,....2n) are connected through a common clock signal line and one or more other signal lines to an accessing circuit.
Abstract: In a semiconductor memory, a plurality of semiconductor memory modules (21, ....2n) are connected through a common clock signal line and one or more other signal lines to an accessing circuit. The accessing circuit has a timing information storage unit (3A, 3B) for storing beforehand access timing information associated with the respective semiconductor memory modules and a timing varying unit (6A, 6B) for varying a data receiving timing at a transfer destination in compliance with a semiconductor memory module to be accessed, on the basis of the access timing information stored in the timing information storage unit.

Patent
07 Jun 1995
TL;DR: In this article, a method of allocating free physical memory in a solid state memory disk for a sector of data of a given size is described, which is based on the sum of the amount of free memory in the selected block and one of the following: 1) the amount of invalid data in the block; 2) the cycle count for the block; 3) the amount of invalid data as compared to a maximum amount of invalid data for all non-volatile memory devices associated with the block.
Abstract: A method of allocating free physical memory in a solid state memory disk for a sector of data of a given size is described. Allocation begins by determining whether sufficient free memory remains in the block to which the previous sector of data was written. If there is not sufficient free memory remaining, then selection of another block to allocate the sector of data begins. The selection is based on the sum of the amount of free memory in a selected block and one of the following: 1) the amount of invalid data in the block; 2) the cycle count for the block; 3) the amount of invalid data as compared to a maximum amount of invalid data for all non-volatile memory devices associated with the block; and 4) the number of blocks already allocated to all non-volatile memory devices associated with the block. Afterward, the block with the greatest amount of available memory is selected to store the sector of data.
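The first scoring rule listed above (free space plus reclaimable invalid data) might look like the sketch below; the block field names and the tie-breaking behavior of `max` are our assumptions, not the patent's exact procedure:

```python
def select_block(blocks, sector_size, current=None):
    """blocks: dicts with 'free' and 'invalid' byte counts.
    Prefer the block the previous sector went to if it still has room;
    otherwise pick the block maximizing free + invalid, i.e. the block
    with the most memory available once invalid data is reclaimed."""
    if current is not None and current['free'] >= sector_size:
        return current                       # stay in the same block
    return max(blocks, key=lambda b: b['free'] + b['invalid'])
```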

Patent
Carole Dulong1
23 Oct 1995
TL;DR: Memory to memory transfer as mentioned in this paper allows memory transfer operations to occur in parallel with the operation of arithmetic pipelines that process pattern recognition procedures, so that no additional processing time is consumed by a memory transfer.
Abstract: A computer implemented apparatus and method for transferring information from one set or sets of memory locations to another set or sets of memory locations. The present invention has particular advantageous use within a computer system specially implemented for pattern recognition applications, such as handwriting or voice recognition. The present invention includes a system with an automatic sequencer able to sequentially generate sequential source and destination addresses and able to generate appropriate data requests to internal and external memory controllers. The present invention memory to memory transfer unit allows memory transfer operations to occur in parallel with the operation of arithmetic pipelines that process pattern recognition procedures. Therefore, using the present invention, no additional processing time is consumed by a memory transfer. Double buffering is utilized to transfer information and process information in the same time frame.

Journal ArticleDOI
TL;DR: A new algorithm called Priority Adaptation Query Resource Scheduling (PAQRS) is introduced and evaluated for handling both single class and multiclass query workloads and confirms that PAQRS is very effective for real-time query scheduling.
Abstract: In recent years, a demand for real-time systems that can manipulate large amounts of shared data has led to the emergence of real-time database systems (RTDBS) as a research area. This paper focuses on the problem of scheduling queries in RTDBSs. We introduce and evaluate a new algorithm called Priority Adaptation Query Resource Scheduling (PAQRS) for handling both single class and multiclass query workloads. The performance objective of the algorithm is to minimize the number of missed deadlines, while at the same time ensuring that any deadline misses are scattered across the different classes according to an administratively-defined miss distribution. This objective is achieved by dynamically adapting the system's admission, memory allocation, and priority assignment policies according to its current resource configuration and workload characteristics. A series of experiments confirms that PAQRS is very effective for real-time query scheduling.

Patent
13 Jun 1995
TL;DR: In this paper, a conflict resolution system for interleaved memories in processors capable of issuing multiple independent memory operations per cycle is presented, which includes an address bellow for temporarily storing memory requests, and cross-connect switches to variously route multiple parallel memory requests to multiple memory banks.
Abstract: A conflict resolution system for interleaved memories in processors capable of issuing multiple independent memory operations per cycle. The conflict resolution system includes an address bellow for temporarily storing memory requests, and cross-connect switches to variously route multiple parallel memory requests to multiple memory banks. A control logic block controls the address bellow and the cross-connect switches to reorder the sequence of memory requests to avoid conflicts. The reordering removes conflicts and increases the occurrence of alternating memory requests that can issue simultaneously.
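The reordering idea in the abstract above, buffering requests and pairing ones that target different banks so they can issue in the same cycle, can be sketched like this. The sketch is illustrative only; the patented design does this in hardware with an address bellow and cross-connect switches.

```python
# Toy sketch: greedily pair memory requests addressed to different
# banks so each pair can issue in one cycle; conflicting requests
# that cannot be paired issue alone.
def pair_requests(requests, bank_of):
    """Return a list of per-cycle issue groups (1 or 2 requests each)."""
    pending = list(requests)
    cycles = []
    while pending:
        first = pending.pop(0)
        # search the buffered requests for one hitting a different bank
        partner = next((r for r in pending if bank_of(r) != bank_of(first)), None)
        if partner is not None:
            pending.remove(partner)
            cycles.append((first, partner))   # both issue this cycle
        else:
            cycles.append((first,))           # unavoidable conflict
    return cycles
```

For the trace `[0, 2, 1, 3]` with two banks (even/odd addresses), strict in-order issue needs three cycles, while the reordered schedule needs only two.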

Patent
28 Apr 1995
TL;DR: In this article, a memory control method includes a first step of managing control information in each of memory blocks constituting the memory, the control information joining the memory blocks in a chain according to the frequency of data erasures occurring in each memory block; and a second step of determining, on the basis of the control, a memory block that should be a transfer destination of write data.
Abstract: A memory control method includes a first step of managing control information in each of memory blocks constituting the memory, the control information joining the memory blocks in a chain according to the frequency of data erasures occurring in each of the memory blocks; and a second step of determining, on the basis of the control information, a memory block that should be a transfer destination of write data.
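The chain-by-erase-frequency idea above is essentially wear levelling: write destinations are drawn from the least-erased blocks so erasures spread evenly. A minimal sketch, with a min-heap standing in for the patent's chained control information (all names illustrative):

```python
import heapq

# Toy wear-levelling sketch: keep blocks ordered by erase count and
# always pick the least-erased block as the write destination.
class WearLeveler:
    def __init__(self, num_blocks):
        # (erase_count, block_id) pairs, ordered by erase count
        self.chain = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.chain)

    def write_destination(self):
        """Return the least-erased block and account for its erase."""
        erases, block = heapq.heappop(self.chain)
        heapq.heappush(self.chain, (erases + 1, block))
        return block
```

Over many writes every block accumulates roughly the same erase count, which is the point of maintaining the chain.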

Patent
12 Jan 1995
TL;DR: In this paper, a method and apparatus for monitoring machinery is presented in which data is collected and stored according to application-specific retention rules.
Abstract: The present invention relates to a method and apparatus for monitoring machinery in which data is collected and stored according to application-specific retention rules. Based on an alarm value, data records are adjusted by reducing the spectral resolution, coefficient precision, and time intervals between records. An alarm is activated when a data value exceeds a predetermined alarm threshold. Memory management can compare memory availability to projected requirements; if the projected requirements exceed availability, the data records can be interactively decimated. When decimation cannot make sufficient memory available within the retention rules, an alarm is issued. Collected data can be transferred to a computer device via removable nonvolatile memory or infrared communications.
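The decimate-under-pressure logic above can be sketched as follows. This is a hedged illustration of the idea, not the patented method: records are repeatedly thinned (every other sample dropped) until projected usage fits, and an alarm is raised when no further decimation is allowed.

```python
# Toy sketch: shrink stored records round-robin until they fit the
# available memory; alarm when decimation is exhausted.
def manage_memory(records, available, min_len=2):
    """records: list of sample lists. Returns (records, alarm_raised)."""
    def usage():
        return sum(len(r) for r in records)
    i = 0
    while usage() > available:
        if all(len(r) <= min_len for r in records):
            return records, True          # retention rules forbid more loss
        if len(records[i]) > min_len:
            records[i] = records[i][::2]  # halve resolution of this record
        i = (i + 1) % len(records)
    return records, False
```

A real monitor would weight the choice by record age and alarm state rather than plain round-robin, but the fit-or-alarm loop is the same shape.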

Patent
10 Feb 1995
TL;DR: In this paper, the size of unusable segments of dynamic memory is determined dynamically and those small segments are safely removed from the dynamic memory pool and placed on a separate list, and the contents of the separate list are agglomerated and returned to dynamic memory for use.
Abstract: A method for optimizing dynamic memory pool structures is presented. The size of unusable segments of dynamic memory is determined dynamically, and those small segments are safely removed from the dynamic memory pool and placed on a separate list. Periodically, or when dynamic memory is in heavy demand, the contents of the separate list are agglomerated and returned to dynamic memory for use. Consequently, the time taken to search for a suitably sized segment of dynamic memory is reduced considerably.
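The side-list scheme above can be sketched with a first-fit free list. This is an illustrative model only (segment sizes in place of real segments, invented names): fragments too small to satisfy any request are parked so allocation never scans them, and under pressure they are agglomerated back into one usable segment.

```python
# Toy sketch: park unusably small free segments on a side list, keeping
# the first-fit search short; coalesce the side list back on demand.
class SegmentPool:
    def __init__(self, segments, min_useful):
        self.min_useful = min_useful
        self.pool = [s for s in segments if s >= min_useful]  # searchable
        self.side = [s for s in segments if s < min_useful]   # parked

    def allocate(self, size):
        """First-fit search over the (shorter) main pool only."""
        for i, seg in enumerate(self.pool):
            if seg >= size:
                return self.pool.pop(i)
        return None

    def reclaim(self):
        """Agglomerate parked fragments and return them to the pool."""
        merged = sum(self.side)
        self.side = []
        if merged >= self.min_useful:
            self.pool.append(merged)
        return merged
```

Summing the sizes stands in for coalescing physically adjacent segments; a real allocator would merge only neighbours, but the search-time benefit of the shorter main list is the same.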

Journal ArticleDOI
TL;DR: In this paper, the authors describe MemSpy, a performance-monitoring tool designed to help programmers determine where and why memory bottlenecks occur, and guide programmers toward program transformations that improve memory performance through detailed statistics on cache-miss causes and frequency.
Abstract: To improve program memory performance, programmers and compiler writers can transform the application so that its memory-referencing behavior better exploits the memory hierarchy. The challenge in achieving these program transformations is overcoming the difficulty of statically analyzing or reasoning about an application's referencing behavior and interactions. In addition, many performance-monitoring tools collect high-level information that is inadequately detailed to analyze specific memory performance bugs. We describe MemSpy, a performance-monitoring tool we designed to help programmers discern where and why memory bottlenecks occur. MemSpy guides programmers toward program transformations that improve memory performance through detailed statistics on cache-miss causes and frequency. Because of the natural link between data-reference patterns and memory performance, MemSpy helps programmers comprehend data structure and code segment interactions by displaying statistics in terms of both the program's data and code structures, rather than for code structures alone.
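The per-data-structure miss attribution the abstract describes can be sketched with a tiny cache simulator. This is a toy illustration of the idea, not MemSpy itself: each reference in a trace is tagged with the data structure that issued it, and misses are charged to that structure.

```python
# Toy sketch: simulate a small direct-mapped cache and attribute each
# miss to the data structure named in the reference trace.
def miss_profile(trace, num_sets=4, line=16):
    """trace: list of (structure_name, address). Returns name -> misses."""
    tags = [None] * num_sets          # one cached block tag per set
    misses = {}
    for name, addr in trace:
        block = addr // line          # which cache line the address maps to
        s = block % num_sets          # direct-mapped set index
        if tags[s] != block:          # cold or conflict miss
            tags[s] = block
            misses[name] = misses.get(name, 0) + 1
    return misses
```

In the example below, structures "A" and "B" map to the same set, so "A" pays a conflict miss when it is touched again, exactly the kind of interaction the per-structure view makes visible.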