
Showing papers on "Memory management" published in 1988



Patent
13 Dec 1988
TL;DR: In this paper, a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple identical memory modules in the address space of the CPUs storing duplicates of the same data.
Abstract: A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple identical memory modules in the address space of the CPUs storing duplicates of the same data. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of the others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references by the multiple CPUs are voted by each of the memory modules. A private-write area is included in the shared memory space in the memory modules to allow functions such as software voting of state information unique to each CPU. All CPUs write state information to their private-write areas, then all CPUs read all the private-write areas for functions such as detecting differences in interrupt cause or the like.

185 citations
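The private-write mechanism lends itself to a short illustration. Below is a minimal C sketch of the software-voting step the abstract describes: each CPU publishes its state to its own slot, then every CPU majority-votes across all slots. The structure layout, the three-CPU count, and all names are assumptions for illustration, not the patent's implementation.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_CPUS 3  /* assumed CPU count */

/* Hypothetical layout of one private-write slot per CPU in shared memory. */
typedef struct {
    uint32_t interrupt_cause;   /* state unique to this CPU */
} private_slot_t;

/* Each CPU writes its own slot; then every CPU reads all slots and
 * majority-votes the state, flagging any CPU whose copy disagrees. */
static bool vote_interrupt_cause(volatile const private_slot_t *slots,
                                 uint32_t *voted_cause, int *dissenter)
{
    *dissenter = -1;
    for (int i = 0; i < NUM_CPUS; i++) {
        int agree = 0;
        for (int j = 0; j < NUM_CPUS; j++)
            if (slots[j].interrupt_cause == slots[i].interrupt_cause)
                agree++;
        if (2 * agree > NUM_CPUS) {              /* strict majority */
            *voted_cause = slots[i].interrupt_cause;
            for (int j = 0; j < NUM_CPUS; j++)
                if (slots[j].interrupt_cause != *voted_cause)
                    *dissenter = j;              /* CPU whose state differs */
            return true;
        }
    }
    return false;   /* no majority: disagreement is unresolvable */
}
```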


Journal ArticleDOI
Robert Courts1
TL;DR: An adaptive memory management algorithm allows substantial improvement in locality of reference in garbage-collected systems and indicates that page-wait time typically is reduced by a factor of four with constant memory size and disk technology.
Abstract: Modern Lisp systems make heavy use of a garbage-collecting style of memory management. Generally, the locality of reference in garbage-collected systems has been very poor. In virtual memory systems, this poor locality of reference generally causes a large amount of wasted time waiting on page faults or uses excessively large amounts of main memory. An adaptive memory management algorithm, described in this article, allows substantial improvement in locality of reference. Performance measurements indicate that page-wait time typically is reduced by a factor of four with constant memory size and disk technology. Alternatively, the size of memory typically can be reduced by a factor of two with constant performance.

142 citations


Journal ArticleDOI
01 Mar 1988
TL;DR: A methodology is proposed that facilitates analysis of the behavior of the matrix-matrix primitives and the resulting block algorithms as a function of certain system parameters to identify the limits of performance improvement possible via blocking and any contradictory trends that require trade-off consideration.
Abstract: Linear algebra algorithms based on the BLAS or extended BLAS do not achieve high performance on multivector processors with a hierarchical memory system because of a lack of data locality. For such machines, block linear algebra algorithms must be implemented in terms of matrix-matrix primitives (BLAS3). Designing efficient linear algebra algorithms for these architectures requires analysis of the behavior of the matrix-matrix primitives and the resulting block algorithms as a function of certain system parameters. The analysis must identify the limits of performance improvement possible via blocking and any contradictory trends that require trade-off consideration. We propose a methodology that facilitates such an analysis and use it to analyze the performance of the BLAS3 primitives used in block methods. A similar analysis of the block size-performance relationship is also performed at the algorithm level for block versions of the LU decomposition and the Gram-Schmidt orthogonalization procedures.

138 citations
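Blocking is the core idea behind the BLAS3 primitives the paper analyzes: operate on submatrices small enough that operands stay in the fast memory level and are reused many times. Below is a generic C sketch of a blocked matrix multiply; the block size NB plays the role of the paper's tunable system parameter, and the kernel is an illustration, not the paper's methodology. N is assumed divisible by NB.

```c
#define N  512   /* matrix dimension (assumed divisible by NB) */
#define NB 64    /* block size: three NBxNB blocks should fit in cache */

/* Blocked matrix multiply, C += A*B.  Each (i0,j0,k0) step works on
 * NBxNB submatrices, so every element loaded is reused NB times,
 * which is the locality the BLAS3 primitives exist to exploit. */
void matmul_blocked(double A[N][N], double B[N][N], double C[N][N])
{
    for (int i0 = 0; i0 < N; i0 += NB)
        for (int j0 = 0; j0 < N; j0 += NB)
            for (int k0 = 0; k0 < N; k0 += NB)
                for (int i = i0; i < i0 + NB; i++)
                    for (int j = j0; j < j0 + NB; j++) {
                        double s = C[i][j];
                        for (int k = k0; k < k0 + NB; k++)
                            s += A[i][k] * B[k][j];
                        C[i][j] = s;
                    }
}
```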


Patent
22 Jan 1988
TL;DR: In this paper, a modular, expandable, topologically-distributed-memory multiprocessor computer comprises a plurality of non-directly communicating slave processors under the control of a synchronizer and a master processor.
Abstract: A modular, expandable, topologically-distributed-memory multiprocessor computer comprises a plurality of non-directly communicating slave processors under the control of a synchronizer and a master processor. Memory space is partitioned into a plurality of memory cells. Dynamic variables may be mapped into the memory cells so that they depend upon processing in nearby partitions. Each slave processor is connected in a topologically well-defined way through a dynamic bi-directional switching system (gateway) to different respective ones of the memory cells. Access by the slave processors to their respective topologically similar memory cells occurs concurrently or in parallel in such a way that no data-flow conflicts occur. The topology of data distribution may be chosen to take advantage of symmetries which occur in broad classes of problems. The system may be tied to a host computer used for data storage and analysis of data not efficiently processed by the multiprocessor computer.

133 citations


Proceedings ArticleDOI
01 Oct 1988
TL;DR: A method is presented that uses data dependence analysis to estimate cache and local memory demand in highly iterative scientific codes; the estimates take the form of a family of “reference” windows for each variable, reflecting the current set of elements that should be kept in cache.
Abstract: In this paper we describe a method for using data dependence analysis to estimate cache and local memory demand in highly iterative scientific codes. The estimates take the form of a family of “reference” windows for each variable that reflects the current set of elements that should be kept in cache. It is shown that, in important special cases, we can estimate the size of the window and predict a lower bound on the number of cache hits. If the machine has local memory or cache that can be managed by the compiler, these estimates can be used to guide the management of this resource. It is also shown that these estimates can be used to guide program transformations in an attempt to optimize cache performance.

97 citations
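The window idea can be made concrete with a dependence-distance example: if a loop carries dependences of distance d, each element is reused within d iterations, so a window of the last max-distance elements guarantees those reuses hit. The C sketch below is an assumed illustration of such an estimate, not the authors' algorithm.

```c
#include <stddef.h>

/* If a variable is referenced with loop-carried dependence distances
 * d[0..n-1], a "reference" window holding the most recent max(d[i])
 * elements guarantees that every such reuse hits in cache. */
static size_t window_size(const size_t d[], size_t n)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++)
        if (d[i] > w)
            w = d[i];
    return w;
}

/* Example: for (i = 2; i < N; i++) A[i] = A[i-1] + A[i-2];
 * distances {1, 2} give a 2-element window for A, so keeping the
 * last two elements resident makes both reads guaranteed hits. */
```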


Patent
16 Mar 1988
TL;DR: In this paper, a multiprocessing system is presented having a plurality of processing nodes interconnected by a communication network, each node including a processor responsive to user software running on the system and an associated memory module; under user control, each memory module can be dynamically partitioned into global storage efficiently accessible by a number of processors connected to the network and local storage efficiently accessible by its associated processor.
Abstract: A multiprocessing system is presented having a plurality of processing nodes interconnected by a communication network, each processing node including a processor, responsive to user software running on the system, and an associated memory module. Under user control, the system can dynamically partition each memory module into global storage efficiently accessible by a number of processors connected to the network and local storage efficiently accessible by its associated processor.

83 citations


Proceedings ArticleDOI
01 Jun 1988
TL;DR: An overview is presented of some of the mathematical issues behind several of the problems associated with restructuring software to take advantage of parallel supercomputer architectures with complex memory hierarchies or distributed memory systems.
Abstract: Parallel supercomputer architectures with complex memory hierarchies or distributed memory systems have become very common. Unfortunately, the problems associated with restructuring software to take advantage of these memory systems are not easily solved. This paper presents an overview of some of the mathematical issues behind several of these problems and attempts to give a brief look at some of the potential solutions.

81 citations


Patent
Alan J. Deerfield1, Sun-Chi Siu1
05 Dec 1988
TL;DR: In this paper, a memory is presented having an address generator in an intelligent port which generates address sequences specified by an array transformation operator in a programmable processor, allowing a controlling processor to proceed immediately to preparation of the next instruction in parallel with memory execution of the present one.
Abstract: A memory having an address generator in an intelligent port which generates address sequences specified by an array transformation operator in a programmable processor, thereby allowing a controlling processor to proceed immediately to the preparation of the next instruction in parallel with memory execution of a present instruction. The intelligent port of the memory creates complex data structures from input data arrays stored in memory and directs the transformation of the data structures into output data streams. The memory comprises a plurality of read-write memory banks and a bank of read-only memory interconnected through intelligent ports and busses to other units of the processor. An arbitration and switching network assigns memory banks to the intelligent ports.

73 citations
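What an address-generating port does can be shown with one common array transformation. The C sketch below emits the address sequence for reading a row-major matrix in transposed order; a hardware generator would stream this same sequence to the memory banks while the controlling processor prepares its next instruction. The function and its interface are hypothetical.

```c
#include <stddef.h>

/* Emit the word addresses that read an R x C row-major array in
 * column-major (transposed) order.  A hardware address generator
 * would produce this sequence itself, leaving the controlling
 * processor free to set up the next instruction. */
static void gen_transpose_addrs(size_t base, size_t rows, size_t cols,
                                size_t *out /* holds rows*cols entries */)
{
    size_t n = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            out[n++] = base + r * cols + c;
}
```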


Patent
19 Feb 1988
TL;DR: In this paper, a uniform memory system for use with symbolic computers is presented that has a very large virtual address space and no separate files outside the address space of the virtual memory.
Abstract: A uniform memory system for use with symbolic computers has a very large virtual address space. There are no separate files that are not directly addressable in the address space of the virtual memory. A special object, the persistent root, defines memory objects which are relatively permanent, such objects being traceable by pointers from the persistent root. A tombstone mechanism is used to prevent objects from referencing deleted objects.

72 citations
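The tombstone mechanism admits a compact sketch: deletion replaces the object behind a handle with a marker, so a stale reference is detected rather than silently dereferencing reclaimed memory. The C below is an assumed illustration of the general technique, not the patent's design.

```c
#include <stdlib.h>
#include <stdbool.h>

/* References go through a handle; deleting the object leaves a
 * tombstone behind, so stale pointers are detected, not followed. */
typedef struct {
    void *object;     /* NULL once the object is deleted */
    bool  tombstone;  /* set when the object is reclaimed */
} handle_t;

static void *deref(handle_t *h)
{
    return h->tombstone ? NULL : h->object;  /* NULL = dangling reference */
}

static void delete_object(handle_t *h)
{
    free(h->object);
    h->object    = NULL;
    h->tombstone = true;   /* later accesses fail safely */
}
```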


Patent
28 Jun 1988
TL;DR: In this paper, a memory system backup for tightly or loosely coupled multiprocessor systems is proposed, where a plurality of primary memory units (13, 14, 15) having substantially the same configuration are backed up by a single memory unit of similar configuration.
Abstract: A memory system backup for use in a tightly or loosely coupled multiprocessor system. A plurality of primary memory units (13, 14, 15) having substantially the same configuration are backed up by a single memory unit (20) of similar configuration. The backup memory unit holds the checksum of all data held in all primary memory units. In the event of the failure of one of the primary memory units, its contents can be recreated from the data in the remaining non-failed memory units and the checksum in the backup unit.
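If the checksum is taken to be a word-wise XOR (an assumption; the patent's checksum need not be XOR), the recreate step works like RAID-style parity: the lost unit's data is the XOR of the surviving units with the backup. A minimal C sketch:

```c
#include <stdint.h>
#include <stddef.h>

#define UNITS 3   /* primary memory units, e.g. items 13, 14, 15 */

/* Rebuild failed primary unit `failed` from the survivors plus the
 * backup unit, which holds the XOR of every unit's corresponding word. */
static void recreate_unit(uint32_t *primary[UNITS], const uint32_t *backup,
                          size_t words, int failed)
{
    for (size_t w = 0; w < words; w++) {
        uint32_t v = backup[w];          /* XOR over all units */
        for (int u = 0; u < UNITS; u++)
            if (u != failed)
                v ^= primary[u][w];      /* cancel the surviving units */
        primary[failed][w] = v;          /* the lost word */
    }
}
```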


01 Jan 1988
TL;DR: A new distributed algorithm is shown to outperform centralized ones and provide unrestricted sharing of read-write memory between tasks running on either strongly coupled or loosely coupled architectures, and any mixture thereof.
Abstract: This report describes the design, implementation and performance evaluation of a virtual shared memory server for the Mach operating system. The server provides unrestricted sharing of read-write memory between tasks running on either strongly coupled or loosely coupled architectures, and any mixture thereof. A number of memory coherency algorithms have been implemented and evaluated, including a new distributed algorithm that is shown to outperform centralized ones. Some of the features of the server include support for machines with multiple page sizes, for heterogeneous shared memory, and for fault tolerance. Extensive performance measures of applications are presented, and the intrinsic costs evaluated.

16 Feb 1988
TL;DR: mprof, as discussed by the authors, is a two-phase tool that records the amount of memory each function allocates, breaks down allocation information by type and size, and displays a program's dynamic call graph so that functions indirectly responsible for memory allocation are easy to identify.
Abstract: This paper describes mprof, a tool used to study the memory allocation behavior of programs. mprof records the amount of memory each function allocates, breaks down allocation information by type and size, and displays a program's dynamic call graph so that functions indirectly responsible for memory allocation are easy to identify. mprof is a two-phase tool. The monitor phase is linked into executing programs and records information each time memory is allocated. The display phase reduces the data generated by the monitor and displays the information to the user in several tables. mprof has been implemented for C and Kyoto Common Lisp. Measurements of these implementations are presented.
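The monitor/display split can be approximated with a wrapper around malloc. The C sketch below bins allocations into power-of-two size classes in the monitor phase and reduces them to a table in the display phase; attributing allocations to functions and the dynamic call graph, as mprof does, is omitted for brevity, and all names here are hypothetical.

```c
#include <stdlib.h>
#include <stdio.h>

#define SIZE_BINS 32

static unsigned long alloc_count[SIZE_BINS];  /* allocations per class */
static unsigned long alloc_bytes[SIZE_BINS];  /* bytes per class */

static int size_bin(size_t n)                 /* power-of-two classes */
{
    int b = 0;
    while (n > 1 && b < SIZE_BINS - 1) { n >>= 1; b++; }
    return b;
}

void *mon_malloc(size_t n)                    /* monitor phase */
{
    int b = size_bin(n);
    alloc_count[b]++;
    alloc_bytes[b] += n;
    return malloc(n);
}

void mon_report(void)                         /* display phase */
{
    for (int b = 0; b < SIZE_BINS; b++)
        if (alloc_count[b])
            printf("size class 2^%-2d: %10lu allocs %12lu bytes\n",
                   b, alloc_count[b], alloc_bytes[b]);
}
```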


Journal ArticleDOI
Per Stenström1
TL;DR: The techniques that can be used to design a memory system that reduces the impact of contention are examined and the implementations and the design decisions taken in each are reviewed.
Abstract: The techniques that can be used to design a memory system that reduces the impact of contention are examined. To exemplify the techniques, the implementations and the design decisions taken in each are reviewed. The discussion covers memory organization, interconnection networks, memory allocation, cache memory, and synchronization and contention. The multiprocessor implementations considered are C.mmp, CM*, RP3, Alliant FX, Cedar, Butterfly, SPUR, Dragon, Multimax, and Balance.

Journal ArticleDOI
17 May 1988
TL;DR: This paper reports the results of a study of VAX 8800 processor performance using a hardware monitor that collects histograms of the processor's micro-PC and memory bus status, which yields a very detailed picture of the amount of time an average VAX instruction spends in various activities on the 8800.
Abstract: This paper reports the results of a study of VAX 8800 processor performance using a hardware monitor that collects histograms of the processor's micro-PC and memory bus status. The monitor keeps a count of all machine cycles executed at each micro-PC location, as well as counting all occurrences of each bus transaction. It can measure a running system without interfering with it, and this paper's results are based on measurements of live timesharing. Because the 8800 is a microcoded machine, a great deal of information can be gleaned from these data. The paper reports opcode and operand specifier frequencies, as well as the amount of time spent in instruction execution and various kinds of overhead, such as memory management and cache-wait stalls. The histogram method yields a very detailed picture of the amount of time an average VAX instruction spends in various activities on the 8800.
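The measurement method reduces to counting: one counter per micro-PC location, incremented every cycle it is observed, plus counters per bus transaction. A tiny C sketch of the micro-PC histogram (the micro-store size is an assumption):

```c
#include <stdint.h>

#define MICRO_PC_SLOTS 16384   /* assumed micro-store size */

static uint64_t cycles_at[MICRO_PC_SLOTS];

/* Conceptually invoked once per machine cycle with the sampled
 * micro-PC; over a long run the histogram apportions total time
 * among instruction execution, memory management, stalls, etc. */
static inline void sample_micro_pc(uint16_t upc)
{
    cycles_at[upc % MICRO_PC_SLOTS]++;
}
```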

Patent
05 Dec 1988
TL;DR: In this article, the process manager assigns processes to processors and satisfies their initial memory requirements through global memory allocations, and deallocates to uncommitted memory both memory that is dynamically requested to be deallocated and memory of terminating processes.
Abstract: In a multiprocessor system (FIG. 1) wherein each adjunct processor has its own, non-shared, memory (22) the non-shared memory of each adjunct processor (11-12) comprises global memory (42) and local memory (41). All global memory of all adjunct processors is managed by a single process manager (30) of a system-wide host processor (10). Each processor's local memory is managed by its operating system kernel (31). Local memory comprises uncommitted memory (45) not allocated to any process and committed memory (46) allocated to processes. The process manager assigns processes to processors and satisfies their initial memory requirements through global memory allocations. Each kernel satisfies processes' dynamic memory allocation requests from uncommitted memory, and deallocates to uncommitted memory both memory that is dynamically requested to be deallocated and memory of terminating processes. Each processor's kernel and the process manager cooperate to transfer memory between global memory and uncommitted memory to keep the amount of uncommitted memory within a predetermined range.
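The kernel/process-manager cooperation described here is essentially a hysteresis loop on the uncommitted pool: fall below a low-water mark and the kernel obtains more global memory; rise above a high-water mark and it returns the excess. A C sketch under assumed names and thresholds (these are not the patent's interfaces):

```c
#include <stddef.h>

#define LOW_WATER  (256 * 1024)    /* assumed thresholds */
#define HIGH_WATER (1024 * 1024)

extern size_t uncommitted_bytes(void);           /* kernel bookkeeping  */
extern void   global_alloc(size_t want);         /* ask process manager */
extern void   global_release(size_t give_back);  /* return to the pool  */

/* Keep the uncommitted pool inside [LOW_WATER, HIGH_WATER] by trading
 * memory with the host's global pool, as the kernel and the process
 * manager do cooperatively. */
void balance_uncommitted(void)
{
    size_t u = uncommitted_bytes();
    if (u < LOW_WATER)
        global_alloc(HIGH_WATER - u);    /* grow toward the high mark */
    else if (u > HIGH_WATER)
        global_release(u - HIGH_WATER);  /* shed the excess */
}
```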

Patent
29 Jul 1988
TL;DR: A photo printer system comprises a magnetic storage for storing print data sent from a host computer, a bit map memory for storing print dot data, and a printer engine for printing the contents of the bit map memory, as discussed by the authors.
Abstract: A photo printer system comprises a magnetic storage for storing print data sent from a host computer, a bit map memory for storing print dot data, and a printer engine for printing the contents of the bit map memory. The system includes a program which operates on the magnetic storage to serve as an external storage for the host computer and on the bit map memory to serve as a cache memory in response to a data read/write command issued by the host computer, and a CPU which controls the execution of the program. At least in a non-print process mode, the system forms a data path so that the host computer can access the bit map memory and magnetic storage for bidirectional data read/write operations. In another mode, the system forms a data path so that image data picked up with an image scanner is saved directly in the bit map memory and the image data is sent to the host computer on request.

Journal ArticleDOI
D.P. Ryan1
TL;DR: The register model, core instruction set, register operations, memory operations, control operations, instruction cache, user-supervisor protection, interrupts, faults, and debug support are presented.
Abstract: Important features and capabilities of the 80960 are briefly examined, and an overview of its architecture is given. A detailed discussion is presented of the register model, core instruction set, register operations, memory operations, control operations, instruction cache, user-supervisor protection, interrupts, faults, and debug support.

Proceedings ArticleDOI
Alok Aggarwal1, Ashok K. Chandra1
01 Jan 1988
TL;DR: Computer programs are usually written with the illusion that they will run on something like a random access machine (RAM) with a large memory, all locations of which are equally fast, but in practice this is far from the truth.
Abstract: Computer programs are usually written with the illusion that they will run on something like a random access machine (RAM) [AHU74], with a large memory, all locations of which are equally fast. In practice, this is far from the truth. In large machines, for example, the range of speeds from the fastest memory (registers, at about 10 ns) to the slowest (disks or mass store, at 10 ms or seconds) can be a factor of a million or even a billion! Machine designers attempt to smooth out this range, to the extent that is technologically feasible, by providing many levels of memory in between. These memory levels may include one or two levels of cache, main memory, expanded store, and drums. The programs, of course, run in virtual memory. The hardware and the operating system …

Journal ArticleDOI
TL;DR: An approximate, closed-form solution is given that is simple and easy to use for any number of processors, buses, or memory modules and for arbitrary memory block size.
Abstract: A simple queueing model is presented for studying the effect of multiple-bus interconnection networks on the performance of asynchronous multiprocessor systems. The proposed model is suitable for systems in which each processor has a local memory and is thus able to continue processing while waiting for a response from the global memory. An approximate, closed-form solution is given that is simple and easy to use for any number of processors, buses, or memory modules and for arbitrary memory block size. The model is used to study the access time of the global memory as a function of the number of buses for different local-memory/global-memory traffic rates.


Proceedings ArticleDOI
27 Mar 1988
TL;DR: The authors consider a videotex system architecture where user requests are processed by a service computer, and the requested information pages are broadcast to all users, and features such as scheduling page broadcasts, memory management, and disk scheduling are represented explicitly in the model.
Abstract: The authors consider a videotex system architecture where user requests are processed by a service computer, and the requested information pages are broadcast to all users. Due to the large volume of information that is typically available, a secondary storage device, such as a disk, is used to hold the database. However, a small fraction of the information pages may be kept in main memory. A detailed simulation model is used to study the performance of this system architecture. Features such as scheduling page broadcasts, memory management, and disk scheduling are represented explicitly in the model. Simulation results are presented to show the response-time performance of various memory-management and disk scheduling strategies.

Journal ArticleDOI
B. Liu1, N. Strother1
TL;DR: Programming techniques necessary for high performance on the 3090 Vector Facilities are illustrated, showing that VS Fortran programs can achieve near-maximum execution rates.
Abstract: Programming techniques necessary for high performance on the 3090 Vector Facilities are illustrated, showing that VS Fortran programs can achieve near-maximum execution rates. Relevant features of the 3090 architecture are reviewed, stressing the need to make efficient use of a hierarchical storage system and take advantage of the compound vector instructions. The key programming techniques for managing the storage hierarchy are loop sectioning, loop distribution, and data compaction. Vector register reuse, cache reuse, and virtual memory storage format and page reuse are shown to lead to efficient use of the vector registers, the high-speed cache, and the virtual memory system, respectively. The multiply-and-add compound instruction is discussed.
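Loop sectioning (strip-mining) is easy to show outside Fortran as well. The C sketch below processes a long vector in fixed-length sections so each section's operands fit the vector registers and cache; the section length is an assumed parameter, and the kernel is a generic illustration rather than the article's code.

```c
#define VL 128   /* assumed section length (vector register capacity) */

/* Sectioned (strip-mined) vector update y += a*x: each section's
 * operands stay in the vector registers and cache, so they can be
 * reused by subsequent statements operating on the same section. */
void axpy_sectioned(int n, double a, const double *x, double *y)
{
    for (int s = 0; s < n; s += VL) {
        int len = (n - s < VL) ? n - s : VL;
        for (int i = 0; i < len; i++)    /* one vector section */
            y[s + i] += a * x[s + i];
    }
}
```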

Journal ArticleDOI
TL;DR: The results of the present experiments support a version of multiple-resource theory applied to working memory in which resource composition depends on internal mediators even when stimulus and response modality are held constant.
Abstract: A frequent assumption in cognitive psychology is that performance in decision making and planning is severely restricted by the limited capacity of short-term working memory. Many predictions of this theory have not been supported, possibly because working memory may be composed of multiple resources rather than a single resource. The present experiments study two tasks, both involving memory for digits. Although these tasks can employ the same modality for input and for responding, they appear to differ in their demands for working memory resources. Specifically, the tasks appear to differ in resources required for processing at input, and they also differ in resources in the sense of storage capacity. The results support a version of multiple-resource theory applied to working memory in which resource composition depends on internal mediators even when stimulus and response modality are held constant.

Journal ArticleDOI
TL;DR: Although it is aimed at graphics systems, the TMS34010's large address reach, bit-field processing, and DRAM (dynamic random-access memory) interface make it suitable for many other embedded processing applications.
Abstract: The authors discuss the TMS34010, a high-performance 32-bit microprocessor with special instructions and hardware for handling the bit-field data and address manipulations often associated with computer graphics. They give a history of embedded microprocessors and examine the wide range of processors and applications covered by that term. They provide an overview of the internal architecture of the TMS34010 and discuss the choice of feature set in its design. Although it is aimed at graphics systems, the processor's large address reach, bit-field processing, and DRAM (dynamic random-access memory) interface make it suitable for many other embedded processing applications.

Journal ArticleDOI
TL;DR: An overview of the issues involved in the design of the control mechanism used by the Myrias parallel computer system to manage the execution of parallel programs is presented.
Abstract: This paper presents an overview of the issues involved in the design of the control mechanism used by the Myrias parallel computer system to manage the execution of parallel programs. The following issues are discussed: initial task distribution, dynamic load leveling, hierarchical caching, task synchronization, memory management, and scalability. Some of the more important points are illustrated using performance measurements obtained by running a test program on the Myrias Research Corporation prototype system.

Journal ArticleDOI
R. Ford1
TL;DR: The conflict between the performance demands of real-time systems and the shared-resource needs of high-level languages (Ada in particular) is examined and it is shown that one system, an optimized optimistic version, does deliver performance that is acceptable for real- time applications.
Abstract: The conflict between the performance demands of real-time systems and the shared-resource needs of high-level languages (Ada in particular) is examined. Shared memory requires carefully designed concurrency control, but the traditional approach, which is to embed the entire allocate-release implementation code in critical sections, is unsuitable for real-time applications because it results in excessively high response time. The design and performance of three memory-management systems for real-time applications are evaluated, and it is shown that one system, an optimized optimistic version, does deliver performance that is acceptable for real-time applications.
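The "optimistic" alternative to one long critical section can be sketched with a compare-and-swap commit: prepare the allocation outside any lock, publish it atomically, and retry on interference, so worst-case blocking stays short. The C11 sketch below shows the pattern on a lock-free free list; it is a generic illustration of optimistic concurrency control, not the paper's evaluated Ada runtime.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct block {
    struct block *next;
} block_t;

static _Atomic(block_t *) free_list;

/* Optimistic allocate: read the head outside any lock, then commit
 * with one compare-and-swap; on interference, retry.  (A production
 * version must also deal with the ABA problem.) */
block_t *opt_alloc(void)
{
    block_t *head = atomic_load(&free_list);
    while (head != NULL &&
           !atomic_compare_exchange_weak(&free_list, &head, head->next))
        ;   /* CAS failure reloads `head`; just retry */
    return head;
}

void opt_release(block_t *b)
{
    b->next = atomic_load(&free_list);
    while (!atomic_compare_exchange_weak(&free_list, &b->next, b))
        ;   /* CAS failure reloads `b->next`; retry */
}
```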

Journal ArticleDOI
17 May 1988
TL;DR: Memory referencing behavior is analyzed via the study of traces for the purpose of developing new local memory structures and management techniques, and the analysis indicates the use of a program-controlled cache to efficiently reduce the traffic from the cache to main memory.
Abstract: Memory referencing behavior is analyzed via the study of traces for the purpose of developing new local memory structures and management techniques. A novel trace processing technique called flattening reduces the dependence of the results on the underlying compiler and architecture on which the trace was generated, and partitions each memory location into its constituent values. The referencing patterns of each value in the resulting trace are described via statistics such as interreference time, lifetime, etc. The referencing patterns of the entire trace are described via histograms showing the distributions of the statistics for the individual values. The results of this analysis indicate the use of a program-controlled cache to efficiently reduce the traffic from the cache to main memory. By using program control, the future knowledge of the compiler can be imparted to the cache, allowing the rejection of dead values and early replacement of values with long interreference times.
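Interreference time, the per-value statistic these histograms summarize, is simple to compute from a flattened trace: for each value, record the gap between successive references; long gaps mark values a program-controlled cache could evict early. A C sketch, assuming the trace is just a sequence of value identifiers:

```c
#include <stdio.h>

#define MAX_VALUES 65536   /* assumed bound on distinct value IDs */

static long last_ref[MAX_VALUES];   /* previous reference index, or -1 */

/* Scan a flattened trace (a sequence of value IDs) and print each
 * interreference time; long gaps identify values a program-controlled
 * cache could replace early. */
void interreference_times(const unsigned *trace, long n)
{
    for (long v = 0; v < MAX_VALUES; v++)
        last_ref[v] = -1;
    for (long t = 0; t < n; t++) {
        unsigned v = trace[t] % MAX_VALUES;
        if (last_ref[v] >= 0)
            printf("value %u: interreference gap %ld\n", v, t - last_ref[v]);
        last_ref[v] = t;
    }
}
```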