Memory management

About: Memory management is a research topic. Over its lifetime, 16,743 publications have been published within this topic, receiving 312,028 citations. The topic is also known as: memory allocation.


Papers
Proceedings ArticleDOI
17 Dec 1998
TL;DR: Modulo Unrolling as discussed by the authors is a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine, which allows the static communication of a large class of array accesses.
Abstract: We present modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh of simple, replicated tiles connected by an interconnect which supports fast, static near-neighbor communication. Like all other resources, memory is distributed across the tiles. Management of the memory can be performed by well-known techniques which generate the requisite communication code on distributed address-space architectures. On the other hand, the fast, static network provides the compiler with a simple interface to optimize such communication. This paper addresses the problem of taking advantage of such static communication for memory accesses. The requirement for static memory communication is compile-time knowledge of the exact communication required for each memory reference. This knowledge, in turn, can be obtained if a memory reference refers exclusively to memory residing on a single processing tile. We introduce modulo unrolling as a technique which allows the static communication of a large class of array accesses. We show how this technique achieves the goal of static communication by using a relatively small unroll factor. For a set of dense matrix scientific applications, we are able to access all the array references on the static network, enabling scalable speedups on the Raw machine.
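To make the idea concrete, here is a minimal C++ sketch of why unrolling by the number of tiles makes each array access resolvable at compile time. The low-order interleaved layout, names, and four-tile configuration are illustrative assumptions, not the Raw compiler's actual code generation.

```cpp
#include <cstdio>

// Illustrative sketch (not the Raw compiler's output): one array is
// low-order interleaved across NTILES memory banks, so element i lives
// on bank i % NTILES. Unrolling the loop by NTILES makes each unrolled
// reference touch one fixed bank, so its communication is known statically.
constexpr int NTILES = 4;
constexpr int N = 16;  // assume N is a multiple of NTILES for brevity

double bank[NTILES][N / NTILES];  // bank[t] models the memory on tile t

double sum_unrolled() {
    double s = 0.0;
    for (int i = 0; i < N; i += NTILES) {
        // After unrolling by NTILES, each reference below always hits the
        // same bank (0, 1, 2, 3), so a compiler could route every access
        // over the static network without any runtime address check.
        s += bank[0][(i + 0) / NTILES];
        s += bank[1][(i + 1) / NTILES];
        s += bank[2][(i + 2) / NTILES];
        s += bank[3][(i + 3) / NTILES];
    }
    return s;
}

int main() {
    // a[i] is stored as bank[i % NTILES][i / NTILES]
    for (int i = 0; i < N; ++i) bank[i % NTILES][i / NTILES] = i;
    std::printf("sum = %f\n", sum_unrolled());  // 0 + 1 + ... + 15 = 120
}
```

Without the unrolling, the bank index of a reference a[i] depends on the runtime value of i; with it, each residue class gets its own statically known bank, which is the property the paper exploits.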

68 citations

Journal ArticleDOI
TL;DR: This paper introduces a novel approach to the design of memory systems, which is based on a variety of array grouping techniques and dimensional transformations, and the binding of array groups to memory components with different dimensions, access times, and number of ports.
Abstract: This paper discusses the mapping of arrays in a behavior to memories in an implementation. We introduce a novel approach to the design of memory systems, based on a variety of array grouping techniques and dimensional transformations, and on the binding of array groups to memory components with different dimensions, access times, and numbers of ports. The results of design actions are computed in terms of memory cost, the number of wires necessary to connect the memory to the data path, and the limit of performance imposed by the memory design on the implementation. Three different procedures can be used to find a suitable memory design. All three procedures are directed by a weighted and constrained system cost function, which enables the expression of the user's design priorities. Compared to related research efforts, our approach improves performance by as much as 19%, reduces memory cost by as much as 40%, and decreases the number of wires required to connect the memory to the data path by up to 57%.
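As an illustration of the kind of weighted, constrained system cost function the three procedures optimize, the sketch below scores a candidate memory design. The field names, weights, and penalty scheme are assumptions made for exposition, not the authors' exact formulation.

```cpp
// Hypothetical illustration of a weighted, constrained system cost
// function of the kind the paper describes; names and the penalty
// scheme are assumptions, not the authors' formulation.
struct MemoryDesign {
    double memory_cost;  // e.g. estimated cost of the chosen memory parts
    int    wire_count;   // wires between the memory and the data path
    double cycle_time;   // performance limit imposed by the memory design
};

struct Weights {  // express the user's design priorities
    double w_cost, w_wires, w_perf;
};

// Returns a scalar score; constraint violations are penalized heavily
// so a search procedure prefers feasible designs.
double system_cost(const MemoryDesign& d, const Weights& w,
                   double max_cycle_time) {
    double cost = w.w_cost  * d.memory_cost
                + w.w_wires * d.wire_count
                + w.w_perf  * d.cycle_time;
    if (d.cycle_time > max_cycle_time)  // hard performance constraint
        cost += 1e9 * (d.cycle_time - max_cycle_time);
    return cost;
}
```

Raising one weight relative to the others steers all three search procedures toward designs that favor that criterion, which is how the paper lets users trade memory cost against wiring and speed.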

68 citations

Patent
03 Mar 1982
TL;DR: In a hierarchical memory system, replacement of segments in a cache memory is governed by a least recently used algorithm, while trickling of segments from the cache memory to the bulk memory is governed by the age since first write.
Abstract: In a hierarchical memory system, replacement of segments in a cache memory is governed by a least recently used algorithm, while trickling of segments from the cache memory to the bulk memory is governed by the age since first write. The host processor passes an AGEOLD parameter to the memory subsystem, and this parameter regulates the trickling of segments. Unless the memory system is idle (no I/O activity), no trickling takes place until the age of the oldest written-to segment is at least as great as AGEOLD. A command is generated for each segment to be trickled, and the priority of execution assigned to such commands is variable and determined by the relationship of AGEOLD to the oldest age since first write of any of the segments. If the subsystem receives no command from the host processor for a predetermined interval, AGEOLD is ignored and any written-to segment becomes a candidate for trickling.
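The eligibility rule the patent describes fits in a few lines. The sketch below is one illustrative reading of it, with hypothetical names and structure, not the patented implementation.

```cpp
#include <cstdint>

// Sketch of the trickle-eligibility rule described in the patent; the
// types and names here are illustrative assumptions.
struct Segment {
    bool          written;          // segment has been written to in cache
    std::uint64_t first_write_age;  // time elapsed since its first write
};

// A written-to segment becomes a candidate for trickling to bulk memory
// once its age since first write reaches AGEOLD. If the subsystem is idle
// (no I/O activity) or the host has been silent past its timeout, AGEOLD
// is ignored and any written-to segment qualifies.
bool trickle_candidate(const Segment& s, std::uint64_t ageold,
                       bool subsystem_idle, bool host_timed_out) {
    if (!s.written) return false;
    if (subsystem_idle || host_timed_out) return true;
    return s.first_write_age >= ageold;
}
```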

68 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a broad range of computational techniques to improve applicability, run time, and memory management of automatic differentiation packages, including operator overloading, region-based memory, and expression templates.
Abstract: Derivatives play a critical role in computational statistics, examples being Bayesian inference using Hamiltonian Monte Carlo sampling and the training of neural networks. Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions. The implementation of automatic differentiation, however, requires some care to ensure efficiency. Modern differentiation packages deploy a broad range of computational techniques to improve applicability, run time, and memory management. Among these techniques are operator overloading, region-based memory, and expression templates. There also exist several mathematical techniques which can yield high performance gains when applied to complex algorithms. For example, semi-analytical derivatives can reduce by orders of magnitude the runtime required to numerically solve and differentiate an algebraic equation. Open problems include the extension of current packages to provide more specialized routines, and efficient methods to perform higher-order differentiation.
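Operator overloading, the first technique listed, can be demonstrated with a minimal forward-mode example using dual numbers. This is a generic sketch of the technique, not the implementation of any particular package the paper surveys.

```cpp
#include <cmath>
#include <cstdio>

// Minimal forward-mode automatic differentiation via operator
// overloading: a Dual carries a value and its derivative, and the
// overloaded operators propagate both through any expression.
struct Dual {
    double val;  // function value
    double dot;  // derivative with respect to the chosen input
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.dot * b.val + a.val * b.dot};  // product rule
}
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

int main() {
    Dual x{2.0, 1.0};         // seed dx/dx = 1
    Dual y = x * x + sin(x);  // y = x^2 + sin(x)
    // dy/dx = 2x + cos(x) = 4 + cos(2), approximately 3.5839
    std::printf("y = %f, dy/dx = %f\n", y.val, y.dot);
}
```

Production packages refine this basic pattern with the other techniques the paper lists, such as expression templates to fuse operations and region-based memory to allocate the recorded computation cheaply.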

68 citations

Proceedings ArticleDOI
08 Mar 2018
TL;DR: A key consideration for deep neural network (DNN) inference accelerators is the need for large, high-bandwidth external memories; although stacking a DNN accelerator with DRAMs has been proposed to meet this need, long DRAM latency remains a performance limiter.
Abstract: A key consideration for deep neural network (DNN) inference accelerators is the need for large and high-bandwidth external memories. Although an architectural concept for stacking a DNN accelerator with DRAMs has been proposed previously, long DRAM latency remains problematic and limits the performance [1]. Recent algorithm-level optimizations, such as network pruning and compression, have shown success in reducing the DNN memory size [2]; however, since networks become irregular and sparse, they induce an additional need for agile random accesses to the memory systems.
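To see why pruning induces irregular accesses, consider a pruned weight matrix stored in compressed sparse row (CSR) form: the gather on the input vector is data-dependent, which is the kind of agile random access the paper refers to. This sketch is a generic illustration, not the paper's accelerator design.

```cpp
#include <cstddef>
#include <vector>

// Sparse matrix-vector product over a pruned weight matrix in CSR form.
// The index x[col_idx[k]] depends on runtime data, so the access pattern
// into x is irregular; on a DRAM-backed memory system this is exactly the
// random-access traffic that long DRAM latency penalizes.
void spmv_csr(const std::vector<int>& row_ptr,    // size = rows + 1
              const std::vector<int>& col_idx,    // column of each nonzero
              const std::vector<float>& val,      // surviving weights
              const std::vector<float>& x,        // input activations
              std::vector<float>& y) {            // output, size = rows
    for (std::size_t r = 0; r + 1 < row_ptr.size(); ++r) {
        float acc = 0.0f;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            acc += val[k] * x[col_idx[k]];  // data-dependent gather
        y[r] = acc;
    }
}
```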

68 citations


Network Information
Related Topics (5)

Topic                    Papers   Citations   Related
Cache                    59.1K    976.6K      94%
Scalability              50.9K    931.6K      92%
Server                   79.5K    1.4M        89%
Virtual machine          43.9K    718.3K      87%
Scheduling (computing)   78.6K    1.3M        86%
Performance Metrics
No. of papers in the topic in previous years

Year   Papers
2023   33
2022   88
2021   629
2020   467
2019   461
2018   591