
Memory management

About: Memory management is a research topic. Over its lifetime, 16,743 publications have been published within this topic, receiving 312,028 citations. The topic is also known as: memory allocation.


Papers
Patent
25 Feb 1999
TL;DR: A memory request detector generates memory request indication data, such as data representing whether memory requests have been received within a predetermined time, based on detection of graphics and/or video memory requests during an active mode of display system operation.
Abstract: An apparatus and method dynamically control graphics and/or video memory power during idle periods of the memory interface while the system is in an active mode. In one embodiment, a memory request detector generates memory request indication data, such as data representing whether memory requests have been received within a predetermined time, based on detection of graphics and/or video memory requests during an active mode of display system operation. A dynamic, activity-based memory power controller analyzes the memory request indication data and controls the power consumption of the graphics and/or video memory based on whether memory requests are detected.
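The control loop described above amounts to a small idle-timeout state machine: any memory request forces full power, and an elapsed quiet window cuts it. The following C sketch illustrates that idea only; the names (mem_power_state_t, on_memory_request, power_controller_tick) and the timeout value are illustrative assumptions, not taken from the patent:

#include <stdint.h>

typedef enum { POWER_ACTIVE, POWER_SELF_REFRESH } mem_power_state_t;

#define IDLE_TIMEOUT_US 500   /* the "predetermined time" of the claim; value is a guess */

static mem_power_state_t state = POWER_ACTIVE;
static uint64_t last_request_us;

/* Called by the memory request detector on every graphics/video memory request. */
void on_memory_request(uint64_t now_us)
{
    last_request_us = now_us;
    state = POWER_ACTIVE;              /* any request forces full power */
}

/* Polled by the dynamic activity-based power controller during active system modes. */
void power_controller_tick(uint64_t now_us)
{
    if (state == POWER_ACTIVE && now_us - last_request_us > IDLE_TIMEOUT_US)
        state = POWER_SELF_REFRESH;    /* idle window elapsed: cut memory power */
}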

137 citations

Proceedings ArticleDOI
12 Nov 2011
TL;DR: This paper proposes a simple API, Dymaxion, that allows programmers to optimize memory mappings to improve the efficiency of memory accesses on heterogeneous platforms and achieves 3.3× speedup on GPU kernels and 20% overall performance improvement, including the PCI-E transfer, over the original CUDA implementations on an NVIDIA GTX 480 GPU.
Abstract: Graphics processors (GPUs) have emerged as an important platform for general-purpose computing. GPUs offer a large number of parallel cores and have access to high memory bandwidth; however, data structure layouts in GPU memory often lead to sub-optimal performance for programs designed with a CPU memory interface, or no particular memory interface at all, in mind. This implies that application performance is highly sensitive to irregularity in memory access patterns. This issue is all the more important due to the growing disparity between core and DRAM clocks; memory interfaces have increasingly become bottlenecks in computer systems. In this paper, we propose a simple API, Dymaxion, that allows programmers to optimize memory mappings to improve the efficiency of memory accesses on heterogeneous platforms. Use of Dymaxion requires only minimal modifications to existing CUDA programs. Our current framework extends NVIDIA's CUDA API with the addition of memory layout remapping and index transformation. We consider the overhead of layout remapping and effectively hide it through chunking and overlapping with PCI-E transfer. We present the implementation of Dymaxion and its optimizations and evaluate a variety of important memory access patterns. Using four case studies, we are able to achieve 3.3× speedup on GPU kernels and 20% overall performance improvement, including the PCI-E transfer, over the original CUDA implementations on an NVIDIA GTX 480 GPU. We also explore the importance of maintaining per-device data layouts and cross-device data mappings with a case study of concurrent CPU-GPU execution.
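A classic instance of the layout remapping Dymaxion automates is converting an array-of-structures (natural for CPU code) into a structure-of-arrays, so that neighboring GPU threads touch consecutive addresses and accesses coalesce. The C sketch below shows only that underlying transformation; the types and function name are illustrative and are not Dymaxion's API:

#include <stddef.h>

typedef struct { float x, y, z; } point_aos;      /* CPU-friendly layout */

typedef struct { float *x, *y, *z; } points_soa;  /* GPU-friendly layout */

/* Copy n points from array-of-structures to structure-of-arrays. After the
 * remap, thread i reading out->x[i] sits next to thread i+1 reading
 * out->x[i+1] in memory, so the hardware can coalesce the accesses. */
void remap_aos_to_soa(const point_aos *in, points_soa *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}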

136 citations

Book ChapterDOI
01 Apr 2010
TL;DR: This paper proposes a hybrid architecture for NAND flash memory storage in which the log region is implemented using phase change random access memory (PRAM); the PRAM log region allows in-place updating, significantly improving the usage efficiency of log pages by eliminating out-of-date log records.
Abstract: In recent years, many systems have employed NAND flash memory as a storage device because of its advantages of higher performance (compared to the traditional hard disk drive), high density, random access, increasing capacity, and falling cost. On the other hand, the performance of NAND flash memory is limited by its "erase-before-write" requirement. Log-based structures have been used to alleviate this problem by writing updated data to clean space. Prior log-based methods, however, cannot avoid excessive erase operations when there are frequent updates, which quickly consume free pages, especially when some data are updated repeatedly. In this paper, we propose a hybrid architecture for NAND flash memory storage in which the log region is implemented using phase change random access memory (PRAM). Compared to traditional log-based architectures, it has the following advantages: (1) the PRAM log region allows in-place updating, so it significantly improves the usage efficiency of log pages by eliminating out-of-date log records; (2) it greatly reduces the traffic of reading from the NAND flash memory storage, since the size of the logs loaded for a read operation is decreased; (3) the energy consumption of the storage system is reduced, as the overhead of writing and reading log data is decreased with the PRAM log region; (4) the lifetime of the NAND flash memory is increased because the number of erase operations is reduced. To facilitate the PRAM log region, we propose several management policies. The simulation results show that our proposed methods can substantially improve the performance, energy consumption, and lifetime of NAND flash memory storage.
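The key property being exploited is that PRAM is byte-addressable and overwritable, so a log record for a flash page can simply be rewritten in place instead of appending a fresh log page on every update. The toy C sketch below illustrates that in-place-update behavior under assumed structures; all names, sizes, and policies here are illustrative, not the paper's:

#include <stdint.h>
#include <string.h>

#define LOG_SLOTS 1024
#define PAGE_SIZE 2048

typedef struct {
    uint32_t flash_page;        /* which flash page this record shadows */
    uint8_t  data[PAGE_SIZE];   /* latest contents, overwritten in place */
    uint8_t  valid;
} pram_log_t;

static pram_log_t pram_log[LOG_SLOTS];   /* assumed to reside in PRAM */

void log_update(uint32_t page, const uint8_t *new_data)
{
    /* Repeated updates to the same page reuse one slot: the stale record
     * is overwritten, so no out-of-date log entries accumulate and no
     * extra flash log pages are consumed (advantage 1 above). */
    for (int i = 0; i < LOG_SLOTS; i++) {
        if (pram_log[i].valid && pram_log[i].flash_page == page) {
            memcpy(pram_log[i].data, new_data, PAGE_SIZE);
            return;
        }
    }
    /* First update to this page: claim a free slot. */
    for (int i = 0; i < LOG_SLOTS; i++) {
        if (!pram_log[i].valid) {
            pram_log[i].flash_page = page;
            memcpy(pram_log[i].data, new_data, PAGE_SIZE);
            pram_log[i].valid = 1;
            return;
        }
    }
    /* Log region full: merge records back into flash (not shown). */
}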

136 citations

Proceedings ArticleDOI
02 Jun 2018
TL;DR: This paper investigates widely used DNNs and finds that the major contributors to memory footprint are intermediate layer outputs (feature maps), and introduces a framework for DNN-layer-specific optimizations that significantly reduce this source of main memory pressure on GPUs.
Abstract: Modern deep neural network (DNN) training typically relies on GPUs to train complex hundred-layer-deep networks. A significant problem facing both researchers and industry practitioners is that, as networks get deeper, the available GPU main memory becomes a primary bottleneck, limiting the size of the networks that can be trained. In this paper, we investigate widely used DNNs and find that the major contributors to memory footprint are intermediate layer outputs (feature maps). We then introduce a framework for DNN-layer-specific optimizations (e.g., convolution, ReLU, pool) that significantly reduce this source of main memory pressure on GPUs. We find that a feature map typically has two uses that are spread far apart temporally. Our key approach is to store an encoded representation of feature maps for this temporal gap and decode this data for use in the backward pass; the full-fidelity feature maps are used in the forward pass and relinquished immediately. Based on this approach, we present Gist, our system that employs two classes of layer-specific encoding schemes, lossless and lossy, to exploit existing value redundancy in DNN training to significantly reduce the memory consumption of targeted feature maps. For example, one insight is that, by taking advantage of the computational nature of back-propagation from the pool to the ReLU layer, we can store the intermediate feature map using just 1 bit instead of 32 bits per value. We deploy these mechanisms in a state-of-the-art DNN framework (CNTK) and observe that Gist reduces the memory footprint by up to 2× across 5 state-of-the-art image classification DNNs, with an average of 1.8× and only 4% performance overhead. We also show that further software (e.g., cuDNN) and hardware (e.g., dynamic allocation) optimizations can result in an even larger footprint reduction (up to 4.1×).
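The 1-bit insight follows from how ReLU back-propagation works: the backward pass only needs to know whether each forward-pass value was positive, not its magnitude, so a 32-bit float per element can be replaced by one bit during the temporal gap. The C sketch below shows that encode/decode pair under assumed names (encode_relu_sign, backprop_relu_from_bits); it illustrates the principle, not Gist's implementation:

#include <stdint.h>
#include <stddef.h>

/* After the forward pass: keep one bit per element, set iff the value
 * was positive, then release the full-fidelity feature map. */
void encode_relu_sign(const float *fmap, uint8_t *bits, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fmap[i] > 0.0f)
            bits[i / 8] |= (uint8_t)(1u << (i % 8));
        else
            bits[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}

/* In the backward pass: gradients flow only where ReLU was active,
 * which is exactly the stored bitmask. */
void backprop_relu_from_bits(const uint8_t *bits, const float *grad_out,
                             float *grad_in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int active = (bits[i / 8] >> (i % 8)) & 1;
        grad_in[i] = active ? grad_out[i] : 0.0f;
    }
}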

136 citations

Patent
20 Sep 1996
TL;DR: A secure embedded memory management unit for a microprocessor is used for encrypted instruction and data transfer from an external memory; because all of the processing takes place on buses internal to the chip, detection of clear, unencrypted instructions and data is prevented.
Abstract: A secure embedded memory management unit for a microprocessor is used for encrypted instruction and data transfer from an external memory. Physical security is obtained by embedding the direct memory access controller on the same chip as the microprocessor core, an internal memory, and the encryption/decryption logic. Data transfer to and from an external memory takes place between the external memory and the memory controller of the memory management unit. All firmware to and from the external memory is handled on a page-by-page basis. Since all of the processing takes place on buses internal to the chip, detection of clear, unencrypted instructions and data is prevented.
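Structurally, the page-by-page flow means ciphertext is all that ever crosses the chip pins, and plaintext exists only in on-chip memory. The C sketch below shows only that data flow; the XOR "cipher" is a stand-in for the chip's real encryption/decryption logic (it is not real cryptography), and every name here is an illustrative assumption, not the patent's design:

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Placeholder for the on-chip decryption logic; a real device would use
 * a proper cipher implemented in hardware. */
static void decrypt_page(uint8_t *page, const uint8_t *key, size_t keylen)
{
    for (size_t i = 0; i < PAGE_SIZE; i++)
        page[i] ^= key[i % keylen];
}

/* DMA one encrypted page from external memory, then decrypt on-chip.
 * Only ciphertext appears on the external bus; cleartext exists solely
 * in internal_mem, on buses internal to the chip. */
void fetch_page(const volatile uint8_t *external_page, uint8_t *internal_mem,
                const uint8_t *key, size_t keylen)
{
    for (size_t i = 0; i < PAGE_SIZE; i++)
        internal_mem[i] = external_page[i];   /* ciphertext crosses the pins */
    decrypt_page(internal_mem, key, keylen);  /* cleartext stays on-chip */
}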

136 citations


Network Information
Related Topics (5)

Topic                     Papers    Citations    Relatedness
Cache                     59.1K     976.6K       94%
Scalability               50.9K     931.6K       92%
Server                    79.5K     1.4M         89%
Virtual machine           43.9K     718.3K       87%
Scheduling (computing)    78.6K     1.3M         86%
Performance Metrics

Number of papers in the topic in previous years:

Year    Papers
2023    33
2022    88
2021    629
2020    467
2019    461
2018    591