Topic

Memory management

About: Memory management is a research topic. Over its lifetime, 16,743 publications have been published within this topic, receiving 312,028 citations. The topic is also known as: memory allocation.


Papers
Proceedings ArticleDOI
17 Sep 2005
TL;DR: A general-purpose compiler approach, called memory coloring, for efficiently allocating the arrays in a program to an SPM by adapting an existing graph-colouring algorithm for register allocation to assign the arrays in the program to the register file.
Abstract: Scratchpad memory (SPM), a fast software-managed on-chip SRAM, is now widely used in modern embedded processors. Compared to hardware-managed cache, it is more efficient in performance, power and area cost, and has the added advantage of better time predictability. This paper introduces a general-purpose compiler approach, called memory coloring, for efficiently allocating the arrays in a program to an SPM. The novelty of our approach lies in partitioning an SPM into a "register file", splitting the live ranges of arrays to create potential data transfer statements between the SPM and off-chip memory, and finally, adapting an existing graph-colouring algorithm for register allocation to assign the arrays in the program to the register file. Our approach is efficient due to the practical efficiency of graph-colouring algorithms. We have implemented this work in SUIF and machSUIF. Preliminary results over benchmarks show that our approach represents a promising solution to automatic SPM management.
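
To make the idea concrete, here is a minimal sketch of the colouring step: arrays whose live ranges interfere must receive different SPM partitions, and arrays that cannot be coloured spill to off-chip memory. The interference graph, array count, and partition count below are invented for illustration; this is not the paper's SUIF/machSUIF implementation, which also splits live ranges before colouring.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_ARRAYS 5
#define NUM_SLOTS  3   /* SPM partitioned into 3 equal "registers" (assumed) */

/* interfere[i][j] is true when arrays i and j have overlapping live ranges */
static const bool interfere[NUM_ARRAYS][NUM_ARRAYS] = {
    {0,1,1,0,0},
    {1,0,1,1,0},
    {1,1,0,0,1},
    {0,1,0,0,1},
    {0,0,1,1,0},
};

int main(void) {
    int slot[NUM_ARRAYS]; /* assigned SPM slot, or -1 if spilled off-chip */
    for (int i = 0; i < NUM_ARRAYS; i++) {
        bool used[NUM_SLOTS] = {false};
        /* mark slots taken by interfering, already-coloured arrays */
        for (int j = 0; j < i; j++)
            if (interfere[i][j] && slot[j] >= 0)
                used[slot[j]] = true;
        slot[i] = -1;
        /* greedy colouring: take the first free slot, if any */
        for (int c = 0; c < NUM_SLOTS; c++)
            if (!used[c]) { slot[i] = c; break; }
        if (slot[i] >= 0)
            printf("array %d -> SPM slot %d\n", i, slot[i]);
        else
            printf("array %d -> off-chip memory (spilled)\n", i);
    }
    return 0;
}
```

In the actual approach, live-range splitting inserts data transfer statements between the SPM and off-chip memory, so an array that spills here could still occupy the SPM for part of its lifetime.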

117 citations

Journal ArticleDOI
TL;DR: The results show that there are memory management policies implemented in the system that can improve the performance of programs written using the simpler uniform memory access (UMA) programming model, and there appears to be no single policy that can be considered the best over a set of test applications.
Abstract: Non-uniformity of memory access is an almost inevitable feature of memory architecture in shared memory multiprocessor designs that can scale to large numbers of processors. One implication of NUMA architectures is that the placement and movement of code and data become crucial to performance. As memory architectures become more complex and the nonuniformity becomes less well hidden, systems software must assume a larger role in providing memory management support for the programmer. This paper investigates the role of the operating system. We take an experimental approach to evaluating a wide range of memory management policies. The target NUMA environment is BBN's GP-1000 multiprocessor. Extensive local modifications have been made to the memory management subsystem of BBN's nX operating system to support multiple policy implementations. Policy comparisons are based on the measured performance of real parallel applications. Our results show that there are memory management policies implemented in our system that can improve the performance of programs written using the simpler uniform memory access (UMA) programming model. While achieving the level of performance of a highly tuned NUMA program is still a difficult problem, some examples come close. There appears to be no single policy that can be considered the best over our set of test applications. Investigations into the contributions made by individual policy features toward overall behavior of the workload provide some insight into the design of a set of effective policies.
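
The policy space the paper explores (where the OS places pages relative to the threads that touch them) still exists in today's systems. As a rough modern analogue, the following C sketch uses the Linux libnuma API (link with -lnuma) to request two classic placement policies; it only illustrates the policy concepts and is not the BBN nX interface studied in the paper.

```c
/* Two classic NUMA placement policies via Linux libnuma (link with -lnuma). */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return EXIT_FAILURE;
    }
    size_t sz = 64 * 1024 * 1024;

    /* "Local" policy: back pages from the calling thread's node,
     * minimising access latency for that thread. */
    double *local = numa_alloc_local(sz);

    /* "Interleaved" policy: stripe pages round-robin across all nodes,
     * trading best-case latency for bandwidth and load balance. */
    double *spread = numa_alloc_interleaved(sz);

    printf("nodes: %d, local=%p interleaved=%p\n",
           numa_max_node() + 1, (void *)local, (void *)spread);

    if (local)  numa_free(local, sz);
    if (spread) numa_free(spread, sz);
    return 0;
}
```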

117 citations

Proceedings ArticleDOI
10 Jun 2006
TL;DR: This paper is the first to integrate a software transactional memory system with a malloc/free-based memory allocator, and presents the first algorithm which ensures that space allocated in an aborted transaction is properly freed and does not lead to a space blowup.
Abstract: Emerging multi-core processors promise to provide an exponentially increasing number of hardware threads with every generation. Applications will need to be highly concurrent to fully use the power of these processors. To enable maximum concurrency, libraries (such as malloc/free packages) would therefore need to use non-blocking algorithms. But lock-free algorithms are notoriously difficult to reason about and inappropriate for average programmers. Transactional memory promises to significantly ease concurrent programming for the average programmer. This paper describes a highly efficient non-blocking malloc/free algorithm that supports memory allocation and deallocation inside transactional code blocks. Thus this paper describes a memory allocator that is suitable for emerging multi-core applications, while supporting modern concurrency constructs. This paper makes several novel contributions. It is the first to integrate a software transactional memory system with a malloc/free-based memory allocator. We present the first algorithm which ensures that space allocated in an aborted transaction is properly freed and does not lead to a space blowup. Unlike previous lock-free malloc packages, our algorithm avoids atomic operations on typical code paths, making our algorithm substantially more efficient.
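
The space-safety property (no blowup from aborted transactions) rests on two pieces of bookkeeping: allocations performed inside a transaction must be reclaimed on abort, and frees must be deferred until commit so an abort can undo them. The hypothetical, single-threaded C sketch below shows only that bookkeeping; the paper's actual allocator is non-blocking and avoids atomic operations on common paths, which this sketch does not attempt.

```c
#include <stdlib.h>

#define TX_LOG_MAX 64

typedef struct {
    void *allocs[TX_LOG_MAX]; int n_allocs;  /* undo these on abort    */
    void *frees[TX_LOG_MAX];  int n_frees;   /* apply these on commit  */
} tx_t;

void *tx_malloc(tx_t *tx, size_t sz) {
    void *p = malloc(sz);
    if (p && tx->n_allocs < TX_LOG_MAX) tx->allocs[tx->n_allocs++] = p;
    return p;
}

/* Defer the free: until commit, other transactions may still read p. */
void tx_free(tx_t *tx, void *p) {
    if (tx->n_frees < TX_LOG_MAX) tx->frees[tx->n_frees++] = p;
}

void tx_commit(tx_t *tx) {            /* deferred frees take effect now */
    for (int i = 0; i < tx->n_frees; i++) free(tx->frees[i]);
    tx->n_allocs = tx->n_frees = 0;
}

void tx_abort(tx_t *tx) {             /* reclaim space the doomed transaction allocated */
    for (int i = 0; i < tx->n_allocs; i++) free(tx->allocs[i]);
    tx->n_allocs = tx->n_frees = 0;
}

int main(void) {
    tx_t tx = {0};
    void *p = tx_malloc(&tx, 128);
    tx_free(&tx, p);   /* logged, not freed yet        */
    tx_commit(&tx);    /* p is actually released here  */
    return 0;
}
```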

117 citations

Proceedings ArticleDOI
04 Dec 2010
TL;DR: This work proposes dynamically reconfigurable predictive mechanisms that exploit the full dynamic range allowed in the JEDEC DDRx SDRAM specifications, and refers to the overall scheme as Elastic Refresh, in that the refresh policy is stretched to fit the currently executing workload, such that the maximum benefit of the DRAM flexibility is realized.
Abstract: High density memory is becoming more important as many execution streams are consolidated onto single chip many-core processors. DRAM is ubiquitous as a main memory technology, but while DRAM’s per-chip density and frequency continue to scale, the time required to refresh its dynamic cells has grown at an alarming rate. This paper shows how currently-employed methods to schedule refresh operations are ineffective in mitigating the significant performance degradation caused by longer refresh times. Current approaches are deficient: they do not effectively exploit the flexibility of DRAMs to postpone refresh operations. This work proposes dynamically reconfigurable predictive mechanisms that exploit the full dynamic range allowed in the JEDEC DDRx SDRAM specifications. The proposed mechanisms are shown to mitigate much of the penalties seen with dense DRAM devices. We refer to the overall scheme as Elastic Refresh, in that the refresh policy is stretched to fit the currently executing workload, such that the maximum benefit of the DRAM flexibility is realized. We extend the GEMS on SIMICS tool-set to include Elastic Refresh. Simulations show the proposed solution provides a 10% average performance improvement over existing techniques across the entire SPEC CPU suite, and up to a 41% improvement for certain workloads.
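
The "flexibility" being exploited is the JEDEC provision that a controller may postpone a bounded number of refresh commands (up to eight in DDR3) and issue them later. A toy C sketch of that postpone-while-busy idea follows; the fixed queue-depth test stands in for the paper's dynamically reconfigurable predictors, and the function names are invented.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_POSTPONED 8   /* JEDEC DDR3 bound on deferred REF commands */

typedef struct {
    int postponed;        /* REF commands owed to the rank */
} refresh_state_t;

/* Called once per tREFI interval; returns true if a REF is issued now. */
bool on_refresh_deadline(refresh_state_t *st, int request_queue_depth) {
    if (request_queue_depth > 0 && st->postponed < MAX_POSTPONED) {
        st->postponed++;  /* demand traffic is waiting: defer this REF */
        return false;
    }
    return true;          /* rank idle, or out of slack: refresh on time */
}

/* Called when the controller detects an idle rank: drain owed REFs. */
int on_idle(refresh_state_t *st) {
    int issue_now = st->postponed;
    st->postponed = 0;
    return issue_now;
}

int main(void) {
    refresh_state_t st = {0};
    for (int t = 0; t < 12; t++) {
        int depth = (t < 10) ? 4 : 0;   /* busy for 10 intervals, then idle */
        if (on_refresh_deadline(&st, depth))
            printf("t=%d: REF issued\n", t);
        else
            printf("t=%d: REF postponed (%d outstanding)\n", t, st.postponed);
    }
    printf("idle drain: %d REFs issued\n", on_idle(&st));
    return 0;
}
```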

116 citations

Journal ArticleDOI
TL;DR: An optimized block-floating-point (BFP) arithmetic is adopted in the accelerator for efficient inference of deep neural networks, improving energy and hardware efficiency by three times.
Abstract: Convolutional neural networks (CNNs) are widely used and have achieved great success in computer vision and speech processing applications. However, deploying the large-scale CNN model in the embedded system is subject to the constraints of computation and memory. In this paper, an optimized block-floating-point (BFP) arithmetic is adopted in our accelerator for efficient inference of deep neural networks. The feature maps and model parameters are represented in 16-bit and 8-bit formats, respectively, in the off-chip memory, which can reduce memory and off-chip bandwidth requirements by 50% and 75% compared to the 32-bit FP counterpart. The proposed 8-bit BFP arithmetic with optimized rounding and shifting-operation-based quantization schemes improves the energy and hardware efficiency by three times. One CNN model can be deployed in our accelerator without retraining at the cost of an accuracy loss of not more than 0.12%. The proposed reconfigurable accelerator with three parallelism dimensions, ping-pong off-chip DDR3 memory access, and an optimized on-chip buffer group is implemented on the Xilinx VC709 evaluation board. Our accelerator achieves a performance of 760.83 GOP/s and 82.88 GOP/s/W under a 200-MHz working frequency, significantly outperforming previous accelerators.
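
Block-floating-point stores one shared exponent per block of values and reduces each value to a small fixed-point mantissa, so multiplies become integer operations with shift-based scaling. The C sketch below quantises one block into signed 8-bit mantissas plus a shared exponent; the block size, rounding, and saturation choices are illustrative assumptions, not the accelerator's optimised scheme.

```c
#include <math.h>
#include <stdio.h>

#define BLOCK 8   /* assumed block size for the example */

/* Quantise one block of floats to signed 8-bit mantissas + shared exponent. */
void bfp_quantise(const float *in, signed char *mant, int *shared_exp) {
    float max_abs = 0.0f;
    for (int i = 0; i < BLOCK; i++)
        if (fabsf(in[i]) > max_abs) max_abs = fabsf(in[i]);

    /* Shared exponent chosen so the largest value fits in 7 mantissa bits. */
    int e = 0;
    frexpf(max_abs, &e);          /* max_abs = m * 2^e with 0.5 <= m < 1 */
    *shared_exp = e - 7;

    for (int i = 0; i < BLOCK; i++) {
        /* shift-and-round: in[i] / 2^shared_exp, rounded to nearest */
        long q = lroundf(ldexpf(in[i], -*shared_exp));
        if (q >  127) q =  127;   /* saturate into the signed 8-bit range */
        if (q < -128) q = -128;
        mant[i] = (signed char)q;
    }
}

int main(void) {
    const float x[BLOCK] = {0.91f, -0.33f, 0.02f, 0.5f, -0.75f, 0.1f, 0.0f, -0.06f};
    signed char m[BLOCK];
    int e;
    bfp_quantise(x, m, &e);
    printf("shared exponent: %d\n", e);
    for (int i = 0; i < BLOCK; i++)
        printf("%6.3f ~ %4d * 2^%d = %.4f\n", x[i], m[i], e, ldexpf((float)m[i], e));
    return 0;
}
```

Because every value in the block shares one exponent, dequantisation is a single shift, which is what makes the shifting-operation-based scheme cheap in hardware.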

116 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (94% related)
Scalability: 50.9K papers, 931.6K citations (92% related)
Server: 79.5K papers, 1.4M citations (89% related)
Virtual machine: 43.9K papers, 718.3K citations (87% related)
Scheduling (computing): 78.6K papers, 1.3M citations (86% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    33
2022    88
2021    629
2020    467
2019    461
2018    591