Topic

Memory management

About: Memory management is a research topic. Over the lifetime, 16743 publications have been published within this topic receiving 312028 citations. The topic is also known as: memory allocation.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Myths and realities: the performance impact of garbage collection

[...]

Stephen M. Blackburn¹, Perry Cheng², Kathryn S. McKinley³•Institutions (3)

Australian National University¹, IBM², University of Texas at Austin³

01 Jun 2004

TL;DR: This paper explores and quantifies garbage collection behavior for three whole heap collectors and generational counterparts: copying semi-space, mark-sweep, and reference counting, the canonical algorithms from which essentially all other collection algorithms are derived.

...read moreread less

Abstract: This paper explores and quantifies garbage collection behavior for three whole heap collectors and generational counterparts: copying semi-space, mark-sweep, and reference counting, the canonical algorithms from which essentially all other collection algorithms are derived. Efficient implementations in MMTk, a Java memory management toolkit, in IBM's Jikes RVM share all common mechanisms to provide a clean experimental platform. Instrumentation separates collector and program behavior, and performance counters measure timing and memory behavior on three architectures.Our experimental design reveals key algorithmic features and how they match program characteristics to explain the direct and indirect costs of garbage collection as a function of heap size on the SPEC JVM benchmarks. For example, we find that the contiguous allocation of copying collectors attains significant locality benefits over free-list allocators. The reduced collection costs of the generational algorithms together with the locality benefit of contiguous allocation motivates a copying nursery for newly allocated objects. These benefits dominate the overheads of generational collectors compared with non-generational and no collection, disputing the myth that "no garbage collection is good garbage collection." Performance is less sensitive to the mature space collection algorithm in our benchmarks. However the locality and pointer mutation characteristics for a given program occasionally prefer copying or mark-sweep. This study is unique in its breadth of garbage collection algorithms and its depth of analysis.

...read moreread less

248 citations

Patent•

Non-volatile memory and method with block management system

[...]

Alan David Bennett, Alan Douglas Bryce, Sergey Anatolievich Gorobets, Alan Welsh Sinclair, Peter John Smith - Show less +1 more

21 Dec 2004

TL;DR: A nonvolatile memory system is organized in physical groups of physical memory locations and each physical group (metablock) is erasable as a unit and can be used to store a logical group of data.

...read moreread less

Abstract: A non-volatile memory system is organized in physical groups of physical memory locations. Each physical group (metablock) is erasable as a unit and can be used to store a logical group of data. A memory management system allows for update of a logical group of data by allocating a metablock dedicated to recording the update data of the logical group. The update metablock records update data in the order received and has no restriction on whether the recording is in the correct logical order as originally stored (sequential) or not (chaotic). Eventually the update metablock is closed to further recording. One of several processes will take place, but will ultimately end up with a fully filled metablock in the correct order which replaces the original metablock. In the chaotic case, directory data is maintained in the non-volatile memory in a manner that is conducive to frequent updates. The system supports multiple logical groups being updated concurrently.

...read moreread less

245 citations

Proceedings Article•DOI•

DNNBuilder: an automated tool for building high-performance DNN hardware accelerators for FPGAs

[...]

Xiaofan Zhang¹, Junsong Wang², Chao Zhu², Yonghua Lin², Jinjun Xiong², Wen-mei W. Hwu¹, Deming Chen¹ - Show less +3 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, IBM²

05 Nov 2018

TL;DR: DNNBuilder, an automatic design space exploration tool to generate optimized parallelism guidelines by considering external memory access bandwidth, data reuse behaviors, FPGA resource availability, and DNN complexity, is designed and demonstrated.

...read moreread less

Abstract: Building a high-performance EPGA accelerator for Deep Neural Networks (DNNs) often requires RTL programming, hardware verification, and precise resource allocation, all of which can be time-consuming and challenging to perform even for seasoned FPGA developers. To bridge the gap between fast DNN construction in software (e.g., Caffe, TensorFlow) and slow hardware implementation, we propose DNNBuilder for building high-performance DNN hardware accelerators on FPGAs automatically. Novel techniques are developed to meet the throughput and latency requirements for both cloud- and edge-devices. A number of novel techniques including high-quality RTL neural network components, a fine-grained layer-based pipeline architecture, and a column-based cache scheme are developed to boost throughput, reduce latency, and save FPGA on-chip memory. To address the limited resource challenge, we design an automatic design space exploration tool to generate optimized parallelism guidelines by considering external memory access bandwidth, data reuse behaviors, FPGA resource availability, and DNN complexity. DNNBuilder is demonstrated on four DNNs (Alexnet, ZF, VGG16, and YOLO) on two FPGAs (XC7Z045 and KU115) corresponding to the edge- and cloud-computing, respectively. The fine-grained layer-based pipeline architecture and the column-based cache scheme contribute to 7.7x and 43x reduction of the latency and BRAM utilization compared to conventional designs. We achieve the best performance (up to 5.15x faster) and efficiency (up to 5.88x more efficient) compared to published FPGA-based classification-oriented DNN accelerators for both edge and cloud computing cases. We reach 4218 GOPS for running object detection DNN which is the highest throughput reported to the best of our knowledge. DNNBuilder can provide millisecond-scale real-time performance for processing HD video input and deliver higher efficiency (up to 4.35x) than the GPU-based solutions.

...read moreread less

244 citations

Journal Article•DOI•

A Survey of Software Techniques for Using Non-Volatile Memories for Storage and Main Memory Systems

[...]

Sparsh Mittal¹, Jeffrey S. Vetter¹•Institutions (1)

Oak Ridge National Laboratory¹

01 May 2016-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A survey of software techniques that have been proposed to exploit the advantages and mitigate the disadvantages of NVMs when used for designing memory systems, and, in particular, secondary storage and main memory.

...read moreread less

Abstract: Non-volatile memory (NVM) devices, such as Flash, phase change RAM, spin transfer torque RAM, and resistive RAM, offer several advantages and challenges when compared to conventional memory technologies, such as DRAM and magnetic hard disk drives (HDDs). In this paper, we present a survey of software techniques that have been proposed to exploit the advantages and mitigate the disadvantages of NVMs when used for designing memory systems, and, in particular, secondary storage (e.g., solid state drive) and main memory. We classify these software techniques along several dimensions to highlight their similarities and differences. Given that NVMs are growing in popularity, we believe that this survey will motivate further research in the field of software technology for NVMs.

...read moreread less

244 citations

Proceedings Article•DOI•

Communication optimization and code generation for distributed memory machines

[...]

Saman Amarasinghe, Monica S. Lam

01 Jun 1993

TL;DR: It is shown that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.

...read moreread less

Abstract: This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our algorithms produce an SPMD (single program multiple data) program to be run on each processor. Our compiler generated the necessary receive and send instructions, optimizes the communication by eliminating redundant communication and aggregating small messages into large messages, allocates space locally on each processor, and translates global data addresses to local addresses.Our techniques are based on an exact data-flow analysis on individual array element accesses. Unlike data dependence analysis, this analysis determines if two dynamic instances refer to the same value, and not just to the same location. Using this information, our compiler can handle more flexible data decompositions and find more opportunities for communication optimization than systems based on data dependence analysis.Our technique is based on a uniform framework, where data decompositions, computation decompositions and the data flow information are all represented as systems of linear inequalities. We show that the problems of communication code generation, local memory management, message aggregation and redundant data communication elimination can all be solved by projecting polyhedra represented by sets of inequalities onto lower dimensional spaces.

...read moreread less

241 citations

Collapse

Network Information

Performance

Metrics

16,861

Papers

331,311

Citations

No. of papers in the topic in previous years
Year	Papers
2023	33
2022	88
2021	629
2020	467
2019	461
2018	591

Memory management

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics