Scalable locality-conscious multithreaded memory allocation
References
Dynamic Storage Allocation: A Survey and Critical Review
Hoard: a scalable memory allocator for multithreaded applications
Thread Scheduling for Multiprogrammed Multiprocessors
A fast storage allocator
Frequently Asked Questions (15)
Q2. What future work do the authors mention in the paper "Scalable locality-conscious multithreaded memory allocation"?
Extending Streamflow with mechanisms and policies to detect memory pressure and proactively release memory to prevent thrashing is left as future work.
Q3. Why does contiguous allocation of small objects work well in practice?
Contiguous allocation of small objects works well in practice because eliminating object headers avoids both fragmentation and cache pollution.
Q4. What is the main consideration in the context of thread-safe memory allocators?
Scalability and the reduction of synchronization overhead have been the central considerations in the design of thread-safe memory allocators [3, 18], while locality has been the focal point of the design of sequential memory allocators for more than a decade [11].
Q5. How many MB of address space is enough for the BIBOP table?
In a 32-bit address space, 1 MB is enough for the BIBOP table (768 KB in Linux, since 25% of the virtual address space is reserved for kernel memory).
Q6. How does Streamflow limit the population of the local and global caches?
In order to maintain low virtual memory usage, their implementation constrains the population of the local and global caches to one and zero page blocks respectively.
Q7. How many memory pages are allocated for the thread stack?
Tracing the system calls performed by the application revealed that before each thread creation, 513 memory pages (2052 KB) are allocated for the thread's stack.
Q8. What is the key scalability-limiting factor of Streamflow?
Since Streamflow eliminates (in the common case) or significantly reduces (in the uncommon case) synchronization, the key scalability-limiting factor of multithreaded memory managers, the authors expect it to be scalable and efficient on larger shared-memory multiprocessors as well.
Q9. Why do page blocks have to be of the same size?
Due to the minimum size, maximum size, and power of two size limitations for page blocks, multiple object classes use page blocks of the same size.
Q10. What is the Streamflow policy for removing partially free or locally cached page blocks?
Whenever a thread terminates, Streamflow ensures that the free memory of partially free or locally cached page blocks in its heap will be made available to the other threads.
Q11. What is the performance of Streamflow even with Consume?
It is worth noting that Streamflow performs well even with Consume, which is specifically designed to stress multithreaded allocators that use thread-local heaps.
Q12. What is the minimum page block size?
The minimum page block size (16 KB in Streamflow) allows more than 1024 very small objects to be packed inside a single page block, while keeping the resulting page blocks small enough that the additional memory consumption is not a concern.
Q13. What are the main problems and trade-offs in a unified design?
Several problems and trade-offs arise in an attempt to integrate scalable concurrent allocation mechanisms with cache- and page-conscious object allocation mechanisms in a unified design.
Q14. What are the main problems of thread-safe memory allocators?
At a lower level, page block allocation and recycling policies in thread-safe allocators are primarily concerned with fragmentation and blowup, without necessarily accounting for locality [3].
Q15. How many bytes is required for a header?
This limits the minimum memory required for headers to 8 bytes and the minimum object granularity supported by the allocator to 16 bytes (including the header).