B-Queue: Efficient and Practical Queuing for Fast Core-to-Core Communication

doi:10.1007/S10766-012-0213-X



Junchang Wang, +3 more

01 Feb 2013

International Journal of Parallel Progra...

Vol. 41, Iss. 1, pp. 137-159


Abstract:

Core-to-core communication is critical to the effective use of multi-core processors. A number of software-based concurrent lock-free queues have been proposed to address this problem. Existing solutions, however, suffer from performance degradation in real testbeds, or rely on auxiliary hardware or software timers to handle the deadlock problem that arises when batching is used (items sitting in an incomplete batch can wait indefinitely if the producer stops producing), making those solutions good in theory but difficult to use in practice. This paper describes the pros and cons of existing concurrent lock-free queues in both dummy and real testbeds and proposes B-Queue, an efficient and practical single-producer-single-consumer concurrent lock-free queue that solves the deadlock problem gracefully by introducing a self-adaptive backtracking mechanism. Experiments show that in real massively-parallel applications, B-Queue is faster than FastForward and MCRingBuffer, the two state-of-the-art concurrent lock-free queues, by up to 10x and 5x, respectively. Moreover, B-Queue outperforms FastForward and MCRingBuffer in terms of stability and scalability, making it a good candidate for fast core-to-core communication on multi-core architectures.
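The batching and backtracking ideas summarized above can be illustrated with a minimal single-producer/single-consumer ring buffer in C11. This is a simplified sketch under stated assumptions, not the authors' implementation: the names (`bqueue`, `bq_init`, `bq_enqueue`, `bq_dequeue`), the sizes `QUEUE_SIZE` and `BATCH_SIZE`, and the specific backtracking rule (halving the probe distance until a filled slot is found) are hypothetical choices made for illustration. Empty slots are marked with NULL, so the two sides synchronize through the slots themselves rather than through shared head and tail indices.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes chosen for illustration; not taken from the paper. */
#define QUEUE_SIZE 1024u /* ring capacity, must be a power of two */
#define BATCH_SIZE 64u   /* probe distance, must be < QUEUE_SIZE */
#define QUEUE_MASK (QUEUE_SIZE - 1u)

_Static_assert((QUEUE_SIZE & QUEUE_MASK) == 0, "QUEUE_SIZE must be a power of two");
_Static_assert(BATCH_SIZE >= 1 && BATCH_SIZE < QUEUE_SIZE, "bad BATCH_SIZE");

typedef struct {
    _Atomic(void *) slots[QUEUE_SIZE]; /* NULL marks an empty slot */
    /* Producer-private indices (a real queue would pad these onto their own cache line). */
    size_t head;       /* next position to write */
    size_t batch_head; /* producer may write up to (not including) here without probing */
    /* Consumer-private indices. */
    size_t tail;       /* next position to read */
    size_t batch_tail; /* consumer may read up to (not including) here without probing */
} bqueue;

/* Must be called before either thread starts using the queue. */
void bq_init(bqueue *q) {
    for (size_t i = 0; i < QUEUE_SIZE; i++)
        atomic_store_explicit(&q->slots[i], NULL, memory_order_relaxed);
    q->head = q->batch_head = q->tail = q->batch_tail = 0;
}

/* Producer side. Returns false if the queue is (nearly) full. */
bool bq_enqueue(bqueue *q, void *item) {
    assert(item != NULL); /* NULL is reserved as the empty marker */
    if (q->head == q->batch_head) {
        /* The consumer frees slots in order, so if the slot BATCH_SIZE ahead is
           empty, every slot before it is empty too: claim them as one batch. */
        size_t probe = q->head + BATCH_SIZE;
        if (atomic_load_explicit(&q->slots[probe & QUEUE_MASK], memory_order_acquire) != NULL)
            return false;
        q->batch_head = probe;
    }
    /* Release: the item's contents become visible before the slot is non-NULL. */
    atomic_store_explicit(&q->slots[q->head & QUEUE_MASK], item, memory_order_release);
    q->head++;
    return true;
}

/* Consumer side. Returns false if no item is available. */
bool bq_dequeue(bqueue *q, void **out) {
    if (q->tail == q->batch_tail) {
        /* Backtracking (assumed rule): instead of waiting for a full batch, which
           may never arrive if the producer stops, halve the probe distance until
           a filled slot is found or only the next slot remains. Because the
           producer fills slots in order, a filled probe slot implies every slot
           before it is filled as well. */
        size_t batch = BATCH_SIZE;
        while (atomic_load_explicit(&q->slots[(q->tail + batch - 1) & QUEUE_MASK],
                                    memory_order_acquire) == NULL) {
            if (batch == 1)
                return false; /* queue is empty */
            batch /= 2;
        }
        q->batch_tail = q->tail + batch;
    }
    size_t i = q->tail & QUEUE_MASK;
    *out = atomic_load_explicit(&q->slots[i], memory_order_acquire);
    /* Release: the read above completes before the producer may reuse the slot. */
    atomic_store_explicit(&q->slots[i], NULL, memory_order_release);
    q->tail++;
    return true;
}
```

In this sketch the producer reads a slot the consumer writes only once per `BATCH_SIZE` enqueues, and the consumer probes only when its current batch is used up, so most operations touch only locally owned state. The price is capacity: a fresh queue accepts `QUEUE_SIZE - BATCH_SIZE` items before `bq_enqueue` reports full. The halving loop is what lets the consumer drain a partially filled batch (for example, three items when `BATCH_SIZE` is 64) without any timer.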



