B-Queue: Efficient and Practical Queuing for Fast Core-to-Core Communication
TLDR
This paper proposes B-Queue, an efficient and practical single-producer-single-consumer concurrent lock-free queue that solves the deadlock problem gracefully by introducing a self-adaptive backtracking mechanism, making it a good candidate for fast core-to-core communication on multi-core architectures.
Abstract:
Core-to-core communication is critical to the effective use of multi-core processors. A number of software-based concurrent lock-free queues have been proposed to address this problem. Existing solutions, however, either suffer from performance degradation in real testbeds or rely on auxiliary hardware or software timers to handle the deadlock problem when batching is used, which makes them good in theory but difficult to use in practice. This paper describes the pros and cons of existing concurrent lock-free queues in both dummy and real testbeds and proposes B-Queue, an efficient and practical single-producer-single-consumer concurrent lock-free queue that solves the deadlock problem gracefully by introducing a self-adaptive backtracking mechanism. Experiments show that in real massively-parallel applications, B-Queue is faster than FastForward and MCRingBuffer, the two state-of-the-art concurrent lock-free queues, by up to 10x and 5x, respectively. Moreover, B-Queue outperforms FastForward and MCRingBuffer in terms of stability and scalability, making it a good candidate for fast core-to-core communication on multi-core architectures.
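To make the batching and backtracking idea concrete, here is a minimal C11 sketch of a single-producer-single-consumer ring buffer. It is an illustration under stated assumptions, not the authors' implementation: the names (`spsc_queue`, `spsc_enqueue`, `spsc_dequeue`), the sizes `QUEUE_SIZE` and `BATCH_SIZE`, the use of 0 as the "empty slot" marker, and the halving rule in the consumer are all hypothetical choices made for this example. The abstract does not spell out these details, and the self-adaptive part of the mechanism is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 64u /* hypothetical capacity, must be a power of two */
#define BATCH_SIZE 8u  /* hypothetical probe distance */
#define SLOT_MASK (QUEUE_SIZE - 1u)

static_assert((QUEUE_SIZE & SLOT_MASK) == 0u, "QUEUE_SIZE must be a power of two");
static_assert(BATCH_SIZE <= QUEUE_SIZE, "a batch must fit in the queue");

typedef struct {
    _Atomic uintptr_t slot[QUEUE_SIZE]; /* 0 marks an empty slot */
    /* Touched only by the producer. */
    size_t head;       /* next position to write */
    size_t batch_head; /* positions in [head, batch_head) are known to be empty */
    /* Touched only by the consumer. */
    size_t tail;       /* next position to read */
    size_t batch_tail; /* positions in [tail, batch_tail) are known to be full */
} spsc_queue;

void spsc_init(spsc_queue *q) {
    for (size_t i = 0; i < QUEUE_SIZE; i++) {
        atomic_init(&q->slot[i], (uintptr_t)0);
    }
    q->head = q->batch_head = 0;
    q->tail = q->batch_tail = 0;
}

/* Producer side. Returns false if no full batch of free slots is available.
 * Note: this can report "full" while fewer than BATCH_SIZE slots are free. */
bool spsc_enqueue(spsc_queue *q, uintptr_t value) {
    if (value == 0) {
        return false; /* 0 is reserved as the empty marker */
    }
    if (q->head == q->batch_head) {
        /* Probe one slot a batch ahead. The consumer frees slots in order,
         * so if this slot is empty, every slot before it is empty too. */
        size_t probe = (q->head + BATCH_SIZE - 1u) & SLOT_MASK;
        if (atomic_load_explicit(&q->slot[probe], memory_order_acquire) != 0) {
            return false;
        }
        q->batch_head = q->head + BATCH_SIZE;
    }
    atomic_store_explicit(&q->slot[q->head & SLOT_MASK], value, memory_order_release);
    q->head++;
    return true;
}

/* Consumer side. Returns false only if the queue is empty. */
bool spsc_dequeue(spsc_queue *q, uintptr_t *out) {
    if (q->tail == q->batch_tail) {
        /* Backtracking: if a full batch is not ready, halve the probe
         * distance instead of waiting for the producer to fill the batch. */
        size_t batch = BATCH_SIZE;
        while (batch > 0) {
            size_t probe = (q->tail + batch - 1u) & SLOT_MASK;
            if (atomic_load_explicit(&q->slot[probe], memory_order_acquire) != 0) {
                break;
            }
            batch /= 2;
        }
        if (batch == 0) {
            return false;
        }
        q->batch_tail = q->tail + batch;
    }
    size_t idx = q->tail & SLOT_MASK;
    *out = atomic_load_explicit(&q->slot[idx], memory_order_relaxed);
    atomic_store_explicit(&q->slot[idx], (uintptr_t)0, memory_order_release);
    q->tail++;
    return true;
}
```

In this sketch the producer and consumer share no index variables; they communicate only through the slot contents. The consumer's shrinking probe lets it drain a partially filled batch without a timer, which is the kind of situation the abstract refers to as the deadlock problem under batching.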
Citations
Dissertation
Efficient Engineering and Execution of Pipe-and-Filter Architectures
TL;DR: TeeTime is presented, which can model and execute arbitrary P&F architectures and can execute filters in parallel by utilizing the capabilities of contemporary multi-core processor systems.
Proceedings Article
Parallel and Generic Pipe-and-Filter Architectures with TeeTime
TL;DR: This paper presents the P&F framework TeeTime, which can model and execute arbitrary P&F architectures and can execute filters in parallel by utilizing the capabilities of contemporary multi-core processor systems.
Proceedings Article
FFQ: A Fast Single-Producer/Multiple-Consumer Concurrent FIFO Queue
TL;DR: FFQ is a fast FIFO queue that aims at maximizing throughput by specializing the algorithm for single-producer/multiple-consumer settings: each producer has its own queue from which multiple consumers can concurrently dequeue.
Proceedings Article
Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication
TL;DR: This paper proposes Lynx, a novel SP/SC queue built from the ground up and specifically tuned for fine-grained communication, which relies on existing commodity processor hardware and operating system exception handling support to deal with infrequent queue maintenance operations.
References
Journal Article
Linearizability: a correctness condition for concurrent objects
TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Book
The Art of Multiprocessor Programming
TL;DR: Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Proceedings Article
Simple, fast, and practical non-blocking and blocking concurrent queue algorithms
TL;DR: Experiments on a 12-node SGI Challenge multiprocessor indicate that the new non-blocking queue consistently outperforms the best known alternatives; it is the clear algorithm of choice for machines that provide a universal atomic primitive (e.g., compare_and_swap or load_linked/store_conditional).
Proceedings Article
RouteBricks: exploiting parallelism to scale software routers
Mihai Dobrescu, Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan D. Knies, Maziar Manesh, Sylvia Ratnasamy
TL;DR: This work proposes a software router architecture that parallelizes router functionality both across multiple servers and across multiple cores within a single server, and demonstrates a 35Gbps parallel router prototype.
Journal Article
Specifying Concurrent Program Modules
TL;DR: A method for specifying program modules in a concurrent program is described, based upon temporal logic, but uses new kinds of temporal assertions to make the specifications simpler and easier to understand.