Open Access | Journal Article

B-Queue: Efficient and Practical Queuing for Fast Core-to-Core Communication

TLDR
This paper proposes B-Queue, an efficient and practical single-producer-single-consumer concurrent lock-free queue. It resolves the deadlock problem that arises with batching by introducing a self-adaptive backtracking mechanism, which makes it a good candidate for fast core-to-core communication on multi-core architectures.
Abstract
Core-to-core communication is critical to the effective use of multi-core processors. A number of software-based concurrent lock-free queues have been proposed to address this problem. Existing solutions, however, either suffer from performance degradation in real testbeds or rely on auxiliary hardware or software timers to handle the deadlock problem when batching is used, which makes them good in theory but difficult to use in practice. This paper describes the pros and cons of existing concurrent lock-free queues in both dummy and real testbeds and proposes B-Queue, an efficient and practical single-producer-single-consumer concurrent lock-free queue that solves the deadlock problem gracefully by introducing a self-adaptive backtracking mechanism. Experiments show that in real massively parallel applications, B-Queue is faster than FastForward and MCRingBuffer, the two state-of-the-art concurrent lock-free queues, by up to 10x and 5x, respectively. Moreover, B-Queue outperforms FastForward and MCRingBuffer in terms of stability and scalability, making it a good candidate for fast core-to-core communication on multi-core architectures.
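The batching and backtracking idea described in the abstract can be illustrated with a minimal single-threaded Python model. This is a sketch under stated assumptions, not the authors' implementation: the class name `BatchedSpscQueue`, its parameters, the use of `None` as the empty-slot marker, and the halving schedule are all illustrative choices. A real core-to-core queue would be written in a systems language with explicit memory ordering, and any further tuning the paper applies is not modeled here.

```python
from typing import Any, List, Optional


class BatchedSpscQueue:
    """Single-threaded model of an SPSC ring buffer with batched probing.

    Hypothetical illustration only. None marks an empty slot, so stored
    items must not be None. Producer and consumer keep private indices
    and communicate only through the slot contents.
    """

    def __init__(self, capacity: int = 8, batch: int = 4) -> None:
        if not 0 < batch < capacity:
            raise ValueError("need 0 < batch < capacity")
        self.slots: List[Optional[Any]] = [None] * capacity
        self.capacity = capacity
        self.batch = batch
        # Producer-private state.
        self.head = 0        # next slot to write
        self.batch_head = 0  # slots from head up to batch_head are known free
        # Consumer-private state.
        self.tail = 0        # next slot to read
        self.batch_tail = 0  # slots from tail up to batch_tail are known filled

    def enqueue(self, item: Any) -> bool:
        """Store item; return False if a whole batch of free slots is not available."""
        if item is None:
            raise ValueError("None is reserved as the empty-slot marker")
        if self.head == self.batch_head:
            # Probe one slot a batch ahead. Free slots are contiguous from
            # head, so if this slot is free, every slot before it is too.
            probe = (self.head + self.batch - 1) % self.capacity
            if self.slots[probe] is not None:
                return False
            self.batch_head = (self.head + self.batch) % self.capacity
        self.slots[self.head] = item
        self.head = (self.head + 1) % self.capacity
        return True

    def dequeue(self) -> Optional[Any]:
        """Return the oldest item, or None if the queue is empty."""
        if self.tail == self.batch_tail and not self._backtrack():
            return None
        item = self.slots[self.tail]
        self.slots[self.tail] = None  # hand the slot back to the producer
        self.tail = (self.tail + 1) % self.capacity
        return item

    def _backtrack(self) -> bool:
        """Find a readable batch, halving the probe distance on each miss."""
        size = self.batch
        while size >= 1:
            probe = (self.tail + size - 1) % self.capacity
            if self.slots[probe] is not None:
                # Filled slots are contiguous from tail, so all `size`
                # slots starting at tail hold data.
                self.batch_tail = (self.tail + size) % self.capacity
                return True
            size //= 2
        return False  # even the slot at tail is empty


if __name__ == "__main__":
    demo = BatchedSpscQueue(capacity=8, batch=4)
    for value in ["a", "b", "c"]:  # fewer items than one batch
        demo.enqueue(value)
    print([demo.dequeue() for _ in range(4)])
```

With `capacity=8` and `batch=4`, enqueueing three items and then dequeuing four times yields the three items in order followed by `None`. The consumer's probe at distance 4 misses, so it falls back to shorter distances rather than waiting for a full batch or for a timer to fire. In this model the producer may also report "full" while fewer than `batch` slots are free, which is the price of probing only once per batch.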

Citations
Journal Article

The Art of Multiprocessor Programming

D.M. Hutton, 17 Oct 2008
Dissertation

Efficient Engineering and Execution of Pipe-and-Filter Architectures

TL;DR: This dissertation presents TeeTime, a framework that can model and execute arbitrary P&F architectures and can run filters in parallel by exploiting contemporary multi-core processor systems.
Proceedings Article

Parallel and Generic Pipe-and-Filter Architectures with TeeTime

TL;DR: This paper presents the P&F framework TeeTime, which can model and execute arbitrary P&F architectures and can run filters in parallel by exploiting contemporary multi-core processor systems.
Proceedings Article

FFQ: A Fast Single-Producer/Multiple-Consumer Concurrent FIFO Queue

TL;DR: This paper presents FFQ, a fast FIFO queue that aims to maximize throughput by specializing the algorithm for single-producer/multiple-consumer settings: each producer has its own queue, from which multiple consumers can dequeue concurrently.
Proceedings Article

Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication

TL;DR: This paper proposes Lynx, a novel SP/SC queue built from the ground up and specifically tuned for fine-grained communication. It relies on existing commodity processor hardware and operating system exception handling support to deal with infrequent queue maintenance operations.
References
Journal Article

Linearizability: a correctness condition for concurrent objects

TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Book

The Art of Multiprocessor Programming

TL;DR: Transactional memory is a computational model in which threads synchronize through optimistic, lock-free transactions; a growing community of researchers is working on both software and hardware support for this approach.
Proceedings Article

Simple, fast, and practical non-blocking and blocking concurrent queue algorithms

TL;DR: Experiments on a 12-node SGI Challenge multiprocessor indicate that the new non-blocking queue consistently outperforms the best known alternatives; it is the clear algorithm of choice for machines that provide a universal atomic primitive (e.g., compare_and_swap or load_linked/store_conditional).
Proceedings Article

RouteBricks: exploiting parallelism to scale software routers

TL;DR: This work proposes a software router architecture that parallelizes router functionality both across multiple servers and across multiple cores within a single server, and demonstrates a 35 Gbps parallel router prototype.
Journal Article

Specifying Concurrent Program Modules

TL;DR: This paper describes a method for specifying program modules in a concurrent program. The method is based on temporal logic but uses new kinds of temporal assertions to make the specifications simpler and easier to understand.