
Relativistic red-black trees

TL;DR
Algorithms for concurrently reading and modifying a red-black tree (RBTree) are presented, which have deterministic response times for a given tree size and uncontended read performance that is at least 60% faster than other known approaches.
Abstract
This paper presents algorithms for concurrently reading and modifying a red-black tree (RBTree). The algorithms allow wait-free, linearly scalable lookups in the presence of concurrent inserts and deletes. They have deterministic response times for a given tree size and uncontended read performance that is at least 60% faster than other known approaches. The techniques used to derive these algorithms arise from a concurrent programming methodology called relativistic programming. Relativistic programming introduces write-side delay primitives that allow the writer to pay most of the cost of synchronization between readers and writers. Only minimal synchronization overhead is placed on readers. Relativistic programming avoids unnecessarily strict ordering of read and write operations while still providing the capability to enforce linearizability. This paper shows how relativistic programming can be used to build a concurrent RBTree with synchronization-free readers and both lock-based and transactional memory-based writers. Copyright © 2013 John Wiley & Sons, Ltd.


Portland State University
PDXScholar
Computer Science Faculty Publications and Presentations
Computer Science
1-2011
Relativistic Red-Black Trees
Philip William Howard
Portland State University
Jonathan Walpole
Portland State University
Follow this and additional works at: https://pdxscholar.library.pdx.edu/compsci_fac
Part of the Computer and Systems Architecture Commons, and the Databases and Information
Systems Commons
Let us know how access to this document benefits you.
Citation Details
Howard, Philip W., and Jonathan Walpole. Relativistic red-black trees. Technical Report 10-06, Portland
State University, Computer Science Department, 2010.
This Technical Report is brought to you for free and open access. It has been accepted for inclusion in Computer
Science Faculty Publications and Presentations by an authorized administrator of PDXScholar. Please contact us if
we can make this document more accessible: pdxscholar@pdx.edu.

Relativistic Red-Black Trees
Philip W. Howard
Portland State University
pwh@cecs.pdx.edu
Jonathan Walpole
Portland State University
walpole@cs.pdx.edu
Abstract
Operating system performance and scalability on shared-
memory many-core systems depends critically on efficient
access to shared data structures. Scalability has proven dif-
ficult to achieve for many data structures. In this paper we
present a novel and highly scalable concurrent red-black
tree.
Red-black trees are widely used in operating systems,
but typically exhibit poor scalability. Our red-black tree has
linear read scalability, uncontended read performance that
is at least 25% faster than other known approaches, and
deterministic lookup times for a given tree size, making it
suitable for realtime applications.
Keywords synchronization, data structures, scalability, con-
current programming, red-black trees
1. Introduction
The advent of many-core hardware introduces the need
for highly scalable operating system designs. Many-core
hardware poses a special challenge for symmetric shared-
memory operating system architectures because it dramat-
ically increases the degree of concurrency and at the same
time decreases the locality of accesses to kernel data. The
conventional strategy of using mutual exclusion severely
limits scalability by serializing accesses to shared data and
requiring extensive inter-core communication.
Some researchers have chosen to address this chal-
lenge by throwing out symmetric shared-memory multi-
processor operating system architecture, focusing instead
on OS architectures that are non-symmetric in their use of
shared-memory, or forgo shared-memory alto-
gether [Baumann 2009]. Our research takes a different ap-
proach. We continue to assume a shared-memory operating
system architecture can scale [Boyd-Wickizer 2010] and
work toward this by weakening the ordering constraints that
normally govern concurrent accesses to shared data struc-
tures. In this paper we illustrate our approach by considering
a particular data structure, namely a red-black tree.
Red-black trees are used to store sorted ⟨key,value⟩ pairs,
and are widely used in operating systems. They are used
in the Linux kernel for I/O schedulers, the process sched-
uler, the ext3 file system, and in many other places [Land-
ley 2007]. Linux kernel primitives that manipulate red-black
trees do not have concurrency control embedded in them. In-
stead, higher level uses of the primitives must manage con-
currency. This is typically done through mutual exclusion
which does not scale [Piggin 2010].
Our approach is similar to RCU-based approaches that
have been applied to simpler Linux data structures such as
lists and hash tables [Triplett 2010]. We refer to these al-
gorithms as “relativistic” because they weaken the ordering
requirements on concurrent reads and updates such that each
reader observes the data structure in its own temporal frame
of reference. Our research is attempting to generalize the
concept of relativistic programming. While this goal has yet
to be reached, the development of a relativistic data struc-
ture as complex as a red-black tree represents a significant
milestone.
Our relativistic red-black tree has the following perfor-
mance and scalability properties:
1. Linear scalability of read accesses even in the presence
of concurrent updates. This property has been tested out
to 64 hardware threads.
2. Updaters can proceed concurrently with any number of
readers, but not other updaters.
3. Safe, fast, wait-free read access in the presence of up-
dates.
By fast we mean performance that approaches that of
unsynchronized access.¹ Our read access achieves 93% of the
throughput of unsynchronized read access over a wide range
of tree sizes and thread counts. Our implementation is also
25% faster than the best lock-based implementation for an
uncontended read. As contention increases, the advantage of
our implementation grows significantly.

¹ Unsynchronized access is safe for single-threaded or read-only implemen-
tations, but not for multi-threaded implementations that include updates.

1 2010/12/8
By wait-free we mean that the read path does not use
locks, does not block, and never needs to wait for another
thread (neither a reader nor an updater). Furthermore, the
read path does not require any atomic instructions and on
x86 does not require memory barriers.
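The paper does not show the read path at this point, but its shape can be sketched in C. In the fragment below the node layout, the read_once macro, and all names are our assumptions rather than the paper's code; it shows a lookup built only from plain pointer loads, with no locks, no atomic read-modify-write instructions, and (on x86) no memory barriers:

```c
#include <stddef.h>

/* Hypothetical node layout; the paper does not prescribe one. */
struct rbnode {
    long key;
    void *value;
    struct rbnode *left, *right;   /* child pointers published by writers */
};

/* Force a single load of a shared pointer (kernel READ_ONCE style).
 * On x86 such plain loads need no barriers, which is why the read
 * path is barrier-free there. */
#define read_once(p) (*(volatile __typeof__(p) *)&(p))

/* Wait-free lookup: never blocks and never waits for another thread. */
static void *rb_lookup(struct rbnode *root, long key)
{
    struct rbnode *node = read_once(root);

    while (node != NULL) {
        if (key == node->key)
            return node->value;
        node = (key < node->key) ? read_once(node->left)
                                 : read_once(node->right);
    }
    return NULL;
}
```

For this to be safe, writers must publish new nodes with the corresponding store-side ordering; that write-side cost is exactly what relativistic programming shifts onto updaters.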
The rest of this paper is outlined as follows: Section 2
gives an overview of red-black trees, the operations they sup-
port, and the mechanisms that are used to preserve the bal-
anced nature of red-black trees. This section also discusses
the state of the art for parallelizing red-black trees. Section 3
discusses the ordering constraints that are, and are not, pre-
served by relativistic programming. This section also pro-
vides a justification for why these weakened ordering con-
straints are appropriate for concurrent red-black trees. Sec-
tion 4 presents our implementation for a relativistic red-
black tree. Section 5 shows the performance of our imple-
mentation compared with red-black trees implemented us-
ing other synchronization mechanisms. Section 6 discusses
some of the issues and trade-offs involved in performing
complete tree traversals (as opposed to single look-ups). Fi-
nally, Section 7 presents concluding remarks.
2. Red-Black Trees
Since red-black trees are well known and well documented
[Guibas 1978, Plauger 1999, Schneier 1992], we do not give
a complete explanation of them. Rather, we give a brief
overview to facilitate a discussion of our relativistic imple-
mentation. In particular, we discuss the individual steps that
make up red-black tree algorithms without discussing the
glue that combines these steps. This is because the glue is
not impacted by the relativistic implementation.
Red-black trees are partially balanced, sorted, binary
trees. The trees store ⟨key,value⟩ pairs. They support the
following operations:
insert(key, value) inserts a new ⟨key,value⟩ pair into the
tree.
lookup(key) returns the value associated with a key.
delete(key) removes a ⟨key,value⟩ pair from the tree.
first()/last() returns the first (lowest keyed) / last (highest
keyed) value in the tree.
next()/prev() returns the next/previous value in key-sorted
order from the tree.
Red-black trees are sorted by preserving the following
properties:
1. All nodes on the left branch of a subtree have a key ≤ the
key of the root of the subtree.
2. All nodes on the right branch of a subtree have a key >
the key of the root of the subtree.
The tree is balanced by assigning a color to each node
(red or black) and preserving the following properties:
1. Both children of a red node are black.
2. The black depth of every leaf is the same. The black depth
is the number of black nodes encountered on the path
from the root to the leaf.
These invariants are sufficient to guarantee O(log(N))
lookups because the longest possible path (alternating black
and red nodes) is at most twice the shortest possible path
(all black nodes). The operations required to rebalance a tree
following an insert or delete are limited to the path from the
inserted/deleted node back to the root. The rebalancing is,
worst case, O(log(N)), meaning that inserts and deletes can
also be done in O(log(N)).
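As an illustration of these invariants, a checker might look like the following C sketch. The node layout and names are ours, not the paper's; the function computes the black depth of a subtree and rejects any subtree that violates either balance property:

```c
#include <stddef.h>

enum color { RED, BLACK };

struct rb {
    enum color color;
    struct rb *left, *right;
};

/* Returns the black depth of the subtree rooted at n, or -1 if either
 * red-black invariant is violated. Empty children count as black. */
static int black_depth(const struct rb *n)
{
    if (n == NULL)
        return 1;                        /* empty child: one black node */

    /* Invariant 1: both children of a red node are black. */
    if (n->color == RED &&
        ((n->left  && n->left->color  == RED) ||
         (n->right && n->right->color == RED)))
        return -1;

    /* Invariant 2: both subtrees have the same black depth. */
    int l = black_depth(n->left);
    int r = black_depth(n->right);
    if (l < 0 || r < 0 || l != r)
        return -1;

    return l + (n->color == BLACK ? 1 : 0);
}
```

Because the longest root-to-leaf path alternates red and black while the shortest is all black, any tree this checker accepts has height at most twice log(N), which is what makes the O(log(N)) bounds hold.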
We will use the following definitions in the explanation
of the tree operations:
internal node A node with two non-empty children.
leaf A node with at least one empty child.
Observe that if next() is called on any internal node, the
result is always a leaf. This is true because next() returns the
left-most node of the right subtree.
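A minimal C sketch of this observation follows. The field names are illustrative; a general next() would also need parent links to handle nodes with an empty right child, which we omit here:

```c
#include <stddef.h>

struct node {
    long key;
    struct node *left, *right;
};

/* In-order successor of an internal node n (both children non-empty):
 * the left-most node of n's right subtree. Since we always step to a
 * node with an empty left child, the result has at least one empty
 * child and is therefore a leaf in the paper's terminology. */
static struct node *internal_next(struct node *n)
{
    struct node *s = n->right;

    while (s->left != NULL)
        s = s->left;
    return s;
}
```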
The following steps are used to implement red-black
trees:
Insertion New nodes are always inserted at the bottom of
the tree. This is possible because if prev(new-node) is an
internal node, then from the observation above, the new
node must be a leaf. If prev(new-node) is a leaf, the new
node will be a child of that node on an empty branch. The
insert may leave the tree unbalanced. If so, restructures or
recolors (see below) are required to restore the balance
properties of the tree.
Delete Nodes are always deleted from the bottom of the tree
(possibly following a swap—see below). The delete may
leave the tree unbalanced. If so, restructures or recolors
(see below) are required to restore the balance properties
of the tree.
Swap If an interior node needs to be deleted, it is first
swapped with next(deleted-node) prior to removal. This
makes the node to be deleted a leaf.
Restructures Restructure operations, sometimes called
rotations, are used to rebalance the tree. Restructures
always involve three adjacent nodes: child, parent, and
grandparent. See Figure 1 for an illustration of the two
types of restructure operations.
Recolor Nodes get recolored as part of the rebalancing pro-
cess. Recoloring doesn’t involve changing the structure
of the tree, only the colors applied to particular nodes.
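The structural part of a restructure is small. The following C sketch shows the left diag restructure of Figure 1 as a pointer rotation; node names follow the figure, but recoloring and the copy-based relativistic details discussed later are deliberately omitted, so this is only the shape of the step, not the paper's implementation:

```c
#include <stddef.h>

struct node {
    struct node *left, *right;
};

/* Left diag restructure (Figure 1): C is the grandparent, B its left
 * child, and A is B's left child. B is promoted to subtree root, and
 * the subtree that was B's right child (subtree 3) moves under C. */
static struct node *diag_restructure_left(struct node *c)
{
    struct node *b = c->left;

    c->left  = b->right;   /* subtree 3 becomes C's left child */
    b->right = c;          /* C becomes B's right child */
    return b;              /* B is the new root of this subtree */
}
```

The right-hand version is the mirror image, and the zig restructure is two such rotations in opposite directions, which is why only the left versions are shown in the figure.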

[Figure 1: diagrams of the DiagRestructure and ZigRestructure operations,
each rearranging three adjacent nodes A, B, and C and their subtrees 1-4.]

Figure 1. Restructure operations used to rebalance a red-
black tree. There are left and right versions of these, but they
are symmetric so only the left version is shown here.
2.1 Concurrent red-black trees
In thinking about concurrent red-black trees, it is useful to
make a distinction between events and operations. In our
terminology, operations are composed of events. Events are
steps in an operation which have a visible effect. Events can
be thought of as instantaneous. Since operations are typically
composed of multiple events, they have a duration.
We will use the following definitions to describe concur-
rent implementations:
S_n the Start of operation n.
F_n the Finish of operation n.
E_n the Effect of operation n. For example, if operation n is
an insert, the effect of that operation is that the tree has a
new node.
a → b defines a happens-before relation such that a happens
before b.
If either of the following two relations holds, operations a
and b are said to be concurrent:
S_a → S_b → F_a
S_b → S_a → F_b
Graphically, this means that the time-lines of the two oper-
ations overlap. There is no implied happens-before relation
between the effects of two concurrent operations. The effects
of the two operations could occur in any order.
Implementations of objects that allow updates to happen
concurrently with reads require additional properties so that
every intermediate representation of the data structure can be
mapped to a value of the abstract object [Herlihy 1990]. For
a sorted tree, the following properties must be maintained:
1. Lookups will always find a node that exists in the tree.
2. Traversals will always return nodes in the correct order
without skipping any nodes that exist in the tree.
Because reads have a duration, and because updates can
proceed concurrent with reads, it’s possible that the tree will
change during a read. As a result, we need to be specific in
what we mean by “nodes that exist in the tree”. In particular,
it means the following: if operation r is a read looking for
node N (or a complete traversal of the tree), operation i is
the insert of node N , and operation d is the delete of node
N , then
1. If F_i → S_r and F_r → S_d then N exists in the tree and
must be observed by r in the correct traversal order.
2. If F_r → S_i or if F_d → S_r then N does not exist in the
tree and must not be observed by r.
3. If i is concurrent with r then N may or may not be
observed by r depending on whether the relative view
of r is E_i → E_r or E_r → E_i.
4. If d is concurrent with r then N may or may not be
observed by r depending on whether the relative view
of r is E_r → E_d or E_d → E_r.
Another way to state these properties is as follows: Prop-
erties one and two state that any update that strictly precedes
a read must be observable by the read, and any update that
strictly follows a read must not be observable by the read.
Properties three and four state that any update that is con-
current with the read may or may not be observable by the
read.
2.2 The State of the Art
The most common way to synchronize access to a red-black
tree is through locking. Unfortunately, this approach doesn’t
scale because accesses are serialized. Since accesses can
be easily divided into reads (lookups) and writes (inserts,
deletes), a reader-writer lock can be used which allows read
parallelism. This approach scales for some number of read
threads, but eventually the contention for the lock dominates
and the approach no longer scales (see the performance data
in Section 5.1 for evidence of this).
Fine grained locking of red-black trees is problematic.
Since updates may affect all the nodes from where the update
occurred back to the root, the simplest approach of acquiring
a write lock on all nodes that might change degrades to
coarse grain locking—all updaters must acquire a write lock
on the root. If one attempts to only acquire write locks on the
nodes that will actually be changed, it is difficult to avoid
deadlock. If the locks are acquired from the bottom up, a
reader progressing down the tree, but above the updater, may

acquire a lock that prevents the write from completing. If
the locks are acquired from the top down, another updater
may change the structure of the tree between the time the
initial change was made (e.g. an insert) and the time when
the necessary locks are acquired to perform a restructure.
Transactional Memory approaches provide a more au-
tomatic approach to disjoint concurrency. However, as the
changes required to rebalance a tree progress up the tree,
more and more concurrent read transactions would get in-
validated. We haven’t done any investigation to determine
what percentage of concurrent transactions might get inval-
idated, so we can not predict the performance impact of the
invalidations.
Bronson [2010] developed a concurrent AVL tree.² Their
approach allows readers to proceed without locks, but the
readers have to check each step of the way to see if the
tree has changed or is in the process of changing. If so,
the reader has to wait and retry. Since readers don’t acquire
locks, this simplifies the fine grained locking of the writers.
Their approach is quite complicated and this degrades read
performance as more code must execute at each node of
the tree. Their approach allows concurrent updates and their
performance data show good scalability. We are working to
port their implementation from Java to C to perform a fair
side-by-side comparison, but we have not yet completed this
work. Work done to date indicates that our read approach is
much faster.
A number of researchers have attempted to decouple
rebalancing from insert and delete [Guibas 1978, Hanke
1998]. This allows updates to proceed more quickly because
individual inserts and deletes don’t have to rebalance the
tree. The rebalancing work can potentially be done in paral-
lel and some redundant work can be skipped. None of this
improves read access time, and readers and writers still need
some synchronization between them.
3. Relativistic Programming
The name for relativistic programming is borrowed from
Einstein’s theory of relativity in which each observer is al-
lowed to have their own frame of reference. In relativis-
tic programming, each reader is allowed to have their own
frame of reference with respect to the order of updates.
Relativistic programming is characterized by the follow-
ing two properties:
1. Writes can occur concurrently with reads.
2. Writes are not totally ordered with respect to reads.
Consider the time-line in figure 2. If operations A and B
are writes and operation C is a read, then C can observe the
writes in either order. In particular, since A is concurrent
with C, both E_A → E_C and E_C → E_A are equally
valid. The same is true of B and C. Combining all three
operations, the ordering E_B → E_C → E_A is valid even
though this violates the happens-before relation between the
non-concurrent A and B.

² AVL trees are similar to red-black trees, but they have a different balance
property.
[Figure 2: a time-line in which operation A completes before operation B
begins, while operation C overlaps both.]

Figure 2. Operation C can see operations A and B in any
order.
It is important to note that the order mentioned above rep-
resents the reference frame of a particular reader. There is
no “global observer” which determines the “correct” order.
Each reader has their own relative view of concurrent oper-
ations which may differ from the view of other concurrent
readers.
3.1 Is this OK?
While it might be disconcerting to have writes appear to
happen in different orders, there are two conditions which, if
met, make this acceptable. The conditions are as follows:
1. The underlying data structure does not have an inherent
time order.
2. The updates are independent or commutative.
Last In First Out Stacks and First In First Out Queues have
an inherent time order (thus the First and Last in their
names). As a result, these data structures are not good fits
for relativistic implementations.³ However, many other data
structures (lists, dictionaries, trees, etc.) have no such inher-
ent time order and thus allow a relativistic implementation.
To illustrate what is meant by independent or commutable
updates, consider a phone company that uses a tree to main-
tain phone book information. If two customers call the phone
company to change their service, the two calls can be han-
dled in either order. Neither call affects the other so they are
independent. They are also commutable because the results
are the same regardless of the order. Any query that saw nei-
ther, either, or both updates is equally valid. Even printing a
phone book that included neither, either, or both updates is
equally valid. If either of these customers called to complain
about their inclusion or omission from the phone book, the
phone company could legitimately reply that the book was
printed either just before or just after their information was
entered into the system.
3.2 Memory Management
Relativistic programming requires a mechanism to reclaim
memory that has been freed by one thread while still in
use by another. Freed memory comes from two sources:
nodes that are removed from the tree and the “old” copy of
³ Some researchers have proposed weak ordering for LIFOs and FIFOs.
This would yield a Later In Earlier Out or Earlier In Earlier Out structure
that would be suitable for relativistic techniques.
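The reclamation pattern this section describes can be sketched in C. The names below (wait_for_readers, remove_and_reclaim) are our shorthand rather than the paper's API, and the wait itself is stubbed out; in the Linux kernel the analogous primitive is synchronize_rcu(). The point is the ordering: unlink, wait for pre-existing readers, then free:

```c
#include <stdlib.h>

struct rbnode { long key; };

static int readers_drained;   /* records that the writer waited */

/* Stand-in for an RCU-style wait-for-readers primitive: it must block
 * until every reader that started before the call has finished.
 * Here it only sets a flag so the pattern can be exercised. */
static void wait_for_readers(void)
{
    readers_drained = 1;
}

/* Writer-side reclamation: the node is unlinked first (readers that
 * began earlier may still be traversing it), the writer waits for
 * those readers to drain, and only then is the memory freed. */
static void remove_and_reclaim(struct rbnode *victim)
{
    /* ... victim has already been unlinked under the writer lock ... */
    wait_for_readers();
    free(victim);   /* safe: no reader can still hold a reference */
}
```

The same pattern covers both sources of freed memory: nodes removed from the tree and old copies replaced during an update.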

References

Linearizability: a correctness condition for concurrent objects [Herlihy 1990]
Skip lists: a probabilistic alternative to balanced trees
The multikernel: a new OS architecture for scalable multicore systems [Baumann 2009]
A dichromatic framework for balanced trees [Guibas 1978]
Frequently Asked Questions

Q1. What are the contributions in "Relativistic red-black trees"?

In this paper the authors present a novel and highly scalable concurrent red-black tree. Their red-black tree has linear read scalability, uncontended read performance that is at least 25% faster than other known approaches, and deterministic lookup times for a given tree size, making it suitable for realtime applications.

The authors are also working to solve the multi-update problem so that the relativistic implementation can have concurrent updaters as well as concurrent readers with a single updater.

This is because all the synchronization mechanisms except locking allow read concurrency and because the time is dominated by the traversal, not by the synchronization.

The consequence is that a traversal will take O(N log(N)) time, and the tree that is traversed may not represent any state present in a globally ordered time.

To allow this type of traversal using the relativistic read and update algorithms described earlier, the mutex used for the write lock is replaced with a reader-writer lock.