Bounding Data Races in Space and Time
(Extended working version, with appendices)
Stephen Dolan
University of Cambridge, UK
KC Sivaramakrishnan
University of Cambridge, UK
Anil Madhavapeddy
University of Cambridge, UK
Abstract
We propose a new semantics for shared-memory parallel programs that gives strong guarantees even in the presence of data races. Our local data race freedom property guarantees that all data-race-free portions of programs exhibit sequential semantics. We provide a straightforward operational semantics and an equivalent axiomatic model, and evaluate an implementation for the OCaml programming language. Our evaluation demonstrates that it is possible to balance a comprehensible memory model with a reasonable (no overhead on x86, ~0.6% on ARM) sequential performance trade-off in a mainstream programming language.
CCS Concepts • Computing methodologies → Shared memory algorithms; • Theory of computation → Parallel computing models;
Keywords weak memory models, operational semantics
ACM Reference Format:
Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy. 2018. Bounding Data Races in Space and Time: (Extended working version, with appendices). In Proceedings of working draft ('18). ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
'18, August 2018, Cambridge, UK
2018. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 Introduction
Modern processors and compilers aggressively optimise programs. These optimisations accelerate sequential programs without otherwise affecting them, but cause surprising behaviours to be visible in parallel programs. To benefit from these optimisations, mainstream languages such as C++ and Java have adopted complicated memory models which specify which of these relaxed behaviours programs may observe. However, these models are difficult to program against directly.
The primary reasoning tools provided to programmers by these models are the data-race-freedom (DRF) theorems. Programmers are required to mark as atomic all variables used for synchronisation between threads, and to avoid data races, which are concurrent accesses (except concurrent reads) to nonatomic variables. In return, the DRF theorems guarantee that no relaxed behaviour will be observed. Concisely, data-race-free programs have sequential semantics.
When programs are not data-race-free, such models give few or no guarantees about behaviour. This fits well with unsafe languages, where misuse of language constructs generally leads to undefined behaviour. Extending this to data races, another sort of misuse, is quite natural. On the other hand, safe languages strive to give well-defined semantics even to buggy programs. These semantics are expected to be compositional, so that programs can be understood by understanding their parts, even if some parts contain bugs.
Giving weak semantics to data races threatens this compositionality. In a safe language, when f() + g() returns the wrong answer even when f() returns the right one, one can conclude that g has a bug. This property is threatened by weak semantics for data races, when a correct g could be caused to return the wrong answer by a data race in f.
We propose a new semantics for shared-memory parallel programs, which gives strong guarantees even in the face of data races. Our contributions are to:
- introduce the local DRF property (§2), which allows compositional reasoning about concurrent programs even in the presence of data races.
- propose a memory model with a straightforward small-step operational semantics (§3), prove that it has the local DRF property (§4), and provide an equivalent axiomatic model (§6).
- show that our model supports many common compiler optimisations and provide sound compilation schemes to both the x86 and ARMv8 architectures (§7), and demonstrate their efficiency in practice in the hybrid functional-imperative language OCaml (§8).
2 Reasoning Beyond Data-Race Freedom
We propose moving from the global DRF property:

    Data-race-free programs have sequential semantics

to the stronger local DRF property:

    All data-race-free parts of programs have sequential semantics

To demonstrate the difference between global and local DRF, we present several examples of sequential program fragments, and multithreaded contexts in which a lack of local DRF causes unexpected results.
2.1 Bounding Data Races in Space
The first step towards local DRF is bounding data races in space, ensuring that a data race on one variable does not affect accesses to a different variable.
Since the C++ memory model gives semantics only to data-race-free programs, in principle it does not have this property. Still, it is not obvious how this property could fail to hold in a reasonable implementation, so we give an example.
Example 1. b = a + 10
Assumption: There are no other accesses to a or b.
Expected result: Afterwards, b = a + 10.
Possible result: Afterwards, b ≠ a + 10. (C++)
Explanation: Consider the following multithreaded program, where c is a nonatomic global variable:

    Thread 1:                    Thread 2:
      c = a + 10;                  c = 1;
      ... // some computation
      b = a + 10;
Suppose that the elided computation is pure (writes no memory and has no side-effects). The compiler might notice that a is not modified between its two reads, and thus occurrences of a + 10 may be combined, optimising the first thread to:

    t = a + 10;
    c = t;
    ... // some computation
    b = t;

Register pressure in the elided computation can cause the temporary t to be spilled. Since its value is already stored in location c, a clever register allocator may choose to rematerialise t from c instead of allocating a new stack slot¹, giving:

    Thread 1:                    Thread 2:
      t = a + 10;                  c = 1;
      c = t;
      ... // some computation
      b = c;
However, in the transformed program, the data race between c = a + 10 and c = 1 may cause c to contain the wrong value. From the programmer's point of view, two reads of location a returned two different values, even though there were no concurrent writes to a! Indeed, the only data race is between the two concurrent writes to c, a variable which is never read.
A data race on one variable affecting the results of reading another is far from the worst effect that compiler optimisations can bestow on racy C++ programs. Boehm [9] gives several others, but this one suffices to show that bounding data races in space is a nontrivial property, and that reasonable implementations of C++ do not necessarily possess it.
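As an illustration (not part of the formal development), the failing interleaving can be replayed in a few lines of Python. Shared memory is modelled as a dict; the variable names follow Example 1, and the schedule shown is one hypothetical worst case among many:

```python
# One interleaving of the rematerialised thread 1 against thread 2's
# racy write c = 1. Names (a, b, c, t) follow Example 1.
mem = {"a": 5, "b": 0, "c": 0}

t = mem["a"] + 10     # thread 1: t = a + 10
mem["c"] = t          # thread 1: c = t   (t spilled into c)
mem["c"] = 1          # thread 2: c = 1   (races with the spill)
mem["b"] = mem["c"]   # thread 1: b = c   (t rematerialised from c)

# The two "reads of a" now disagree, although a was never written
# concurrently: b = 1 while a + 10 = 15.
assert mem["b"] != mem["a"] + 10
```

The race is only on c, yet the observable damage is to b, which is exactly the failure of bounding in space described above.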
¹This is an optimisation not generally implemented, because of the effort involved in preserving information about the contents of memory all the way to register allocation, but it has been proposed for addition to LLVM [15].

2.2 Bounding Data Races in Time
In contrast to C++, the Java memory model [16] limits the allowable behaviours even in the presence of data races. In particular, the value that is returned by reading a variable must be something written to the same variable, so data races are indeed bounded in space.
However, data races in Java are not bounded in time: a data race in the past can cause later accesses to have non-sequential behaviour, as in the following example. Below, when we say that two accesses happen concurrently, we mean that neither happens-before the other using the ordering defined by the memory model. Roughly, this means that the two accesses occur in different threads without any synchronisation between the threads.
Example 2. b = a; c = a;
Assumption: No accesses to a, b, c happen concurrently with the above.
Expected result: Afterwards, b = c.
Possible result: Afterwards, b ≠ c. (C++, Java)
Explanation: Consider the program below, where flag is an atomic (volatile in Java) boolean, initially false:

    Thread 1:                    Thread 2:
      a = 1;                       a = 2;
      flag = true;                 f = flag;
                                   b = a;
                                   c = a;
Suppose that f is true afterwards. Then the read and the write of flag synchronise with each other (flag being volatile), so both of the writes to a happen-before both of the reads, although the writes race with each other.
In such circumstances, the assumption above does hold: there are no accesses to a, b, c that happen concurrently with the reads of a. However, Java permits the outcome b = 2, c = 1, allowing the two reads to read from the two different writes. Concretely, this can happen if the compiler optimises b = a to b = 2 without making the same change to c = a. This situation can occur due to aliasing, if only the first read from b = a is statically known to be from the same location as that written by a = 2. A concrete example triggering this behaviour under both Java 8 and 9 appears in appendix D of the technical report [11].
So, the effect of data races in Java is not bounded in time, because the memory model permits reads to return inconsistent values because of a data race that happened in the past.
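As a concrete illustration, the optimised execution can be replayed step by step in Python. Memory is a dict, the names follow Example 2, and the order in which the two racy writes land in memory is our own choice of schedule:

```python
# One interleaving of Example 2, with thread 2's read b = a already
# rewritten by the compiler to the constant 2 (it statically knows its
# own write a = 2).
mem = {"a": 0, "flag": False}

mem["a"] = 2          # thread 2: a = 2  (racy write)
mem["a"] = 1          # thread 1: a = 1  (racy write; lands last here)
mem["flag"] = True    # thread 1: flag = true

f = mem["flag"]       # thread 2: f = flag, reads True
b = 2                 # thread 2: b = a, optimised to the constant 2
c = mem["a"]          # thread 2: c = a, actually reads memory: 1

assert f and b != c   # the two reads of a disagree: b = 2, c = 1
```

Both racy writes happen-before both reads, yet because the writes are unordered with each other, each read may independently pick either one.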
Surprisingly, non-sequential behaviour can also occur because of data races in the future, as in the following example. We assume the prior definition of class C {int x;}.

Example 3. C c = new C(); c.x = 42; a = c.x;
Assumption: There are no other accesses to a.
Expected result: Afterwards, a = 42.
Possible result: Afterwards, a ≠ 42. (C++, Java)
Explanation: Here, we know that there cannot be any data races in the past on the location c.x, since c is a newly-allocated object, to which no other thread could yet have a reference. So, we might imagine that this fragment will always set a to 42, regardless of what races are present in the rest of the program.

In fact, it is possible for a to get a value other than 42, because of subsequent data races. Consider this pair of threads:

    Thread 1:                    Thread 2:
      C c = new C();               g.x = 7;
      c.x = 42;
      a = c.x;
      g = c;

The read of c.x and the write of g performed by the first thread operate on separate locations, so the Java memory model permits them to be reordered. This can cause the read of c.x to return 7, as written by the second thread.
So, providing local DRF requires us to prevent loads from being reordered with later stores, which constrains both compiler optimisations and compilation to weakly-ordered hardware. We examine the performance cost of these constraints in detail in §8, and revisit the topic in §9.1.
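For illustration, the reordered execution of Example 3 can be replayed in Python. Shared memory is a dict, and the schedule (thread 2 running between the publishing store and the delayed load) is our own choice:

```python
# Example 3 with the load a = c.x reordered after the store g = c, as
# the Java memory model permits.
class C:
    def __init__(self):
        self.x = 0

mem = {"g": None}

# Thread 1, with the load of c.x delayed past the publishing store:
c = C()
c.x = 42              # c.x = 42
mem["g"] = c          # g = c  (store moved before the read of c.x)

# Thread 2 runs here:
g = mem["g"]
g.x = 7               # thread 2: g.x = 7

a = c.x               # thread 1's delayed read of c.x

assert a == 7         # not 42, despite no past race on c.x
```

The race on c.x lies entirely in the future of the fragment, yet it still breaks the fragment's sequential reasoning.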
2.3 Global and Local DRF
We propose a local DRF property which states that data races are bounded in space and time: accesses to variables are not affected by data races on other variables, data races in the past, or data races in the future. In particular, the following intuitive property holds:

    If a location a is read twice by the same thread, and there are no concurrent writes to a, then both reads return the same value.

We formally state the local DRF theorem for our model in §4, after introducing the operational semantics in §3. Detailed proofs appear in the appendix. Thanks to the local DRF theorem, we can prove that each of the examples above has the expected behaviour (see §5).
Using the standard global DRF theorems, we are able to prove that each of the three examples above has the expected behaviour, but only under the stronger assumption that there are no data races on any variables at any time during the program's execution. Local DRF allows us to prove the same results, but under more general assumptions that are robust to the presence of data races in other parts of the program.
3 A Simple Operational Model
Here, we introduce the formal memory model for which we prove local DRF in §4. Our model is a small-step operational one, where memory consists of locations ℓ ∈ L, divided into atomic locations A, B, ... and nonatomic locations a, b, ..., in which may be stored values x, y ∈ V.
The program interacts with memory by performing actions ϕ on locations. There are two types of action: write x, which writes the value x to a location, and read x, which reads a location, resulting in the value x. We write ℓ:ϕ for the action ϕ applied to the location ℓ.
Memory itself is represented by a store S. Under a sequentially consistent semantics, the store simply maps locations to values. Our semantics is not sequentially consistent, and the form of stores is more complex, since there is not necessarily a single value that a read of a location must return.
Instead, our stores map nonatomic locations a to histories H, which are finite maps from timestamps t to values x. Following Kang et al. [13], we take timestamps to be rational numbers rather than integers: they are totally ordered but dense, with a timestamp between any two others. Again following Kang et al., we equip every thread with a frontier F, which is a map from nonatomic locations to timestamps. Intuitively, each thread's frontier records, for each nonatomic location, the latest write known to the thread. More recent writes may have occurred, but are not guaranteed to be visible.
Atomic locations, on the other hand, are mapped by the store to a pair (F, x), containing a single value x rather than a history. Additionally, atomic locations carry a frontier, which is merged with the frontiers of threads that operate on the location. In this way, nonatomic writes made by one thread can become known to another by communicating via an atomic location.
The core of the semantics is the memory operation relation

    C; F --ℓ:ϕ--> C′; F′

which specifies that when a thread with frontier F performs an action ϕ on a location ℓ containing contents C, then the new contents of the location will be C′ and the thread's new frontier will be F′.
There are four cases, for read and write, atomic and nonatomic actions, shown in fig. 1c. When reading a nonatomic variable, rule Read-NA specifies that threads may read an arbitrary element of the history, as long as it is not older than the timestamp in the thread's frontier.
Dually, when writing to a nonatomic location, rule Write-NA specifies that the timestamp of the new entry in the location's history must be later than that in the thread's frontier. Note a subtlety here: the timestamp need not be later than everything else in the history, merely later than any other write known to the writing thread.
Atomic operations (rules Read-AT and Write-AT) are standard sequential operations, except that they also involve updating frontiers. During atomic writes, the frontiers of the location and the thread are merged, while during atomic reads the frontier of the location is merged into that of the thread, but the location is unmodified. The join operation F1 ⊔ F2 combines two frontiers F1, F2 by choosing the later timestamp for each location.
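As a sanity check, the four rules can be transcribed directly into executable form. The Python sketch below uses data representations of our own choosing: histories as dicts from Fraction timestamps to values, and frontiers as dicts from locations to timestamps. The short demo replays the message-passing pattern: after reading an atomic location, a thread's merged frontier rules out stale nonatomic reads.

```python
from fractions import Fraction

def join(f1, f2):
    """F1 ⊔ F2: choose the later timestamp for each location."""
    out = dict(f1)
    for loc, t in f2.items():
        out[loc] = max(out.get(loc, t), t)
    return out

def read_na(hist, frontier, a, t):
    """Read-NA: any history entry no older than the frontier may be read."""
    assert t in hist and frontier.get(a, Fraction(0)) <= t
    return hist[t], hist, frontier          # contents and frontier unchanged

def write_na(hist, frontier, a, t, x):
    """Write-NA: a fresh timestamp strictly later than the frontier's."""
    assert t not in hist and frontier.get(a, Fraction(0)) < t
    return {**hist, t: x}, {**frontier, a: t}

def read_at(contents, frontier):
    """Read-AT: merge the location's frontier into the thread's."""
    f_A, x = contents
    return x, (f_A, x), join(f_A, frontier)

def write_at(contents, frontier, x):
    """Write-AT: both the location's and the thread's frontier become the join."""
    f_A, _ = contents
    merged = join(f_A, frontier)
    return (merged, x), merged

# Message passing: thread 1 writes a nonatomically, then publishes via
# atomic location A; once thread 2 reads A, its merged frontier forbids
# the stale read of a at timestamp 0.
hist_a, f1 = {Fraction(0): "v0"}, {"a": Fraction(0)}
hist_a, f1 = write_na(hist_a, f1, "a", Fraction(1), "msg")
A, f1 = write_at(({}, "v0"), f1, "ready")

f2 = {"a": Fraction(0)}
val, A, f2 = read_at(A, f2)
assert val == "ready" and f2["a"] == Fraction(1)
v, _, _ = read_na(hist_a, f2, "a", Fraction(1))   # only t = 1 is readable now
assert v == "msg"
```

After the atomic read, `read_na(hist_a, f2, "a", Fraction(0))` would fail its precondition: thread 2's frontier has advanced past the stale entry, which is exactly how the model makes the nonatomic write visible.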
The program itself consists of expressions e, e′. Our semantics of memory does not specify the exact form of expressions, but we assume they are equipped with a small-step transition relation. A step may or may not involve performing an action, giving two distinct types of transition:

    e --ε--> e′        e --ℓ:ϕ--> e′
(a) Syntax and configurations

    Location              ℓ ∈ L
    Atomic locations      A, B, ... ∈ L
    Nonatomic locations   a, b, ... ∈ L
    Timestamp             t ∈ Q
    Values                x, y ∈ V
    Thread id             i
    Expression            e, e′
    Machine               M ::= S, P
    Store                 S ::= a ↦ H; A ↦ (F, x)
    History               H ::= t ↦ x
    Frontier              F ::= a ↦ t
    Program               P ::= i ↦ (F, e)

(b) Machine steps

    (Silent)   if e --ε--> e′, then
               S, P[i ↦ (F, e)]  -->  S, P[i ↦ (F, e′)]

    (Memory)   if e --ℓ:ϕ--> e′ and S(ℓ); F --ℓ:ϕ--> C′; F′, then
               S, P[i ↦ (F, e)]  -->  S[ℓ ↦ C′], P[i ↦ (F′, e′)]

(c) Memory operations

    (Read-NA)   H; F --a: read H(t)--> H; F                     if F(a) ≤ t, t ∈ dom(H)
    (Write-NA)  H; F --a: write x--> H[t ↦ x]; F[a ↦ t]         if F(a) < t, t ∉ dom(H)
    (Read-AT)   (F_A, x); F --A: read x--> (F_A, x); F_A ⊔ F
    (Write-AT)  (F_A, y); F --A: write x--> (F_A ⊔ F, x); F_A ⊔ F

Figure 1. Operational semantics
where ε represents silent transitions, those that do not access memory. The only condition that we do assume of these transitions is that read transitions are not picky about the value being read, that is:

Proposition 4. If e --ℓ:read x--> e′, then for every y, e --ℓ:read y--> e_y for some e_y.

We don't require that e_y not get stuck later, just that the read itself can progress.
The operational semantics is a small-step relation on machine configurations M = S, P, consisting of a store S and a program P, which consists of a finite set of threads, represented as a finite map from thread identifiers i to pairs of a frontier F and an expression e.
The two types of transition in this small-step relation are shown in fig. 1b, and correspond to the two types of transition for expressions. If a thread i with state (F, e) can take a silent step by e --ε--> e′, then its new state is (F, e′) (rule Silent). Otherwise, if it can step by e --ℓ:ϕ--> e′, then the memory operation relation determines the thread's new frontier and the new contents of ℓ (rule Memory).
3.1 Initial States
For simplicity, we assume that all locations are initially set to some arbitrary value v0 ∈ V. The initial state of a program whose threads are the expressions e_i (for i drawn from some finite set of thread indices I) is the machine configuration:

    M0 = (a ↦ (0 ↦ v0), A ↦ (F0, v0) for a, A ∈ L), (i ↦ (F0, e_i) for i ∈ I)

The initial frontier F0 maps all locations a to the timestamp 0. In other words, we assume an initial write of v0 to every location, with timestamp 0 (for nonatomic locations), and we assume that these initial writes are part of every thread's frontier at startup.
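In executable form, the initial configuration M0 is a small builder function. The Python sketch below uses a representation of our own choosing (histories as dicts from Fraction timestamps to values, frontiers as dicts from locations to timestamps), with "v0" standing in for the arbitrary initial value:

```python
from fractions import Fraction

V0 = "v0"   # the arbitrary initial value v0

def initial_machine(nonatomic, atomic, threads):
    """M0: v0 at timestamp 0 in every nonatomic history, (F0, v0) at
    every atomic location, and F0 as every thread's starting frontier."""
    f0 = {a: Fraction(0) for a in nonatomic}
    store = {a: {Fraction(0): V0} for a in nonatomic}
    store.update({A: (dict(f0), V0) for A in atomic})
    program = {i: (dict(f0), e) for i, e in threads.items()}
    return store, program

store, program = initial_machine({"a", "b"}, {"FLAG"}, {0: "e0", 1: "e1"})
assert store["a"] == {Fraction(0): V0}       # one initial write at t = 0
assert store["FLAG"][1] == V0                # atomic: (F0, v0)
assert program[0][0] == {"a": Fraction(0), "b": Fraction(0)}
```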
3.2 Traces
We write a machine step T from a machine state M to a machine state M′ as M --T--> M′.

Definition 5 (Trace). A trace

    Σ = M0 --T1--> M1 --T2--> ... --Tn--> Mn

is a finite sequence of machine transitions starting from the initial state.

We do not have any requirement that traces lead to final states. Every prefix of a trace is a trace.
4 Formalising Local DRF
Next, we state and prove the local DRF theorem for the model, which states that all data-race-free parts of programs have sequential behaviour. More specifically, we show that if there are no ongoing data races on some set L of locations, then accesses to L will have sequential behaviour, at least until a data race on L occurs.
We need several intermediate definitions before we can prove local DRF. First, to specify what "sequential behaviour" means, we introduce weak transitions, and second, to specify what "no ongoing data races" means, we introduce L-stability.

4.1 Weak Transitions
Our model is close to a sequential model of memory, differing only during transitions Read-NA and Write-NA. We make this precise by defining weak transitions:

Definition 6 (Weak transition). A weak transition is a machine step performing a memory operation of one of the following forms:
- H; F --a: read x--> H; F when H(t) ≠ x for the largest timestamp t ∈ dom(H). Informally, this read does not witness the latest write to that location.
- H; F --a: write x--> H[t ↦ x]; F[a ↦ t] when t is not greater than the largest timestamp t′ ∈ dom(H). Informally, this write is not the latest write to that location.

Memory operations which are not weak are either operations on atomic values, or operations on nonatomic values which access the element of the history with the largest timestamp. So, a sequence of machine steps involving no weak transitions is sequentially consistent: one may ignore all frontiers and discard all elements of histories but the last, and recover a simple sequential semantics. We take this as our definition of sequential consistency:

Definition 7 (Sequentially consistent traces). A trace is sequentially consistent if it includes no weak transitions.
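The two conditions of Definition 6 translate directly into predicates. A Python sketch, representing a nonatomic location's history as a dict from timestamp to value (our own representation choice):

```python
def is_weak_read(hist, value_read):
    """Weak iff the value read differs from the latest write's value."""
    return hist[max(hist)] != value_read

def is_weak_write(hist, t):
    """Weak iff the new timestamp t is not later than every existing one."""
    return t <= max(hist)

h = {0: "v0", 1: "x", 3: "y"}
assert is_weak_read(h, "x")          # misses the latest write "y"
assert not is_weak_read(h, "y")      # witnesses the latest write
assert is_weak_write(h, 2)           # 2 < 3: slots in behind the latest write
assert not is_weak_write(h, 4)       # 4 > 3: becomes the latest write
```

Note that the read condition is on values, not timestamps: reading an old entry that happens to hold the same value as the latest write is not weak.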
4.2 Data Races and Happens-Before
Intuitively, a data race occurs whenever a nonatomic location is used by multiple threads without proper synchronisation. To define what "proper synchronisation" means, we introduce the happens-before relation.

Definition 8 (Happens-before). Given a trace

    M0 --T1--> M1 --T2--> ... --Tn--> Mn

the happens-before relation is the smallest transitive relation which relates Ti, Tj, i < j if
- Ti and Tj occur on the same thread, or
- Ti is a write and Tj is a read or write, to the same atomic location.

Definition 9 (Conflicting transitions). In a given trace, two transitions Ti and Tj are conflicting if they access the same nonatomic location and at least one is a write.
Denition 10 (Data race). Given a trace
M
0
T
1
M
1
T
2
. . .
T
n
M
n
we say that it there is a data race between two conicting
transitions
T
i
and
T
j
if
i < j
and
T
i
does not happen-before
T
j
.
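Definitions 8-10 can be checked mechanically over an abstract trace. In the Python sketch below, each transition is encoded (an encoding of our own) as a tuple (thread, location, kind, is_atomic), listed in trace order; the demo trace is the message-passing pattern of Example 2:

```python
from itertools import combinations

def happens_before(trace):
    """Transitive closure of program order and atomic write-to-access edges."""
    n = len(trace)
    hb = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same_thread = trace[i][0] == trace[j][0]
            sync = (trace[i][3] and trace[j][3]          # both atomic,
                    and trace[i][1] == trace[j][1]       # same location,
                    and trace[i][2] == "write")          # earlier one a write
            if same_thread or sync:
                hb[i][j] = True
    for k in range(n):                                   # transitive closure
        for i in range(n):
            for j in range(n):
                hb[i][j] = hb[i][j] or (hb[i][k] and hb[k][j])
    return hb

def data_races(trace):
    """Pairs of conflicting transitions not ordered by happens-before."""
    hb = happens_before(trace)
    return [(i, j) for i, j in combinations(range(len(trace)), 2)
            if not trace[i][3] and not trace[j][3]       # both nonatomic
            and trace[i][1] == trace[j][1]               # same location
            and "write" in (trace[i][2], trace[j][2])    # at least one write
            and not hb[i][j]]

# Example 2's trace: the two writes to a race; the later reads of a do
# not, because the atomic flag orders them after both writes.
trace = [
    (1, "a", "write", False),      # 0: thread 1: a = 1
    (2, "a", "write", False),      # 1: thread 2: a = 2
    (1, "flag", "write", True),    # 2: thread 1: flag = true
    (2, "flag", "read", True),     # 3: thread 2: f = flag
    (2, "a", "read", False),       # 4: thread 2: b = a
]
assert data_races(trace) == [(0, 1)]
```

The only race reported is between the two writes, matching the informal analysis of Example 2: the reads are downstream of the flag synchronisation, yet the unordered pair of writes is enough to make their results inconsistent.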
4.3 L-stability

Definition 11 (L-sequential transitions). Given a set L of locations, a transition is L-sequential if it is not a weak transition, or if it is a weak transition on a location not in L.

If we take L to be the set of all nonatomic locations, then the L-sequential transitions are exactly the sequentially consistent transitions.

Definition 12 (L-stable). A machine M is L-stable if, for every trace that includes M,

    M0 --T1--> M1 --T2--> ... --Tn--> M --T′1--> M′1 --T′2--> ... --T′n--> M′n

in which the transitions T′i are L-sequential, there is no data race between Ti and T′j, for any i, j.

Intuitively, M is L-stable if there are no data races on locations in L in progress when the program reaches state M. There may be data races before reaching M (as in example 2), there may be data races after reaching M (as in example 3), but there are no data races between one operation before M and one operation afterwards.
4.4 The Local DRF Theorem

Theorem 13 (Local DRF). Given an L-stable machine state M (not necessarily the initial state), and a sequence of L-sequential machine transitions

    M --T1--> M1 --T2--> ... --Tn--> Mn

then either:
- all possible transitions Mn --T--> M′ are L-sequential, or
- there is a non-weak transition Mn --T--> M′, accessing a location in L, with a data race between some Ti and T.
5 Reasoning with Local DRF
Here, we give several examples of reasoning with the local DRF theorem. First, we use it to prove the standard global DRF result, justifying our claim to be more general. Second, we use it to show that the examples of §2 have the expected semantics, and do not exhibit the odd behaviours of the C++ and Java models.
The standard global DRF theorem is that data-race-free programs have sequential semantics, which we formalise as follows:

Theorem 14 (DRF). Suppose that a given program is data-race-free. That is, suppose that all sequentially consistent traces starting from the initial machine state contain no data races. Then all traces starting from the initial state are sequentially consistent traces.

Proof. Suppose we have a non-sequentially-consistent trace, whose first weak transition is T:

    M0 --T1--> M1 --T2--> ... --Tn--> Mn --T--> M′

Applying local DRF with L as the set of all locations, either:
- T is L-sequential, contradicting that T is weak; or
- there is a sequential transition racing with some Ti, contradicting that the program is data-race-free.

References
- Jeremy Manson, William Pugh, and Sarita V. Adve. The Java memory model. POPL 2005.
- Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors. CACM 53(7), 2010.
- Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and Derek Williams. Understanding POWER multiprocessors. PLDI 2011.
- William Pugh. Fixing the Java memory model. ACM Java Grande 1999.
- Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. Fences in weak memory models. CAV 2010.