Journal ArticleDOI

Eraser: a dynamic data race detector for multithreaded programs

TL;DR: A new tool, called Eraser, is described for dynamically detecting data races in lock-based multithreaded programs; it uses binary rewriting techniques to monitor every shared-memory reference and verify that consistent locking behavior is observed.
Abstract: Multithreaded programming is difficult and error prone. It is easy to make a mistake in synchronization that produces a data race, yet it can be extremely hard to locate this mistake during debugging. This article describes a new tool, called Eraser, for dynamically detecting data races in lock-based multithreaded programs. Eraser uses binary rewriting techniques to monitor every shared-memory reference and verify that consistent locking behavior is observed. We present several case studies, including undergraduate coursework and a multithreaded Web search engine, that demonstrate the effectiveness of this approach.

Summary (4 min read)

1 Introduction

  • Multi-threading has become a common programming technique.
  • Unfortunately, debugging a multi-threaded program can be difficult: simple errors in synchronization can produce timing-dependent data races that take weeks or months to track down, and for this reason many programmers have resisted using threads.
  • This article describes a tool, called Eraser, that dynamically detects data races in multi-threaded programs.
  • A locking discipline is a programming policy that ensures the absence of data races.
  • Usually a potential data race is a serious error caused by failure to synchronize properly.

2.1 Improving the locking discipline

  • The simple locking discipline the authors have used so far is too strict.
  • There are three very common programming practices that violate the discipline yet are free from any data races: initialization, read-shared data, and read-write locks.
  • Initialization: shared variables are frequently initialized without holding a lock.
  • Read-shared data: some shared variables are written only during initialization and are read-only thereafter, so they can be safely accessed without locks.
  • Read-write locks allow multiple readers to access a shared variable, but allow only a single writer to do so.

2.2 Initialization and read-sharing

  • There is no need for a thread to lock out others if no other thread can possibly hold a reference to the data being accessed, and programmers often take advantage of this observation when initializing newly allocated data.
  • Unfortunately, the authors have no easy way of knowing when initialization is complete.
  • When and if another thread accesses the variable, then the state changes.
  • A write access from a new thread changes the state from Exclusive or Shared to the Shared-Modified state, in which C(v) is updated and races are reported, just as described in the original, simple version of the algorithm.
  • The authors' support for initialization makes Eraser's checking more dependent on the scheduler than they would like.

2.3 Read-write locks

  • Many programs use single-writer, multiple-reader locks as well as simple locks.
  • The authors continue to use the state transitions of Figure 4, but when the variable enters the Shared-Modified state, the checking is slightly different.
  • That is, locks held purely in read mode are removed from the candidate set when a write occurs, as such locks held by a writer do not protect against a data race between the writer and some other reader thread.

3 Implementing Eraser

  • Eraser is implemented for the DIGITAL Unix operating system on the Alpha processor, using the ATOM [Srivastava & Eustace 94] binary modification system.
  • To maintain C(v), Eraser instruments each load and store in the program.
  • Eraser does not instrument loads and stores whose address mode is indirect off the stack pointer, since these are assumed to be stack references, and shared variables are assumed to be in global locations or in the heap.
  • The report also includes the thread ID, memory address, type of memory access, and important register values such as the program counter and stack pointer.
  • The authors have found that this information is usually sufficient for locating the source of the race.

3.1 Representing the candidate lock sets

  • A naïve implementation of lock sets would store a list of candidate locks for each memory location, potentially consuming many times the allocated memory of the program.
  • The authors can avoid this expense by exploiting the fortunate fact that the number of distinct sets of locks observed in practice is quite small.
  • The entries in the table are never deallocated or modified, so each lockset index remains valid for the lifetime of the program.
  • Eraser also caches the result of each intersection, so that the fast case for set intersection is simply a table lookup.
  • All the standard memory allocation routines are instrumented to allocate and initialize a shadow word for each word allocated by the program.

3.2 Performance

  • Performance was not a major goal in their implementation of Eraser; consequently it has many opportunities for optimization.
  • The authors estimate that half of the slowdown is due to the overhead incurred by making a procedure call at every load and store instruction, which could be eliminated by using a version of ATOM that can inline monitoring code [Scales et al. 96].
  • There are also many opportunities for using static analysis to reduce the overhead of the monitoring code, but the authors have not explored them.
  • In spite of their limited performance tuning, the authors have found that Eraser is fast enough to debug most programs, and therefore meets the most essential performance criterion.

3.3 Program annotations

  • As expected, their experience with Eraser showed that it can produce false alarms.
  • Part of their research was aimed at finding effective annotations to suppress false alarms without accidentally losing useful warnings.
  • Many programs implement free lists or private allocators, and Eraser has no way of knowing that a privately recycled piece of memory is protected by a new set of locks; an annotation that marks memory reuse addresses this (see the sketch after this list).
  • True data races were found that did not affect the correctness of the program.
  • Some of these were intentional and others were accidental.
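
A conceptual sketch (ours, in Python) of how a reuse annotation, such as the EraserReuse mentioned in the Vesta discussion below, could interact with the checker's per-word state; the function name, signature, and free-list code here are assumptions for illustration, not the paper's API:

```python
shadow_state = {}                  # per-word monitoring state, keyed by address (simplified)

def eraser_reuse(addr, size):      # hypothetical annotation entry point
    """Forget the locking history of a recycled block so it is treated as new memory."""
    for word in range(addr, addr + size, 4):
        shadow_state.pop(word, None)

class PrivateFreeList:
    """A private allocator that recycles blocks instead of returning them to the system."""
    def __init__(self):
        self.blocks = []
    def release(self, addr, size):
        self.blocks.append((addr, size))
    def allocate(self):
        addr, size = self.blocks.pop()
        eraser_reuse(addr, size)   # without this, stale candidate sets would cause false alarms
        return addr, size
```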

3.4 Race detection in an OS kernel

  • The authors have begun to modify Eraser to detect races in the SPIN operating system [Bershad et al. 95].
  • While the authors do not yet have results in terms of data races found, they have acquired some useful experience about implementing such a tool at the kernel level, which is different from the user level in several ways.
  • In most systems, raising the interrupt level to n ensures that only interrupts of priority greater than n will be serviced until the interrupt level is lowered.
  • When the kernel sets the interrupt level to n, Eraser treats this operation as if the first n interrupt locks had all been acquired (a sketch of this mapping follows this list).
  • The most common example is the use of semaphores to synchronize execution between a thread and an I/O device driver.
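
A sketch of that mapping (ours, in Python; the lock names are invented): setting the interrupt level to n is modeled as holding the first n interrupt locks, so data consistently protected by a sufficiently high interrupt level still satisfies the locking discipline.

```python
NUM_LEVELS = 8
interrupt_locks = [f"intr_level_{i}" for i in range(1, NUM_LEVELS + 1)]  # invented names

locks_held = {}   # thread -> set of locks, as maintained by the Lockset monitor

def on_set_interrupt_level(t, n):
    """Model an spl-style call: level n behaves like acquiring interrupt locks 1..n."""
    held = locks_held.setdefault(t, set())
    held.difference_update(interrupt_locks)   # release the previous level's interrupt locks
    held.update(interrupt_locks[:n])          # acquire the locks for priorities 1..n
```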

4 Experience

  • The authors calibrated Eraser on a number of simple programs that contained common synchronization errors (e.g. forgot to lock, used the wrong lock, etc.) and versions of those programs with the errors corrected.
  • While programming these tests, the authors accidentally introduced a race, and encouragingly, Eraser detected it.
  • It also produced false alarms, which the authors were able to suppress with annotations.
  • The fact that Eraser worked well on the servers is evidence that experienced programmers tend to obey the simple locking discipline even in an environment that offers many more elaborate synchronization primitives.
  • In the remainder of this section the authors report on the details of their experiences with each program.

4.2 Vesta cache server

  • Vesta [Digital Equipment 96b] is an advanced software configuration management system.
  • Configurations are written in a specialized functional language that describes the dependencies and rules used to derive the current state of the software.
  • This is correct because other threads access the log entries with the log head lock held, and threads do not maintain pointers into the log.
  • The authors eliminated the report of these races by moving the EraserReuse annotations to the three Flush routines.
  • The cache server uses a main server thread to wait for incoming RPC requests.

4.3 Petal

  • Petal is a distributed storage system that presents its clients with a huge virtual disk implemented by a cluster of servers and physical disks [Lee & Thekkath 96].
  • Petal implements a distributed consensus algorithm as well as failure detection and recovery mechanisms.
  • The authors found two races where global variables containing statistics were modified without locking.
  • Finally, the authors found one false alarm that they were unable to annotate away.
  • GmapCh Write2 implements a join-like construct to keep the stack frame active until the threads return.

4.4 Undergraduate coursework

  • As a counterpoint to their experience with mature multithreaded server programs, two of their colleagues used Eraser to examine the kinds of synchronization errors found in the homework assignments produced by their undergraduate operating systems class [Choi & Lewis 97].
  • The authors report their results here to demonstrate how Eraser functions with a less sophisticated code base.
  • These assignments can be roughly categorized as low-level (build locks from test-and-set), thread-level (build a small threads package), synchronization-level (build semaphores and mutexes), and application-level (producer/consumer-style problems).
  • Each assignment builds on the implementation of the previous assignment.
  • These were caused by forgetting to take locks, taking locks during writes but not for reads, using different locks to protect the same data structure at different times, and forgetting to re-acquire locks that were released in a loop (two of these patterns are sketched below).
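
Two of these mistakes in miniature (our own Python sketch, not code from the assignments): the first takes a lock for writes but not for reads, and the second protects the same data with a different lock elsewhere; in both cases the candidate lock set for the data becomes empty, so Eraser would report a potential race.

```python
import threading

mu_a, mu_b = threading.Lock(), threading.Lock()
shared = {"count": 0}

def locked_writer():
    with mu_a:
        shared["count"] += 1      # writes take mu_a ...

def unlocked_reader():
    return shared["count"]        # ... but this read takes no lock at all

def writer_with_other_lock():
    with mu_b:                    # same data guarded by a different lock at other times
        shared["count"] += 1
```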

4.5 Effectiveness and Sensitivity

  • Since Eraser uses a testing methodology it cannot prove that a program is free from data races.
  • But the authors believe that Eraser works well compared to manual testing and debugging, and that Eraser’s testing is not very sensitive to the scheduler interleaving.
  • The authors consulted the program history of Ni2 and reintroduced two data races that had existed in previous versions.
  • The first error was an unlocked access to a reference count used to garbage collect file data structures.
  • These races had existed in the Ni2 source code for several months before they were manually found and fixed by the program author.

5 Additional experience

  • The authors describe additional techniques, each of which concerns a form of dynamic checking for synchronization errors in multi-threaded programs that they experimented with and believe is important and promising, but which they did not implement in Eraser.
  • Using an earlier version of Eraser that detected race conditions in multi-threaded Modula-3 programs, the authors found that the Lockset algorithm reported false alarms for Trestle programs[Manasse & Nelson 91] that protected shared locations with multiple locks, because each of two readers could access the location while holding two different locks.
  • This prevented the false alarms, but it is possible for this modification to cause false negatives.
  • A few seconds into formsedit startup, their experimental monitor detected a cycle of locks, showing that no partial order existed (a sketch of this check follows this list).
  • But more work is required to catalog the sound and useful variations on the partial order discipline, and to develop annotations to suppress false alarms.
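
The partial-order check can be sketched as cycle detection over a lock-acquisition graph (our own minimal Python version of the idea, not the authors' monitor): record an edge from every lock already held to the lock being acquired, and warn when a new acquisition would close a cycle.

```python
from collections import defaultdict

acquired_after = defaultdict(set)   # lock -> locks acquired while it was held
locks_held = defaultdict(set)       # thread -> locks currently held

def reachable(src, dst, seen=None):
    """Is there a path src -> ... -> dst in the acquisition-order graph?"""
    if src == dst:
        return True
    seen = seen if seen is not None else set()
    seen.add(src)
    return any(reachable(n, dst, seen) for n in acquired_after[src] if n not in seen)

def on_lock(thread, lock):
    for held in locks_held[thread]:
        if reachable(lock, held):    # adding held -> lock would create a cycle
            print(f"warning: locks {held} and {lock} are not partially ordered")
        acquired_after[held].add(lock)
    locks_held[thread].add(lock)

def on_unlock(thread, lock):
    locks_held[thread].discard(lock)
```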

6 Conclusion

  • Hardware designers have learned to design for testability.
  • Programmers using threads must learn the same.
  • Programmers in the area of operating systems seem to view dynamic race detection tools as esoteric and impractical.
  • As the use of multi-threading expands, so will the unreliability caused by data races, unless better methods are used to eliminate them.
  • The authors believe that the Lockset method implemented in Eraser is promising.


Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs
Stefan Savage
Department of Computer Science and Engineering
University of Washington, Seattle
Michael Burrows Greg Nelson Patrick Sobalvarro
Digital Equipment Corporation
Systems Research Center
Thomas Anderson
Computer Science Division
University of California, Berkeley
Abstract
Multi-threaded programming is difficult and error prone. It
is easy to make a mistake in synchronization that produces a
data race, yet it can be extremely hard to locate this mistake
during debugging. This paper describes a new tool, called
Eraser, for dynamically detecting data races in lock-based
multi-threaded programs. Eraser uses binary rewriting tech-
niques to monitor every shared memory reference and verify
that consistent locking behavior is observed. We present sev-
eral case studies, including undergraduate coursework and a
multi-threaded Web search engine, that demonstrate the ef-
fectiveness of this approach.
1 Introduction
Multi-threading has become a common programming tech-
nique. Most commercial operating systems support threads,
and popular applications like Microsoft Word and Netscape
Navigator are multi-threaded.
Unfortunately, debugging a multi-threaded program can
be difficult. Simple errors in synchronization can produce
timing-dependent data races that can take weeks or months
to track down. For this reason, many programmers have re-
sisted using threads. The difficulties with using threads are
well summarized by John Ousterhout in his 1996 USENIX
presentation “Why Threads are a bad idea (for most pur-
poses)”[Ousterhout 96].
In this paper we describe a tool, called Eraser, that dy-
namically detects data races in multi-threaded programs. We
have implemented Eraser for DIGITAL Unix and used it to
detect data races in a number of programs, ranging from the
AltaVista Web search engine to introductory programming
exercises written by undergraduates.
Previous work in dynamic race detection is based on Lam-
port’s happens-before relation[Lamport 78] and checks that
conflicting memory accesses from different threads are sepa-
rated by synchronization events. Happens-before algorithms
handle many styles of synchronization, but this generality
comes at a cost. We have aimed Eraser specifically at the
lock-based synchronization used in modern multi-threaded
programs. Eraser simply checks that all shared memory ac-
cesses follow a consistent locking discipline. A locking dis-
cipline is a programming policy that ensures the absence of
data races. For example, a simple locking discipline is to re-
quire that every variable shared between threads is protected
by a mutual exclusion lock. We will argue that for many pro-
grams Eraser’s approach of enforcing a locking discipline is
simpler, more efficient, and more thorough at catching races
than the approach based on happens-before. As far as we
know, Eraser is the first dynamic race detection tool to be
applied to multi-threaded production servers.
The remainder of this paper is organized as follows: After
reviewing what a data race is and describing previous work
in race detection, we present the Lockset algorithm used by
Eraser, first at a high level and then at a level low enough
to reveal the main performance-critical implementation tech-
niques. Finally, we describe the experience we have had us-
ing Eraser with a number of multi-threaded programs.
Eraser bears no relationship to the tool by the same name
constructed by John Mellor-Crummey for detecting data
races in shared-memory parallel Fortran programs as part of
the ParaScope Programming Environment[Mellor-Crummey
93].

1.1 Definitions
A lock is a simple synchronization object used for mutual
exclusion; it is either available, or owned by a thread. The
operations on a lock mu are lock(mu) and unlock(mu).
Thus it is essentially a binary semaphore used for mutual ex-
clusion, but differs from a semaphore in that only the owner
of a lock is allowed to release it.
A data race occurs when two concurrent threads access a
shared variable, and:
at least one access is a write, and
the threads use no explicit mechanism to prevent the
accesses from being simultaneous.
If a program has a potential data race, then the effect of
the conflicting accesses to the shared variable will depend on
the interleaving of the thread executions. Although program-
mers occasionally deliberately allow a data race when the
non-determinism seems harmless, usually a potential data
race is a serious error caused by failure to synchronize prop-
erly.
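
To make the definition concrete, here is a minimal illustration (our own sketch in Python, not code from the paper): two threads perform an unsynchronized read-modify-write on a shared counter, so at least one access is a write and nothing prevents the accesses from being simultaneous; guarding every access with the same lock removes the race.

```python
import threading

counter = 0                  # shared variable
mu = threading.Lock()

def racy_increment(n):
    global counter
    for _ in range(n):
        counter += 1         # unsynchronized read-modify-write: a data race

def locked_increment(n):
    global counter
    for _ in range(n):
        with mu:             # consistent locking prevents simultaneous access
            counter += 1

threads = [threading.Thread(target=racy_increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)               # may print less than 200000: some updates were lost
```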
1.2 Related work
An early attempt to avoid data races was the pioneering con-
cept of a monitor introduced by C.A.R. Hoare [Hoare 74]. A
monitor is a group of shared variables together with the pro-
cedures that are allowed to access them, all bundled together
with a single anonymous lock that is automatically acquired
and released at the entry and exit of the procedures. The
shared variables in the monitor are out of scope (that is, invis-
ible) outside the monitor, consequently they can be accessed
only from within the monitor’s procedures, where the lock is
held. Thus monitors provide a static, compile-time guarantee
that accesses to shared variables are serialized and therefore
free from data races. Monitors are an effective way to avoid
data races if all shared variables are static globals, but they
don’t protect against data races in programs with dynami-
cally allocated shared variables, a limitation that early users
found was significant[Lampson & Redell 80]. By substitut-
ing dynamic checking for static checking, our work aims to
allow dynamically allocated shared data while retaining as
much of the safety of monitors as possible.
Some attempts have been made to create purely static (that
is, compile-time) race detection systems that work in the
presence of dynamically allocated shared data: for exam-
ple, Sun’s lock
lint [SunSoft 94] and the Extended Static
Checker for Modula-3 [Detlefs et al. 97, Nelson et al. 96].
But these approaches seem problematical since they require
statically reasoning about the program’s semantics.
Most of the previous work in dynamic race detection
has been carried out by the scientic parallel programming
community [Dinning & Schonberg 90, Netzer 91, Mellor-
Crummey 91, Perkovic & Keleher 96] and is based on Lam-
port’s happens-before relation, which we now describe.
Thread 1                 Thread 2
lock(mu);
v := v+1;
unlock(mu);
                         lock(mu);
                         v := v+1;
                         unlock(mu);
Figure 1: Lamport’s happens-before orders events in the same
thread in temporal order, and orders events in different threads if
the threads synchronized with one another between the events.
The happens-before order is a partial order on all events
of all threads in a concurrent execution. Within any single
thread, events are ordered in the order in which they oc-
curred. Between threads, events are ordered according to the
properties of the synchronization objects they access. If one
thread accesses a synchronization object and the next access
to the object is by a different thread, then the first access is
defined to happen before the second if the semantics of the
synchronization object forbid a schedule in which these two
interactions are exchanged in time. For example, Figure 1
shows one possible ordering of two threads executing the
same code segment. The three program statements executed
by Thread 1 are ordered by happens-before because they are
executed sequentially in the same thread. The lock of mu by
Thread 2 is ordered by happens-before with the unlock of
mu by Thread 1 because a lock cannot be acquired before its
previous owner has released it. Finally, the three statements
executed by Thread 2 are ordered by happens-before because
they are executed sequentially within that thread.
If two threads both access a shared variable and the ac-
cesses are not ordered by the happens-before relation, then in
another execution of the program in which the slower thread
ran faster and/or the faster thread ran slower, the two ac-
cesses could have happened simultaneously; that is, a data
race could have occurred, whether or not it actually did oc-
cur. All previous dynamic race detection tools that we know
of are based on this observation. These race detectors mon-

itor every data reference and synchronization operation and
check for conflicting accesses to shared variables that are un-
related by the happens-before relation for the particular exe-
cution they are monitoring.
Unfortunately, tools based on happens-before have two
significant drawbacks. First, they are difficult to implement
efficiently because they require per-thread information about
concurrent accesses to each shared memory location. More
importantly, the effectiveness of tools based on happens-
before is highly dependent on the interleaving produced by
the scheduler. Figure 2 shows a simple example where the
happens-before approach can miss a data race. While there is
a potential data race on the unprotected accesses to y, it will
not be detected in the execution shown in the figure, because
Thread 1 holds the lock before Thread 2, and so the accesses
to y are ordered in this interleaving by happens-before. A
tool based on happens-before would detect the error only if
the scheduler produced an interleaving in which the fragment
of code for Thread 2 occurred before the fragment of code
for Thread 1. Thus, to be effective, a race detector based
on happens-before needs a large number of test cases to test
many possible interleavings. In contrast, the programming
error in Figure 2 will be detected by Eraser with any test
case that exercises the two code paths, because the paths vio-
late the locking discipline for y regardless of the interleaving
produced by the scheduler. While Eraser is a testing tool and
therefore cannot guarantee that a program is free from races,
it can detect more races than tools based on happens-before.
The lock covers technique of Dinning and Schonberg is an
improvement to the happens-before approach for programs
that make heavy use of locks[Dinning & Schonberg 91]. In-
deed, one way to describe our approach would be that we
extend Dinning and Schonberg's improvement and discard the
underlying happens-before apparatus that they were improv-
ing.
2 The Lockset algorithm
In this section we describe how the Lockset algorithm detects
races. The discussion is at a fairly high level; the techniques
used to implement the algorithm efficiently will be described
in the following section.
The first and simplest version of the Lockset algorithm
enforces the simple locking discipline that every shared vari-
able is protected by some lock, in the sense that the lock is
held by any thread whenever it accesses the variable. Eraser
checks whether the program respects this discipline by mon-
itoring all reads and writes as the program executes. Since
Eraser has no way of knowing which locks are intended to
protect which variables, it must infer the protection relation
from the execution history.
For each shared variable v, Eraser maintains the set C(v) of candidate locks for v. This set contains those locks that have protected v for the computation so far. That is, a lock l is in C(v) if, in the computation up to that point, every thread that has accessed v was holding l at the moment of the access.
Thread 1                 Thread 2
y := y+1;
lock(mu);
v := v+1;
unlock(mu);
                         lock(mu);
                         v := v+1;
                         unlock(mu);
                         y := y+1;
Figure 2: The program allows a data race on y, but the error is not
detected by happens-before in this execution interleaving.
When a new variable v is initialized, its candidate set C(v) is considered to hold all possible locks. When the variable is accessed, Eraser updates C(v) with the intersection of C(v) and the set of locks held by the current thread. This process, called lockset refinement, ensures that any lock that consistently protects v is contained in C(v). If some lock l consistently protects v, it will remain in C(v) as C(v) is refined. If C(v) becomes empty, this indicates that there is no lock that consistently protects v.
In summary, here is the first version of the Lockset algorithm:

    Let locks_held(t) be the set of locks held by thread t.
    For each v, initialize C(v) to the set of all locks.
    On each access to v by thread t,
        set C(v) := C(v) ∩ locks_held(t);
        if C(v) = {}, then issue a warning.
Figure 3 illustrates how a potential data race is discovered through lockset refinement. The left column contains program statements, executed in order from top to bottom. The right column reflects the set of candidate locks, C(v), after each statement is executed. This example has two locks, so C(v) starts containing both of them. After v is accessed while holding mu1, C(v) is refined to contain that lock.

Program          locks_held      C(v)
                 {}              {mu1,mu2}
lock(mu1);       {mu1}
v := v+1;                        {mu1}
unlock(mu1);     {}
lock(mu2);       {mu2}
v := v+1;                        {}
unlock(mu2);     {}
Figure 3: If a shared variable is sometimes protected by lock mu1
and sometimes by lock mu2, then no lock protects it for the whole
computation. The figure shows the progressive refinement of the
set of candidate locks C(v) for v. When C(v) becomes empty, the
Lockset algorithm has detected that no lock protects v.
Later, v is accessed again, with only mu2 held. The intersection of the singleton sets {mu1} and {mu2} is the empty set, correctly indicating that no lock protects v.
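
The refinement loop above can be sketched as follows (our own Python sketch; the names on_access and locks_held are ours, and a real checker is driven by instrumented loads, stores, and lock operations rather than explicit calls):

```python
ALL_LOCKS = None                 # stands for "the set of all locks"

candidate = {}                   # C(v): variable -> candidate lock set
locks_held = {}                  # locks_held(t): thread -> locks currently held

def on_access(t, v):
    """Lockset refinement on every read or write of shared variable v by thread t."""
    held = locks_held.get(t, set())
    if candidate.get(v, ALL_LOCKS) is ALL_LOCKS:
        candidate[v] = set(held)             # first refinement of C(v)
    else:
        candidate[v] &= held                 # C(v) := C(v) ∩ locks_held(t)
    if not candidate[v]:
        print(f"warning: no lock consistently protects {v!r}")

# Replaying Figure 3: v is protected first by mu1 and later by mu2.
locks_held["T1"] = {"mu1"}; on_access("T1", "v")   # C(v) = {mu1}
locks_held["T1"] = {"mu2"}; on_access("T1", "v")   # C(v) = {} -> warning
```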
2.1 Improving the locking discipline
The simple locking discipline we have used so far is too
strict. There are three very common programming practices
that violate the discipline yet are free from any data races:
Initialization. Shared variables are frequently initial-
ized without holding a lock.
Read-shared data. Some shared variables are written
during initialization only and are read-only thereafter.
These can be safely accessed without locks.
Read-write locks. Read-write locks allow multiple
readers to access a shared variable, but allow only a sin-
gle writer to do so.
We will extend the Lockset algorithm to accommodate ini-
tialization and read-shared data, and then extend it further to
accommodate read-write locks.
2.2 Initialization and read-sharing
There is no need for a thread to lock out others if no other
thread can possibly hold a reference to the data being ac-
cessed. Programmers often take advantage of this observa-
tion when initializing newly allocated data. To avoid false
alarms caused by these unlocked initialization writes, we de-
lay the refinement of a location's candidate set until after it
has been initialized. Unfortunately, we have no easy way
[State diagram: Virgin -> Exclusive (rd/wr, first thread); Exclusive -> Shared (rd, new thread); Exclusive or Shared -> Shared-Modified (wr, new thread); other reads and writes leave the state unchanged.]
Figure 4: Eraser keeps track of the state of all locations in memory. Newly allocated locations begin in the Virgin state. As various threads read and write a location, its state changes according to the transitions in the figure. Race conditions are reported only for locations in the Shared-Modified state.
of knowing when initialization is complete. Eraser therefore
considers a shared variable to be initialized when it is first
accessed by a second thread. As long as a variable has been
accessed by a single thread only, reads and writes have no
effect on the candidate set.
Since simultaneous reads of a shared variable by multiple
threads are not races, there is also no need to protect a vari-
able if it is read-only. To support unlocked read-sharing for
such data, we report races only after an initialized variable
has become write-shared by more than one thread.
Figure 4 illustrates the state transitions that control when
lockset refinement occurs and when races are reported.
When a variable is first allocated, it is set to the Virgin state,
indicating that the data is new and has not yet been refer-
enced by any thread. Once the data is accessed, it enters
the Exclusive state, signifying that it has been accessed,
but by one thread only. In this state, subsequent reads and
writes by the same thread do not change the variable’s state
and do not update C(v). This addresses the initialization issue, since the first thread can initialize the variable without causing C(v) to be refined. When and if another thread accesses the variable, then the state changes. A read access changes the state to Shared. In the Shared state, C(v) is updated, but data races are not reported, even if C(v) becomes empty. This takes care of the read-shared data issue, since multiple threads can read a variable without causing a race to be reported. A write access from a new thread changes the state from Exclusive or Shared to the Shared-Modified state, in which C(v) is updated and races are reported, just as described in the original, simple version of the algorithm.
Our support for initialization makes Eraser’s checking
more dependent on the scheduler than we would like. Sup-
pose that a thread allocates and initializes a shared variable

without a lock, and erroneously makes the variable accessi-
ble to a second thread before it has completed the initializa-
tion. Then Eraser will detect the error if any of the second
thread’s accesses occur before the rst thread’s nal initial-
ization actions, but otherwise Eraser will miss the error. We
don’t think this has been a problem, but we have no way of
knowing for sure.
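
The state machine of Figure 4 can be sketched as follows (our own Python sketch; the names and representation are ours, while Eraser itself keeps this state in a shadow word alongside each data word): C(v) is left alone in the Exclusive state, refined without reporting in the Shared state, and refined with warnings in the Shared-Modified state.

```python
from enum import Enum, auto

class State(Enum):
    VIRGIN = auto()
    EXCLUSIVE = auto()
    SHARED = auto()
    SHARED_MODIFIED = auto()

state, owner, candidate = {}, {}, {}   # per-variable state, first-accessing thread, C(v)
locks_held = {}                        # thread -> set of locks currently held

def on_access(t, v, is_write):
    s = state.get(v, State.VIRGIN)
    if s is State.VIRGIN:
        state[v], owner[v], candidate[v] = State.EXCLUSIVE, t, None   # None = all locks
        return
    if s is State.EXCLUSIVE and t == owner[v]:
        return                                   # still initializing: no refinement yet
    if is_write:
        state[v] = State.SHARED_MODIFIED         # a write once a second thread is involved
    elif s is State.EXCLUSIVE:
        state[v] = State.SHARED                  # a read by a new thread
    held = locks_held.get(t, set())
    candidate[v] = set(held) if candidate[v] is None else candidate[v] & held
    if state[v] is State.SHARED_MODIFIED and not candidate[v]:
        print(f"warning: possible data race on {v!r}")
```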
2.3 Read-write locks
Many programs use single-writer, multiple-reader locks as
well as simple locks. To accommodate this style we intro-
duce our last refinement of the locking discipline: we require that for each variable v, some lock m protects v, meaning that m is held in write mode for every write of v, and m is held in some mode (read or write) for every read of v.
We continue to use the state transitions of Figure 4, but when the variable enters the Shared-Modified state, the checking is slightly different:

    Let locks_held(t) be the set of locks held in any mode by thread t.
    Let write_locks_held(t) be the set of locks held in write mode by thread t.
    For each v, initialize C(v) to the set of all locks.
    On each read of v by thread t,
        set C(v) := C(v) ∩ locks_held(t);
        if C(v) = {}, then issue a warning.
    On each write of v by thread t,
        set C(v) := C(v) ∩ write_locks_held(t);
        if C(v) = {}, then issue a warning.
That is, locks held purely in read mode are removed from
the candidate set when a write occurs, as such locks held by
a writer do not protect against a data race between the writer
and some other reader thread.
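
Only the Shared-Modified checking changes under this refinement, which can be sketched as follows (our own Python sketch; write_locks_held mirrors the definition above): reads refine C(v) with locks held in any mode, writes refine it with locks held in write mode, so read-mode locks drop out of C(v) at the first write.

```python
candidate = {}          # C(v) for variables that have entered the Shared-Modified state
locks_held = {}         # thread -> locks held in any mode (read or write)
write_locks_held = {}   # thread -> locks held in write mode only

def refine(v, held):
    candidate[v] = candidate[v] & held if v in candidate else set(held)
    if not candidate[v]:
        print(f"warning: no lock consistently protects {v!r}")

def on_read(t, v):
    refine(v, locks_held.get(t, set()))           # a lock in either mode protects a read

def on_write(t, v):
    refine(v, write_locks_held.get(t, set()))     # only write-mode locks protect a write
```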
3 Implementing Eraser
Eraser is implemented for the DIGITAL Unix operating sys-
tem on the Alpha processor, using the ATOM [Srivastava &
Eustace 94] binary modification system. Eraser takes an unmodified program binary as input and adds instrumentation
to produce a new binary that is functionally identical, but in-
cludes calls to the Eraser runtime to implement the Lockset
algorithm.
To maintain C(v), Eraser instruments each load and store in the program. To maintain locks_held(t) for each thread t, Eraser instruments each call to acquire or release a lock, as well as the stubs that manage thread initialization and finalization. To initialize C(v) for dynamically allocated data, Eraser instruments each call to the storage allocator.
Eraser treats each 32-bit word in the heap or global data
as a possible shared variable, since on our platform a 32-bit
word is the smallest memory-coherent unit. Eraser does not
instrument loads and stores whose address mode is indirect
off the stack pointer, since these are assumed to be stack ref-
erences, and shared variables are assumed to be in global
locations or in the heap. Eraser will maintain candidate sets
for stack locations that are accessed via registers other than
the stack pointer, but this is an artifact of the implementation
rather than a deliberate plan to support programs that share
stack locations between threads.
When a race is reported, Eraser indicates the file and line
number at which it was discovered and a backtrace listing of
all active stack frames. The report also includes the thread
ID, memory address, type of memory access, and important
register values such as the program counter and stack pointer.
We have found that this information is usually sufficient for
locating the source of the race. If the cause of a race is still
unclear, the user can direct Eraser to log all the accesses to
a particular variable that result in a change to its candidate
lock set.
3.1 Representing the candidate lock sets
A naïve implementation of lock sets would store a list of
candidate locks for each memory location, potentially con-
suming many times the allocated memory of the program.
We can avoid this expense by exploiting the fortunate fact
that the number of distinct sets of locks observed in practice
is quite small. In fact, we have never observed more than
10,000 distinct sets of locks occurring in any execution of
the Lockset monitoring algorithm. Consequently, we rep-
resent each set of locks by a small integer, a lockset index
into a table whose entries represent the set of locks as sorted
vectors of lock addresses. Hashing is used to eliminate du-
plicates in the table and to find a lockset index from a given
set of locks. The entries in the table are never deallocated or
modied, so each lockset index remains valid for the lifetime
of the program. Eraser also caches the result of each inter-
section, so that the fast case for set intersection is simply a
table lookup. Each lock vector in the table is sorted, so that
when the cache fails, the slow case of the intersection oper-
ation can be performed by a simple comparison of the two
sorted vectors.
For every 32-bit word in the data segment and heap, there
is a corresponding shadow word that is used to contain a 30-
bit lockset index and a 2-bit state condition. In the Exclusive
state, the 30 bits are not used to store a lockset index, but
used instead to store the ID of the thread with exclusive ac-
cess.
All the standard memory allocation routines are instru-
mented to allocate and initialize a shadow word for each
word allocated by the program. When a thread accesses a
memory location, Eraser finds the shadow word by adding a fixed displacement to the location's address.
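
The representation just described can be sketched as follows (our own Python sketch; the real implementation stores sorted vectors of lock addresses and packs the shadow word next to each 32-bit data word, so the bit layout shown here is only illustrative):

```python
lockset_table = []        # lockset index -> sorted tuple of lock ids (never modified)
lockset_ids = {}          # sorted tuple of lock ids -> lockset index
intersect_cache = {}      # (index, index) -> index of the intersection

def intern(locks):
    """Return the small-integer index for a set of locks, creating it if new."""
    key = tuple(sorted(locks))
    if key not in lockset_ids:
        lockset_ids[key] = len(lockset_table)
        lockset_table.append(key)
    return lockset_ids[key]

def intersect(i, j):
    """Fast case is a cache lookup; the slow case intersects the two stored sets."""
    if (i, j) not in intersect_cache:
        intersect_cache[(i, j)] = intern(set(lockset_table[i]) & set(lockset_table[j]))
    return intersect_cache[(i, j)]

# Shadow word: a 2-bit state plus a 30-bit payload, which holds a lockset index
# (or the owning thread's ID while the location is in the Exclusive state).
def pack_shadow(state_bits, payload):
    return ((payload & 0x3FFFFFFF) << 2) | (state_bits & 0x3)

def unpack_shadow(word):
    return word & 0x3, word >> 2
```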

Citations
Journal ArticleDOI
09 May 2003
TL;DR: nesC, as described in this paper, is a programming language for networked embedded systems that represents a new design space for application developers; it is used to implement TinyOS, a small operating system for sensor networks, as well as several significant sensor applications.
Abstract: We present nesC, a programming language for networked embedded systems that represent a new design space for application developers. An example of a networked embedded system is a sensor network, which consists of (potentially) thousands of tiny, low-power "motes," each of which execute concurrent, reactive programs that must operate with severe memory and power constraints.nesC's contribution is to support the special needs of this domain by exposing a programming model that incorporates event-driven execution, a flexible concurrency model, and component-oriented application design. Restrictions on the programming model allow the nesC compiler to perform whole-program analyses, including data-race detection (which improves reliability) and aggressive function inlining (which reduces resource consumption).nesC has been used to implement TinyOS, a small operating system for sensor networks, as well as several significant sensor applications. nesC and TinyOS have been adopted by a large number of sensor network research groups, and our experience and evaluation of the language shows that it is effective at supporting the complex, concurrent programming style demanded by this new class of deeply networked systems.

1,771 citations


Cites methods from "Eraser: a dynamic data race detecto..."

  • ...Eraser [44] detects unprotected shared variables using a modified binary....

Journal ArticleDOI
11 Sep 2000
TL;DR: A verification and testing environment for Java, called Java PathFinder (JPF), which integrates model checking, program analysis, and testing, and uses state compression to handle large states, together with partial-order and symmetry reduction, slicing, abstraction, and runtime analysis techniques to reduce the state space.
Abstract: The majority of the work carried out in the formal methods community throughout the last three decades has (for good reasons) been devoted to special languages designed to make it easier to experiment with mechanized formal methods such as theorem provers and model checkers. In this paper, we give arguments for why we believe it is time for the formal methods community to shift some of its attention towards the analysis of programs written in modern programming languages. In keeping with this philosophy, we have developed a verification and testing environment for Java, called Java PathFinder (JPF), which integrates model checking, program analysis and testing. Part of this work has consisted of building a new Java Virtual Machine that interprets Java bytecode. JPF uses state compression to handle large states, and partial order reduction, slicing, abstraction and run-time analysis techniques to reduce the state space. JPF has been applied to a real-time avionics operating system developed at Honeywell, illustrating an intricate error, and to a model of a spacecraft controller, illustrating the combination of abstraction, run-time analysis and slicing with model checking.

1,459 citations


Cites methods from "Eraser: a dynamic data race detecto..."

  • ...The source of the error, a missing critical section, could, however, have been found automatically using the Eraser data detection algorithm....

  • ...It immediately identified the race condition using the Eraser algorithm, and then launched the model checker on a thread window consisting of those threads involved in the race condition: the Planner and the Executive, locating the deadlock - all within 25 seconds....

  • ...The algorithm described in [38] is relaxed to allow variables to be initialized without locks, and to be read by several threads without locks, if no-one writes....

  • ...We have made experiments where the Eraser module in JPF generates a so-calledrace window consisting of the threads involved in a race condition....

  • ...An example is the data race detection algorithm Eraser [38] developed at Compaq....

Proceedings ArticleDOI
23 Oct 2004
TL;DR: It is found that even well tested code written by experts contains a surprising number of obvious bugs and that simple automatic techniques can be effective at countering the impact of both ordinary mistakes and misunderstood language features.
Abstract: Many techniques have been developed over the years to automatically find bugs in software. Often, these techniques rely on formal methods and sophisticated program analysis. While these techniques are valuable, they can be difficult to apply, and they aren't always effective in finding real bugs.Bug patterns are code idioms that are often errors. We have implemented automatic detectors for a variety of bug patterns found in Java programs. In this extended abstract1, we describe how we have used bug pattern detectors to find serious bugs in several widely used Java applications and libraries. We have found that the effort required to implement a bug pattern detector tends to be low, and that even extremely simple detectors find bugs in real applications.From our experience applying bug pattern detectors to real programs, we have drawn several interesting conclusions. First, we have found that even well tested code written by experts contains a surprising number of obvious bugs. Second, Java (and similar languages) have many language features and APIs which are prone to misuse. Finally, that simple automatic techniques can be effective at countering the impact of both ordinary mistakes and misunderstood language features.

864 citations

Journal ArticleDOI
TL;DR: A comprehensive overview of a broad spectrum of fault localization techniques, each of which aims to streamline the fault localization process and make it more effective by attacking the problem in a unique way is provided.
Abstract: Software fault localization, the act of identifying the locations of faults in a program, is widely recognized to be one of the most tedious, time consuming, and expensive – yet equally critical – activities in program debugging. Due to the increasing scale and complexity of software today, manually locating faults when failures occur is rapidly becoming infeasible, and consequently, there is a strong demand for techniques that can guide software developers to the locations of faults in a program with minimal human intervention. This demand in turn has fueled the proposal and development of a broad spectrum of fault localization techniques, each of which aims to streamline the fault localization process and make it more effective by attacking the problem in a unique way. In this article, we catalog and provide a comprehensive overview of such techniques and discuss key issues and concerns that are pertinent to software fault localization as a whole.

822 citations


Cites background or methods from "Eraser: a dynamic data race detecto..."

  • ...A runtime analysis (such as [144], [321], [392]), on the other hand, is less powerful than a static analysis but also produces fewer false...

  • ...Concurrent programs suffer most from three kinds of access anomalies: data race [32], [321], atomicity violation [110],...

Proceedings ArticleDOI
01 Mar 2008
TL;DR: This study carefully examined concurrency bug patterns, manifestation, and fix strategies of 105 randomly selected real world concurrency bugs from 4 representative server and client open-source applications and reveals several interesting findings that provide useful guidance for concurrency Bug detection, testing, and concurrent programming language design.
Abstract: The reality of multi-core hardware has made concurrent programs pervasive. Unfortunately, writing correct concurrent programs is difficult. Addressing this challenge requires advances in multiple directions, including concurrency bug detection, concurrent program testing, concurrent programming model design, etc. Designing effective techniques in all these directions will significantly benefit from a deep understanding of real world concurrency bug characteristics.This paper provides the first (to the best of our knowledge) comprehensive real world concurrency bug characteristic study. Specifically, we have carefully examined concurrency bug patterns, manifestation, and fix strategies of 105 randomly selected real world concurrency bugs from 4 representative server and client open-source applications (MySQL, Apache, Mozilla and OpenOffice). Our study reveals several interesting findings and provides useful guidance for concurrency bug detection, testing, and concurrent programming language design.Some of our findings are as follows: (1) Around one third of the examined non-deadlock concurrency bugs are caused by violation to programmers' order intentions, which may not be easily expressed via synchronization primitives like locks and transactional memories; (2) Around 34% of the examined non-deadlock concurrency bugs involve multiple variables, which are not well addressed by existing bug detection tools; (3) About 92% of the examined concurrency bugs canbe reliably triggered by enforcing certain orders among no more than 4 memory accesses. This indicates that testing concurrent programs can target at exploring possible orders among every small groups of memory accesses, instead of among all memory accesses; (4) About 73% of the examinednon-deadlock concurrency bugs were not fixed by simply adding or changing locks, and many of the fixes were not correct at the first try, indicating the difficulty of reasoning concurrent execution by programmers.

800 citations


Cites background from "Eraser: a dynamic data race detecto..."

  • ...For example, data race bug detection [37, 42] checks the synchronization among accesses to one variable; some atomicity violation bug detection tools also focus on atomic regions related to one variable [23, 41]....

  • ...(1) Concurrency bug detection Most previous concurrency bug detection research has focused on detecting data race bugs [7, 10, 31,33,37,42] and deadlock bugs [3,10,37]....

References
Journal ArticleDOI
Leslie Lamport1
TL;DR: In this article, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.
Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

6,804 citations

Proceedings ArticleDOI
03 Dec 1995
TL;DR: This paper describes the motivation, architecture and performance of SPIN, an extensible operating system that provides an extension infrastructure together with a core set of extensible services that allow applications to safely change the operating system's interface and implementation.
Abstract: This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and functionality. SPIN uses language and link-time mechanisms to inexpensively export fine-grained interfaces to operating system services. Extensions are written in a type safe language, and are dynamically linked into the operating system kernel. This approach offers extensions rapid access to system services, while protecting the operating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations.

1,054 citations

Proceedings ArticleDOI
01 Jun 1994
TL;DR: ATOM as mentioned in this paper is a single framework for building a wide range of customized program analysis tools, including block counting, profiling, dynamic memory recording, instruction and data cache simulation, pipeline simulation, evaluating branch prediction, and instruction scheduling.
Abstract: ATOM (Analysis Tools with OM) is a single framework for building a wide range of customized program analysis tools. It provides the common infrastructure present in all code-instrumenting tools; this is the difficult and time-consuming part. The user simply defines the tool-specific details in instrumentation and analysis routines. Building a basic block counting tool like Pixie with ATOM requires only a page of code.ATOM, using OM link-time technology, organizes the final executable such that the application program and user's analysis routines run in the same address space. Information is directly passed from the application program to the analysis routines through simple procedure calls instead of inter-process communication or files on disk. ATOM takes care that analysis routines do not interfere with the program's execution, and precise information about the program is presented to the analysis routines at all times. ATOM uses no simulation or interpretation.ATOM has been implemented on the Alpha AXP under OSF/1. It is efficient and has been used to build a diverse set of tools for basic block counting, profiling, dynamic memory recording, instruction and data cache simulation, pipeline simulation, evaluating branch prediction, and instruction scheduling.

982 citations

Proceedings ArticleDOI
01 Sep 1996
TL;DR: The design, implementation, and performance of Petal is described, a system that attempts to approximate this ideal in practice through a novel combination of features.
Abstract: The ideal storage system is globally accessible, always available, provides unlimited performance and capacity for a large number of clients, and requires no management. This paper describes the design, implementation, and performance of Petal, a system that attempts to approximate this ideal in practice through a novel combination of features. Petal consists of a collection of network-connected servers that cooperatively manage a pool of physical disks. To a Petal client, this collection appears as a highly available block-level storage system that provides large abstract containers called virtual disks. A virtual disk is globally accessible to all Petal clients on the network. A client can create a virtual disk on demand to tap the entire capacity and performance of the underlying physical resources. Furthermore, additional resources, such as servers and disks, can be automatically incorporated into Petal.We have an initial Petal prototype consisting of four 225 MHz DEC 3000/700 workstations running Digital Unix and connected by a 155 Mbit/s ATM network. The prototype provides clients with virtual disks that tolerate and recover from disk, server, and network failures. Latency is comparable to a locally attached disk, and throughput scales with the number of servers. The prototype can achieve I/O rates of up to 3150 requests/sec and bandwidth up to 43.1 Mbytes/sec.

725 citations

Book ChapterDOI
08 Jun 1998
TL;DR: This talk reports on some of the research results of and the current state of the Extended Static Checking project at DEC SRC.
Abstract: Extended static checking (ESC) is a static program analysis technique that attempts to find common programming errors like null-dereferences, array index bounds errors, type cast errors, deadlocks, and race conditions. An ESC tool is powered by program verification technology, yet it feels to the programmer like a type checker because of the limited ambition of finding only certain kinds of errors. This talk reports on some of the research results of and the current state of the Extended Static Checking project at DEC SRC.

481 citations