Proceedings of the IEEE, Vol. 83, No. 1, pp. 69-93, January, 1995.
Asynchronous Design Methodologies: An Overview
Scott Hauck
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195
Abstract
Asynchronous design has been an active area of research since at least the mid 1950's, but has
yet to achieve widespread use. We examine the benefits and problems inherent in asynchronous
computations, and in some of the more notable design methodologies. These include Huffman
asynchronous circuits, burst-mode circuits, micropipelines, template-based and trace theory-based
delay-insensitive circuits, signal transition graphs, change diagrams, and compilation-based quasi-
delay-insensitive circuits.
1. Introduction
Much of today’s logic design is based on two major assumptions: all signals are binary, and time is discrete.
Both of these assumptions are made in order to simplify logic design. By assuming binary values on signals,
simple Boolean logic can be used to describe and manipulate logic constructs. By assuming time is discrete, hazards
and feedback can largely be ignored. However, as with many simplifying assumptions, a system that can operate
without these assumptions has the potential to generate better results.
Asynchronous circuits keep the assumption that signals are binary, but remove the assumption that time is
discrete. This has several possible benefits:
No clock skew - Clock skew is the difference in arrival times of the clock signal at different parts of the circuit.
Since asynchronous circuits by definition have no globally distributed clock, there is no need to worry
about clock skew. In contrast, synchronous systems often slow down their circuits to accommodate the
skew. As feature sizes decrease, clock skew becomes a much greater concern.
Lower power - Standard synchronous circuits have to toggle clock lines, and possibly precharge and discharge
signals, in portions of a circuit unused in the current computation. For example, even though a floating-
point unit on a processor might not be used in a given instruction stream, the unit still must be operated by
the clock. Although asynchronous circuits often require more transitions on the computation path than
synchronous circuits, they generally have transitions only in areas involved in the current computation.
Note that there are techniques being used in synchronous designs to address this issue as well.
Average-case instead of worst-case performance - Synchronous circuits must wait until all possible
computations have completed before latching the results, yielding worst-case performance. Many
asynchronous systems sense when a computation has completed, allowing them to exhibit average-case
performance. For circuits such as ripple-carry adders, where the worst-case delay is significantly worse than
the average-case delay, this can result in a substantial savings (see the sketch after this list).
Easing of global timing issues - In systems such as a synchronous microprocessor, the system clock, and thus
system performance, is dictated by the slowest (critical) path. Thus, most portions of a circuit must be
carefully optimized to achieve the highest clock rate, including rarely used portions of the system. Since
many asynchronous systems operate at the speed of the circuit path currently in operation, rarely used
portions of the circuit can be left unoptimized without adversely affecting system performance.
Better technology migration potential - Integrated circuits will often be implemented in several different
technologies during their lifetime. Early systems may be implemented with gate arrays, while later
production runs may migrate to semi-custom or custom ICs. Greater performance for synchronous systems
can often only be achieved by migrating all system components to a new technology, since again the
overall system performance is based on the longest path. In many asynchronous systems, migration of
only the more critical system components can improve system performance on average, since performance
is dependent on only the currently active path. Also, since many asynchronous systems sense computation
completion, components with different delays may often be substituted into a system without altering other
elements or structures.
Automatic adaptation to physical properties - The delay through a circuit can change with variations in
fabrication, temperature, and power-supply voltage. Synchronous circuits must assume that the worst
possible combination of factors is present and clock the system accordingly. Many asynchronous circuits
sense computation completion, and will run as quickly as the current physical properties allow.
Robust mutual exclusion and external input handling - Elements that guarantee correct mutual exclusion of
independent signals and synchronization of external signals to a clock are subject to metastability [1]. A
metastable state is an unstable equilibrium state, such as a pair of cross-coupled CMOS inverters at 2.5V,
which a system can remain in for an unbounded amount of time [2]. Synchronous circuits require all
elements to exhibit bounded response time. Thus, there is some chance that mutual exclusion circuits will
fail in a synchronous system. Most asynchronous systems can wait an arbitrarily long time for such an
element to complete, allowing robust mutual exclusion. Also, since there is no clock with which signals
must be synchronized, asynchronous circuits more gracefully accommodate inputs from the outside world,
which are by nature asynchronous.
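To make the average-case argument for ripple-carry addition concrete, the following sketch (ours, not from the
paper) estimates how far a carry actually ripples for random operands; a completion-sensing adder's delay tracks
this chain rather than the full word width. The function and variable names are hypothetical.

    import random

    def longest_carry_chain(a: int, b: int, n: int) -> int:
        # Length of the longest run of "propagate" positions (a_i XOR b_i),
        # a proxy for how far a carry must actually ripple for these operands.
        longest = run = 0
        for i in range(n):
            if ((a >> i) & 1) ^ ((b >> i) & 1):   # propagate: must wait for carry-in
                run += 1
            else:                                 # kill/generate: carry known locally
                run = 0
            longest = max(longest, run)
        return longest

    n, trials = 32, 10000
    random.seed(0)
    avg = sum(longest_carry_chain(random.getrandbits(n), random.getrandbits(n), n)
              for _ in range(trials)) / trials
    print(f"average longest carry chain: {avg:.1f} bits; worst case: {n} bits")

For 32-bit random operands the average chain is only around four to five bits, roughly log2(n), versus the
32-stage worst case that a synchronous design must budget for on every cycle.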
With all of the potential advantages of asynchronous circuits, one might wonder why synchronous systems
predominate. The reason is that asynchronous circuits have several problems as well. Primarily, asynchronous
circuits are more difficult to design in an ad hoc fashion than synchronous circuits. In a synchronous system, a
designer can simply define the combinational logic necessary to compute the given functions, and surround it with
latches. By setting the clock rate to a long enough period, all worries about hazards (undesired signal transitions) and
the dynamic state of the circuit are removed. In contrast, designers of asynchronous systems must pay a great deal of
attention to the dynamic state of the circuit. Hazards must also be removed from the circuit, or not introduced in the
first place, to avoid incorrect results. The ordering of operations, which was fixed by the placement of latches in a
synchronous system, must be carefully ensured by the asynchronous control logic. For complex systems, these
issues become too difficult to handle by hand. Unfortunately, asynchronous circuits in general cannot leverage
existing CAD tools and implementation alternatives for synchronous systems. For example, some asynchronous
methodologies allow only algebraic manipulations (associative, commutative, and DeMorgan's Law) for logic
decomposition, and many do not even allow these. Placement, routing, partitioning, logic synthesis, and most other
CAD tools either need modifications for asynchronous circuits, or are not applicable at all.
Finally, even though most of the advantages of asynchronous circuits are towards higher performance, it isn't
clear that asynchronous circuits are actually any faster in practice. Asynchronous circuits generally require extra time
due to their signaling policies, thus increasing average-case delay. Whether this cost is greater or less than the
benefits listed previously is unclear, and more research in this area is necessary.

Even with all of the problems listed above, asynchronous design is an important research area. Regardless of
how successful synchronous systems are, there will always be a need for asynchronous systems. Asynchronous
logic may be used simply for the interfacing of a synchronous system to its environment and other synchronous
systems, or possibly for more complete applications. Also, although ad hoc design of asynchronous systems is
impractical, there are several methodologies and CAD algorithms developed specifically for asynchronous design.
Several of the main approaches are profiled in this paper. Note that we do not catalog all methodologies ever
developed, nor do we explore every subtlety of the methodologies included. Attempting either of these tasks would
fill hundreds of pages, obscuring the significant issues involved. Instead, we discuss the essential aspects of some
of the more well-known asynchronous design systems. This will hopefully provide the reader a solid framework in
which to further pursue the topics of interest. We likewise do not cover many of the related areas, such as
verification and testing, which are very important to asynchronous design, yet too complex to be handled adequately
here. Interested readers are directed elsewhere for details on asynchronous verification [3] and testing [4].
Asynchronous design methodologies can most easily be categorized by the timing models they assume, and this
paper is organized along these lines. Section 2 covers systems using bounded-delay models, including fundamental-
mode Huffman circuits, extensions of these circuits to non-fundamental mode, and burst-mode circuits. Section 3
focuses on micropipelines. Section 4 details delay-insensitive circuits, including template- or module-based systems,
and trace theory. Section 5 combines speed-independent and quasi-delay-insensitive circuits, including signal
transition graphs, change diagrams, and communicating processes compilation. Finally, we conclude in Section 6
with a general comparison of the methods discussed.
2. Bounded-Delay Models
The most obvious model to use for asynchronous circuits is the same model used for synchronous circuits.
Specifically, it is assumed that the delay in all circuit elements and wires is known, or at least bounded. Circuits
designed with this model (usually coupled with the fundamental mode assumption discussed below) are generally
referred to as Huffman circuits, after D. A. Huffman, who developed many of the early concepts of these circuits.
2.1 Fundamental Mode Huffman Circuits
In this model, circuits are designed in much the same way as synchronous circuits. The circuit to be synthesized
is usually expressed as a flow-table [5], a form similar to a truth-table. As shown in Figure 1, a flow-table has a
row for each internal state, and a column for each combination of inputs. The entries in each location indicate the
next state entered and outputs generated when the column’s input combination is seen while in the row’s state.
States in circles correspond to stable states, states where the next state is identical to the current state. Normally it
is assumed that each unstable state leads directly to a stable state, with at most one transition occurring on each
output variable. Similar to finite state machine synthesis in synchronous systems, state reduction and state encoding
are performed on the flow-table, and Karnaugh maps are generated for each of the resulting signals.
Figure 1. Example of a Flow-table (left), and the corresponding state machine (right).
There are several special concerns when implementing state machines asynchronously that do not occur in
synchronous systems. First, since there is no clock to synchronize input arrivals, the system must behave properly
in any intermediate states caused by multiple input changes. For example in the flow-table of Figure 1, the system
will not move directly from input “00” to “11”, but will briefly pass through “01” or “10”. Thus, for state 1 we
must add entries for both inputs “01” and “10” which keep the machine in state 1.
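A flow-table is easy to animate in software. The sketch below (ours; the entries are hypothetical, since Figure 1's
full table is not reproduced here) steps a machine through the intermediate input combinations that arise when two
inputs change, showing why the added entries for "01" and "10" matter.

    # Hypothetical flow-table in the spirit of Figure 1: keys are
    # (state, inputs); values are (next_state, output). An entry whose
    # next state equals the current state is a stable state.
    flow_table = {
        (1, "00"): (1, 0),   # stable
        (1, "01"): (1, 0),   # added so an intermediate input keeps us in state 1
        (1, "10"): (1, 0),   # added for the same reason
        (1, "11"): (2, 0),   # unstable: leads to state 2
        (2, "11"): (2, 0),   # stable
    }

    def settle(state, inputs):
        # Follow unstable entries until a stable state is reached.
        while True:
            nxt, out = flow_table[(state, inputs)]
            if nxt == state:
                return state, out
            state = nxt

    # Inputs never jump from "00" to "11" atomically; the circuit briefly
    # sees "01" or "10" on the way, and must behave properly there.
    state = 1
    for inputs in ("00", "01", "11"):    # one possible path from 00 to 11
        state, out = settle(state, inputs)
        print(inputs, "-> state", state, "output", out)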
Figure 2. Karnaugh map (left) and implementation (right) of a circuit with hazards.
We must also deal with hazard removal. Suppose we are trying to implement the Karnaugh map in Figure 2
and use the sum-of-products form shown. Further assume that all gates (including the inverter) have a gate delay of
1 unit, and the current state is (A, B, C) = (1, 1, 1). In this state AB is true, and the output is 1. If we now set B to
0, we will move to state (1, 0, 1), and the output should remain 1. However, because of the delay in the inverter,
the top AND gate will become false before the lower AND becomes true, and a 0 will propagate to the output. This
momentary glitch on the output is known as a static-1 hazard, and must be removed for reliable circuit operation. A
static-0 hazard is similar, with a value meant to remain stable at 0 instead momentarily becoming 1. A dynamic
hazard is the case where a signal that is meant to make a single transition (0→1 or 1→0) instead makes three or
more transitions (such as 0→1→0→1 or 1→0→1→0).
All static and dynamic hazards due to a single input change can be eliminated by adding to a sum-of-products
circuit that has no useless products (i.e. no AND term contains both a variable and its complement) additional cubes
covering all adjacent 1’s in a Karnaugh map ([5] pp. 121-127). In the above example, adding the cube AC would
remove the static-1 hazard demonstrated earlier, since while the circuit transitions from state (1, 1, 1) to (1, 0, 1)
both A and C remain true, and the AC cube stays true.
operation when multiple inputs are allowed to change simultaneously. Referring to Figure 2, assume that we are in
state (1, 1, 0), and we move to state (1, 0, 1) by changing both B and C. If the delays in the circuit are slightly
greater for input C, the circuit will momentarily be in state (1, 0, 0), and the output will go to 0 (a static-1 hazard).
We could try to alter the circuit delays to make sure that the circuit goes through state (1, 1, 1) instead, but what if
this state had also been set to 0 in the original Karnaugh map? In this case, no intermediate state will maintain the
correct output, and an unavoidable hazard is present. The solution generally adopted is to make a policy decision that
only one input to a circuit is allowed to change at a time.
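The static-1 hazard and its cube-based fix can both be reproduced with a unit-delay simulation. The sketch below
(ours; the gate names are made up) updates every gate of Figure 2's circuit one time unit after its inputs, then
replays the B: 1→0 change with and without the added AC product.

    # Unit-delay model: all gates (including the inverter) update together
    # from the previous time step's values, giving each a delay of 1 unit.
    def settle_trace(add_ac_cube):
        A, C = 1, 1
        # gate values after settling with (A, B, C) = (1, 1, 1)
        s = {"notB": 0, "andAB": 1, "andBC": 0, "andAC": 1, "F": 1}
        B = 0                               # the single input change under test
        trace = [s["F"]]
        for _ in range(4):
            s = {
                "notB":  1 - B,
                "andAB": A & B,
                "andBC": s["notB"] & C,     # the B'C product
                "andAC": A & C,             # the covering cube (optional)
                "F": s["andAB"] | s["andBC"]
                     | (s["andAC"] if add_ac_cube else 0),
            }
            trace.append(s["F"])
        return trace

    print("F = AB + B'C     :", settle_trace(False))  # [1, 1, 0, 1, 1] -- glitch
    print("F = AB + B'C + AC:", settle_trace(True))   # [1, 1, 1, 1, 1] -- clean

Deleting the AC product restores the glitch, which is exactly the hazard reintroduced by the consensus-style
transformation discussed below.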
An important point needs to be made about the sum-of-products form. As the number of inputs increases, the
number of inputs to the AND and OR gates increases. Since most technologies either restrict the number of inputs
to a gate, or penalize large fanin gates by long delays, it is important to have some method for decomposing large
gates. As proven by Unger ([5] pp. 130-134), many applications of algebraic transformations, including the
associative, distributive, and DeMorgan’s laws, do not introduce any new hazards in bounded-delay circuits. Thus, a
sum-of-products form can be factored into smaller gates via these transformations. Note that other transformations,
such as the transformation from F=AB+AC+B'C to F=AB+B'C, can introduce hazards, in this case because it
removes the cube that we added above for hazard-free operation. This ability to use some logic transformations is an
important advantage of this methodology, for many of the other methodologies do not allow these types of
operations.
In order to extend our combinational circuit methodology to sequential circuits, we use a model similar to that
used for synchronous circuits (Figure 3). Since we made the restriction that only one input to the combinational
logic can change at a time, this forces several requirements on our sequential circuit. First, we must make sure that
the combinational logic has settled in response to a new input before the present-state entries change. This is done
by placing delay elements on the feedback lines. Also, the same restriction dictates that only one next state bit can
change at a time. Encodings can be made that allow a single transition of state bits for all state transitions, but
require multiple state encodings for each state ([5] pp. 76-79), complicating the combinational logic. One-hot
encodings, encodings where each state q_i has a single associated state bit y_i true and all other bits false, require
two transitions, but simplify the associated logic. State transitioning from q_i to q_j is accomplished by first setting
y_j, and then resetting y_i. The final requirement is that the next external input transition cannot occur until the
entire system settles to a stable state (this final restriction is what characterizes a fundamental-mode circuit). For a
one-hot encoding, this means that a new input must be delayed long enough for three trips through the combinational
logic and two trips through the delay elements.
Figure 3. Huffman sequential circuit structure ([6] pg. 157): combinational logic with inputs x_1...x_n and
present-state bits y_1...y_k, producing outputs z_1...z_m and next-state bits y_1'...y_k', which feed back to the
present-state inputs through delay elements.
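The two-step y_j-then-y_i handoff is easy to see in a toy model. In the sketch below (ours; the next-state
equations are hypothetical), a one-hot machine moves from state 1 to state 2 by passing through a transient
encoding in which both bits are high, which is why the state needs two trips through the logic and delay
elements to change, and a third to be confirmed stable.

    def next_state(y1, y2, go):
        # Hypothetical one-hot next-state logic for a single transition:
        # set y2 while y1 is still high, then reset y1 once y2 is seen high.
        new_y1 = y1 and not y2
        new_y2 = y2 or (y1 and go)
        return new_y1, new_y2

    y1, y2 = True, False                   # one-hot: in state 1
    for step in range(3):
        y1, y2 = next_state(y1, y2, go=True)
        print(step, (y1, y2))
    # step 0: (True, True)   transient, both bits high
    # step 1: (False, True)  state 2 reached
    # step 2: (False, True)  a third trip confirms stability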
2.2 Extending Huffman Circuits to Non-Fundamental Mode
The fundamental-mode assumption, while making logic design easy, greatly increases cycle time. Therefore
there could be considerable gains from removing this restriction. One method is quite simple, and can be seen by
referring back to the original argument for the fundamental mode. The issue was that when multiple inputs change,
and no single cube covers the starting and ending point of a transition, there is the possibility of a hazard. However,
if a single cube covers an entire transition, then there is no need for the fundamental mode restriction, since that cube
will ensure the output stays a 1 at all times. So, for the function A+F(B, C, D), when A is true, inputs B, C, and
D can change at will. However in general input A cannot change in parallel with inputs B, C, and D, because when
A goes from true to false, F(B, C, D) may be going from false to true, potentially causing a hazard. Therefore, this
observation cannot completely eliminate the fundamental mode assumption.
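The single-cube observation comes down to the final OR gate having a controlling input. A minimal sketch (ours),
treating the output of F(B, C, D) as an arbitrary glitchy waveform, shows that a settled A = 1 masks anything the
rest of the logic does, while A = 0 exposes it:

    def or_trace(a, g_waveform):
        # Final OR gate of A + F(B, C, D); g_waveform stands in for the
        # (possibly hazardous) output of the F(B, C, D) sub-circuit.
        return [a | g for g in g_waveform]

    glitchy = [0, 1, 0, 0, 1, 1, 0]        # any internal hazard on F(B, C, D)
    print(or_trace(1, glitchy))            # A = 1: all 1s, no glitch reaches F
    print(or_trace(0, glitchy))            # A = 0: F follows the glitches

This is why B, C, and D may change freely while A is high, but A itself cannot safely change in parallel with them.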
Another method, described by Hollaar [7], uses detailed knowledge of the implementation strategy to allow new
transitions to arrive earlier than the fundamental-mode assumption allows. As shown in Figure 4, Hollaar builds a
one-hot encoded asynchronous state machine with a set-reset flip-flop for each state bit (for example, NAND gates 5
& 6 form a set-reset flip-flop for state K). The set input is driven when the previous state’s bit and the transition
function are true (i.e. for K, when we are in state J and transition function S is true), and the flip-flop is reset when
the following state’s bit is true (hence the connection from gate 9 to gate 6). This basic scheme is expanded beyond simple straight-line
state-machines, and allows parallel execution (i.e. FORK and JOIN) in asynchronous state machines.
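The following behavioral sketch (ours, not Hollaar's NAND netlist) captures the scheme of Figure 4 for a
three-state ring: each one-hot bit sits in a set-reset flip-flop, set by the previous state's bit ANDed with its
transition condition, and reset by the following state's bit.

    states = ["J", "K", "L"]

    def step(bits, cond):
        # All flip-flops update together from the previous snapshot.
        new = {}
        for i, s in enumerate(states):
            prev = states[i - 1]                  # ring: L precedes J
            nxt = states[(i + 1) % len(states)]
            set_in = bits[prev] and cond[prev]    # previous state + its condition
            reset_in = bits[nxt]                  # following state resets us
            new[s] = set_in or (bits[s] and not reset_in)
        return new

    bits = {"J": True, "K": False, "L": False}    # one-hot: in state J
    cond = {"J": True, "K": False, "L": False}    # only the J -> K condition holds
    for t in range(3):
        bits = step(bits, cond)
        print(t, bits)
    # K is set while J is still high; J is then reset by K, leaving only K.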
