Proceedings of the IEEE, Vol. 83, No. 1, pp. 69-93, January, 1995.
Asynchronous Design Methodologies: An Overview
Scott Hauck
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195
Abstract
Asynchronous design has been an active area of research since at least the mid 1950's, but has
yet to achieve widespread use. We examine the benefits and problems inherent in asynchronous
computations, and in some of the more notable design methodologies. These include Huffman
asynchronous circuits, burst-mode circuits, micropipelines, template-based and trace theory-based
delay-insensitive circuits, signal transition graphs, change diagrams, and compilation-based quasi-
delay-insensitive circuits.
1. Introduction
Much of today’s logic design is based on two major assumptions: all signals are binary, and time is discrete.
Both of these assumptions are made in order to simplify logic design. By assuming binary values on signals,
simple Boolean logic can be used to describe and manipulate logic constructs. By assuming time is discrete, hazards
and feedback can largely be ignored. However, as with many simplifying assumptions, a system that can operate
without these assumptions has the potential to generate better results.
Asynchronous circuits keep the assumption that signals are binary, but remove the assumption that time is
discrete. This has several possible benefits:
No clock skew - Clock skew is the difference in arrival times of the clock signal at different parts of the circuit.
Since asynchronous circuits by definition have no globally distributed clock, there is no need to worry
about clock skew. In contrast, synchronous systems often slow down their circuits to accommodate the
skew. As feature sizes decrease, clock skew becomes a much greater concern.
Lower power - Standard synchronous circuits have to toggle clock lines, and possibly precharge and discharge
signals, in portions of a circuit unused in the current computation. For example, even though a floating-
point unit on a processor might not be used in a given instruction stream, the unit still must be operated by
the clock. Although asynchronous circuits often require more transitions on the computation path than
synchronous circuits, they generally have transitions only in areas involved in the current computation.
Note that there are techniques being used in synchronous designs to address this issue as well.
Average-case instead of worst-case performance - Synchronous circuits must wait until all possible
computations have completed before latching the results, yielding worst-case performance. Many
asynchronous systems sense when a computation has completed, allowing them to exhibit average-case
performance. For circuits such as ripple-carry adders, where the worst-case delay is significantly worse than
the average-case delay, this can result in a substantial savings (see the sketch after this list).
Easing of global timing issues - In systems such as a synchronous microprocessor, the system clock, and thus
system performance, is dictated by the slowest (critical) path. Thus, most portions of a circuit must be
carefully optimized to achieve the highest clock rate, including rarely used portions of the system. Since
many asynchronous systems operate at the speed of the circuit path currently in operation, rarely used
portions of the circuit can be left unoptimized without adversely affecting system performance.
Better technology migration potential - Integrated circuits will often be implemented in several different
technologies during their lifetime. Early systems may be implemented with gate arrays, while later
production runs may migrate to semi-custom or custom ICs. Greater performance for synchronous systems
can often only be achieved by migrating all system components to a new technology, since again the
overall system performance is based on the longest path. In many asynchronous systems, migration of
only the more critical system components can improve system performance on average, since performance
is dependent on only the currently active path. Also, since many asynchronous systems sense computation
completion, components with different delays may often be substituted into a system without altering other
elements or structures.
Automatic adaptation to physical properties - The delay through a circuit can change with variations in
fabrication, temperature, and power-supply voltage. Synchronous circuits must assume that the worst
possible combination of factors is present and clock the system accordingly. Many asynchronous circuits
sense computation completion, and will run as quickly as the current physical properties allow.
Robust mutual exclusion and external input handling - Elements that guarantee correct mutual exclusion of
independent signals and synchronization of external signals to a clock are subject to metastability [1]. A
metastable state is an unstable equilibrium state, such as a pair of cross-coupled CMOS inverters at 2.5V,
which a system can remain in for an unbounded amount of time [2]. Synchronous circuits require all
elements to exhibit bounded response time. Thus, there is some chance that mutual exclusion circuits will
fail in a synchronous system. Most asynchronous systems can wait an arbitrarily long time for such an
element to complete, allowing robust mutual exclusion. Also, since there is no clock with which signals
must be synchronized, asynchronous circuits more gracefully accommodate inputs from the outside world,
which are by nature asynchronous.
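To make the average-case argument for ripple-carry addition concrete, the following sketch (ours, not from the
paper) estimates how far a carry actually ripples for random operands; a completion-sensing adder's delay tracks
this chain rather than the full word width. The function and variable names are hypothetical.

    import random

    def longest_carry_chain(a: int, b: int, n: int) -> int:
        # Length of the longest run of "propagate" positions (a_i XOR b_i),
        # a proxy for how far a carry must actually ripple for these operands.
        longest = run = 0
        for i in range(n):
            if ((a >> i) & 1) ^ ((b >> i) & 1):   # propagate: must wait for carry-in
                run += 1
            else:                                 # kill/generate: carry known locally
                run = 0
            longest = max(longest, run)
        return longest

    n, trials = 32, 10000
    random.seed(0)
    avg = sum(longest_carry_chain(random.getrandbits(n), random.getrandbits(n), n)
              for _ in range(trials)) / trials
    print(f"average longest carry chain: {avg:.1f} bits; worst case: {n} bits")

For 32-bit random operands the average chain is only around four to five bits, roughly log2(n), versus the
32-stage worst case that a synchronous design must budget for on every cycle.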
With all of the potential advantages of asynchronous circuits, one might wonder why synchronous systems
predominate. The reason is that asynchronous circuits have several problems as well. Primarily, asynchronous
circuits are more difficult to design in an ad hoc fashion than synchronous circuits. In a synchronous system, a
designer can simply define the combinational logic necessary to compute the given functions, and surround it with
latches. By setting the clock rate to a long enough period, all worries about hazards (undesired signal transitions) and
the dynamic state of the circuit are removed. In contrast, designers of asynchronous systems must pay a great deal of
attention to the dynamic state of the circuit. Hazards must also be removed from the circuit, or not introduced in the
first place, to avoid incorrect results. The ordering of operations, which was fixed by the placement of latches in a
synchronous system, must be carefully ensured by the asynchronous control logic. For complex systems, these
issues become too difficult to handle by hand. Unfortunately, asynchronous circuits in general cannot leverage
existing CAD tools and implementation alternatives for synchronous systems. For example, some asynchronous
methodologies allow only algebraic manipulations (associative, commutative, and DeMorgan's Law) for logic
decomposition, and many do not even allow these. Placement, routing, partitioning, logic synthesis, and most other
CAD tools either need modifications for asynchronous circuits, or are not applicable at all.
Finally, even though most of the advantages of asynchronous circuits are towards higher performance, it isn't
clear that asynchronous circuits are actually any faster in practice. Asynchronous circuits generally require extra time
due to their signaling policies, thus increasing average-case delay. Whether this cost is greater or less than the
benefits listed previously is unclear, and more research in this area is necessary.

Even with all of the problems listed above, asynchronous design is an important research area. Regardless of
how successful synchronous systems are, there will always be a need for asynchronous systems. Asynchronous
logic may be used simply for the interfacing of a synchronous system to its environment and other synchronous
systems, or possibly for more complete applications. Also, although ad hoc design of asynchronous systems is
impractical, there are several methodologies and CAD algorithms developed specifically for asynchronous design.
Several of the main approaches are profiled in this paper. Note that we do not catalog all methodologies ever
developed, nor do we explore every subtlety of the methodologies included. Attempting either of these tasks would
fill hundreds of pages, obscuring the significant issues involved. Instead, we discuss the essential aspects of some
of the more well-known asynchronous design systems. This will hopefully provide the reader a solid framework in
which to further pursue the topics of interest. We likewise do not cover many of the related areas, such as
verification and testing, which are very important to asynchronous design, yet too complex to be handled adequately
here. Interested readers are directed elsewhere for details on asynchronous verification [3] and testing [4].
Asynchronous design methodologies can most easily be categorized by the timing models they assume, and this
paper is organized along these lines. Section 2 covers systems using bounded-delay models, including fundamental-
mode Huffman circuits, extensions of these circuits to non-fundamental mode, and burst-mode circuits. Section 3
focuses on micropipelines. Section 4 details delay-insensitive circuits, including template- or module-based systems,
and trace theory. Section 5 combines speed-independent and quasi-delay-insensitive circuits, including signal
transition graphs, change diagrams, and communicating processes compilation. Finally, we conclude in Section 6
with a general comparison of the methods discussed.
2. Bounded-Delay Models
The most obvious model to use for asynchronous circuits is the same model used for synchronous circuits.
Specifically, it is assumed that the delay in all circuit elements and wires is known, or at least bounded. Circuits
designed with this model (usually coupled with the fundamental mode assumption discussed below) are generally
referred to as Huffman circuits, after D. A. Huffman, who developed many of the early concepts of these circuits.
2.1 Fundamental Mode Huffman Circuits
In this model, circuits are designed in much the same way as synchronous circuits. The circuit to be synthesized
is usually expressed as a flow-table [5], a form similar to a truth-table. As shown in Figure 1, a flow-table has a
row for each internal state, and a column for each combination of inputs. The entries in each location indicate the
next state entered and outputs generated when the column’s input combination is seen while in the row’s state.
States in circles correspond to stable states, states where the next state is identical to the current state. Normally it
is assumed that each unstable state leads directly to a stable state, with at most one transition occurring on each
output variable. Similar to finite state machine synthesis in synchronous systems, state reduction and state encoding
are performed on the flow-table, and Karnaugh maps are generated for each of the resulting signals.
Figure 1. Example of a Flow-table (left), and the corresponding state machine (right).
There are several special concerns when implementing state machines asynchronously that do not occur in
synchronous systems. First, since there is no clock to synchronize input arrivals, the system must behave properly
in any intermediate states caused by multiple input changes. For example in the flow-table of Figure 1, the system
will not move directly from input “00” to “11”, but will briefly pass through “01” or “10”. Thus, for state 1 we
must add entries for both inputs “01” and “10” which keep the machine in state 1.
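A flow-table is easy to animate in software. The sketch below (ours; the entries are hypothetical, since Figure 1's
full table is not reproduced here) steps a machine through the intermediate input combinations that arise when two
inputs change, showing why the added entries for "01" and "10" matter.

    # Hypothetical flow-table in the spirit of Figure 1: keys are
    # (state, inputs); values are (next_state, output). An entry whose
    # next state equals the current state is a stable state.
    flow_table = {
        (1, "00"): (1, 0),   # stable
        (1, "01"): (1, 0),   # added so an intermediate input keeps us in state 1
        (1, "10"): (1, 0),   # added for the same reason
        (1, "11"): (2, 0),   # unstable: leads to state 2
        (2, "11"): (2, 0),   # stable
    }

    def settle(state, inputs):
        # Follow unstable entries until a stable state is reached.
        while True:
            nxt, out = flow_table[(state, inputs)]
            if nxt == state:
                return state, out
            state = nxt

    # Inputs never jump from "00" to "11" atomically; the circuit briefly
    # sees "01" or "10" on the way, and must behave properly there.
    state = 1
    for inputs in ("00", "01", "11"):    # one possible path from 00 to 11
        state, out = settle(state, inputs)
        print(inputs, "-> state", state, "output", out)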
Figure 2. Karnaugh map (left) and implementation (right) of a circuit with hazards.
We must also deal with hazard removal. Suppose we are trying to implement the Karnaugh map in Figure 2
and use the sum-of-products form shown. Further assume that all gates (including the inverter) have a gate delay of
1 unit, and the current state is (A, B, C) = (1, 1, 1). In this state AB is true, and the output is 1. If we now set B to
0, we will move to state (1, 0, 1), and the output should remain 1. However, because of the delay in the inverter,
the top AND gate will become false before the lower AND becomes true, and a 0 will propagate to the output. This
momentary glitch on the output is known as a static-1 hazard, and must be removed for reliable circuit operation. A
static-0 hazard is similar, with a value meant to remain stable at 0 instead momentarily becoming 1. A dynamic
hazard is the case where a signal that is meant to make a single transition (0→1 or 1→0) instead makes three or
more transitions (such as 0→1→0→1 or 1→0→1→0).
All static and dynamic hazards due to a single input change can be eliminated by adding to a sum-of-products
circuit that has no useless products (i.e. no AND term contains both a variable and its complement) additional cubes
covering all adjacent 1’s in a Karnaugh map ([5] pp. 121-127). In the above example, adding the cube AC would
remove the static-1 hazard demonstrated earlier, since while the circuit transitions from state (1, 1, 1) to (1, 0, 1)
both A and C remain true, and the AC cube stays true.
operation when multiple inputs are allowed to change simultaneously. Referring to Figure 2, assume that we are in
state (1, 1, 0), and we move to state (1, 0, 1) by changing both B and C. If the delays in the circuit are slightly
greater for input C, the circuit will momentarily be in state (1, 0, 0), and the output will go to 0 (a static-1 hazard).
We could try to alter the circuit delays to make sure that the circuit goes through state (1, 1, 1) instead, but what if
this state had also been set to 0 in the original Karnaugh map? In this case, no intermediate state will maintain the
correct output, and an unavoidable hazard is present. The solution generally adopted is to make a policy decision that
only one input to a circuit is allowed to change at a time.
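The static-1 hazard and its cube-based fix can both be reproduced with a unit-delay simulation. The sketch below
(ours; the gate names are made up) updates every gate of Figure 2's circuit one time unit after its inputs, then
replays the B: 1→0 change with and without the added AC product.

    # Unit-delay model: all gates (including the inverter) update together
    # from the previous time step's values, giving each a delay of 1 unit.
    def settle_trace(add_ac_cube):
        A, C = 1, 1
        # gate values after settling with (A, B, C) = (1, 1, 1)
        s = {"notB": 0, "andAB": 1, "andBC": 0, "andAC": 1, "F": 1}
        B = 0                               # the single input change under test
        trace = [s["F"]]
        for _ in range(4):
            s = {
                "notB":  1 - B,
                "andAB": A & B,
                "andBC": s["notB"] & C,     # the B'C product
                "andAC": A & C,             # the covering cube (optional)
                "F": s["andAB"] | s["andBC"]
                     | (s["andAC"] if add_ac_cube else 0),
            }
            trace.append(s["F"])
        return trace

    print("F = AB + B'C     :", settle_trace(False))  # [1, 1, 0, 1, 1] -- glitch
    print("F = AB + B'C + AC:", settle_trace(True))   # [1, 1, 1, 1, 1] -- clean

Deleting the AC product restores the glitch, which is exactly the hazard reintroduced by the consensus-style
transformation discussed below.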
An important point needs to be made about the sum-of-products form. As the number of inputs increases, the
number of inputs to the AND and OR gates increases. Since most technologies either restrict the number of inputs
to a gate, or penalize large fanin gates by long delays, it is important to have some method for decomposing large
gates. As proven by Unger ([5] pp. 130-134), many applications of algebraic transformations, including the
associative, distributive, and DeMorgan’s laws, do not introduce any new hazards in bounded-delay circuits. Thus, a
sum-of-products form can be factored into smaller gates via these transformations. Note that other transformations,
such as the transformation from F=AB+AC+B'C to F=AB+B'C, can introduce hazards, in this case because it
removes the cube that we added above for hazard-free operation. This ability to use some logic transformations is an
important advantage of this methodology, for many of the other methodologies do not allow these types of
operations.
In order to extend our combinational circuit methodology to sequential circuits, we use a model similar to that
used for synchronous circuits (Figure 3). Since we made the restriction that only one input to the combinational
logic can change at a time, this forces several requirements on our sequential circuit. First, we must make sure that
the combinational logic has settled in response to a new input before the present-state entries change. This is done
by placing delay elements on the feedback lines. Also, the same restriction dictates that only one next state bit can
change at a time. Encodings can be made that allow a single transition of state bits for all state transitions, but
require multiple state encodings for each state ([5] pp. 76-79), complicating the combinational logic. One-hot
encodings, encodings where each state q_i has a single associated state bit y_i true and all other bits false, require
two transitions, but simplify the associated logic. State transitioning from q_i to q_j is accomplished by first setting
y_j, and then resetting y_i. The final requirement is that the next external input transition cannot occur until the
entire system settles to a stable state (this final restriction is what characterizes a fundamental-mode circuit). For a
one-hot encoding, this means that a new input must be delayed long enough for three trips through the combinational
logic and two trips through the delay elements.
Figure 3. Huffman sequential circuit structure ([6] pg. 157): combinational logic with inputs x_1...x_n and
present-state bits y_1...y_k, producing outputs z_1...z_m and next-state bits y_1'...y_k', which feed back to the
present-state inputs through delay elements.
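The two-step y_j-then-y_i handoff is easy to see in a toy model. In the sketch below (ours; the next-state
equations are hypothetical), a one-hot machine moves from state 1 to state 2 by passing through a transient
encoding in which both bits are high, which is why the state needs two trips through the logic and delay
elements to change, and a third to be confirmed stable.

    def next_state(y1, y2, go):
        # Hypothetical one-hot next-state logic for a single transition:
        # set y2 while y1 is still high, then reset y1 once y2 is seen high.
        new_y1 = y1 and not y2
        new_y2 = y2 or (y1 and go)
        return new_y1, new_y2

    y1, y2 = True, False                   # one-hot: in state 1
    for step in range(3):
        y1, y2 = next_state(y1, y2, go=True)
        print(step, (y1, y2))
    # step 0: (True, True)   transient, both bits high
    # step 1: (False, True)  state 2 reached
    # step 2: (False, True)  a third trip confirms stability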
2.2 Extending Huffman Circuits to Non-Fundamental Mode
The fundamental-mode assumption, while making logic design easy, greatly increases cycle time. Therefore
there could be considerable gains from removing this restriction. One method is quite simple, and can be seen by
referring back to the original argument for the fundamental mode. The issue was that when multiple inputs change,
and no single cube covers the starting and ending point of a transition, there is the possibility of a hazard. However,
if a single cube covers an entire transition, then there is no need for the fundamental mode restriction, since that cube
will ensure the output stays a 1 at all times. So, for the function A+F(B, C, D), when A is true, inputs B, C, and
D can change at will. However in general input A cannot change in parallel with inputs B, C, and D, because when
A goes from true to false, F(B, C, D) may be going from false to true, potentially causing a hazard. Therefore, this
observation cannot completely eliminate the fundamental mode assumption.
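The single-cube observation comes down to the final OR gate having a controlling input. A minimal sketch (ours),
treating the output of F(B, C, D) as an arbitrary glitchy waveform, shows that a settled A = 1 masks anything the
rest of the logic does, while A = 0 exposes it:

    def or_trace(a, g_waveform):
        # Final OR gate of A + F(B, C, D); g_waveform stands in for the
        # (possibly hazardous) output of the F(B, C, D) sub-circuit.
        return [a | g for g in g_waveform]

    glitchy = [0, 1, 0, 0, 1, 1, 0]        # any internal hazard on F(B, C, D)
    print(or_trace(1, glitchy))            # A = 1: all 1s, no glitch reaches F
    print(or_trace(0, glitchy))            # A = 0: F follows the glitches

This is why B, C, and D may change freely while A is high, but A itself cannot safely change in parallel with them.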
Another method, described by Hollaar [7], uses detailed knowledge of the implementation strategy to allow new
transitions to arrive earlier than the fundamental-mode assumption allows. As shown in Figure 4, Hollaar builds a
one-hot encoded asynchronous state machine with a set-reset flip-flop for each state bit (for example, NAND gates 5
& 6 form a set-reset flip-flop for state K). The set input is driven when the previous state’s bit and the transition
function are true (i.e. for K, when we are in state J and transition function S is true), and the flip-flop is reset when
the following state’s bit is true (hence the connection from gate 9 to gate 6). This basic scheme is expanded beyond simple straight-line
state-machines, and allows parallel execution (i.e. FORK and JOIN) in asynchronous state machines.
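The following behavioral sketch (ours, not Hollaar's NAND netlist) captures the scheme of Figure 4 for a
three-state ring: each one-hot bit sits in a set-reset flip-flop, set by the previous state's bit ANDed with its
transition condition, and reset by the following state's bit.

    states = ["J", "K", "L"]

    def step(bits, cond):
        # All flip-flops update together from the previous snapshot.
        new = {}
        for i, s in enumerate(states):
            prev = states[i - 1]                  # ring: L precedes J
            nxt = states[(i + 1) % len(states)]
            set_in = bits[prev] and cond[prev]    # previous state + its condition
            reset_in = bits[nxt]                  # following state resets us
            new[s] = set_in or (bits[s] and not reset_in)
        return new

    bits = {"J": True, "K": False, "L": False}    # one-hot: in state J
    cond = {"J": True, "K": False, "L": False}    # only the J -> K condition holds
    for t in range(3):
        bits = step(bits, cond)
        print(t, bits)
    # K is set while J is still high; J is then reset by K, leaving only K.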
