A Defect-Tolerant Computer Architecture:
Opportunities for Nanotechnology
James R. Heath, Philip J. Kuekes, Gregory S. Snider, R. Stanley Williams
Teramac is a massively parallel experimental computer built at Hewlett-Packard Lab-
oratories to investigate a wide range of different computational architectures. This
machine contains about 220,000 hardware defects, any one of which could prove fatal
to a conventional computer, and yet it operated 100 times faster than a high-end
single-processor workstation for some of its configurations. The defect-tolerant architecture of Teramac, which incorporates a high communication bandwidth that enables it
to easily route around defects, has significant implications for any future nanometer-
scale computational paradigm. It may be feasible to chemically synthesize individual
electronic components with less than a 100 percent yield, assemble them into systems
with appreciable uncertainty in their connectivity, and still create a powerful and reliable
data communications network. Future nanoscale computers may consist of extremely large configuration memories that are programmed for specific tasks by a tutor that
locates and tags the defects in the system.
The last 25 years have witnessed astonish-
ing advances in the fields of microelectron-
ics and computation. The first integrated
circuit microprocessor, the Intel 4004, was
able to perform roughly 5000 binary-coded
decimal additions per second with a total
power consumption of about 10 W (≈500 additions per joule) in 1971, whereas modern microprocessors can perform ≈3 × 10^6 additions per joule. The 1997 National
Technology Roadmap for Semiconductors
(1) calls for an additional factor of 10^3 increase in the computational efficiency by
the year 2012. If this goal is attained, then
performance of the silicon-based integrated
circuit will have improved by nearly seven
orders of magnitude in 40 years, using en-
ergy consumed per operation as a metric,
with a single manufacturing paradigm. Al-
though complementary metal oxide semi-
conductor (CMOS) technology is predicted
by many researchers to run into significant
physical limitations shortly after 2010 (2),
the energy cost of an addition operation
will still be nowhere near any fundamental
physical limit. A crude estimate of the en-
ergy required to add two 10-digit decimal
numbers, based on a thermodynamic analysis of nonreversible Boolean logic steps (3, 4), is ≈100 kT ln(2), which implies that 3 × 10^18 additions per joule can be performed at room temperature without any
reversible steps. Thus, there are potentially
eight orders of magnitude in computational
energy efficiency in a nonreversible ma-
chine available beyond the limits of CMOS
technology. To achieve these further ad-
vances will require a totally different type of
computational machinery, but knowing
that such a system is in principle possible
provides a strong incentive to hunt for it.
The requirement for inventing a new tech-
nology paradigm has created exciting re-
search opportunities for physical and bio-
logical scientists as well as for electrical
engineers. Indeed, much of the current in-
terest in interdisciplinary research in areas
such as nanofabrication, self-assembly, and
molecular electronics is being driven by this
search for a new archetype computer.
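The arithmetic behind the thermodynamic estimate above is easy to reproduce. A minimal sketch, using the standard value of Boltzmann's constant and the ≈100 kT ln(2) cost per addition quoted in the text:

```python
import math

k = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0          # room temperature, K

# energy per nonreversible 10-digit addition, per the ~100 kT ln(2) estimate
e_add = 100 * k * T * math.log(2)
adds_per_joule = 1.0 / e_add
print(f"{adds_per_joule:.2e}")  # on the order of 3e18 additions per joule
```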
A number of alternatives to standard
Si-based CMOS devices have been pro-
posed, including single-electron transistors
(5), quantum cellular automata (6, 7), neu-
ral networks (8, 9), and molecular logic
devices (10, 11). A common theme that
underlies many of these schemes is the push
to fabricate logic devices on the nanometer-
length scale. Such dimensions are more
commonly associated with molecules than
integrated circuits, and it is not surprising
that chemically assembled (or bottom-up)
configurations, rather than artificially
drawn (or top-down) structures created
with lithography, are expected to play an
increasingly important role in the fabrica-
tion of electronic devices and circuits. We
define chemical assembly as any manufac-
turing process whereby various electronic
components, such as wires, switches, and
memory elements, are chemically synthe-
sized (a process often called “self-assembly”)
and then chemically connected together
(by a process of “self-ordering”) to form a
working computer or other electronic cir-
cuit (12).
Several problems will arise when such an
assembly is used for some computational
task. Some fraction of the discrete devices
will not be operational because of the sta-
tistical yields of the chemical syntheses used
to make them, but it will not be feasible to
test them all to select out the bad ones. In
addition, the system will suffer an inevitable
and possibly large amount of uncertainty in
the connectivity of the devices. Given
these problems, how does one communicate
with the system from the outside world in a
reliable and predictable way and be assured
that it is performing error-free computa-
tions? Furthermore, because one goal of
nanoscale technology is to provide a huge
number (for example, a mole) of devices for
a system, how does one impose an organi-
zation that allows the entire ensemble to
operate efficiently? A self-ordering process
is only likely to produce fairly regular struc-
tures with low information content, but real
computers built today have great com-
plexity imposed by human designers. A
chemically assembled machine must be able
to reproduce the arbitrary complexity de-
manded for general-purpose computation.
In engineering, the answer to low but
nonzero failure rates is to design redundan-
cy into the system. The history of integrat-
ed-circuit technology has been that wiring
and interconnects have become increasing-
ly more expensive with respect to active
devices. Should nanotechnology give us ex-
traordinarily “cheap” but occasionally de-
fective devices, then nearly all expense will
be shifted to the wires and connections.
Recent research at Hewlett-Packard (HP)
Laboratories with an experimental comput-
er, code-named “Teramac”, has illuminated
several of these issues. Although Teramac
was constructed with conventional silicon
integrated-circuit technology, many of the
problems associated with this machine are
similar to the challenges that are faced
by scientists exploring nanoscale paradigms
for electronic computation. In order to keep
the construction costs as low as possible,
the builders of Teramac intentionally used
components that were cheap but defective,
and inexpensive but error-prone technolo-
gies were used to connect all the compo-
nents. Because of the physical architecture
chosen to implement powerful software algorithms (13), Teramac could be configured into a variety of extremely capable
parallel computers, even in the presence of
all the defects.

J. R. Heath is in the Department of Chemistry and Biochemistry, University of California at Los Angeles, Los Angeles, CA 90095–1569, USA. P. J. Kuekes, G. S. Snider, and R. S. Williams are at Hewlett-Packard Laboratories, Palo Alto, CA 94304–1392, USA.

SCIENCE · VOL. 280 · 12 JUNE 1998 · www.sciencemag.org · 1716

Thus, we define defect tolerance as the capability of a circuit to operate as desired without physically repairing
or removing random mistakes incorporated
into the system during the manufacturing
process (14). The major surprises of the
Teramac project were that the compiling
time for new logical configurations was lin-
ear with respect to the number of resources
used and the execution time for many algo-
rithms was surprisingly fast, given the large
number of defects in the machine. The
architecture of Teramac and its implemen-
tation of defect tolerance are relevant to
possible future chemically assembled cir-
cuits (15).
Custom Configurable
Architecture
The name “Teramac” takes “Tera” from 10^12 operations per second, which is achieved by 10^6 logic elements [or gates (4)] operating at 10^6 Hz, and “mac” from “multiple architecture computer.” It is a large
Custom Configurable Computer (CCC)
(16) that was designed for architectural ex-
ploration. The key property of reconfigu-
rable architectures, as opposed to conven-
tional processors, is that they can be con-
figured by means of a software instruction
set into a large variety of very different
digital systems. Teramac contains 864 iden-
tical chips designed at HP labs and built
specifically for Teramac. These chips, called
field programmable gate arrays (FPGAs),
contain a large number of very simple com-
puting elements and a flexible communica-
tions network for routing the signals among
the computing elements. Each computing
element performs a six-input, one-output
combinatorial logic function. The logic
function is not performed with active logic
elements, but rather with memory—that is,
the “answers” to the logical functions (the
truth tables) are stored in 64-bit Look-Up
Tables (LUTs). Each LUT holds the equiv-
alent of 10 logic gates, and there are a total
of 65,536 LUTs in the machine. Thus, a
total of 4 megabits (65,536 × 64 bits) of
configuration memory is used to define the
logic functions of all the computing ele-
ments. The system operates at a clock rate
of 1 MHz, so that each computing element
computes a new 1-bit output datum each
microsecond, with all the 65,536 LUTs op-
erating in parallel. The 4 megabits of con-
figuration memory are drawn from about
30% (256) of the FPGAs. The bulk of the
FPGAs are used only for communication
and signal routing. It was significantly less
expensive to design and manufacture a sin-
gle FPGA, and then ignore the LUTs
present on the chips that were used only for
communication, than it would have been to
produce two special-purpose chips.
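The LUT idea, logic stored as memory rather than computed by gates, can be sketched in a few lines. This is an illustration of the concept only, not Teramac's actual hardware format; the function names are mine:

```python
def make_lut(func):
    """Precompute the 64-bit truth table of a 6-input Boolean function."""
    table = 0
    for addr in range(64):
        bits = [(addr >> i) & 1 for i in range(6)]
        if func(*bits):
            table |= 1 << addr
    return table

def eval_lut(table, *bits):
    """Evaluate the function by looking up the answer at a 6-bit address."""
    addr = sum(b << i for i, b in enumerate(bits))
    return (table >> addr) & 1

# example: 6-input parity, stored once and then evaluated by pure lookup
parity = make_lut(lambda *b: sum(b) % 2)
print(eval_lut(parity, 1, 0, 1, 0, 0, 0))  # 0 (two bits set: even parity)
```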
In a typical microprocessor, a description
of what the chip should do is first devel-
oped, and then the hardware is constructed
on the basis of that logic. The general idea
behind a CCC is conceptually the opposite.
A generic set of wires, switches, and gates
are fabricated in the factory, and then the
resources are configured in the field by set-
ting switches linking them together to ob-
tain the desired functionality. For Teramac,
these components are in the FPGAs, and
they are the building blocks from which
almost any digital machine can be con-
structed. The architecture of a computer is
determined by the logical graph of wires
connecting the gates. In FPGAs, software
(field)–addressable switches determine the
wiring relationships among the compo-
nents. An FPGA can be logically thought
of as consisting of two planes. In one plane
are address lines that control a large con-
figuration memory determining what
switches in a crossbar are open and closed,
and thus what functions are carried out by
the LUTs. The other plane contains a sep-
arate set of application data lines and the
actual LUTs that are connected into the
desired configuration by the switches. A
drawing of a crossbar is shown in Fig. 1A.
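A crossbar of this kind reduces to a matrix of switch bits: the configuration plane sets them, and the data plane carries signals through whichever switches are closed. A minimal sketch (class and method names are illustrative):

```python
class Crossbar:
    def __init__(self, rows, cols):
        # configuration plane: one switch bit per junction
        self.sw = [[False] * cols for _ in range(rows)]

    def configure(self, row, col, closed=True):
        # writing a configuration bit opens or closes one switch
        self.sw[row][col] = closed

    def route(self, row):
        # data plane: a signal on `row` reaches every column whose switch is closed
        return [c for c, closed in enumerate(self.sw[row]) if closed]

xb = Crossbar(4, 4)
xb.configure(0, 2)
xb.configure(0, 3)
print(xb.route(0))  # [2, 3]
```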
The use of FPGAs allows one to load a
desired custom architecture onto Teramac
through an automated software routine. To
do this, Teramac uses an ex-
treme version of what is known as
the very long instruction word (VLIW) ar-
chitecture. This indefinitely (or insanely)
long instruction word (ILIW) is essentially
the translated logical description of the cus-
tom computer that is desired. In a typical
nonparallel machine, 32-bit instructions are
issued sequentially. Machines that use a
VLIW achieve instruction level parallelism
in a processor by having the compiler issue
single instructions with perhaps several
hundred bits. Teramac uses a 300-megabit
word, essentially as a single instruction that
sets every configuration bit in every FPGA
(most of which are in the crossbars to be
discussed below). Such an instruction is
only rarely downloaded, and the process is
relatively expensive. However, this single
instruction is powerful enough to reconfig-
ure the entire machine into the desired
custom computer.
Mapping a particular logical machine
onto the physical resources of a CCC could
easily be intractable, especially if there are a
very large number of switches and LUTs but
only a few viable configurations. This prob-
lem is similar to that of the traveling sales-
man who is forced to pick the shortest
possible route among a large number of
cities. The physical architecture of Teramac
was designed to ensure that a very large
Fig. 1. Graphical presentation of concepts related to the logical architecture of Teramac. (A) The crossbar represents the heart of the configurable wiring network that makes up Teramac. (Inset) A configurable bit (a memory element) that controls a switch, which required six transistors to physically implement. The bit is located and configured by applying a voltage across the address lines, and its status is read by means of the data lines (they are either shorted or open). The crossbar provides not only a means of mapping many configuration bits together into some desired sequence, but it also represents a highly redundant wiring network. Between any two configuration bits, there are a large number of pathways, which implies a high communication bandwidth within a given crossbar. Logically, this may be represented as a “fat tree.” Such a “fat tree” is shown in (B), where it is contrasted with a standard tree architecture. Both trees appear the same from the front view, but from an oblique view, the fat tree has a bandwidth that the standard tree does not. Color-coded dots and a dashed box are included to show the correspondence between a given level of the fat tree and the crossbar in (A). Several important issues are highlighted in this representation of the crossbar architecture. At every junction of the crossbar is a switch. Start at any point in the crossbar, and it is apparent that, by setting the appropriate switches, there are many possible pathways to any other junction. This degeneracy of pathways lends the crossbar architecture a high threshold for defect tolerance. It is also apparent from the drawing that 2n^(1/2) address lines are needed to address n switches, and that wires dominate all the drawings.

number of instruction words (switch set-
tings) provide satisfactory configurations for
any desired computer design. It may still be
essentially impossible to find the optimum
mapping, but as long as there are many
possible solutions, it should be relatively
easy to find a reasonable one (just as trav-
eling salesmen do when planning their
trips). Two concepts are important for un-
derstanding how Teramac was designed to
provide a large number of satisfactory phys-
ical realizations for any logical configura-
tion: the “fat tree” (17), which is a special
case of a Banyan network (18), and “Rent’s
rule” (19).
The power of the fat-tree architecture
can be appreciated by first considering the
regular family tree (Fig. 1B). A parent pro-
duces two children, each of which in turn
produces four more children, so that the
width of the tree expands in the direction of
younger generations. Each child is connect-
ed to one parent, so that the communica-
tion bandwidth of the tree remains con-
stant, irrespective of generation. All the
children of any parent are equally easy to
communicate with. This is the advantage of
a treelike architecture—at a given level,
devices may be arranged in any arbitrary
order because all arrangements are equally
efficient. However, same-branch children
cannot communicate directly with each
other but must pass messages through their
parent. Furthermore, if two same-branch
children want to speak to a grandparent,
then communication must flow through a
single node (the parent), and so the chil-
dren must communicate in series, rather
than in parallel. Even worse, if the line of
communication between a parent and
grandparent is broken, then communica-
tion to a whole branch of the family tree is
cut off. In a fat tree all of these problems are
avoided. Each single-parent node is re-
placed by several nodes, and communica-
tions between levels of the tree occur
through crossbars that connect multiple
nodes at each level (and can communicate
with other levels as well). Connectivity
between the various levels is determined by
the amount of bandwidth necessary for lo-
cal (same level) or long distance (level-to-
level) communication. The fat tree shown
in Fig. 1B has been constructed with a
higher bandwidth at the lowest level, and
less bandwidth at the next level up. Large
communication bandwidth is critical for
both parallel computation and for defect
tolerance. If one of the wires or nodes in the
fat tree were blocked or damaged, commu-
nication among the remaining elements
would only be slightly affected.
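A toy calculation (my own construction, not from the article) makes the contrast concrete: counting switch-disjoint routes between two leaves in a tree whose internal levels are `width` switches wide, where an ordinary tree has width 1:

```python
def routes(levels_up, width):
    # climbing to the common ancestor, each level offers `width` independent
    # switches the message could pass through
    return width ** levels_up

print(routes(2, 1))  # plain tree: 1 route, so one broken node cuts off a branch
print(routes(2, 4))  # fat tree, 4 switches per level: 16 alternative routes
```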
Rent’s rule is an empirically derived
guideline that may be used to determine the
minimum communication bandwidth that
should be included in a fat-tree architecture
(20). Rent’s rule states that for the realistic
circuits people actually build (not random
graphs), the number of wires coming out of
a particular region of the circuit should
scale as a power of the number of devices (n) in that region, ranging from n^(1/2) to n^(2/3). These are the exponents that one would intuitively expect if designers were constrained to build in two-dimensional (n^(1/2)) or three-dimensional (n^(2/3)) space and be as
efficient as possible. For the crossbars of
Teramac, exponents ranging between 2/3
and 1 were used, and thus significantly
more bandwidth than required by Rent’s
rule was incorporated into the fat tree.
This bandwidth is much higher than is
normally used for a standard architecture
computer or even a CCC, but it provides a
great deal of extra communication capacity
and redundancy in the system. This extra
capacity is critical for the operation of a
defect-tolerant machine, and will be revis-
ited below. Given the framework of the fat-
tree architecture and the communication
guidelines suggested by Rent’s rule, how are
the available resources assembled to create
a computer? At the top of the Teramac fat
tree are the FPGAs that communicate glo-
bally, and at the bottom are the FPGAs that
are used as logic elements. Everything in-
between is determined by these two ex-
trema. To see this, it is instructive to work
up through the fat tree of Teramac, begin-
ning at the LUT level (Fig. 2).
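With the device and wire counts given in Fig. 2, the Rent's rule guideline is easy to evaluate. A sketch (the helper name `rent_wires` is my own):

```python
def rent_wires(n, p):
    """Rent's rule guideline: external wire count scales as n**p for n devices."""
    return n ** p

# numbers from the article's Fig. 2: each logic chip contains 1792 effective
# devices and provides 336 external wires
guideline = rent_wires(1792, 2 / 3)
print(guideline < 336)  # True: the chip exceeds the 2/3-exponent guideline
```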
As mentioned above, logic operations
are performed with the LUTs, which are
essentially memories with six-bit addresses.
Any Boolean function with six input vari-
ables can be stored in such a truth table.
The function is evaluated simply by looking
up the answer. In principle, any of the
LUTs may be wired to any other LUT to
execute arbitrarily complex instructions.
For a given configuration of Teramac,
LUTs may comprise less than 10% of the
total silicon area used (21). The rest of
Teramac’s active resources are devoted to
communication. The detailed map of Tera-
mac’s fat-tree hierarchy in Fig. 2 shows that
most of the resources are devoted to com-
munication among the various LUTs, be-
tween adjacent levels of the computational
hierarchy, and between the computer and
Fig. 2. The logical map of Teramac. An example of a six-input logic element is shown (bottom). In Teramac, logic is performed with memory, rather than with a central processing unit (CPU), and the results of various logic operations are stored in a look-up table (LUT). Sixteen of these LUTs are connected to each other through a crossbar (X-bar) to make up a hextant. The number of wires leaving the LUTs is equal to the number leaving the hextant, and this represents a Rent's rule exponent of 1. Sixteen hextants communicate through four crossbars to make a logic chip (LC), with a Rent's rule exponent of ≈2/3. This exponent can be calculated as follows: Each LC contains 256 (= 16 × 16) LUTs with 7 wires each, for a total of 7 × 256 = 1792 effective devices. The number of wires leaving the LC is 336, which is larger than 1792^(2/3). Each multichip module (MCM) contains eight LCs that communicate with each other and with other MCMs through 12 FPGAs used as routing chips (RCs). Because an MCM contains eight LCs' worth of wires, and four RCs' worth of wires leave each MCM, this also represents a Rent's rule exponent of 2/3 (4 = 8^(2/3)). The next level is the printed circuit board (PCB), which consists of 4 MCMs communicating through 28 crossbars (which are physically contained in the 7 FPGAs per MCM that have not yet been used). The PCB level is also characterized by a Rent's rule exponent of 2/3. Finally, all eight PCBs are wired together with ribbon cable connections to make Teramac. A large number of remaining crossbars on the MCMs are used for communication (I/O) between Teramac and the outside world. To configure Teramac, a very large (300 megabit) instruction word is downloaded through the I/O connections onto this logical graph. The instruction word sets the various configuration bits and LUTs so that Teramac becomes a custom computer.

the outside world. As an illustration of the
concept, Fig. 3 shows how a simple calcu-
lation can be performed with registers and
LUTs connected by a fat-tree network.
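The half adder of Fig. 3 can be written out directly: the Carry is the AND of the two input bits and the Sum is their XOR.

```python
def half_adder(p, q):
    # Sum = P XOR Q, Carry = P AND Q: the X and A gates of Fig. 3
    return p ^ q, p & q

for p in (0, 1):
    for q in (0, 1):
        s, c = half_adder(p, q)
        print(f"P={p} Q={q} -> Sum={s} Carry={c}")
```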
Teramac has been successfully config-
ured into a number of parallel architectures
and used for extremely demanding compu-
tations. In one configuration, Teramac was
specifically designed to translate magnetic
resonance imaging data into a three-dimen-
sional map of arteries in the human brain.
In another use, it was configured as a vol-
ume visualization engine referred to as the
Cube 4 architecture (22). In one particular-
ly efficient version of Teramac (23), it was
actually operating at 10^12 gate operations
per second. The first configuration that was
loaded onto Teramac was that of a machine
that could test itself. It is this configuration
that located and cataloged the defective
hardware that was part of Teramac.
Defect Tolerance
Teramac was so complex and difficult to
build that it was not economically feasible
to construct it perfectly. A conscious deci-
sion was made to build the computer cheap-
ly by using defective components and as-
sembly techniques and then to compensate
afterward (by programming in the field) for
the mistakes. Most previous defect-toler-
ance work in theory (24) and practice (25)
has been concerned with chip or wafer scale
integration, but for Teramac, the entire ma-
chine was designed to be defect tolerant. It
is thus the largest defect-tolerant computer
ever built, and it strained the capabilities of
available commercial technology. Each
multichip module (MCM) had 33 layers of
wiring to interconnect a total of 27 chips, 8
used for their LUTs and 19 for only their
crossbars. Each printed circuit board (PCB)
had 12 layers of interconnects for four
MCMs. The interconnects for the eight
PCBs that comprised Teramac were inex-
pensive and defect-prone ribbon cables
with insulation-displacement connectors.
The huge communication bandwidth incor-
porated to make the compiler work effi-
ciently also forced the limits of the chip,
MCM, PCB, and cable levels of intercon-
nect, each of which contained about 10 km
of wire. However, Teramac was built rela-
tively cheaply because the fat-tree architec-
ture is also intrinsically tolerant of manu-
facturing and assembly defects. There are
very many reasonable ways to configure
Teramac because of the multiple equally
good choices for the compiler to route be-
tween any two LUTs. Adding defect toler-
ance to the system essentially involved
avoiding those configurations that con-
tained unreliable resources.
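The core of that idea is simple to sketch: the compiler draws from the resource pool but skips anything listed in the defect database. Names and the API below are illustrative, not Teramac's actual compiler:

```python
def allocate(resources, defect_db, needed):
    """Return `needed` healthy resources, skipping the defect database entries."""
    healthy = [r for r in resources if r not in defect_db]
    if len(healthy) < needed:
        raise RuntimeError("not enough healthy resources to map the design")
    return healthy[:needed]

pool = [f"lut{i}" for i in range(10)]
bad = {"lut2", "lut7"}          # entries from a (hypothetical) defect database
print(allocate(pool, bad, 4))   # ['lut0', 'lut1', 'lut3', 'lut4']
```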
The use of defect tolerance in Teramac
saved substantial cost for the system. Only
217 of the FPGAs used in Teramac were
free of defects; the rest (75% of the total
used) were free of charge, because the com-
mercial foundry that made them would nor-
mally have discarded them. Half of the
MCMs failed the manufacturer’s tests, so
they were also free. This represents a sub-
stantial cost saving compared to building
Teramac from perfect components. The ini-
tial high cost in the redundant wiring used
in the FPGAs was more than recovered by
the fact that most of the FPGAs were free
but still usable because of the high level of
defect tolerance designed into the total sys-
tem. The tests determined that 10% of the
logic cells in the FPGAs used as processors
were defective, and that 10% of the inter-
chip signals were unreliable. Out of a total
of 7,670,000 resources in Teramac, 3% were
defective (26). The increase in functional-
ity that was realized with inexpensive (or
free) components was significantly greater
than the cost of designing and building the
defect avoidance capability. Furthermore, if
Teramac is physically damaged (a chip is
removed, or a set of wires cut, for example),
it can be reconfigured and resume operation
with only a minor loss in computational
capacity (roughly proportional to the frac-
tion of damaged parts).
For most computers, a defective chip or
connection must be physically repaired or
replaced for the system to be operational.
For Teramac, all “repair” work was done
with software. A program was written to
locate the mistakes and create a defect da-
tabase for the compiler. Teramac was con-
nected to an independent workstation that
performed the initial testing, but in princi-
ple a CCC can be configured
into a machine that tests itself.
The testing process can be separated into
running configurations that measure the
state of the CCC, and a set of algorithms
that are run on these measurements to de-
termine the defect. LUTs were connected
in a wide variety of configurations to deter-
mine if a resource (switch, wire, or LUT)
was reliable or not. If any group failed, then
other configurations that used the resources
in question in combination with other de-
vices were checked. Those resources found
in the intersection of the unreliable config-
urations were declared bad and logged in a
defect database. The actual testing was per-
formed by downloading designs onto Tera-
mac called “signature generators.” These are
sets of LUTs that generate long pseudo-
random number strings that are sent around
Teramac by a large number of different
physical paths. If the bit stream was both
correctly generated and transmitted by the
network, all the resources used are probably
(but not always) good. The bit sequences
were designed to diverge exponentially in
time after an error in computation, and so
this is an especially sensitive detector for
bad resources. This procedure is designed to
find the physical defects, such as opens,
shorts, and stuck-at-1 or stuck-at-0 faults, which is much
easier than finding a logic design error.
There is an obvious problem in having a
device test itself when one does not know
whether anything is working. How do you
trust the testers? In practice, only a small
subset of resources have to be perfect. For
the FPGAs about 7% of the chip area, for
the MCMs about 4% of the wires, and for
the PCBs about 7% of the wires could not
Fig. 3. This figure demonstrates how a particular
implementation of a custom configurable comput-
er is downloaded onto a given set of resources,
and how the crossbar architecture, with sufficient
bandwidth, allows for defect-tolerant computa-
tion. The blue boxes at the bottom are logic ele-
ments or memory (or both). The role of this system
is to add two bits, P and Q, together to produce a
Sum (S) and a Carry (C). When P and Q are the
inputs to an And gate (A), then the output is the
Carry. When they are inputs into an Xor gate, then
the output is the Sum. Thus, both P and Q must
be connected to both A and X, and the output of A
and X must be connected to the memory loca-
tions for S and C, respectively. The red circles are
crossbars, and there are two levels to this fat tree.
This particular logical implementation illustrates
how various components with widely varying
numbers of defects can still be used to construct a
working system. From the bottom left crossbar,
and proceeding clockwise, we use 20, 70, 0, and
80% of the available resources. Similar arguments can be made for the other components. To under-
stand this system more completely, it is advisable to reassign the look-up tables differently, define some
of the crossbar switches to be defective and thus removed from the available resources, and then
reconnect the system to enable the adder. Such an exercise is very similar to what the compiler does
when it downloads the logical architecture onto the available resources.

have defects in Teramac. These are the
wires that are used for clocks and to get data
out of the system for observability. Further-
more, some small percentage of resources
must be working to guarantee that the de-
fect-finding algorithms would work if the
system was to test itself. Those resources
that were part of this privileged set were
deliberately designed with explicit addi-
tional redundancy to ensure that they had a
high probability of survival.
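The "signature generator" streams described earlier can be sketched with a linear-feedback shift register, a common way to build pseudorandom test sequences. The tap set below is a standard 16-bit maximal-length polynomial, not necessarily what Teramac used; the point is that a single upstream bit error sends the stream down a different trajectory:

```python
def lfsr_stream(state, steps, taps=(16, 14, 13, 11)):
    """Fibonacci LFSR over a 16-bit state; returns the output bit stream."""
    out = []
    for _ in range(steps):
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1   # XOR of the tapped bits
        state = ((state << 1) | bit) & 0xFFFF
        out.append(state & 1)
    return out

good = lfsr_stream(0xACE1, 100)
bad = lfsr_stream(0xACE1 ^ 0x0010, 100)  # inject a single-bit fault upstream
print(good != bad)  # True: the faulty stream diverges from the good one
```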
Once the defect database had been established, computer architectures could be
loaded onto Teramac. The presence of de-
fects makes this task more difficult than for
a perfect system, but because of all the extra
bandwidth that resulted from using inter-
connects that exceed Rent’s rule exponents
in the fat tree, it turned out to be surpris-
ingly easy to do. In any given configuration
of Teramac, only 70 to 90% of the healthy
resources are actually used. However, such
inefficiency is a relatively inexpensive cost
associated with the defect tolerance. Scal-
ing properties are very important for any
architecture that aspires to eventually have
moles of components. The compiler algo-
rithms are dominated by the partitioning
time, which scales linearly with the number
of gates in the design (27). Experiments
with various-sized partitions of Teramac
showed that the time required to find the
defects also scaled linearly with the total
number of wires and switches in the fat tree.
This empirical result is extremely impor-
tant, for if the scaling had been superlinear,
the extension of this architecture to ex-
traordinary numbers of components would
not be so promising. The explicit effects of defects on the scaling properties are still issues of active research, but there does not
appear to be any significant scaling penalty.
Lessons for Nanotechnology
The ability of Teramac to operate reliably in the presence of large numbers of defects shows that a CCC architecture is applicable to, and may be essential for, computational nanotechnology. As perfect devices become more expensive to fabricate, defect tolerance becomes a more valuable method to deal with the imperfections. Any computer with nanoscale components will contain a significant number of defects, as well as massive numbers of wires and switches for communication purposes. It therefore makes sense to consider architectural issues and defect tolerance early in the development of a new paradigm. The Teramac design and assembly philosophy differs significantly from the usual ideas of building complex computer systems, and thus there are several important lessons for nanotechnology.
The first lesson is that it is possible to build a very powerful computer that contains defective components and wiring, as long as there is sufficient communication bandwidth in the system to find and use the healthy resources. The machine is built cheaply but imperfectly, a map of the defective resources is prepared, and then the computer is configured with only the healthy resources. At present, such an approach is not economically competitive with CMOS technology, which requires perfection in all the components of a computer, because so many of the resources in a CCC are not used (for example, most of the LUTs in Teramac). However, the cost of the fabrication plants for integrated circuits (Fabs) is escalating exponentially with time as chips continue to shrink in size, an observation that is sometimes called Moore's second law (2). By the year 2012, a single Fab could cost $30 billion (1) or more, which may simply be too expensive and risky to build. At the same time, the sophistication of inexpensive chemically synthesized components is increasing dramatically. There may eventually be a crossover from one manufacturing paradigm to another, and the defect tolerance possibilities raised by Teramac could be the key enabling economic issue that ushers in the era of chemically assembled electronic computers.
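The build-cheaply, map, then configure flow can be sketched in a few lines. This is a deliberately minimal illustration with hypothetical names: a real compiler must also satisfy routing and timing constraints, which this toy placement ignores.

```python
def configure(logical_gates, physical_luts, defect_map):
    """Place each logical gate on a healthy physical LUT.

    Returns a gate -> LUT mapping, or None when the defect map has
    removed too many resources for the design to fit.
    """
    healthy = [lut for lut in physical_luts if lut not in defect_map]
    if len(healthy) < len(logical_gates):
        return None  # not enough healthy resources remain
    return dict(zip(logical_gates, healthy))


# A small example: 1000 physical LUTs, 4 known-bad ones from the
# defect database, and a 900-gate design to place.
defect_map = {3, 17, 256, 911}
placement = configure([f"g{i}" for i in range(900)], range(1000), defect_map)
print(placement is not None and defect_map.isdisjoint(placement.values()))  # True
```

The design never touches a defective LUT, yet nothing about the hardware itself was repaired; the tolerance lives entirely in the configuration step.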
A second and related lesson from Teramac is that the resources in a computer do not have to be regular, but rather they must have a sufficiently high degree of connectivity. The wiring mistakes in the MCMs introduced a significant element of randomness to the connectivity of the system, such that it was not possible to know what resources were connected together without performing a test. Thus, it is not essential to place a component at a specific position as long as the components can be located logically. A crude analogy here is the comparison between the American and the Japanese post offices. If residences are laid out in a Cartesian coordinate system, then it does not take much complexity in the mail-delivery system to find an address. In Japan, however, there are no regular street addresses. Nevertheless, the knowledge of many local postmen is sufficient to deliver a letter. A system at the nanoscale that has some random character can still be functional if there is enough local intelligence to locate resources, either through the laws of physics or through the ability to reach down through random but fixed local connections.
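One way to picture logical location over random but fixed wiring is a breadth-first search that hops across whatever connections happen to exist, much as a chain of local postmen forwards a letter. The graph construction and the target predicate below are illustrative assumptions, not a model of Teramac's actual interconnect.

```python
import random
from collections import deque


def random_fixed_wiring(num_nodes, links_per_node, seed=1):
    """Fixed but initially unknown point-to-point links, standing in
    for wiring whose connectivity must be discovered by testing."""
    rng = random.Random(seed)
    graph = {n: set() for n in range(num_nodes)}
    for n in range(num_nodes):
        for _ in range(links_per_node):
            m = rng.randrange(num_nodes)
            if m != n:
                graph[n].add(m)
                graph[m].add(n)
    return graph


def locate(graph, start, is_target):
    """Breadth-first 'postman' search: hop across local connections
    until a node satisfying the predicate is found."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if is_target(node):
            return node
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return None  # target unreachable from start


g = random_fixed_wiring(10_000, links_per_node=4)
found = locate(g, start=0, is_target=lambda n: n % 1000 == 7)
```

No node knows its own coordinates; a resource is found purely by following the links that exist, which is all the logical addressing requires.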
The third lesson addresses the issue of what are the most essential components for an electronic nanotechnology. In Teramac, wires are by far the most plentiful resource, and the most important are the address lines that control the settings of the configuration switches and the data lines that link the LUTs to perform the calculations. In a nanotechnology paradigm, these wires may be physical or logical, but they will be essential for the enormous amount of communication bandwidth that will be required. Next, in terms of the number of elements, are the crossbar switches and the configuration bits that control them. This may well be the most important active device that will be needed for computational nanotechnology. One possible physical implementation of a crossbar switch is illustrated in Fig. 4, although this example should not be viewed as restrictive. The replacement of the six transistors required by an FPGA for a single configurable bit by one quantum dot that may require only a single electron to change its state would represent an enormous energy saving for a bit operation, and a tremendous advance toward the thermodynamic limit for a nonreversible machine. The LUTs make up less than 3% of the fat-tree utilizable resources of Teramac. As such an architecture is scaled to significantly larger sizes, that percentage will decrease because there will be more levels added to the fat tree.
Fig. 4. An idealized version of the chemically fabricated configurable bit (right), compared with the logical description of a configurable bit redrawn from Fig. 1. The components labeled a and d are the address lines and data lines, respectively. The orange component (b) is the switch. The address lines are used to locate and "set" the bit. Once the bit is set, the connection between the two data lines is shorted, and thus the status of the bit may be read. A chemically fabricated switch could consist of a single semiconductor quantum dot in capacitive contact with the two address wires. The dot is also in tunnelling or ohmic contact with two data wires. Ligands that connect the dot to the four wires are varied to control the nature of the contact. Operationally, this switch is a dual-gated single-electron transistor (5). When the two address lines are biased "on," the quantum dot is shifted out of the Coulomb blockade voltage region, and the data lines are effectively shorted.
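The addressing protocol in the caption can be mimicked logically with a toy model. This sketch captures only the behavior of the configurable bit (both address lines on sets it; a set bit shorts its data lines), not the single-electron physics of the proposed quantum-dot switch; all class and method names are invented for illustration.

```python
class ConfigurableBit:
    """Toy model of the configurable bit of Fig. 4."""

    def __init__(self):
        self.is_set = False

    def address(self, a1, a2):
        # Both address lines must be biased "on" to set the bit,
        # analogous to shifting the dot out of the Coulomb blockade.
        if a1 and a2:
            self.is_set = True

    def data_lines_shorted(self):
        # A set bit shorts the two data lines, so its state can be read.
        return self.is_set


class Crossbar:
    """A grid of configurable bits addressed by (row, column)."""

    def __init__(self, rows, cols):
        self.bits = [[ConfigurableBit() for _ in range(cols)]
                     for _ in range(rows)]

    def set_bit(self, r, c):
        self.bits[r][c].address(True, True)

    def connected(self, r, c):
        return self.bits[r][c].data_lines_shorted()


xbar = Crossbar(8, 8)
xbar.set_bit(2, 5)
print(xbar.connected(2, 5), xbar.connected(0, 0))  # True False
```

Because a single address line cannot set a bit, any one bit in the crossbar can be selected by the pair of lines that cross at it, which is what makes the structure addressable at all.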
References
- Computing with neural circuits: a model
- Neuromorphic electronic systems
- Logical devices implemented using quantum cellular automata
- Fat-trees: universal networks for hardware-efficient supercomputing
- An improved min-cut algorithm for partitioning VLSI networks