## **Superconducting Digital Electronics**

## HISAO HAYAKAWA, NOBUYUKI YOSHIKAWA, SHINICHI YOROZU, AND AKIRA FUJIMAKI

## Invited Paper

Single-flux quantum (SFQ) logic circuits, in which a flux quantum is used as an information carrier, have the potential to open the door to new digital systems operated at clock frequencies above 100 GHz with extremely low power dissipation. SFQ logic is a so-called pulse logic, which is completely different from the level logic used in semiconductor circuits such as CMOS, so circuit design technologies for SFQ logic circuits have to be newly developed.

Recently, much progress has been made in the basic technologies for designing SFQ circuits and operating them at high speeds. With advances in these design tools, large-scale circuits including more than several thousand junctions can be easily operated at clock frequencies of several tens of gigahertz or more. High-end routers and high-end computers are possible applications of SFQ logic circuits because of the high throughput and low power dissipation of SFQ logic.

In this paper, we describe recent advances in SFQ circuit design technologies and recent developments in switches for high-end routers and microprocessors for high-end computers, which are considered possible applications of SFQ logic.

**Keywords**—Digital applications, high-end computer, high-end router, single-flux quantum (SFQ) logics, superconductor.

### I. INTRODUCTION

Semiconductor devices such as CMOS devices have been advanced by shrinking their feature sizes. The most important problem in such miniaturized devices is heating. Moreover, miniaturization increases wiring resistance, which in turn increases delay times.

Superconducting digital devices that can be operated with high speed and low power consumption are expected to realize systems with extremely high performance by high-density packaging and superconducting interconnects or wirings.

Manuscript received October 24, 2003; revised May 6, 2004.

- H. Hayakawa is with the Department of Quantum Engineering, Nagoya University, Nagoya 464-8603, Japan (e-mail: hhayakaw@nuee.nagoya-u.ac.jp).
- N. Yoshikawa is with the Yokohama National University, Yokohama 240-0023, Japan (e-mail: yoshi@yoshilab.dnj.ynu.ac.jp).
- S. Yorozu is with the Superconductivity Research Laboratory, International Superconductivity Technology Center (ISTEC), Tsukuba 305-8501, Japan (e-mail: yorozu@istec.or.jp).
- A. Fujimaki is with the Department of Quantum Engineering, Nagoya University, Nagoya 464-8603, Japan (e-mail: fujimaki@nuee.nagoya-u.ac.jp).

Digital Object Identifier 10.1109/JPROC.2004.833658

Recently, the development of superconducting digital devices has been accelerated by introducing a new logic concept: that is, the logic system making use of the single-flux quantum (SFQ) as an information carrier. SFQ logic is essentially a pulse logic which differs from CMOS logic based on voltage levels. The time width of an SFQ pulse in circuits can be on the order of 1 ps, which means that the operating frequency, in principle, can be up to subterahertz, resulting in very high throughput capability of the circuits. The power consumption is extremely low in such high-frequency operation, three orders of magnitude less than that of CMOS logic.

Recently, high-end computers, high-end routers, and high-end servers have been developed mainly in the United States and Japan as applications of SFQ logic. These applications are making use of advantages of SFQ circuits such as the low power consumption and the high throughput operation.

### II. HISTORICAL BACKGROUND OF SUPERCONDUCTING DIGITAL ELECTRONICS

Superconducting digital electronics started from the proposal of a switching device called the cryotron by Buck in 1956 [1]. The cryotron utilized the transition from the superconducting state to the normal state in a superconducting thin film caused by the magnetic field due to the current in a control line placed near the thin film. The phenomenon is quite slow because the transition is caused by a thermal effect; hence the low switching speed in circuit operation.

In 1962, Josephson [2] discovered the superconducting tunneling effect and the Josephson effect was soon experimentally demonstrated [3]. Matisoo proposed switching devices utilizing Josephson tunnel junctions which exhibit hysteresis with the superconducting state and the highly resistive state in the subgap region [4]. This logic is called latching logic, where bias currents must be reduced to zero in order to return logic gates that have switched to the resistive (voltage) state to the superconducting state. Latching gates therefore have to be ac powered to be reset.

The switching delays were several tens of picoseconds, which was very attractive compared with Si devices, whose switching times could hardly be reduced below 1 ns at that time.

Taking advantage of Josephson junctions for logic devices, IBM started a Josephson computer project, in which Josephson digital technologies were much advanced by developing circuit design, memories and packaging technologies [5]. However, in 1983, IBM announced an end to the project, saying the superiority of Josephson technology to semiconductor technology would be lost by the rapid progress of Si devices [6]. They also pointed out the technological difficulty of operating large-scale integration (LSI) circuits made of Pb alloy, which they had developed.

In spite of IBM's announcement, Japan continued the development of Josephson digital devices by choosing newly developed Nb junctions instead of the Pb alloy junctions that IBM had used. Josephson integrated circuits (ICs) made with Nb junctions are much more reliable and uniform than those made with Pb alloy junctions, allowing us to operate logic and memory circuits at the LSI level.

In fact, microprocessors and memories at the LSI level have been successfully demonstrated using the Nb LSI technology [7]–[10]. Fig. 1 shows an 8-b digital signal processor [Fig. 1(a)] and a 4-kb memory [Fig. 1(b)] made by Fujitsu and NEC, respectively.

One disadvantage of latching logic is an increase in failures in logic operations due to punch-through when the frequency of the ac power supply is increased. Punch-through is the failure to reset the resistive state to the superconductive state. In latching logic circuits, it is said that the frequency of the ac power, hence, the frequency of the clock, cannot exceed several gigahertz, due to the increase of the punch-through probability [11].

To overcome the limitation on latching logic, a new concept of logic operations, in which an SFQ is used as an information carrier, has been proposed [12]. A logic system utilizing SFQ pulses propagating in circuits consisting of overdamped Josephson junctions was made by researchers of Moscow State University and the Institute of Radioengineering and Electronics [13], [14]. They first called this logic system resistive SFQ logic, because they made use of small resistors for connecting gate to gate. This convention followed that of the SFQ binary ripple counters invented at Ford Research [15]. However, finally they replaced the resistors by superconducting wires and Josephson junctions, resulting in higher operating frequencies and wider operating margins; hence, they named the logic system rapid SFQ (RSFQ) [16]. The RSFQ logic circuits using moderate integration technologies with minimum line widths of 1 $\mu$m can be operated at higher frequencies, up to 100 GHz. Digital divide-by-two flip-flops have been demonstrated up to 750 GHz using submicrometer features [17]. Despite the low power and high speed, low bit error operation has been demonstrated [18]. In superconducting digital electronics, systems based on SFQ logic, the same concept as RSFQ, are now being developed on a worldwide scale.

### III. PRINCIPLES OF SFQ (RSFQ) LOGIC AND CIRCUITS

One information bit of an SFQ logic circuit is stored in a superconducting loop as a flux quantum. The loop includes more than one overdamped Josephson junction and forms a quantum interferometer. The number of stable quantum states of the interferometer is restricted to two by adjusting circuit parameters such as inductances of the loop and critical currents of the Josephson junctions. The two states differ by an SFQ  $\Phi_0=2.07\times 10^{-15}$  Wb, and the transition between the two is caused by entry and exit of an SFQ through the Josephson junctions. The use of overdamped junctions, which have external shunt resistors, guarantees stable operation of the SFQ circuits.

Josephson junctions usually keep the superconducting (zero-voltage) state in SFQ circuits. A voltage pulse is generated with return-to-zero (RTZ) shape only when an SFQ crosses a junction. Typical height and half width of the pulse are 0.5 mV and 2.5 ps for the critical current density  $J_c$  of 2.5 kA/cm². The RTZ nature leads to special features of the SFQ circuits. One is extremely low power consumption. If an SFQ IC made up of 1 million Josephson junctions is operated at 100 GHz, the circuit consumes only 4 mW. This low power feature enables dense packaging of SFQ ICs, resulting in low latency of the system.
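As a rough consistency check on the figures in this paragraph: the time integral of the voltage pulse produced when one flux quantum crosses a junction equals $\Phi_0$, so the product of pulse height and width should be of the order of $\Phi_0$. The sketch below idealizes the pulse as a rectangle, which is an assumption made only for illustration, and it also computes the average energy per junction per clock cycle implied by the 4-mW figure. The function names are hypothetical.

```python
PHI_0 = 2.07e-15  # flux quantum in Wb, as quoted in the text

def rectangular_pulse_area(height_v: float, width_s: float) -> float:
    """Area in Wb of a pulse idealized as a rectangle (illustrative assumption)."""
    return height_v * width_s

def energy_per_junction_cycle(power_w: float, n_junctions: float, clock_hz: float) -> float:
    """Average energy in J per junction per clock cycle implied by a total power."""
    return power_w / (n_junctions * clock_hz)

area = rectangular_pulse_area(0.5e-3, 2.5e-12)        # 0.5 mV and 2.5 ps from the text
energy = energy_per_junction_cycle(4e-3, 1e6, 100e9)  # 4 mW, 1 million junctions, 100 GHz
print(f"pulse area / Phi_0 = {area / PHI_0:.2f}")
print(f"energy per junction per cycle = {energy:.1e} J")
```

The rectangular estimate gives about 0.6 $\Phi_0$. The remaining factor depends on the real pulse shape and on how the half width is defined, neither of which is specified here.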

Lack of an *RC* recharge process in interconnections is another advantage of the SFQ circuits. The recharge process causes interconnection delay in semiconductor circuits. Recently, the delay of long interconnections has become larger than the gate delay, which restricts reduction of the clock period in semiconductor ICs. In contrast, the clock period in SFQ circuits can be reduced to the time determined by the gate delay, though this depends on the clock distribution technique. In fact, high throughput exceeding 40 GHz has been demonstrated based on concurrent-flow clocking, as described later.

Fig. 2 shows the equivalent circuit of the most basic gate of an SFQ circuit, reset–set flip-flop (RS–FF). The RS–FF is similar to the two-junction interferometer provided with bias currents. The difference is an additional input port for the reset SFQ signal RESET\_IN. The operation with destructive readout is as follows: A set SFQ signal comes in at the port SET\_IN. At that time, the sum of the bias current and the signal current exceeds the critical current of the junction  $J_1$ . The SFQ enters the loop  $J_1$ -L- $J_2$  and is stored because the total current, including a circulating current induced by the stored SFQ, does not reach the critical current of  $J_2$ . If the reset signal comes in, the total current can exceed the critical current  $I_c$  and the output SFQ signal is sent to the next gate.
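The destructive-readout behavior described above can be sketched at the pulse level as a two-state machine. This is only a behavioral abstraction, not a circuit simulation. The class and method names are hypothetical, and the handling of a SET pulse that arrives while a flux quantum is already stored, and of a RESET pulse that arrives when the loop is empty, are assumptions, since the text does not describe those cases.

```python
class RSFlipFlop:
    """Pulse-level behavioral model of an RS-FF with destructive readout."""

    def __init__(self) -> None:
        # True while one flux quantum is stored in the J1-L-J2 loop.
        self.stored = False

    def set_in(self) -> None:
        # A SET pulse switches J1 and stores one SFQ in the loop.
        # Assumption: a SET pulse arriving while the loop is full changes nothing.
        self.stored = True

    def reset_in(self) -> bool:
        # A RESET pulse releases the stored SFQ through J2 as an output pulse.
        # Assumption: a RESET pulse on an empty loop produces no output.
        emitted = self.stored
        self.stored = False  # readout is destructive: the loop is empty afterwards
        return emitted


ff = RSFlipFlop()
outputs = []
for event in ["reset", "set", "reset", "reset", "set", "set", "reset"]:
    if event == "set":
        ff.set_in()
    else:
        outputs.append(ff.reset_in())
print(outputs)
```

Each entry in `outputs` records whether a RESET pulse produced an output SFQ pulse. An output appears only when a SET pulse preceded the RESET.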

General logic gates such as AND and XOR have two inputs. In most SFQ logic gates, logic operation is executed first, and then the result is stored in the inherent superconducting loop. This stored result is read out and sent to the next gate through Josephson transmission lines by supplying a clock signal.

Passive transmission lines (PTLs) with microstrip line structures are used for long interconnections, where a voltage pulse can travel at the speed of light.





**Fig. 1.** Examples of circuits developed in the latching logic regime. (a) 8-b digital signal processor fabricated by Fujitsu, operated at a clock frequency of 1.6 GHz. (b) 4-kb RAM fabricated by NEC. The chip contains 21 000 junctions and the access time is 380 ps. These LSI chips were fabricated with the Nb integration process.

The binary code of the SFQ circuits is defined by the relationship between the data and clock signals [16]. If an SFQ is stored or a voltage pulse is detected between two adjacent clock pulses, the code is determined as "1." If not, the code is "0." Because of this coding, the minimum clock period is three to five times as large as the pulsewidth.
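The coding rule can be illustrated with a short sketch that assigns one bit to each interval between adjacent clock pulses. The function name and the pulse arrival times are hypothetical, and the treatment of a data pulse that coincides exactly with a clock pulse (assigned to the interval that starts at that clock pulse) is an assumption made here for definiteness.

```python
from bisect import bisect_right

def decode_bits(clock_times_ps, data_times_ps):
    """Return one bit per clock interval: 1 if a data pulse falls inside it, else 0."""
    bits = [0] * (len(clock_times_ps) - 1)
    for t in data_times_ps:
        # Index of the last clock pulse at or before the data pulse.
        i = bisect_right(clock_times_ps, t) - 1
        if 0 <= i < len(bits):
            bits[i] = 1
    return bits

# A 10-ps clock period (100 GHz) is four times the 2.5-ps pulsewidth,
# which lies within the three-to-five-times range stated in the text.
clock = [0.0, 10.0, 20.0, 30.0, 40.0]
data = [4.0, 23.5, 36.0]
print(decode_bits(clock, data))  # prints [1, 0, 1, 1]
```

No data pulse arrives between 10 ps and 20 ps, so the second interval decodes as "0."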



Fig. 2. Equivalent circuit of RS–FF. An SFQ pulse with the RTZ nature is generated when the total current flowing through  $J_2$  exceeds the critical current  $I_c$ . The pulse height is typically 0.5 mV, and the half width is 2.5 ps at the critical Josephson current density of 2.5 kA/cm<sup>2</sup>.

### IV. CIRCUIT DESIGN TECHNOLOGY

### A. Optimization of Circuit Parameters

There are local (on-chip) spreads and global (chip-to-chip) spreads in the circuit parameters of actually fabricated circuits. These spreads reduce the operating margins and make the operation unstable. In order to maintain the number of permitted quantum states and widen the operating margins, the circuit parameters should be optimized. Optimization tools based on analyzing the dynamics of the circuit have been developed [19]–[25].

The first step of the optimization is to find the operating region in a multidimensional space where the circuit parameters are taken as the variables. Then we search for a set of parameters giving the point that is farthest from the boundary of the operating region. Several optimization tools have been proposed so far to find such a set; they can be classified into three types: the critical margin method, the Monte Carlo method, and the method of inscribed hyperspheres.

Yoshikawa *et al.* compared the effectiveness of several optimizers for the circuits of an RS–FF, a toggled-FF (T-FF), and a demultiplexer [26]. Every optimizer gives global dc bias margins and yields that are sufficient for such small circuits to operate. For example, the global dc bias margin is enhanced to about $\pm 35\%$ from the initial value of $\pm 28\%$ for the T-FF. The yield is also enhanced to 80%–95% from 50% under the following assumptions: there are 50% local spreads and 50% global spreads in $I_c$, 20% spreads in inductance, and 20% spreads in resistance, where an x% spread means a uniform probability distribution from (100-x)% to (100+x)% of the designed value in the histogram.

In general, the dc bias margins and yields obtained with the critical margin method depend on the initial values, although the required calculation time is short. On the other hand, the Monte Carlo method and the method of inscribed hyperspheres take longer as the number of circuit elements increases.
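A minimal sketch of Monte Carlo yield estimation under uniform spreads, as defined above, is given below. It is not the procedure of any of the cited optimizers: the pass/fail rule `toy_circuit_works` is a made-up placeholder for a real circuit simulation, all quantities are normalized to a designed value of 1.0, and the function names and spread values in the usage lines are hypothetical.

```python
import random

def sample_spread(nominal, spread_pct, rng):
    """Draw a value uniformly from (100-x)% to (100+x)% of the designed value."""
    frac = spread_pct / 100.0
    return nominal * rng.uniform(1.0 - frac, 1.0 + frac)

def toy_circuit_works(ic_values, inductance):
    """Hypothetical pass/fail rule standing in for a real circuit simulation.

    Assumed rule: every normalized Ic*L product must stay within 0.7 to 1.3.
    """
    return all(0.7 <= ic * inductance <= 1.3 for ic in ic_values)

def monte_carlo_yield(n_trials, n_junctions, local_pct, global_pct, l_pct, seed=0):
    """Fraction of randomly perturbed chips that satisfy the toy pass/fail rule."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_trials):
        # One global (chip-to-chip) factor is shared by all junctions on a chip.
        global_factor = sample_spread(1.0, global_pct, rng)
        # Each junction also gets its own local (on-chip) factor.
        ics = [global_factor * sample_spread(1.0, local_pct, rng)
               for _ in range(n_junctions)]
        inductance = sample_spread(1.0, l_pct, rng)
        if toy_circuit_works(ics, inductance):
            passed += 1
    return passed / n_trials

ideal = monte_carlo_yield(200, 4, 0, 0, 0)       # no spreads at all
spread = monte_carlo_yield(2000, 4, 10, 10, 20)  # illustrative spread values
print(f"yield without spreads: {ideal:.2f}, with spreads: {spread:.2f}")
```

Every trial requires one pass/fail evaluation, which in a real tool is a circuit simulation. This is consistent with the remark that the calculation time of the Monte Carlo method grows with the number of circuit elements.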

Several requirements are added when optimizing circuits for the cell-based design described later. The most difficult requirement is the adjustment of timing parameters such as delay, hold time, and setup time. Arrival times of the pulses need to be adjusted on the order of a few picoseconds because of the coding of the SFQ circuit; thus, the operating margins are very sensitive to the spread in the timing parameters. In particular, the global spread leads to a large reduction of the operating margin. One way to avoid this reduction is to optimize the sets of circuit parameters so that the timing parameters of all cells have similar bias current dependences. The effectiveness of this optimization has been confirmed by the demonstration of a relatively large-scale SFQ circuit, as described later.

### B. Cell-Based Design

Digital circuit performance, especially in random-logic circuits, strongly depends on the design tool environment. Most work on design tools for SFQ circuits has focused on small-scale design technology, e.g., a time-domain analog simulator with operating margin optimization, and inductance extraction from physical layout [27]. However, such a design methodology cannot be adopted for the design of large-scale SFQ circuits, which are composed of thousands of Josephson junctions or more. For large-scale circuit design, a very popular semiconductor design methodology is the standard-cell-based approach, in which the circuit is built from a standard cell library. The University of Rochester group first studied a top-down circuit design methodology based on the commercially available Cadence DFII environment [28]. A Japanese team (called CONNECT) also established such a top-down design environment, also based on Cadence DFII [29]. Both methodologies are fundamentally the same, but the latter has more sophisticated design automation tools. For cell-based design, a standard cell library is a very important component. Various libraries have been developed [30], [31]. Unfortunately, the cell library strongly depends on the design tool environment and fabrication technology. Therefore, each cell library has a different structure and circuit parameter optimization concept. In this paper, we review the CONNECT team's cell-based design methodology using the CONNECT cell library, because this approach is standard.

Fig. 3. Design flow.

Fig. 4. Example of layout symbol view.

All CONNECT cells are designed and optimized to have as little static interaction with each other as possible when connected. All circuit parameter values were optimized to provide the widest margins, even if all the critical currents are changed by $\pm 10\%$. Currently there are about 230 cells in the library. Most cells are wiring cells based on Josephson transmission lines (JTLs), which make circuit design very flexible. Fig. 3 shows the design flow. The design flow is roughly separated into three phases: the design entry phase, the cell placement and routing (P&R) phase, and the physical layout phase. In the design entry phase, the behavior of each CONNECT cell is described in Verilog-HDL. The logical functions of a designed circuit schematic are verified by using the Verilog-XL simulator. In this phase, all cells have an ideal zero delay. After completing the logic function design, a designer proceeds to the cell P&R phase. In SFQ random logic circuit design, the signal timing adjustment between clock and data is a very important task because the logic cell delay is either similar to or shorter than the JTL interconnection delay. Therefore, the CONNECT library has many kinds of JTL cells that have different sizes and delay times. In the P&R phase, the circuit schematic is drawn by using the "layout symbol view." Fig. 4 shows an example of the layout symbol view. The layout symbol view reflects the shape and pin locations of the corresponding physical layout and timing information (setup time, hold time, and delay time, with their bias current dependence). Therefore, a designer can design a circuit while considering the final physical layout. The CONNECT environment can verify circuits not only dynamically with Verilog-XL, but also statically. The static timing analysis is done by the Cadence commercial tool BuildGates. The static timing verification does not need a test vector as is needed in the dynamic simulation. In the static timing analysis, the path-delay time can be calculated, so the timing sequence is easily checked at each cell. If the timing sequence is wrong, the interconnection delay time is adjusted by replacing a JTL cell.
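The idea of static timing verification followed by JTL replacement can be sketched as follows. This is not the BuildGates algorithm or the CONNECT tool flow. The setup and hold conventions, the JTL delay values in `JTL_LIBRARY_PS`, and all function names are assumptions chosen for illustration: the data pulse is required to arrive at least one setup time before the clock pulse, and the next data pulse at least one hold time after it.

```python
def arrival(path_delays_ps):
    """Path delay: sum of the cell delays along the path (no test vector needed)."""
    return sum(path_delays_ps)

def check_timing(data_path, clock_path, setup_ps, hold_ps, period_ps):
    """Return True if the assumed setup and hold conditions both hold at a cell."""
    t_data = arrival(data_path)
    t_clk = arrival(clock_path)
    setup_ok = t_clk - t_data >= setup_ps              # data arrives before the clock pulse
    hold_ok = (t_data + period_ps) - t_clk >= hold_ps  # next data pulse stays clear of the clock
    return setup_ok and hold_ok

JTL_LIBRARY_PS = [3.0, 5.0, 8.0, 12.0]  # hypothetical JTL cell delays

def fix_with_jtl(data_path, clock_path, setup_ps, hold_ps, period_ps):
    """Replace the last clock-path cell with each JTL variant; return the first path that passes."""
    for delay in JTL_LIBRARY_PS:
        candidate = clock_path[:-1] + [delay]
        if check_timing(data_path, candidate, setup_ps, hold_ps, period_ps):
            return candidate
    return None

data_path = [5.0, 7.0, 4.0]   # 16 ps in total
clock_path = [5.0, 5.0, 3.0]  # 13 ps in total, so the clock arrives too early
print(check_timing(data_path, clock_path, 3.0, 3.0, 25.0))
print(fix_with_jtl(data_path, clock_path, 3.0, 3.0, 25.0))
```

In this example the original clock path fails the setup check, and swapping the final 3-ps JTL for the 12-ps variant delays the clock enough to pass both checks.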

In the physical layout phase, the whole layout is done by simply replacing each layout symbol view with the corresponding physical layout data. The physical layout data, called the layout view, are drawn based on the NEC standard fabrication technology [10]. Fig. 5 shows an example of the layout view. The width and height of the cell layout are multiples of the standard unit length, 40 $\mu$m. Inductance extraction from the layout symbol view is done by using the inductance calculation tool "L-meter" [32]. The extracted inductance values are back-annotated to the cell analog circuit schematic, and the cell analog circuit is reoptimized. After this iteration is completed, all timing parameters are calculated and back-annotated to the digital behavior files.



Fig. 5. Example of layout view.

### C. Further Expansion of Design Technology, Automation Tool, and Passive Interconnection

As the circuit scale increases, manual circuit design becomes more difficult. Thus, design automation tools have been developed. At first, the design data conversion process in the physical layout phase was automated by implementing original software [29]. Next, an automatic cell P&R tool for the cell P&R phase was implemented with Cadence Japan Inc.; this is the world's first automatic P&R tool for SFQ LSI. Based on concurrent-flow clock distribution, it makes the lengths of the data delay path and clock delay path the same for each cell. The tool then adjusts all the timing constraints while considering the setup time and hold time of the logic cells. Fig. 6 shows an example of the result. By using this tool, we can expand the designable circuit scale more than ever before. Moreover, the design time is shortened by several orders of magnitude compared with full manual design. On the other hand, logic synthesis automation in the design entry phase has not been improved yet, although it is one of the key technologies toward making real LSI circuits. Because an SFQ circuit has a different configuration from a semiconductor circuit, we cannot obtain good synthesis results by using only commercial tools. A methodical study of logic synthesis suitable for SFQ circuits is required.

Interconnection is also important in order to increase circuit performance further. PTL interconnection technology can be used. A PTL is a line such as a microstrip or strip line, so it does not use Josephson junctions except for an SFQ pulse driver and receiver. Therefore, the power consumption and the signal delay time of PTLs are smaller than those of JTLs. Furthermore, the spread of the signal delay in JTLs is strongly related to the process parameter spreads. This spread is the cause of the timing jitter, which grows larger as the JTL gets longer. In contrast, the PTL does not have such problems. The PTL, thus, has an advantage in interconnections. When PTLs are used for interconnection between the cells, design strategies of clock distribution, cell placement, and delay adjustment need to be reconsidered. While there are only a few demonstrations of PTL-interconnected circuits, they are expected to become dominant in the future [33]–[35].

Fig. 6. Example of automatic P&R result.

### V. APPLICATIONS

A number of applications of superconducting digital technology based on SFQ logic have been proposed so far by taking advantage of its extremely high throughput and low power consumption. Digital signal processing for digital filtering and multiuser detectors [36], digital sensor readouts, switches for routers, and high-end computers are considered as possible applications. In this review, switches and high-end computers will be described in detail since both applications are now being intensively developed in Japan and the United States.

### A. Superconducting Digital System

Although SFQ circuits are capable of internal operation at tens of gigahertz, packaging technology limits their performance as a subsystem.



Fig. 7. Packaging concept of superconducting subsystem.

The choice of interconnection technology between the room-temperature and low-temperature stages depends on application requirements. A reasonable solution is a 40-Gb/s optical-in and 10-Gb/s electrical-out scheme. Fig. 7 shows an example schematic of a superconducting subsystem. The package has two separate temperature stages at 5 and 50 K. An SFQ chip or multichip module (MCM) is mounted at the 5 K stage, and semiconductor amplifiers are mounted at the 50 K stage. Other cryodevices such as cryo-CMOS can be mounted on both stages if needed. The optical input signal, delivered by optical fiber, drives a photodiode, which is a kind of metal-semiconductor-metal device fabricated on the SFQ chip. After processing by the SFQ circuit, the output signal is first amplified to several millivolts using a superconducting digital voltage driver, and then amplified again to several hundred millivolts using a semiconductor amplifier at the 50 K stage. For reliable operation, RSFQ circuits require a magnetic shield that keeps the magnetic field below $\sim 1$ mG. The cryocooler can be mounted in a standard system rack. Therefore, SFQ subsystems are almost seamless for system integration, even though they are operated at a different temperature.

### B. Switch for High-End Router

Fig. 8. Comparison of SFQ switch with semiconductor switch.

1) Packet Switch: Internet traffic loads are increasing at a rate faster than Moore's Law. Next-generation communication nodes will have link speeds of 40–160 Gb/s/port and total throughputs of over 10 Tb/s. While link capacities can easily be increased by bundling optical fibers, it is difficult to sustain the packet-switching throughput of a node. A typical high-end router consists of data input/output line cards and a switch card connected to all of them. Each line card has a network processor to analyze packet headers for forwarding. The switch card forwards packets from an incoming port to a destination port. In evaluating a router's performance, an important criterion is the packet-processing time, which is the bit length of a packet divided by the port speed. It is typically several tens of nanoseconds at 10 Gb/s. A router must process packets within the packet time length to achieve wire-speed operation. As the data rate increases, the packet time length decreases, so the processing speed must be increased commensurately. Three bottlenecks arise from this decrease in processing time. The first is header address processing for incoming packets. The second is packet buffering. Both processes are performed by a line card; a data-parallelism technique employing CMOS high-speed network processors and high-speed semiconductor memories solves these problems. The third problem is packet forwarding (switching) in the switch card. The switch card gathers all packets from all line cards. Therefore, as the throughput increases, the physical packaging density and power consumption become large. As a result, the throughput of the conventional data-parallelized semiconductor switch will be limited to several terabits per second. An alternative technology for packet switching is, thus, strongly required. Superconductor technology is one of the best candidates. Large-scale circuit integration in this frequency range is more feasible with superconducting technology than with semiconductor technologies because of its low power consumption. Fig. 8 shows a conceptual comparison of an SFQ switch with a semiconductor switch. Historically, from the same point of view, superconductor switch circuits using a voltage-level logic family were proposed [37], [38]. However, voltage-level logic is slower than SFQ logic, so SFQ circuit implementation is currently dominant.
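The packet time length defined above is simple to evaluate. The sketch below assumes a 64-byte packet purely for illustration (the text does not state a packet size), and the function name is hypothetical.

```python
def packet_time_ns(packet_bytes: int, port_speed_gbps: float) -> float:
    """Packet time length: bit length of the packet divided by the port speed."""
    # Bits divided by Gb/s gives nanoseconds.
    return packet_bytes * 8 / port_speed_gbps

# Assumed 64-byte packet at the port speeds mentioned in the text.
for speed in (10, 40, 160):
    print(f"{speed:>3} Gb/s: {packet_time_ns(64, speed):.1f} ns")
```

For this assumed packet size the result at 10 Gb/s is 51.2 ns, in line with the "several tens of nanoseconds" quoted above, and it shrinks to 12.8 ns at 40 Gb/s and 3.2 ns at 160 Gb/s.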

The packet switch architecture at which we are aiming should have two key characteristics: 1) it should be suitable for use in high-end routers which process high-speed packets and require large throughputs and 2) it should be suitable for SFQ circuit implementation. Two types of packet switch architectures meeting the above requirements have been proposed: the Batcher–Banyan type and the crossbar type.

The Banyan network is constructed from $2 \times 2$ self-routing switch elements with a single path between any input and output pair. The complexity of the paths and switching elements is on the order of $N \log N$, so this approach requires less hardware than the other switch topology. Furthermore, there are no global control circuits in a Banyan network, so the switch circuits can be expanded easily. The disadvantage of this approach is internal and external packet blocking, which typically causes a 50% packet-loss rate. Several proposals have been made to improve these throughput characteristics [39], [40]. One of them is illustrated in Fig. 9 [41]. It improves the throughput by: 1) adding a Batcher sorter; 2) compressing packets (in this case, by a factor of four); 3) shifting the packet input timing slot to a Banyan switch; and 4) performing grouping and shuffling exchange wiring between the Batcher sorter and the Banyan. With this architecture, the network load is effectively compressed, thus dramatically reducing the packet contention rate. The linearity of the port numbers is also good. Fig. 10 shows an example of a fabricated $2 \times 2$ self-routing-type packet switch [42].

Fig. 9. Internal speedup Batcher-Banyan switch architecture.

**Fig. 10.** Fabricated $2 \times 2$ self-routing switch.

**Fig. 11.** Fabricated $2 \times 2$ crossbar switch.

Fig. 12. A possible switch module.

The crossbar switch network has a simple topology with nonblocking features. Therefore, it is the most popular approach in commercial high-end systems. In this architecture, a switch scheduler controls the crossbar switch in order to avoid packet collisions. Therefore, while the complexity of an $N \times N$ switch grows as $N^2$, which is larger than in the case of Banyan switches, the crossbar switch element itself may be simpler than the self-routing switch circuit. There are already some implementations of crossbar switch circuits using SFQ circuit technology [43]–[46]. Fig. 11 shows an example of a fabricated circuit, which was designed using the standard cell library CONNECT [46]. The $2 \times 2$ switch circuit operation was demonstrated experimentally up to 35 GHz. The on-chip test technique was used for such a high-speed test. The shift register (Input SR) serving as the rate-transfer circuit from low speed to high speed was placed before the SFQ circuit under test, and another shift register used for slowdown (Output SR) was placed after the circuit. Input data were sent to the Input SR at low frequencies, and then processed using high-speed clocks provided by the built-in clock generator. The result was stored in the Output SR, and read out at low frequencies. The switch scheduler is the key component for increasing the crossbar switch network performance. Thus, it is also important to implement the switch scheduler circuit using SFQ circuit technology. If this is done, a large reduction in the packet processing time and an expansion of the data throughput can be expected.
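To make the hardware scaling of the two architectures concrete, the sketch below counts the $2 \times 2$ elements of an $N$-port Banyan network, using the standard count of $(N/2)\log_2 N$ for $N$ a power of two, and the crosspoints of an $N \times N$ crossbar. The function names are hypothetical. The two counts are in different units (self-routing elements versus crosspoints) and the Batcher sorter is not included, so the comparison indicates growth rates only.

```python
def banyan_elements(n_ports: int) -> int:
    """Number of 2x2 switch elements in an N-port Banyan network: (N/2) * log2(N)."""
    stages = n_ports.bit_length() - 1
    if n_ports < 2 or (1 << stages) != n_ports:
        raise ValueError("n_ports must be a power of two and at least 2")
    return (n_ports // 2) * stages

def crossbar_crosspoints(n_ports: int) -> int:
    """Number of crosspoints in an N x N crossbar switch: N squared."""
    return n_ports ** 2

for n in (2, 16, 32):
    print(n, banyan_elements(n), crossbar_crosspoints(n))
```

For 32 ports the Banyan count is 80 elements against 1024 crosspoints for the crossbar, which illustrates the $N \log N$ versus $N^2$ growth mentioned in the text.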

2) Switch Module: Fig. 12 shows a diagram of a possible $32 \times 32$ switch module. It consists of four $16 \times 16$ switches and scheduler chips, which can handle a packet speed of up to 40 Gb/s. The maximum throughput exceeds 1 Tb/s. For chip-to-chip communications, a superconducting interconnected multichip module is used. A 10-Gb/s switch module has already been demonstrated [47].
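The throughput figure can be checked with a one-line estimate, assuming that all 32 ports carry 40 Gb/s at the same time (an assumption made here; the text does not spell out how the figure is derived):

$$32 \times 40\ \text{Gb/s} = 1280\ \text{Gb/s} = 1.28\ \text{Tb/s}$$

Under that assumption the aggregate rate is indeed slightly above 1 Tb/s.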

As for the transfer rate between the room-temperature and low-temperature subsystems, 10 Gb/s per line is both currently possible and appropriate, while the SFQ circuit operates with a 40-GHz clock on the chip. Therefore, some kind of speed conversion circuit is very important. A special speed conversion circuit for serial block data such as packet data has been proposed [48]. Such circuits are used to convert between 10-Gb/s serial data and 40-Gb/s serial data. Using optical I/O drastically changes the hardware. However, the nature of the optical output interface technology is still unclear. Further research is, thus, required.

### C. High-End Computers

One of the most attractive and ambitious applications for superconducting digital electronics is a superconducting processor for high-end computing. In such a large-scale system, extremely low power consumption and high-speed operation become major advantages. High packing density resulting from the low power brings about a decrease of the propagation delay in the system. The relative cooling cost also decreases in a large-scale application.

Historically, the early development of superconducting processors was based mostly on voltage-state circuits and was carried out by many research groups [49], [50]. Some projects achieved substantial results, demonstrating successful implementation and operation of superconducting processors. Despite these results, all projects were ended. The main reason is that voltage-state circuits need ac power to operate, which limits the clock frequency to several gigahertz. Lack of high-speed and high-density memories is another impediment. The situation, however, has been dramatically changed by the invention of SFQ logic circuits [12], [13], [16]. SFQ circuits allow more than a tenfold speedup in clock frequency and consume much less power than voltage-state circuits. They offer the possibility of an ultrahigh-speed and low-power superconducting processor.

Recent rapid progress in digital electronics has been brought about by the exponential speedup and scaling up of semiconductor LSIs due to the decrease of their minimum feature sizes. However, it is also well recognized that semiconductor technology will face several problems within the decade [51]. They are physical limits in the integration level, increasing power consumption, increasing design complexity, large interconnection delay, and increasing fabrication costs. According to the recent edition of the International Technology Roadmap for Semiconductors (ITRS) [52], the power consumption of over 100 W by a single chip is one of the major problems in high-performance LSI systems. The interconnection delay will occupy a dominant portion of the total system delay in the new semiconductor technology generations even if the intrinsic device delay is decreased to subpicoseconds. SFQ circuit technology overcomes these problems by decreasing the power consumption by three orders of magnitude and by increasing the clock frequency beyond 100 GHz using the ballistic propagation of the SFQ pulses in superconducting PTLs. This is a major motivation to develop a high-end computing system using SFQ LSI technology.

In spite of the potentially high performance of SFQ circuits, the lack of a good solution for building large memories (RAM) is still a difficult problem in SFQ digital technology. The architecture of an SFQ digital system should take this into account. The most successful Josephson memory up to the present is the 4-kb RAM developed by NEC [10], which uses voltage-state circuits. Because of the low driving ability of SFQ gates, the typical memory organization based on a two-dimensional memory cell array is not effective in SFQ memory. One approach employs an array of SFQ serial registers [53]. Because of the simple structure and high operating frequency of the serial register, a relatively high-density memory is possible, though random accessibility is limited to some extent. Another approach pursues hybridization of Josephson and CMOS technologies [54]. The main idea of this approach is to take advantage of each technology: high density in CMOS technology, and high speed and low power in Josephson technology.

There are also several issues to be solved from the designer's point of view when making a large SFQ computer system. One is the difficulty in synchronizing logic circuits on a chip at speeds over several tens of gigahertz. Because the time-of-flight of the signal over 10 mm distance is about 70 ps on a chip, global synchronization of the logic circuits on a chip is impractical. Another is the intrinsically deep pipelining feature of SFQ logic gates. Typical SFQ logic gates are latched gates clocked by the SFQ pulses, which act as flip-flops with simple combinational logic functions like AND, NOT, etc., using the terminology of semiconductor logic circuits. This means that only one combinational logic gate can be included in each pipeline stage, while several combinational logic gates exist in the typical semiconductor pipeline stage. Though this gate-level deep pipelining helps to increase the clock frequency, it brings about a large number of data/control hazards in the pipeline stages, resulting in the deterioration of system performance. These problems, the difficulty in synchronization and deep pipelining, are the main challenges to be solved from the viewpoint of circuit architecture designers.
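The synchronization argument above can be checked with simple arithmetic. The sketch below compares the quoted 70-ps time of flight over 10 mm with the clock period at several frequencies. It assumes that the flight time scales linearly with distance, and the function names are illustrative.

```python
FLIGHT_TIME_PS = 70.0   # time of flight over the reference distance, from the text
REFERENCE_MM = 10.0     # reference distance, from the text


def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a clock frequency in gigahertz."""
    return 1000.0 / freq_ghz


def periods_in_flight(freq_ghz: float, distance_mm: float = REFERENCE_MM) -> float:
    """Number of clock periods that elapse while a signal crosses `distance_mm`.

    Assumes the delay is proportional to distance (an assumption of this sketch).
    """
    flight_ps = FLIGHT_TIME_PS * distance_mm / REFERENCE_MM
    return flight_ps / clock_period_ps(freq_ghz)


for f_ghz in (10, 20, 40, 100):
    print(f"{f_ghz:>3} GHz: period {clock_period_ps(f_ghz):5.1f} ps, "
          f"{periods_in_flight(f_ghz):.1f} periods per 10 mm")
```

Once the flight time across the chip exceeds one clock period, a single clock edge cannot be treated as simultaneous everywhere on the chip, which is the sense in which global synchronization becomes impractical.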

At present there are two large projects in the world aiming at high-end computer systems using SFQ digital technology. One is the FLUX processor project in the United States [55]–[58] that started as a spin-off project of the preceding Hybrid Technology Multi-Threaded architecture (HTMT) project [59], [60]. The other one is the CORE processor project within the framework of the Superconductors Network Device Project in Japan [61], [62].

The objective of the HTMT project, which was started in 1997, was to carry out preliminary studies of computer architecture and system organization to realize a petaflops-scale computer system by combining several novel hardware technologies. In the HTMT project, SFQ LSIs are placed at the center of its hierarchical system organization, providing high-speed processors and data switches. Fig. 13 shows the HTMT computer concept. A preliminary design of SFQ processors and network switches was made by the SUNY group assuming a 0.8-$\mu$m Nb LSI technology with a junction critical current density of 20 kA/cm<sup>2</sup> and with eight interconnect layers. They estimated that 4-k superconductor processors with a 4-GB superconductor memory operating at 100 GHz can achieve a petaflops level of performance and occupy a physical space of only about 0.5 m<sup>3</sup>. The power dissipation in the helium part and in the total system including a refrigerator is estimated to be 250 W and 0.3 MW, respectively.



Fig. 13. HTMT computer concept (adopted from [59]). It is composed of multiple levels of memory systems: optical memory, semiconductor DRAM and SRAM, and superconductor memory (CRAM), as well as different types of processors: SRAM- and DRAM-based processor-in-memory (PIMs) and superconductor processors (SPELLs) that are connected by superconductor network switches (CNET).



**Fig. 14.** Block diagram of the multiprocessor system based on the CORE architecture (after [61]).

The total power consumption is found to be 3% of that of a semiconductor system with the same performance.
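The HTMT power figures quoted in the text (250 W in the helium part, 0.3 MW in total, and 3% of an equivalent semiconductor system) imply two further numbers that the text does not state explicitly. The following sketch derives them; the function names are illustrative, and the results are simple consequences of the quoted values rather than figures reported by the HTMT project.

```python
COLD_POWER_W = 250.0      # dissipation in the helium part, from the text
TOTAL_POWER_W = 0.3e6     # total system power including a refrigerator, from the text
FRACTION_OF_SEMI = 0.03   # "3%" of an equivalent semiconductor system, from the text


def total_watts_per_cold_watt(total_w: float, cold_w: float) -> float:
    """Total system power spent per watt dissipated at helium temperature."""
    return total_w / cold_w


def implied_semiconductor_power_w(total_w: float, fraction: float) -> float:
    """Power of the semiconductor system implied by the quoted percentage."""
    return total_w / fraction


ratio = total_watts_per_cold_watt(TOTAL_POWER_W, COLD_POWER_W)
semi_mw = implied_semiconductor_power_w(TOTAL_POWER_W, FRACTION_OF_SEMI) / 1e6
print(f"{ratio:.0f} W of total power per watt dissipated in the helium part")
print(f"implied semiconductor system power: about {semi_mw:.0f} MW")
```

The first ratio is an upper bound on the refrigeration overhead, because the total also includes the room-temperature parts of the system.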

Another approach aims at the realization of a low-power high-end server for network-based applications rather than high-speed computing [61], [62]. The project is performed by three universities in collaboration. Their concept in computer architecture design is based on the complexity-reduced architecture (CORE), in which the complexity of the system is reduced in exchange for using the high clock rate of the SFQ circuits [61]. This idea arises from the observation that the total performance of the computer system is limited by processor-memory bandwidth and latency rather than processor performance itself. In the CORE processor, a bit-slice data structure, e.g., a 32-b word of 8-b width and 4-b length, is used. Though such a narrow-width data bus causes an increase in clock cycles per instruction, it matches the processor performance with the processor-memory bandwidth, decreases the difficulty in timing design, and reduces the complexity of the circuits.
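The bit-slice data structure can be illustrated with a small software model. The sketch below treats a 32-b word as four 8-b slices and adds two words one slice per step, passing the carry between steps. The slice order (least significant slice first) and all function names are assumptions made for illustration; they do not describe the actual CORE datapath.

```python
SLICE_BITS = 8    # slice width, from the text
NUM_SLICES = 4    # slices per 32-b word, from the text
MASK = (1 << SLICE_BITS) - 1


def to_slices(word: int) -> list:
    """Split a 32-b word into four 8-b slices, least significant slice first."""
    return [(word >> (SLICE_BITS * i)) & MASK for i in range(NUM_SLICES)]


def from_slices(slices: list) -> int:
    """Reassemble a word from its slices (inverse of to_slices)."""
    return sum(s << (SLICE_BITS * i) for i, s in enumerate(slices))


def bit_slice_add(a: int, b: int) -> tuple:
    """Add two 32-b words slice by slice.

    Returns (sum modulo 2**32, number of slice steps used).
    """
    carry = 0
    out = []
    for sa, sb in zip(to_slices(a), to_slices(b)):
        total = sa + sb + carry
        out.append(total & MASK)      # 8-b result of this step
        carry = total >> SLICE_BITS   # carry into the next slice
    return from_slices(out), NUM_SLICES


result, steps = bit_slice_add(0x00FF00FF, 0x00000001)
print(f"0x{result:08X} in {steps} slice steps")
```

In this model a 32-b addition takes four steps on an 8-b datapath instead of one step on a 32-b datapath, which corresponds to the increase in clock cycles per instruction mentioned above.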



**Fig. 15.** Block diagram of the FLUX processor. Its main components are a pipelined instruction memory, two instruction registers with decode/issue units, eight integer ALUs interleaved with eight registers, and two I/O ports.

A preliminary design of the multiprocessor system (Fig. 14) assumes a basic CORE multiprocessor module consisting of 16 CORE processors with 12.5 GFLOPS performance each and SFQ or JJ-CMOS hybrid memories [54] of 4 GB, which are connected by a high-bandwidth SFQ interconnection network with 12.8-Tb/s bandwidth. The power consumption of the CORE multiprocessor module, including a refrigerator, is estimated assuming a future SFQ LSI technology with critical current density of 64 kA/cm<sup>2</sup>. It was shown that more than a 20-fold reduction of power is available in the SFQ memory-based CORE multiprocessor module in comparison with a CMOS processor module of the same scale.

After an architecture study of the SFQ processor in the HTMT project, a new project was started in 2000 as a collaboration between the State University of New York (SUNY) and TRW in order to validate the feasibility of the SFQ processor using currently available SFQ LSI technology through a demonstration of a reasonably complex SFQ processor prototype [55]–[58]. They also intended to demonstrate chip-to-chip communication on an MCM at a high clock rate. Their processor prototype, named the FLUX chip, was a simple 16-b two-way long-instruction-word (LIW) microprocessor, which was reduced to an 8-b version in their later design. Fig. 15 shows a block diagram of the FLUX processor. In the processor microarchitecture, the FLUX chip employs several remarkable features different from the conventional semiconductor processor architecture to achieve high performance. Totally distributed register files, where each arithmetic logic unit (ALU) is put between two adjacent registers, decrease the register access time. The 8-b ALUs and registers are built with eight single-bit units to allow bit-level operation chaining in the ALU operation. At the circuit level, the FLUX chip aggressively uses PTLs in gate-to-gate interconnections to decrease the latency and the power consumption. The FLUX chip is fabricated in TRW's 1.75-$\mu$m Nb-trilayer process with a critical current density of 4 kA/cm<sup>2</sup> [63]. Fig. 16 shows a micrograph of the FLUX processor. Table 1 summarizes physical characteristics of the latest version of the 8-b FLUX processor [58]. Its peak performance is estimated to be 40 billion 8-b operations/s at a 20-GHz clock rate. At present, successful operation of a 1-b ALU is reported [64]. The chip-to-chip communication up to 60 GHz was demonstrated between superconducting chips mounted on an MCM using double-flux-quantum pulses [65].

**Fig. 16.** Micrograph of the FLUX processor.

**Table 1** Physical Characteristics of the Latest 8-b FLUX Processor

| Feature                  | Parameter                                                                                  |
|--------------------------|--------------------------------------------------------------------------------------------|
| Fabrication technology   | 1.75-µm Nb-trilayer process with junction critical current density of 4 kA/cm<sup>2</sup> |
| Target clock frequency   | 20 GHz                                                                                     |
| Josephson junction count | 63,107                                                                                     |
| Die size                 | 10.3 mm × 10.6 mm                                                                          |
| Total bias current       | 4.6 A                                                                                      |
| Power dissipation        | 9.2 mW                                                                                     |

**Fig. 17.** Block diagram of the CORE1 processor (adopted from [67]). It is composed of a bit-serial ALU with two adjacent registers (REG0 and REG1), an instruction register (IR), a 5-b program counter (PC), and a controller.

Based on the CORE concept, a small SFQ processor named CORE1 has been developed as a collaborative project between Nagoya University and Yokohama National University. The objective of the CORE1 project is to build up fundamental technologies for designing a large-scale SFQ digital system by making the smallest but complete SFQ processor. Fig. 17 shows a block diagram of the CORE1 microprocessor. The basic microarchitecture is similar to that of the TYPPY processor in the preceding SFQ processor project [66]. The CORE1 processor is a bit-serial 8-b processor with eight instructions and a 32-B memory space. The two registers (REG0 and REG1) are placed close to the bit-serial ALU to achieve a high clock rate. The CORE1 processor is designed by using the CONNECT cell-based design technique described in the previous section [31]. In Table 2, the physical characteristics are summarized. In contrast to the circuit design in the FLUX processor, JTLs are used in all gate-to-gate interconnections. For this reason, more than 100 unique JTL cells were prepared. PTLs are also used between circuit components to decrease the interconnection delay. All circuit components of the CORE1 processor have been implemented by the NEC 2.5 kA/cm<sup>2</sup> Nb-doublelayer process [10] and their correct operations were confirmed. Tested dc bias margins of the major circuit components are listed in Table 3. It should be noted that their high-speed operation was confirmed by an on-chip high-speed test technique [70], [71]. Fig. 18 shows a photograph of the CORE1 processor. The processor is made up of 4999 Josephson junctions on a 1.8 mm $\times$ 2.8 mm area. Complete operation for all instructions has recently been confirmed at clock rates up to 15.2 GHz [72]. The power dissipation was estimated to be 1.6 mW. Though its performance is modest, 200 million 8-b operations/s, it is the first successful demonstration of an SFQ processor.

**Table 2** Physical Characteristics of the CORE 1 Processor

| Feature                  | Parameter                                                                                     |
|--------------------------|-----------------------------------------------------------------------------------------------|
| Fabrication technology   | 2.0-µm Nb-doublelayer process with junction critical current density of 2.5 kA/cm<sup>2</sup> |
| Target clock frequency   | 16 GHz                                                                                        |
| Josephson junction count | 4,999                                                                                         |
| Die size                 | 1.8 mm × 2.8 mm                                                                               |
| Total bias current       | 0.64 A                                                                                        |
| Power dissipation        | 1.6 mW                                                                                        |

**Table 3** Tested DC Bias Margin of the CORE 1 Processor and Its Components

| Circuit component               | Josephson junction count                | Tested DC bias margin |
|---------------------------------|-----------------------------------------|-----------------------|
| ALU with REG0 and REG1 [67]     | 1600                                    | -5% ~ +19% @ 15 GHz   |
| 8-bit instruction register [68] | 448                                     | -27% ~ +19% @ 16 GHz  |
| 5-bit program counter [68]      | 548                                     | -12% ~ +31% @ 1 kHz   |
| Controller [68]                 | 2399                                    | -19% ~ +21% @ 1 kHz   |
| 32-bit SFQ RAM [69]             | 2300 (including high-speed test system) | ±5% @ 20 GHz          |
| CORE1 processor [72]            | 4999 (RAM is not included)              | -5% ~ +19% @ 12 GHz   |
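The bias current and power dissipation listed in Tables 1 and 2 can be cross-checked against each other. The sketch below assumes that the quoted dissipation is the static bias power, P = I × V, so that the ratio P/I gives an effective bias supply voltage. That assumption and the function names belong to this sketch; the tables themselves do not state a supply voltage.

```python
CHIPS = {
    # name: (junction count, total bias current in A, power dissipation in mW)
    "FLUX (Table 1)": (63107, 4.6, 9.2),
    "CORE1 (Table 2)": (4999, 0.64, 1.6),
}


def implied_bias_voltage_mv(power_mw: float, current_a: float) -> float:
    """Effective supply voltage V = P / I (mW divided by A gives mV)."""
    return power_mw / current_a


def power_per_junction_uw(power_mw: float, junctions: int) -> float:
    """Average dissipation per Josephson junction in microwatts."""
    return power_mw * 1000.0 / junctions


for name, (junctions, current_a, power_mw) in CHIPS.items():
    v_mv = implied_bias_voltage_mv(power_mw, current_a)
    p_uw = power_per_junction_uw(power_mw, junctions)
    print(f"{name}: {v_mv:.2f} mV implied bias voltage, "
          f"{p_uw:.3f} uW per junction")
```

Under this assumption both chips correspond to supply voltages of a few millivolts and to an average dissipation well below 1 µW per junction.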

### VI. PRESENT STATUS OF HTS TECHNOLOGY FOR DIGITAL APPLICATION

In 1986, high-temperature superconductors (HTSs) were discovered [73], and then many kinds of HTSs whose critical temperatures were higher than the boiling point of liquid nitrogen (77 K) were developed. It was expected that systems made of HTSs could be operated at temperatures higher than the liquid helium temperature (4.2 K), resulting in smaller system size and lower cost of refrigeration, hence, lower cost of systems. Much effort has been made to develop thin-film, junction, and junction integration technologies so far. It has taken a long time to develop these technologies because HTS materials are difficult to handle.

However, the technologies have gradually progressed. Recently, a ring oscillator circuit consisting of 21 junctions made of $YBa_2Cu_3O_y$ was successfully operated with an SFQ circulating frequency of 29 GHz at 30 K [74]. The present status of HTS junction integration technology is at a level to fabricate circuits including fewer than 100 junctions [75].

Taking into account the lower integration level but higher $I_cR_n$ products, and hence higher intrinsic speed, of HTS junctions, front ends of A/D converters [76]–[78] and samplers with wide bandwidths [79] are considered realistic applications for HTS junctions.

### VII. SUMMARY

Superconducting digital electronics is being advanced by changing the logic system from latching to SFQ logic. Recent developments in this field are mainly concerned with circuits based on SFQ logic. The scale of SFQ circuits has been rapidly increased by the progress of circuit design technologies. Top-down design technology is indispensable for producing large-scale circuits with sufficient operating margins. The development of automated design tools enables us to produce circuits of nearly 10<sup>4</sup> junctions.

Moreover, superconducting technology has large headroom compared with semiconductor technology. The Nb integration technology now in use, with nearly 1-$\mu$m features, makes circuits that can operate at clock frequencies of about 80 GHz, while CMOS technology is using 0.09-$\mu$m features. This means that with superconducting technology it is possible to increase circuit performance by scaling.

As possible applications of SFQ circuits, switches for high-end routers and high-end computers were described. However, many technologies such as packaging, interfaces between low temperature and room temperature, and refrigerators need further development to build systems. More effort is needed to bring products to the marketplace.





**Fig. 18.** A CORE1, prototype SFQ processor. (a) A microphotograph of a fabricated chip. (b) Layout of the components.

#### REFERENCES

- [1] D. A. Buck, "The cryotron—A superconductive computer component," *Proc. IRE*, vol. 44, p. 482, 1956.
- [2] B. D. Josephson, "Possible new effect in superconductive tunneling," *Phys. Lett.*, vol. 1, pp. 251–253, 1962.
- [3] P. W. Anderson and J. M. Rowell, "Probable observation of the Josephson superconducting tunnel effect," *Phys. Rev. Lett.*, vol. 10, p. 230, 1963.
- [4] J. Matisoo, "The tunneling cryotron—A superconductive logic element based on electron tunneling," *Proc. IEEE*, vol. 55, p. 172, 1967.
- [5] IBM J. Res. Dev. (Special Issue on Josephson Computer Technology), vol. 24, no. 2, Mar. 1980.
- [6] A. L. Robinson, "IBM drops superconducting computer project," *Science*, vol. 222, pp. 492–494, 1983.
- [7] S. Kotani, A. Inoue, T. Imamura, and S. Hasuo, "A 1-GOPS 8-bit Josephson digital signal processor," in *IEEE Int. Solid-State Circuits Conf.*, Tech. Dig., 1990, pp. 148–149.
- [8] Y. Hatano, S. Yano, H. Mori, H. Yamada, M. Hirano, and U. Kawabe, "A 4-bit Josephson data processor with dc output buffer," in *Extended Abstracts Int. Superconductive Electronics Conf. (ISEC'89)*, pp. 375–380.
- [9] H. Nakagawa, I. Kurosawa, M. Aoyagi, S. Kosaka, Y. Hamazaki, Y. Okada, and S. Takada, "A 4-bit Josephson computer ETL-JC1," *IEEE Trans. Appl. Superconduct.*, vol. 1, pp. 37–47, Mar. 1991.
- [10] S. Nagasawa, Y. Hashimoto, H. Numata, and S. Tahara, "A 380 ps, 9.5 mW Josephson 4 Kbit RAM operated at high bit yield," *IEEE Trans. Appl. Superconduct.*, vol. 5, pp. 2447–2452, June 1995.
- [11] R. E. Jewett and T. Van Duzer, "Low-probability punchthrough in Josephson junctions," *IEEE Trans. Magn.*, vol. MAG-17, pp. 599–602, Jan. 1981.

- [12] K. Nakajima, G. Oya, and Y. Sawada, "Fluxoid motion in phase mode Josephson switching system," *IEEE Trans. Magn.*, vol. MAG-19, pp. 1201–1204, Mar. 1983.
- [13] K. K. Likharev, O. A. Mukhanov, and V. K. Semenov, "Resistive single flux quantum logic for Josephson junction technology," in *Proc. 3rd Int. Conf. Superconducting Quantum Devices*, W. de Gruyter, Ed., 1985, pp. 1103–1108.
- [14] V. P. Koshelets, K. K. Likharev, V. V. Migulin, O. A. Mukhanov, G. A. Ovsyanikov, V. K. Semenov, I. L. Serpuchenko, and A. N. Vystavkin, "Experimental realization of a resistive single flux quantum logic circuit," *IEEE Trans. Magn.*, vol. 23, pp. 755–758, Mar. 1987.
- [15] J. P. Hurrell, D. C. Pridmore-Brown, and A. H. Silver, "Analog-to-digital conversion with unlatched SQUIDs," *IEEE Trans. Electron Devices*, vol. ED-27, pp. 1857–1869, Oct. 1980.
- [16] O. Mukhanov, V. Semenov, and K. Likharev, "Ultimate performance of the RSFQ logic circuits," *IEEE Trans. Magn.*, vol. MAG-23, pp. 759–762, Mar. 1987.
- [17] W. Chen, A. V. Rylyakov, V. Patel, J. E. Lukens, and K. K. Likharev, "Superconductor digital frequency divider operating up to 750 GHz," *Appl. Phys. Lett.*, vol. 73, pp. 2817–2819, 1998.
- [18] Q. P. Herr and M. J. Feldman, "Error rate of a superconducting circuit," *Appl. Phys. Lett.*, vol. 69, pp. 694–695, 1996.
- [19] C. A. Hamilton and K. C. Gilbert, "Margins and yields in single flux quantum logic," *IEEE Trans. Appl. Superconduct.*, vol. 1, pp. 157–163, Dec. 1991.
- [20] S. Polonsky, P. Shevchenko, A. Kirichenko, D. Zinoviev, and A. Rylyakov, "PSCAN'96: New software for simulation and optimization of complex RSFQ circuits," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 2685–2689, June 1997.
- [21] WinS [Online]. Available: http://home.pacbell.net/kapl/wins/wins.
- [22] Q. P. Herr and M. J. Feldman, "Multiparameter optimization of RSFQ circuits using the method of inscribed hyperspheres," *IEEE Trans. Appl. Superconduct.*, vol. 5, pp. 3337–3340, June 1995.
- [23] T. Harnisch, J. Kunert, H. Toepfer, and H. F. Uhlmann, "Design centering methods for yield optimization of cryoelectronic circuits," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 3434–3437, June 1997.
- [24] N. Yoshikawa and K. Yoneyama, "Parameter optimization of single flux quantum digital circuits based on Monte Carlo yield analysis," *IEICE Trans. Electron.*, vol. E83-C, pp. 75–80, 2000.
- [25] N. Mori, A. Akahori, T. Sato, N. Takeuchi, A. Fujimaki, and H. Hayakawa, "A new optimization procedure for single flux quantum circuits," *Physica C*, vol. 357–360, pp. 1557–1560, 2001.
- [26] N. Yoshikawa, K. Fujiwara, and H. Hoshina, "A survey of optimization methods and tools for superconducting circuits—A bench mark test of circuit optimization tools—," in *Papers Tech. Meeting Metal and Ceramics*, vol. MC-00-11, 2000, pp. 53–58. In Japanese.
- [27] K. Gaj, Q. Herr, V. Alder, A. Krasniewski, E. Friedman, and M. Feldman, "Tools for the computer-aided design of multigigahertz superconducting digital circuits," *IEEE Trans. Appl. Superconduct.*, vol. 9, pp. 18–38, Mar. 1999.
- [28] K. Gaj, Q. Herr, V. Alder, D. Brock, E. Friedman, and M. Feldman, "Toward a systematic design methodology for large multigigahertz rapid single flux quantum circuits," *IEEE Trans. Appl. Superconduct.*, vol. 9, pp. 4591–4606, Sept. 1999.
- [29] Y. Kameda and S. Yorozu, "Automatic Josephson-transmission-line routing for single-flux-quantum cell-based logic circuits," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 519–522, June 2003.
- [30] SUNY RSFQ Cell Library [Online]. Available: http://pavel.physics.sunysb.edu/RSFQ/Lib/
- [31] S. Yorozu, Y. Kameda, H. Terai, A. Fujimaki, T. Yamada, and S. Tahara, "A single flux quantum standard logic cell library," *Physica C*, vol. 378–381, pp. 1471–1474, 2002.
- [32] P. Bunyk and S. Rylov, "Automated calculation of mutual inductance matrices of multilayer superconductor integrated circuits," in *Extended Abstracts Int. Superconductive Electronics Conf. (ISEC'93)*, 1993, p. 62.
- [33] V. Semenov, A. Ryzhikh, and Y. Polyakov, "Decimation filters based on RSFQ logic/memory cells," in *Extended Abstracts Int. Superconductive Electronics Conf. (ISEC'97)*, vol. 2, 1997, pp. 344–346.
- [34] Y. Hashimoto, S. Yorozu, Y. Kameda, and V. Semenov, "A design approach to passive interconnects for single flux quantum logic circuits," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 535–538, June 2003.
- [35] Q. P. Herr, M. S. Wire, and A. B. Smith, "Ballistic SFQ signal propagation on-chip and chip-to-chip," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 463–466, June 2003.
- [36] A. Yu. Kidiyarova-Shevchenko, "RSFQ asynchronous serial multiplier and spreading codes generator for multiuser detector," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 429–432, June 2003.

- [37] S. Tahara, S. Yorozu, and H. Matsuoka, "A superconductive ring-pipelined network system," *IEEE Trans. Appl. Superconduct.*, vol. 5, pp. 3164–3167, June 1995.
- [38] M. Hosoya and S. Kominami, "A 4 × 4 superconducting Batcher–Banyan switch," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 3817–3823, Dec. 1997.
- [39] D. Zinoviev and K. Likharev, "Feasibility study of RSFQ-based self-routing nonblocking digital switches," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 3155–3163, June 1997.
- [40] S. Yorozu, Y. Kameda, and S. Tahara, "A hybrid switch system architecture for large-scale digital communication network using SFQ technology," *IEICE Trans. Electron.*, vol. E84-C, no. 1, pp. 15–19, 2001.
- [41] H. Terai, Y. Kameda, S. Yorozu, A. Kawakami, N. Yoshikawa, and Z. Wang, "High-speed testing of Tandem-Banyan network switch component," *Physica C*, vol. 392–396, pp. 1485–1489, 2003.
- [42] Y. Kameda, S. Yorozu, M. Hidaka, and S. Tahara, "Successful operation of single flux quantum 2 × 2 unit switch," *Physica C*, vol. 378–381, pp. 1446–1470, 2002.
- [43] N. Dubash, P. Yuh, and V. Borzenets, "SFQ data communication switch," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 2681–2684, June 1997.
- [44] Q. Ke, B. Dalrymple, D. Durand, and J. Spargo, "Single flux quantum crossbar switch," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 2968–2971, June 1997.
- [45] A. Worsham, A. Miklich, D. Miller, J. Kang, and J. Przybysz, "Single flux quantum circuits for 2.5 Gbps data switching," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 2476–2479, June 1997.
- [46] S. Yorozu, Y. Kameda, Y. Hashimoto, H. Terai, A. Fujimaki, and N. Yoshikawa, "Single flux quantum circuit technology innovation for backbone router applications," *Physica C*, vol. 392–396, pp. 1478–1484, 2003.
- [47] N. Dubash, V. Borzenets, Y. Zhang, V. Kaplunenko, J. Spargo, A. Smith, and T. Van Duzer, "System demonstration of a multigigabit network switch," *IEEE Trans. Microwave Theory Tech.*, vol. 48, pp. 1209–1215, July 2000.
- [48] S. Yorozu, Y. Kameda, Y. Arakawa, H. Hayakawa, and S. Tahara, "A novel single flux quantum speed conversion buffer for the internal speedup architecture," *Physica C*, vol. 372–376, pp. 127–130, 2002.
- [49] S. Hasuo and T. Imamura, "Digital logic circuits," *Proc. IEEE*, vol. 77, pp. 1177–1193, Aug. 1989.
- [50] T. Van Duzer, "Superconductor electronics," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 98–111, June 1997.
- [51] T. Sakurai, "VLSI's in the year 2010 and beyond," *JSAP Int.*, no. 3, pp. 15–21, 2001.
- [52] International Technology Roadmap for Semiconductors (2003). [Online]. Available: http://public.itrs.net/
- [53] P. Yuh and O. A. Mukhanov, "Design and testing of rapid single flux quantum shift registers with magnetically coupled readout gates," *IEEE Trans. Appl. Superconduct.*, vol. 2, pp. 214–221, Dec. 1992.
- [54] U. Ghoshal, D. Hebert, and T. Van Duzer, "Josephson-CMOS memories," in 1993 ISSCC Dig. Tech. Papers, vol. 33, pp. 44–54.
- [55] M. Dorojevets, P. Bunyk, and D. Zinoviev, "FLUX chip design of 20-GHz 16-bit ultrapipelined RSFQ processor prototype based on 1.75-$\mu$m LTS technology," *IEEE Trans. Appl. Superconduct.*, vol. 11, pp. 326–332, Mar. 2001.
- [56] M. Dorojevets, "An 8-bit FLUX-1 RSFQ microprocessor built in 1.75 μm technology," *Physica C*, vol. 378–381, pp. 1446–1453, 2002.
- [57] M. Dorojevets and P. Bunyk, "Architectural and implementation challenges in designing high-performance RSFQ processors: A FLUX-1 microprocessor and beyond," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 446–449, June 2003.
- [58] P. Bunyk, M. Leung, J. Spargo, and M. Dorojevets, "FLUX-1 RSFQ microprocessor: Physical design and test results," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 433–436, June 2003.
- [59] T. Sterling. A hybrid technology multithreaded architecture for petaflops computing. presented at CACR. [Online]. Available: http://htmt.cacr.caltech.edu/Overview.html
- [60] M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, "COOL-0: Design of an RSFQ subsystem for petaflops computing," *IEEE Trans. Appl. Superconduct.*, vol. 9, pp. 3606–3614, June 1999.
- [61] A. Fujimaki, Y. Takai, and N. Yoshikawa, "High-end server based on complexity-reduced architecture for superconducting technology," *IEICE Trans. Electron.*, vol. E85-C, pp. 612–616, Mar. 2002.
- [62] A. Fujimaki, H. Terai, S. Yorozu, and N. Yoshikawa, "Recent SFQ research in Japan," presented at the Superconductivity Electronics Conf., Sydney, NSW, Australia, 2003.

- [63] G. Kerber, L. Abelson, M. Leung, Q. Herr, and M. Johnson, "A high density 4 kA/cm<sup>2</sup> Nb integrated circuit process," *IEEE Trans. Appl. Superconduct.*, vol. 11, pp. 1061–1065, Mar. 2001.
- [64] P. Bunyk and Q. P. Herr, "FLUX-1 RSFQ microprocessor: Current status and test results," presented at the 9th Int. Superconductive Electronics Conf., Sydney, NSW, Australia, 2003.
- [65] Q. P. Herr, M. S. Wire, and A. D. Smith, "High speed data link between digital superconductor chips," *Appl. Phys. Lett.*, vol. 80, pp. 3210–3212, 2002.
- [66] N. Yoshikawa, F. Matsuzaki, N. Nakajima, K. Fujiwara, K. Yoda, and K. Kawasaki, "Design and component test of a tiny processor based on the SFQ technology," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 441–445, June 2003.
- [67] M. Tanaka, T. Kondo, A. Sekiya, A. Fujimaki, H. Hayakawa, F. Matsuzaki, N. Yoshikawa, H. Terai, and S. Yorozu, "Component test toward single-flux-quantum processors," *Physica C*, vol. 392–396, pp. 1490–1494, 2003.
- [68] N. Nakajima, F. Matsuzaki, Y. Yamanashi, N. Yoshikawa, M. Tanaka, T. Kondo, A. Fujimaki, H. Terai, and S. Yorozu, "Design and implementation of circuit components of the SFQ microprocessor, CORE 1," presented at the 9th Int. Superconductivity Conf., Sydney, NSW, Australia, 2003.
- [69] K. Fujiwara, Y. Yamashiro, N. Yoshikawa, A. Fujimaki, H. Terai, and S. Yorozu, "Design and high-speed test of 4 × 8-bit SFQ shift register files," *Supercond. Sci. Technol.*, to be published.
- [70] Z. J. Deng, N. Yoshikawa, S. R. Whiteley, and T. Van Duzer, "Data-driven self-timed RSFQ high speed test system," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 3830–3833, Dec. 1997.
- [71] T. Yamada, A. Sekiya, A. Akahori, H. Akaike, A. Fujimaki, H. Hayakawa, Y. Kameda, S. Yorozu, and H. Terai, "On-chip test of the shift-register for high-end network switch based on cell-based design," *Supercond. Sci. Technol.*, vol. 14, pp. 1071–1074, Dec. 2001.
- [72] M. Tanaka, F. Matsuzaki, T. Kondo, N. Nakajima, Y. Yamanashi, A. Fujimaki, H. Hayakawa, N. Yoshikawa, H. Terai, and S. Yorozu, "Demonstration of a prototype of the microprocessor based on the single-flux-quantum logic," presented at the 2004 Int. Solid-State Circuit Conf. (ISSCC 2004), San Francisco, CA.
- [73] J. G. Bednorz and K. A. Müller, "Possible high-T<sub>c</sub> superconductivity in Ba–La–Cu–O system," *Z. Phys.*, vol. B64, pp. 189–193, 1986.
- [74] H. Katsuno, T. Nagano, K. Nakayama, and J. Yoshida, "HTS-SFQ ring oscillator circuit fabricated with a novel multilayer structure," *Physica C*, vol. 392–396, pp. 1433–1440, 2003.
- [75] H. Wakana, S. Adachi, A. Kamitani, H. Sugiyama, T. Sugano, M. Horibe, Y. Ishimaru, Y. Tarutani, and K. Tanabe, "Fabrication of interface-modified ramp-edge junction on YBCO ground plane with multilayer structure," *Physica C*, vol. 392–396, pp. 1322–1327, 2003.
- [76] J. D. McCambridge, M. G. Forrester, D. L. Miller, B. D. Hunt, J. X. Przybysz, J. Talvacchio, and R. M. Young, "Multilayer HTS SFQ analog-to-digital converters," *IEEE Trans. Appl. Superconduct.*, vol. 7, pp. 3622–3625, June 1997.
- [77] A. H. Sonnenberg, I. Oomen, H. Hilgenkamp, G. L. Gerritsma, and H. Rogalla, "Sigma-delta A/D converter in HTS ramp edge technology," *IEEE Trans. Appl. Superconduct.*, vol. 11, pp. 200–204, Mar. 2001.
- [78] K. Saitoh, F. Furuta, Y. Soutome, T. Fukazawa, and K. Takagi, "Investigation of basic properties of an HTS sigma-delta modulator," *Physica C*, vol. 378–381, pp. 1429–1434, 2002.
- [79] M. Maruyama, M. Hidaka, and T. Satoh, "Improved high-Tc superconductor sampler circuits using Josephson transmission line buffers," *IEEE Trans. Appl. Superconduct.*, vol. 13, pp. 401–404, June 2003.



**Hisao Hayakawa** received the B.S., M.S., and Ph.D. degrees from Nagoya University, Nagoya, Japan, in 1963, 1965, and 1973, respectively. His Ph.D. research was on acoustoelectronics.

He was with the Electrotechnical Laboratory (ETL) of the Ministry of International Trade and Industry. He started research on superconducting digital electronics in 1976 as the Head of the cryoelectronics section of ETL. In 1986, he moved to Nagoya University to be a Professor in the Department of Electronics. He is now engaged in the NEDO project called "Superconducting Network Devices" as the Project Leader.



**Nobuyuki Yoshikawa** received the B.E., M.E., and Ph.D. degrees in electrical and computer engineering from Yokohama National University, Yokohama, Japan, in 1984, 1986, and 1989, respectively.

Since 1989, he has been working in the Department of Electrical and Computer Engineering, Yokohama National University, where he is currently a Professor. His research interests include superconductive devices and their application in digital and analog circuits. He is also interested in single-electron-tunneling devices and quantum computing devices.

**Akira Fujimaki** received the B.E., M.E., and Dr.Eng. degrees from Tohoku University, Sendai, Japan, in 1982, 1984, and 1987, respectively. He was a Visiting Assistant Research Engineer at the University of California, Berkeley, in 1987. Since 1988, he has been working on superconductor devices and circuits at the School of Engineering, Nagoya University, Nagoya, Japan.

His current research interests include single-flux-quantum circuits and their applications, such as communication systems and multiprocessor systems.



**Shinichi Yorozu** received the M.S. and Ph.D. degrees in applied physics from the University of Tokyo, Tokyo, Japan, in 1990 and 1993, respectively.

In 1993, he joined NEC Corporation, where he was engaged in the research of superconducting electronics and systems. From 1997 to 1998, he was a Research Scientist at the State University of New York, Stony Brook. Since 2002, he has been assigned as a Member of the Superconducting Research Laboratory, International Superconductivity Technology Center (ISTEC-SRL), Tsukuba, Japan. Dr. Yorozu is a Member of the Physical Society of Japan and the Japan Society of Applied Physics.