# A 12-b 5-MSample/s Two-Step CMOS A/D Converter

Behzad Razavi, Member, IEEE, and Bruce A. Wooley, Fellow, IEEE

Abstract-Two-step flash architectures are an effective means of realizing high-speed, high-resolution analog-to-digital converters (ADC's) because they can be implemented without the need for operational amplifiers having either high gain or a large output swing. Moreover, with conversion rates approaching half those of fully parallel designs, such half-flash architectures provide both a relatively small input capacitance and low power dissipation. This paper describes the design of a 12-b, 5-Msample/s A/D converter that is based on a two-step flash topology and has been integrated in a 1-µm CMOS technology. Configured as a fully differential circuit, the converter performs a 7-b coarse flash conversion followed by a 6-b fine flash conversion. Both analog and digital error correction are used to achieve a resolution of 12 b. The converter dissipates only 200 mW from a single 5-V supply and occupies an area of  $2.5 \text{ mm} \times 3.7 \text{ mm}$ .

# I. Introduction

THE potential of two-step flash architectures for realizing fast, high-resolution analog-to-digital converters (ADC's) has been demonstrated in a number of designs [1], [2]. With conversion rates approaching half those of fully parallel (flash) ADC's, these architectures provide a relatively small input capacitance together with low power dissipation and can be used to achieve resolutions in the range of 10 to 14 b, which is well above that obtainable in single-stage flash designs.

This paper introduces a 12-b, 5-Msample/s two-step ADC that has been integrated in a 1-$\mu$m CMOS technology. Configured as a fully differential architecture, the circuit employs both analog and digital correction and dissipates only 200 mW of power. The design avoids the use of operational amplifiers with high gain or output swings greater than a few hundred millivolts and is thus well-suited to a single-supply implementation using small-geometry devices. The circuit uses a segmented-capacitor interstage DAC to achieve high speed and rail-to-rail swings when converting the digital output of the first stage to an analog signal.

Manuscript received April 20, 1992; revised July 10, 1992. This work was supported by the Army Research Office under Contract DAAL03-91-0-0088.

- B. Razavi was with the Center for Integrated Systems, Stanford University, Stanford, CA 94305. He is now with AT&T Bell Laboratories, Holmdel, NJ.
- B. A. Wooley is with the Center for Integrated Systems, Stanford University, Stanford, CA 94305.

IEEE Log Number 9203618.

The converter architecture and timing are described in Section II. Details of the circuits in the first stage, the interstage processing, and the design of the circuits in the second stage are then presented in Sections III through V, respectively. The overlap between the conversion ranges of the two stages and the corresponding digital correction algorithm are described in Section VI. Layout considerations for a prototype implementation are discussed in Section VII, and experimental results are presented in Section VIII.

While the actual circuits are fully differential, most of the building blocks are illustrated herein in single-ended form in order to simplify the description of their operation.

# II. ADC ARCHITECTURE

The ADC architecture and timing diagram are shown in Fig. 1. The converter consists of a 7-b coarse flash stage, a 7-b DAC, a subtractor, and a 6-b fine flash stage. One-of-N decoders and ROM's are used to convert the thermometer code outputs of the two flash stages to binary data, which is then corrected digitally to produce the final output. One bit of redundancy, or overlap, is used in this architecture to enable the second stage to correct for out-of-range errors in the first stage, thereby relaxing the precision required of the first-stage comparators. Furthermore, the fully differential architecture increases the input dynamic range, eliminates even-order harmonic distortion, and suppresses common-mode noise due to supply transients and substrate coupling. With the present design, an external sample-and-hold circuit must be used in order to digitize high-frequency analog inputs.

Controlled by two clocks,  $\Phi_1$  and  $\Phi_2$ , the system of Fig. 1 operates as follows. In the sampling/offset-cancellation mode,  $\Phi_1$  and  $\Phi_2$  are high, an external sample-and-hold circuit samples the analog input, the comparators in the two flash stages are in the offset-storage mode, and the DAC and the subtractor are reset. When  $\Phi_1$  goes low, the first-stage comparators are strobed to produce a digital estimate of the analog input, and the second-stage comparators begin to track the subtractor output. After the first stage, the DAC, and the subtractor have responded,  $\Phi_2$  goes low to strobe the second-stage comparators and perform the fine conversion. This timing arrangement cancels offsets in every cycle and thus also suppresses the effects of flicker noise at the input of the comparators.

Fig. 1. A/D converter architecture and timing.

Fig. 2. Simplified single-ended functional diagram of ADC.

A simplified single-ended functional diagram of the ADC is shown in Fig. 2. The first stage comprises 128 comparators and is followed by a segmented-capacitor DAC. The output of the DAC is directly coupled to the subtractor, which is implemented as a switched-capacitor circuit. The subtractor drives the 64 comparators in the fine flash stage.

While it is possible to use the first-stage resistor ladder as the interstage DAC [8], this design employs an independent DAC circuit so as to avoid corrupting its sensitive output with kickback noise generated by the first-stage comparators.

At a 5-MHz clock rate, the converter allows approximately equal time intervals for input sampling and conversion, i.e., 100 ns for each. This timing constrains the total delay of the flash stages, DAC, and subtractor, and thus requires a careful allocation of speed and precision among these elements. In order to ease the accuracy required of the subtractor, which is potentially the slowest circuit in the signal path, the converter design relies extensively upon precision voltage comparison and D/A conversion. In particular, the comparators in the fine flash stage are designed to resolve inputs as small as 1 LSB ( $\approx 2.4$  mV) so that the subtractor need have a closed-loop gain of only one, thereby maximizing its speed.
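The timing and resolution figures quoted above follow from simple arithmetic. The short Python sketch below reproduces them, assuming the 10-V differential input range quoted in Section III-A; the constant and function names are illustrative and do not come from the original design.

```python
# Back-of-the-envelope check of the timing and LSB figures in the text.
RESOLUTION_BITS = 12
FULL_SCALE_V = 10.0   # differential input range (value quoted in Section III-A)
CLOCK_HZ = 5e6        # conversion rate

lsb_v = FULL_SCALE_V / 2 ** RESOLUTION_BITS   # one 12-b LSB, in volts
half_period_ns = 0.5 * 1e9 / CLOCK_HZ         # time for sampling, and for conversion


def lsb_to_mv(n_lsb):
    """Convert a count of 12-b LSBs to millivolts."""
    return n_lsb * lsb_v * 1e3


print(f"sampling / conversion interval: {half_period_ns:.0f} ns each")
print(f"1 LSB  = {lsb_to_mv(1):.2f} mV")
print(f"16 LSB = {lsb_to_mv(16):.1f} mV")
print(f"64 LSB = {lsb_to_mv(64):.1f} mV")
```

The 16-LSB and 64-LSB values correspond to the rounded figures of roughly 40 mV and 150 mV used elsewhere in the paper.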

# III. COARSE FLASH STAGE

The coarse flash stage estimates the seven most significant bits of the sampled signal, producing both a thermometer-code output to drive the interstage DAC and a binary output that is subsequently corrected digitally. One bit of overlap between this stage and the fine flash stage provides for the correction of offsets as large as 40 mV in the first-stage comparators, thereby simplifying their design.

# A. First-Stage Comparator

The design of the first-stage comparators (FSC's) plays a crucial role in the overall system performance. While their input offset voltage, delay, and input voltage range directly influence the resolution and speed of the coarse stage, the input capacitance, power dissipation, and complexity of these 128 comparators can become large enough to degrade other system parameters as well.

The FSC input offset must be consistent with the error range covered by redundancy and digital correction. As discussed in Section VI, with one bit of redundancy this range is approximately 40 mV for a differential input range of 10 V. However, in order to allow for integral nonlinearity and gain error in the first stage and the DAC, the FSC input offset has been maintained below 20 mV. Because typical mismatches in the width, length, and threshold voltage of small-geometry transistors can result in offsets as high as 50 mV, the comparator employs a simple offset cancellation.

Fig. 3 is a schematic of the fully differential FSC circuit. It consists of an input sampling network S1-S4, C1, and C2, a regenerative amplifier M1-M4, reset switches S5-S7, and inverters N1-N2. Inverter N1 simply provides symmetric loading of the circuit to avoid systematic offsets. The circuit functions as follows. In the offset-cancellation mode,  $\Phi_1$  is high, S3-S7 are on, nodes P and Q charge to  $V_{\rm REF}^+$  and  $V_{\rm REF}^-$ , respectively, and the amplifier's offset is stored on C1 and C2. In the transition from this mode to the comparison mode,  $\Phi_1$  goes low, turning off S3-S7 and turning on S1 and S2. The circuit then begins to regenerate, amplifying the difference between  $V_{\rm IN}^+ - V_{\rm IN}^-$  and  $V_{\rm REF}^+ - V_{\rm REF}^-$ . The waveforms at nodes X and Y are shown in Fig. 4, wherein the amplitude of  $\Phi_1$  (= 5 V) is reduced for clarity.

Since the FSC begins regeneration with M1-M4 already on, its offset voltage is relatively independent of the clock fall time. Nonetheless, if S5-S7 turn off simultaneously, any mismatch in the charge injected onto the gates and drains of M1 and M2 can cause a false regeneration around M3 and M4, resulting in a large input-referred offset. In order to avoid this problem, the circuit actually uses delayed versions of  $\Phi_1$  to realize the following sequence: 1) turn S3-S6 off to end the reset mode, 2) turn S1 and S2 on to begin the tracking mode, and 3) turn S7 off to allow regeneration around M3 and M4. Switching S5 and S6 off before turning S1 and S2 on significantly lowers the capacitance seen at the inputs when S1 and S2 turn on because C1 and C2 then appear in series with the gates of transistors M1 and M2.

Fig. 3. First-stage comparator.

Fig. 4. Output waveforms of first-stage comparator ( $\Phi_1$  not to scale).

The residual offset of the first-stage comparator can be expressed as

$$V_{\rm OS} = \frac{V_{\rm OS1}}{A_d} + \Delta V \tag{1}$$

where  $V_{\rm OS1}$  is the input-referred offset without offset cancellation,  $A_d$  is the differential voltage gain from the gates of M1 and M2 to their drains, and  $\Delta V$  is the offset due to charge injection mismatch between S5 and S6. It can be shown that

$$A_d = \frac{g_{mN}R_7}{1 - \frac{1}{2}g_{mP}R_7} \tag{2}$$

where  $g_{mN}$  and  $g_{mP}$  are the NMOS and PMOS transconductance values, respectively, and  $R_7$  is the small-signal on-resistance of S7. In this expression, the numerator represents the linear gain provided by M1 and M2, while the denominator reflects the regenerative amplification provided by M3 and M4. To avoid undue variations in  $A_d$ , M3, M4, and S7 are sized so that  $g_{mP}R_7/2$  is confined between 0.5 and 0.7 for all process corners, yielding  $2g_{mN}R_7\ (\approx 5) < A_d < 3.3 g_{mN}R_7\ (\approx 8.3)$ . Also, M1-M4 are sized so as to ensure that, during reset, the voltages at nodes X and Y are sufficiently close to the supply rails to provide well-defined logic levels at the inputs of inverters N1 and N2. Since NMOS transistors M1 and M2 have wide channels for high gain, while PMOS devices M3 and M4 have long channels for better matching, the reset voltages at nodes X and Y are closer to ground than to  $V_{DD}$  and should be interpreted as logic zeros by the inverters under worst-case conditions.

Fig. 5. First-stage layout.

The first-stage comparator dissipates 0.55 mW at 5 MHz, and its measured offset is less than 5 mV.
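As a numerical illustration of (1) and (2), the following Python sketch evaluates the gain bounds and the first term of the residual offset. It uses the denominator in the form $1 - g_{mP}R_7/2$, which is the form consistent with the quoted bounds of 2 and 3.3 times $g_{mN}R_7$. The value $g_{mN}R_7 = 2.5$ is an assumption inferred from $2g_{mN}R_7 \approx 5$, the 50-mV uncancelled offset is the worst case quoted earlier in this section, and the charge-injection term $\Delta V$ is left out.

```python
def differential_gain(gmn_r7, gmp_r7_half):
    """A_d = g_mN*R_7 / (1 - g_mP*R_7/2), valid while g_mP*R_7/2 < 1."""
    if not 0.0 <= gmp_r7_half < 1.0:
        raise ValueError("g_mP*R_7/2 must lie in [0, 1) for a finite positive gain")
    return gmn_r7 / (1.0 - gmp_r7_half)


GMN_R7 = 2.5           # assumption: inferred from 2*g_mN*R_7 being about 5
V_OS1_MV = 50.0        # worst-case offset before cancellation, in mV (from the text)

for corner in (0.5, 0.7):                     # bounds on g_mP*R_7/2 over process corners
    a_d = differential_gain(GMN_R7, corner)
    residual_mv = V_OS1_MV / a_d              # first term of (1); Delta V is not modeled
    print(f"g_mP*R_7/2 = {corner}: A_d = {a_d:.2f}, V_OS1/A_d = {residual_mv:.1f} mV")
```

Under these assumptions the cancelled offset term stays at or below about 10 mV before charge-injection mismatch is added.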

# B. Reference Division and Output Decoding

The floor plan of the first stage layout is illustrated in Fig. 5. As indicated in this figure, the layout is folded at its midpoint so as to reduce the effect of process gradients. In order to conform to this layout, two resistor ladders are used to establish the differential reference voltages for the comparators in the first stage.

The resistor ladders subdivide the range between reference voltages  $V_{\rm REF}^+$  and  $V_{\rm REF}^-$ , generating 128 differential voltages that vary, in steps of 32 LSB, from  $V_{\rm REF}^- - V_{\rm REF}^+$  to zero on the left side of Fig. 5 and from +32 LSB to  $V_{\rm REF}^+ - V_{\rm REF}^-$  on the right. These voltages are stored in their corresponding comparators during offset cancellation. Each ladder is itself folded at its midpoint and consists of 128 polysilicon resistors, each with a width of 10  $\mu$m, a length of 15  $\mu$m, and a sheet resistivity of 20  $\Omega/\Box$ .

An important aspect of the first stage design is the dynamic loading of the FSC's on the reference ladders [6]. With the values chosen for the FSC input capacitance ( $\approx 100$  fF) and the unit resistor of these ladders ( $\approx 30~\Omega$ ), simulations show that by the end of the offset-cancellation mode the ladder tap voltages settle to within a few LSB of their final value. The residual error is well within the range that is corrected digitally following the second-stage conversion.

The two halves of the interstage DAC are driven by the FSC's on the two sides of Fig. 5 and share their output nodes, which connect to the subtractor inputs.

The decoding logic in the first stage consists of one level of two-input NAND gates followed by a ROM. As shown in Fig. 5, the ROM consists of two 6-by-64 sections. These sections use separate sense amplifiers and subsequent logical combinations to estimate the 7 MSB's.

# IV. INTERSTAGE PROCESSING

The interstage processing entails converting the digital estimate generated by the first stage to an analog signal that is subtracted from the main input signal. These operations are performed using switched-capacitor circuit techniques.

# A. D/A Converter

Charge redistribution DAC's have been extensively used in high-resolution CMOS converters [3]-[5], primarily because their output voltage range extends to both rails and their settling time is only weakly related to resolution and linearity. These DAC's are often designed for a binary input and hence employ binary-weighted capacitor arrays. However, in two-step flash ADC's the coarse conversion typically provides a thermometer code output, which suggests the use of a "linear," or segmented, capacitor array for the DAC that can be driven directly by the outputs of the first-stage comparators.

Fig. 6 illustrates a single-ended realization of a linear-array capacitor DAC of the type implemented in a fully differential form in the ADC. This circuit consists of an array of equal capacitors that share the same top plate and have their bottom plates driven through CMOS switches. Switch  $S_P$  discharges the top plate to ground during reset, while at the same time the thermometer code is set to zero. Following reset, a thermometer code of height n (i.e., n ONES) at the input results in an output voltage of

$$V_0 = \frac{n}{N} V_{\text{REF}} \tag{3}$$

where N is the total number of capacitors.
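A minimal behavioral sketch of (3) in Python is given below. It models only ideal charge redistribution onto a shared top plate and ignores parasitics and switch resistance; the function names and the optional per-capacitor values are illustrative assumptions.

```python
def thermometer_code(n, total):
    """Thermometer code of height n: n ONES followed by (total - n) ZEROS."""
    if not 0 <= n <= total:
        raise ValueError("height must lie between 0 and the array size")
    return [1] * n + [0] * (total - n)


def linear_dac_output(code, v_ref, caps=None):
    """Top-plate voltage after the bottom plates selected by `code` switch to v_ref.

    With equal capacitors this reduces to V_0 = (n / N) * V_REF.
    `caps` optionally supplies individual capacitor values to model mismatch.
    """
    if caps is None:
        caps = [1.0] * len(code)
    if len(caps) != len(code):
        raise ValueError("one capacitor value is needed per code bit")
    switched = sum(c for bit, c in zip(code, caps) if bit)
    return v_ref * switched / sum(caps)


# Example: 128 equal capacitors, 32 of them switched to a 5-V reference.
print(linear_dac_output(thermometer_code(32, 128), 5.0))
```

Because each additional ONE can only add a positive capacitance to the switched total, the output of this model never decreases as the code height grows, which is the monotonicity property noted in the text.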

The DAC of Fig. 6 has several advantages over binary-weighted DAC's. First, it avoids the need for thermometer-binary code conversion in the critical path of the ADC, thereby improving the converter's speed. Second, the transfer characteristic of the DAC is guaranteed to be monotonic, and third, the DAC lends itself to simple layout and routing because each capacitor can be included with its corresponding comparator.

In order to achieve a fast settling time in the DAC, relatively wide MOS switches  $[(W/L)_{\rm NMOS} = 20/1, (W/L)_{\rm PMOS} = 40/1]$  are used to drive the bottom plates of the capacitors, and the sizes of the transistors in the first-stage comparator are tapered to drive these switches with minimum delay. Simulations indicate that the DAC settles to an accuracy of 12 b in 8 ns when converting the digital output of the first stage to an analog estimate that is processed by the second stage.

Fig. 6. Linear-array capacitor DAC.

Since errors in the first stage are corrected digitally, the ADC's linearity is determined primarily by the linearity of the DAC. Monte Carlo simulations indicate that the DAC exhibits 12-b linearity if capacitor mismatch remains below 0.1%.
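The kind of Monte Carlo experiment mentioned above can be sketched in a few lines of Python. This is not the authors' simulation: it assumes 128 unit capacitors with independent Gaussian errors, interprets the 0.1% figure as a standard deviation, and measures endpoint-fit INL in units of the 12-b LSB. The function names, trial count, and seed are arbitrary choices.

```python
import random


def dac_inl_lsb(caps, bits=12):
    """Worst-case endpoint-fit INL (in LSB of a `bits`-bit converter) of a linear array."""
    total = sum(caps)
    full_scale = 2 ** bits
    worst = 0.0
    running = 0.0
    for k, c in enumerate(caps, start=1):
        running += c
        actual = full_scale * running / total   # DAC level for a code of height k
        ideal = full_scale * k / len(caps)      # straight line through the endpoints
        worst = max(worst, abs(actual - ideal))
    return worst


def monte_carlo_inl(sigma, trials=200, n_caps=128, seed=1):
    """Largest INL seen over `trials` random arrays with relative std. dev. `sigma`."""
    rng = random.Random(seed)
    return max(
        dac_inl_lsb([rng.gauss(1.0, sigma) for _ in range(n_caps)])
        for _ in range(trials)
    )


print(f"worst INL at 0.1% mismatch: {monte_carlo_inl(0.001):.2f} LSB")
print(f"worst INL at 1.0% mismatch: {monte_carlo_inl(0.01):.2f} LSB")
```

In this simplified model the INL scales linearly with the mismatch, so a tenfold increase in mismatch produces a tenfold increase in the worst-case error.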

# B. Subtractor

Fig. 7 shows a single-ended version of the subtractor along with its interface to the interstage DAC. In the reset mode, the DAC output node D0 is discharged to ground and switches S1 and S3 are on. In the conversion mode, while the DAC produces a 7-b estimate of the analog input at D0, switches S1 and S3 turn off and S2 turns on so that the voltage at node X switches from the analog input to zero. Thus, if capacitors C1, C2, and C3 in Fig. 7 are equal, the subtractor generates an output equal to the difference between the analog input and the DAC output.

Due to the finite conversion times of the first stage and the DAC, the analog voltage at D0 does not settle until approximately 15 ns after  $\Phi_1$  has gone low. During this time, if the subtractor senses the analog input signal it can experience a large input difference, which in turn drives its output toward one of the rails. This results in an additional delay after the DAC output is established because the subtractor output must recover from a near-rail voltage to a small value. To avoid this problem, the subtractor input, node X in Fig. 7, must switch to ground at approximately the same time as D0 begins to change. This is accomplished by driving S1-S3 with a delayed version of  $\Phi_1$ . The delayed version of  $\Phi_1$  is generated by means of a replica circuit that tracks the delays in the first stage converter and the DAC.

The transient response of the subtractor is governed by the performance of the op amp used in its implementation. Since the subtractor output is no larger than 64 LSB ( $\approx 150$  mV), the op-amp linearity, gain, slew rate, and output swing requirements are quite relaxed, allowing its optimization for small-signal bandwidth.

Fig. 8 illustrates the fully differential op amp used in the subtractor. It consists of a folded-cascode amplifier, M1-M10, combined with a common-mode feedback network, M11-M13. The input differential pair employs NMOS transistors to achieve a large transconductance, and hence high speed in driving the input capacitance ( $\approx 3$  pF) of the second (fine quantization) stage. Since this load capacitance tends to overcompensate the op amp, the non-dominant poles at the sources of M3 and M4 have little effect on the phase margin. Also, because the output swings are small, M11 and M12 maintain a constant common-mode voltage at the output without introducing significant nonlinearity.

Fig. 7. DAC-subtractor interface.

Fig. 8. Folded-cascode operational amplifier used in the subtractor.

The subtractor offset voltage, which also appears as an offset in the overall transfer characteristic of the ADC, does not affect the linearity of the converter so long as it does not drive the input to the second stage out of the range covered by the digital correction. Because the subtractor is actually implemented as a fully differential circuit, its offset is primarily the result of charge injection *mismatch* between the switches and the offset of the op amp and is typically less than 5 mV.

Another important aspect of the subtractor design is the matching required of its input capacitors. In a stand-alone subtractor, the mismatch between these capacitors can introduce a substantial error at the output if the inputs must sense large swings. However, for the subtractor used in this A/D converter, the two inputs are strongly correlated: one is a 7-b estimate of the other. Therefore, as the analysis in the Appendix indicates, most of the error caused by capacitor mismatch appears in the overall system gain, with only a small contribution to differential nonlinearity.
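The reasoning can be outlined with a short calculation. This is only a sketch under assumed notation, not the Appendix analysis: let $C_a$ and $C_b$ denote the capacitors that couple the analog input and the DAC output into the subtractor, let $C_f$ be the feedback capacitor, and take $C_f = C_a$ and $C_b = (1 + \epsilon)C_a$. Then

$$V_{\rm out} = \frac{C_a V_{\rm in} - C_b V_{\rm DAC}}{C_f} = (V_{\rm in} - V_{\rm DAC}) - \epsilon V_{\rm DAC}.$$

Writing the residue as $V_{\rm res} = V_{\rm in} - V_{\rm DAC}$, so that $V_{\rm DAC} = V_{\rm in} - V_{\rm res}$, gives

$$V_{\rm out} = (1 + \epsilon)V_{\rm res} - \epsilon V_{\rm in}.$$

The term $-\epsilon V_{\rm in}$ is proportional to the input and therefore acts as a gain error. The remaining error, $\epsilon V_{\rm res}$, is bounded by $\epsilon \times 64$ LSB because the residue never exceeds 64 LSB; for an assumed mismatch of $\epsilon = 0.1\%$ it is about 0.06 LSB.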

In technologies with poly-diffusion capacitors, the bottom plate of capacitor C1 in Fig. 7 is heavily doped, and suffers from substantial nonlinearity in its parasitic bottom-plate capacitance to the substrate. To suppress the effect of this nonlinearity, the bottom plate of C1 is connected to the virtual ground at the op amp input rather than to the node D0.

# V. FINE FLASH STAGE

The fine flash stage digitizes the differential output of the subtractor to encode the six least significant bits of the ADC. This stage consists of 64 comparators, two reference resistor ladders with loading correction, decoding logic, and a ROM. The stage utilizes two clock phases:  $\Phi_1$  to control the offset cancellation and input sensing and  $\Phi_2$  to control the comparison.

The resolution, speed, and input capacitance required in the second stage are strongly correlated with the gain, speed, and output drive capability of the subtractor. While a closed-loop gain of 2 or 4 in the subtractor proportionally eases the resolution required of the second-stage comparators, the gain-bandwidth and gain-linearity trade-offs in the subtractor result in a corresponding degradation in its sampling rate and linearity. For this reason, the subtractor has been configured for a closed-loop gain of unity to achieve high speed and linearity, while the second-stage comparators are designed to resolve inputs as small as 1 LSB of the 12-b ADC.

# A. Second-Stage Comparator

The performance of the second stage is determined primarily by the performance of the second-stage comparators (SSC's). To achieve a high comparison rate with a small input offset, the design shown in Fig. 9 is used. This circuit is similar to that described in [7], but with a preamplifier A1 added. The comparator core, consisting of transconductance amplifiers  $G_{m1}$  and  $G_{m2}$  and their associated load and loop components, constitutes an amplifier followed by a latch, the offsets of both of which are canceled. A CMOS implementation of the comparator core is shown in Fig. 10 [7].

The second-stage comparators must compare the differential output of the subtractor with differential reference voltages that range from 1 to 64 LSB. These reference voltages (i.e.,  $V_R^+ - V_R^-$  in Fig. 9) are stored in the comparators during offset cancellation to establish the comparison levels. For example, a voltage of 64 LSB ( $\approx$  150 mV) is stored in the 64th comparator so that it can compare the subtractor output with a reference corresponding to 64 LSB.

Fig. 9. Second-stage comparator.

Fig. 10. Core circuit of the second-stage comparators.

The comparator of Fig. 9 must detect a 1-LSB difference between the differential input  $V_{\rm in}$  and  $V_R^+ - V_R^-$ , where  $V_R^+ - V_R^-$  ranges from 1 to 64 LSB. The argument presented in the following paragraph suggests that, due to dc coupling at its input, the comparator core of Fig. 10 is not adequate for comparing large differential voltages. The addition of capacitors C3 and C4 and the preamplifier A1 in Fig. 9 mitigates the problem.

To establish the differential comparison level in the comparator of Fig. 10, during offset cancellation a differential reference voltage  $V_{R1} - V_{R2}$  is sensed at the input, amplified, and stored on C1 and C2. Unfortunately, for relatively large reference inputs the amplification may lead to an imbalance at nodes A and B that results in an input-referred offset. For example, with a nominal gain of 5 in the input stage, a differential reference voltage of 150 mV results in approximately a 750-mV difference between the voltages at nodes A and B, causing a finite difference between the small-signal impedances of M5 and M6. This asymmetry, in turn, results in different gains from E and F to A and B, making the circuit sensitive to common-mode effects at E and F. For example, because of the poor common-mode rejection of the pair M7 and M8, any change in the common-mode voltage at nodes E and F due to charge injection from S5-S10 gives rise to a differential error at A and B that contributes to the input-referred offset.

The potential asymmetry in the behavior of the circuit of Fig. 10 can be avoided if the differential reference voltage is instead stored directly on capacitors in series with the circuit's inputs. This can be accomplished without overly large capacitors if a preamplifier is used. In the circuit of Fig. 9, the preamplifier preceding the series storage capacitors, C3 and C4, reduces the comparator input capacitance and, because the effect of charge injection mismatch from S11 and S12 is now divided by the preamplifier gain, allows smaller values for C3 and C4. The preamplifier is implemented as a simple differential pair with diode-connected loads and has a gain of approximately three.

The second-stage comparator dissipates 1.7 mW and exhibits an offset of less than 300  $\mu$ V at a comparison rate of 5 MHz. This offset has been measured for dynamic inputs [7] and thus indicates that the SSC's fully recover from overdrive.

# B. Reference Division and Output Decoding

The differential reference voltages in the second stage are generated using two resistor ladders configured as shown in Fig. 11. The first ladder L1 divides the main reference voltage into 64 equal segments, each 64 LSB wide. The second ladder L2 divides one of these segments by 64, providing differential voltages from 1 to 64 LSB. The segment of L1 that is divided is near that ladder's midpoint so that the common-mode voltage of the differential reference voltages is in the vicinity of half of the supply voltage (if  $V_{\rm REF}^+$  and  $V_{\rm REF}^-$  are equally separated from the supply rails). This common-mode level is consistent with that required by the input stages of the second-stage comparators. Formed from 20- $\Omega$ / $\square$  polysilicon, each unit resistor in L1 and L2 has a width of  $20~\mu m$  and a length of  $40~\mu m$ .

The loading of L2 on L1 introduces a systematic error of approximately 1 LSB. This error can be suppressed by making the total resistance of L2 sufficiently greater than that of one segment of L1 or by canceling the current drawn from L1 by L2. The first method gives rise to excessively long time constants owing to the L2 resistance and the input capacitances of the second stage comparators. The second approach, adopted here, avoids this problem. As shown in Fig. 11, the current cancellation approach corrects the error by injecting currents I1 and I2 at the junction nodes of the two ladders. The magnitudes of these currents are equal to that of the current drawn by L2, i.e.,  $(64 \text{ LSB})/(64R_{u2})$ , where  $R_{u2}$  represents the unit resistance of L2.

Fig. 12 illustrates how currents I1 and I2 are generated. A voltage-to-current converter, which consists of the operational amplifier A1, resistor  $R_s$  (=  $8R_{u2}$ ), and transistor M1, produces a current of (64 LSB)/ $8R_{u2}$ . Transistors M2-M6 divide this current by factors of 8 and -8 to generate I1 and I2. In this circuit, I1 and I2 track  $R_{u2}$ , thus maintaining the loading correction over variations in temperature and resistor values.



Fig. 11. Reference ladder configuration in the second stage.



Fig. 12. Ladder loading correction circuit.

The accuracy of the correction performed in Fig. 12 is not critical because even if I1 and I2 deviate from their ideal values by 10%, the error is still reduced by a factor of 10, i.e., to 0.1 LSB.
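The loading error and its cancellation can be checked with a small nodal calculation. The Python sketch below assumes that L1 and L2 use identical unit resistors (consistent with the common dimensions quoted above), that L2 is connected across a single segment of L1, and that the correction injects a current into the upper junction node and removes the same current from the lower one. Voltages are expressed in LSB and the function name is illustrative.

```python
def l2_span_lsb(current_scale, r_u1=1.0, r_u2=1.0, segments=64, full_scale_lsb=4096.0):
    """Voltage across the L1 segment that feeds L2, in LSB (ideal value: 64 LSB).

    current_scale = 0 means no correction, 1 means ideal correction currents,
    and 0.9 models correction currents that are 10% too small.
    """
    r_rest = (segments - 1) * r_u1                    # L1 resistance outside the tapped segment
    r_l2 = segments * r_u2                            # total resistance of L2
    g_seg = 1.0 / r_u1 + 1.0 / r_l2                   # tapped segment in parallel with L2
    i_ideal = (full_scale_lsb / segments) / r_l2      # (64 LSB) / (64 R_u2)
    i_inj = current_scale * i_ideal
    # Nodal solution: V_seg * (1 + g_seg * r_rest) = V_full_scale + i_inj * r_rest
    return (full_scale_lsb + i_inj * r_rest) / (1.0 + g_seg * r_rest)


for scale in (0.0, 1.0, 0.9):
    error = 64.0 - l2_span_lsb(scale)
    print(f"correction scale {scale}: error = {error:.3f} LSB")
```

With these assumptions the uncorrected error is close to 1 LSB, the ideal correction removes it, and a 10% error in the injected currents leaves roughly 0.1 LSB, in line with the figures quoted in the text.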

The decoding logic in the second stage is essentially the same as that in the first stage, except that the first level is implemented with three-input NAND gates. By sensing three adjacent levels in the thermometer code, these gates can partially correct out-of-order ones and zeros caused by large offsets in the SSC's. With two-input gates, an offset larger than 1 LSB would simultaneously activate two rows in the ROM, producing errors as large as 32 LSB at the output.
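The difference between the two decoding schemes can be illustrated with a behavioral Python sketch. The exact taps sensed by each gate are not specified here, so the sketch assumes that a two-input gate selects row i when level i is ONE and level i+1 is ZERO, while a three-input gate additionally requires level i-1 to be ONE. Function names are hypothetical.

```python
def rows_two_input(code):
    """ROM rows selected when each gate senses levels i and i+1."""
    padded = list(code) + [0]                      # implicit ZERO above the top level
    return [i for i in range(len(code))
            if padded[i] == 1 and padded[i + 1] == 0]


def rows_three_input(code):
    """ROM rows selected when each gate senses levels i-1, i, and i+1."""
    padded = [1] + list(code) + [0]                # implicit ONE below, ZERO above
    return [i for i in range(len(code))
            if padded[i] == 1 and padded[i + 1] == 1 and padded[i + 2] == 0]


clean = [1, 1, 1, 0, 0, 0, 0, 0]
bubble = [1, 1, 1, 0, 1, 0, 0, 0]   # one out-of-order ONE caused by a comparator offset

print(rows_two_input(clean), rows_three_input(clean))
print(rows_two_input(bubble), rows_three_input(bubble))
```

For the code containing an isolated out-of-order ONE, the two-input scheme selects two rows at once, whereas the three-input scheme still selects a single row.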

# VI. REDUNDANCY AND DIGITAL CORRECTION

Redundancy and digital correction are often employed in A/D converters to ease the precision required of their comparators [9], [10]. In the converter described herein, the first-stage comparators are designed for high speed and only moderate resolution, while potentially large errors are accommodated by using one bit of overlap between the two stages. This overlap ensures that the residue of the first stage conversion (the difference between the analog input and the analog reconstruction of the bits encoded by the first stage) falls within the input range of the second stage. In other words, the quantization range of the second stage is expanded to correct for errors in the first stage as large as one-half the least significant bit of that stage.

Fig. 13 shows a typical section of a plot of the residue of the first stage conversion as a function of the input voltage, wherein LSB refers to the least significant bit of the 12-b A/D converter. For a 7-b coarse flash stage with ideal comparators and reference voltages, the output code transitions occur for input voltages that are integer multiples of 32 LSB. The residue would then vary from 0 to a maximum of 32 LSB, and the fine flash stage would need a quantization range of only 32 LSB, i.e., a resolution of 5 b. However, if the *j*th comparator in the coarse stage has an offset of  $\Delta V_j$, the code transition threshold corresponding to a  $V_{\rm in}$  of  $(32 \text{ LSB}) \cdot j$  is shifted by  $\Delta V_j$. As a result, for the case depicted in Fig. 13, where the *j*th and (*j* + 1)th comparators have offsets  $\Delta V_j$  (<0) and  $\Delta V_{j+1}$  (>0), respectively, the residue varies from  $\Delta V_j$  to  $(32 \text{ LSB}) + \Delta V_{j+1}$. Consequently, an input range of 32 LSB in the second stage is not sufficient to digitize the residue.

In this 12-b A/D converter, it is assumed that the offset of the first-stage comparators never exceeds 16 LSB ( $\approx$  40 mV), and the second stage is designed for 6-b resolution in order to digitize residues as large as 64 LSB. Details of this range expansion are shown in Fig. 14, wherein the residue is shifted up by 16 LSB. It is seen that, so long as  $|\Delta V_j|$  and  $|\Delta V_{j+1}|$  are less than 16 LSB, the residue falls within the input range of the second stage.
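The range argument can be expressed as a small Python check. The sketch works in units of the 12-b LSB and takes the 32-LSB coarse step, the 16-LSB upward shift, and the 64-LSB fine range from the text; the function names are illustrative.

```python
def residue_range_lsb(dv_j, dv_j1, shift=16, coarse_step=32):
    """Shifted residue range between the j-th and (j+1)-th coarse thresholds."""
    low = dv_j + shift                   # residue just above the j-th threshold
    high = coarse_step + dv_j1 + shift   # residue just below the (j+1)-th threshold
    return low, high


def fits_fine_stage(dv_j, dv_j1, fine_range=64):
    """True if the shifted residue stays inside the input range of the second stage."""
    low, high = residue_range_lsb(dv_j, dv_j1)
    return low >= 0 and high <= fine_range


print(residue_range_lsb(0, 0))      # ideal comparators: residue centered in the fine range
print(fits_fine_stage(-15, 15))     # offsets within the 16-LSB limit
print(fits_fine_stage(-20, 0))      # offset beyond the limit
```

With ideal comparators the shifted residue occupies the middle half of the fine range, leaving 16 LSB of margin on each side for first-stage offsets.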

Fig. 15 illustrates the implementation of the overlap in a single-ended version of the ADC. In order to shift the residue *up* by 16 LSB, the analog estimate produced by the DAC is shifted *down* by this amount. This is accomplished by making the first capacitor in the DAC, C1 in Fig. 15, half the size of the others in the array. However, unless corrected for, this capacitor arrangement would introduce a positive gain error of approximately 32 LSB in the DAC. To cancel this error, the input capacitance of the subtractor, C2 in Fig. 15, is chosen to be equal to C1 in the array.

The digital outputs of the two ADC stages are combined to obtain the final digital representation of the analog input. If the digital output of the first stage is denoted by X1 and represented by a binary number  $a_{12} \cdots a_6$, then its value normalized to the LSB of the 12-b system can be expressed as

$$X1 = a_{12}2^{11} + a_{11}2^{10} + \cdots + a_62^5. \tag{4}$$

Similarly, the digital output of the second stage, X2, can be represented by a binary number  $b_5 \cdots b_0$  and expanded as

$$X2 = b_5 2^5 + b_4 2^4 + \cdots + b_0 2^0. \tag{5}$$



Fig. 13. Typical section of a plot of the residue versus input voltage.



Fig. 14. Range expansion in A/D converter.



Fig. 15. Implementation of range overlap between two stages.

Because the DAC output in the circuit of Fig. 15 is shifted down by 16 LSB, X1 is shifted down by the same amount. The resulting number is subsequently added to X2. Thus, the final output Y is

$$Y = (a_{12}2^{11} + a_{11}2^{10} + \cdots + a_62^5) - 2^4 + (b_52^5 + b_42^4 + \cdots + b_02^0). \tag{6}$$

This equation indicates that the digital correction algorithm consists of the following steps: 1) shift  $a_{12} \cdots a_6$  to the left by five bits, 2) subtract the binary number 10000 (i.e., $2^4$) from the result of step 1, and 3) add  $b_5 \cdots b_0$  to the difference obtained in step 2.

The above algorithm fails if the analog input is less than 32 LSB because in this case, as shown in Fig. 15, the output of the DAC remains at zero, i.e., it is not shifted down. Thus, when X1 is zero, the subtraction step in the correction algorithm must be bypassed. This can be accomplished using a few gates and multiplexers.
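The three steps, together with the bypass for X1 = 0, can be sketched in a few lines. The function name and the range checks are illustrative and are not part of the paper; the hardware described above uses gates and multiplexers rather than software.

```python
def combine_codes(x1, x2):
    """Combine the 7-b first-stage code x1 and the 6-b second-stage code x2.

    Follows (6): shift x1 left by five bits, subtract binary 10000 (16),
    then add x2. The subtraction is bypassed when x1 is zero, because the
    DAC output is not shifted down in that case.
    """
    if not 0 <= x1 < 128:
        raise ValueError("x1 must be a 7-b code")
    if not 0 <= x2 < 64:
        raise ValueError("x2 must be a 6-b code")
    shifted = x1 << 5            # step 1: weight the coarse code by 2**5
    if x1 != 0:
        shifted -= 0b10000       # step 2: undo the 16-LSB downward shift
    return shifted + x2          # step 3: add the fine code
```

For example, `combine_codes(3, 32)` evaluates 96 - 16 + 32, while `combine_codes(0, 5)` skips the subtraction and returns the fine code unchanged.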

# VII. FLOOR PLAN AND LAYOUT CONSIDERATIONS

A large mixed analog/digital system such as an A/D converter requires careful floor planning and often iteration between design and layout. On a large chip, long interconnects lower the speed, substrate noise and crosstalk degrade the resolution, and process gradients impair the linearity. Floor planning must take all of these concerns into account.

Fig. 16 shows the ADC floor plan. Most of the individual cells employ symmetric layouts to improve matching and reduce common-mode noise. The most critical sections of the converter are those that influence its integral and differential nonlinearity: namely, the DAC and the second-stage comparators. As a result, the floor plan is dictated primarily by the local and global layout considerations of these cells.

The top-plate parasitics of the DAC capacitors introduce a gain error that has been suppressed using the structure shown in Fig. 17. In this structure a poly-diffusion capacitor is covered with a layer of metal electrically connected to the bottom plate. This technique ensures that all of the electric field lines emanating from the top plate terminate, directly or indirectly, on the bottom plate; hence, there is negligible capacitance from the top plate to any other point.

In the second stage, each comparator is surrounded by guard rings to minimize substrate coupling. The second-stage comparators are laid out in a linear array so that the subtractor outputs connect immediately to the inputs of the SSC's and crosstalk from other lines is avoided.

In order to isolate critical supply lines from transient noise generated in various sections of the circuit, the chip employs four different types of power-supply and ground buses: analog supply lines for the most sensitive parts, semi-analog lines for the first-stage comparators and all of the substrate contacts, digital lines for digital sections, and output supply lines for output pad drivers. Moreover, some of these buses, as well as the  $\pm V_{\rm REF}$  lines, utilize more than one pad and bond wire to reduce the equivalent series inductance of the packaging.

# VIII. EXPERIMENTAL RESULTS

The A/D converter has been fabricated in a 1-µm CMOS process with poly-to-diffusion capacitors [11]. Fig. 18 is a die photo of the prototype. The first stage is seen on the left, the subtractor at the bottom, and the second stage on the right.



Fig. 16. ADC floor plan.



Fig. 17. Poly-diffusion capacitor with top-plate shield.



Fig. 18. Prototype die photo.

The circuit's performance has been evaluated by applying differential sine waves of varying amplitudes at the analog input, acquiring the digital outputs, and transferring the outputs to a workstation for analysis and characterization.

Fig. 19. ADC differential nonlinearity at 5-MHz sampling frequency.

Fig. 20. Signal-to-(noise + distortion) ratio of ADC as a function of input level.

The prototype was first characterized with a code density test [12], wherein a large number of digital outputs obtained in response to a sinusoidal input are collected to determine their relative frequency of occurrence. The resulting code density histogram for a 5-MHz sampling rate and a 5-kHz sinusoidal input with full-scale amplitude indicates that all of the codes occur with reasonable probability and hence that the ADC achieves a resolution of 12 b. The histogram can be normalized with respect to an ideal one to yield a plot of the differential nonlinearity [12]. Such a plot is shown in Fig. 19, exhibiting a peak DNL of 0.7 LSB.
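The normalization step can be illustrated with a small simulation. The sketch below is a generic sine-wave histogram calculation, not the authors' test software. It assumes the sine amplitude and offset are known exactly (in practice they are usually estimated from the histogram itself), uses a hypothetical 6-b quantizer with one deliberately misplaced transition, and excludes the two end codes because they absorb clipped samples.

```python
import bisect
import math


def sine_cdf(v, amplitude, offset):
    """Fraction of samples of offset + amplitude*sin(phase) that fall below v."""
    u = max(-1.0, min(1.0, (v - offset) / amplitude))
    return 0.5 + math.asin(u) / math.pi


def code_density_dnl(codes, n_codes, amplitude, offset):
    """Return {code: DNL in LSB} for the interior codes 1 .. n_codes-2."""
    counts = [0] * n_codes
    for c in codes:
        counts[c] += 1
    total = len(codes)
    dnl = {}
    for k in range(1, n_codes - 1):
        # An ideal code k spans [k, k+1) LSB; its expected count follows
        # from the arcsine distribution of a sampled sine wave.
        p_ideal = sine_cdf(k + 1, amplitude, offset) - sine_cdf(k, amplitude, offset)
        dnl[k] = counts[k] / (total * p_ideal) - 1.0
    return dnl


def simulate_codes(transitions, amplitude, offset, n_samples):
    """Quantize one sine period sampled at n_samples uniformly spaced phases."""
    codes = []
    for i in range(n_samples):
        v = offset + amplitude * math.sin(2.0 * math.pi * (i + 0.5) / n_samples)
        # The output code is the number of transition levels at or below v.
        codes.append(bisect.bisect_right(transitions, v))
    return codes


def demo():
    n_codes = 64
    transitions = [float(k) for k in range(1, n_codes)]
    transitions[20] += 0.25   # widen code 20 and narrow code 21 by 0.25 LSB
    codes = simulate_codes(transitions, amplitude=33.0, offset=32.0, n_samples=100_000)
    return code_density_dnl(codes, n_codes, amplitude=33.0, offset=32.0)
```

In this toy case the recovered DNL is close to +0.25 LSB for code 20, close to -0.25 LSB for code 21, and near zero elsewhere. The sine slightly overdrives the quantizer so that every code is exercised.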

As noted in [12], code density information does not provide a reliable basis for evaluating a converter's integral nonlinearity. In general, the signal-to-(noise + distortion) ratio (SNDR) is a better measure of such nonlinearity. Therefore, the sinusoidal minimum error method of analysis [13] in the simulator MIDAS [14] was used to determine the SNDR of the experimental prototypes from the digital outputs acquired in response to sinusoidal inputs. Fig. 20 shows a plot of the SNDR as a function of the input amplitude, as measured for a 5-kHz input frequency and a 5-MHz sampling rate. In this plot, an input level of 0 dB represents a full-scale amplitude, i.e., a differential peak-to-peak amplitude of 10 V. The peak SNDR is 65 dB.
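One common way to obtain an SNDR figure from sampled data is to fit a sinusoid of known frequency to the record in the least-squares sense and treat everything left over as noise plus distortion. The sketch below illustrates that idea only. It is not the MIDAS implementation, and the function names, the assumption of an exactly known input frequency, and the three-parameter (cosine, sine, offset) model are all choices made for this example.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def sine_fit_sndr_db(samples, cycles_per_sample):
    """Fit A*cos + B*sin + C at a known frequency and return the SNDR in dB."""
    w = 2.0 * math.pi * cycles_per_sample
    basis = [(math.cos(w * i), math.sin(w * i), 1.0) for i in range(len(samples))]
    # Normal equations of the linear least-squares problem.
    m = [[sum(row[j] * row[k] for row in basis) for k in range(3)] for j in range(3)]
    v = [sum(row[j] * y for row, y in zip(basis, samples)) for j in range(3)]
    a, b, c = solve3(m, v)
    err_power = sum(
        (y - (a * row[0] + b * row[1] + c)) ** 2 for row, y in zip(basis, samples)
    ) / len(samples)
    sig_power = (a * a + b * b) / 2.0
    return 10.0 * math.log10(sig_power / err_power)
```

Applied to an ideal 12-b quantizer driven by a full-scale sine, this estimate lands near the familiar 6.02 x 12 + 1.76 = 74 dB limit, which gives a reference point for reading Fig. 20.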

The lack of an on-chip sample-and-hold circuit has precluded testing the stand-alone ADC for high-frequency analog inputs. Because a high-speed 12-b sample-and-hold amplifier with fully differential outputs was not available, measurement of the ADC's dynamic performance has not yet been possible. Nonetheless, the second-stage comparators, which bear the high-precision requirements of the system, have been tested and shown to achieve 12-b resolution for rapidly changing analog inputs [7].

Table I summarizes the converter's performance.

TABLE I. PERFORMANCE OF A/D CONVERTER

| Parameter              | Value                                  |
|------------------------|----------------------------------------|
| Differential Linearity | 12 b                                   |
| Conversion Rate        | 5 MHz                                  |
| Peak SNDR              | 65 dB                                  |
| Input Range            | 5 V                                    |
| Power                  | 200 mW                                 |
| Power Supply           | 5 V                                    |
| Input Capacitance      | 15 pF                                  |
| Active Area            | $1.2 \text{ mm} \times 3.0 \text{ mm}$ |
| Technology             | 1-µm CMOS                              |

# IX. CONCLUSION

The design of a 12-b, 5-Msample/s CMOS A/D converter has been described. Configured as a fully differential, two-step architecture, the converter employs precision comparison and D/A conversion techniques to avoid the need for operational amplifiers with high gain or large voltage swings. In particular, the interstage subtractor is implemented with a closed-loop gain of one to maximize its speed, while the comparators in the second stage are designed to resolve voltages as small as 1 LSB of the overall converter.

One bit of overlap between the two stages, along with digital correction, relaxes the precision required of the first-stage comparators and hence allows high speed in the coarse conversion.

Fabricated in a 1-µm CMOS technology, an experimental prototype dissipates 200 mW from a single 5-V supply, has a rail-to-rail input range, and occupies an active area of 1.2 mm $\times$ 3.0 mm.

# APPENDIX

Analysis of Nonlinearity in the Subtractor

For simplicity, the single-ended version of the subtractor circuit shown in Fig. 21 will be analyzed. In this figure, $C_T$ represents the top-plate parasitics of the capacitors connected to node A, and $C_B$ represents the bottom-plate parasitics of the capacitors connected to node B. To determine the error resulting only from the subtractor, it is assumed that redundancy and digital correction eliminate the errors associated with the first stage, so that it behaves ideally, and that the DAC capacitors match perfectly. As explained in Section VI, the first capacitor in the DAC array is half the size of those in the rest of the array. If n out of N = 128 comparators in the first stage produce ONES at their outputs, the DAC-subtractor circuit can be represented by the simplified equivalent circuit shown in Fig. 22. Upon replacing the equivalent subtractor input circuit, denoted by the dotted box in Fig. 22, with its Thevenin equivalent, the circuit in Fig. 23 is obtained. In this circuit,

$$C_{\text{THEV}} = \frac{[(2N-1)C + C_T](C + \Delta C_1)}{2NC + C_T + \Delta C_1} \tag{7}$$

and

$$V_{\text{THEV}} = \frac{(2n-1)C}{(2N-1)C + C_T} V_{\text{REF}}. \tag{8}$$

Fig. 21. Single-ended version of DAC-subtractor interface.

Fig. 22. Simplified version of Fig. 21.

Fig. 23. Thevenin-equivalent circuit substitution in Fig. 22.

If  $A_0$  is the open-loop gain of the operational amplifier in Fig. 23, then the output voltage of the circuit is

$$V_0 = -\left(\frac{C + \Delta C_3}{C_D} V_i + \frac{C_{\text{THEV}}}{C_D} V_{\text{THEV}}\right) \tag{9}$$

where

$$C_D = C + \Delta C_5 + \frac{1}{A_0} (C + \Delta C_3 + C_{\text{THEV}} + C + \Delta C_5 + C_B). \tag{10}$$

Since typically  $\Delta C_1 \ll C$  and  $C_T \ll (2N-1)C$ , the expression for  $C_{\text{THEV}}$  can be simplified as

$$C_{\text{THEV}} \approx \frac{2N - 1}{2N} C \left( 1 + \frac{C_T}{(2N - 1)C} \right) \left( 1 + \frac{\Delta C_1}{C} \right) \left( 1 - \frac{\Delta C_1 + C_T}{2NC} \right) \tag{11}$$

$$\approx C \left( 1 + \frac{\Delta C_1}{C} \right) \tag{12}$$

where it is assumed that $2N \gg 1$. Furthermore, from (7) and (8), $C_{\text{THEV}} V_{\text{THEV}}$ can be written as

$$C_{\text{THEV}}V_{\text{THEV}} = \frac{(2n-1)C(C+\Delta C_1)}{2NC+C_T+\Delta C_1}V_{\text{REF}} \tag{13}$$

$$\approx \frac{2n-1}{2N} C \left(1 + \frac{\Delta C_1}{C}\right) V_{\text{REF}}. \tag{14}$$

If the approximations of (12) and (14) are substituted in (9), it follows that if  $\Delta C_1 + \Delta C_3 + \Delta C_5 + C_T/2N \ll 3C$ , then

$$V_0 \approx -\frac{\left(1 + \frac{\Delta C_3}{C}\right)V_i + \frac{2n-1}{2N}\left(1 + \frac{\Delta C_1}{C}\right)V_{\text{REF}}}{1 + \frac{\Delta C_5}{C} + \frac{1}{A_0}\left(3 + \frac{C_B}{C}\right)}. \tag{15}$$

Because typically $\Delta C_5/C + (3 + C_B/C)/A_0 \ll 1$, (15) reduces to

$$V_0 \approx -(1+a)V_i - \frac{2n-1}{2N}(1+b)V_{\text{REF}} \tag{16}$$

where

$$a = \frac{\Delta C_3 - \Delta C_5}{C} - \frac{1}{A_0} \left( 3 + \frac{C_B}{C} \right) \tag{17}$$

$$b = \frac{\Delta C_1 - \Delta C_5}{C} - \frac{1}{A_0} \left( 3 + \frac{C_B}{C} \right). \tag{18}$$

In the ideal case, $\Delta C_j = 0$ and $A_0 \rightarrow \infty$. Then, $a = b = 0$ and

$$V_0 = -V_i - \frac{2n-1}{2N} V_{\text{REF}}. \tag{19}$$

The output voltage given by (16) is digitized by the second stage and the result is added to the digital output of the first stage. This is equivalent to adding $[(2n - 1)/2N] V_{\text{REF}}$ to (16), yielding a final output of

$$V_{oF} = -(1 + a)V_i - \frac{2n - 1}{2N}bV_{\text{REF}}. \tag{20}$$

This relationship shows that both gain error and differential nonlinearity are present in the subtractor output voltage. Passing a straight line through the end points of the characteristics indicates that the gain error is $a - b$, while the maximum differential nonlinearity is $bV_{\text{REF}}/N$. Thus, the nonlinearity introduced by the subtractor is

$$\text{DNL}_{\text{sub}} = \left[ \frac{\Delta C_1 - \Delta C_5}{C} - \frac{1}{A_0} \left( 3 + \frac{C_B}{C} \right) \right] \frac{V_{\text{REF}}}{N}. \tag{21}$$

For $N = 128$ and typical values of $\Delta C_1 \approx \Delta C_3 \approx \Delta C_5 \approx 0.002C$, $A_0 = 1000$, and $C_B = 0.2C$, the maximum nonlinearity is

$$\text{DNL}_{\text{sub}} \approx 0.22 \text{ LSB} \tag{22}$$

while the maximum gain error is

$$\Delta A_v \approx 8 \text{ LSB}. \tag{23}$$

This gain error is small enough to be corrected with one bit of overlap between the two stages of the A/D converter.
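As a numerical sanity check on the chain of approximations, the exact expressions (7)-(10) can be compared with the first-order result (16)-(18). The sketch below does this under stated assumptions: capacitances are normalized to $C = 1$, $V_{\text{REF}}$ is normalized to 1 and taken to span 4096 LSB, the signs chosen for the mismatches $\Delta C_j$ are arbitrary, and the top-plate parasitic $C_T$ is a free parameter whose values here are hypothetical. None of the function names come from the paper.

```python
def exact_output(v_i, n, p):
    """Subtractor output from (7)-(10), without approximation."""
    c, big_n = p["C"], p["N"]
    c_thev = ((2 * big_n - 1) * c + p["C_T"]) * (c + p["dC1"]) / (
        2 * big_n * c + p["C_T"] + p["dC1"]
    )
    v_thev = (2 * n - 1) * c / ((2 * big_n - 1) * c + p["C_T"]) * p["V_REF"]
    c_d = c + p["dC5"] + (c + p["dC3"] + c_thev + c + p["dC5"] + p["C_B"]) / p["A0"]
    return -((c + p["dC3"]) / c_d * v_i + c_thev / c_d * v_thev)


def approx_output(v_i, n, p):
    """First-order subtractor output from (16)-(18)."""
    c = p["C"]
    common = (3.0 + p["C_B"] / c) / p["A0"]
    a = (p["dC3"] - p["dC5"]) / c - common
    b = (p["dC1"] - p["dC5"]) / c - common
    return -(1.0 + a) * v_i - (2 * n - 1) / (2 * p["N"]) * (1.0 + b) * p["V_REF"]


def max_error_lsb(c_t=0.0):
    """Largest |exact - approximate| output, in LSB, over n = 1 .. N-1."""
    p = {
        "N": 128, "C": 1.0, "V_REF": 1.0, "A0": 1000.0, "C_B": 0.2,
        "dC1": 0.002, "dC3": 0.002, "dC5": -0.002,   # assumed mismatch signs
        "C_T": c_t,                                   # hypothetical parasitic
    }
    lsb = p["V_REF"] / 4096.0
    worst = 0.0
    for n in range(1, p["N"]):
        v_i = -n / p["N"] * p["V_REF"]   # input at the start of the n-th coarse step
        worst = max(worst, abs(exact_output(v_i, n, p) - approx_output(v_i, n, p)))
    return worst / lsb
```

For these values and $C_T = 0$ the two expressions agree to within a small fraction of an LSB. Increasing $C_T$ enlarges the discrepancy, because $C_T$ is dropped from the denominator in going from (13) to (14).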

# ACKNOWLEDGEMENT

The authors wish to thank National Semiconductor Corporation for its support of this research, including fabrication of the experimental prototype. They are especially indebted to L. Stoian and S. Chin for their invaluable advice, support, and encouragement. They also gratefully acknowledge numerous contributions by Dr. B. Brandt.

# REFERENCES

- [1] D. A. Kerth, N. S. Sooch, and E. J. Swanson, "A 12-bit 1-MHz two-step flash ADC," *IEEE J. Solid-State Circuits*, vol. 24, pp. 250-255, Apr. 1989.
- [2] J. Doernberg, P. R. Gray, and D. A. Hodges, "A 10-bit 5-Msample/s CMOS two-step flash ADC," *IEEE J. Solid-State Circuits*, vol. 24, pp. 241-249, Apr. 1989.
- [3] J. L. McCreary and P. R. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques—Part I," *IEEE J. Solid-State Circuits*, vol. SC-10, pp. 371-379, Dec. 1975.
- [4] R. E. Suarez, P. R. Gray, and D. A. Hodges, "All-MOS charge redistribution analog-to-digital conversion techniques—Part II," *IEEE J. Solid-State Circuits*, vol. SC-10, pp. 379-385, Dec. 1975.
- [5] B. S. Song, S. H. Lee, and M. F. Tompsett, "A 10-b 15-MHz CMOS recycling two-step A/D converter," *IEEE J. Solid-State Circuits*, vol. 25, pp. 1328-1338, Dec. 1990.
- [6] A. G. F. Dingwall, "Monolithic expandable 6 bit 20 MHz CMOS/SOS A/D converter," *IEEE J. Solid-State Circuits*, vol. SC-14, pp. 926-932, Dec. 1979.
- [7] B. Razavi and B. A. Wooley, "Design techniques for high-speed, high-resolution comparators," this issue, pp. 1916-1926.
- [8] K. Tsugaru et al., "A 10-bit 40MHz ADC using 0.8 µm BiCMOS technology," in *Proc. Bipolar Circuits and Technol. Meet.*, Sept. 1989, pp. 48-51.
- [9] S. H. Lewis and P. R. Gray, "A pipelined 5-Msample/s 9-bit analog-to-digital converter," *IEEE J. Solid-State Circuits*, vol. SC-22, pp. 954-961, Dec. 1987.
- [10] O. A. Horna, "A 150 Mbps A/D and D/A conversion system," *COMSAT Tech. Rev.*, vol. 2, pp. 39-72, Spring 1972.
- [11] T-I. Liou et al., "A single-poly CMOS process merging analog capacitors, bipolar and EPROM devices," in *Proc. VLSI Tech. Symp.*, May 1989, pp. 37-38.
- [12] J. Doernberg et al., "Full-speed testing of A/D converters," *IEEE J. Solid-State Circuits*, vol. SC-19, pp. 820-827, Dec. 1984.
- [13] B. Boser et al., "Simulating and testing oversampled analog-to-digital converters," *IEEE Trans. Computer-Aided Design*, vol. 7, pp. 668-674, June 1988.
- [14] L. Williams et al., "MIDAS User Manual, Version 2.0," Integrated Circuits Lab., Stanford Univ., Stanford, CA, Aug. 1989.



Behzad Razavi (S'87-M'91) received the B.Sc. degree in electrical engineering from Tehran University of Technology, Tehran, Iran, in 1985, and the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988 and 1991, respectively.

He worked at Tektronix, Inc., Beaverton, OR, during the summer of 1988 on the design of high-speed data acquisition systems, and was a Research Assistant at the Center for Integrated Systems, Stanford University, from 1988 to 1991.

Since December 1991 he has been a Member of the Technical Staff at AT&T Bell Laboratories, Holmdel, NJ, where he is involved in integrated circuit design in emerging technologies. His current interests include data acquisition systems, clock recovery circuits, low-voltage techniques, and lightwave communication circuits.



Bruce A. Wooley (S'64-M'70-SM'70-F'82) was born in Milwaukee, WI, on October 14, 1943. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of California, Berkeley, in 1966, 1968, and 1970, respectively.

From 1970 to 1984 he was a member of the research staff at Bell Laboratories, Holmdel, NJ. In 1980 he was a Visiting Lecturer at the University of California, Berkeley. Since 1984 he has been a Professor of electrical engineering at Stanford University, Stanford, CA. His research is in the field of integrated circuit design and technology where his interests have included monolithic broadband amplifier design, circuit architectures for high-speed arithmetic, analog-to-digital conversion, digital filtering, high-speed memory design, high-performance packaging and test systems, and high-speed instrumentation interfaces.

Prof. Wooley was the Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS from 1986 to 1989. He was the Program Chairman of the 1990 Symposium on VLSI Circuits and the Co-Chairman of the 1991 Symposium on VLSI Circuits. He was the Chairman of the 1981 International Solid-State Circuits Conference, and is a former Chairman of the IEEE Solid-State Circuits and Technology Committee. He has also served on the IEEE Solid-State Circuits Council and the IEEE Circuits and Systems Society Ad Com. In 1986 he was a member of the NSF-sponsored JTECH Panel on Telecommunications Technology in Japan. He is a member of Sigma Xi, Tau Beta Pi, and Eta Kappa Nu. In 1966 he was awarded the University Medal by the University of California, Berkeley, and he was the IEEE Fortescue Fellow for 1966–1967.