## Lawrence Berkeley National Laboratory

**Recent Work** 

## Title

Toward a 62.5 MHz Analog Virtual Pipeline Integrated Data Acquisition System

Permalink https://escholarship.org/uc/item/2rd021vp

**Journal** Nuclear physics B, 23A

## Authors

Kleinfelder, S.A. Levi, M. Milgrome, O.

Publication Date 1990-09-01



# **Physics Division**

Submitted to Nuclear Physics B

Toward a 62.5 MHz Analog Virtual Pipeline Integrated Data Acquisition System

S.A. Kleinfelder, M. Levi, and O. Milgrome

September 1990



#### DISCLAIMER

This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor the Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or the Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or the Regents of the University of California.


#### LBL # 29608

# TOWARD A 62.5 MHZ ANALOG VIRTUAL PIPELINE INTEGRATED DATA ACQUISITION SYSTEM

### to be published in *Nuclear Physics B*

## Stuart A. Kleinfelder, Michael Levi, and Oren Milgrome

Physics Division Lawrence Berkeley Laboratory University of California Berkeley, California 94720

This work was supported by the United States Department of Energy, contract number DE-AC03-76SF00098.


#### TOWARD A 62.5 MHz ANALOG VIRTUAL PIPELINE INTEGRATED DATA ACQUISITION SYSTEM

#### Stuart A. KLEINFELDER, Michael LEVI, Oren MILGROME

University of California, Lawrence Berkeley Laboratory, Berkeley, CA, 94720, USA

Requirements of analog pipeline memories at the SSC are reviewed and the concept of virtual pipelines is introduced. Design details and test results of several new custom analog and digital integrated circuits implementing sections of the virtual multiple pipeline (VMP) scheme are provided. These include serial, random access and simultaneous read and write random access analog storage and retrieval circuits, a 100 MHz systolic variable depth digital pipeline, and a prototype $32\,\mu s$, 12 bit serial analog to digital converter.

#### 1. Introduction

Detectors at the Superconducting Super Collider will face many difficult instrumentation issues. The large number of data channels ($10^7$) and high data rate ($6 \times 10^7$ words per second per channel) require the use of data reducing electronics, perhaps located within the detector. Several levels of trigger decisions, with long delays relative to the time between interactions, are needed to facilitate data reduction. This necessitates the use of deeply pipelined storage systems. The first level (level one) trigger decision is expected to take 1 to 2 microseconds, leading to a first level data buffer on every channel that is 64 to 128 elements deep. The second level trigger takes much longer. Additional levels of data pipelining are needed to buffer the readout of closely spaced events. The research reported on here is aimed at providing the compact, fast, low power, high dynamic range multi-level buffering needed by several detector sub-systems.
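The quoted buffer depths follow from dividing the trigger latency by the 16 ns beam crossing period. The short sketch below reproduces that arithmetic. The function names are illustrative, and rounding the depth up to a power of two is an assumption made here to match the 64 and 128 figures quoted above (and a seven bit address in the 128 case); the text does not state how the depths were chosen.

```python
CROSSING_PERIOD_NS = 16  # 62.5 MHz beam crossing clock


def buffer_depth_cells(trigger_latency_ns, period_ns=CROSSING_PERIOD_NS):
    """Samples that must be held while one trigger decision is pending (ceiling division)."""
    return -(-trigger_latency_ns // period_ns)


def next_power_of_two(n):
    """Smallest power of two >= n (assumed rounding, convenient for binary addressing)."""
    return 1 << (n - 1).bit_length()


for latency_ns in (1000, 2000):
    cells = buffer_depth_cells(latency_ns)
    print(latency_ns, "ns ->", cells, "cells ->", next_power_of_two(cells), "deep")
```

A 1 microsecond latency needs 63 cells and a 2 microsecond latency needs 125, which round up to the 64 and 128 element buffers mentioned in the text.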

A conceptually simple architecture that satisfies these requirements uses several physically distinct data memories implemented as fixed and variable depth clocked delay lines. The first level data buffer is of fixed depth corresponding to the level one trigger delay. A level one accept signal from the first level trigger stimulates the transfer of data from the first level data buffer to a variable depth level two buffer (otherwise, data is discarded). A subsequent level two trigger decision would cause discarding of data from the level two buffer, or transfer to a variable depth readout buffer. Finally, the readout system would take data from the last buffer. In this scenario, data may be analog or digital. If the storage is analog, the problem of maintaining data integrity during transfers between the level one and two buffers within the allowed 16 ns is acute. In addition, the need for variable delay between load and readout is problematic. These problems are reduced if the data is digital, but digital buffering pushes the analog to digital conversion toward the front of the system where rates are highest. The cost, power and space required by a highly digital system are probably impermissible if high dynamic range is needed.

Early work [1] demonstrated that a clocked analog delay line of 128 to 256 depth with 12 bit dynamic range at 62 MHz with 10 mW per channel can be achieved. Present efforts are aimed at increasing flexibility so that SSC triggering and digitization throughput requirements are conveniently satisfied. The concept of virtual pipelining is being developed to evade the problem of high speed analog signal transfers between physically distinct buffers. Much as a computer can use one physically contiguous memory to store several logically disjoint sets of information, the virtual multiple pipeline (VMP) scheme simulates physically distinct variable length pipelines within a single physical analog memory. Stored signals are never moved until the final readout takes place, which preserves the highest possible signal integrity. Because readout rates at the SSC are modest (approximately 1 kHz), digitization during readout is undemanding, and dense, high resolution, low power integrated digitization is feasible.

Toward this goal, several prototype integrated circuits have been fabricated. The first, with which much experience has been gained (over 30,000 channels have been fabricated), is the "SCA" Switched Capacitor Array serial access analog memory. This device was produced in quantity for the EOS TPC [2] at LBL and is not configured for the SSC, but it does yield early baseline performance figures. Two newer prototypes aimed at the SSC have been produced. Both are random access versions, one with multiplexed read and write and one allowing simultaneous read and write. Simultaneous read and write allows readout to occur without stopping data acquisition, so a deadtimeless system can be made. Several additional integrated circuits are being produced, ranging from improved operational amplifiers for readout and a fast asynchronous digital FIFO to a low power, high dynamic range analog to digital converter.

#### 2. The Switched Capacitor Array I.C.

The original SCA includes 4096 sample and hold cells organized as 16 channels, each with 256 sample storage cells. Signal acquisition and readout are sequential and non-simultaneous. The maximum acquisition sample rate tested was 100 MHz. Readout is analog, with on chip multiplexing and buffering. Dynamic range, uncorrected for cell to cell baseline differences, is about 10 bits (1024:1). With corrections, it reaches 13 bits (8192:1). For unipolar signals and operating with a baseline offset, linearity is 1% over the quoted dynamic range. Two versions have been fabricated, differing only in analog input bandwidth. One version has a 10 MHz bandwidth, and the other approximately a 50 MHz bandwidth. The input capacitance of the former is 7.5 pF, and that of the latter about 20 pF. Power consumption is 10 mW per channel at 7 volts, and is almost insensitive to operating speed.
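The correction for cell to cell baseline differences is not described in detail here. A minimal sketch of one common approach, per-cell pedestal subtraction, is given below as an illustration only: the function names and the averaging of several no-signal readouts are assumptions, not the procedure actually used with the SCA.

```python
def measure_pedestals(baseline_frames):
    """Estimate each cell's baseline by averaging several readouts taken with no input signal."""
    n_frames = len(baseline_frames)
    n_cells = len(baseline_frames[0])
    return [sum(frame[i] for frame in baseline_frames) / n_frames for i in range(n_cells)]


def subtract_pedestals(samples, pedestals):
    """Remove the fixed cell to cell baseline pattern from one readout of a channel."""
    return [s - p for s, p in zip(samples, pedestals)]


# Hypothetical two-cell channel, values in mV.
pedestals = measure_pedestals([[1.0, 3.0], [3.0, 5.0]])
corrected = subtract_pedestals([12.0, 4.0], pedestals)
print(pedestals, corrected)
```

After subtraction, only the random (non-repeating) part of the cell variation limits the dynamic range, which is why the corrected figure is higher than the uncorrected one.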

Improvements sought for the SSC version include simultaneous read and write, random access for both read and write, high analog bandwidth, improved baseline uniformity, better linearity and readout speed, and on-chip analog to digital conversion. Simultaneous read and write can permit deadtimeless acquisition by eliminating the need to multiplex access to the storage array. Random access to array elements is one of the cornerstones of the virtual pipeline implementation. With random access capability, samples can be saved in random locations, skipped over during subsequent acquisition, and then read out directly. No motion within or between storage arrays is necessary. Higher analog bandwidth was sought primarily to achieve fast settling to 12 bits within 8 ns (worst case sampling window) regardless of the previous contents of the sample cell array. Improved baseline uniformity is sought to maximize dynamic range without the need to correct for cell to cell differences. A higher performance readout amplifier was included to improve linearity and settling time. On chip analog to digital conversion is being incorporated to reduce external component count, speed readout, and eliminate the need to rapidly transmit analog signals externally with high dynamic range.

Figure 1: SCA output in response to a brief 6 V pulse; 10 samples out of 128 are shown. Note that correct operation to the (6 V) rails is achieved.

The first SSC prototype circuit included the random access feature in a non-simultaneous read/write architecture. The main goals were to evaluate the performance of the random access decoder circuit, increase analog bandwidth, and to attempt to eliminate all systematic cell to cell baseline differences. The circuit is shown in Fig. 2.

The random access decoder circuit uses fairly conventional dynamic NOR logic. The circuit starts with a seven bit address and two clock lines, yielding a non-overlapping one-of-128 selection. An access cycle consists of three stages: precharging all 128 output rows, evaluation of the selected output row, and actual clock presentation on the decoded row to engage the appropriate sample cell. An important feature of this circuit is that all address-dependent circuit behavior has subsided before the decoded result is presented to the analog section of the circuit. Fixed pattern noise is therefore reduced or eliminated. Two externally controllable clocks were used to differentiate the three phases. This allowed the minimum durations of the precharge and evaluation phases to be measured, so that the sample window could be made as wide as possible. Future circuits will have a self timed clock generation circuit on-chip to remove the need to control two clocks externally. Measured performance of the decoder was quite high considering the modest performance of the $2\,\mu m$ CMOS technology used. Both precharge and evaluation windows require only 2 ns each. Therefore, out of the 16 ns available during a write cycle, a 12 ns wide sample storage window was achieved.
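The three stage access cycle can be illustrated with a purely behavioral model. The sketch below is not the transistor level circuit: it assumes that each precharged row is discharged whenever any address bit disagrees with that row's code, which is the usual behavior of a dynamic NOR decoder, and the function names are hypothetical.

```python
N_ADDRESS_BITS = 7
N_ROWS = 1 << N_ADDRESS_BITS  # one-of-128 selection


def decode(address):
    """Behavioral model of one access cycle of a dynamic NOR row decoder."""
    if not 0 <= address < N_ROWS:
        raise ValueError("address must fit in seven bits")
    # Stage 1: precharge all 128 output rows high.
    rows = [1] * N_ROWS
    # Stage 2: evaluate. A row is pulled low if any address bit mismatches its code.
    for row in range(N_ROWS):
        if row ^ address:
            rows[row] = 0
    # Stage 3: the clock is presented only on the single row that is still high.
    return rows


def sample_window_ns(cycle_ns=16, precharge_ns=2, evaluate_ns=2):
    """Time left for sampling once precharge and evaluation are complete."""
    return cycle_ns - precharge_ns - evaluate_ns


selected = decode(5)
print(selected.index(1), sum(selected), sample_window_ns())
```

Because the row lines settle during stages 1 and 2, the analog section sees only the final, address-independent clock edge in stage 3, which is the property the text credits for the low fixed pattern noise.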




Figure 2. Random access Switched Capacitor Array circuit diagram.

Table 1: SSC-SCA Rev. 1 Summary

| Parameter                             | Value                       |
|---------------------------------------|-----------------------------|
| Number of channels per chip           | 16                          |
| Number of storage cells per channel   | 128                         |
| Total storage cells per chip          | 2048                        |
| Die size                              | $6.8 \times 4.6 \text{ mm}$ |
| Power consumption per channel at 6 V  | < 10 mW                     |
| Non-uniformity in baseline (62.5 MHz) | < 0.5 mV rms                |
| Sample and hold time constant         | < 1 ns                      |
| Output noise, 62.5 MHz sample freq.   | < 0.5 mV                    |
| Dynamic range (signal to noise)       | > 8000 : 1                  |
| Maximum sample rate                   | $\geq 100 \text{ MHz}$      |

The sample and hold charging time constant was reduced (from early SCA chips) to approximately 1 ns. This allows charging to 12 bit accuracy in about 8 ns, considered to be our worst case charging window width. This increase in analog bandwidth is important in that it eliminates the need to erase old stored values from cells before they are written into again.
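The relation between the 1 ns time constant and the 8 ns window can be checked with a short calculation. The sketch assumes a simple single-pole (RC) charging response, in which the residual error decays as $e^{-t/\tau}$; the real cell may deviate from this idealization, and the function name is illustrative.

```python
import math


def settling_time_ns(tau_ns, n_bits):
    """Time for a single-pole response to settle to within one part in 2**n_bits."""
    return tau_ns * math.log(2 ** n_bits)


print(round(settling_time_ns(1.0, 12), 2))  # about 8.3 time constants for 12 bits
```

Under this assumption, a full scale step left over from the previous contents of a cell decays to below the 12 bit level within roughly 8 ns, consistent with the statement that old values need not be erased before a cell is rewritten.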

To maximize the potential dynamic range of the system, a rail-to-rail CMOS operational amplifier has been fabricated and tested as an individual circuit, and incorporated into our simultaneous read and write version of the SCA. This amplifier was based on the topology in reference [3], with adjustments for the capacitive loads presented by the storage array and comparator input. When configured as a rail-to-rail follower, linearity is dependent on the loop gain, input transistor offset voltages, and output transistor saturation voltages. Measured linearity was better than 0.0% to within 100 mV of each rail. Conveniently, the amplifier is capable of running on a single 5 volt supply. Op-amp performance is summarized in Table 2.

Table 2: SCA Op-Amp Performance Summary

| Parameter                | Value                       |
|--------------------------|-----------------------------|
| Power consumption        | 2 mW                        |
| Open loop gain           | > 400,000                   |
| Unity gain freq.         | > 2 MHz                     |
| Settling time to 12 bits | < 4 µs                      |
| Voltage noise at 1 kHz   | $< 140 \text{ nV/Hz}^{1/2}$ |
| Input offset voltage     | $400 \mu m^2$               |

No fixed pattern address or clock coupling noise due to acquisition was found. A very small coupling was found during readout, apparently due to the proximity of two of the seven address lines to all op-amp negative inputs (eliminated in subsequent designs). During normal operation, the magnitude of this systematic effect is less than or equal to the average random cell to cell variability. Peak to peak variability in cell baselines was measured to be about 1 mV. The best rms variability measured was less than 0.3 mV, and it is typically 0.4 mV. Fig. 1 shows a sample SCA output trace, showing about 10 samples out of 128. Recorded at 50 MHz while on a 6 V supply, the response to a 6 volt pulse of about 80 ns duration is shown. Notice that the output swings nearly to the rails, and that settling time is less than $2\,\mu s$.


#### 3. The Address List Processor Circuit

The VMP scheme simulates the existence of any number of sequentially connected variable depth pipelines using one intelligently managed random access memory. The intelligence behind the addressing of the SCA is embodied in the Address List Processor (ALP) integrated circuit now under development. In essence, the ALP circuit substitutes movement of digital data (the location of the stored signal) for movement of analog data (the stored signal itself). The ALP circuit manages a list of all possible SCA storage location addresses, tagging them as empty, pending level one trigger decision, pending level two, or pending readout. There is a conceptually simple way that an arbitrary number of levels can be added. In addition, the circuit can be constructed such that the length of each pipeline is uncommitted (other than some fixed maximum). Furthermore, the length of each pipeline can be adjusted dynamically, changing as trigger conditions change, without special programming.

Examining the Address List Processor block diagram shown in Fig. 3, we see four digital first-in-first-out memories. These are asynchronous, systolic memories, though counter based RAM schemes are interchangeable. The four digital FIFO's contain addresses of SCA storage locations. The presence of a given address in a given FIFO indicates the state of the corresponding storage location. In this diagram, the four possible states are free (empty), pending level one trigger decision, pending level two trigger decision, or pending readout. With each beam crossing, a data word moves from the free list to the pending level one trigger list. From there, data may move back onto the free list (on a level one reject signal) or onto the pending level two trigger list (on a level one accept). From that list, the data may also move back onto the free list (on a level two reject) or onto the pending readout list (on a level two accept). The pending readout list buffers readout requests during the readout of prior events. The movement of data among the various lists happens more or less simultaneously. If each list is of maximal size (the maximum number of SCA storage locations), then each list can expand or contract arbitrarily with no risk of overflow. Thus, the depth, or time delay, of a given level is adjustable on the fly. Note that underflow (running out of data in a list) is normal and not an error except for underflows of the free list. If the list of free storage locations ever becomes empty, it means that the SCA is full, and an error occurs. It is expected that the trigger system, or some auxiliary device, will monitor the number of free locations and prevent this occurrence.
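The list management just described can be sketched in software. The model below is only an illustration of the bookkeeping, not the ALP hardware: the class and method names are hypothetical, Python deques stand in for the systolic FIFO's, and returning an address to the free list after readout is an assumption that the text implies but does not state.

```python
from collections import deque


class AddressListProcessor:
    """Software model of the four address lists managed by the ALP."""

    def __init__(self, n_cells=128):
        self.free = deque(range(n_cells))  # primed with every storage address
        self.pending_l1 = deque()
        self.pending_l2 = deque()
        self.pending_readout = deque()

    @staticmethod
    def _move(source, destination):
        """Move the oldest address between lists; an empty source is a normal underflow."""
        if not source:
            return None
        address = source.popleft()
        destination.append(address)
        return address

    def beam_crossing(self):
        """Allocate the cell that stores this crossing's sample."""
        if not self.free:
            raise RuntimeError("free list underflow: the analog memory is full")
        return self._move(self.free, self.pending_l1)

    def level_one_decision(self, accept):
        return self._move(self.pending_l1, self.pending_l2 if accept else self.free)

    def level_two_decision(self, accept):
        return self._move(self.pending_l2, self.pending_readout if accept else self.free)

    def read_out(self):
        """Return the next address to digitize and recycle it (assumed behavior)."""
        return self._move(self.pending_readout, self.free)


alp = AddressListProcessor(n_cells=4)
alp.beam_crossing()              # cell 0 written
alp.beam_crossing()              # cell 1 written
alp.level_one_decision(False)    # cell 0 rejected, returns to the free list
alp.level_one_decision(True)     # cell 1 accepted, now pending level two
alp.level_two_decision(True)     # cell 1 now pending readout
print(alp.read_out(), list(alp.free))
```

Only addresses move between lists; the stored sample stays in its cell until `read_out` names it, which is the sense in which the pipelines are virtual.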

The ALP circuit, according to the block diagram, requires the use of full handshakes between trigger processor and ALP chips. This is actually impractical (at least in the case of the level one trigger) given speed of light limitations. That is, a handshake signal cannot get from the trigger processor to ALP chips and back within 16 ns. Therefore, one additional FIFO is shown that can stack first level trigger decisions with a uniform, predictable, response time. Then, as long as the trigger signal conforms to minimal timing specs, the returning acknowledge signal can be ignored. The second level trigger and readout circuits can use full handshaking (as their frequency is much lower), or the same buffering scheme could be used. A reset signal (not shown) clears all FIFO's except for the free list, which is primed with all available addresses. This is performed using an integrated read only memory to parallel load all addresses for very fast response.

#### 4. Systolic Digital First-In-First-Out Memory.

The Address List Processor circuit uses an asynchronous, systolic, digital processing discipline. Fast digital FIFO pipelines play an important role in the ALP circuit. Proving correct operation and performance of the FIFO early in the design process is critical. Therefore, a 128 word deep by 8 bit wide FIFO circuit has been designed and fabricated. We chose to use the four cycle handshake discipline in our FIFO test circuit. The chip contains over 7,000 transistors and is about 2 mm by 2 mm in a two micron scalable technology. The circuit artwork is completely compatible, without any redesign, with an easily available 1.2 micron technology. This will yield an improvement in speed by about a factor of two while decreasing power and die area consumption.

Table 3: Digital FIFO Performance Summary

| Parameter                | Value          |
|--------------------------|----------------|
| Maximum Read/Write Speed | 100 MHz        |
| Fall Through Speed       | 300 MHz        |
| Die area                 | 2000 $\mu m^2$ |
| Power at 100 MHz clock   | 20 mW          |

One of the most important factors that will dictate the performance of the Address List Processor is the fall-through time of the FIFO's. This is the time it takes for a newly added data word to propagate down the pipeline and become available at the bottom. The fall-through speed for our test device was measured to be 300 MHz, or 3.3 ns per word, for a total worst case fall-through time of about 420 ns through the 128 stages. Of course, many data words can propagate down at the same time. The maximum I/O handshake speed was measured to be 100 MHz, or 10 ns per word. This is slower than the fall-through speed (which also includes a complete handshake) largely because of the extra delays (about 6 ns) in the round trip through the multi-stage input and output pin drivers. These extra delays will not be present in the critical path of the ALP chip. Supply current at 100 MHz operation was measured to be 4 mA at 5 V, or 20 mW. Since one ALP chip can operate many channels of SCA's, the incremental power consumed per channel is very modest. The worst case 100 MHz input/output speed and 300 MHz internal propagation speed are adequate for the VMP circuit, but an advanced technology will be used to increase margins and reduce power consumption.
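The worst case fall-through figure can be reproduced with a toy model of an empty systolic FIFO, in which a word advances one stage per handshake. This is a sketch under simplifying assumptions: every stage is taken to have the same 3.3 ns delay, entry into the first stage is counted as one stage delay, and the function name is illustrative.

```python
STAGE_DELAY_NS = 3.3  # measured fall-through time per word
FIFO_DEPTH = 128


def fall_through_steps(depth):
    """Count stage-to-stage handshakes for one word to cross an empty FIFO."""
    stages = [None] * depth
    stages[0] = "word"  # first handshake: the word enters the input stage
    steps = 1
    while stages[-1] is None:
        # Sweep from the output end so that a word advances one stage per step.
        for i in range(depth - 2, -1, -1):
            if stages[i] is not None and stages[i + 1] is None:
                stages[i + 1], stages[i] = stages[i], None
        steps += 1
    return steps


steps = fall_through_steps(FIFO_DEPTH)
print(steps, "steps,", round(steps * STAGE_DELAY_NS, 1), "ns")
print("power:", round(4e-3 * 5.0 * 1e3, 1), "mW")  # 4 mA at 5 V
```

The model gives 128 steps, or roughly 420 ns, for a word entering an empty FIFO. A partly filled FIFO has a shorter fall-through time because the word stops behind the data already queued.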

#### 6. Integrated Analog to Digital Converter.

A compact, high dynamic range analog to digital converter circuit is being developed as part of the data acquisition system. The ADC is required to exceed 12 bits of dynamic range, achieve 9 bit linearity, digitize at over 10 kHz, and be highly compact. Flash and half-flash analog to digital converters use a great deal of power, are very large, and cannot attain the required dynamic range. Pipelined and algorithmic ADCs are adequately fast and smaller than flash ADCs, but are still large and are difficult to produce with greater than 10 bit range. Successive approximation converters can attain 13 bit accuracy, but are still large enough that one converter must be shared among all channels on a chip, raising the complexity of operation. Serial (single and dual slope) converters are very compact, simple to operate, and have adequate range, but are much slower than other techniques.

Serial conversion, using a modified single slope technique, has been chosen as offering the best compromise of performance features. The virtual multiple pipeline scheme allows the conversion rate to match the level two accept rate (in the tens of kHz range), slow enough to permit the use of a serial ADC. Since there would be no need for multiplexing all channels through a single ADC, the effective speed of the converter per channel approach is increased by a factor of 16 for a 16 channel chip. These converters are compact enough to allow a converter per channel to share space on the analog pipeline die with little overhead. In our modified serial ADC technique, the components needed per channel are limited to a comparator, a digital multiplexer and a latch, which is compact indeed. A high speed, rail to rail, fully differential comparator has been designed for this application and is presently being fabricated. The periphery of the chip would contain a high speed counter and a voltage ramp generator. A rail to rail ramp generator has been designed and is presently in fabrication. The overall effect is to achieve adequate speed, high dynamic range, low power, efficient silicon area overhead, and the simplest architecture and operation.
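The converter per channel arrangement can be sketched as follows. This models a plain single slope conversion with a shared ramp and a shared counter; the authors' modification to the technique is not described in enough detail here to model, so the per-channel digital multiplexer is omitted, and the function name, ideal linear ramp, and 0 to 5 V range are all assumptions.

```python
def single_slope_convert(channel_voltages, v_min=0.0, v_max=5.0, n_bits=12):
    """Convert all channels in parallel using one shared ramp and one shared counter."""
    n_counts = 1 << n_bits
    latched = [None] * len(channel_voltages)
    for count in range(n_counts):
        # Shared ramp generator: voltage rises linearly with the shared counter.
        ramp = v_min + (v_max - v_min) * count / n_counts
        for channel, voltage in enumerate(channel_voltages):
            # Per-channel comparator fires once; the latch captures the counter value.
            if latched[channel] is None and ramp >= voltage:
                latched[channel] = count
    # Inputs above the top of the ramp saturate at full scale.
    return [n_counts - 1 if code is None else code for code in latched]


print(single_slope_convert([0.0, 2.5, 5.0]))
```

The conversion time is set by the counter sweep alone, so adding channels adds only comparators and latches, not time.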

#### 7. High Speed Gray Code Counter.

A high speed synchronous Gray code counter is an important sub-component of the ADC system. The speed of the counter directly impacts the digitization time. The fastest conveniently available clock signal is the 62.5 MHz beam crossing clock. If the counter can operate at this rate, then a 12 bit digitization will take 66 microseconds. The implemented Gray code counter circuit includes a refinement that allows the final count rate to be double the clock rate, in this case 125 MHz. This higher rate allows a 12 bit conversion to complete in roughly 32 microseconds. Remember that all channels convert in parallel, so this time does not increase with the number of channels. The synchronous Gray code used allows asynchronous data latching without errors. A Gray to binary converter will be located in the periphery of the chip as a convenience to the user.
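The conversion times and the reason a Gray code tolerates asynchronous latching can both be checked in a few lines. The sketch below uses the standard reflected binary Gray code; whether the chip uses exactly this code sequence is an assumption, and the function names are illustrative.

```python
def binary_to_gray(n):
    """Standard reflected binary Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse mapping, as a Gray to binary converter on the chip periphery would perform."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def conversion_time_us(n_bits, count_rate_mhz):
    """Time for the counter to sweep all 2**n_bits codes."""
    return (1 << n_bits) / count_rate_mhz


# Successive codes differ in exactly one bit, so a latch that closes during a
# transition captures either the old or the new count, never an unrelated value.
assert all(
    bin(binary_to_gray(i) ^ binary_to_gray(i + 1)).count("1") == 1
    for i in range(4095)
)

print(conversion_time_us(12, 62.5), conversion_time_us(12, 125.0))
```

A full 12 bit sweep is 4096 counts, which takes about 65.5 microseconds at 62.5 MHz and about 32.8 microseconds at 125 MHz.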

Gray Code Counter Performance Summary

| Parameter              | Value         |
|------------------------|---------------|
| Maximum Clock Speed    | 125 MHz       |
| Maximum Counting Speed | 250 MHz       |
| Die area               | 750 $\mu m^2$ |
| Power at 125 MHz clock | 5 mW          |

To achieve the high count rate, several speed enhancing design techniques were used. The circuit uses a streamlined dynamic logic discipline to minimize speed-robbing parasitics. The carry propagation is pipelined, keeping the worst case propagation delay path to a handful of gates. Finally, the first bit is interleaved with respect to the others, allowing the count rate to be twice the clock rate. A compromise was struck between silicon area consumption and speed by choosing to pipeline every four bits. An oscilloscope photograph (Fig. 4) shows the clock (lowest trace) and the first five counter outputs counting at 250 MHz in Gray code.

#### 8. High Speed ECL Level Receivers

In order to reduce coupling between digital clocks and analog inputs, it has been proposed that low level differential clocks be used. Therefore, a very high speed ECL level receiver has been designed using conventional CMOS technology running on a standard 5 volt power supply. The receivers are DC coupled, removing all concern about rate and duty cycle dependencies. The circuit has been fabricated and proven to operate at above 200 MHz using standard ECL levels. Used as clock receivers, the very high speed, fast edges will allow the random access Switched Capacitor Array to latch and decode addresses at 62.5 MHz with no performance penalty over conventional CMOS level receivers.

Figure 4: Bottom trace is the 125 MHz clock input; the top 5 traces are the Gray code outputs counting at 250 MHz.

ECL Receiver Performance Summary

| Parameter        | Value                  |
|------------------|------------------------|
| Speed            | $\geq 200 \text{ MHz}$ |
| Die area         | $200 \ \mu m^2$        |
| Power at 200 MHz | 5 mW                   |

#### 9. Conclusions.

This paper described work in progress on a 62.5 MHz virtual analog pipeline integrated circuit (SCA) and a trigger address list processor (ALP) circuit. Presently fabricated and proven components include serial access, random access, simultaneous read and write analog memory versions, a fast systolic digital FIFO, ADC subcomponents, and an improved rail to rail operational amplifier. A fully operational system is expected to be completed within one year.

#### References

- [1] S. Kleinfelder, IEEE Trans. Nucl. Sci. NS-37 (1990) 56.
- [2] G. Rai et al., IEEE Trans. Nucl. Sci. NS-37 (1990) 1230.
- [3] J. Babanezhad, IEEE J.S.S.C. Vol. 23 No. 6 (1988) 1414.


LAWRENCE BERKELEY LABORATORY UNIVERSITY OF CALIFORNIA INFORMATION RESOURCES DEPARTMENT BERKELEY, CALIFORNIA 94720