# A 192 x 128 Time Correlated Single Photon Counting Imager in 40nm CMOS Technology

Robert K. Henderson<sup>1</sup>, Nick Johnston<sup>1</sup>, Haochang Chen<sup>2</sup>, David Day-Uei Li<sup>2</sup>, Graham Hungerford<sup>3</sup>, Richard Hirsch<sup>3</sup>, Philip Yip<sup>3</sup> and David McLoskey<sup>3</sup>

1 - School of Engineering, University of Edinburgh, Kings Buildings, Mayfield Road, Edinburgh, EH9 3JL, UK

2 - Centre for Biophotonics, Strathclyde Institute of Pharmacy & Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE
3 - HORIBA Jobin Yvon IBH Ltd., 133 Finnieston Street, Glasgow G3 8HB UK

Email: Robert.Henderson@ed.ac.uk

Abstract—A 192 x 128 pixel single photon avalanche diode (SPAD) time-correlated single photon counting (TCSPC) image sensor is implemented in STMicroelectronics 40nm CMOS technology. The 13 % fill-factor,  $18.4 \times 9.2 \mu m$  pixel contains a 33 ps resolution, 135 ns full-scale, 12-bit time to digital converter (TDC) with 0.9 LSB differential and 8.7 LSB integral nonlinearity (DNL/INL). The sensor achieves a mean 219 ps full-width half maximum (FWHM) impulse response function (IRF) and a 5 mW core power consumption and is operable at up to 18.6 kfps. Cylindrical microlenses with a concentration factor of 3.15 increase the fill-factor to 41%. The median dark count rate (DCR) is 25 Hz at 1.5 V excess bias. Fluorescence lifetime imaging (FLIM) results are presented.

## I. INTRODUCTION

TCSPC is a photon-efficient, statistical sampling technique whereby photon arrival times are measured relative to a pulsed laser source and are recorded in a histogram over many repeated cycles. Key application areas are in time-of-flight (ToF) range-finding, fluorescence lifetime imaging microscopy, diffuse optical tomography (DOT) and various types of spectroscopy [1]. Conventional instrumentation to implement TCSPC involves photon counting cards, discrete detectors such as photomultiplier tubes and desktop computers. This bulky and relatively expensive hardware has limited the approach to a few channels, MHz acquisition rates and imaging based on mechanical scanning. More recently CMOS manufacturing has permitted large arrays of SPAD detectors to be manufactured together with timing and signal processing electronics on a single chip. SPAD arrays together with parallel TCSPC have been the enabling factor in the first high volume quantum photonic consumer applications [2]. Large investment in LIDAR for autonomous vehicles is further propelling CMOS SPAD technology towards advanced nanometer nodes [3] with detector performance approaching that of custom devices.
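The histogramming principle described above can be illustrated with a minimal simulation sketch. It assumes a single-exponential fluorescence decay and an ideal detector; the function name, the detection probability and the lifetime are hypothetical values chosen for illustration and are not taken from the sensor.

```python
import random

def tcspc_histogram(lifetime_ns, bin_ps, n_bins, n_cycles, detect_prob, seed=0):
    """Simulate TCSPC: at most one photon timestamp per laser cycle, binned into a histogram."""
    rng = random.Random(seed)
    hist = [0] * n_bins
    for _ in range(n_cycles):
        # Most laser cycles yield no detected photon (low detection probability).
        if rng.random() >= detect_prob:
            continue
        # Photon delay after the laser pulse follows an exponential decay (assumed model).
        t_ps = rng.expovariate(1.0 / lifetime_ns) * 1000.0
        b = int(t_ps // bin_ps)
        if b < n_bins:
            hist[b] += 1
    return hist

# Hypothetical example: 2 ns lifetime, 33 ps bins, 12-bit histogram depth.
hist = tcspc_histogram(lifetime_ns=2.0, bin_ps=33.0, n_bins=4096,
                       n_cycles=200_000, detect_prob=0.05)
```

The histogram shape approaches the underlying decay only after many cycles, which is why the technique is described as statistical sampling.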

A number of SPAD image sensors have been proposed that allow TCSPC data to be acquired in parallel from every pixel [4-7]. They have brought new capabilities to light-in-flight, non-line-of-sight, diffuse media, 2-photon fluorescence lifetime, super-resolution and automotive range imaging [4-7]. Despite their excellent timing performance, these arrays suffer from low fill-factor (a few percent) and large pixel pitches (40-150  $\mu$ m), limiting their sensitivity and spatial resolution.



Fig. 1 TCSPC imager micrograph

In this paper, we present a SPAD-based TCSPC imager in 40nm CMOS technology with the smallest time to digital converter (TDC) reported to date (9.2 µm x 9.2 µm). The TDC achieves the finest timing resolution (tuneable from 33 ps to 120 ps) of all reported TCSPC pixels at a good energy efficiency figure of merit (FoM) of 62 fJ/conv. The photon detection efficiency (PDE) of the array has been enhanced with cylindrical microlenses to provide a mean concentration factor of 3.15 and a 41% effective fill-factor. The sensor also has a very low median DCR of 25 Hz obtained at 1.5 V excess bias [3]. This combination of high sensitivity, low noise and precise timing resolution offers a transformative capability to low-light time-resolved wide-field microscopy. In addition, the 12-bit TDC performance with 490 ns full-scale range, <1 LSB DNL and <9 LSB INL enables ToF laser ranging applications (up to 73.5 m distance). Full characterisation results of the sensor are presented as well as FLIM images.
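The quoted ranging distance is consistent with the usual round-trip ToF relation, assuming the full-scale range $T_{FS}$ is available for the light to travel to the target and back and taking $c \approx 3 \times 10^{8}$ m/s:

$$d_{max} = \frac{c \, T_{FS}}{2} \approx \frac{(3 \times 10^{8}\ \mathrm{m/s}) \times (490 \times 10^{-9}\ \mathrm{s})}{2} = 73.5\ \mathrm{m}$$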

## II. SENSOR DESIGN

A micrograph of the sensor is shown in Fig. 1. The 3.15 mm x 2.37 mm chip is integrated in STMicroelectronics 40 nm CMOS technology offering industrialised SPADs [3]. A 192 x 128, 18.4  $\mu$ m x 9.2  $\mu$ m pixel array adopts a column pair-wise SPAD well sharing layout strategy (shown as a zoomed inset in Fig. 2) to optimise fill-factor at the expense of non-uniform modulation transfer function (MTF). This layout style has also been adopted in preparation for 3D stacking at a regular 9.2  $\mu$ m pitch [8]. Fig. 2 shows the sensor block diagram consisting of row addressing and 64 parallel to serial converters allowing a maximum I/O rate of 6.4 Gbps and a frame rate of 18.6 kfps.

This work was funded by the Engineering and Physical Sciences Research Council (EPSRC) Quantum Hub in Quantum Enhanced Imaging (EP/M01326X/1).

## A. Circuit Architecture



Fig. 2 Sensor block diagram



Fig. 3 SPAD interface, gating and TDC control circuitry







Fig. 5 Pixel circuit and readout

The pixel implements a highly optimised version of the architecture originally proposed in [4] in order to attain a pitch compatible with scientific imaging or ToF applications and scalable to megapixel resolutions. The circuits interfacing the SPAD to the TDC and photon counting functions are shown in Fig. 3. The SPAD is passively quenched by a single thick oxide NMOS biased with a global gate voltage VQ. SPAD pulses are level shifted to the 1.1 V digital  $V_{dd}$  by a thick oxide inverter. All other circuits exploit the digital 40nm transistors. In TCSPC mode, a compact edge-sensitive trigger circuit generates an enable signal S for the TDC by means of a pair of D-type flip-flops. The first flip-flop latches a 1 on the rising edge of the first SPAD pulse that falls within the exposure period (the time between Rst pulses) and coincides with a high state of WINDOW, starting the TDC. The second flip-flop resets S to 0 on the next rising edge of the STOP waveform, provided that the TDC has been started. In photon counting mode, another D-type flip-flop toggles on the rising edge of the SPAD pulse only if WINDOW is high, generating the SPADWIN signal. This signal acts as the least significant bit of the photon count. The WINDOW signal thus provides an electrical masking signal to both TCSPC and photon counting modes to allow fine global exposure control.
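The trigger behaviour described above can be summarised by a small behavioural sketch. It is an event-level model only, not a circuit description: the function names, the representation of WINDOW as a list of intervals and the time values are assumptions made for illustration.

```python
def tcspc_trigger(spad_edges, stop_edges, window):
    """Return (start, stop) times of the TDC enable S, or None if no photon qualifies.

    spad_edges: sorted rising-edge times of SPAD pulses within one exposure period
    stop_edges: sorted rising-edge times of the STOP waveform
    window:     list of (t_on, t_off) intervals during which WINDOW is high
    """
    def window_high(t):
        return any(t_on <= t < t_off for t_on, t_off in window)

    # First flip-flop: latch a 1 on the first SPAD edge that coincides with WINDOW high.
    start = next((t for t in spad_edges if window_high(t)), None)
    if start is None:
        return None
    # Second flip-flop: clear S on the next STOP rising edge after the TDC has started.
    # stop is None if the given STOP edges end before the start time.
    stop = next((t for t in stop_edges if t > start), None)
    return (start, stop)

def spadwin_lsb(spad_edges, window):
    """Photon counting mode: state of the toggle flip-flop after all in-window SPAD edges."""
    in_window = sum(1 for t in spad_edges if any(a <= t < b for a, b in window))
    return in_window % 2

# Hypothetical times in ns: the 5 ns photon is masked, the 12 ns photon starts the TDC.
result = tcspc_trigger([5.0, 12.0, 30.0], [0.0, 50.0, 100.0], [(10.0, 40.0)])
```

In this model later photons in the same exposure are ignored, matching the first-photon behaviour of the trigger.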

Fig. 4 shows the TDC circuit consisting of a 4-stage pseudo-differential gated ring oscillator and level shifting and coupling stages. The ring oscillator core is supplied from a separate power rail  $V_{ddro}$  to allow tuning of the TDC resolution and to minimise power supply coupling to other digital functions on the chip. Setting signal R high resets the TDC to an initial condition. The rising edge of signal S starts the ring oscillator which operates over a range 2-4 GHz depending on the  $V_{ddro}$  setting. At the instant the signal S falls the nodes  $T_{3:0}$ and  $\overline{T}_{3:0}$  regenerate to memorise the internal state of the oscillator. The state of these internal nodes is used to provide the three least significant bits (LSBs) of the TDC. Three balanced dynamic comparators act to level shift the states of  $T_{2:0}$  from  $V_{ddro}$  to  $V_{dd}$  whilst reducing the loading on the loop to only two floating NMOS transistors. A cross-coupled level shifter couples  $T_3$  and  $\overline{T}_3$  to the first stage of a ripple counter and resolves potential metastability issues when S falls at the same instant as a positive transition on  $T_3$ .

The main pixel schematic is shown in Fig. 5. An 8-bit ripple counter is multiplexed to act either as a photon counter or to count oscillator periods to extend the dynamic range of the TDC. In TCSPC mode a dedicated high speed toggle flip-flop immediately divides the ring oscillator frequency to allow this high speed signal to pass the multiplexer. Thus the coarse LSB in TCSPC mode ( $C_0$ ) and the LSB in photon counting mode (*SPADWIN*) are derived from two different flip-flops. Tri-state inverters controlled by a row read signal drive the 14-bit state of the pixel onto a column output bus under control of the row addressing circuit.
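As an illustration of how the 12-bit timestamp may be composed, the sketch below packs the three fine bits from the ring oscillator state, the coarse LSB $C_0$ and the 8-bit ripple count into one code and converts it to time. The bit ordering and the function names are assumptions; the fine-state decoding itself is not modelled.

```python
def tdc_code(fine, c0, ripple):
    """Pack fine state (3 bits), coarse LSB C0 and ripple count (8 bits) into a 12-bit code.

    The bit order (ripple as MSBs, then C0, then fine) is an assumption for illustration.
    """
    assert 0 <= fine < 8 and c0 in (0, 1) and 0 <= ripple < 256
    return (ripple << 4) | (c0 << 3) | fine

def code_to_time_ns(code, lsb_ps):
    """Convert a TDC code to time for a given LSB (TDC resolution) in picoseconds."""
    return code * lsb_ps / 1000.0

def full_scale_ns(lsb_ps, bits=12):
    """Full-scale range of an ideal TDC with the given depth and resolution."""
    return (1 << bits) * lsb_ps / 1000.0

fs_fine = full_scale_ns(33.0)     # about 135 ns at 33 ps resolution
fs_coarse = full_scale_ns(120.0)  # about 491 ns at 120 ps resolution
```

With 33 ps and 120 ps resolution, this ideal model gives full-scale ranges close to the 135 ns and 491 ns figures quoted for the sensor.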

## B. Sensor Operation

Pixels are read and reset in a rolling fashion. The time between resets is the reciprocal of the frame rate (around 54  $\mu$ s). In TCSPC mode (Fig. 6) a laser is pulsed in synchronisation with a *STOP* pulse distributed to the whole array via a clock tree. The *WINDOW* signal may be used to enable the TDC during short sub-periods of the laser cycle or to achieve a global shutter function. The TDC will only start up if the rising edge of the SPAD pulse is contained in the *WINDOW* high period. Only the first such photon will be captured within an exposure period (period between rolling pixel resets).



Fig. 6 Sensor operation in TCSPC mode

A token-passing row shift register reads pairs of rows of the pixel array from the central rows outwards in a rolling cycle and operates continuously at up to 18.6 kfps. An arbitrary pattern of rows can be read out at a faster frame rate upon identification of regions of interest. At any time only the two currently addressed rows of the pixel array are *not* in integration, achieving a temporal aperture ratio (TAR) of 99%. Banks of 32 parallel to serial converters at the top and bottom of the array each convert 4 columns of 14-bit data into a 56-bit serial sequence to 64 I/O pads at a maximum rate of 100 MHz.
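The readout figures above can be cross-checked with a short arithmetic sketch. It assumes that each I/O pad carries one bit per 100 MHz cycle, that there is no readout overhead between row pairs, and that the TAR is the fraction of row pairs in integration at any instant; the function name is hypothetical.

```python
def readout_timing(n_pixels=192 * 128, n_converters=64, pixels_per_converter=4,
                   bits_per_pixel=14, bit_rate_hz=100e6):
    """Estimate readout steps per frame, step time, frame time and TAR."""
    # Each readout step serialises one row pair: 64 converters x 4 pixels = 256 pixels.
    pixels_per_step = n_converters * pixels_per_converter
    n_steps = n_pixels // pixels_per_step
    # Each converter shifts out 4 x 14 = 56 bits per step.
    step_time_s = pixels_per_converter * bits_per_pixel / bit_rate_hz
    frame_time_s = n_steps * step_time_s
    # Only the addressed row pair is not integrating at any time.
    tar = 1.0 - 1.0 / n_steps
    return n_steps, step_time_s, frame_time_s, tar

n_steps, step_time_s, frame_time_s, tar = readout_timing()
frame_rate_fps = 1.0 / frame_time_s
```

Under these assumptions the model gives 96 row-pair reads of 0.56 µs each, a frame time of about 53.8 µs (about 18.6 kfps) and a TAR of about 99%.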



Fig. 7 Typical TDC INL and DNL plots over 140ns

The TDC INL and DNL are measured using a code density test with ambient light providing a random input to populate a histogram with over 300k photon timestamps. The DNL/INL plot of a typical pixel is shown in Fig. 7 with the TDC operating at nominal 1.1 V power supply voltage over 140ns (92.5% of full-scale at this voltage). The IRF of a typical pixel is measured using a Hamamatsu PLP-10 685 nm laser diode in Fig. 8.
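A code density test of this kind can be sketched as follows. This is a generic textbook formulation rather than the exact processing used for Fig. 7; it assumes uniformly distributed photon arrival times, so that every code should ideally collect the same number of counts.

```python
def dnl_inl(hist):
    """Compute DNL and INL (in LSB) from a code density histogram.

    hist: counts per TDC code, collected with uniformly random input events.
    """
    n = len(hist)
    mean = sum(hist) / n                    # ideal count per code
    dnl = [h / mean - 1.0 for h in hist]    # relative bin-width error of each code
    inl = []
    acc = 0.0
    for d in dnl:                           # INL is the running sum of the DNL
        acc += d
        inl.append(acc)
    return dnl, inl

# Small hypothetical histogram: code 1 is 20% too wide, code 2 is 20% too narrow.
dnl, inl = dnl_inl([100, 120, 80, 100])
```

In practice the histogram is usually restricted to the codes that lie fully inside the measured time span before the mean is taken.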



Fig. 10 TDC resolution vs power supply voltage

A map of IRFs of the full-pixel array is shown in Fig. 9 with IRF of hot pixels set to 0 and removed from calculations. The mean jitter is 219ps with a variance of 26.7ps. This is close to the native jitter of the SPAD of 170 ps [3] suggesting around 138ps is due to FPGA (master), laser and TDC.
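The 138 ps figure is consistent with subtracting the SPAD jitter from the measured mean jitter in quadrature, assuming the contributions are independent:

$$\sqrt{(219\ \mathrm{ps})^{2} - (170\ \mathrm{ps})^{2}} = \sqrt{19061}\ \mathrm{ps} \approx 138\ \mathrm{ps}$$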

Gated ring-oscillator TDC resolution is strongly influenced by power supply voltage and temperature. Fig. 10 shows that the TDC resolution can be tuned from 120 ps to 33 ps by varying the  $V_{ddro}$  power supply from 0.7 V to 1.2 V. A standard deviation of around 1% in the LSB has been determined across a single column. The wide TDC resolution tuning range is useful to extend the dynamic range of the sensor for different fluorescence lifetimes or ToF distances. Two columns of pixels on the left and right side of the imager continuously measure full-periods of the STOP clock to allow off-chip digital compensation of every frame on the fly.
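One possible form of the off-chip compensation is sketched below: the reference pixels report the code corresponding to one full STOP period, from which the LSB of that frame is estimated and applied to the image pixels. The function names and the 50 ns STOP period are assumptions for illustration, not the actual compensation procedure.

```python
def calibrate_lsb_ps(stop_period_ns, reference_codes):
    """Estimate the TDC LSB (ps) from reference pixels that time one full STOP period."""
    mean_code = sum(reference_codes) / len(reference_codes)
    return stop_period_ns * 1000.0 / mean_code

def codes_to_ns(codes, lsb_ps):
    """Convert raw TDC codes of a frame to time using the calibrated LSB."""
    return [c * lsb_ps / 1000.0 for c in codes]

# Hypothetical frame: a 50 ns STOP period measured as about 1515 codes.
lsb_ps = calibrate_lsb_ps(50.0, [1515, 1516, 1514])
times_ns = codes_to_ns([0, 303, 1515], lsb_ps)
```

Because the calibration is repeated per frame, slow drifts of the LSB with supply voltage or temperature are tracked without any change to the pixel.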



Fig. 11 Photomicrograph of the microlensed sensor

Cylindrical microlenses have been implemented on a per-die basis [10] achieving a mean concentration factor of 3.15 (Fig. 11) and an effective fill factor of 41%.
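The effective fill factor follows from scaling the native 13% fill factor by the mean concentration factor:

$$FF_{eff} = 13\% \times 3.15 \approx 41\%$$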



Fig. 12 (a) Photon counting fluorescence image of DASPMI stained onion cells [9] (b) average lifetime (c) (d) pre-exponentials.

In order to demonstrate widefield FLIM, onion cells stained with the dye DASPMI [9] were studied on a microscope set up using a HORIBA Scientific DeltaDiode DD-485L laser as the excitation source. The HORIBA Scientific EzTime Image software enables a "region of interest" to be selected in the photon counting intensity image (see Fig. 12a, where the red box indicates the region selected). TCSPC data were collected only from pixels in the region of interest, an area that includes the cell walls. The lifetime data were analysed globally using the EzTime software as the sum of 2 exponentials, and lifetimes of 1.58 ns and 2.55 ns were obtained. Maps showing the average lifetime (Fig. 12b) and the normalised pre-exponential components for each of the lifetimes are given in Fig. 12c,d. This shows that the longer-lived decay component is predominantly associated with the cell wall and provides a contrast to the cell interior.
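A two-exponential decay model and one common definition of the average lifetime can be sketched as follows. The amplitude-weighted average is an assumption, since the definition used by the analysis software is not stated here, and the equal pre-exponentials in the example are hypothetical.

```python
import math

def biexp_decay(t_ns, a1, tau1_ns, a2, tau2_ns):
    """Sum of two exponentials with pre-exponentials a1, a2 and lifetimes tau1, tau2."""
    return a1 * math.exp(-t_ns / tau1_ns) + a2 * math.exp(-t_ns / tau2_ns)

def average_lifetime_ns(a1, tau1_ns, a2, tau2_ns):
    """Amplitude-weighted average lifetime (one of several definitions in use)."""
    return (a1 * tau1_ns + a2 * tau2_ns) / (a1 + a2)

# Lifetimes from the global fit; equal normalised pre-exponentials are hypothetical.
tau_avg = average_lifetime_ns(0.5, 1.58, 0.5, 2.55)
```

In a global analysis the two lifetimes are shared across all pixels, so the per-pixel contrast comes only from the pre-exponentials.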

## IV. CONCLUSIONS

Advanced nanometer CMOS nodes provide TDC pixels with practical pitch and fill-factor for high resolution imaging.

## ACKNOWLEDGMENTS

The authors are grateful to STMicroelectronics for chip fabrication within the ENIAC POLIS project. We thank T. Al Abbas and R. Walker for assistance in SPAD layout and chip finishing as well as F. Zanella at CSEM, Muttenz, Switzerland for microlens design.

## REFERENCES

- [1] E. Charbon, "Single-photon imaging in complementary metal oxide semiconductor processes," Phil. Trans. R. Soc. A, vol. 372, Feb. 2014.
- [2] STMicroelectronics, VL6180 Datasheet (http://www.st.com/en/imaging-and-photonics-solutions/proximitysensors.html).
- [3] S. Pellegrini et al., "Industrialised SPAD in 40 nm technology," 2017 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, 2017, pp. 16.5.1-16.5.4.
- [4] C. Veerappan et al., "A 160 x 128 single-photon image sensor with on-pixel 55 ps 10 bit time-to-digital converter," in IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers, Feb. 2011, pp. 312–314.
- [5] I. Vornicu et al., "Arrayable Voltage-Controlled Ring-Oscillator for Direct Time-of-Flight Image Sensors," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 11, pp. 2821-2834, Nov. 2017.
- [6] F. Villa, et al., "CMOS Imager with 1024 SPADs and TDCs for Single-Photon Timing and 3-D Time-of-Flight," IEEE J. Sel. Topics in Quantum Electronics, vol. 20, no. 6, pp. 364-373, Dec. 2014.
- [7] L. Gasparini et al, "A 32×32-pixel time-resolved single-photon image sensor with 44.64µm pitch and 19.48% fill-factor with on-chip row/frame skipping features reaching 800kHz observation rate for quantum physics applications", Proceedings of the 2018 IEEE International Solid-State Circuits Conference, 2018, pp. 98-100, San Francisco (US), 11-15 February 2018.
- [8] T. Al Abbas et al., "Backside illuminated SPAD image sensor with 7.83μm pitch in 3D-stacked CMOS technology" Proceedings of International Electron Devices Meeting, San Francisco, CA, 2016; 8.1
- [9] G. Hungerford et al., "In-situ formation of silver nanostructures within a polysaccharide film and its application as a potential biocompatible fluorescence sensing medium", Soft Matter. 8, 653-659, 2012.
- [10] I. Gyongy et al., "Cylindrical microlensing for enhanced collection efficiency of small pixel SPAD arrays in single-molecule localisation microscopy," Opt. Express 26, 2280-2291 (2018).

Table 1. Comparison of SPAD imagers with per-pixel TDC

| Parameter                 | This Work         | [7]         | [6]          | [4]       | [5]     |
|---------------------------|-------------------|-------------|--------------|-----------|---------|
| Process                   | 40 nm             | 150 nm      |              |           | 180 nm  |
| Pixel                     |                   |             |              |           |         |
| Pixel pitch (µm)          | 18.4 (x), 9.2 (y) | 44.64       | 150          | 50        | 64      |
| SPAD dia. (µm)            |                   | 19.8        | 30           | 6         | 12      |
| Fill factor (%)           | 13 (41)           | 19.48       | 3.14         | 1.2       | 2.7     |
| Median DCR (Hz) / Veb (V) | 25/1.5            | 600/3       | 120/6        | 50/0.73   | 42000/1 |
| TDC                       |                   |             |              |           |         |
| Area (µm<sup>2</sup>)     | 84.6              | 402.7       | 21793        | 2244      | 812     |
| Range (ns)                | 135-491           | 53          | 360          | 55        | 297     |
| Resolution (ps)           | 33-120            | 204.5       | 350          | 55        | 145     |
| Depth (bit)               | 12                | 8           | 10           | 10        | 11      |
| DNL (LSB)                 | 0.9               | 1.5         | 0.04         | 0.6       | 0.55    |
| INL (LSB)                 | 8.7               | 2.17        | 0.20         | 4         | 3       |
| Precision (ps)            | 321               | 205         | 254          | 170       | 435     |
| FoM\* (pJ/conv)           | 0.062             | $0.12^{**}$ | $0.088^{**}$ | 0.034     | 0.67    |
| Chip                      |                   |             |              |           |         |
| Array size                | 192x128           | 32x32       | 32x32        | 160x128   | 64x64   |
| Chip size (mm)            | 3.2x2.4           | 1.7x1.9     | 9x9          | 11.0x12.3 | 5x5     |
| Frame rate (fps)          | 18.6k             | 80k         | 100k         | 50k       | 5k      |
| Power (mW), Core / IO     | 5/600             | 11.1        | 400 / 25     | 500       | 2.7     |

\* FoM = peak power x precision. \*\* Estimated based on total core power.