# A Modular, Direct Time-of-Flight Depth Sensor in 45/65-nm 3-D-Stacked CMOS Technology

Augusto Ronchini Ximenes, *Student Member, IEEE*, Preethi Padmanabhan, *Student Member, IEEE*, Myung-Jae Lee, *Member, IEEE*, Yuichiro Yamashita, Dun-Nian Yaung, and Edoardo Charbon, *Fellow, IEEE*

Abstract—This article introduces a modular, direct time-of-flight (TOF) depth sensor. Each module is digitally synthesized and features a $2 \times (8 \times 8)$ single-photon avalanche diode (SPAD) pixel array, an edge-sensitive decision tree, a shared time-to-digital converter (TDC), 21-bit per-pixel memory, and in-locus data processing. Each module operates autonomously, handling data acquisition, management, and storage internally, and is periodically read out by an external access. The prototype was fabricated in a TSMC 3-D-stacked 45/65-nm CMOS technology, featuring backside illumination (BSI) SPAD detectors on the top tier and the readout circuit on the bottom tier. The sensor was characterized by single-point measurements, in two different modes of resolution and range. In low-resolution mode, a maximum range of 300 m and an accuracy of 80 cm were recorded; in high-resolution mode, the maximum range and accuracy were 150 m and 7 cm, respectively. The module was also used in a flexible scanning light detection and ranging (LiDAR) system, where a $256 \times 256$ depth map, with millimeter precision, was obtained. A laser signature based on pulse-position modulation (PPM) is also proposed, achieving a maximum of 28-dB interference reduction.

Index Terms—Depth sensor, interference reduction, laser signature, light detection and ranging (LiDAR), ranging imaging, single-photon avalanche diode (SPAD), 3-D-stacking, time-of-flight (TOF) imaging.

# I. INTRODUCTION

CONSTANT increase in data processing efficiency has enabled, among many other things, the intensive use of depth mapping technologies. Consumer applications, such as gaming [1], augmented and virtual realities (AR/VR) [2], and other human–machine interfaces [3], are typically based on intensive image processing, using triangulation [4], [5] and/or structured light [6], which has limitations on speed, resolution, range, and robustness to background noise. On the other hand, time-of-flight (TOF) depth sensing has been investigated in the academic and industrial engineering communities for several years as an alternative that overcomes such restrictions, and a few products are emerging [7]–[10]. Direct TOF (dTOF) [11], [12], specifically, requires more elaborate detectors and data processing, but it has the potential of reaching much longer distances [13], [14] at higher speed and accuracy, with the advantage of being robust to high background noise, making it suitable for space, automotive, and consumer applications [15], [16].

Manuscript received October 29, 2018; revised February 13, 2019, June 2, 2019, and August 6, 2019; accepted August 24, 2019. Date of publication September 19, 2019; date of current version October 23, 2019. This article was approved by Associate Editor David Stoppa. This work was supported in part by The Netherlands Organization for Scientific Research. (Augusto Ronchini Ximenes and Preethi Padmanabhan contributed equally to this work.) (Corresponding author: Augusto Ronchini Ximenes.)

A. R. Ximenes was with the Applied Quantum Architecture Laboratory (AQUA), Delft University of Technology, 2628 CD Delft, The Netherlands. He is now with Facebook, Inc., Redmond, WA 98052 USA (e-mail: ximenes.a.r@ieee.org).

P. Padmanabhan and E. Charbon are with the Advanced Quantum Architecture Laboratory (AQUA), École Polytechnique Fédérale de Lausanne (EPFL), 2000 Neuchâtel, Switzerland (e-mail: preethi.padmanabhan@epfl.ch; edoardo.charbon@epfl.ch).

M.-J. Lee is with the Post-Silicon Semiconductor Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, South Korea (e-mail: fodlmj@gmail.com).

Y. Yamashita and D.-N. Yaung are with Taiwan Semiconductor Manufacturing Company (TSMC), Hsinchu 300-78, Taiwan (e-mail: yuichiro@tsmc.com).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2019.2938412

One known drawback of dTOF, however, is data volume. For instance, automotive applications require a range of over 100 m, an accuracy of only a few centimeters, and multiple measurements for a reasonable precision, which produce data rates that can reach tens or even hundreds of Gbps in large sensors, thus setting processing constraints on even very efficient graphics processing units (GPUs) [17], as well as on chip readout capability. It is essential to provide as much on-chip processing as possible, in order to reduce data throughput, thus reducing power consumption and speeding up processing time. Some architectures have been proposed [18] attempting to solve this problem, but the required memory renders them feasible only for a silicon photomultiplier (SiPM), single-pixel approach. Another known issue with light detection and ranging (LiDAR) is the interference of multiple systems with each other. A software-based approach has been implemented [19], but it requires intensive post-processing resources.

In this article, we present a modular, digitally synthesized architecture for dTOF depth sensing [20]. It features local time-to-digital converters (TDCs), shared among several pixels, and an *in locus* processing unit, capable of uncertainty reduction. It introduces a laser signature, based on pulse-position modulation (PPM), that reduces interference and increases system robustness. The sensor is designed in a TSMC 3-D-stacking process, suitable for large-scale arrays. The top tier, designed in 45-nm CMOS image sensor (CIS) technology, is dedicated to the single-photon avalanche diode (SPAD) array, whereas the bottom tier, designed in 65-nm CMOS, is dedicated to the processing circuitry. The proposed dTOF sensor is characterized by single-point, long-range (up to 300 m) measurements, with narrow angular field-of-view (AFOV), and it is also used as a platform to implement a scanning LiDAR system, operating in short range (0.1–10 m), with AFOV up to $30^{\circ}$, where a single module was used.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

Fig. 1. Generic dTOF-based LiDAR system. With the future expansion of the single module into multiple modules on chip, different types of illuminator, including fixed laser arrays, will be used in both scanning and flash modes of operation.

This article is organized as follows. In Section II, dTOF systems and their requirements for LiDAR operation are described and discussed. Then, in Section III, our sensor architecture, from pixel to readout circuit, is explained in detail, and supporting experimental results are presented in Section IV. Finally, conclusions are drawn in Section V.

## II. DIRECT TIME-OF-FLIGHT

A typical dTOF system is illustrated in Fig. 1. It operates by generating short pulses of light, triggered by a local periodic electrical source, whose photons travel to the target, are reflected back, and are subsequently detected by the sensor. By directly measuring the travel time of those photons ( $\Delta t$ ), the distance can be simply computed using  $d = c \cdot \Delta t/2$ , where *c* is the speed of light. This method is fast and accurate, since no elaborate, power hungry, or slow computation is required to extract the depth, as typically needed in stereoscopic vision and structured light. Multiple uncertainties and offsets can be present in the system, such as internal laser trigger delay ( $\delta$ ), laser pulsewidth, detector timing jitter, and quantization noise, which can be graphically represented as delays and timestamp spreads, as shown in Fig. 1 (bottom).
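The distance relation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the sensor firmware: the function name and the idea of subtracting a known, constant trigger delay $\delta$ are assumptions made for the example.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def tof_to_distance(delta_t_s, trigger_delay_s=0.0):
    """Convert a measured round-trip time to distance, d = c * dt / 2.

    trigger_delay_s models the internal laser trigger delay (delta in Fig. 1).
    Treating it as a known constant offset is an assumption of this sketch.
    """
    return C * (delta_t_s - trigger_delay_s) / 2.0


# A 1-us round trip corresponds to roughly 150 m.
d_full = tof_to_distance(1e-6)
# One 61-ps timing step corresponds to roughly 9 mm of depth.
d_step = tof_to_distance(61e-12)
```

The second value shows why picosecond-level timing errors matter: every 61 ps of timing error maps to about 9 mm of depth error.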

Since dTOF systems rely heavily on the absolute travel time of photons, measured by internal references and TDCs, the slightest timing error can result in large depth inaccuracies. Often, event-driven sensors using reverse START/STOP TDCs [21] are employed, due to their potential for low power. However, depending on the sensor activity, which is proportional to the illumination, the power consumption can vary drastically and, consequently, the IR-drop in the TDC supply voltage can become non-negligible, thus rendering the TDC resolution unpredictable and difficult to calibrate [22]. In contrast, a continuously running TDC array draws constant power, which makes the TDC resolution predictable regardless of the activity, but at the expense of excessive power consumption.

Fig. 2. Proposed module implementation. (a) 3-D-stacking cross section. (b) Perspective view with photons reaching in BSI mode. (c) Block diagram: two subgroups of $8 \times 8$ pixels (SPADs), shared TDC, *in-locus* processing (DPCU), and memory.

Next, we will describe our proposed architecture. By exploiting a relatively moderate activity rate of the pixels [23], a continuously running TDC, shared among several pixels, is used, providing a more uniform and lower power solution. The system is implemented in a modular fashion, where several modules can handle multiple events simultaneously, independent of the time frame reference. The module will serve as a building block for larger systems, due to its self-contained nature (timing, storage, and processing), thus allowing the design of sensors of different sizes, without major impact on functionality.

This article makes use of a recently available 3-D-stacking technology. The silicon stack cross section is shown in Fig. 2(a), while a perspective view of the implementation is shown in Fig. 2(b).

# **III. PROPOSED ARCHITECTURE**

At the core of our proposed architecture lies a continuously running TDC. In order to reduce the overall power consumption, each TDC is shared among several pixels, through a series of edge-sensitive binary arbiters. Due to symmetry, the sampling signal is generated with virtually zero skew between the pixels. At the same time, the source of the event is tracked, which keeps the sensor granularity to a single SPAD. The block diagram of the system is displayed in Fig. 2(c).

Each TDC is shared among 128 pixels, divided into two independent subgroups of 64 pixels ($8 \times 8$). The subgroup size was chosen as a compromise between conversion rate and power consumption [24], given the expected activity rate due to background noise and returned signal. Two subgroups share a single TDC; each pixel has a 19.8-$\mu$m pitch, totaling 158.4 × 316.8 $\mu$m<sup>2</sup> per module. Fig. 2(c) shows the block diagram of each subgroup, composed of a *decision tree*, which is responsible for managing multiple events across the pixels and for generating a sampling signal, dTOF, and an identification (ID). The dTOF signal samples the TDC timestamp, while the ID is used as a pointer for the *in-pixel* memory. This arrangement constitutes a module, which is digitally synthesized using custom-designed elements and regular standard cells, and it is capable of operating autonomously, only being accessed for readout. Next, the details of each sub-block will be discussed.

Fig. 3. Decision tree (8:1 concept), for signal propagation and ID extraction.

## A. Decision Tree

The detection is managed entirely by the decision tree, which is responsible for organizing, classifying, and propagating only the first event from a burst, through a series of decision makers (log<sub>2</sub>[# pixels] levels, i.e., 6 in our case). At each level, the earlier event of the two inputs is selected, allowing the signal propagation to the next level, while also generating an address bit. The pixels are connected to the first decision-maker level, and the last level generates the TDC sampling signal (and the clock for the local processing). A conceptual example (only three levels) of the described connection is shown in Fig. 3. Upon one or multiple events, a single dTOF signal is created, which is used to resample the winner pixel address (ID), while being sufficiently delayed to generate a reset signal for the tree. Internal processes discussed later, such as the memory read time, define the delay $\Delta$. If desired, an external reset signal can be selected instead, which will limit the maximum number of events to the "Ext reset" signal rate.
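The selection rule described above can be modeled behaviorally as a binary tournament. The following Python sketch is only an abstraction of the idea (earliest event wins, one address bit per level); the function name, the use of `None` for a pixel that did not fire, and the tie-breaking toward the first input are assumptions of the example and say nothing about the circuit-level arbiter.

```python
def decision_tree(arrival_times):
    """Return (winner_id, winner_time, levels) for a power-of-two input list.

    arrival_times[i] is the event time of pixel i, or None if it did not fire.
    At each level the earlier of two inputs propagates and contributes one
    address bit (0 = first input, 1 = second input).
    """
    nodes = [(t, 0) for t in arrival_times]  # (event time, partial address)
    level = 0
    while len(nodes) > 1:
        next_nodes = []
        for i in range(0, len(nodes), 2):
            (t_a, id_a), (t_b, id_b) = nodes[i], nodes[i + 1]
            b_wins = t_b is not None and (t_a is None or t_b < t_a)
            if b_wins:
                next_nodes.append((t_b, id_b | (1 << level)))
            else:
                next_nodes.append((t_a, id_a))
        nodes = next_nodes
        level += 1
    t_win, id_win = nodes[0]
    if t_win is None:
        return None, None, level
    return id_win, t_win, level


# Two pixels of a 64-pixel subgroup fire; the earlier one (pixel 37) wins.
times = [None] * 64
times[37] = 5.0
times[12] = 7.5
winner = decision_tree(times)
```

For 64 inputs the loop runs six times, matching the six levels quoted in the text, and the accumulated address bits reproduce the index of the winning pixel.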

The decision maker is shown in Fig. 4(a). Upon an event in any of the inputs, the logic one is sampled, where the earlier D-type flip-flop (DFF) output resets the later one. The DFF outputs are connected together, through a symmetric OR-gate, to generate the output Q. Internal nodes also feed a set-reset (SR) latch that generates the address A, identifying the event source. The structure is reset at the end of a complete event propagation, through all six levels, as well as by an external signal. Although there is no metastability between the inputs, potential conflicts between the DFF outputs could cause delay variations between the inputs and output ($\tau_{in-to-Q}$), directly affecting the timing. This issue is resolved through an nMOS latch, which reduces the delay variation from 120 to 7.5 ps ($\pm 5\%$) within a similar window ($\Delta_{in} = \pm 7$ ps), as can be observed in the post-layout simulation in Fig. 4(b).

Fig. 4. Decision maker. (a) Schematic. (b) Metastability window simulation, after parasitic extraction, with and without nMOS latch.

Fig. 5. Passive quenching with electrical and optical masking capability, via internal memory.

The detection dead time between event acquisition is set to less than 2.4 ns, in order to accommodate all propagation delays, signal processing time, and tree reset, providing over  $830 \times 10^6$  conversions per second, through both parallel subgroups, in a total of 128 pixels. Such dead time implies an extra saturation bottleneck in the system, which should be ideally designed to accommodate the activity of all pixels in the arrangement, for both signal and noise. For this reason, it is essential to keep the decision tree dead time low, increasing the total conversion rate, so the moderate to high background noise can be dealt with. However, even under saturation, no signal distortion is observed due to the edge-sensitivity nature of the tree. An analysis and the tradeoff between subgroup size, decision tree dead time, sensor saturation, and conversion rate can be found in [24], where the subgroup arrangement choice done in this article is justified.
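The conversion-rate figure quoted above follows from simple arithmetic, sketched below. The 2.4-ns dead time, the two subgroups, and the 128 pixels come from the text; the saturation curve uses a non-paralyzable dead-time model, which is an assumption of this example and not a model taken from the article.

```python
DEAD_TIME_S = 2.4e-9  # upper bound on the decision-tree dead time (from the text)
SUBGROUPS = 2         # two 8x8 subgroups convert in parallel
PIXELS = 128          # pixels sharing one TDC


def max_conversion_rate(dead_time_s=DEAD_TIME_S, subgroups=SUBGROUPS):
    """Upper bound on conversions per second for one module."""
    return subgroups / dead_time_s


def accepted_rate(event_rate_hz, dead_time_s=DEAD_TIME_S):
    """Accepted events per second in one subgroup.

    Assumption: non-paralyzable dead-time model, r / (1 + r * tau).
    """
    return event_rate_hz / (1.0 + event_rate_hz * dead_time_s)


module_rate = max_conversion_rate()    # about 833e6 conversions/s
per_pixel_rate = module_rate / PIXELS  # about 6.5e6 events/s per pixel on average
```

Under this model, a subgroup driven at an event rate equal to the inverse dead time would accept only half of the events, which illustrates why a short tree dead time matters under high background noise.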

Passive quenching and recharge circuits are connected to the first level of decision makers, as shown in Fig. 5. They can be configured to output either a pulse, proportional to the SPAD dead time, or a state, the latter being reset by an external signal. These modes are useful for applications where the user is interested in the last or the first event of each pixel, respectively. Also, since the *decision tree* has a certain dead time, the combination of multiple pixels tends to reduce the overall system saturation. If all the pixels are allowed to fire continuously (pulse mode) in conditions of high ambient noise or high-activity spots (due to targets with high reflectivity), the tree can be saturated and less-active pixels can be completely masked. The state mode allows only a single detection per pixel, per readout, which means that a pixel with a high probability of detecting an event is disabled after its first detection, allowing less-active pixels to fire later on and thus also be acquired. The assumption is that, between sensor readouts, there could be tens of thousands of laser pulses, and the detected events for different pixels may occur during different laser pulses. There is a chance that events in multiple pixels occur at the same time, in which case one of them would inevitably be lost due to conflict and rejection by the *decision tree*. Moreover, the larger the group sharing the same structure, the better the hardware efficiency, but the lower the saturation bound. Therefore, the trend is to reduce the total dead time of the decision tree and to operate the sensor in pulse mode.

Each SPAD can be disabled by an "electrical mask" (avalanches are prevented inside the SPAD) and a "logic mask" (a logic gate is used to stop propagation of avalanche pulses), through an externally configurable, internal 1-bit memory, thus avoiding any undesirable activity from a hot pixel. The frontend transistors ( $M_Q$ ,  $M_1$ , and  $M_2$ ) were implemented in thick oxide, allowing excess bias voltages up to 2.5 V.

According to TCAD simulations, the parasitic capacitance of the SPADs is minimized to only a few tens of femtofarads (about 35 fF, including SPAD, interconnection, quenching transistor, and buffer). This way, a simple passive quenching and recharge resistance of about 100–200 k$\Omega$ can be used, providing fast avalanche quenching, while keeping the SPAD dead time below 10 ns.

Basic elements, such as the decision maker, quenching circuit, 1-bit memory, and TDC, were custom-designed following the standard-cell track and pitch, for later use in the digital flow. In order to maintain symmetry, the quenching circuits, decision makers, and TDC were placed via script, whereas everything else followed a standard digital synthesis flow. As an illustration, Fig. 6 shows the layout and final position of the elements placed via script. The symmetric connections between the pixels limit the nonuniformity among the pixels to at most 1%, according to a Monte Carlo simulation, which can be calibrated during post-processing.

#### B. Time-to-Digital Converter

Since digital data processing can be performed within the module, it is essential to provide readily available timing information straight from the TDC, which also imposes area restrictions with respect to on-chip calibration and decoding. For these reasons, the TDC was designed using a current-starved ring oscillator (RO) with eight pseudo-differential stages, capable of providing 4-bit fractional resolution through a set of sense-amplifier flip-flops (SAFF) [25]. The RO schematic and disposition are shown in Fig. 7(a). The frequency is controlled by a pMOS current source, with a schematic identical to that presented in [24]. The TDC consumes between 200 and 500 $\mu$W, for $\Delta_{LSB}$ of 204 and 61 ps, respectively, including RO and counter, in continuous operation.

Fig. 6. Passive quenching, decision makers, and TDC location, in the digital flow.

Fig. 7. TDC. (a) Pseudo-differential stages and SAFF arrangement for the two independent subgroup samplers. (b) Counter schematic. (c) Layout.

Due to the relatively high speed of the RO (about 0.98 GHz, for $\Delta_{LSB} \approx 61$ ps), an asynchronous binary counter was favored over a synchronous topology. However, since each bit of the counter is clocked by its predecessor, the delay accumulation through multiple stages can cause sampling errors. This is compensated by re-sampling the counter outputs with the same input clock and a chain of buffers. The block diagram is shown in Fig. 7(b). It is not mandatory to match the DFF delay with the re-sampling buffer, as long as the clock frequency is not extremely high. Hypothetically, if these delays were matched, the maximum counter operating frequency would be the inverse of a single DFF delay, pushing its limit to about 8 GHz (in 65 nm). Since the input clock is only about 0.98 GHz, it suffices to guarantee that the buffer delay is shorter than the DFF delay (which is most certainly the case for library standard cells) yet large enough to partially compensate it. This allows the counter to operate at more than twice the required frequency, guaranteeing operation across process-voltage-temperature (PVT) variations and all four corners. The sampling lines, coming from the dTOF signals, are then matched through exactly the same structure of buffer + DFF, as shown in Fig. 7(b).

Fig. 8. DPCU block diagram for a single subgroup, with shared TDC.

The TDC is periodically sampled using an external signal for calibration, done off-chip. Due to continuous operation, its power consumption does not depend on the activity, thus the calibration is mainly used to track slow variations. Moreover, larger arrays, using several modules, can be synchronized by mutually coupling the TDCs [24], which reduces the burden on calibration. The TDC occupies a very small area of 550  $\mu$ m<sup>2</sup>, where about 40% of the area is dedicated to decoupling capacitors, while providing an equalized and calibration-free binary output. The layout is shown in Fig. 7(c).

#### C. Digital Processing and Communication Unit

From the decision tree, the dTOF signal and ID are fed to the digital processing and communication unit (DPCU). The former is used as a clock, whereas the latter is used to access the corresponding pixel memory, reading the previously stored information and combining it with the new timing information, sampled by dTOF. The result of the current processing is then stored back into the memory during the next, unrelated event. A block diagram of the DPCU is shown in Fig. 8.

The core of the processing unit is an arithmetic logic unit (ALU). Due to digital synthesis, its function can be more easily described and implemented. In our implementation, it can be configured to operate as a low-pass filter, through a digital infinite impulse response (IIR) filter, and/or as a photon counter, for intensity measurements. The low-pass filter is responsible for accommodating multiple events between readouts, providing an average of the signal, in order to reduce its uncertainty. The frequency characteristics of the IIR filter and, consequently, the pole location, are controlled by the attenuation factor $\lambda$, which is realized as a right-bit-shift operator. The time-domain equation is expressed as

Fig. 9. IIR filter Verilog simulation, for different $\lambda = 2^{0}, \ldots, 2^{-7}$, and the effects on the standard deviation, $\sigma_{\text{total}}$, in meters.

$$\mathbf{y}[k] = (1 - \lambda) \cdot \mathbf{y}[k - 1] + \lambda \cdot \mathbf{x}[k]. \tag{1}$$

An example of the IIR filtering can be seen in Fig. 9. Assume that several timing uncertainties in the system, such as the laser pulsewidth, SPAD jitter, and TDC integrated jitter and quantization noise, combine to a total of 0.8 ns, which corresponds to a depth uncertainty $\sigma = 12$ cm. By changing the pole factor ($\lambda$), the uncertainty progressively reduces to a minimum of $\sigma = 1$ cm, for $\lambda = 2^{-7}$. The averaging effect of the filter produces an uncertainty reduction given by

$$\sigma_{\text{filtered}} = \frac{\sigma_{\text{total}}}{\sqrt{1/\lambda}}. \tag{2}$$
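A shift-based form of (1) can be sketched in Python to show how the filter reduces the spread of noisy timestamps. This is an illustrative fixed-point model, not the synthesized ALU: the rewriting of (1) as `y + ((x - y) >> shift)`, the choice of 7 fractional bits, the Gaussian noise model, and all names are assumptions of the example.

```python
import random
import statistics

FRAC_BITS = 7  # fractional bits kept alongside the integer timestamp (assumed)


def iir_update(y_prev, x, shift):
    """One step of y[k] = (1 - lam) * y[k-1] + lam * x[k], with lam = 2**-shift.

    Rewritten as y + ((x - y) >> shift), so only a subtraction, an arithmetic
    right shift, and an addition are needed.
    """
    return y_prev + ((x - y_prev) >> shift)


def run_filter(samples, shift):
    """Filter integer timestamps; returns the filtered values in LSB units."""
    y = samples[0] << FRAC_BITS
    out = []
    for s in samples:
        y = iir_update(y, s << FRAC_BITS, shift)
        out.append(y / (1 << FRAC_BITS))
    return out


rng = random.Random(0)
SIGMA_LSB = 13  # about 0.8 ns expressed in 61-ps LSBs
samples = [round(rng.gauss(1000, SIGMA_LSB)) for _ in range(20000)]
filtered = run_filter(samples, shift=7)  # lam = 2**-7
tail = filtered[2000:]                   # discard the settling transient
sigma_out = statistics.pstdev(tail)
```

With `shift=7`, the spread of the settled output is roughly an order of magnitude below the input spread, in line with the reduction predicted by (2). For uncorrelated input noise, the exact factor of a first-order filter of this form is $\sqrt{\lambda/(2-\lambda)}$, so (2) is best read as a slightly conservative approximation. The truncating shift also introduces a small negative bias (about half an LSB here).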

The drawback of such signal processing is that the smaller the $\lambda$, the slower the system response, which could cause image blur. Moreover, in the presence of noise, this filtering approach is less effective. It is thus suitable mostly for intrinsically low-noise applications, such as scanning systems with short integration time, small field-of-view (FOV) per point, and high-power laser, and/or when noise suppression is applied [13], [15].

The TOF information is stored in a 14-bit memory. In order to host the fractional part of the IIR filter and/or to operate as intensity counter, an extra 7-bit memory was included, as shown at the bottom of Fig. 8. The 6-bit ID is already used as a pointer for the memory, not requiring it to be stored, thus totaling 21-bit memory per pixel. The extra 7-bit can be used for the aforementioned IIR filter, or also be configured to operate as an intensity counter (digital accumulator). A third configuration can be selected, where both modes can operate simultaneously, by reserving 4-bit for the IIR filter and the remaining 3-bit for the intensity counter.
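The 21-bit organization described above can be illustrated with a small packing sketch. The field widths (14 + 7 bits, and the 4 + 3 split in the combined mode) come from the text; the bit ordering within the word and the function names are hypothetical and chosen only for illustration.

```python
TOF_BITS = 14    # integer TOF timestamp
EXTRA_BITS = 7   # IIR fractional part and/or intensity counter
COUNT_BITS = 3   # intensity-counter share in the combined mode


def pack_word(tof, extra):
    """Pack one pixel word as [TOF | extra] (assumed ordering)."""
    if not 0 <= tof < (1 << TOF_BITS):
        raise ValueError("tof out of range")
    if not 0 <= extra < (1 << EXTRA_BITS):
        raise ValueError("extra out of range")
    return (tof << EXTRA_BITS) | extra


def unpack_word(word):
    """Return (tof, extra) from a 21-bit pixel word."""
    return word >> EXTRA_BITS, word & ((1 << EXTRA_BITS) - 1)


def split_extra(extra):
    """Combined mode: upper 4 bits as IIR fraction, lower 3 bits as counter."""
    return extra >> COUNT_BITS, extra & ((1 << COUNT_BITS) - 1)


word = pack_word(1234, 0b1011010)
```

The 6-bit pixel ID never appears in the word because it is the memory address itself, which is how the per-pixel total stays at 21 bits.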



Fig. 10. Custom-designed pixel memory. (a) Single-ended, tri-state SRAM. (b) 21-bit block memory per pixel.



Fig. 11. Laser signature concept. Implementation via encrypted key, divided according to modulation index and directly combined with digital TDC output.

To generate the memory array, a custom 1-bit static random access memory (SRAM), shown in Fig. 10(a), was designed. The read time was minimized using tri-state buffers, capable of driving the whole bank, with rail voltage, without the need of sense amplifiers or comparators. The organization and access of 21-bit pixel memory are shown in Fig. 10(b). Read and write times are 1.6 ns and 100 ps, respectively.

# D. Laser Signature

In a real scenario, multiple LiDAR systems might be operating simultaneously, from the same user or not. In any case, they all appear to each other as interference and should be dealt with accordingly. Anticipating such conditions, a code-based solution has been proposed [19], treating the problem mostly via firmware/software, which might increase the post-processing power and latency of the system.

Alternatively, we propose a simple laser signature, applied directly to the laser trigger, through a digitally controlled delay line (DCDL), as well as to the acquired timestamp, by digital arithmetic calculation. The concept is shown in Fig. 11. Due to the discrete nature of the system, by controlling the position of the pulse with a known value, the signal can be recovered without any loss of information, while the interferences are scrambled, appearing as noise in the later accumulated histogram. Because the laser is shifted in time, we associate it with PPM, by defining the modulation index (K), which is the number of discrete laser positions, and the delay gain (S), defined by how much, in time, the laser is shifted per point.

Fig. 12. Laser signature histogram. (a) Signal modulation/recovery and interference scrambling. (b) Spectrum utilization for different delay gains (S), for 4-PPM modulation.

The DCDL is implemented via the phase-locked loop (PLL) of a field-programmable gate array (FPGA), by selecting a desirable time shift. It could, however, be implemented via an on-chip delay-locked loop (DLL), locked to the system clock, which can be beneficial to the versatility and speed of the modulation.

A generic histogram of such a scheme can be inspected in Fig. 12(a). In this illustration, the outgoing laser is spread uniformly over 16 equidistant chunks, while the interference is unaware of the modulation and, consequently, is contained within a single chunk. The transmitted histogram is a representation of the scene, although it is not necessarily ever constructed. In the receiver, by applying the modulation to the TOF information, the detected signal is reconstructed, while the interference is spread over the histogram, thus reducing its peak and easing a successful signal detection.
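The modulation and recovery described above can be mimicked with a toy histogram simulation. Everything here is a simplified assumption for illustration: one detection per pulse for both the own laser and the interferer, a uniformly random position sequence instead of a key-derived one, noiseless arrival bins, and hypothetical names and bin values.

```python
import random
from collections import Counter


def simulate(k_positions=16, s_bins=4, n_pulses=20000, seed=1):
    """Return (own, other) histograms after removing the known PPM shift."""
    rng = random.Random(seed)
    signal_tof, interferer_tof = 300, 500  # arrival bins (hypothetical)
    own, other = Counter(), Counter()
    for _ in range(n_pulses):
        # Known PPM offset applied to this laser pulse, in histogram bins.
        shift = rng.randrange(k_positions) * s_bins
        # Own laser: emitted late by `shift`, detected at tof + shift;
        # subtracting the known shift restores the true bin.
        own[(signal_tof + shift) - shift] += 1
        # Interferer: unaware of the code, always lands in the same raw bin,
        # so the same subtraction scatters it over k_positions bins.
        other[interferer_tof - shift] += 1
    return own, other


own_hist, other_hist = simulate()
own_peak = max(own_hist.values())
other_peak = max(other_hist.values())
```

In this idealized setting the recovered signal stays in a single bin, while the interferer peak drops by about a factor of K (16 here), which is the scrambling effect depicted in Fig. 12(a).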

For maximum spectrum efficiency (interference reduction over spread in histogram), the delay offset produced by the modulation should correspond to the system uncertainty [full-width at half-maximum (FWHM)], as qualitatively illustrated in Fig. 12(b). If the delay gain is too low, the compounded histogram will have a peak higher than that of the individual chunks; if the delay gain is too high, the spectrum is overly utilized, putting constraints on the laser triggering capability. Moreover, to ease TDC correction, the modulation should be a multiple of the TDC LSB ($\Delta_{LSB}$), unless extra fractional bits can be afforded.

In general, the delay gain *S* (see Fig. 11), which is effectively part of the DCDL, should be chosen as the nearest integer to the ratio between the FWHM and $\Delta_{\text{LSB}}$, expressed either in number of histogram bins or in seconds, as

$$S = \left\lfloor \frac{\text{FWHM}}{\Delta_{\text{LSB}}} \right\rceil \quad \text{and} \quad \Delta \tau = S \cdot K \tag{3}$$

where $\Delta \tau$ is the time delay, in picoseconds, applied to the laser trigger. The index *K* is chosen by simply selecting which bits to use, up to 8 bits in our case (256 PPM). A unique 128-bit encrypted key can be added to the system, and subdivided into words of 8 or fewer bits, depending on K, to increase security. If optimized, the system provides interference reduction of about $20 \cdot \log_{10} (0.89 \cdot K)$.
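The choice of *S* and the quoted interference-reduction estimate can be evaluated numerically as follows. The two formulas are taken from the text; the 250-ps system FWHM is a hypothetical input chosen only to exercise the rounding, and the function names are assumptions of this sketch.

```python
import math


def delay_gain_bins(fwhm_s, lsb_s):
    """S from (3): FWHM / LSB rounded to the nearest integer (half rounds up)."""
    return math.floor(fwhm_s / lsb_s + 0.5)


def interference_reduction_db(k):
    """Estimate quoted in the text: 20 * log10(0.89 * K)."""
    return 20.0 * math.log10(0.89 * k)


LSB_S = 61e-12
S = delay_gain_bins(250e-12, LSB_S)  # hypothetical 250-ps system FWHM
step_s = S * LSB_S                   # delay per PPM position, in seconds
reduction_16 = interference_reduction_db(16)
```

Keeping *S* an integer number of LSBs means the receiver can undo the shift with a plain integer subtraction on the TDC code, without needing extra fractional bits.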

One of the main advantages of the proposed laser signature is simplicity. In other schemes, such as code division multiple access (CDMA) [19], the signal must be acquired, demodulated, and processed, thus increasing power consumption and reducing speed. In our proposed solution, instead, the acquired signal is processed on chip, at the detection, and stored at its final value in time. It relaxes the post-processing and peak detection during time-correlated single-photon counting (TCSPC) histogramming.

Moreover, the robustness of the interference rejection can be increased by providing a frequency hopping operation on the laser trigger. Since the laser period is, in general, longer than the integration time, in order to increase the SNR, the laser trigger can be operated in a non-periodic fashion, reducing the chance of coincident operation, thus avoiding the detection of interference altogether. The example shown in Fig. 12 is the worst case scenario.

The proposed laser signature is currently being applied off-chip, which limits the maximum modulation speed and is not compatible with the IIR filtering. Ideally, the modulation should be implemented on-chip, synchronized with an external DCDL, allowing local demodulation and full DPCU functionality, including the IIR filtering.

#### **IV. EXPERIMENTAL RESULTS**

The proposed architecture was implemented using TSMC 3-D-stacked technology, featuring a 4-metal, 45-nm CIS backside illumination (BSI) SPAD array, and a 5-metal, 65-nm low-power CMOS readout integrated circuit (ROIC), packaged in a ceramic QFP-120L. In order to prevent excessive IR-drop, especially in the extension of our approach to larger arrays, extra care was taken with power routing. The SPAD connections between dies cover 5% of the pixel area, leaving the remaining area for a power mesh using the top two metal layers, shared between core and TDC supplies, with multiple connections all around the module.

Throughout the system operation and characterization, two lasers were used: for all depth measurements, a 532-nm PicoQuant VisUV, and for SPAD characterization and laser signature, a 637-nm ALDS PiL063X. In all measurements, the receiver is exposed without any lens or bandpass filters, through a 2-mm pinhole aperture. The FOV depends on the measurement and is described throughout this section. The laser is eye-safe for any of the given FOV and optical power [26].

The choice for lasers with different wavelengths was due to equipment availability in our lab. In real systems, different wavelengths should not interfere with each other, due to the presence of optical bandpass filters, which would also increase the system robustness to background noise, while providing the reported interference rejection for in-band harmful lasers.

Fig. 13. SPAD performance at excess bias voltage ($V_E$) of 2.5 V. (a) Timing jitter. (b) PDP.

Fig. 14. Irradiation measurement. (a) Setup. (b) DCR increase with accumulated dose.

Depth measurement precision, in dTOF systems, is directly related to timing error, as an independent combination of SPAD response jitter, TDC variation (accumulated rms jitter and quantization noise), and laser pulsewidth, as

$$\sigma_{\text{total}} = \sqrt{\sigma_{\text{TDC\_rms}}^2 + \frac{\Delta_{\text{LSB}}^2}{12} + \sigma_{\text{SPAD}}^2 + \sigma_{\text{laser}}^2}. \tag{4}$$

By approximating these sources to Gaussian-shape, FWHM  $\approx$  2.355 ·  $\sigma$  can be used, which is a widely adopted term in the SPAD sensor community.
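The budget in (4) can be evaluated with the figures quoted in this section. In the sketch below, the SPAD jitter (108-ps FWHM), laser pulsewidth (80-ps FWHM), and $\Delta_{LSB}$ (61 ps) are taken from the text, whereas the accumulated TDC rms jitter is not stated and is set to zero purely as a placeholder assumption, so the result is a lower bound rather than a measured value.

```python
import math

C = 299_792_458.0                                   # speed of light, m/s
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355 for a Gaussian


def total_sigma(sigma_tdc_rms, lsb, sigma_spad, sigma_laser):
    """Equation (4): independent contributions added in quadrature."""
    return math.sqrt(
        sigma_tdc_rms**2 + lsb**2 / 12.0 + sigma_spad**2 + sigma_laser**2
    )


sigma_spad = 108e-12 / FWHM_FACTOR   # SPAD jitter, FWHM converted to sigma
sigma_laser = 80e-12 / FWHM_FACTOR   # laser pulsewidth, FWHM converted to sigma
sigma_tdc = 0.0                      # placeholder: TDC rms jitter not given here

sigma_t = total_sigma(sigma_tdc, 61e-12, sigma_spad, sigma_laser)
fwhm_t = FWHM_FACTOR * sigma_t       # total timing FWHM, seconds
sigma_depth = C * sigma_t / 2.0      # single-shot depth uncertainty, meters
```

With these inputs the single-shot timing uncertainty is about 60 ps, or roughly 9 mm of depth, before any averaging over multiple detections.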

The SPAD performance is shown in Fig. 13. A timing jitter below 108-ps FWHM, a peak photon detection probability (PDP) of 31.3%, and a dark-count rate (DCR) of 55 cps/$\mu$m<sup>2</sup> were measured for the SPADs operating with an excess bias voltage (above breakdown) of 2.5 V and a dead time of 100 ns, which leads to an afterpulsing probability of about 2.2% [27]. This device was selected from a larger set of SPADs, designed for the first time in 45-nm TSMC technology, whose design details are published in [27].

Since space applications are among the possible targets of this work, the sensor was exposed to a  $^{60}$Co gamma source, shown in Fig. 14(a), so that the effects of radiation on the device performance could be evaluated. At a dose rate of 73 krad/h, the DCR increased from 2.8 to 5.8 kcps over a 90-min exposure, as plotted in Fig. 14(b), and returned to its original value after annealing. The applied dose is much higher than required, which leaves room for further investigation of this sensor for space applications.
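The accumulated dose implied by these numbers can be checked with a few lines of arithmetic. This is only a sketch, assuming the dose rate was constant over the whole exposure; the variable names are illustrative.

```python
# Figures quoted in the text
dose_rate_krad_per_h = 73.0
exposure_min = 90.0
dcr_before_kcps = 2.8
dcr_after_kcps = 5.8

# ASSUMPTION: constant dose rate during the full exposure
accumulated_dose_krad = dose_rate_krad_per_h * exposure_min / 60.0
dcr_increase_factor = dcr_after_kcps / dcr_before_kcps

print(f"accumulated dose: {accumulated_dose_krad:.1f} krad")
print(f"DCR increase: {dcr_increase_factor:.2f}x")
```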

For real ranging measurements, the sensor was characterized by single-point measurements, using targets with 50% reflectivity, perpendicular to the sensor optical axis. In this configuration, the sensor was operated in two modes: high resolution and low resolution. In the former, the TDCs were tuned to provide  $\Delta_{LSB} = 61$  ps and a maximum range of about 1 $\mu$s (14-bit), equivalent to 150-m range. In the latter mode,  $\Delta_{LSB}$  was tuned to 204 ps, covering about 3.34 $\mu$s, which is equivalent to 500-m range. The characterization of the TDC leads to less than 2 LSB and 3 LSB differential and integral nonlinearities (DNL and INL), respectively, for  $\Delta_{LSB} = 61$  ps. The relatively high and periodic nonlinearity [20] arises from mismatches between the sampling signal and RO + counter phases [Fig. 7(b)]. Calibration could compensate for some of these issues, but since our solution requires a binary TDC result internally, and the power and area budgets of the module are tight, no calibration was performed for the results reported here.

Fig. 15. High-resolution single-point measurement. (a) Aerial view of measurement location. (b) Measured distance and accuracy.
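The relation between TDC resolution, bit depth, and unambiguous range quoted above can be reproduced with the short sketch below. It assumes only the round-trip relation d = c·t/2; the function names are illustrative and not part of the sensor design.

```python
C_M_PER_S = 299_792_458.0  # speed of light


def tdc_full_scale_s(lsb_ps, bits):
    """Time span covered by a TDC with the given LSB and bit depth."""
    return (2 ** bits) * lsb_ps * 1e-12


def max_unambiguous_range_m(lsb_ps, bits):
    """One-way distance corresponding to the TDC full scale (round trip)."""
    return 0.5 * C_M_PER_S * tdc_full_scale_s(lsb_ps, bits)


def depth_per_lsb_m(lsb_ps):
    """Depth quantization step corresponding to one TDC LSB."""
    return 0.5 * C_M_PER_S * lsb_ps * 1e-12


for mode, lsb_ps in (("high resolution", 61.0), ("low resolution", 204.0)):
    span_us = tdc_full_scale_s(lsb_ps, 14) * 1e6
    range_m = max_unambiguous_range_m(lsb_ps, 14)
    step_mm = depth_per_lsb_m(lsb_ps) * 1e3
    print(f"{mode}: span {span_us:.2f} us, range {range_m:.0f} m, "
          f"step {step_mm:.1f} mm/LSB")
```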

The laser parameters used here are 4 mW at 1-MHz frequency, and 1.4 mW at 300 kHz, for high- and low-resolution modes, respectively. The optical energy per pulse is therefore roughly constant in both modes, at about 4 nJ, with a pulsewidth of 80-ps FWHM and 47-W peak power. In both modes, each measurement point was obtained by accumulating 100 chip readouts and combining the dTOF information of all pixels of a single module, as a digital SiPM, into a histogram in MATLAB, without any other filter. The maximum chip readout rate is 2000 fps, yielding a 20-fps depth measurement rate. All measurements were physically performed, without any emulation.
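The laser figures above are mutually consistent, as the following sketch shows. The peak power is computed assuming a Gaussian pulse shape, which is an assumption of this example rather than a statement about the actual laser; the helper names are hypothetical.

```python
import math


def pulse_energy_j(avg_power_w, rep_rate_hz):
    """Energy per pulse from average power and repetition rate."""
    return avg_power_w / rep_rate_hz


def gaussian_peak_power_w(energy_j, fwhm_s):
    """Peak power of a Gaussian pulse: 2*sqrt(ln2/pi) * E / FWHM (about 0.94 E/FWHM)."""
    return 2.0 * math.sqrt(math.log(2.0) / math.pi) * energy_j / fwhm_s


energy_high_res = pulse_energy_j(4e-3, 1e6)      # 4 mW at 1 MHz
energy_low_res = pulse_energy_j(1.4e-3, 300e3)   # 1.4 mW at 300 kHz
peak_power = gaussian_peak_power_w(4e-9, 80e-12) # 4 nJ, 80-ps FWHM

# 100 accumulated chip readouts per depth point at 2000 readouts/s
depth_rate_fps = 2000 / 100

print(f"energy per pulse: {energy_high_res * 1e9:.2f} nJ (high res), "
      f"{energy_low_res * 1e9:.2f} nJ (low res)")
print(f"peak power: {peak_power:.1f} W, depth rate: {depth_rate_fps:.0f} fps")
```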

In high-resolution mode, the measurements were performed indoors. An aerial view of the location is shown in Fig. 15(a). The measured distance and accuracy, under indoor ambient light, are shown in Fig. 15(b). The maximum accuracy error, i.e., the bias (deviation) of the mean value from the ground truth, was below 7 cm (0.3% nonlinearity), and the worst-case standard deviation (precision) was 15 cm (0.1% uncertainty).

In low-resolution mode, the measurements were performed outdoors. Similarly, an aerial view of the location and the measured distance and accuracy are shown in Fig. 16(a) and (b), respectively. The maximum accuracy error was below 80 cm (0.4% nonlinearity), and the worst-case standard deviation (precision) was 47 cm (0.11% uncertainty).
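The two figures of merit used above, accuracy as the bias of the mean with respect to the ground truth and precision as the standard deviation of repeated measurements, can be computed as in the sketch below. The sample distances are hypothetical values chosen only to exercise the code; they are not measured data from this work.

```python
import statistics


def accuracy_and_precision(measured_m, ground_truth_m):
    """Return (bias, sigma): mean deviation from truth and sample std. dev."""
    bias = statistics.fmean(measured_m) - ground_truth_m
    sigma = statistics.stdev(measured_m)
    return bias, sigma


# HYPOTHETICAL repeated measurements of a target placed at 100 m
samples_m = [100.02, 99.95, 100.10, 100.05, 99.98]
bias_m, sigma_m = accuracy_and_precision(samples_m, 100.0)

print(f"accuracy error (bias): {bias_m * 100:.1f} cm")
print(f"precision (std. dev.): {sigma_m * 100:.1f} cm")
```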

The laser signature was measured and is shown in Fig. 17, for 8-, 16-, and 32-PPM (index K of  $2^3$,  $2^4$, and  $2^5$, respectively, and gain  $S = 16 \cdot \Delta_{\text{LSB}}$). Two lasers were used in the measurements: a 637-nm laser, serving as interference, and a 532-nm laser as signal, focused directly onto the sensor. Because of the different laser wavelengths, no color filters were used, which increases the sensor susceptibility to background noise. Fig. 17(a) shows the effects of the modulation without any background illumination, where the interference reduction was measured very close to the expected value (about 1 dB off). Under background illumination, its effectiveness is reduced [see Fig. 17(b)], due to two effects. First, the noise adds a bias level to the histogram counts, for both signal and interference, reducing their ratio. Second, our architecture is based on a sharing decision tree, and collisions between noise and signal reduce the maximum signal acquisition; thus, the overall peaks (unmodulated and signal) are lower in Fig. 17(b) than in Fig. 17(a).

Fig. 16. Low-resolution single-point measurement. (a) Aerial view of measurement location. (b) Measured distance and accuracy.

Fig. 17. Laser signature measurement (1-s integration time), for different PPM modulations. (a) No background illumination (FWHM = 0.7 ns). (b) 3 klux background illumination (FWHM = 1 ns).

| Parameter                     | Unit                      | This Work                | [13]                          | [15]                      | [28]                        | [29]                          |
|-------------------------------|---------------------------|--------------------------|-------------------------------|---------------------------|-----------------------------|-------------------------------|
| Technology                    | -                         | 45/65 nm CMOS            | 150 nm CMOS                   | 180 nm CMOS               | 130 nm CIS                  | 0.35 µm CMOS                  |
| Architecture                  | -                         | Always-on, shared TDC    | Start/Stop, per-pixel TDC     | Column-wise shared TDC    | Histogramming shared TDC    | Start/Stop, per-pixel TDC     |
| Sensor characteristics        |                           |                          |                               |                           |                             |                               |
| Pixel count                   | -                         | $8 \times 16^{a}$        | 64×64                         | 340×96                    | 32×32                       | 32×32                         |
| Pixel pitch                   | $\mu$ m                   | 19.8                     | 60                            | 25                        | 21                          | 150                           |
| Pixel fill factor             | %                         | 31.3                     | 26.5                          | 70                        | 43                          | 3.14                          |
| SPAD DCR@V <sub>E</sub>       | cps/ $\mu$ m <sup>2</sup> | 55.4 @ 2.5 V             | 57 @ 3 V                      | 6 @ 3.3 V                 | N/A                         | 120 @ 6 V                     |
| TDC depth                     | bit                       | 14                       | 16/15                         | 12                        | 8                           | 10                            |
| TDC resolution                | ps                        | 61 – 204                 | 250 - 20000                   | 208                       | 71.4                        | 312                           |
| TDC power                     | mW                        | 0.5 - 0.2                | N/A                           | N/A                       | 14.1                        | 0.35/pixel <sup>f</sup>       |
| TDC area                      | $\mu$ m <sup>2</sup>      | 550                      | N/A                           | 31,000 <sup>d</sup>       | 30,000                      | $5,600^d$                     |
| TDC linearity                 | DNL [LSB]                 | +0.9/-1                  | $+1.2/-1^{b}$                 | +0/-0.52                  | +0.75/-0.61                 | +0.06/-0.06                   |
|                               | INL [LSB]                 | +3/0                     | +4.8/-3.2 <sup>b</sup>        | +0.73/-0.49               | +0.65/-0.2                  | +0.22/-0.22                   |
| Measured distance performance |                           |                          |                               |                           |                             |                               |
| Distance range                | m                         | 150 - 300                | $367 - 5862^c$                | 128                       | 2.82 - 3.375                | 48                            |
| Precision                     | m                         | 0.15 - 0.47              | $0.2 - 0.5^{c}$               | 0.1 <sup>e</sup>          | N/A                         | $0.04^{g}$                    |
|                               | %                         | 0.1 – 0.11               | $0.13 - 0.14^c$               | $0.1^{e}$                 | N/A                         | $0.8^{g}$                     |
| Accuracy                      | m                         | 0.07 - 0.8               | $1.5 - 35^c$                  | 0.37 <sup>e</sup>         | N/A                         | N/A                           |
|                               | %                         | 0.3 - 0.4                | 0.37 – 1.9 <sup>c</sup>       | 0.37 <sup>e</sup>         | N/A                         | N/A                           |

TABLE I. Performance comparison of state-of-the-art CMOS LiDAR sensors

 $^{a}$  Up to 256×256 resolution achieved by flexible scanning system.  $^{b}$  Measured over 5% of the total range.  $^{c}$  Emulated results with optical fiber.  $^{d}$  Estimated by layout.  $^{e}$  Measured at 100 m.  $^{f}$  DLL and TDC power.  $^{g}$  Measured at 5 m.



Fig. 18. Coarse spatial resolution of  $32 \times 32$  image, featuring multiple targets with different reflectivities.

A dual-axis scanner was used to obtain higher spatial resolution images. With an optimized optical setup and a higher power laser, the integration time per point could be substantially reduced, increasing the frame rate. Fig. 18 shows a  $32 \times 32$  image, featuring targets with different reflectivities (from 8 to 60%), at distances ranging from 4 to 10 m and about 30° of AFOV. The integration time in this case was set to 5 ms per point, or 10 chip readouts at the maximum rate, totaling 1280 TOF measurements per point, by combining the whole module. As can be seen from the depth map and a cross section (at row 30), the absolute ranging measurement is successfully acquired, independently of the target reflectivity and incident laser angle.
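The way one module is used as a digital SiPM can be sketched as follows: the timestamps of all 128 pixels over 10 readouts (1280 events) are merged into a single histogram, and the peak bin is converted to distance. This is a simplified illustration on synthetic data, not the authors' MATLAB processing; the signal fraction, jitter, target code, and the use of a plain peak search are all assumptions of the example.

```python
import random
from collections import Counter

C_M_PER_S = 299_792_458.0
LSB_PS = 61.0                    # high-resolution mode LSB
N_CODES = 2 ** 14                # 14-bit TDC
PIXELS_PER_MODULE = 2 * 8 * 8    # 2 x (8 x 8) SPAD pixels
READOUTS_PER_POINT = 10          # 5 ms at 2000 readouts/s


def simulate_codes(true_code, n_events, signal_fraction, jitter_lsb, rng):
    """Synthetic TDC codes: Gaussian signal peak plus uniform background."""
    codes = []
    for _ in range(n_events):
        if rng.random() < signal_fraction:
            code = round(rng.gauss(true_code, jitter_lsb))
        else:
            code = rng.randrange(N_CODES)
        codes.append(min(max(code, 0), N_CODES - 1))
    return codes


def peak_code(codes):
    """Merge all events into one histogram and return the most populated bin."""
    histogram = Counter(codes)
    return histogram.most_common(1)[0][0]


def code_to_distance_m(code):
    """Convert a TDC code (round-trip time) to one-way distance."""
    return 0.5 * C_M_PER_S * code * LSB_PS * 1e-12


rng = random.Random(0)
n_events = PIXELS_PER_MODULE * READOUTS_PER_POINT
# HYPOTHETICAL scene: target at code 1000, 40% signal events, 1-LSB jitter
codes = simulate_codes(1000, n_events, 0.4, 1.0, rng)
estimate_m = code_to_distance_m(peak_code(codes))
print(f"{n_events} events, estimated distance {estimate_m:.2f} m")
```

A plain peak search is the simplest possible estimator; a centroid around the peak would reduce the quantization error at the cost of extra processing.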

Another 3-D image was obtained through scanning, featuring a finer spatial resolution of  $256 \times 256$  and 7° AFOV. For this measurement, the same laser was used, but with an integration time of only 500 $\mu$s per point, owing to the higher target reflectivity, and combining all pixels in the module. The chip communication was implemented via a serial peripheral interface (SPI) controller, including the data readout, which provided flexibility to control the sensor, but limited the data throughput. Therefore, a maximum chip readout of 2000 fps was obtained, requiring 32 s to obtain the results shown in Fig. 19. A special feature of this image is that the internal DPCU was configured to obtain the dTOF and intensity simultaneously, so that this 3-D reconstruction is an effective overlap of both measurements.

Fig. 19. Fine spatial resolution,  $256 \times 256$  image: intensity and depth measurement simultaneously.
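The acquisition times of the two scanned images follow directly from the number of points, the integration time per point, and the readout rate. The sketch below reproduces this arithmetic; it ignores scanner settling time and any other overhead, which is an assumption of the example.

```python
READOUT_RATE_HZ = 2000.0  # maximum chip readout rate


def readouts_per_point(integration_s):
    """Number of chip readouts that fit in one point's integration time."""
    return round(integration_s * READOUT_RATE_HZ)


def scan_time_s(cols, rows, integration_s):
    """Total scan time, ASSUMING no scanner or communication overhead."""
    return cols * rows * integration_s


coarse_time = scan_time_s(32, 32, 5e-3)      # Fig. 18 settings
fine_time = scan_time_s(256, 256, 500e-6)    # Fig. 19 settings

print(f"32 x 32:   {readouts_per_point(5e-3)} readouts/point, {coarse_time:.2f} s")
print(f"256 x 256: {readouts_per_point(500e-6)} readout/point, {fine_time:.2f} s")
```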

Table I shows a performance comparison of this work with recently published state-of-the-art LiDAR systems. The laser power, wavelength, and the speed of the measurement considerably impact the sensor sensitivity to noise. Reference [15] uses an 870-nm, 40-mW laser, to operate far away from the maximum of the sun irradiance and, in addition, it uses a narrow bandpass filter. Reference [13] uses the same wavelength as used in this article and, although capable of handling noise with the use of smart triggering, the results were emulated with a high-power laser and a fiber.



Fig. 20. Chip micrograph: only top tier (SPAD array) is visible.

A chip micrograph is shown in Fig. 20. Since a 3-D-stacked technology was used in this work, the ROIC on the bottom tier is not visible, and only the circular-shape SPAD array on the top tier can be seen.

# V. CONCLUSION

In this article, we have introduced a modular direct TOF sensor, based on TDC sharing, through an edge-sensitive decision tree, and *in-locus* data processing and storage. Each module is digitally synthesized and completely autonomous, which enables scaling to a desired sensor size, without affecting its operation. The design was performed in a TSMC 3-D-stacking technology, featuring a BSI SPAD array on the top tier, connected to a readout and processing circuit on the bottom tier. A PPM-based laser signature recovery technique is proposed, achieving up to 28-dB interference reduction under no background noise conditions. Single-point measurements up to 150 and 300 m were achieved in two different resolution modes, with accuracy error lower than 0.4%. By using one module as a digital SiPM, 3-D images were obtained by a two-axis galvo scanning system, for up to 10-m range and 30° AFOV. With the future expansion of the single module into multiple modules on chip, different types of illuminators, including fixed laser arrays, will be used in both scanning and flash modes of operation.

The ideal operation wavelength depends on the system architecture and application. For example, as is well known, ambient light is by far the most important source of noise in the system; it can reduce accuracy and precision by corrupting the signal data, and it can saturate the system due to high activity. Another parameter to be considered is photon absorption and scattering in the environment, due to interactions with water, oxygen, and/or carbon dioxide molecules. In scanning-mode LiDAR, the exposure time per point is very short, and the laser is concentrated in a single point or line, thus minimizing the total converted noise. In this case, a wavelength in which the interaction with the environment is minimized [30] is ideal, for example, 850 and 905 nm [31], since most of the noise is already being rejected by a spatial gating effect of the scanner. In flash-mode systems, however, the exposure time is much longer, thus much more noise is converted. In order to avoid system saturation, a wavelength in which the sun spectrum is strongly attenuated [30] is preferred, at around 765 or 940 nm [9], [10]. In this case, the system range is limited to a few meters only and the effects of the environment are negligible.

All long-range measurements in the article have been performed using a laser in the visible spectrum, at 532-nm wavelength, due to lab availability. Conversely, commercial LiDARs usually use non-visible lasers in the near-infrared spectrum, above 700 nm where, typically, CMOS detectors have lower sensitivity. An extrapolation of the results reported in this article can be done by comparing the SNR at different wavelengths. Assuming a constant optical power and a non-saturated system, the SNR at another wavelength is given by

$$SNR_{\lambda} = SNR_{532} \cdot \frac{PA_{532}}{PA_{\lambda}}$$
(5)

where PA<sub>532</sub> is the total sun spectral power, integrated over the bandpass filter centered at 532 nm, and PA<sub> $\lambda$ </sub> is the same integration at the extrapolated wavelength. For instance, at 850 nm, the system should provide an SNR 1.6× higher than at 532 nm, increasing the range by  $\sqrt{1.6} \approx 1.3\times$. At 940 nm, the SNR increases by 5×, so the maximum range should increase by  $\sqrt{5} \approx 2.24\times$. This simple extrapolation does not take into account the absolute range, and a more careful study should be performed depending on the application.
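The extrapolation in (5) can be written as a small helper, shown below as a sketch. The in-band solar power values are not given in the text, so the example works only with their ratio (1.6 at 850 nm and 5 at 940 nm, as quoted above); the square-root mapping from SNR gain to range gain is the one used in the text, and the function names are hypothetical.

```python
import math


def snr_at_wavelength(snr_532, pa_532, pa_lambda):
    """Extrapolated SNR from (5): scales with the ratio of in-band solar power."""
    return snr_532 * pa_532 / pa_lambda


def range_gain(snr_gain):
    """Range improvement for a given SNR improvement (square-root relation)."""
    return math.sqrt(snr_gain)


# SNR gains relative to 532 nm, as quoted in the text
for wavelength_nm, snr_gain in ((850, 1.6), (940, 5.0)):
    print(f"{wavelength_nm} nm: SNR x{snr_gain:.1f}, "
          f"range x{range_gain(snr_gain):.2f}")
```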

### ACKNOWLEDGMENT

The authors would like to thank PicoQuant GmbH for the laser loan and CEA-Leti for radiation measurement. They would also like to thank C. Zhang and A. Carimatto for fruitful discussion.

#### REFERENCES

- [1] G. Yahav, G. J. Iddan, and D. Mandelboum, "3D imaging camera for gaming application," in *Int. Conf. Consum. Electron. (ICCE) Dig. Tech. Papers*, Jan. 2007, pp. 1–2.
- [2] E. Bastug, M. Bennis, M. Médard, and M. Debbah, "Toward interconnected virtual reality: Opportunities, challenges, and enablers," *IEEE Commun. Mag.*, vol. 55, no. 6, pp. 110–117, Jun. 2017.
- [3] M. Kutila, M. Jokela, G. Markkula, and M. R. Rué, "Driver distraction detection with a camera vision system," in *Proc. IEEE Int. Conf. Image Process. (ICIP)*, vol. 6, Sep. 2007, p. VI-201.
- [4] K. Khoshelham and S. O. Elberink, "Accuracy and resolution of Kinect depth data for indoor mapping applications," *Sensors*, vol. 12, no. 2, pp. 1437–1454, Feb. 2012.
- [5] R. M. Philipp and R. Etienne-Cummings, "A 128×128 33 mW 30 frames/s single-chip stereo imager," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2006, pp. 2050–2059.
- [6] D. Scharstein and R. Szeliski, "High-accuracy stereo depth maps using structured light," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2003, p. I.
- [7] T. Al Abbas, N. A. W. Dutton, O. Almer, S. Pellegrini, Y. Henrion, and R. K. Henderson, "Backside illuminated SPAD image sensor with 7.83 μm pitch in 3D-stacked CMOS technology," in *IEDM Tech. Dig.*, Dec. 2016, pp. 1–8.
- [8] C. S. Bamji *et al.*, "A 0.13 μm CMOS system-on-chip for a 512×424 time-of-flight image sensor with multi-frequency photo-demodulation up to 130 MHz and 2 GS/s ADC," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 303–319, Nov. 2015.
- [9] AMS Group, Premstaetten, Austria. *Time-of-Flight Camera*. Accessed: Aug. 1, 2019. [Online]. Available: https://ams.com/time-of-flight

- [10] ST Microelectronics, Edinburgh, U.K. Proximity Sensors. Accessed: Aug. 1, 2019. [Online]. Available: https://www.st.com/en/imaging-andphotonics-solutions/proximity-sensors
- [11] R. A. Jarvis, "A laser time-of-flight range scanner for robotic vision," *IEEE Trans. Pattern Anal. Mach. Intell.*, vol. PAMI-5, no. 5, pp. 505–512, Sep. 1983.
- [12] U. Lehmann, M. Sergio, S. Pietrocola, C. Niclass, E. Charbon, and M. A. M. Gijs, "A CMOS microsystem combining magnetic actuation and *in-situ* optical detection of microparticles," in *Proc. IEEE Int. Solid-State Sens., Actuators Microsyst. Conf. (TRANSDUCERS)*, Jun. 2007, pp. 2493–2496.
- [13] M. Perenzoni, D. Perenzoni, and D. Stoppa, "A 64×64-pixels digital silicon photomultiplier direct TOF sensor with 100-MPhotons/s/pixel background rejection and imaging/altimeter mode with 0.14% precision up to 6 km for spacecraft navigation and landing," *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 151–160, Jan. 2017.
- [14] J. Lee, Y.-J. Kim, K. Lee, S. Lee, and S.-W. Kim, "Time-of-flight measurement with femtosecond light pulses," *Nature Photon.*, vol. 4, no. 10, p. 716, 2010.
- [15] C. Niclass, M. Soga, H. Matsubara, S. Kato, and M. Kagami, "A 100-m range 10-frame/s 340×96-pixel time-of-flight depth sensor in 0.18-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 559–572, Feb. 2013.
- [16] K. Yoshioka *et al.*, "A 20-ch TDC/ADC hybrid architecture LiDAR SoC for 240×96 pixel 200-m range imaging with smart accumulation technique and residue quantizing SAR ADC," *IEEE J. Solid-State Circuits*, vol. 53, no. 11, pp. 3026–3038, Sep. 2018.
- [17] V. Campmany, S. Silva, A. Espinosa, J. C. Moure, D. Vázquez, and A. M. López, "GPU-based pedestrian detection for autonomous driving," 2016, arXiv:1611.01642. [Online]. Available: https://arxiv.org/abs/1611.01642
- [18] N. A. Dutton *et al.*, "A time-correlated single-photon-counting sensor with 14 GS/S histogramming time-to-digital converter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [19] T. Fersch, R. Weigel, and A. Koelpin, "A CDMA modulation technique for automotive time-of-flight LiDAR systems," *IEEE Sensors J.*, vol. 17, no. 11, pp. 3507–3516, Mar. 2017.
- [20] A. R. Ximenes, P. Padmanabhan, M.-J. Lee, Y. Yamashita, D. Yaung, and E. Charbon, "A 256×256 45/65 nm 3D-stacked SPAD-based direct TOF image sensor for LiDAR applications with optical polar modulation for up to 18.6 dB interference suppression," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 96–98.
- [21] C. Veerappan *et al.*, "A 160×128 single-photon image sensor with on-pixel 55 ps 10 b time-to-digital converter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, pp. 312–314.
- [22] C. Veerappan *et al.*, "Characterization of large-scale non-uniformities in a 20 k TDC/SPAD array integrated in a 130 nm CMOS process," in *Proc. IEEE Eur. Solid-State Device Res. Conf. (ESSDERC)*, Sep. 2011, pp. 331–334.
- [23] C. L. Niclass, "Single-photon image sensors in CMOS: Picosecond resolution for three-dimensional imaging," École Polytechnique Fédérale Lausanne, Lausanne, Switzerland, Tech. Rep. 125145, 2008, p. 262. [Online]. Available: http://infoscience.epfl.ch/record/125145
- [24] A. R. Ximenes, P. Padmanabhan, and E. Charbon, "Mutually coupled time-to-digital converters (TDCs) for direct time-of-flight (dTOF) image sensors," *Sensors*, vol. 18, no. 10, p. 3413, 2018.
- [25] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung, "Improved sense-amplifier-based flip-flop: Design and measurements," *IEEE J. Solid-State Circuits*, vol. 35, no. 6, pp. 876–884, Jun. 2000.
- [26] I. White and H. Dederich, "American national standard for safe use of lasers," Laser Inst. America, Orlando, FL, USA, Tech. Rep. ANSI Z 136.1-2007, 2007.
- [27] M.-J. Lee *et al.*, "High-performance back-illuminated three-dimensional stacked single-photon avalanche diode implemented in 45-nm CMOS technology," *IEEE J. Sel. Topics Quantum Electron.*, vol. 24, no. 6, Nov./Dec. 2018, Art. no. 3801809.
- [28] T. A. Abbas, N. A. W. Dutton, O. Almer, N. Finlayson, F. M. D. Rocca, and R. Henderson, "A CMOS SPAD sensor with a multi-event folded flash time-to-digital converter for ultra-fast optical transient capture," *IEEE Sensors J.*, vol. 18, no. 8, pp. 3163–3173, Apr. 2018.
- [29] F. Villa *et al.*, "CMOS imager with 1024 SPADs and TDCs for single-photon timing and 3-D time-of-flight," *IEEE J. Sel. Topics Quantum Electron.*, vol. 20, no. 6, pp. 364–373, Nov./Dec. 2014.

- [30] Solar Energy–Reflectance Solar Spectral Irradiance at the Ground at Different Receiving conditions—Part 1: Direct Normal and Hemispherical Solar Irradiance for Air Mass 1.5, document ISO 9845-1, 1992.
- [31] R-SERIES SIPM: Silicon Photomultiplier Sensors. Accessed: Aug. 1, 2019. [Online]. Available: https://www.onsemi.cn/pub/Collateral/MICRORB-SERIES-D.PDF



**Augusto Ronchini Ximenes** (S'10) received the B.S.E.E. and M.S.E.E. degrees from the State University of Campinas, Campinas, Brazil, in 2008 and 2011, respectively, and the Ph.D. degree from the Delft University of Technology, Delft, The Netherlands, in 2019.

In 2008, he spent nine months at McMaster University, Hamilton, ON, Canada, as an undergraduate exchange student, working on post-processing APS image sensors. In 2009, he spent six months at the Technical University of Denmark (DTU), Kongens Lyngby, Denmark, as a master's exchange student, working on RF circuit design. From 2010 to 2012, he worked as an RF Circuit Designer at the Center for Information Technology Renato Archer (CTI), Campinas. In 2015, he was an intern at Xilinx, Dublin, Ireland, working on high-performance ADPLLs using FinFET technology. He is currently working on depth sensors for AR/VR applications at Facebook, Inc., Redmond, WA, USA. His main research interests include mixed-signal circuit design, frequency synthesizers, and time-of-flight depth sensors.



**Preethi Padmanabhan** (S'17) received the M.Sc. degree (summa cum laude) in electrical engineering from the Delft University of Technology, Delft, The Netherlands, in August 2016. She is currently pursuing the Ph.D. degree in microsystems and microelectronics with the École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland. During the master's study, in 2015, she did an internship at NASA's Jet Propulsion Laboratory (JPL), Pasadena, CA, USA, where she designed a CMOS readout circuit for UV photodetectors.

Her current research interests include analog and digital circuit design, primarily focused on the design of time-of-flight depth sensors for LiDAR applications.



**Myung-Jae Lee** (S'08–M'13) received the B.S., M.S., and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2006, 2008, and 2013, respectively. His doctoral dissertation concerned silicon avalanche photodetectors fabricated with standard CMOS/BiCMOS technology.

From 2013 to 2017, he was a Postdoctoral Researcher with the Faculty of Electrical Engineering, Delft University of Technology (TU Delft), Delft, The Netherlands, where he worked on single-photon sensors and applications based on single-photon avalanche diodes. In 2017, he joined the School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, as a Scientist, working on advanced single-photon sensors and applications and coordinating and managing several research projects as a Co-Principal Investigator. Since 2019, he has been a Senior Research Scientist with the Post-Silicon Semiconductor Institute, Korea Institute of Science and Technology (KIST), Seoul, where he has led the research and development of next-generation single-photon detectors and sensors for various applications. His research interests have spanned from photodiodes/photodetectors to single-photon detectors/sensors, concentrating since 2006 on CMOS-compatible avalanche photodetectors and single-photon avalanche diodes and applications thereof (e.g., LiDAR, 3-D vision, biophotonics, quantum photonics, space, security, silicon photonics, and optical interconnects).



**Yuichiro Yamashita** received the B.S. and M.S. degrees in electrical engineering from Tohoku University, Sendai, Japan, in 1995 and 1997, respectively, and the bachelor's degree in engineering from Stanford University, Stanford, CA, USA, in 2003.

In 1997, he joined Canon Inc., Tokyo, Japan, where he engaged in the research and development of the CIS pixel devices and readout circuits and the design of the CIS products. Since 2012, he has been with TSMC, Taiwan, where he has been responsible for simulation, characterization, and exploratory research studies of sensing devices. He holds more than 100 granted patents. Mr. Yamashita is a member of ITE.



**Dun-Nian Yaung** received the M.S. and Ph.D. degrees from the Institute of Microelectronics, National Cheng Kung University, Tainan, Taiwan, in 1994 and 2000, respectively.

In 1995, he joined Taiwan Semiconductor Manufacturing Company (TSMC), Taiwan, where he was dedicated to process integration and SRAM development. From 1999, he led the CMOS Image Sensor RD Team in 0.25-0.11- $\mu$ m FSI development,  $0.11\mu/N65$  BSI, and stack technology initiation. He is currently the Director of CMOS Image Sensor Divisions, TSMC Research and Development. He has authored and coauthored more than 45 papers. He holds 250 patents.

Dr. Yaung served as a Subcommittee Member for Display, Sensor and MEMS Session of IEDM, from 2012 to 2014, and has been a member of the Technical Program Committee of IISW since 2015.



**Edoardo Charbon** (SM'00–F'17) received the diploma degree in electrical engineering and EECS from ETH Zürich, Zürich, Switzerland, in 1988, the M.S. degree in electrical engineering and EECS from the University of California at San Diego, La Jolla, CA, USA, in 1991, and the Ph.D. degree in electrical engineering and EECS from the University of California at Berkeley, Berkeley, CA, USA, in 1995.

He has consulted for numerous organizations, including Bosch, X-FAB, Texas Instruments, Maxim, Sony, Agilent, and the Carlyle Group. From 1995 to 2000, he was with Cadence Design Systems, where he was the Architect of the company's initiative on information hiding for intellectual property protection. In 2000, he joined Canesta Inc., Sunnyvale, CA, USA, as Chief Architect, where he led the development of wireless 3-D CMOS image sensors. Since 2002, he has been a member of the Faculty of EPFL, Lausanne, Switzerland, where he has been a Full Professor since 2015. From 2008 to 2016, he was with the Delft University of Technology, Delft, The Netherlands, as a Chair of VLSI design. He has been the driving force behind the creation of CMOS SPAD technology, which has been mass produced since 2015 and is present in telemeters, proximity sensors, and medical diagnostics. He has authored or coauthored more than 300 peer-reviewed papers and two books. He holds 20 patents. His interests span from 3-D vision, FLIM, FCS, NIROT to super-resolution, and time-resolved Raman spectroscopy to cryo-CMOS circuits and systems for quantum computing.

Dr. Charbon is a fellow of the Kavli Institute of Nanoscience Delft. He is a Distinguished Visiting Scholar of the W. M. Keck Institute for Space at Caltech and a Distinguished Lecturer of the IEEE Photonics Society.