# 128-channel high-linearity resolution-adjustable time-to-digital converters for LiDAR applications: software predictions and hardware implementations

Wujun Xie, Yu Wang, Haochang Chen, and David Day-Uei Li

Abstract—This paper proposes a new calibration method, called the mixed-binning (MB) method, to pursue high-linearity time-to-digital converters (TDCs) for light detection and ranging (LiDAR) applications. The proposed TDCs were developed using tapped delay-line (TDL) cells in field-programmable gate arrays (FPGAs). With the MB method, we implemented a resolution-adjustable TDC showing excellent linearity in Xilinx UltraScale FPGAs. We demonstrate a 128-channel TDC to show that the proposed method is cost-effective in logic resources. We also developed a software tool to predict the performances of TDL-based TDCs robustly. Results from both software analysis and hardware implementations are in good agreement and show that the proposed design has great potential for multichannel applications; the averaged $DNL_{pk-pk}$ and $INL_{pk-pk}$ are close to or even less than 0.05 LSB in multichannel designs.

Index Terms—Light detection and ranging (LiDAR), Time-to-digital converters (TDCs), Time-of-flight (ToF), Field-programmable gate arrays (FPGAs)

#### I. INTRODUCTION

Time-to-digital converters (TDCs), or simply high-precision time-sensors, have been widely used in industrial applications, including time-of-flight (ToF) light detection and ranging (LiDAR) in robotics, driverless vehicles, property surveying and landscape mapping [1]–[9], digital synthesizers for enhanced Gigabit Ethernet and wireless communications [10]–[12], and thermal management systems for the Internet of Things and semiconductor manufacturing [13]–[15]. TDCs are also critical in time-resolved biomedical imaging techniques such as fluorescence lifetime imaging (FLIM) [16]–[18] and positron emission tomography (PET) [19]–[21].

The resolution, linearity and precision are three critical parameters for evaluating the performance of a TDC. The resolution, or the least significant bit (LSB), is the smallest time interval a TDC can measure. The linearity can be characterized by the differential nonlinearity (DNL) and the integral nonlinearity (INL) [22]. The DNL is the deviation of a single quantization step from its ideal value, whereas the INL is the accumulation of DNLs. The precision can be expressed as [22]:

$$\sigma_{TDC}^2 = \sigma_{in}^2 + \sigma_{clk}^2 + \sigma_q^2 + \sigma_{INL}^2 + \sigma_{extra}^2, \tag{1}$$

where $\sigma_{in}$ is the input signal jitter, $\sigma_{clk}$ is the system-clock jitter, $\sigma_q$ is the quantization error, $\sigma_{INL}$ is the INL standard deviation, and $\sigma_{extra}$ is the jitter from external sources.

Digital TDCs can be implemented in application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). Compared with ASIC-TDCs, FPGA-based TDCs have advantages in fast prototyping and realization. As carry-chain modules are well established in modern FPGAs, tapped delay line (TDL) structures have been popular for FPGA-TDCs. However, due to non-uniform carry-chains and clock-tree distributions [23] (causing clock skews), FPGA-TDCs usually deliver worse linearity than ASIC-TDCs. Nevertheless, innovative and cost-effective correction approaches have been proposed to change the landscape. The bin decimation method reorganizes physical bins into new groups to minimize the INL [24]. Won and Lee reported that the TDL could be tuned by changing carry-chain modules' output patterns, resulting in better linearity [25]. In 2019, we proposed a mixed-calibration (MC) method [26] showing a 5.0 ps high-linearity FPGA-TDC ($DNL_{pk-pk} = 0.27$ LSB and $INL_{pk-pk} = 0.51$ LSB), comparable to ASIC-TDCs with a similar resolution [27], [28].

There is a growing research trend for high-resolution TDCs due to increasing demands for high-precision PET imaging, early medical diagnosis, or biosensing [21], [29], [30]. Many architectures and methods, including the dual-sampling structure, the Vernier delay line, the multi-phase design, the multi-chain design and the wave-union method, were proposed to overcome process-related limitations and improve TDC resolutions [31]–[35]. Other logic resources, for example, routings and digital signal processing (DSP) blocks, can also be used to build TDCs [36], [37].

W. Xie, Y. Wang, H. Chen, and D. D.-U. Li\* are with the Faculty of Science, University of Strathclyde, Glasgow, G4 0RE, U.K. (email: wujun.xie@strath.ac.uk; y.wang.100@strath.ac.uk; haochang.chen@strath.ac.uk; david.li@strath.ac.uk).

Fig. 1. a) LiDAR system. b) Timing diagram of time interval measurements. c) Concept of a timing event histogramming function.

TDCs in ToF LiDAR systems for robotics and driverless vehicles have different prioritized parameters, especially in the linearity and the measurement range [7]–[9]. In many applications, LiDAR systems can detect objects' locations and even estimate their speeds and directions of movement [9]. Distances between vehicles, for example, measured by a LiDAR system, can range from a few centimeters to hundreds of meters. Therefore, LiDAR systems for such specifications require TDCs with 50-200 ps resolution [8]. Note that in ToF measurements, a round-trip time interval of about 66.7 ps corresponds to a distance of 1 cm. According to Eq. (1), with a given resolution, improving linearity is an efficient way to achieve precise measurements. An extensive measurement range can be easily achieved using coarse and fine counters. For the coarse counter, the mixed-mode clock manager (MMCM) provides a stable and compensated clock [38]. Therefore, coarse-time codes do not need further calibration. However, for the fine counter, due to the uneven delay line, it is still challenging to guarantee high linearity.
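The relation between a measured time interval and the target distance follows from the round-trip ToF equation $d = c\,t/2$. The short Python sketch below is illustrative only; the helper names are hypothetical and are not part of the reported design.

```python
# Round-trip time-of-flight conversions (illustrative helper names).
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_ps_to_distance_m(t_ps: float) -> float:
    """Distance to the target for a round-trip time given in picoseconds."""
    return C_M_PER_S * (t_ps * 1e-12) / 2.0


def distance_m_to_tof_ps(d_m: float) -> float:
    """Round-trip time in picoseconds for a target at distance d_m (metres)."""
    return 2.0 * d_m / C_M_PER_S * 1e12


if __name__ == "__main__":
    print(f"1 cm    -> {distance_m_to_tof_ps(0.01):.1f} ps")
    print(f"50 ps   -> {tof_ps_to_distance_m(50.0) * 1e3:.2f} mm")
    print(f"1000 ns -> {tof_ps_to_distance_m(1e6):.1f} m")
```

Under this relation, a 50 ps LSB corresponds to roughly 7.5 mm of range, and a 1000 ns measurement range to roughly 150 m.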

In a ToF LiDAR system, measurements are assessed by a TDC channel, as the laser diode and the TDC are synchronous to the timing generator module (see Fig. 1a). Figure 1b shows a timing diagram of time interval measurements; the measured time contains two parts: the coarse time $(T_{coarse})$ and the fine time $(T_{fine})$. However, device uncertainties and offsets, such as the laser trigger delay ($\delta$), detector timing jitter, background noise and quantization error, can result in measurement uncertainties [4]. Therefore, post-processing (histogramming of timing events, see Fig. 1c) is needed. To improve processing efficiency, onboard histogramming modules using on-chip block random-access memories (BRAMs) were proposed [26], [39], [40]. However, for TDCs with an extended measurement range (> 500 ns), onboard histogramming modules consume significant BRAM resources and are therefore not suitable for multichannel TDC designs. As a result, many previously reported TDCs with long measurement ranges can only post-process data in PCs [35], [37], [41], [42].

Furthermore, most commercially available time-correlated single-photon counting (TCSPC) systems have only a fixed resolution [30], whereas commercial TCSPC products [43] providing operation-mode selections (for example, high-speed low-resolution or low-speed high-resolution modes) are standard. It is therefore desirable to have a resolution-adjustable TDC offering broader time-resolved applications.

We aim to develop a high-linearity resolution-adjustable TDC with a histogramming function for long-range synchronous LiDAR applications. The main innovations and contributions of this work include:

- 1) We comprehensively developed the mixed-binning (MB) method, first proposed in a work-in-progress report [44] (implementing a preliminary 8-channel 50 ps TDC with a measurement range of only 2 ns), to implement a 128-channel resolution-adjustable FPGA-TDC. The proposed TDC delivers excellent linearity and precision performances, even better than recently reported ASIC-TDCs in LiDAR systems [7]–[9]. Moreover, it is cost-effective and suitable for commercial applications [43].
- 2) The proposed TDC contains a built-in 1000 ns (measurement range) histogramming function in FPGAs using coarse-fine histogramming modules.
- 3) A software tool [45] has been developed to predict the performance of the proposed TDCs. The 128-channel TDC was implemented and tested in the Xilinx Kintex UltraScale KCU105 Kit (UltraScale XCKU040). Testing results show that the software tool can predict the performance of the proposed TDC robustly.

#### II. ARCHITECTURE AND DESIGN

# A. Architecture

Figure 2a shows the proposed TDC architecture. The TDL is implemented with cascaded carry-chain modules (CARRY8, CY8) in UltraScale FPGAs. The TDL is also tuned to maximize linearity [25]. The sub-TDL structure [26] can remove bubbles effectively by elongating tap intervals to minimize mismatch effects, whereas the encoder converts thermometer codes from sub-TDL modules to binary codes. The TDC resolution is adjustable by the signal *Resol\_sel* (highlighted in red). A 9-bit coarse counter extends the measurement range, and the signal *Trig* is an asynchronous reset signal for the coarse counter. The calibration module (highlighted in blue) can be removed if the calibration method is not applied. We call the uncalibrated TDC the *original TDC* in this report.

Figure 2b is the block diagram for the coarse code histogramming module. In [6], a two-step coarse-fine timing method was proposed to achieve a histogramming function for a measurement distance > 50 m. Long-range measurements are divided into two steps: coarse timing and fine timing, requiring more photon events. However, the UltraScale XCKU040 FPGA has sufficient resources to implement two histogramming modules simultaneously (see Fig. 2a): the fine histogramming module and the coarse histogramming module.
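The idea of keeping separate coarse and fine histograms can be illustrated in software. The Python sketch below is a simplified model under stated assumptions, not the hardware design: the function name is hypothetical, time stamps are taken as ideal intervals in picoseconds, the fine bins are assumed uniform, and the 2 ns clock period and 9-bit coarse counter are taken from the text.

```python
import numpy as np

CLOCK_PERIOD_PS = 2000.0  # 500 MHz system clock (from the text)
COARSE_BITS = 9           # 9-bit coarse counter (from the text)


def coarse_fine_histograms(t_ps, fine_bins):
    """Build separate coarse and fine histograms from time intervals (ps).

    Assumes 0 <= t < 2**COARSE_BITS * CLOCK_PERIOD_PS and uniform fine bins.
    """
    t = np.asarray(t_ps, dtype=float)
    coarse = np.floor(t / CLOCK_PERIOD_PS).astype(np.int64)
    fine_lsb = CLOCK_PERIOD_PS / fine_bins
    fine = np.floor((t - coarse * CLOCK_PERIOD_PS) / fine_lsb).astype(np.int64)
    fine = np.minimum(fine, fine_bins - 1)  # guard against rounding at the edge
    coarse_hist = np.bincount(coarse, minlength=2 ** COARSE_BITS)
    fine_hist = np.bincount(fine, minlength=fine_bins)
    return coarse_hist, fine_hist
```

In this simplified model, two histograms of 512 and 39 entries (for a 39-bin fine code) replace a single histogram of 512 × 39 entries, which illustrates why a coarse-fine split eases memory usage for long measurement ranges.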

Figure 2c shows the hardware implementation of the proposed MB method with resolution adjustments. Two BRAM modules are used in the proposed method: the calibration module and the histogramming module. Unlike the MC method [26] in Fig. 2d, 1) the calibration module contains several BRAMs (for different resolutions) in the extended MB method and 2) each calibration BRAM only stores *two* factors: the bin-correction factor (*BCF*) and the bin-width calibration factor (*WCF*). A multiplexer is controlled by the signal *Resol\_sel* and outputs the factors for the corresponding resolution. The histogram is stored in the module *Histo\_BRAM*.

## B. Distortions caused by mixed-calibration methods

TDCs with the MC method [26] (derived from the histogram processing algorithm [46]) show excellent linearity; both $DNL_{pk-pk}$ and $INL_{pk-pk}$ are much less than 1 LSB. The MC method contains two steps: bin compensations and width calibrations. As shown in Fig. 2d, four factors are stored in the calibration BRAM: the main bin factor ($BCF_m$), the compensated bin factor ($BCF_c$), the main width factor ($WCF_m$) and the compensated width factor ($WCF_c$). $BCF_m$ and $BCF_c$ are used to re-assign actual TDLs to virtual TDLs (see Fig. 3a). The width calibration makes TDLs more even, e.g., Bin [CAL2] and Bin [CAL4]. Although the MC's bin compensation (related to $BCF_m$ and $BCF_c$) can improve linearity, it introduces extra errors ($\sigma_{comp}$). As a simple example shown in Fig. 3c, hit signals with a fixed time interval are ideally registered in one bin before the MC's bin compensation (without considering jitters from circuits and signals). In this scenario, $\sigma_{comp} = 0$. However, following the rules shown in Fig. 3a, a much larger bin (e.g., $Bin_{actual}$ [3], highlighted in yellow in Fig. 3a) remaps to two ideal bins, resulting in $\sigma_{comp} \sim 0.5$ LSB (see Fig. 3c). Although $\sigma_{comp}$ can be reduced through width calibration, it is still significant in LiDAR TDCs when LSB $\gg 10$ ps. In other words, the MC method can 'over calibrate' the proposed TDC and is therefore not suitable for this work. Instead, we developed a much more efficient MB strategy to improve linearity.

# C. Mixed-binning method with resolution adjustments

We aim to develop high-linearity TDCs for driverless vehicle LiDAR systems instead of high-resolution solutions for scientific applications [26]. The proposed MB method integrates the binning method (or the bin decimation [24]) and the width calibration. To avoid  $\sigma_{comp}$ , each fine code is remapped to a new bin (see the difference between Figs 2c and 2d, highlighted in yellow). Unlike down-sampling methods with a fixed sampling interval, the binning method is more flexible, merging several physical bins, regardless of smaller or larger bins, into a new bin and making a more even TDL with a larger average bin size. Figure 3c shows the binning method's concept with resolution adjustments, and the pseudo-code is shown below.

$$\begin{array}{l}
\text{assume } n \text{ actual bins and } m \text{ merged bins } \left(m = \text{floor}\left(\frac{n}{i}\right)\right) \\
\text{set } W_{merged}[m] = i \times W_{ideal} \\
\text{set } T_{merged}[m] = \sum_{k=0}^{m-1} W_{merged}[k] \\
\text{set } T_{actual}[n] = \sum_{k=0}^{n-1} W_{actual}[k] \\
\text{for } k = 0 : n \\
\quad j = \text{floor}\left(\frac{k}{i}\right) \\
\quad \text{if } (T_{actual}[k] < T_{merged}[j]) \;\; BCF[k] = j \\
\quad \text{else continue} \ldots
\end{array}$$
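One possible reading of the pseudo-code is sketched below in Python. The elided branch (`else continue...`) is not specified in the text, so the sketch relies on an assumption: each actual bin is assigned to the merged bin whose ideal time interval contains the centre of that actual bin. The function name and this centre-based rule are hypothetical and may differ from the authors' implementation.

```python
import numpy as np


def bin_correction_factors(w_actual, i):
    """Map each actual bin to a merged-bin address (BCF).

    w_actual : measured widths of the n actual bins (e.g. from a code density test)
    i        : number of ideal bins merged into one new bin
    Assumption: a bin goes to the merged bin that contains its centre time.
    """
    w_actual = np.asarray(w_actual, dtype=float)
    n = w_actual.size
    m = n // i                              # number of merged bins
    w_ideal = w_actual.sum() / n            # ideal (average) bin width
    w_merged = i * w_ideal                  # Eq. (2)
    edges = np.concatenate(([0.0], np.cumsum(w_actual)))  # cumulative times
    centres = 0.5 * (edges[:-1] + edges[1:])
    bcf = np.floor(centres / w_merged).astype(int)
    return np.minimum(bcf, m - 1)           # leftover bins fall into the last one
```

With perfectly uniform bins, this reduces to plain down-sampling (`BCF[k] = k // i`); with uneven bins, wide bins pull the merge boundaries so that merged bins stay close to the ideal width.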

As in Fig. 3b, the ideal width of the *m*-th merged bin is:

$$W_{merged}[m] = i \times W_{ideal},\tag{2}$$

where *i* is the number of ideal bins to be merged and $W_{ideal}$ is the ideal bin-width. *BCFs* are the addresses of merged bins calculated from the actual bin distribution obtained from code density tests. *BCFs*' remapping operations can make the TDL smoother but cannot equalize the bin widths. Therefore, the bin width calibration is needed to enhance linearity further. *WCFs* can be considered as normalization factors and can be estimated from the results of code density tests after binning, expressed as:

$$WCF[m] = (DNL\{BCF[m]\} + 1)^{-1}. \tag{3}$$

To implement Eq. (3) in FPGAs, $WCF[m]$ can be converted into an approximate integer in binary code [40]:

$$WCF[m] = 2^{M} \cdot (DNL\{BCF[m]\} + 1)^{-1}. \tag{4}$$

Accumulation operations for *WCFs* act like multiplication operations (see Fig. 2c, highlighted in green). The *J*-bit output data from *Histo\_BRAM* is right-shifted by *M* bits (in red).
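The fixed-point width calibration of Eqs. (3) and (4) can be mimicked in software. The Python sketch below is a minimal model under assumptions: the function names are hypothetical, rounding to the nearest integer is assumed for the conversion in Eq. (4), and each hit adds the integer *WCF* of its merged bin to the histogram before the final right shift by *M* bits.

```python
import numpy as np


def quantize_wcf(dnl, M):
    """Integer WCFs per Eq. (4): round(2**M / (DNL + 1)); rounding mode assumed."""
    dnl = np.asarray(dnl, dtype=float)
    return np.rint((2 ** M) / (dnl + 1.0)).astype(np.int64)


def calibrated_histogram(hits, wcf_int, M):
    """Accumulate the integer WCF for every hit, then right-shift by M bits."""
    hits = np.asarray(hits, dtype=np.int64)
    acc = np.zeros(wcf_int.size, dtype=np.int64)
    np.add.at(acc, hits, wcf_int[hits])  # accumulation acts like multiplication
    return acc >> M
```

For example, a bin that is twice as wide as ideal (DNL = 1) collects twice as many hits in a code density test, but each hit only adds $2^{M}/2$, so the shifted count matches that of an ideal bin. A larger *M* reduces the rounding error of the integer *WCFs* at the cost of wider histogram words.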

# D. Software predictions

We have developed a software tool to predict TDC performances before hardware implementations (available in [45]; readers interested in it can upload an uncalibrated design).

To find a proper *i*, a full-length (2400 bins; LSB = 5.13 ps) original TDC placed in Slice X49Y0-X49Y299 was implemented without using the proposed MB method. Figure 4a shows its linearity curves; $DNL_{pk-pk}$ is 8.63 LSB and $INL_{pk-pk}$ is 41.81 LSB. For multichannel (128 or more channels) TDCs, TDLs are placed across the whole FPGA chip. Therefore, to ease ultra-wide bin problems, the proposed TDC has three different resolutions obtained by merging 10, 16 and 20 ideal bins. The achievable resolutions are around 50 ps, 80 ps, and 100 ps, respectively; they are typical resolutions for ToF LiDAR applications [7]–[9] and are also similar to the resolutions of commercial TCSPC systems [43], [47].

We tested 17 original TDCs placed in different clock regions (where the final 128-channel TDC was implemented) so that the software tool covers as many variations as possible. Figure 4b shows the linearity curves for one of the tested original TDCs placed in Slice X49Y120-X49Y179, achieving 5.02 ps resolution with $DNL_{pk-pk}$ = 3.86 LSB and $INL_{pk-pk}$ = 8.52 LSB. Figure 4c shows the tool's graphical user interface (GUI). The linearity measurements of original TDCs are used as the raw data for predictions, selected by the channel number (defined as *Ch-No* in the GUI).

With the binning method, a new TDL can be built by remapping actual bins to merged bins [24]. WCFs make the TDL more even. Due to the clock network, ultra-wide bins commonly appear at the edges of a TDL [48]. To further improve the linearity, we only select a segment of the TDL by changing the start-point and the end-point (see Fig. 4c). In this case, the start-point is Bin 6, and the end-point is Bin 400.

Measurement uncertainties contain two parts: signal propagation jitter ($\sigma_{sig}$) and equivalent quantization error ($\sigma_{eq}$). Therefore, the expected precision ($\sigma_{exp}$) of a TDC can be expressed as:

$$\sigma_{exp}^2 = \sigma_{sig}^2 + \sigma_{eq}^2. \tag{5}$$

According to [49],  $\sigma_{sig}$  can be re-derived as:

$$\begin{aligned}
\sigma_{sig}^{2} &= \sigma_{clk}^{2} + \sigma_{in}^{2} + \sigma_{DL}^{2} + \sigma_{TIC}^{2} \\
&= \sigma_{clk}^{2} + \sigma_{in}^{2} + \frac{n}{2}\sigma_{CY}^{2} + \sigma_{TIC}^{2}.
\end{aligned} \tag{6}$$

$\sigma_{clk}$ is the system-clock jitter and $\sigma_{in}$ is the input signal jitter. The architecture-dependent jitter ($\sigma_{DL}$) caused by delay elements ($\sigma_{CY}$) accumulates through the delay line. The jitter from input circuits ($\sigma_{TIC}$) is negligible in single-TDL single-stage TDCs, as signals come from input/output buffers (IOBs) and are transmitted via internal wire connections. Therefore, the expected precision ($\sigma_{exp}$) of a single-stage single-TDL TDC can be considered as:

$$\sigma_{exp}^2 = \sigma_{clk}^2 + \sigma_{in}^2 + \frac{n}{2}\sigma_{CY}^2 + \sigma_{eq}^2. \tag{7}$$

Fig. 4. Linearity curves for a) the full-length (2400 bins; LSB = 5.13 ps) original TDC (without using the proposed MB method) placed in Slice X49Y0-X49Y299 and b) the 460-bin original TDC placed in Slice X49Y120-X49Y179. c) The prediction tool's GUI.

According to [50], [51], the equivalent quantization error  $\sigma_{eq}$  and the equivalent bin width  $w_{eq}$  can be calculated as:

$$\sigma_{eq}^{2} = \sum_{i=1}^{N} \left( \frac{W[i]^{2}}{12} \times \frac{W[i]}{W_{total}} \right), \text{ where } W_{total} = \sum_{i=1}^{N} W[i], \tag{8}$$

$$w_{eq} = \sigma_{eq} \sqrt{12}. \tag{9}$$
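Equations (8) and (9) translate directly into a few lines of code. The Python sketch below is a minimal illustration; the function name is hypothetical and the bin widths are assumed to be available as an array (for example, from a code density test).

```python
import numpy as np


def equivalent_quantization(widths):
    """Return (sigma_eq, w_eq) for a set of bin widths, per Eqs. (8) and (9).

    Each bin contributes its own quantization variance W^2 / 12, weighted by
    the probability W / W_total that a random hit falls into that bin.
    """
    w = np.asarray(widths, dtype=float)
    sigma_eq = np.sqrt(np.sum(w ** 3) / (12.0 * w.sum()))
    w_eq = sigma_eq * np.sqrt(12.0)
    return float(sigma_eq), float(w_eq)
```

For a perfectly even TDL, `w_eq` equals the bin width and `sigma_eq` equals LSB/$\sqrt{12}$; any unevenness makes `w_eq` larger than the average bin width.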

For a fixed input time interval $(T_{input})$, the propagation jitter $(\sigma_{sig})$ follows a Gaussian distribution [52] and causes errors $(\varepsilon_{sig})$. Therefore, the captured time interval $(T_{captured})$ can be expressed as in Eq. (10). The corresponding bin registers the time interval and results in quantization errors $(\sigma_{eq})$.

$$T_{captured} = T_{input} + \varepsilon_{sig}. \tag{10}$$

With Eqs. (5)-(10), we can predict the precision in software. $\sigma_{CY}$, $\sigma_{in}$ and $\sigma_{clk}$ can be set through the element-jitter inputs (see Fig. 4c), and each prediction runs 100,000 trials.
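The prediction flow described by Eqs. (6) and (10) can be approximated with a small Monte Carlo simulation. The Python sketch below is not the authors' tool [45]; it is a simplified model in which the function name, the parameter `n_elements` (standing in for *n* in Eq. (6)) and the fixed random seed are assumptions, and $\sigma_{TIC}$ is neglected as in Eq. (7).

```python
import numpy as np


def predict_precision_lsb(widths_ps, t_input_ps, sigma_clk, sigma_in,
                          sigma_cy, n_elements, trials=100_000, seed=0):
    """Monte Carlo estimate of the RMS resolution (in bins) for one interval."""
    widths = np.asarray(widths_ps, dtype=float)
    edges = np.concatenate(([0.0], np.cumsum(widths)))  # bin boundaries (ps)
    # Eq. (6) with sigma_TIC neglected.
    sigma_sig = np.sqrt(sigma_clk ** 2 + sigma_in ** 2
                        + 0.5 * n_elements * sigma_cy ** 2)
    rng = np.random.default_rng(seed)
    # Eq. (10): captured interval = input interval + Gaussian propagation error.
    t_captured = t_input_ps + rng.normal(0.0, sigma_sig, size=trials)
    # Quantization: index of the bin that registers each captured interval.
    codes = np.searchsorted(edges, t_captured, side="right") - 1
    codes = np.clip(codes, 0, widths.size - 1)
    return float(np.std(codes, ddof=1))
```

With an even TDL, an interval placed on a bin boundary yields about 0.5 LSB, while an interval placed at a bin centre yields a value close to 0 LSB when the jitter is small compared with the bin width.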

#### III. EXPERIMENTAL RESULTS

To evaluate the proposed method, we implemented the proposed 128-channel TDCs in the Xilinx Kintex UltraScale KCU105 Kit (UltraScale XCKU040), operating at 500 MHz. Code density tests and time interval tests were conducted to assess linearity and precision performances. For code density tests, two independent onboard low-jitter crystal oscillators were used to ensure the randomness of hit signals relative to the sampling clock [53]. In time interval tests, the delay elements, IDELAYE3 and ODELAYE3, were used to generate a short delay ($\leq 2$ ns) with a controllable time interval between the two event signals [26]. The precision in long measurement ranges was tested by measuring the time intervals generated from MMCM modules and delay elements (IDELAYE3 and ODELAYE3). Each experiment captured 1,000,000 samples in code density tests and 100,000 samples in time interval tests. An IDELAYCTRL module was used to reduce the impact of process, voltage and temperature (PVT) variations on the delay elements.
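For reference, the linearity figures reported from a code density test can be derived from the hit histogram as sketched below in Python. The function names are hypothetical; the DNL is taken as the relative deviation of each bin count from the mean count, and the INL as the accumulation of DNLs, following the definitions in Section I.

```python
import numpy as np


def dnl_inl_from_histogram(hist):
    """DNL and INL (in LSB) from a code density histogram."""
    counts = np.asarray(hist, dtype=float)
    dnl = counts / counts.mean() - 1.0  # deviation of each bin from the ideal
    inl = np.cumsum(dnl)                # INL is the accumulation of DNLs
    return dnl, inl


def peak_to_peak(x):
    """Peak-to-peak value, as used for DNL_pk-pk and INL_pk-pk."""
    return float(np.max(x) - np.min(x))
```

With uniformly distributed random hits, a bin that collects 10% more hits than average is 10% wider than ideal, giving a DNL of 0.1 LSB for that bin.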

## A. Linearity

TABLE I summarizes linearity performances of the proposed TDCs obtained from software predictions and hardware implementations. Binned TDCs use the binning method only, whereas hybrid TDCs integrate the MB method.

By changing *i* (*i* = 10, 16, 20), we constructed three virtual TDLs with different LSBs. From software predictions, the binning method can improve the linearity but degrades the resolution. The MB method can further improve the linearity by making bins more even. A similar conclusion can be drawn based on the results from hardware implementations. Figure 5 shows the DNL and INL curves for the proposed TDCs in hardware implementations (*i* = 10, 16, 20). Hybrid TDCs achieve good linearity ($DNL_{pk-pk}$ and $INL_{pk-pk}$ are less than 0.06 LSB, 0.04 LSB and 0.02 LSB when the resolutions are 51.28 ps, 83.33 ps and 105.26 ps, respectively). Moreover, with the MB method, the virtual TDLs are even enough, making $\sigma_{eq}$ close to its ideal value ($\frac{1}{\sqrt{12}} \approx 0.289$ LSB, based on [51]).

The resolutions and linearities obtained from software predictions and hardware implementations are slightly different (highlighted in bold). The difference in the resolution is caused by interpolation loss. In software, virtual TDLs are constructed by merging actual bins directly. In hardware, to interpolate the 2 ns clock period (the proposed TDC operates at 500 MHz), the TDCs need 39 bins, 24 bins and 19 bins when *i* = 10, 16 and 20, respectively. Therefore, the resolutions in hardware implementations are 51.28 ps, 83.33 ps and 105.26 ps. The linearity differences result from quantization errors caused by WCFs. In software predictions, WCFs are floating-point numbers and are multiplied with the bins' widths directly. However, in hardware implementations, as a trade-off between hardware resources and accuracy, WCFs are approximate integers in binary code (see Eq. (4)) and contribute quantization errors. Moreover, accumulation operations in hardware also contribute to these errors. Therefore, the proposed hybrid TDC's linearity in hardware is slightly worse than software estimations. Although there are discrepancies, they are minimal and acceptable.
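The hardware resolutions quoted above are consistent with dividing the 2 ns clock period into a whole number of bins. The Python sketch below reproduces that arithmetic; the function name is hypothetical, and taking the floor of the ratio between the clock period and the software LSB is an assumption inferred from the reported numbers rather than a rule stated in the text.

```python
import math


def hardware_lsb(clock_period_ps: float, software_lsb_ps: float):
    """Whole number of bins per clock period and the resulting hardware LSB (ps)."""
    bins = math.floor(clock_period_ps / software_lsb_ps)
    return bins, clock_period_ps / bins


if __name__ == "__main__":
    for lsb_sw in (50.20, 80.32, 100.40):
        bins, lsb_hw = hardware_lsb(2000.0, lsb_sw)
        print(f"software LSB {lsb_sw:.2f} ps -> {bins} bins, {lsb_hw:.2f} ps")
```

Under this assumption, the software LSBs of 50.20 ps, 80.32 ps and 100.40 ps map to 39, 24 and 19 bins, i.e. 51.28 ps, 83.33 ps and 105.26 ps.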

In contrast, the binned TDC linearity estimations are similar in software and hardware, especially  $INL_{pk-pk}$ .

# B. Precision

Using the WaveRunner 640Z, we obtained $\sigma_{clk} = 4.42$ ps, $\sigma_{in} = 4.81$ ps, and $\sigma_{CY} = 0.16$ ps. With these values, we can evaluate the precision of the proposed TDC in software. Jitters caused by delay elements are accumulated and degrade the precision, see Eq. (6). From Eqs. (7) and (8), the precision decreases when the resolution drops. The expected precisions are 0.31 LSB, 0.30 LSB and 0.29 LSB when *i* = 10, 16 and 20, respectively (included in TABLE II).
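Equation (7) can also be evaluated in closed form once the jitter terms are known. The Python sketch below is illustrative only: the function name is hypothetical, and the value passed for *n* must be chosen by the user, since the text does not state the value used for each resolution. The result should therefore not be expected to reproduce TABLE II exactly.

```python
import math


def expected_precision_ps(sigma_clk, sigma_in, sigma_cy, n, sigma_eq):
    """Expected precision per Eq. (7); all inputs and the result are in ps."""
    variance = (sigma_clk ** 2 + sigma_in ** 2
                + 0.5 * n * sigma_cy ** 2 + sigma_eq ** 2)
    return math.sqrt(variance)


if __name__ == "__main__":
    lsb_ps = 50.20                      # software LSB for i = 10 (TABLE I)
    sigma_eq_ps = 0.289 * lsb_ps        # sigma_eq of the hybrid TDC (TABLE I)
    n_assumed = 395                     # assumption: bins between points 6 and 400
    sigma = expected_precision_ps(4.42, 4.81, 0.16, n_assumed, sigma_eq_ps)
    print(f"expected precision: {sigma:.2f} ps = {sigma / lsb_ps:.3f} LSB")
```

The sketch makes the dominant term visible: for LSBs of 50 ps and above, $\sigma_{eq}$ is several times larger than the clock and input jitters, so the expected precision stays close to the ideal quantization limit.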

Feeding hit signals with a fixed time interval to the TDC, the precision or root-mean-square (RMS) resolution can be estimated by the standard deviation ( $\sigma$ ) of time interval tests in hardware and can be expressed as:

$$\sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)^2, \tag{11}$$

where $x_i$ is the bin number of the *i*-th output and $\mu$ is the average value of *N* measurements. In time interval tests, IDELAYE3 and ODELAYE3 controlled small intervals with a step of 11.11 ps. Figure 6 shows the results of the short-delay (< 2 ns) time interval tests. Due to an even TDL distribution, the worst cases happen when the input signal falls at the boundary between two bins. The two bins register the signals equally, resulting in maximum RMS resolutions of 0.5 LSB for the three selected resolutions. Also, due to larger LSBs, at times only one bin catches the time intervals, and the RMS resolution is 0 LSB (see Figs. 6b and 6c).

The averaged value and the maximum value are not suitable for evaluating the precision performance of the proposed TDCs, because they underestimate or overestimate the precision of TDCs with good linearity. Therefore, we conducted time interval tests with *H* different intervals in a coarse counter period ( $T_H - T_1 < 2$ ns) and defined the valid RMS resolution ( $\sigma_{valid}$ ) to evaluate the precision:

$$\sigma_{valid}^2 = \frac{1}{H} \sum_{i=1}^{H} \sigma_i^2, \tag{12}$$

where $\sigma_i$ is the standard deviation for tests with a fixed time interval. Figure 7 presents valid RMS resolutions for long-range time interval tests. The averaged valid RMS resolutions ( $\sigma_{valid\_ave}$ ) are 0.31 LSB, 0.26 LSB and 0.25 LSB when *i* = 10, 16 and 20, respectively. Figure 7 shows that the proposed TDC performs robustly in precision in long-range measurements (up to 1000 ns).
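Equations (11) and (12) can be combined into a short routine. The Python sketch below assumes that the output codes (bin numbers) of each fixed-interval test are available as separate sequences; the function name is hypothetical.

```python
import numpy as np


def valid_rms_resolution(codes_per_interval):
    """Valid RMS resolution (in LSB) over H fixed-interval tests.

    codes_per_interval : iterable of H sequences, each holding the output bin
                         numbers measured for one fixed input time interval.
    """
    # Eq. (11): sample variance (N - 1 in the denominator) for each interval.
    variances = [np.var(np.asarray(c, dtype=float), ddof=1)
                 for c in codes_per_interval]
    # Eq. (12): root of the mean variance over the H intervals.
    return float(np.sqrt(np.mean(variances)))
```

Averaging variances rather than standard deviations keeps the boundary cases (about 0.5 LSB) and the mid-bin cases (about 0 LSB) in the same quadratic measure, which is what makes $\sigma_{valid}$ comparable with the LSB/$\sqrt{12}$ quantization limit.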

TABLE I. LINEARITY PERFORMANCES OF THE PROPOSED TDCS OBTAINED FROM SOFTWARE PREDICTIONS AND HARDWARE IMPLEMENTATIONS

| Parameter | $i = 10$, Binned | $i = 10$, Hybrid | $i = 16$, Binned | $i = 16$, Hybrid | $i = 20$, Binned | $i = 20$, Hybrid |
|---|---|---|---|---|---|---|
| **Software predictions (Start-point = 6, Stop-point = 400)** | | | | | | |
| LSB (ps) | 50.20 | 50.20 | 80.32 | 80.32 | 100.40 | 100.40 |
| DNL (LSB) | [-0.296, 0.305] | [-0.004, 0.003] | [-0.115, 0.116] | **[-0.004, 0.004]** | [-0.120, 0.154] | **[-0.004, 0.004]** |
| $DNL_{pk-pk}$ (LSB) | 0.601 | **0.008** | 0.231 | **0.008** | 0.275 | **0.007** |
| $\sigma_{DNL}$ (LSB) | 0.109 | **0.002** | 0.067 | **0.003** | 0.070 | **0.002** |
| INL (LSB) | [-0.121, 0.184] | [-0.010, 0.002] | [-0.031, 0.107] | **[-0.005, 0.007]** | [-0.039, 0.120] | **[-0.006, 0.008]** |
| $INL_{pk-pk}$ (LSB) | 0.305 | **0.012** | 0.139 | **0.012** | 0.159 | **0.014** |
| $\sigma_{INL}$ (LSB) | 0.078 | **0.004** | 0.040 | **0.003** | 0.045 | **0.003** |
| $\sigma_{eq}$ (LSB) | 0.294 | 0.289 | 0.290 | 0.289 | 0.291 | 0.289 |
| $W_{eq}$ (ps) | 51.05 | 50.20 | 80.83 | 80.32 | 101.10 | 100.40 |
| **Hardware implementations** | | | | | | |
| LSB (ps) | 51.28 | 51.28 | 83.33 | 83.33 | 105.26 | 105.26 |
| DNL (LSB) | [-0.313, 0.215] | [-0.018, 0.021] | [-0.097, 0.113] | **[-0.017, 0.016]** | [-0.118, 0.156] | **[-0.008, 0.008]** |
| $DNL_{pk-pk}$ (LSB) | 0.528 | **0.039** | 0.210 | **0.033** | 0.274 | **0.016** |
| $\sigma_{DNL}$ (LSB) | 0.095 | **0.011** | 0.052 | **0.008** | 0.064 | **0.004** |
| INL (LSB) | [-0.328, 0.000] | [-0.019, 0.035] | [-0.111, 0.067] | **[-0.028, 0.003]** | [-0.158, 0.000] | **[-0.009, 0.007]** |
| $INL_{pk-pk}$ (LSB) | 0.328 | **0.054** | 0.178 | **0.032** | 0.158 | **0.016** |
| $\sigma_{INL}$ (LSB) | 0.069 | **0.012** | 0.041 | **0.007** | 0.039 | **0.004** |
| $\sigma_{eq}$ (LSB) | 0.292 | 0.289 | 0.290 | 0.289 | 0.290 | 0.289 |
| $W_{eq}$ (ps) | 51.94 | 51.29 | 83.65 | 83.34 | 105.87 | 105.26 |



Fig. 5. a-c) DNL curves for the proposed TDCs when i = 10, 16 and 20. d-f) INL curves for the proposed TDCs when i = 10, 16 and 20.

TABLE II. PRECISION PERFORMANCE OF THE PROPOSED TDCS IN THE SOFTWARE PREDICTION AND THE HARDWARE IMPLEMENTATION

| | LSB (ps) | Software: $\sigma_{exp}$ <sup>1</sup> (LSB) | Hardware, short delay: $\sigma_{valid}$ (LSB) | Hardware, long delay: $\sigma_{valid\_ave}$ <sup>2</sup> (LSB) |
|---|---|---|---|---|
| $i = 10$ | 50.20 <sup>3</sup>, 51.28 <sup>4</sup> | 0.31 | 0.31 | 0.31 |
| $i = 16$ | 80.32 <sup>3</sup>, 83.33 <sup>4</sup> | 0.30 | 0.26 | 0.26 |
| $i = 20$ | 100.40 <sup>3</sup>, 105.26 <sup>4</sup> | 0.29 | 0.25 | 0.25 |

<sup>1</sup> The expected precision based on Eq. (7); <sup>2</sup> Averaged valid RMS resolution; <sup>3</sup> Values from software predictions; <sup>4</sup> Values from hardware implementations.



Fig. 6. Short delay time interval tests results (< 2 ns): a) i = 10, b) i = 16, and c) i = 20.



Fig. 7. Valid RMS resolutions in long delay time interval tests (< 1000 ns).

TABLE III. CONSUMPTION OF LOGIC RESOURCES

| Modules | Total  | Used (1-channel) | Used (128-channel) |
|---------|--------|------------------|--------------------|
| CARRY8  | 30300  | 74 (0.24%)       | 9472 (31.26%)      |
| LUT     | 242400 | 663 (0.27%)      | 87078 (35.92%)     |
| FF      | 484800 | 1124 (0.23%)     | 143940 (29.69%)    |
| BRAM    | 600    | 2.5 (0.42%)      | 320 (53.33%)       |
| CLB     | 30300  | 185 (0.61%)      | 20729 (68.41%)     |

Fig. 8. The layout of the 128-channel hybrid TDC.

Differences are also observed in Table II; $\sigma_{exp}$ in software is slightly larger than $\sigma_{valid\_ave}$ in hardware (highlighted in bold). In software, we can restore the signal propagation in the delay line and predict the expected precision. The quantization error in hardware increases significantly when the input time interval is close to the boundary between two bins. However, the step between input time intervals in the hardware tests is relatively large (11.11 ps), so few inputs fall near bin boundaries and the measured precision is overestimated. Although the precision estimated from software predictions differs from the precision measured in hardware implementations, the values of $\sigma_{eq}$ estimated by both approaches are still similar (see Table I), showing that the software tool can robustly predict hardware implementations.
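The boundary effect described above can be illustrated with a short sketch. It assumes an ideal quantizer with uniform bins and Gaussian timing jitter, which is a simplification and not the software tool of this work; the function name `measured_rms_lsb` and the jitter value are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def measured_rms_lsb(position_lsb: float, jitter_lsb: float, span: int = 8) -> float:
    """RMS spread (in LSB) of the output codes of an ideal uniform quantizer.

    position_lsb: true input time in LSB units; bin k covers [k, k + 1).
    jitter_lsb:   Gaussian timing jitter (standard deviation) in LSB, > 0.
    span:         number of neighbouring codes evaluated on each side.
    """
    if jitter_lsb <= 0:
        raise ValueError("jitter_lsb must be positive")
    centre = math.floor(position_lsb)
    mean = 0.0
    mean_sq = 0.0
    for code in range(centre - span, centre + span + 1):
        # Probability that the jittered input falls into bin `code`.
        p = (norm_cdf((code + 1 - position_lsb) / jitter_lsb)
             - norm_cdf((code - position_lsb) / jitter_lsb))
        mean += p * code
        mean_sq += p * code * code
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


if __name__ == "__main__":
    lsb_ps = 51.28     # hardware LSB for i = 10 (Table II)
    step_ps = 11.11    # input step used in the hardware tests
    jitter_lsb = 0.05  # hypothetical jitter, chosen only for illustration
    for n in range(6):
        position = n * step_ps / lsb_ps
        rms = measured_rms_lsb(position, jitter_lsb)
        print(f"{n * step_ps:6.2f} ps -> {rms:.3f} LSB")
```

In this model, an input at a bin center produces almost no code spread, whereas an input exactly on a boundary splits evenly between two codes and gives 0.5 LSB. A coarse input step samples only a few positions per bin, so the averaged result depends on how many of them land near a boundary.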

### C. Multichannel design

We implemented a 128-channel hybrid TDC in UltraScale FPGAs, and Table III summarizes the logic resource consumption. Each channel costs around 660 LUTs and 1100 registers. The BRAM usage depends on the configuration of the resolution. In this design, each channel requires 2.5 BRAMs.
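As a quick check of how the per-channel cost scales, the sketch below recomputes the utilization figures of Table III and compares the 128-channel usage with 128 times the single-channel usage. The dictionary and function names are illustrative only.

```python
# Resource counts copied from Table III.
TOTAL = {"CARRY8": 30300, "LUT": 242400, "FF": 484800, "BRAM": 600, "CLB": 30300}
USED_1CH = {"CARRY8": 74, "LUT": 663, "FF": 1124, "BRAM": 2.5, "CLB": 185}
USED_128CH = {"CARRY8": 9472, "LUT": 87078, "FF": 143940, "BRAM": 320, "CLB": 20729}


def utilization_percent(used: float, total: float) -> float:
    """Share of the device resource that is occupied, in percent."""
    return 100.0 * used / total


def scaling_ratio(resource: str, channels: int = 128) -> float:
    """Measured multichannel usage divided by a naive linear extrapolation."""
    return USED_128CH[resource] / (channels * USED_1CH[resource])


if __name__ == "__main__":
    for name in TOTAL:
        pct = utilization_percent(USED_128CH[name], TOTAL[name])
        print(f"{name:6s} {pct:6.2f}%  ratio to 128 x 1-channel: {scaling_ratio(name):.3f}")
```

A ratio of 1.0 means the resource grows exactly linearly with the channel count, which is the case for CARRY8 and BRAM in Table III; the CLB ratio below 1.0 suggests denser packing in the multichannel design.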

To avoid significant clock skews, each channel is placed within a clock region. The 128 channels are placed evenly in the target chip, and the space between adjacent channels needs to be maintained because of timing requirements and routing congestion. Figure 8 shows the layout of the 128-channel hybrid TDC in UltraScale FPGAs. Table IV summarizes the linearity performances of 16 channels spread evenly across the FPGA chip (only 16 of the 128 are listed to keep the table compact). The linearities of the TDC channels in different locations are uniform.
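The uniformity claim can be checked numerically from the 16 channels listed in Table IV. The sketch below is illustrative only: the helper names are hypothetical, and it assumes half-up rounding when comparing against the "Ave" column, since the source does not state whether that column was averaged over the 16 listed channels or over all 128.

```python
import math
from statistics import mean

# Peak-to-peak values from Table IV, in units of 1.0E-3 LSB.
TABLE_IV = {
    (10, "DNL"): [38, 33, 36, 36, 39, 37, 30, 34, 32, 36, 34, 43, 38, 35, 39, 30],
    (10, "INL"): [57, 55, 50, 58, 55, 55, 51, 54, 58, 57, 54, 54, 59, 50, 57, 52],
    (16, "DNL"): [29, 35, 34, 27, 35, 34, 32, 29, 29, 24, 29, 27, 28, 32, 25, 25],
    (16, "INL"): [28, 31, 26, 28, 33, 35, 28, 28, 26, 27, 30, 25, 35, 30, 29, 26],
    (20, "DNL"): [20, 14, 20, 12, 15, 21, 19, 19, 18, 15, 16, 16, 14, 27, 19, 15],
    (20, "INL"): [18, 15, 12, 23, 12, 13, 23, 18, 14, 15, 23, 13, 14, 19, 18, 22],
}


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 rounded upwards."""
    return int(math.floor(x + 0.5))


def summarize(values: list) -> tuple:
    """Return (rounded mean, max - min) of one table row."""
    return round_half_up(mean(values)), max(values) - min(values)


if __name__ == "__main__":
    for (i, metric), values in TABLE_IV.items():
        avg, spread = summarize(values)
        print(f"i = {i:2d} {metric}: mean {avg} spread {spread} (x1.0E-3 LSB)")
```

The channel-to-channel spread stays in the range of tens of 1.0E-3 LSB for every configuration, which is small compared with one LSB.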

## IV. COMPARISONS & DISCUSSIONS

Table V summarizes the proposed TDCs and recently reported TDCs with similar resolutions, including FPGA and ASIC designs in the past four years. Although TDCs in [54], [55] can achieve similar resolutions (close to 50 ps), the proposed TDCs have much better linearities. Also, the proposed built-in histogramming function ensures that fast data transmission and processing are feasible for LiDAR systems, especially in driverless vehicles and robotics.

Using gated ring-oscillator architectures, ASIC-TDCs in [5] can tune their resolutions by changing the supply voltage. However, because the resolution is a function of the supply voltage, it can be significantly affected by voltage jitter [5]. In contrast, carry-chain structures in FPGAs are more robust [54], and the MB method provides a more flexible and reliable way to adjust TDCs' resolution. In general, ASIC-TDCs can achieve better linearity than FPGA-TDCs through well-planned layout strategies. However, compared with ASIC-TDCs in [5]–[9], the proposed TDC can achieve much better linearity and comparable precision (see the proposed TDC with i = 10 for comparison), thanks to the proposed MB method and the resolution-adjustable architecture.

Choosing a suitable resolution is essential in LiDAR systems. In LiDAR image reconstruction, memory usage limits the performance of reconstruction methods (e.g., neural-network methods [57]–[59]). A larger bin size corresponds to a smaller number of bins and consumes less memory. Furthermore, fewer bins can speed up image reconstruction with faster bin indexing [57]. Unlike binning in software [57], the proposed MB method in hardware maintains TDCs' linearity and provides an efficient and flexible way to change the resolution, making it suitable for LiDAR applications.
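The relation between bin size, bin count and histogram memory can be made concrete with a small calculation. The sketch below uses the hardware LSB values of Table II and the 1.00 µs range of Table V; the 16-bit counter depth and the function names are assumptions made for illustration only and do not describe the implemented histogram.

```python
import math

RANGE_PS = 1_000_000.0                         # 1.00 us measurement range (Table V)
LSB_PS = {10: 51.28, 16: 83.33, 20: 105.26}    # hardware LSB per setting i (Table II)
COUNTER_BITS = 16                              # assumed depth of each histogram counter


def bins_needed(range_ps: float, lsb_ps: float) -> int:
    """Number of histogram bins required to cover the full range."""
    return math.ceil(range_ps / lsb_ps)


def histogram_bytes(range_ps: float, lsb_ps: float, counter_bits: int) -> float:
    """Memory for one histogram, in bytes, for the assumed counter depth."""
    return bins_needed(range_ps, lsb_ps) * counter_bits / 8


if __name__ == "__main__":
    for i, lsb in LSB_PS.items():
        n = bins_needed(RANGE_PS, lsb)
        kib = histogram_bytes(RANGE_PS, lsb, COUNTER_BITS) / 1024
        print(f"i = {i:2d}: LSB {lsb:6.2f} ps -> {n} bins, {kib:.1f} KiB per histogram")
```

Under these assumptions, moving from i = 10 to i = 20 roughly halves the number of bins per histogram, and the memory falls in the same proportion.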

With the MB method, $\sigma_{TDC}$ is degraded because of the relatively large $\sigma_{eq}$ (see Eq. (1)) but is still acceptable. LiDAR systems (see Fig. 1a) in driverless vehicles can tolerate a distance error of a few centimeters. Single-photon avalanche diodes (SPADs) are portable and cost-effective detectors for LiDAR systems but can contribute significant jitter (compared with the proposed TDC). For example, the typical jitter of SPADs is 219 ps in Ref. [5] and 170 ps in Ref. [6]. However, the distance errors, including jitter from SPADs and the proposed TDC ($\sigma_{TDC}$ = 15.89 ps when i = 10), are still acceptable in driverless vehicles, because measured distances generally range from tens of centimeters to hundreds of meters, and 1 cm corresponds to about 66.7 ps in round-trip ToF measurements. Furthermore, if a low-jitter detector is used (e.g., SPADs have 25 ps jitter in Ref. [60] and 35 ps jitter in Ref. [61]), the proposed TDC can offer an overall low-jitter system, better than TDCs in Refs [5]–[9].
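The jitter budget discussed above can be sketched numerically. The example assumes that detector and TDC jitter are independent and add in quadrature, and it ignores other contributors such as the laser pulse width; these are modelling assumptions for illustration, and the function names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light in vacuum
SIGMA_TDC_PS = 15.89        # proposed TDC, i = 10


def combined_jitter_ps(sigma_detector_ps: float, sigma_tdc_ps: float) -> float:
    """Total jitter of two independent sources (root sum of squares)."""
    return math.hypot(sigma_detector_ps, sigma_tdc_ps)


def distance_error_cm(sigma_ps: float) -> float:
    """One-sigma distance error for a round-trip ToF measurement."""
    return C_M_PER_S * sigma_ps * 1e-12 / 2.0 * 100.0


def ps_per_cm() -> float:
    """Round-trip time of flight that corresponds to 1 cm of distance."""
    return 2.0 * 0.01 / C_M_PER_S * 1e12


if __name__ == "__main__":
    print(f"1 cm corresponds to {ps_per_cm():.2f} ps")
    for ref, sigma_spad in (("[5]", 219.0), ("[6]", 170.0), ("[60]", 25.0), ("[61]", 35.0)):
        total = combined_jitter_ps(sigma_spad, SIGMA_TDC_PS)
        print(f"SPAD {ref}: total {total:.2f} ps -> {distance_error_cm(total):.2f} cm")
```

With a SPAD jitter of around 200 ps, the detector dominates the total and the error stays within a few centimeters; with a low-jitter detector, the total drops to a few tens of picoseconds, which is well under 1 cm.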

The width calibration performs like the bin-by-bin calibration proposed in [34]. In [34], TDC output codes were calibrated to the bins' center values, resulting in smaller quantization errors. Similarly, with the width calibration, the difference is negligible whether bins are calibrated to their center or boundary values, because all bins are sufficiently even. Moreover, compared with the look-up table (LUT) based bin-by-bin calibration, BRAM-based width calibration is more suitable for multichannel applications. Using many LUTs (or distributed RAMs) would result in congestion in the synthesis and implementation stages [62].

#### V. CONCLUSION

We developed a new calibration method, the MB method, to improve linearity and adjust TDCs' resolution. A software tool was developed for TDC communities to predict TDC performances robustly. It can assess the performances of calibration methods and TDCs before hardware implementations. As a guide for beginners to understand the TDC principle, a GUI has also been developed to help users design their systems.

TABLE IV. LINEARITY PERFORMANCES OF 16 CHANNELS (OUT OF 128 CHANNELS) IN THE PROPOSED MULTICHANNEL TDC (UNIT: ×1.0E-3 LSB)

| Ch.                  | 0  | 8  | 16 | 24 | 32 | 40 | 48 | 56 | 64 | 72 | 80 | 88 | 96 | 104 | 112 | 120 | Ave |
|----------------------|----|----|----|----|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|
| <i>i</i> = 10        |    |    |    |    |    |    |    |    |    |    |    |    |    |     |     |     |     |
| DNL<sub>pk-pk</sub>  | 38 | 33 | 36 | 36 | 39 | 37 | 30 | 34 | 32 | 36 | 34 | 43 | 38 | 35  | 39  | 30  | 36  |
| INL<sub>pk-pk</sub>  | 57 | 55 | 50 | 58 | 55 | 55 | 51 | 54 | 58 | 57 | 54 | 54 | 59 | 50  | 57  | 52  | 55  |
| <i>i</i> = 16        |    |    |    |    |    |    |    |    |    |    |    |    |    |     |     |     |     |
| DNL<sub>pk-pk</sub>  | 29 | 35 | 34 | 27 | 35 | 34 | 32 | 29 | 29 | 24 | 29 | 27 | 28 | 32  | 25  | 25  | 30  |
| INL<sub>pk-pk</sub>  | 28 | 31 | 26 | 28 | 33 | 35 | 28 | 28 | 26 | 27 | 30 | 25 | 35 | 30  | 29  | 26  | 29  |
| <i>i</i> = 20        |    |    |    |    |    |    |    |    |    |    |    |    |    |     |     |     |     |
| DNL<sub>pk-pk</sub>  | 20 | 14 | 20 | 12 | 15 | 21 | 19 | 19 | 18 | 15 | 16 | 16 | 14 | 27  | 19  | 15  | 18  |
| INL<sub>pk-pk</sub>  | 18 | 15 | 12 | 23 | 12 | 13 | 23 | 18 | 14 | 15 | 23 | 13 | 14 | 19  | 18  | 22  | 17  |

TABLE V. COMPARISON BETWEEN REPORTED HIGH-LINEARITY TDCs WITH ACCEPTABLE RESOLUTIONS

|                     | This Work | RSI'18 [54] | NIMPR'17 [55] | TIM'20 [9] | JSSC'20 [7] | JSSC'19 [5] | JSSC'19 [6] | JSSC'19 [8] |
|---------------------|-----------|-------------|---------------|------------|-------------|-------------|-------------|-------------|
| Type                | FPGA | FPGA | FPGA | ASIC | ASIC | ASIC | ASIC | ASIC |
| Device / Technology | UltraScale | Cyclone IV | Virtex 5 | 180 nm | 350 nm | 40 nm | 90 nm | 180 nm |
| Method              | Mixed-binning | Bin realignment | Counting-weighted | DLL <sup>3</sup> | DLL <sup>3</sup> | Gated Ring Oscillator | Multi-event Histogramming | Dual clock |
| Resolution (ps)     | 51.28, <i>i</i> = 10<br>83.33, <i>i</i> = 16<br>105.26, <i>i</i> = 20 | 45.00 | 60.00 | 50.00 | 78.00 | 33-120 | 35/560 | 48.80 |
| Precision (ps)      | 15.89 <sup>1</sup>, <i>i</i> = 10<br>21.67 <sup>1</sup>, <i>i</i> = 16<br>26.32 <sup>1</sup>, <i>i</i> = 20 | 18.00 | N/S | 36.50 | 33.60 | 208.00 | N/S | 62.37 |
| $DNL_{pk-pk}$ (×1.0E-3 LSB) | 36 <sup>2</sup>, <i>i</i> = 10<br>30 <sup>2</sup>, <i>i</i> = 16<br>18 <sup>2</sup>, <i>i</i> = 20 | 630 | 780 | 470 | 540 <sup>4</sup><br>830 <sup>5</sup> | 900 | 100 | 960 |
| $INL_{pk-pk}$ (×1.0E-3 LSB) | 55 <sup>2</sup>, <i>i</i> = 10<br>29 <sup>2</sup>, <i>i</i> = 16<br>17 <sup>2</sup>, <i>i</i> = 20 | 850 | 1310 | 710 | 360 <sup>4</sup><br>1240 <sup>5</sup> | 5640 | 180 | 2560 |
| Range (µs)          | 1.00 | 0.007 | N/S | 13.10 | 0.64 | 0.14-0.49 | 0.33 <sup>6</sup> | 0.33 <sup>6</sup> |

<sup>1</sup> Averaged valid RMS resolution measured from long-range tests; <sup>2</sup> The averaged peak-to-peak DNL and INL results of the multichannel hybrid TDC; <sup>3</sup> Delay-locked loop (DLL); <sup>4</sup> Minimum value measured from 257 channels; <sup>5</sup> Maximum value measured from 257 channels; <sup>6</sup> Calculated from the 50 m maximum measured distance.

A cost-effective 128-channel high-linearity resolution-adjustable TDC has been implemented and tested in UltraScale FPGAs. The proposed 128-channel TDC shows excellent uniformity, and it offers excellent linearity with the MB method, comparable with recently reported ASIC-TDCs with similar resolutions [8], [9]. With an adjustable resolution and the built-in histogramming function, the proposed TDC can be applied to a broad range of ToF LiDAR applications, such as driverless vehicles and robotics. Moreover, the short development cycle in FPGAs is suitable for the current competitive market.

## ACKNOWLEDGEMENT

The research has been supported by the Engineering and Physical Sciences Research Council under EPSRC Grant: EP/L01596X/1 and the Royal Society of Edinburgh. We would also like to acknowledge the support from Xilinx for donating FPGA development kits to the research group.

#### DATA AVAILABILITY STATEMENT

The software tool developed in this work can be accessed from https://github.com/GitForWJ/TDC_tools.

#### REFERENCES

- [1] H. Song, W. Choi, and H. Kim, 'Robust Vision-Based Relative-Localization Approach Using an RGB-Depth Camera and LiDAR Sensor Fusion', *IEEE Trans. Ind. Electron.*, vol. 63, no. 6, pp. 3725–3736, Jun. 2016.
- [2] U. Larsson, J. Forsberg, and A. Wernersson, 'Mobile robot localization: integrating measurements from a time-of-flight laser', *IEEE Trans. Ind. Electron.*, vol. 43, no. 3, pp. 422–431, Jun. 1996.
- [3] Z.-P. Li et al., 'Super-resolution single-photon imaging at 8.2 kilometers', Opt. Express, vol. 28, no. 3, pp. 4076–4087, Feb. 2020.
- [4] A. R. Ximenes, P. Padmanabhan, M. Lee, Y. Yamashita, D. Yaung, and E. Charbon, 'A Modular, Direct Time-of-Flight Depth Sensor in 45/65nm 3-D-Stacked CMOS Technology', *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3203–3214, Nov. 2019.
- [5] R. K. Henderson *et al.*, 'A 192x128 Time Correlated SPAD Image Sensor in 40-nm CMOS Technology', *IEEE J. Solid-State Circuits*, vol. 54, no. 7, pp. 1907–1916, Jul. 2019.
- [6] S. W. Hutchings *et al.*, 'A Reconfigurable 3-D-Stacked SPAD Imager With In-Pixel Histogramming for Flash LIDAR or High-Speed Time-of-Flight Imaging', *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 2947–2956, Nov. 2019.
- [7] S. Jahromi, J. Jansson, P. Keränen, and J. Kostamovaara, 'A 32 × 128 SPAD-257 TDC Receiver IC for Pulsed TOF Solid-State 3-D Imaging', *IEEE J. Solid-State Circuits*, vol. 55, no. 7, pp. 1960–1970, Jul. 2020.
- [8] C. Zhang, S. Lindner, I. M. Antolović, J. Mata Pavia, M. Wolf, and E. Charbon, 'A 30-frames/s, 252x144 SPAD Flash LiDAR With 1728 Dual-Clock 48.8-ps TDCs, and Pixel-Wise Integrated Histogramming', *IEEE J. Solid-State Circuits*, vol. 54, no. 4, pp. 1137–1151, Apr. 2019.
- [9] A. Hejazi et al., 'A Low Power Multichannel Time to Digital Converter Using All Digital Nested Delay Locked Loops with 50 ps Resolution and High Throughput for LiDAR Sensors', *IEEE Trans. Instrum. Meas.*, pp. 1–1, 2020.
- [10] S. Lewis and M. Inggs, 'Synchronisation of Coherent Netted Radar Using White Rabbit compared with one-way multi-channel GPSDOs', *IEEE Trans. Aerosp. Electron. Syst.*, pp. 1–1, 2020.
- [11] A. Hu, D. Liu, K. Zhang, L. Liu, and X. Zou, 'A 0.045- to 2.5-GHz Frequency Synthesizer With TDC-Based AFC and Phase Switching Multi-Modulus Divider', *IEEE Trans. Circuits Syst. Regul. Pap.*, vol. 67, no. 12, pp. 4470–4483, Dec. 2020.
- [12] J. Sánchez-Garrido *et al.*, 'A White Rabbit-Synchronized Accurate Time-Stamping Solution for the Small-Sized Cameras of the Cherenkov Telescope Array', *IEEE Trans. Instrum. Meas.*, vol. 70, pp. 1–14, 2021.

- [13] C. Chen, C. Chen, Y. Lin, and S. You, 'An All-Digital Time-Domain Smart Temperature Sensor With a Cost-Efficient Curvature Correction', *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 27, no. 1, pp. 29–36, Jan. 2019.
- [14] W. Song, J. Lee, N. Cho, and J. Burm, 'An Ultralow Power Time-Domain Temperature Sensor With Time-Domain Delta–Sigma TDC', *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 64, no. 10, pp. 1117–1121, Oct. 2017.
- [15] H. Jiang, C. Huang, M. R. Chan, and D. A. Hall, 'A 2-in-1 Temperature and Humidity Sensor With a Single FLL Wheatstone-Bridge Front-End', *IEEE J. Solid-State Circuits*, vol. 55, no. 8, pp. 2174–2185, Aug. 2020.
- [16] C. Liu, Y.-L. Liu, E. P. Perillo, A. K. Dunn, and H.-C. Yeh, 'Single-Molecule Tracking and Its Application in Biomolecular Binding Detection', *IEEE J. Sel. Top. Quantum Electron.*, vol. 22, no. 4, pp. 64–76, Jul. 2016.
- [17] W. Becker, 'Fluorescence lifetime imaging by multi-dimensional time correlated single photon counting', *Med. Photonics*, vol. 27, pp. 41–61, May 2015.
- [18] D. D.-U. Li et al., 'Video-rate fluorescence lifetime imaging camera with CMOS single-photon avalanche diode arrays and high-speed imaging algorithm', J. Biomed. Opt., vol. 16, no. 9, p. 096012, Sep. 2011.
- [19] J. Y. Won and J. S. Lee, 'Highly Integrated FPGA-Only Signal Digitization Method Using Single-Ended Memory Interface Input Receivers for Time-of-Flight PET Detectors', *IEEE Trans. Biomed. Circuits Syst.*, vol. 12, no. 6, pp. 1401–1409, Dec. 2018.
- [20] F. Nolet et al., 'A 256 Pixelated SPAD readout ASIC with in-Pixel TDC and embedded digital signal processing for uniformity and skew correction', Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip., vol. 949, p. 162891, Jan. 2020.
- [21] P. Lecoq, 'Pushing the Limits in Time-of-Flight PET Imaging', IEEE Trans. Radiat. Plasma Med. Sci., vol. 1, no. 6, pp. 473–485, Nov. 2017.
- [22] R. Machado, J. Cabral, and F. S. Alves, 'Recent Developments and Challenges in FPGA-Based Time-to-Digital Converters', *IEEE Trans. Instrum. Meas.*, vol. 68, no. 11, pp. 4205–4221, Nov. 2019.
- [23] J. Y. Won, S. I. Kwon, H. S. Yoon, G. B. Ko, J. Son, and J. S. Lee, 'Dual-Phase Tapped-Delay-Line Time-to-Digital Converter With On-the-Fly Calibration Implemented in 40 nm FPGA', *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 1, pp. 231–242, Feb. 2016.
- [24] Y. Wang and C. Liu, 'A Nonlinearity Minimization-Oriented Resource-Saving Time-to-Digital Converter Implemented in a 28 nm Xilinx FPGA', *IEEE Trans. Nucl. Sci.*, vol. 62, no. 5, pp. 2003–2009, Oct. 2015.
- [25] J. Y. Won and J. S. Lee, 'Time-to-Digital Converter Using a Tuned-Delay Line Evaluated in 28-, 40-, and 45-nm FPGAs', *IEEE Trans. Instrum. Meas.*, vol. 65, no. 7, pp. 1678–1689, Jul. 2016.
- [26] H. Chen and D. D. Li, 'Multichannel, Low Nonlinearity Time-to-Digital Converters Based on 20 and 28 nm FPGAs', *IEEE Trans. Ind. Electron.*, vol. 66, no. 4, pp. 3265–3274, Apr. 2019.
- [27] H. Molaei and K. Hajsadeghi, 'A 5.3-ps, 8-b Time to Digital Converter Using a New Gain-Reconfigurable Time Amplifier', *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 66, no. 3, pp. 352–356, Mar. 2019.
- [28] L. Perktold and J. Christiansen, 'A multichannel time-to-digital converter ASIC with better than 3 ps RMS time resolution', *J. Instrum.*, vol. 9, no. 01, pp. C01060–C01060, Jan. 2014.
- [29] D. Xiao, Y. Chen, and D. D.-U. Li, 'One-Dimensional Deep Learning Architecture for Fast Fluorescence Lifetime Imaging', *IEEE J. Sel. Top. Quantum Electron.*, vol. 27, no. 4, pp. 1–10, Jul. 2021.
- [30] W. Becker, *The bh TCSPC Handbook*, 8th Edition. 2019.
- [31] Y. Wang, J. Kuang, C. Liu, and Q. Cao, 'A 3.9-ps RMS Precision Time-to-Digital Converter Using Ones-Counter Encoding Scheme in a Kintex-7 FPGA', *IEEE Trans. Nucl. Sci.*, vol. 64, no. 10, pp. 2713–2718, Oct. 2017.
- [32] Q. Shen et al., 'A 1.7 ps Equivalent Bin Size and 4.2 ps RMS FPGA TDC Based on Multichain Measurements Averaging Method', IEEE Trans. Nucl. Sci., vol. 62, no. 3, pp. 947–954, Jun. 2015.
- [33] T. Sui et al., 'A 2.3-ps RMS Resolution Time-to-Digital Converter Implemented in a Low-Cost Cyclone V FPGA', IEEE Trans. Instrum. Meas., pp. 1–14, 2018.
- [34] J. Wu, 'On-Chip processing for the wave union TDC implemented in FPGA', in 2009 16th IEEE-NPSS Real Time Conference, May 2009, pp. 279–282.

- [35] P. Chen et al., 'High-Precision PLL Delay Matrix With Overclocking and Double Data Rate for Accurate FPGA Time-to-Digital Converters', *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 28, no. 4, pp. 904–913, Apr. 2020.
- [36] X. Qin et al., 'A high resolution time-to-digital-convertor based on a carry-chain and DSP48E1 adders in a 28-nm field-programmable-gate-array', *Rev. Sci. Instrum.*, vol. 91, no. 2, p. 024708, Feb. 2020.
- [37] M. Zhang, K. Yang, Z. Chai, H. Wang, Z. Ding, and W. Bao, 'High-Resolution Time-to-Digital Converters Implemented on 40-, 28-, and 20-nm FPGAs', *IEEE Trans. Instrum. Meas.*, vol. 70, pp. 1–10, 2021.
- [38] Xilinx, 'UltraScale Architecture Libraries Guide (UG974)', 2014. https://www.xilinx.com/support/documentation/sw\_manuals/xilinx2014\_1/ug974-vivado-ultrascale-libraries.pdf.
- [39] N. Dutton et al., 'Multiple-event direct to histogram TDC in 65nm FPGA technology', in 2014 10th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), Jun. 2014, pp. 1–5.
- [40] H. Chen, Y. Zhang, and D. D. Li, 'A Low Nonlinearity, Missing-Code Free Time-to-Digital Converter Based on 28-nm FPGAs With Embedded Bin-Width Calibrations', *IEEE Trans. Instrum. Meas.*, vol. 66, no. 7, pp. 1912–1921, Jul. 2017.
- [41] K. Choi and D. Jee, 'Design and Calibration Techniques for a Multichannel FPGA-Based Time-to-Digital Converter in an Object Positioning System', *IEEE Trans. Instrum. Meas.*, vol. 70, pp. 1–9, 2021.
- [42] K. Cui and X. Li, 'A High-Linearity Vernier Time-to-Digital Converter on FPGAs With Improved Resolution Using Bidirectional-Operating Vernier Delay Lines', *IEEE Trans. Instrum. Meas.*, vol. 69, no. 8, pp. 5941–5949, Aug. 2020.
- [43] 'ID900 Brochure', ID Quantique. https://marketing.idquantique.com/acton/attachment/11868/f-023e/1/-/-/-/ID900\_Brochure.pdf (accessed Oct. 13, 2020).
- [44] W. Xie, H. Chen, Z. Zang, and D. D.-U. Li, 'Multi-channel high-linearity time-to-digital converters in 20 nm and 28 nm FPGAs for LiDAR applications', in 2020 6th International Conference on Event-Based Control, Communication, and Signal Processing (EBCCSP), Sep. 2020, pp. 1–4.
- [45] 'GitForWJ/TDC\_tools', GitHub. https://github.com/GitForWJ/TDC\_tools (accessed Nov. 01, 2020).
- [46] S. Burri, H. Homulle, C. Bruschini, and E. Charbon, 'LinoSPAD: a time-resolved 256x1 CMOS SPAD line sensor system featuring 64 FPGA-based TDC channels running at up to 8.5 giga-events per second', in *Optical Sensing and Detection IV*, Apr. 2016, vol. 9899, p. 98990D.
- [47] PicoQuant, 'PicoQuant Photon Counting and Timing'. https://www.picoquant.com/images/uploads/downloads/7304photon\_counting\_brochure.pdf.
- [48] P. Kwiatkowski and R. Szplet, 'Efficient implementation of multiple time coding lines-based TDC in an FPGA device', *IEEE Trans. Instrum. Meas.*, vol. 69, no. 10, pp. 7353–7364, Oct. 2020.
- [49] R. Szplet, R. Szymanowski, and D. Sondej, 'Measurement Uncertainty of Precise Interpolating Time Counters', *IEEE Trans. Instrum. Meas.*, vol. 68, no. 11, pp. 4348–4356, Nov. 2019.
- [50] J. Wu, 'Uneven bin width digitization and a timing calibration method using cascaded PLL', in 2014 19th IEEE-NPSS Real Time Conference, May 2014, pp. 1–4.
- [51] R. Szymanowski, R. Szplet, and P. Kwiatkowski, 'Quantization error in precision time counters', *Meas. Sci. Technol.*, vol. 26, no. 7, p. 075002, Jun. 2015.
- [52] S. Henzler, *Time-to-digital converters*. Dordrecht: Springer, 2010.
- [53] J. Wu, 'Several Key Issues on Implementing Delay Line Based TDCs Using FPGAs', *IEEE Trans. Nucl. Sci.*, vol. 57, no. 3, pp. 1543–1548, Jun. 2010.
- [54] G. Cao, H. Xia, and N. Dong, 'An 18-ps TDC using timing adjustment and bin realignment methods in a Cyclone-IV FPGA', *Rev. Sci. Instrum.*, vol. 89, no. 5, p. 054707, May 2018.
- [55] Y.-H. Chen, 'A counting-weighted calibration method for a field-programmable-gate-array-based time-to-digital converter', *Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip.*, vol. 854, pp. 61–63, May 2017.
- [56] W. Pan, G. Gong, and J. Li, 'A 20-ps Time-to-Digital Converter (TDC) Implemented in Field-Programmable Gate Array (FPGA) with Automatic Temperature Correction', *IEEE Trans. Nucl. Sci.*, vol. 61, no. 3, pp. 1468–1473, Jun. 2014.

- [57] Z. Sun, D. B. Lindell, O. Solgaard, and G. Wetzstein, 'SPADnet: deep RGB-SPAD sensor fusion assisted by monocular depth estimation', *Opt. Express*, vol. 28, no. 10, pp. 14948–14962, May 2020.
- [58] A. G. Howard *et al.*, 'MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications', *ArXiv170404861 Cs*, Apr. 2017, Accessed: Mar. 05, 2021. [Online]. Available: http://arxiv.org/abs/1704.04861.
- [59] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, 'SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size', *ArXiv160207360 Cs*, Nov. 2016, Accessed: Mar. 05, 2021. [Online]. Available: http://arxiv.org/abs/1602.07360.
- [60] K. Zang *et al.*, 'Silicon single-photon avalanche diodes with nanostructured light trapping', *Nat. Commun.*, vol. 8, no. 1, p. 628, Dec. 2017.
- [61] M. Ghioni, A. Gulinatti, I. Rech, P. Maccagnani, and S. Cova, 'Large-area low-jitter silicon single photon avalanche diodes', in *Quantum Sensing and Nanophotonic Devices V*, Feb. 2008, vol. 6900, p. 69001D.
- [62] Xilinx, 'XST User Guide for Virtex-6, Spartan-6, and 7 Series Devices', Mar. 20, 2013. https://www.xilinx.com/support/documentation/sw\_manuals/xilinx14\_7/xst\_v6s6.pdf.