
Showing papers in "IEEE Journal of Solid-State Circuits" in 2017


Journal Article
TL;DR: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring the architecture.
Abstract: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also pose throughput and energy-efficiency challenges for the underlying hardware. This is because their computation requires a large amount of data, creating significant on-chip and off-chip data movement that is more energy-consuming than computation. Minimizing data movement energy cost for any CNN shape, therefore, is the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. RS dataflow reconfigures the computation mapping of a given shape, which optimizes energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM access/multiply and accumulation (MAC) for AlexNet at 278 mW (batch size N = 4), and 0.7 frames/s and 0.0035 DRAM access/MAC for VGG-16 at 236 mW (N = 3).

2,165 citations
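
The row-stationary idea can be illustrated with a minimal Python sketch, assuming a single-channel 2-D convolution with stride 1 and no padding. Each hypothetical "PE" keeps one filter row stationary and slides it along one input row to produce a 1-D row of partial sums; the partial-sum rows belonging to one output row are then added together. This models only the data-reuse pattern, not Eyeriss's actual PE array, network-on-chip, compression, or mapping.

```python
from typing import List

Matrix = List[List[float]]

def conv1d_row(filt_row: List[float], in_row: List[float]) -> List[float]:
    """One PE: the filter row stays 'stationary' while the input row slides past it."""
    k = len(filt_row)
    return [sum(filt_row[j] * in_row[x + j] for j in range(k))
            for x in range(len(in_row) - k + 1)]

def conv2d_row_stationary(filt: Matrix, img: Matrix) -> Matrix:
    """Valid 2-D correlation (stride 1) assembled from 1-D row primitives."""
    r = len(filt)                      # filter height = number of PEs per output row
    out = []
    for y in range(len(img) - r + 1):
        # PE i holds filter row i and reads input row y + i; psums are summed "vertically"
        psums = [conv1d_row(filt[i], img[y + i]) for i in range(r)]
        out.append([sum(col) for col in zip(*psums)])
    return out

def conv2d_direct(filt: Matrix, img: Matrix) -> Matrix:
    """Reference implementation used to check the row-stationary decomposition."""
    r, s = len(filt), len(filt[0])
    return [[sum(filt[i][j] * img[y + i][x + j] for i in range(r) for j in range(s))
             for x in range(len(img[0]) - s + 1)]
            for y in range(len(img) - r + 1)]

if __name__ == "__main__":
    filt = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    img = [[x + 5 * y for x in range(5)] for y in range(5)]
    print(conv2d_row_stationary(filt, img))  # [[-8, -8, -8], [-8, -8, -8], [-8, -8, -8]]
```

Because each PE works on whole rows, a filter row is fetched once and reused across an entire input row; this is the kind of local reuse that the abstract credits with cutting expensive DRAM accesses.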


Journal Article
TL;DR: This paper presents the first reported 28-GHz phased-array IC for 5G communications, implemented in 130-nm SiGe BiCMOS, which includes 32 TRX elements and features concurrent independent beams in two polarizations in either TX or RX operation.
Abstract: This paper presents the first reported 28-GHz phased-array IC for 5G communications. Implemented in 130-nm SiGe BiCMOS, the IC includes 32 TRX elements and features concurrent independent beams in two polarizations in either TX or RX operation. Circuit techniques to enable precise beam steering, orthogonal phase and amplitude control at each front end, and independent tapering and beam steering at the array level are presented. A TX/RX switch design is introduced which minimizes TX path loss resulting in 13.5 dBm/16 dBm Op1dB/Psat per front end with >20% peak power added efficiency of the power amplifier (including switch and off-mode LNA) while maintaining a 6 dB noise figure in the low noise amplifier (including switch and off-mode PA). Comprehensive on-wafer measurement results for the IC across multiple samples and temperature variation are presented. A package with four ICs and 64 dual-polarized antennas provides eight 16-element or two 64-element concurrent beams with 1.4°/step beam steering (<0.6° rms error) across a ±50° steering range without requiring calibration. A maximum saturated effective isotropic radiated power of 54 dBm is measured in the broadside direction for each polarization. Tapering control without requiring calibration achieves up to 20-dB sidelobe rejection without affecting the main lobe direction.

426 citations
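
The per-element phase progression that steers such an array follows from the path-length difference between neighbouring elements. The sketch below assumes an idealized uniform linear array with half-wavelength spacing and continuous phase control; the function names are hypothetical, and the IC's actual phase-shifter resolution, calibration-free tapering, and 64-element package geometry are not modeled.

```python
import cmath
import math
from typing import List, Optional

def steering_phases_deg(n_elems: int, theta_deg: float, spacing_wl: float = 0.5) -> List[float]:
    """Ideal progressive phase (degrees, wrapped) that points a uniform linear array at theta."""
    dphi = 2 * math.pi * spacing_wl * math.sin(math.radians(theta_deg))  # phase lag per element
    return [math.degrees(-n * dphi) % 360.0 for n in range(n_elems)]

def array_factor(phases_deg: List[float], theta_deg: float, spacing_wl: float = 0.5,
                 weights: Optional[List[float]] = None) -> float:
    """Normalized |AF| in direction theta for the given element phases and amplitude taper."""
    w = weights if weights is not None else [1.0] * len(phases_deg)
    psi = 2 * math.pi * spacing_wl * math.sin(math.radians(theta_deg))
    af = sum(w[n] * cmath.exp(1j * (n * psi + math.radians(p)))
             for n, p in enumerate(phases_deg))
    return abs(af) / sum(w)

if __name__ == "__main__":
    ph = steering_phases_deg(16, 30.0)
    print(round(array_factor(ph, 30.0), 6))  # 1.0: full coherent gain toward 30 degrees
    print(round(array_factor(ph, 0.0), 6))   # 0.0: a null at broadside for this setting
```

Passing an amplitude taper through `weights` is where sidelobe-reducing tapering would enter this simplified model.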


Journal Article
TL;DR: A machine-learning classifier whose computations are performed in a standard 6T SRAM array that also stores the machine-learning model; a training algorithm builds a strong classifier through boosting and overcomes circuit nonidealities by combining multiple columns.
Abstract: This paper presents a machine-learning classifier where computations are performed in a standard 6T SRAM array, which stores the machine-learning model. Peripheral circuits implement mixed-signal weak classifiers via columns of the SRAM, and a training algorithm enables a strong classifier through boosting and also overcomes circuit nonidealities, by combining multiple columns. A prototype 128 × 128 SRAM array, implemented in a 130-nm CMOS process, demonstrates ten-way classification of MNIST images (using image-pixel features downsampled from 28 × 28 = 784 to 9 × 9 = 81, which yields a baseline accuracy of 90%). In SRAM mode (bit-cell read/write), the prototype operates up to 300 MHz, and in classify mode, it operates at 50 MHz, generating a classification every cycle. With accuracy equivalent to a discrete SRAM/digital-MAC system, the system achieves ten-way classification at an energy of 630 pJ per decision, 113 times lower than a discrete system with standard training algorithm and 13 times lower than a discrete system with the proposed training algorithm.

376 citations


Journal Article
TL;DR: This paper is the first to implement dynamic precision and energy scaling and to exploit the sparsity of convolutions in a dedicated processor architecture, outperforming the state of the art by up to five times in energy efficiency.
Abstract: A precision-scalable processor for low-power ConvNets or convolutional neural networks is implemented in a 40-nm CMOS technology. To minimize energy consumption while maintaining throughput, this paper is the first to implement dynamic precision and energy scaling and exploit the sparsity of convolutions in a dedicated processor architecture. The processor’s 256 parallel processing units achieve a peak 102 GOPS running at 204 MHz and 1.1 V. It is fully C-programmable through a custom generated compiler, consumes 25–287 mW at 204 MHz, and achieves a scaling efficiency between 0.3 and 2.7 effective TOPS/W. It achieves 47 frames/s on the convolutional layers of the AlexNet benchmark, consuming only 76 mW. This system hereby outperforms the state of the art by up to five times in energy efficiency.

164 citations


Journal Article
TL;DR: A CMOS-based microelectrode array system for in vitro applications that integrates six measurement and stimulation functions, the largest number to date, and features the largest active electrode array area to date.
Abstract: Biological cells are characterized by highly complex phenomena and processes that are, to a great extent, interdependent. To gain detailed insights, devices designed to study cellular phenomena need to enable tracking and manipulation of multiple cell parameters in parallel; they have to provide high signal quality and high spatiotemporal resolution. To this end, we have developed a CMOS-based microelectrode array system for in vitro applications that integrates six measurement and stimulation functions, the largest number to date. Moreover, the system features the largest active electrode array area to date (4.48 × 2.43 mm²) to accommodate 59 760 electrodes, while its power consumption, noise characteristics, and spatial resolution (13.5-µm electrode pitch) are comparable to the best state-of-the-art devices. The system includes: 2048 action potential (AP, bandwidth: 300 Hz–10 kHz) recording units, 32 local-field-potential (LFP, bandwidth: 1 Hz–300 Hz) recording units, 32 current recording units, 32 impedance measurement units, and 28 neurotransmitter detection units, in addition to the 16 dual-mode voltage-only or current/voltage-controlled stimulation units. The electrode array architecture is based on a switch matrix, which allows for connecting any measurement/stimulation unit to any electrode in the array and for performing different measurement/stimulation functions in parallel.

160 citations


Journal Article
TL;DR: A neural recording chopper amplifier capable of handling in-band artifacts up to 40 mVpp while preserving the accompanying small neural signals is presented; it achieves power and noise performance similar to the state of the art.
Abstract: Closed-loop neuromodulation is essential for the advance of neuroscience and for administering therapy in patients suffering from drug-resistant neurological conditions. Neural stimulation generates large artifacts at the recording sites, which easily saturate traditional recording front ends. This paper presents a neural recording chopper amplifier capable of handling in-band artifacts up to 40 mVpp while preserving the accompanying small neural signals. New techniques have been proposed that solve the issues of low input impedance and electrode-offset rejection, which enable a DC input impedance of 300 MΩ and a dynamic range of 69 dB (200 Hz–5 kHz) and 78 dB (1–200 Hz). Implemented in a 40-nm CMOS process, the prototype occupies an area of 0.071 mm²/channel, and consumes 2 μW from a 1.2-V supply. The input-referred noise is 7 μVrms (200 Hz–20 kHz) and 2 μVrms (1–200 Hz). The total harmonic distortion for a 20-mVp input at 1 kHz is −74 dB. This paper improves the linearity by 14–26 dB, dynamic range by 11–28 dB, and input impedance for chopped front ends by a factor of 11 as compared with the current state of the art, while achieving similar power and noise performance.

157 citations
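
Chopping moves the amplifier's own offset and 1/f noise away from the signal band: the input is multiplied by a square wave before amplification and the output is multiplied by the same square wave afterwards, so the wanted signal returns to baseband while the offset ends up at the chopping frequency, where a low-pass filter removes it. The minimal Python sketch below assumes an ideal square-wave chopper, an ideal gain, and a boxcar average as the low-pass filter; it illustrates the general chopping principle, not the artifact-tolerance or input-impedance techniques specific to this paper.

```python
from typing import List

def chopper_output(signal: List[float], offset: float, gain: float, period: int) -> List[float]:
    """Ideal chopper amplifier: modulate, add amplifier offset, amplify, demodulate."""
    out = []
    for n, s in enumerate(signal):
        c = 1.0 if (n % period) < period // 2 else -1.0  # square-wave chopping clock
        amplified = gain * (c * s + offset)               # offset is added after the input chopper
        out.append(c * amplified)                         # demodulated: gain*s + gain*offset*c
    return out

def boxcar(x: List[float], width: int) -> List[float]:
    """Moving average over `width` samples, standing in for the post-demodulation low-pass filter."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]

if __name__ == "__main__":
    sig = [1e-3] * 64                                    # 1-mV slowly varying input
    out = chopper_output(sig, offset=5e-3, gain=100.0, period=8)
    print(max(abs(v - 0.1) for v in boxcar(out, 8)) < 1e-9)  # True: only gain*signal remains
```

Averaging over a whole chopping period makes the offset term cancel exactly here; a real front end trades this filtering against signal bandwidth and ripple.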


Journal Article
TL;DR: A subthreshold voltage reference in which the output voltage is scalable depending on the number of stacked PMOS transistors, which achieves a line sensitivity of 0.31%/V and a power supply rejection of −41 dB while consuming 35 pW from 1.4 V at room temperature.
Abstract: This paper presents a subthreshold voltage reference in which the output voltage is scalable depending on the number of stacked PMOS transistors. A key advantage is that its output voltage can be higher than that obtained with conventional low-power subthreshold voltage references. The proposed reference uses native NMOS transistors as a current source and develops a reference voltage by stacking one or more PMOS transistors. The temperature coefficient of the reference voltage is compensated by setting the size ratio of the native NMOS and stacked PMOS transistors to cancel the temperature dependence of the transistor threshold voltage and the thermal voltage. Also, the transistor size is determined considering the trade-off between the diode current between n-well and p-sub and process variation. Prototype chips are fabricated in a 0.18-µm CMOS process. Measurement results from three wafers show a 3σ inaccuracy of ±1.0% from 0 °C to 100 °C after a single room-temperature trim. The proposed voltage reference achieves a line sensitivity of 0.31%/V and a power supply rejection of −41 dB while consuming 35 pW from 1.4 V at room temperature.

142 citations
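
A quick consistency check of the reported figures, as a small Python sketch: it assumes the 35-pW figure is the total power drawn from the 1.4-V supply and treats the −41-dB PSR as a voltage ratio (the abstract does not state the frequency at which PSR was measured). The function names are illustrative only.

```python
def supply_current_a(power_w: float, vdd_v: float) -> float:
    """Average supply current implied by a power figure: I = P / V."""
    return power_w / vdd_v

def psr_db_to_ratio(psr_db: float) -> float:
    """Convert a (negative) PSR in dB into a voltage attenuation ratio."""
    return 10 ** (psr_db / 20)

def dc_shift_percent(line_sens_pct_per_v: float, delta_vdd_v: float) -> float:
    """DC output change, in % of the reference voltage, for a supply step (line sensitivity)."""
    return line_sens_pct_per_v * delta_vdd_v

i_supply = supply_current_a(35e-12, 1.4)   # about 25 pA
ripple_ratio = psr_db_to_ratio(-41.0)      # about 0.0089 V of output ripple per 1 V of supply ripple
shift = dc_shift_percent(0.31, 0.5)        # a 0.5-V supply step moves Vref by about 0.16 %
```

So the whole reference runs on roughly 25 pA, and 1 V of supply ripple (at the PSR measurement frequency) would appear as roughly 9 mV at the output.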


Journal Article
TL;DR: A 64×64-pixel 3-D imager based on single-photon avalanche diodes (SPADs) for long-range applications, such as spacecraft navigation and landing, consuming less than 100 mW.
Abstract: This paper describes a 64 × 64-pixel 3-D imager based on single-photon avalanche diodes (SPADs) for long-range applications, such as spacecraft navigation and landing. Each 60-µm pixel includes eight SPADs combined as a digital silicon photomultiplier, a triggering logic for photon temporal correlation, a 250-ps 16-b time-to-digital converter, and an intensity counter, with an overall 26.5% fill factor. The sensor provides time-of-flight and intensity information even with a background intensity up to 100 MPhotons/s/pixel. The sensor can work in imaging (short range, 3-D image) and altimeter (long range, single point) modes, achieving up to 300-m and 6-km maximum distance with <0.2-m and <0.5-m precision, respectively, while consuming less than 100 mW.

130 citations
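
Direct time-of-flight ranging converts a measured round-trip time t into distance d = c·t/2. The sketch below, assuming an ideal, offset-free TDC (the helper names are hypothetical), shows that one 250-ps TDC step corresponds to about 3.75 cm of range, and that the 300-m and 6-km ranges imply round-trip times of roughly 2 µs and 40 µs.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_to_distance(t_s: float) -> float:
    """Direct time of flight: light travels to the target and back, hence the factor 1/2."""
    return C * t_s / 2.0

def distance_to_tof(d_m: float) -> float:
    """Round-trip time for a target at distance d."""
    return 2.0 * d_m / C

def tdc_code_to_distance(code: int, lsb_s: float = 250e-12) -> float:
    """Distance for a raw TDC code, assuming a hypothetical ideal, zero-offset TDC."""
    return tof_to_distance(code * lsb_s)

lsb_range_m = tof_to_distance(250e-12)   # about 0.0375 m per TDC step
t_imaging = distance_to_tof(300.0)       # about 2.0e-6 s round trip
t_altimeter = distance_to_tof(6000.0)    # about 4.0e-5 s round trip
```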


Journal Article
TL;DR: The analysis of performance metrics, such as loss, isolation, linearity, and tuning range, is presented in terms of the design parameters for the first CMOS nonmagnetic nonreciprocal passive circulator based on N-path filters.
Abstract: Recently, we demonstrated the first CMOS nonmagnetic nonreciprocal passive circulator based on N-path filters that uses time variance to break reciprocity. Here, the analysis of performance metrics, such as loss, isolation, linearity, and tuning range, is presented in terms of the design parameters. The analysis is verified by the measured performance of a 65-nm CMOS circulator prototype that exhibits 1.7 dB of loss in the transmitter-antenna (TX-ANT) and antenna-receiver (ANT-RX) paths, and has high isolation [TX–RX, up to 50 dB through tuning and 20-dB bandwidth (BW) of 32 MHz] and a tuning range of 610–850 MHz. Through an architectural feature specifically designed to enhance TX linearity, the circulator achieves an in-band TX-ANT input-referred third-order intercept point (IIP3) of +27.5 dBm, nearly two orders of magnitude higher than the ANT-RX IIP3 of +8.7 dBm. The circulator is also integrated with a self-interference-canceling full-duplex (FD) RX featuring an analog baseband (BB) SI canceller. The FD RX achieves 42-dB on-chip SI suppression across the circulator and analog BB domains over a 12-MHz signal BW. In conjunction with digital SI and its input-referred third-order intermodulation (IM3) cancellation, the FD RX demonstrates 85-dB overall SI suppression, enabling an FD link budget of −7-dBm TX average output power and −92-dBm noise floor.

125 citations
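
The dB figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch assumes only the standard dB conventions, plus one labeled inference: that the on-chip (42 dB) and digital suppression stages add in dB toward the 85-dB total.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a dB difference into a linear power ratio."""
    return 10 ** (db / 10)

TX_IIP3_DBM, RX_IIP3_DBM = 27.5, 8.7
iip3_gap_db = TX_IIP3_DBM - RX_IIP3_DBM           # 18.8 dB
iip3_gap_ratio = db_to_power_ratio(iip3_gap_db)   # about 76x in power

TX_POWER_DBM, NOISE_FLOOR_DBM = -7.0, -92.0
required_si_suppression_db = TX_POWER_DBM - NOISE_FLOOR_DBM  # SI must drop to the noise floor

ON_CHIP_SI_DB, OVERALL_SI_DB = 42.0, 85.0
# Inference (assumed): stages add in dB, so this is what the digital canceller contributes.
remaining_from_digital_db = OVERALL_SI_DB - ON_CHIP_SI_DB
```

The IIP3 gap of 18.8 dB is a power ratio of about 76, which is the "nearly two orders of magnitude" in the text, and the −7-dBm TX power minus the −92-dBm noise floor is exactly the 85 dB of SI suppression needed to bring self-interference down to the noise floor.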


Journal Article
TL;DR: This paper discusses the design of on-chip transformer-based fourth-order filters suitable for mm-wave highly sensitive broadband low-noise amplifiers and receivers implemented in deep-scaled CMOS; the resulting E-band LNA achieves a figure of merit better than state-of-the-art designs in the same band and comparable to LNAs at lower frequencies.
Abstract: This paper discusses the design of on-chip transformer-based fourth-order filters, suitable for mm-wave highly sensitive broadband low-noise amplifiers (LNAs) and receivers (RXs) implemented in deep-scaled CMOS. Second-order effects due to layout parasitics are analyzed, and new design techniques are introduced to further enhance the gain-bandwidth product of this class of filters. The design and measurements of a broadband 28-nm bulk CMOS LNA and a sliding-IF RX tailored for E-band (i.e., 71–76-GHz and 81–86-GHz) point-to-point communication links are presented. Leveraging the proposed design methodologies, the E-band LNA achieves a figure of merit ≈10.5 dB better than state-of-the-art designs in the same band and comparable to LNAs at lower frequencies. The RX achieves 30.8-dB conversion gain with E-band with wide margin.

119 citations


Journal Article
TL;DR: An inductorless bias-flip rectifier is proposed in this paper to perform residual charge inversion using capacitors instead of inductors, which shows a performance improvement higher than most of the reported state-of-the-art inductor-based interface circuits, and has a significantly smaller overall volume enabling system miniaturization.
Abstract: Piezoelectric vibration energy harvesters have drawn much interest for powering self-sustained electronic devices. Furthermore, the continuous push toward miniaturization and higher levels of integration remains a key driver for autonomous sensor systems being developed as parts of the emerging Internet of Things (IoT) paradigm. The synchronized switch harvesting (SSH) on inductor and synchronous electrical charge extraction are two of the most efficient interface circuits for piezoelectric energy harvesters; however, inductors are indispensable components in these interfaces. The required inductor values can be up to 10 mH to achieve high efficiencies, which significantly increase overall system volume, counter to the requirement for miniaturized self-powered systems for IoT. An inductorless bias-flip rectifier is proposed in this paper to perform residual charge inversion using capacitors instead of inductors. The voltage flip efficiency reaches 80% when eight switched capacitors are employed. The proposed SSH on capacitors circuit is designed and fabricated in a 0.35-µm CMOS process. The performance is experimentally measured and shows a 9.7× performance improvement compared with a full-bridge rectifier for the case of a 2.5-V open-circuit zero-peak voltage amplitude generated by the piezoelectric harvester. This performance improvement is higher than most of the reported state-of-the-art inductor-based interface circuits, while the proposed circuit has a significantly smaller overall volume, enabling system miniaturization.

Journal Article
TL;DR: A 12-bit 10-GS/s interleaved (IL) pipeline analog-to-digital converter (ADC) is described that achieves a signal-to-noise-and-distortion ratio (SNDR) of 55 dB and a spurious-free dynamic range (SFDR) of 66 dB with a 4-GHz input signal.
Abstract: A 12-bit 10-GS/s interleaved (IL) pipeline analog-to-digital converter (ADC) is described in this paper. The ADC achieves a signal-to-noise-and-distortion ratio (SNDR) of 55 dB and a spurious-free dynamic range (SFDR) of 66 dB with a 4-GHz input signal, is fabricated in 28-nm CMOS technology, and dissipates 2.9 W. Eight pipeline sub-ADCs are interleaved to achieve a 10-GS/s sample rate, and mismatches between sub-ADCs are calibrated in the background. The pipeline sub-ADCs employ a variety of techniques to lower power, such as avoiding a dedicated sample-and-hold amplifier (SHA-less), residue scaling, flash background calibration, dithering, and inter-stage gain error background calibration. A push–pull input buffer optimized for high-frequency linearity drives the interleaved sub-ADCs to enable >7-GHz bandwidth. A fast turn-ON bootstrapped switch enables 100-ps sampling. The ADC also includes the ability to randomize the sub-ADC selection pattern to further reduce residual interleaving spurs.
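
The SNDR can be translated into an effective number of bits with the standard relation ENOB = (SNDR − 1.76 dB)/6.02 dB, and the interleaving factor sets each sub-ADC's rate. The sketch also evaluates the Walden figure of merit P/(2^ENOB·fs); that FoM convention is chosen here only for illustration, and the paper may report a different metric or input frequency.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR (full-scale sine-wave convention)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom_j_per_step(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden FoM: energy per conversion step, P / (2^ENOB * fs)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)

N_SUB = 8          # interleaved pipeline sub-ADCs
FS_HZ = 10e9       # aggregate sample rate

sub_rate = FS_HZ / N_SUB                           # 1.25 GS/s per sub-ADC
bits = enob(55.0)                                  # about 8.84 bits at 4-GHz input
fom = walden_fom_j_per_step(2.9, 55.0, FS_HZ)      # about 0.63 pJ per conversion step
```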

Journal Article
TL;DR: A neural-recording front-end that has an input range of ±50 mV and can be used in closed-loop systems and avoids the saturation due to stimulation artifacts by employing a voltage-controlled oscillator (VCO) to directly convert the input signal into the frequency domain.
Abstract: Closed-loop neuromodulation is an essential function in future neural implants for delivering efficient and effective therapy. However, a closed-loop system requires the neural-recording front-end to handle large stimulation artifacts, a feature not supported by most state-of-the-art designs. In this paper, we present a neural-recording front-end that has an input range of ±50 mV and can be used in closed-loop systems. The proposed front-end avoids saturation due to stimulation artifacts by employing a voltage-controlled oscillator (VCO) to directly convert the input signal into the frequency domain. The VCO nonlinearity is corrected using area-efficient foreground polynomial correction. Implemented in a 40-nm CMOS process, the design occupies 0.135 mm² with an analog power of 3 µW and a digital switching power of 4 µW. It achieves a ten times higher linear input range than prior art and a 79-dB spurious-free dynamic range at peak input, with an input-referred noise of 5.2 µVrms across the local-field-potential band of 1–200 Hz. With on-chip subhertz high-pass filters realized by duty-cycled resistors, the front-end also eliminates the need for off-chip dc-blocking capacitors.

Journal Article
TL;DR: This paper presents 64-quadrature amplitude modulation (QAM) 60-GHz CMOS transceivers with four-channel bonding capability, which can be categorized into a one-stream transceiver and a two-stream frequency-interleaved (FI) transceiver.
Abstract: This paper presents 64-quadrature amplitude modulation (QAM) 60-GHz CMOS transceivers with four-channel bonding capability, which can be categorized into a one-stream transceiver and a two-stream frequency-interleaved (FI) transceiver. The transceivers are both fabricated in a standard 65-nm CMOS technology. For the proposed one-stream transceiver, the TX-to-RX error vector magnitude (EVM) is less than −23.9 dB for 64-QAM wireless communication in all four channels defined in the IEEE 802.11ad/WiGig. The maximum communication distance with the full rate can reach 0.13 m for 64 QAM, 0.8 m for 16 QAM, and 2.6 m for QPSK using 14-dBi horn antennas. A data rate of 28.16 Gb/s is achieved in 16 QAM by four-channel bonding. The transmitter, receiver, and phase-locked loop consume 186, 155, and 64 mW, respectively. The core area of the transceiver is 3.9 mm². For the proposed two-stream FI transceiver, four-channel bonding in 64 QAM is realized with a data rate of 42.24 Gb/s and an EVM of less than −23 dB. The front end consumes 544 mW in transmitting mode and 432 mW in receiving mode from a 1.2-V supply. The core area of the transceiver is 7.2 mm².
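
The reported data rates are consistent with simple symbol-rate arithmetic if one assumes the IEEE 802.11ad single-carrier symbol rate of 1.76 Gsymbol/s per channel (an assumption here; the abstract does not state it): bits per symbol × symbol rate × number of bonded channels. The same sketch converts the EVM limits from dB to percent.

```python
SYMBOL_RATE_SPS = 1.76e9   # assumed 802.11ad single-carrier symbol rate per channel
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4, "64QAM": 6}

def raw_rate_bps(modulation: str, channels: int) -> float:
    """Uncoded bit rate for a given modulation and number of bonded channels."""
    return SYMBOL_RATE_SPS * BITS_PER_SYMBOL[modulation] * channels

def evm_db_to_percent(evm_db: float) -> float:
    """EVM in dB (relative to the reference constellation power) to rms percent."""
    return 100.0 * 10 ** (evm_db / 20)

one_stream_16qam = raw_rate_bps("16QAM", 4)   # about 28.16e9 b/s
two_stream_64qam = raw_rate_bps("64QAM", 4)   # about 42.24e9 b/s
evm_pct = evm_db_to_percent(-23.9)            # about 6.4 % rms
```

Under that assumption, 16 QAM over four bonded channels gives 4 × 4 × 1.76 = 28.16 Gb/s and 64 QAM gives 6 × 4 × 1.76 = 42.24 Gb/s, matching the figures above; the calculation ignores coding and framing overhead.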

Journal Article
David Murphy, Hooman Darabi, Hao Wu
TL;DR: It is demonstrated that additional inductors are not strictly necessary by showing that common-mode resonance can be obtained using a single tank, and an NMOS architecture that uses a single differential inductor and a CMOS design that use a single transformer are presented.
Abstract: The performance of a differential LC oscillator can be enhanced by resonating the common mode of the circuit at twice the oscillation frequency. When this technique is correctly employed, Q-degradation due to the triode operation of the differential pair is eliminated and flicker noise is nulled. Until recently, one or more tail inductors have been used to achieve this common-mode resonance. In this paper, we demonstrate that additional inductors are not strictly necessary by showing that common-mode resonance can be obtained using a single tank. We present an NMOS architecture that uses a single differential inductor and a CMOS design that uses a single transformer. Prototypes are presented that achieve figures of merit of 192 and 195 dBc/Hz, respectively.
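
The oscillator figure of merit is commonly defined as FoM = −L(Δf) + 20·log10(f0/Δf) − 10·log10(P_DC/1 mW), where L(Δf) is the phase noise at offset Δf from the carrier f0. The abstract does not spell out its definition, so treat this as the usual convention rather than the paper's exact one; the operating point in the example is hypothetical.

```python
import math

def oscillator_fom_dbc_hz(pn_dbc_hz: float, f0_hz: float, offset_hz: float, pdc_w: float) -> float:
    """Common oscillator FoM: -L(df) + 20*log10(f0/df) - 10*log10(Pdc / 1 mW)."""
    return (-pn_dbc_hz
            + 20 * math.log10(f0_hz / offset_hz)
            - 10 * math.log10(pdc_w / 1e-3))

# Hypothetical operating point, NOT taken from the paper:
# -130 dBc/Hz at 1-MHz offset from a 4-GHz carrier, burning 10 mW.
fom = oscillator_fom_dbc_hz(pn_dbc_hz=-130.0, f0_hz=4e9, offset_hz=1e6, pdc_w=10e-3)  # about 192.0
```

Because power enters as 10·log10(P_DC), the 3-dB gap between the two prototypes corresponds to a factor of two in power for the same phase noise at the same carrier and offset.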

Journal Article
TL;DR: A CMOS system on a chip (SoC) for neuroelectrical monitoring and responsive neurostimulation is presented and is validated in vivo using epilepsy monitoring (seizure detection) and treatment ( seizure suppression) experiments.
Abstract: A 64-channel 0.13-µm CMOS system on a chip (SoC) for neuroelectrical monitoring and responsive neurostimulation is presented. The ΔΣ-based neural channel records signals with rail-to-rail dc offset at the input without any area-intensive dc-removing passive components, which leads to a compact 0.013-mm² integration area of recording and stimulation circuits. The channel consumes 630 nW, yields a signal to noise and distortion ratio of 72.2 dB, a 1.13-µVrms integrated input-referred noise over 0.1–500 Hz frequency range, and a noise efficiency factor of 2.86. Analog multipliers are implemented in each channel with minimum additional area cost by reusing the multi-bit current-digital to analog converter that is originally placed for current-mode stimulation. The multipliers are used for compact implementation of bandpass finite impulse response filters, as well as voltage gain scaling. A tri-core low-power DSP conducts phase-synchrony-based neurophysiological event detection and triggers a subset of 64 programmable arbitrary-waveform current-mode stimulators for subsequent neuromodulation. Two ultra-wideband (UWB) wireless transmitters communicate to receivers located at 10 cm to 2 m distance from the implanted SoC with data rates of 10–46 Mb/s, respectively. An inductive link that operates at 1.5 MHz provides power to the SoC and is also used to communicate commands to an on-chip ASK receiver. The chip occupies 6 mm² while consuming 1.07 and 5.44 mW with delay-based and voltage controlled oscillator-based UWB transmitters, respectively. The SoC is validated in vivo using epilepsy monitoring (seizure detection) and treatment (seizure suppression) experiments.

Journal Article
TL;DR: The proposed TRXs are equipped with binary phase shift keying modulators as well as an I/Q receiver and can be used to build a flexible software-defined radar platform for range and distance-selective vibration sensors using frequency-modulated continuous-wave as well as pseudo-random noise radar techniques.
Abstract: This paper describes a multi-purpose radar system suitable for applications with different requirements on dynamic range, resolution, and miniaturization degree. It utilizes a scalable sensor platform that includes a wideband 30.5-GHz voltage-controlled oscillator (VCO) as well as 61- and 122-GHz transceivers (TRXs) in a silicon-germanium BiCMOS technology. The proposed architecture enables the cascading of multiple TRXs and allows the implementation of MIMO radar systems in two different frequency bands by using a single VCO. The higher transmit output power of 11.5 dBm as well as receive gain of 24 dB make the 61-GHz TRX suitable for applications requiring a high dynamic range. The shorter wavelength allows the integration of on-chip antennas in the 122-GHz TRX and thus enables a high miniaturization degree. The higher LO scaling factor also makes the 122-GHz TRX more attractive for high-resolution applications. A sweep bandwidth of 2.5 GHz generated by the VCO is scaled up to 10 GHz and results in a range resolution of 3 cm. The proposed TRXs are equipped with binary phase shift keying modulators as well as an I/Q receiver and can be utilized to build a flexible software-defined radar platform for range and distance-selective vibration sensors utilizing frequency-modulated continuous wave as well as pseudo-random noise radar techniques.

Journal Article
TL;DR: A neural recording chopper amplifier capable of handling in-band 80-mVpp differential artifacts and 650-mVpp CM artifacts while preserving the accompanying small neural signals is presented.
Abstract: Closed-loop neuromodulation is essential for the advance of neuroscience and for administering therapy in patients suffering from drug-resistant neurological conditions. Neural stimulation generates large differential and common-mode (CM) artifacts at the recording sites, which easily saturate traditional recording front ends. This paper presents a neural recording chopper amplifier capable of handling in-band 80-mVpp differential artifacts and 650-mVpp CM artifacts while preserving the accompanying small neural signals. New techniques have been proposed that introduce immunity to CM interference, increase the input impedance of the chopper amplifier to 1.6 GΩ, and increase the maximum realizable resistance of duty-cycled resistors (DCR) to 90 GΩ. These techniques enable our recording front-end to achieve a dynamic range of 74 dB (200 Hz–5 kHz) and 81 dB (1–200 Hz). Implemented in a 40-nm CMOS process, the prototype occupies an area of 0.069 mm²/channel, and consumes 2.8 µW from a 1.2-V supply. The input-referred noise is 5.3 µVrms (200 Hz–5 kHz) and 1.8 µVrms (1–200 Hz). The total harmonic distortion for a 40-mVp input at 1 kHz is −76 dB. This work improves the input impedance by 5.3× for chopped front-ends, linear-input range by 2×, maximum resistance of DCR by 32×, and tolerance to CM interferers by 6.5×, while maintaining comparable power and noise performance.

Journal Article
TL;DR: The concept of phase modulated MIMO radars is explained and demonstrated with a 28-nm CMOS fully integrated 79-GHz radar SoC that includes two transmitters, two receivers, and the mm-wave frequency generation.
Abstract: In this paper, the concept of phase modulated MIMO radars is explained and demonstrated with a 28-nm CMOS fully integrated 79-GHz radar SoC. It includes two transmitters, two receivers, and the mm-wave frequency generation. The receivers’ outputs are digitized by on-chip ADCs and processed by a custom designed digital core, which performs correlation and accumulation with a pseudorandom sequence used in transmission. The SoC consumes 1 W to achieve 7.5 cm range resolution. A module with antennas allows for 5° resolution over ±60° elevation and azimuth scan in 2 × 2 code domain MIMO operation. A 4 × 4 MIMO system is also demonstrated by means of two SoCs mounted on the same module.
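
Phase-modulated (PMCW) radar recovers range by correlating the received samples with the transmitted pseudorandom code; the position of the correlation peak gives the round-trip delay in chips. The minimal sketch below assumes a length-31 maximal-length sequence, a single noiseless target, and circular (periodic) correlation; the SoC's actual code family, length, chip rate, and MIMO code assignment are not given in the abstract and are not modeled.

```python
from typing import List

def m_sequence_31() -> List[int]:
    """Length-31 maximal-length sequence as +/-1 chips.

    Uses the recurrence a[n+5] = a[n+2] XOR a[n], i.e. the primitive polynomial
    x^5 + x^2 + 1 (an illustrative choice, not the code used by the SoC).
    """
    bits = [1, 0, 0, 0, 0]                 # any nonzero seed gives the full period
    while len(bits) < 31:
        bits.append(bits[-3] ^ bits[-5])
    return [1 - 2 * b for b in bits]       # map {0, 1} -> {+1, -1}

def circular_correlation(rx: List[float], code: List[int]) -> List[float]:
    """Correlate the received chips against every cyclic shift of the code."""
    n = len(code)
    return [sum(rx[i] * code[(i - k) % n] for i in range(n)) for k in range(n)]

def estimate_delay(rx: List[float], code: List[int]) -> int:
    """Delay (in chips) of the strongest correlation peak, i.e. the target's range bin."""
    corr = circular_correlation(rx, code)
    return max(range(len(corr)), key=corr.__getitem__)

if __name__ == "__main__":
    code = m_sequence_31()
    true_delay = 7
    # Echo: attenuated, cyclically delayed copy of the transmitted code.
    rx = [0.2 * code[(i - true_delay) % len(code)] for i in range(len(code))]
    print(estimate_delay(rx, code))  # 7
```

The m-sequence's autocorrelation is 31 at zero shift and −1 everywhere else, which is what makes the peak unambiguous; accumulating over many code periods, as the digital core does, raises the peak further above noise.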

Journal Article
TL;DR: A 48-WL-stacked 256-Gb V-NAND flash memory with 3-b MLC technology is presented, in which a dual state machine architecture achieves optimal timing for BL and WL, respectively, and an embedded ZQ calibration technique with temperature compensation is introduced.
Abstract: A 48-WL-stacked 256-Gb V-NAND flash memory with 3-b MLC technology is presented. Several vertical scale-down effects, such as deteriorated WL loading and variations, are discussed. To enhance performance, a reverse read scheme and a variable-pulse scheme are presented to cope with nonuniform WL characteristics. In addition, a dual state machine architecture is proposed to achieve optimal timing for BL and WL, respectively. Also, to maintain robust IO driver strength against PVT variations, an embedded ZQ calibration technique with temperature compensation is introduced. The chip, fabricated in a third-generation V-NAND technology, achieved a density of 2.6 Gb/mm² with 53.2 MB/s of program throughput.

Journal Article
TL;DR: This paper presents two 4-bit (16-phase) mm-wave vector modulator phase shifters exploiting a novel in-phase and quadrature signal generator that consists of a single-input double-output cascode amplifier incorporating a lumped-element coupled-line Quadrature coupler.
Abstract: This paper presents two 4-bit (16-phase) mm-wave vector modulator phase shifters exploiting a novel in-phase and quadrature signal generator that consists of a single-input double-output cascode amplifier incorporating a lumped-element coupled-line quadrature coupler. The two circuit implementations have been designed and fabricated in 28-nm fully depleted silicon-on-insulator CMOS. The first (PS1) achieves a higher gain and the second (PS2) has a more compact area (reduced to about 50%). Each consumes 18 mA from a 1.2-V supply. PS1 exhibits an average gain of 2.3 dB at 87.4 GHz and B3dB from 78.8 to 92.8 GHz; rms gain error of 1.68 dB at 87.4 GHz and B3dB; rms phase error of 9.4° at 87.4 GHz and B3dB; S11 < … dB in B3dB; average P1dB of −7 dBm; and average noise figure (NF) equal to 10.8 dB at 87 GHz. PS2 exhibits an average gain of 0.83 dB at 89.2 GHz and B3dB from 80.2 to 96.8 GHz; rms gain error of 1.46 dB at 89.2 GHz and B3dB; rms phase error of 11.2° at 89.2 GHz and B3dB; S11 < … dB in B3dB; average P1dB of −6 dBm; and average NF of 11.9 dB at 89 GHz.
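
A vector-modulator phase shifter synthesizes each phase state by weighting the I and Q paths with gains proportional to cos φ and sin φ and summing them. The sketch below, assuming ideal, continuously variable I/Q gains (the real circuits realize and quantize these gains in hardware), lists the 16 target states of a 4-bit design (22.5° steps) and computes an rms phase error like the one quoted above, with errors wrapped to ±180°.

```python
import math
from typing import List, Tuple

def vm_weights(n_bits: int = 4) -> List[Tuple[float, float, float]]:
    """(phase_deg, I gain, Q gain) for each state of an ideal n-bit vector modulator."""
    n_states = 2 ** n_bits
    step = 360.0 / n_states                       # 22.5 degrees for 4 bits
    return [(k * step,
             math.cos(math.radians(k * step)),    # in-phase weight
             math.sin(math.radians(k * step)))    # quadrature weight
            for k in range(n_states)]

def rms_phase_error_deg(measured_deg: List[float], ideal_deg: List[float]) -> float:
    """RMS of phase errors, each wrapped into [-180, 180) degrees before squaring."""
    errs = [((m - i + 180.0) % 360.0) - 180.0 for m, i in zip(measured_deg, ideal_deg)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))
```

Wrapping matters: a measured 359° against an ideal 0° is a 1° error, not 359°.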

Journal Article
TL;DR: A fully integrated RF energy-harvesting system that can simultaneously deliver the current demanded by external dc loads and store the extra energy in external capacitors, during periods of extra output power, is introduced.
Abstract: This paper introduces a fully integrated RF energy-harvesting system. The system can simultaneously deliver the current demanded by external dc loads and store the extra energy in external capacitors during periods of extra output power. The design is fabricated in 0.18-μm CMOS technology, and the active chip area is 1.08 mm². The proposed self-startup system is reconfigurable with an integrated LC matching network, an RF rectifier, and a power management/controller unit, which consumes 66–157 nW. The required clock generation and the voltage reference circuit are integrated on the same chip. Duty-cycle control is used when the input power is too low to provide the demanded output power. Moreover, the number of stages of the RF rectifier is reconfigurable to increase the efficiency of the available output power. For high available power, a secondary path is activated to charge an external energy storage element. The measured RF input power sensitivity is −14.8 dBm at a 1-V dc output.
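
Converting the quoted sensitivity to linear units puts the controller's own consumption in perspective. A small sketch, assuming only the standard dBm definition (0 dBm = 1 mW):

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert dBm to watts (0 dBm = 1 mW)."""
    return 1e-3 * 10 ** (p_dbm / 10)

sensitivity_w = dbm_to_watts(-14.8)     # about 33 uW of RF input
controller_w = (66e-9, 157e-9)          # reported power management/controller range
controller_fraction = tuple(p / sensitivity_w for p in controller_w)  # about 0.2 % to 0.5 %
```

So at the sensitivity limit the RF input is about 33 µW, and the 66–157-nW controller consumes roughly 0.2–0.5% of it.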

Journal ArticleDOI
Benqing Guo, Jun Chen, Lei Li, Haiyan Jin, Guoning Yang
TL;DR: A complementary noise-canceling CMOS low-noise amplifier (LNA) with enhanced linearity is proposed; an active shunt feedback input stage offers input matching, while extended input matching bandwidth is acquired by a $\pi$-type matching network.
Abstract: A complementary noise-canceling CMOS low-noise amplifier (LNA) with enhanced linearity is proposed. An active shunt feedback input stage offers input matching, while extended input matching bandwidth is acquired by a $\pi$-type matching network. The intrinsic noise cancellation mechanism maintains an acceptable noise figure (NF) with reduced power consumption thanks to the current-reuse principle. Multiple complementary nMOS and pMOS configurations collectively suppress nonlinear components in each stage of the LNA. A complementary multigated transistor architecture is further employed to nullify the third-order distortion of the noise-canceling stage and to compensate its second-order nonlinearity. A high third-order input intercept point (IIP3) is thus obtained, while the second-order input intercept point (IIP2) is guaranteed by differential operation. Implemented in a 0.18-$\mu \text{m}$ CMOS process, the proposed LNA provides a measured maximum gain of 17.5 dB and an input 1-dB compression point (IP1dB) of −3 dBm. An NF of 2.9–3.5 dB and an IIP3 of 10.6–14.3 dBm are obtained from 0.1 to 2 GHz. The circuit core draws only 9.7 mA from a 2.2-V supply.
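The multigated-transistor idea can be illustrated with the textbook memoryless model $i = g_1 v + g_2 v^2 + g_3 v^3$, for which the input-referred IIP3 amplitude is $\sqrt{\tfrac{4}{3}\,|g_1/g_3|}$. Placing an auxiliary device whose $g_3$ has the opposite sign in parallel with the main device shrinks the net $g_3$ and raises IIP3. The coefficients below are hypothetical numbers chosen for illustration, not values from the paper, and the model ignores memory effects and the gain preceding the stage.

```python
import math


def iip3_amplitude(g1: float, g3: float) -> float:
    """Input IIP3 peak amplitude (V) for i = g1*v + g2*v^2 + g3*v^3."""
    if g3 == 0:
        return math.inf
    return math.sqrt(4.0 / 3.0 * abs(g1 / g3))


def amplitude_to_dbm(a_peak: float, r_ohm: float = 50.0) -> float:
    """Convert a peak sinusoidal amplitude to dBm delivered into r_ohm."""
    p_w = a_peak ** 2 / (2 * r_ohm)
    return 10 * math.log10(p_w / 1e-3)


if __name__ == "__main__":
    # Hypothetical main and auxiliary coefficients (A/V and A/V^3).
    g1_main, g3_main = 40e-3, -20e-3
    g1_aux, g3_aux = 5e-3, 15e-3   # opposite-sign g3, e.g. weak-inversion bias
    alone = iip3_amplitude(g1_main, g3_main)
    combined = iip3_amplitude(g1_main + g1_aux, g3_main + g3_aux)
    print(f"main only: {alone:.2f} V, multigated: {combined:.2f} V")
```

The design trade-off is visible in the numbers: partial cancellation of $g_3$ raises IIP3 far more than the small increase in $g_1$ from the auxiliary device.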

Journal ArticleDOI
TL;DR: A 56-Gb/s PAM4 wireline transceiver testchip is implemented in 16-nm FinFET, and the ADC-based receiver incorporates hybrid analog and digital equalizations.
Abstract: A 56-Gb/s PAM4 wireline transceiver testchip is implemented in 16-nm FinFET. The current-mode logic transmitter incorporates an auxiliary current injection at the output nodes to maintain PAM4 amplitude linearity. The ADC-based receiver incorporates hybrid analog and digital equalization. The analog equalization is performed by two identical continuous-time linear equalizer stages, each with a dc gain of ~0 dB and a maximum peaking of ~7 dB at 14 GHz. A 28-GSample/s 32-way time-interleaved SAR ADC converts the equalized analog signal into the digital domain for further equalization using digital signal processing. The transceiver achieves a bit error rate below 1e-8 over a backplane channel with 31-dB loss at 14 GHz and 3.5-mVrms additional crosstalk, using a fixed ~10-dB TX equalization and an adaptive hybrid RX equalization, with the DSP configured as a 24-tap feed-forward equalizer and a 1-tap decision feedback equalizer. The transceiver consumes 550 mW at 56 Gb/s, excluding the power of the on-chip configurable DSP, which cannot be accurately measured because it is implemented as part of a larger test structure.
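To make the digital back end concrete, the sketch below applies a feed-forward equalizer followed by a 1-tap decision feedback equalizer to ADC samples and slices them to PAM4 levels. It is a generic textbook structure, not the chip's DSP: tap values are caller-supplied, the main cursor is placed at tap 0 for simplicity (a real FFE with pre-cursor taps adds latency), and the level set $\{-3,-1,+1,+3\}$ is assumed. The constant shows that each of the 32 interleaved sub-ADCs runs at 875 MS/s.

```python
PAM4_LEVELS = (-3.0, -1.0, 1.0, 3.0)
SUB_ADC_RATE_HZ = 28e9 / 32   # 875 MS/s per interleaved sub-ADC


def slice_pam4(x: float) -> float:
    """Nearest-level PAM4 decision."""
    return min(PAM4_LEVELS, key=lambda lvl: abs(x - lvl))


def ffe_dfe(samples: list[float], ffe_taps: list[float], dfe_tap: float) -> list[float]:
    """Equalize samples with a causal FFE followed by a 1-tap DFE.

    ffe_taps[0] multiplies the newest sample (main cursor assumed at tap 0).
    dfe_tap scales the previous decision, which is subtracted before slicing
    to cancel the first post-cursor ISI.
    """
    decisions = []
    prev = 0.0
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(ffe_taps):
            if n - k >= 0:
                acc += c * samples[n - k]
        y = acc - dfe_tap * prev
        d = slice_pam4(y)
        decisions.append(d)
        prev = d
    return decisions


if __name__ == "__main__":
    symbols = [3.0, -1.0, 1.0, -3.0, 3.0, 3.0, -1.0]
    # Toy channel with one post-cursor of 0.5: r[n] = s[n] + 0.5 * s[n-1]
    rx = [s + 0.5 * (symbols[i - 1] if i > 0 else 0.0) for i, s in enumerate(symbols)]
    print(ffe_dfe(rx, [1.0], 0.5) == symbols)
```

A DFE removes post-cursor ISI without amplifying noise, which is why it is commonly paired with an FFE that handles the remaining pre- and post-cursor terms.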

Journal ArticleDOI
TL;DR: The time-domain neural network (TDNN), which employs time-domain analog and digital mixed-signal processing (TDAMS) that uses delay time as the analog signal, is proposed; it not only exploits energy-efficient analog computing but also enables a fully spatially unrolled architecture through the hardware efficiency of TDAMS.
Abstract: Demand for highly energy-efficient coprocessors for the inference computation of deep neural networks is increasing. We propose the time-domain neural network (TDNN), which employs time-domain analog and digital mixed-signal processing (TDAMS) that uses delay time as the analog signal. TDNN not only exploits energy-efficient analog computing but also enables a fully spatially unrolled architecture thanks to the hardware efficiency of TDAMS. The proposed fully spatially unrolled architecture reduces energy-hungry data movement for weights and activations, significantly improving energy efficiency. We also propose training techniques that mitigate the non-ideal effects of analog circuits, which allows the circuits to be simplified and maximizes energy efficiency. The proof-of-concept chip shows an unprecedentedly high energy efficiency of 48.2 TSop/s/W.
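The core idea of using delay time as the analog signal can be sketched behaviorally: each stage of a delay chain adds a base delay that is shortened or lengthened by one unit depending on the product of a weight and an activation, so the total delay encodes a dot product, and a time comparator racing the edge against a reference chain yields a binary output. This model assumes binary weights $\{-1,+1\}$ and activations $\{0,1\}$ and measures delay in integer unit-delay ticks; it is an illustration of time-domain MAC in general, not the TDNN circuit.

```python
def td_neuron(weights: list[int], activations: list[int],
              t_base: int = 10, t_unit: int = 1) -> tuple[int, int]:
    """Behavioral time-domain neuron with delays in integer unit-delay ticks.

    Each stage adds t_base; when its activation is 1, the delay is shortened
    (weight +1) or lengthened (weight -1) by t_unit. A reference chain adds
    only t_base per stage. A time comparator outputs 1 if the signal edge
    arrives first. Returns (t_ref - t_sig, binary output).
    """
    t_sig = 0
    t_ref = 0
    for w, x in zip(weights, activations):
        t_ref += t_base
        t_sig += t_base - t_unit * w * x   # positive product -> earlier edge
    diff = t_ref - t_sig                   # equals t_unit * sum(w * x)
    return diff, 1 if diff > 0 else 0


if __name__ == "__main__":
    print(td_neuron([1, -1, 1], [1, 1, 1]))   # dot product +1 -> output 1
```

Because the accumulation happens in the delay chain itself, no multi-bit adder tree or data movement is needed per neuron, which is the hardware efficiency the abstract attributes to time-domain processing.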

Journal ArticleDOI
TL;DR: This paper proposes a coarse-fine dual-loop architecture for digital low-dropout (LDO) regulators with fast transient response and more than 200-mA load capacity; a digital controller is implemented to prevent contention between the two loops.
Abstract: This paper proposes a coarse-fine dual-loop architecture for digital low-dropout (LDO) regulators with fast transient response and more than 200-mA load capacity. In the proposed scheme, the output voltage is coregulated by two loops, namely, the coarse loop and the fine loop. The coarse loop adopts a fast current-mirror flash analog-to-digital converter and supplies high output current to enhance transient performance, while the fine loop delivers low output current and helps reduce voltage ripple and improve regulation accuracy. In addition, a digital controller is implemented to prevent contention between the two loops. Fabricated in a 28-nm Samsung CMOS process, the proposed digital LDO achieves a maximum load of 200 mA with input and output voltages of 1.1 and 0.9 V, respectively, and a chip area of 0.021 mm². An output voltage drop of around 120 mV is measured for a load step of 180 mA.
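One control cycle of a coarse-fine digital LDO can be sketched as follows. This is a hypothetical controller, not the paper's: the coarse loop quantizes the voltage error like a flash ADC and adds or removes large power switches in one step, the fine loop moves a small switch bank by one unit (bang-bang), and contention is avoided by freezing the fine loop whenever the coarse loop acts. The coarse LSB, step limit, and fine-bank size are assumed parameters.

```python
def flash_adc_code(error_v: float, lsb_v: float, max_code: int) -> int:
    """Quantize the voltage error like a flash ADC, saturating at +/-max_code."""
    code = int(error_v / lsb_v)          # truncates toward zero
    return max(-max_code, min(max_code, code))


def dual_loop_step(v_out: float, v_ref: float, n_coarse: int, n_fine: int,
                   coarse_lsb: float = 0.02, coarse_max_step: int = 4,
                   n_fine_max: int = 31) -> tuple[int, int]:
    """One cycle of a hypothetical coarse-fine digital LDO controller.

    A positive error (v_out below v_ref) turns on more power switches.
    The coarse loop acts only when |error| exceeds one coarse LSB; while it
    acts, the fine loop is frozen so the two loops do not fight.
    Returns the updated (n_coarse, n_fine) switch counts.
    """
    error = v_ref - v_out
    step = flash_adc_code(error, coarse_lsb, coarse_max_step)
    if step != 0:
        return max(0, n_coarse + step), n_fine   # coarse active, fine frozen
    # Fine loop: shift-register style update by one small unit.
    if error > 0:
        n_fine = min(n_fine_max, n_fine + 1)
    elif error < 0:
        n_fine = max(0, n_fine - 1)
    return n_coarse, n_fine


if __name__ == "__main__":
    print(dual_loop_step(0.78, 0.9, 10, 5))   # large droop: coarse loop steps up
```

Giving the coarse loop priority and freezing the fine loop is one simple way to prevent contention; the paper's digital controller may use a different policy.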

Journal ArticleDOI
TL;DR: An integrated on-chip matching network serves both the PA and the low-noise transconductance amplifier, thus allowing a 1-pin direct antenna connection with no external band-selection filters.
Abstract: We present an ultra-low-power Bluetooth low-energy (BLE) transceiver (TRX) for the Internet of Things (IoT) optimized for digital 28-nm CMOS. The transmitter (TX) employs an all-digital phase-locked loop (ADPLL) with a switched current-source digitally controlled oscillator (DCO) featuring low frequency pushing, and a class-E/F2 digital power amplifier (PA) featuring high efficiency. Low 1/$f$ DCO noise allows the ADPLL to shut down after acquiring lock. The receiver operates in discrete time at a high sampling rate (~10 Gsamples/s), with the intermediate frequency placed beyond the 1/$f$ noise corner of MOS devices. New multistage multirate charge-sharing bandpass filters are adopted to achieve high out-of-band linearity, low noise, and low power consumption. An integrated on-chip matching network serves both the PA and the low-noise transconductance amplifier, thus allowing a 1-pin direct antenna connection with no external band-selection filters. The TRX consumes 2.75 mW on the RX side and 3.7 mW on the TX side when delivering 0 dBm in BLE.
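The basic building block behind charge-sharing filters is the first-order IIR formed when a rotating capacitor $C_R$, charged to the input voltage, shares charge with a history capacitor $C_H$: charge conservation gives $y[n] = a\,y[n-1] + (1-a)\,x[n]$ with $a = C_H/(C_H + C_R)$. The sketch below models only this single low-pass section under that voltage-sampling assumption; the multistage multirate bandpass filters in the paper combine several such operations and are not reproduced here.

```python
def charge_sharing_iir(x: list[float], c_h: float, c_r: float) -> list[float]:
    """First-order discrete-time IIR formed by charge sharing.

    Each sample, a rotating capacitor C_R charged to x[n] shares charge with
    a history capacitor C_H holding y[n-1]. Charge conservation gives
        C_H*y[n-1] + C_R*x[n] = (C_H + C_R)*y[n]
    i.e. y[n] = a*y[n-1] + (1 - a)*x[n] with a = C_H / (C_H + C_R).
    """
    a = c_h / (c_h + c_r)
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + (1 - a) * xn
        y.append(prev)
    return y


if __name__ == "__main__":
    # Impulse response for C_H = 3*C_R: pole at a = 0.75.
    print(charge_sharing_iir([1.0, 0.0, 0.0], 3.0, 1.0))
```

The pole is set purely by a capacitor ratio, which is what makes such filters well suited to digital CMOS: the response tracks the clock and matches well across process corners.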

Journal ArticleDOI
TL;DR: A 94-GHz phased-array transceiver IC for frequency modulated continuous wave (FMCW) radar with four transmitters, four receivers, and integrated LO generation has been designed and fabricated in a 130-nm SiGe BiCMOS technology, integrated into an antenna-in-package module.
Abstract: A 94-GHz phased-array transceiver IC for frequency modulated continuous wave (FMCW) radar with four transmitters, four receivers, and integrated LO generation has been designed and fabricated in a 130-nm SiGe BiCMOS technology, and integrated into an antenna-in-package module. The transceiver, targeting gesture recognition applications for mobile devices, has been designed using phased-array techniques to reduce the total DC power while still maintaining the required link budget for FMCW operation. The complete array achieves state-of-the-art W-band per-element power consumption of 106 mW per TX element and 91 mW per RX element, and measurements indicate a per-element output power of 6.4 dBm and a single-sideband noise figure of 12.5 dB at 94 GHz. The array achieves a beam-steering range of ±20° while maintaining a main-lobe-to-side-lobe level of at least 3 dB. The complete chip-antenna module has been tested to characterize basic FMCW radar functionality. Initial radar experiments suggest that a sub-5-cm range resolution is possible with a 3.68-GHz RF sweep bandwidth, in line with theoretical predictions.
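The "theoretical predictions" for range resolution follow from the standard FMCW relation $\Delta R = c/(2B)$: with $B = 3.68$ GHz this gives about 4.07 cm, consistent with the sub-5-cm claim. The sketch also computes the progressive phase shift needed to steer a uniform linear array; the half-wavelength element spacing is an assumption, since the antenna geometry is not stated in this listing.

```python
import math

C0 = 299_792_458.0  # speed of light, m/s


def fmcw_range_resolution(sweep_bw_hz: float) -> float:
    """Theoretical FMCW range resolution dR = c / (2 B), in meters."""
    return C0 / (2.0 * sweep_bw_hz)


def progressive_phase_deg(steer_deg: float, spacing_wl: float = 0.5) -> float:
    """Inter-element phase shift (deg) to steer a uniform linear array.

    spacing_wl is the element spacing in wavelengths (0.5 assumed).
    Element n is driven with phase n * progressive_phase_deg(...).
    """
    return 360.0 * spacing_wl * math.sin(math.radians(steer_deg))


if __name__ == "__main__":
    print(f"range resolution: {fmcw_range_resolution(3.68e9) * 100:.2f} cm")
    print(f"phase step for 20 deg: {progressive_phase_deg(20.0):.1f} deg")
```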

Journal ArticleDOI
TL;DR: A pixel pitch-matched readout chip for 3-D photoacoustic (PA) imaging features dedicated signal conditioning and delta-sigma modulation integrated within a pixel area of 250 $\mu \text{m}$ by 250 $\mu \text{m}$; its delta-sigma beamforming architecture obviates the need for area-consuming Nyquist ADCs and enables efficient in-pixel A/D conversion.
Abstract: This paper presents a pixel pitch-matched readout chip for 3-D photoacoustic (PA) imaging, featuring dedicated signal conditioning and delta-sigma modulation integrated within a pixel area of 250 $\mu \text{m}$ by 250 $\mu \text{m}$. The proof-of-concept receiver was implemented in STMicroelectronics' 28-nm Fully Depleted Silicon On Insulator technology and interfaces to a $4 \times 4$ subarray of capacitive micromachined ultrasound transducers (CMUTs). The front-end signal conditioning in each pixel employs a coarse/fine gain tuning architecture to fulfill the 90-dB dynamic range requirement of the application. The employed delta-sigma beamforming architecture obviates the need for area-consuming Nyquist ADCs and thereby enables efficient in-pixel A/D conversion. The per-pixel switched-capacitor $\Delta \Sigma$ modulator leverages slewing-dominated and area-optimized inverter-based amplifiers. It occupies only one quarter of the pixel, and its area compares favorably with state-of-the-art designs that offer the same SNR and bandwidth. The modulator's measured peak signal-to-noise-and-distortion ratio is 59.9 dB for a 10-MHz input bandwidth, and it consumes 6.65 mW from a 1-V supply. The overall subarray beamforming approach improves the area per channel by 7.4 times and the single-channel SNR by 8 dB compared to prior art with similar delay resolution and power dissipation. The functionality of the designed chip was evaluated in a PA imaging experiment employing a flip-chip bonded 2-D CMUT array.
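The general principle of delta-sigma beamforming is that each channel is digitized by an oversampling $\Delta\Sigma$ modulator, and the beamforming delays are applied to the 1-bit streams by shifting them an integer number of modulator clock cycles before summing, so no Nyquist ADC is needed per channel. The sketch below uses an ideal first-order 1-bit modulator and integer delays; the paper's switched-capacitor modulator order and subarray delay scheme are not described in this listing, so both are assumptions.

```python
def delta_sigma_1st(x: list[float]) -> list[int]:
    """Ideal first-order 1-bit delta-sigma modulator; inputs assumed in [-1, 1].

    Output bits are +/-1; the integrator accumulates (input - output), so the
    running average of the bits tracks the input.
    """
    integ, out = 0.0, []
    for xn in x:
        y = 1 if integ >= 0 else -1
        integ += xn - y
        out.append(y)
    return out


def delay_and_sum(bitstreams: list[list[int]], delays: list[int]) -> list[int]:
    """Beamform 1-bit streams by delaying each channel an integer number of
    modulator clock cycles and summing into a multi-bit output."""
    n = len(bitstreams[0])
    out = [0] * n
    for bits, d in zip(bitstreams, delays):
        for i in range(d, n):
            out[i] += bits[i - d]
    return out


if __name__ == "__main__":
    bits = delta_sigma_1st([0.5] * 1000)
    print(sum(bits) / len(bits))   # close to 0.5
    print(delay_and_sum([[1, -1, 1, 1], [1, 1, -1, -1]], [0, 2]))
```

In this scheme the delay resolution is one modulator clock period, which is why oversampling also serves the beamformer.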

Journal ArticleDOI
TL;DR: Compared with a variety of Intel i7s and Nvidia GPUs, the KiloCore at 1.1 V has geometric mean improvements of 4.3 $\times$ higher throughput per area and 9.4 $\times$ higher energy efficiency for AES encryption, 4095-b low-density parity-check decoding, 4096-point complex fast Fourier transform, and 100-B record sorting applications.
Abstract: A processor array containing 1000 independent processors and 12 memory modules was fabricated in 32-nm partially depleted silicon on insulator CMOS. The programmable processors occupy 0.055 mm² each, contain no algorithm-specific hardware, and operate up to an average maximum clock frequency of 1.78 GHz at 1.1 V. At 0.9 V, processors operating at an average of 1.24 GHz dissipate 17 mW while issuing one instruction per cycle. At 0.56 V, processors operating at an average of 115 MHz dissipate 0.61 mW while issuing one instruction per cycle, resulting in an energy consumption of 5.3 pJ/instruction. On-die communication is performed by complementary circuit and packet-based networks that yield a total array bisection bandwidth of 4.2 Tb/s. Independent memory modules handle data and instructions and operate up to an average maximum clock frequency of 1.77 GHz at 1.1 V. All processors, their packet routers, and the memory modules contain unconstrained clock oscillators within independent clock domains that adapt to large supply voltage noise. Compared with a variety of Intel i7s and Nvidia GPUs, the KiloCore at 1.1 V has geometric mean improvements of 4.3 $\times$ higher throughput per area and 9.4 $\times$ higher energy efficiency for AES encryption, 4095-b low-density parity-check decoding, 4096-point complex fast Fourier transform, and 100-B record sorting applications.
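The 5.3-pJ/instruction figure follows directly from the stated power and clock: with one instruction per cycle, $E = P/f = 0.61\ \text{mW} / 115\ \text{MHz} \approx 5.3$ pJ (the same arithmetic gives about 13.7 pJ at 0.9 V). The sketch below reproduces that check and shows the geometric mean used to summarize per-benchmark improvement ratios; the ratios in the usage line are hypothetical, since per-benchmark values are not given here.

```python
import math


def energy_per_instruction(power_w: float, freq_hz: float, ipc: float = 1.0) -> float:
    """Energy per instruction E = P / (f * IPC), in joules."""
    return power_w / (freq_hz * ipc)


def geometric_mean(ratios: list[float]) -> float:
    """Geometric mean of per-benchmark improvement ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    print(f"{energy_per_instruction(0.61e-3, 115e6) * 1e12:.2f} pJ at 0.56 V")
    print(f"{energy_per_instruction(17e-3, 1.24e9) * 1e12:.2f} pJ at 0.9 V")
    print(geometric_mean([2.0, 8.0]))  # hypothetical ratios -> about 4.0
```

The geometric mean is the natural summary for ratios: it is unaffected by which platform is taken as the baseline, unlike the arithmetic mean.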