Author

Taegeun Yoo

Other affiliations: Samsung, Chung-Ang University
Bio: Taegeun Yoo is an academic researcher from Nanyang Technological University. The author has contributed to research in topics: Static random-access memory & Gesture recognition. The author has an h-index of 7 and has co-authored 28 publications receiving 196 citations. Previous affiliations of Taegeun Yoo include Samsung & Chung-Ang University.

Papers
Proceedings Article
22 Mar 2020
TL;DR: A novel 8T SRAM-based bitcell is proposed for current-based compute-in-memory dot-product operations; Monte-Carlo simulations and test-chip measurement results have verified both linearity and process variation.
Abstract: A novel 8T SRAM-based bitcell is proposed for current-based compute-in-memory dot-product operations. The proposed bitcell with two extra NMOS transistors (vs. standard 6T SRAM) decouples SRAM read and write operation. A 128×128 8T SRAM bitcell array is built for processing a vector-matrix multiplication (or parallel dot-products) with 64x binary (0 or 1) inputs, 64×128 binary (-1 or +1) weights, and 128x 1-5bit outputs. Each column (i.e., neuron) of the proposed SRAM compute-in-memory macro consists of 64x bitcells for dot-product, 32x bitcells for ADC, and 32x bitcells for calibration. The column-based neuron minimizes the ADC overhead by reusing a sense amplifier for SRAM read. The column-wise ADC converts the analog dot-product results to N-bit output codes (N=1 to 5) by sweeping reference levels using replica bitcells for 2^N-1 cycles for each conversion. Monte-Carlo simulations and test-chip measurement results have verified both linearity and process variation. The largest variation (σ=2.48%) results in an MNIST classification accuracy of 96.2% (i.e., 0.4% lower than a baseline with no variation). A test-chip is fabricated using 65nm technology, and the 16K SRAM bitcell array occupies 0.055 mm². The energy efficiency of the 1bit operation is 490 to 15.8 TOPS/W at 1-5bit ADC mode using 0.45/0.8V core supply and 200MHz.
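The column operation described in this abstract can be illustrated with a small behavioural sketch. It is not the authors' circuit: it assumes an ideal comparator and evenly spaced reference levels, whereas the chip derives its references from replica bitcells. The function names `column_dot_product` and `sweep_adc` are hypothetical.

```python
import numpy as np

def column_dot_product(inputs, weights):
    """Dot product of binary (0/1) inputs with binary (-1/+1) weights."""
    return int(np.dot(inputs, weights))

def sweep_adc(value, n_bits, full_scale):
    """Quantize a value in [-full_scale, +full_scale] to an n_bits code.

    The value is compared against 2**n_bits - 1 reference levels, one
    comparison per cycle, and the code counts the levels it exceeds.
    Evenly spaced levels are an assumption made for this sketch.
    """
    n_refs = 2 ** n_bits - 1
    # Drop the two end points so all references lie strictly inside the range.
    refs = np.linspace(-full_scale, full_scale, n_refs + 2)[1:-1]
    code = 0
    for ref in refs:  # one comparison per conversion cycle
        if value > ref:
            code += 1
    return code, n_refs
```

With 64 inputs per column the dot product lies in [-64, +64], so `sweep_adc(column_dot_product(x, w), 5, 64)` takes 31 comparison cycles, matching the 2^N-1 cycles quoted for N=5.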

66 citations

Journal Article
TL;DR: Based on the benefits of digital CIM, reconfigurability, and bit-serial computing architecture, the Colonnade can achieve both high performance and energy efficiency for processing neural networks.
Abstract: This article (Colonnade) presents a fully digital bit-serial compute-in-memory (CIM) macro. The digital CIM macro is designed for processing neural networks with reconfigurable 1–16 bit input and weight precisions based on bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used for computing a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, which is an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from the existing analog and digital implementations. First, its full-digital circuit implementation is free from process variation, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. A bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full-adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bit. So far, most of the analog macros were used for processing quantized neural networks with very low input/weight precisions, mainly due to a memory density issue. Recent digital accelerators have implemented reconfigurable precisions, but they are inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured to a 1–16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead while sacrificing latency due to bit-by-bit operation cycles. Based on the benefits of digital CIM, reconfigurability, and bit-serial computing architecture, the Colonnade can achieve both high performance and energy efficiency (i.e., both benefits of prior analog and digital accelerators) for processing neural networks. A test-chip with $128 \times 128$ SRAM-based bitcells for digital bit-serial computing is implemented using 65-nm technology and tested with 1–16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
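The bit-serial scheme in this abstract, with inputs serialized from LSB to MSB and partial results combined afterwards, can be sketched in software as follows. This is only an arithmetic illustration under assumptions: inputs are treated as unsigned integers and weights as ordinary signed integers, which is not necessarily the encoding used by the XNOR-based bitcell, and `bit_serial_dot` is a hypothetical name.

```python
def bit_serial_dot(inputs, weights, input_bits):
    """Dot product with unsigned inputs fed one bit per cycle, LSB first."""
    acc = 0
    for bit in range(input_bits):  # one cycle per input bit
        # Bit slice of every input for this cycle (0 or 1 each).
        bit_slice = [(x >> bit) & 1 for x in inputs]
        # Bitwise multiply and accumulate across the stored weights.
        partial = sum(b * w for b, w in zip(bit_slice, weights))
        # Post-accumulation: scale the partial sum by 2**bit.
        acc += partial << bit
    return acc
```

For example, `bit_serial_dot([3, 5, 2], [1, -2, 4], 3)` gives the same result as the direct dot product 3·1 + 5·(−2) + 2·4. The loop runs once per input bit, which is the latency cost the abstract mentions.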

58 citations

Journal Article
TL;DR: A cold startup technique is incorporated in the proposed EHS with an energy-aware algorithm for selecting the input harvester, and the proposed SWM technique eliminates the gate driver/conduction loss tradeoff in a reconfigurable switched capacitor charge pump.
Abstract: This paper presents a novel 3-D maximum power point tracking (3-D MPPT) system for energy harvesting systems (EHS) within Internet of Things (IoT) smart nodes. The proposed 3-D MPPT utilizes a switch width modulation (SWM) technique for improving power efficiency (PE) at idle ( $\mu \text{A}$ ) and heavy (>300 $\mu \text{A}$ ) load modes. The SWM eliminates the gate driver/conduction loss tradeoff in a reconfigurable switched capacitor charge pump (SCCP). The proposed SWM technique modulates the SCCP switch resistance in proportion to the load condition, input voltage, and $V_{\mathrm {gs}}$ applied. A cold startup technique is incorporated in the proposed EHS with an energy-aware algorithm for selecting the input harvester. The fabricated test chip in 65-nm CMOS technology can harvest solar and thermal energy from 0.35 V and provides a regulated output voltage of 1 V with a peak efficiency of 88% at 200 $\mu \text{W}$ and PE >60% at 100 nW.

54 citations

Journal Article
TL;DR: In this article, a fast and efficient maximum power point tracking (MPPT) technique was proposed to minimize the power loss with the adaptively binary-weighted step (ABWS) followed by the monotonically decreased step (MDS) without causing output power fluctuation or requiring additional ad hoc parameters.
Abstract: When conventional maximum power point tracking (MPPT) techniques are required to operate fast under rapidly changing environmental conditions, a large power loss can be caused by slow tracking speed, output power fluctuation, or additionally required ad hoc parameters. This paper proposes a fast and efficient MPPT technique that minimizes the power loss with the adaptively binary-weighted step (ABWS) followed by the monotonically decreased step (MDS) without causing output power fluctuation or requiring additional ad hoc parameters. The proposed MPPT system for a photovoltaic (PV) module is implemented by a boost converter with a microcontroller unit. The theoretical analysis and the simulation results show that the proposed MPPT provides fast and accurate tracking under rapidly changing environmental conditions. The experimental results based on a distributed PV system demonstrate that the proposed MPPT technique is superior to the conventional perturb and observe (P&O) technique, reducing the tracking time and the overall power loss by up to 82.95%, 91.51% and 82.46%, 97.71% for two PV modules, respectively.
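For context, the conventional fixed-step P&O baseline named in this abstract can be sketched as below. This is not the proposed ABWS/MDS method, whose step rules are not given here. The power curve `toy_pv_power` is an invented concave stand-in for a real PV characteristic, and all names and parameter values are assumptions for illustration.

```python
def toy_pv_power(v, v_mpp=17.0, p_max=60.0):
    """Toy concave power curve with a single maximum at v_mpp (not a PV model)."""
    return max(0.0, p_max - 0.5 * (v - v_mpp) ** 2)

def perturb_and_observe(power_fn, v_start, step, iterations):
    """Fixed-step P&O: keep stepping while power rises, reverse when it falls."""
    v = v_start
    p_prev = power_fn(v)
    direction = 1.0
    history = [v]
    for _ in range(iterations):
        v += direction * step      # perturb the operating voltage
        p = power_fn(v)            # observe the new power
        if p < p_prev:
            direction = -direction # power dropped, so reverse direction
        p_prev = p
        history.append(v)
    return history
```

Running `perturb_and_observe(toy_pv_power, 10.0, 0.5, 40)` walks up to the maximum and then keeps oscillating around it by one step. A larger step converges faster but oscillates more, which is the speed versus fluctuation tradeoff the abstract attributes to conventional techniques.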

49 citations

Proceedings Article
01 Sep 2019
TL;DR: This work proposes a digital in-memory computing macro with 1-16b reconfigurable weight and input bit-precisions for energy-efficient DNN processing that comprises 128×128 bitcells, and each bitcell consists of an XNOR-based bitwise multiplier, a full-adder, and an SRAM cell.
Abstract: This work proposes a digital in-memory computing macro with 1-16b reconfigurable weight and input bit-precisions for energy-efficient DNN processing. The proposed digital macro comprises 128×128 bitcells, and each bitcell consists of three building blocks for in-memory computing: an XNOR-based bitwise multiplier, a full-adder, and an SRAM cell. The two-dimensional bitcell array is then divided into parallel neurons, each with 128× column-shape multiply-and-accumulate (column-MAC) units arranged in a row. Each column-MAC with N-bit variable weight precision is built with ‘N+7’ bitcells in a column (i.e., 8-to-23 bitcells at 1-to-16bit). The N-bit weights are stored in SRAM cells for in-memory computing with minimal memory access for fetching weights. The remaining 7 bitcells are needed to extend MSBs for accumulating partial-sums through 128 column-MACs. A bit-serial input is broadcast to all bitcells in the same column, and parallel bitwise multiply operations are performed. Bitwise multiplied results from each column-MAC are then accumulated using N+7 full-adders which are vertically connected to work as a ripple carry adder. Meanwhile, the input precision is determined by the number of bit-serial input cycles from LSB to MSB. Hence, post-accumulation is required for multi-bit input precision. A 65nm test-chip is fabricated, and the measured energy-efficiency is 117.3 to 2.06TOPS/W at 1-16bit.
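The ‘N+7’ bitcell count quoted in this abstract is consistent with the usual rule that summing 128 values of N bits each needs N + log2(128) = N + 7 bits. The short sketch below checks that arithmetic. The function name `column_mac_bitcells` is hypothetical, and reading the 7 extra bitcells as log2(128) is an inference from the abstract rather than a statement by the authors.

```python
def column_mac_bitcells(weight_bits, num_column_macs=128):
    """Bitcells per column-MAC: weight bits plus MSB extension for accumulation."""
    # Extra bits needed so that adding num_column_macs terms cannot overflow.
    extension_bits = (num_column_macs - 1).bit_length()
    return weight_bits + extension_bits
```

`column_mac_bitcells(1)` and `column_mac_bitcells(16)` reproduce the 8-to-23 bitcell range stated for 1-to-16bit weights.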

45 citations


Cited by
Proceedings Article
01 Jan 2010
TL;DR: This paper presents a low-power boost converter for thermoelectric energy harvesting that demonstrates an efficiency 15% higher than the state-of-the-art for voltage conversion ratios above 20.
Abstract: This paper presents a low power boost converter for thermoelectric energy harvesting that demonstrates an efficiency that is 15% higher than the state-of-the-art for voltage conversion ratios above 20. This is achieved by utilizing a technique allowing synchronous rectification in the discontinuous conduction mode. A low-power method for input voltage monitoring is presented. The low input voltage requirements allow operation from a thermoelectric generator powered by body heat. The converter, fabricated in a 0.13 μm CMOS process, operates from input voltages ranging from 20 mV to 250 mV while supplying a regulated 1 V output. The converter consumes 1.6 (1.1) μW of quiescent power, delivers up to 25 (175) μW of output power, and is 46 (75)% efficient for a 20 mV and 100 mV input, respectively.

412 citations

Journal Article
11 May 2017
TL;DR: The power-conversion and control technologies used for DPGSs are reviewed, the impacts of the DPGS on the distributed grid are examined, and more importantly, strategies for enhancing the connection and protection of the DPGS are discussed.
Abstract: Continuously expanding deployments of distributed power-generation systems (DPGSs) are transforming the conventional centralized power grid into a mixed distributed electrical network. The modern power grid requires flexible energy utilization but presents challenges in the case of a high penetration degree of renewable energy, among which wind and solar photovoltaics are typical sources. The integration level of the DPGS into the grid plays a critical role in developing sustainable and resilient power systems, especially with highly intermittent renewable energy resources. To address the challenging issues and, more importantly, to leverage the energy generation, stringent demands from both utility operators and consumers have been imposed on the DPGS. Furthermore, as the core of energy conversion, numerous power electronic converters employing advanced control techniques have been developed for the DPGS to consolidate the integration. In light of the above, this paper reviews the power-conversion and control technologies used for DPGSs. The impacts of the DPGS on the distributed grid are also examined, and more importantly, strategies for enhancing the connection and protection of the DPGS are discussed.

399 citations

Journal Article
TL;DR: In this paper, the authors provide an up-to-date comparison and evaluation of recent progress in the field of thermoelectricity, resulting primarily from multidisciplinary optimization of materials, topologies and controlling circuitry.

205 citations

Journal Article
TL;DR: The current landscape of TinyML is presented and the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads are discussed, along with three preliminary benchmarks and the selection methodology.
Abstract: Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure and thereby systematically compare, evaluate, and improve the performance of systems and is therefore fundamental to a field reaching maturity. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Furthermore, we present our four benchmarks and discuss our selection methodology. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group, which comprises over 30 organizations.

127 citations

Journal Article
TL;DR: A multi-mode operation for the three-phase photovoltaic (PV) power system with low-voltage ride-through (LVRT) capability is proposed, which can provide maximum reactive power under rated current amplitude during the voltage sag period and output the demanded reactive/rated current ratio to meet different LVRT codes.
Abstract: A multi-mode operation for the three-phase photovoltaic (PV) power system with low-voltage ride-through (LVRT) capability is proposed. With the proposed multi-mode control strategy, the active power from the PV arrays can be continuously extracted by the interleaved boost converter during LVRT, whereas the maximum power point tracking operation can be quickly achieved after the grid fault clearance. In addition, the multi-channel boost converter with interleaved operation can increase power conversion efficiency while decreasing the input current ripple. On the other hand, the current amplitude limitation control of the three-phase inverter can provide maximum reactive power under rated current amplitude during the voltage sag period and output the demanded reactive/rated current ratio to meet different LVRT codes. A three-phase 5-kVA prototype PV converter is built and tested to verify the performance of the proposed multi-mode operation strategy and the LVRT capability.

123 citations