Author

Yu Chen

Bio: Yu Chen is an academic researcher from the University of Michigan. The author has contributed to research in the topics of computer science and deep learning, has an h-index of 6, and has co-authored 8 publications receiving 127 citations.

Papers
Proceedings ArticleDOI
01 Feb 2019
TL;DR: No sub-$\mu\mathrm{W}$ VAD has been reported to date, preventing the use of VADs in unobtrusive mm-scale sensor nodes; moreover, prior designs' simple decision-tree or fixed neural-network-based approaches limit broader use for various acoustic event targets.
Abstract: Acoustic sensing is one of the most widely used sensing modalities to intelligently assess the environment. In particular, ultra-low power (ULP) always-on voice activity detection (VAD) is gaining attention as an enabling technology for IoT platforms. In many practical applications, acoustic events-of-interest occur infrequently. Therefore, the system power consumption is typically dominated by the always-on acoustic wakeup detector, while the remainder of the system is power-gated the vast majority of the time. A previous acoustic wakeup detector [1] consumed just 12 nW but could not process voice signals (up to 4 kHz bandwidth) or handle non-stationary events, which are essential qualities for a VAD. Prior VAD ICs [2], [3] demonstrated reliable performance but consumed significant power ($\gt20~\mu\mathrm{W}$) and lacked an analog frontend (AFE), which further increases power. Recent analog-domain feature-extraction-based VADs [4], [5] also reported $\mu\mathrm{W}$-level power consumption, and their simple decision tree [4] or fixed neural-network-based approach [5] limited broader use for various acoustic event targets. In summary, no sub-$\mu\mathrm{W}$ VAD has been reported to date, preventing the use of VADs in unobtrusive mm-scale sensor nodes.
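As a back-of-the-envelope illustration of why the always-on wakeup detector dominates system power in such nodes, consider a system that is power-gated except during rare acoustic events. The sketch below uses only the 12 nW wakeup figure cited above; the active-mode power and event duty cycle are illustrative assumptions, not values from the paper.

```python
# Illustrative average-power estimate for a duty-cycled acoustic sensor node.
# Only the 12 nW wakeup figure comes from the text; the rest are assumptions.

P_WAKEUP_NW = 12.0      # always-on acoustic wakeup detector [1], in nW
P_ACTIVE_UW = 500.0     # assumed whole-system power when awake, in uW
DUTY_CYCLE = 1e-5       # assumed fraction of time events keep the system awake

avg_nw = P_WAKEUP_NW + P_ACTIVE_UW * 1e3 * DUTY_CYCLE
print(f"average power: {avg_nw:.1f} nW "
      f"({P_WAKEUP_NW / avg_nw:.0%} from the always-on detector)")
```

With these assumed numbers, the always-on detector accounts for roughly 70% of the average power, which is why pushing it below 1 µW matters so much.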

44 citations

Proceedings ArticleDOI
01 Feb 2019
TL;DR: Visual SLAM requires massive computation in the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges.
Abstract: Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of an unknown surrounding. It is a fundamental kernel that enables head-mounted augmented/virtual reality devices and autonomous navigation of micro aerial vehicles. A noticeable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs) that outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched with stored keypoints to estimate the agent’s 6-DoF pose by solving a perspective-n-points (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent’s long-term trajectory over multiple frames is refined by a bundle adjustment process (BA, Fig. 7.3.1, right), which involves a large-scale ($\sim$120-variable) non-linear optimization. Visual SLAM requires massive computation ($\gt$250 GOP/s) in the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical, resulting in 0.2 s runtime on a $\sim$3 GHz CPU + GPU system with a $\gt$100 MB memory footprint and $\gt$100 W power consumption. Prior ASICs have implemented either an incomplete SLAM system [2, 3] that lacks estimation of ego-motion or employed a simplified (non-CNN) feature extraction and tracking [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), simplifying the computational complexity, but incurring additional power and cost overhead.
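For readers unfamiliar with the PnP step, the minimal Python sketch below recovers a 6-DoF pose from 3D-2D correspondences with OpenCV's non-linear solver. The synthetic points, camera intrinsics, and ground-truth pose are assumptions standing in for CNN feature matches and a real camera.

```python
# Minimal sketch of the perspective-n-point (PnP) pose-estimation step in
# visual SLAM. Synthetic 3D-2D correspondences stand in for CNN feature
# matches; in a real pipeline these come from descriptor matching.
import numpy as np
import cv2

rng = np.random.default_rng(0)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics

pts3d = rng.uniform(-1, 1, (50, 3)) + np.array([0, 0, 5.0])  # map keypoints
rvec_true = np.array([0.1, -0.2, 0.05])                      # ground-truth rotation
tvec_true = np.array([0.3, -0.1, 0.2])                       # ground-truth translation

pts2d, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, None)

ok, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, None)  # non-linear optimization
print("recovered rotation:", rvec.ravel())
print("recovered translation:", tvec.ravel())
```

Bundle adjustment generalizes the same re-projection-error minimization to many frames and map points jointly, which is where the large-scale optimization mentioned above comes from.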

40 citations

Journal ArticleDOI
TL;DR: An algorithm-circuit cross optimization is introduced to realize a 12-nW stand-alone microsystem that integrates the analog frontend with the digital backend signal classifier and replaces conventional high-power, area-consuming parallel feature extraction based on the fast Fourier transform.
Abstract: This paper presents an ultra-low-power acoustic sensing and object recognition microsystem for Internet of Things applications. The microsystem is targeted for unattended ground sensor nodes where a long (decades) lifetime is desired without the need for battery replacement. The system incorporates a microelectromechanical systems (MEMS) microphone as a frontend sensor along with active circuitry to identify target objects. We introduce an algorithm-circuit cross optimization to realize a 12-nW stand-alone microsystem that integrates the analog frontend with the digital backend signal classifier. The frequency-domain analysis of target audio signals reveals that the system can operate with a relatively low 3-dB bandwidth, which significantly relaxes power constraints on both the analog frontend and digital backend circuits. To further relax the current requirement of the preceding amplifier, we propose an 8-bit SAR analog-to-digital converter designed with a highly reduced sampling capacitance. The system achieves greater than 95% reliability and consumes only 12 nW with continuous monitoring.
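To make the serialized (one-band-at-a-time) frequency-domain feature extraction idea concrete, here is a hedged software sketch: band energies computed sequentially with band-pass filters instead of one parallel FFT. The sample rate, band edges, and test signal are illustrative assumptions, not the chip's actual parameters.

```python
# Sketch of serialized frequency-domain feature extraction: per-band energies
# computed one band at a time, a hardware-friendly alternative to a parallel
# FFT. Band edges and the input signal are illustrative assumptions.
import numpy as np
from scipy.signal import butter, lfilter

FS = 2000                                # assumed low sample rate (Hz)
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.randn(FS)   # toy input

bands = [(50, 150), (150, 300), (300, 500)]  # illustrative analysis bands (Hz)
features = []
for lo, hi in bands:                     # one band at a time -> serialized
    b, a = butter(2, [lo, hi], btype="bandpass", fs=FS)
    y = lfilter(b, a, x)
    features.append(np.mean(y ** 2))     # band energy feeds the classifier

print([f"{f:.4f}" for f in features])
```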

39 citations

Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, a deep autoencoder trained with a generative adversarial network (GAN) learns a compressed latent representation of each video frame, and a convolutional long short-term memory (ConvLSTM) network is employed to predict the latent vector representation of the future frame.
Abstract: Learning-based video compression has achieved substantial progress during recent years. The most influential approaches adopt deep neural networks (DNNs) to remove spatial and temporal redundancies by finding the appropriate lower-dimensional representations of frames in the video. We propose a novel DNN-based framework that predicts and compresses video sequences in the latent vector space. The proposed method first learns the efficient lower-dimensional latent space representation of each video frame and then performs inter-frame prediction in that latent domain. The proposed latent domain compression of individual frames is obtained by a deep autoencoder trained with a generative adversarial network (GAN). To exploit the temporal correlation within the video frame sequence, we employ a convolutional long short-term memory (ConvLSTM) network to predict the latent vector representation of the future frame. We demonstrate our method with two applications, video compression and abnormal event detection, that share an identical latent frame prediction network. The proposed method exhibits superior or competitive performance compared to the state-of-the-art algorithms specifically designed for either video compression or anomaly detection.
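A minimal PyTorch sketch of the latent-domain prediction step is shown below: a ConvLSTM cell consumes a sequence of latent frames and a 1x1 convolution maps its hidden state to the predicted next latent. The GAN-trained autoencoder is omitted, and all tensor sizes are illustrative assumptions rather than the paper's configuration.

```python
# Minimal ConvLSTM-based latent frame prediction. Encoder/decoder and GAN
# training are omitted; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces all four gates at once.
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.conv(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

B, C, H, W, T = 2, 8, 16, 16, 5          # batch, latent channels, spatial, time
cell = ConvLSTMCell(C, 32)
head = nn.Conv2d(32, C, 1)               # map hidden state back to latent space
h = torch.zeros(B, 32, H, W); c = torch.zeros_like(h)
latents = torch.randn(T, B, C, H, W)     # stand-in for autoencoder outputs
for t in range(T):
    h, c = cell(latents[t], (h, c))
pred_next_latent = head(h)               # a decoder would reconstruct the frame
print(pred_next_latent.shape)            # torch.Size([2, 8, 16, 16])
```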

39 citations

Journal ArticleDOI
TL;DR: This article presents a voice and acoustic activity detector that uses a mixer-based architecture and an ultra-low-power neural network (NN)-based classifier; it also features inaudible acoustic signature detection for intentional remote silent wakeup of the system while re-using a subset of the same system components.
Abstract: This article presents a voice and acoustic activity detector that uses a mixer-based architecture and an ultra-low-power neural network (NN)-based classifier. By sequentially scanning 4 kHz of frequency bands and down-converting to below 500 Hz, feature extraction power consumption is reduced by 4$\times$. The NN processor employs computational sprinting, enabling a 12$\times$ power reduction. The system also features inaudible acoustic signature detection for intentional remote silent wakeup of the system while re-using a subset of the same system components. The measurement results achieve 91.5%/90% speech/non-speech hit rates at 10-dB SNR with babble noise and 142-nW power consumption. Acoustic signature detection consumes 66 nW, successfully detecting a signature 10 dB below the noise level.
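The mixer-based scanning idea can be illustrated in a few lines of Python: each 500 Hz-wide slice of the 4 kHz voice band is down-converted to baseband by multiplication with a local oscillator and then low-pass filtered. The LO frequencies, filter order, and test tone are assumptions for illustration only.

```python
# Sketch of mixer-based band scanning: multiply the input by a local
# oscillator (LO) to shift one band down to baseband, then low-pass filter.
# Frequencies, filter order, and the test tone are illustrative assumptions.
import numpy as np
from scipy.signal import butter, lfilter

FS = 8000                                   # covers the 4 kHz voice band
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 1300 * t)            # toy tone inside the 1-1.5 kHz band

for f_lo in (500, 1000, 1500, 2000, 2500, 3000, 3500):  # sequential scan
    mixed = x * np.cos(2 * np.pi * f_lo * t)            # down-convert
    b, a = butter(4, 500, btype="low", fs=FS)           # keep < 500 Hz
    base = lfilter(b, a, mixed)
    print(f"LO {f_lo:4d} Hz -> baseband energy {np.mean(base**2):.4f}")
```

Only the LO settings adjacent to the tone's band produce significant baseband energy, which is how the scan localizes activity while the backend runs at a low rate.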

35 citations


Cited by
Journal ArticleDOI
TL;DR: This tutorial summarizes the efforts to date on semantic-aware and task-oriented communications, starting from their early adaptations and covering the foundations, algorithms, and potential implementations, with a focus on approaches that utilize information theory to provide the foundations.
Abstract: Communication systems to date primarily aim at reliably communicating bit sequences. Such an approach provides efficient engineering designs that are agnostic to the meanings of the messages or to the goal that the message exchange aims to achieve. Next generation systems, however, can be potentially enriched by folding message semantics and goals of communication into their design. Further, these systems can be made cognizant of the context in which communication exchange takes place, thereby providing avenues for novel design insights. This tutorial summarizes the efforts to date, starting from its early adaptations, semantic-aware and task-oriented communications, covering the foundations, algorithms and potential implementations. The focus is on approaches that utilize information theory to provide the foundations, as well as the significant role of learning in semantics and task-aware communications.

67 citations

Journal ArticleDOI
TL;DR: This paper presents an ultra-low-power voice activity detector (VAD) that uses analog signal processing for acoustic feature extraction (AFE) directly on the microphone output, approximate event-driven analog-to-digital conversion (ED-ADC), and digital deep neural network (DNN) for speech/non-speech classification.
Abstract: This paper presents an ultra-low-power voice activity detector (VAD). It uses analog signal processing for acoustic feature extraction (AFE) directly on the microphone output, approximate event-driven analog-to-digital conversion (ED-ADC), and a digital deep neural network (DNN) for speech/non-speech classification. New circuits, including the low-noise amplifier, bandpass filter, and full-wave rectifier, contribute to a more than 9$\times$ normalized power/channel reduction in the feature-extraction front-end compared to the best prior art. The digital DNN is a three-hidden-layer binarized multilayer perceptron (MLP) with a 2-neuron output layer and a 48-neuron input layer that receives parallel event streams from the ED-ADCs. To obtain the DNN weights via off-line training, a customized front-end model written in Python is constructed to accelerate feature generation in software emulation, and the model parameters are extracted from Spectre simulations. The chip, fabricated in 0.18-$\mu\text{m}$ CMOS, has a core area of 1.66$\times$1.52 mm$^2$ and consumes 1 $\mu\text{W}$. The classification measurements using the 1-hour 10-dB signal-to-noise ratio audio with restaurant background noise show a mean speech/non-speech hit rate of 84.4%/85.4% with a 1.88%/4.65% 1-$\sigma$ variation across ten dies that are all loaded with the same weights.
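As a software sketch of the binarized MLP described above, the NumPy snippet below runs a forward pass with ±1 weights and a sign activation. The input and output widths follow the abstract (48 and 2 neurons); the hidden-layer width, random weights, and input events are illustrative assumptions, not the trained network.

```python
# Sketch of a binarized MLP: +/-1 weights, sign activations. Layer count and
# input/output widths follow the abstract; everything else is assumed.
import numpy as np

rng = np.random.default_rng(0)
sizes = [48, 64, 64, 64, 2]              # 48 inputs, 3 hidden layers, 2 outputs
weights = [np.sign(rng.standard_normal((m, n)) + 1e-9)  # binarized to +/-1
           for m, n in zip(sizes[:-1], sizes[1:])]

def binarized_mlp(x):
    for w in weights[:-1]:
        x = np.sign(x @ w)               # sign activation after each hidden layer
        x[x == 0] = 1                    # break ties toward +1
    return x @ weights[-1]               # 2 output scores: speech / non-speech

events = rng.integers(0, 2, 48).astype(float)   # stand-in for ED-ADC event counts
scores = binarized_mlp(events)
print("speech" if scores[0] > scores[1] else "non-speech", scores)
```

Binarized weights reduce each multiply-accumulate to an add/subtract, which is the property that makes such a classifier practical at µW-and-below power budgets.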

57 citations

Proceedings ArticleDOI
10 Jun 2018
TL;DR: The vision of BARNET (Backscattering Activity Recognition NEtwork of Tags), a network of passive RF tags that use RF backscatter for tag-to-tag communication, is presented and the BARNET tag architecture shows that an ASIC implementation can run on harvested RF power.
Abstract: We present the vision of BARNET (Backscattering Activity Recognition NEtwork of Tags), a network of passive RF tags that use RF backscatter for tag-to-tag communication. BARNET not only provides identification of tagged objects but also can serve as a 'device-free' activity recognition system. BARNET's key innovation is the concept of backscatter channel state information (BCSI) which can be measured via systematic multiphase probing of the backscatter tag-to-tag channel using innovative processing on the passive tags. So far such measurements were only possible using active radio receivers that consume much higher power. Changes in BCSI provide signatures for different activities in the environment that can be learned using suitable machine learning tools. We develop the BARNET tag architecture which shows that an ASIC implementation can run on harvested RF power. We develop a printed circuit board (PCB) prototype using discrete components to evaluate activity recognition performance. We show that the prototype can recognize human daily activities with an average error around 6%. Overall, BARNET uses passive tags to achieve the same level of performance as systems that use powered, active radios.
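A toy numerical model of multiphase probing may help: a tag reflects with several discrete phase offsets, and the envelope seen at a second passive tag peaks when the probe phase cancels the channel phase, revealing coarse BCSI. The path gains, phase grid, and noise level below are assumptions, not BARNET's actual measurement procedure.

```python
# Toy model of backscatter channel state information (BCSI) via multiphase
# probing. All values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
direct = 1.0 + 0.0j                        # direct (unmodulated) path
ch = 0.3 * np.exp(1j * 1.1)                # unknown tag-to-tag channel

phases = 2 * np.pi * np.arange(8) / 8      # K = 8 probing phases
env = np.abs(direct + ch * np.exp(1j * phases))   # envelope at receiver tag
env += 0.005 * rng.standard_normal(env.size)      # measurement noise

# The envelope peaks when the probe phase cancels the channel phase.
est_phase = (-phases[np.argmax(env)]) % (2 * np.pi)
print(f"true channel phase 1.10 rad, coarse estimate {est_phase:.2f} rad "
      f"(resolution 2*pi/8)")
```

Activities in the environment perturb the channel term, so tracking such phase/amplitude estimates over time yields the activity signatures that the machine learning stage classifies.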

52 citations

Journal ArticleDOI
TL;DR: Vega, as discussed by the authors, is an IoT endnode system on chip (SoC) capable of scaling from a 1.7-μW fully retentive cognitive sleep mode up to 32.2-GOPS (at 49.4 mW) peak performance.
Abstract: The Internet-of-Things (IoT) requires endnodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT endnode system on chip (SoC) capable of scaling from a 1.7-μW fully retentive cognitive sleep mode up to 32.2-GOPS (at 49.4 mW) peak performance on NSAAs, including mobile deep neural network (DNN) inference, exploiting 1.6 MB of state-retentive SRAM, and 4 MB of non-volatile magnetoresistive random access memory (MRAM). To meet the performance and flexibility requirements of NSAAs, the SoC features ten RISC-V cores: one core for SoC and IO management and a nine-core cluster supporting multi-precision single instruction multiple data (SIMD) integer and floating-point (FP) computation. Vega achieves the state-of-the-art (SoA)-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3 TOPS/W for 8-bit DNN inference with hardware acceleration). On FP computation, it achieves the SoA-leading efficiency of 79 and 129 GFLOPS/W on 32- and 16-bit FP, respectively. Two programmable machine learning (ML) accelerators boost energy efficiency in cognitive sleep and active states.

46 citations
