Journal ArticleDOI
A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks
TLDR
The authors argue that VADs should prioritize accuracy over area and power, and introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-$\mu\text{W}$ power consumption.
Abstract:
This paper describes digital circuit architectures for automatic speech recognition (ASR) and voice activity detection (VAD) with improved accuracy, programmability, and scalability. Our ASR architecture is designed to minimize off-chip memory bandwidth, which is the main driver of system power consumption. A SIMD processor with 32 parallel execution units efficiently evaluates feed-forward deep neural networks (NNs) for ASR, limiting memory usage with a sparse quantized weight matrix format. We argue that VADs should prioritize accuracy over area and power, and introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-$\mu\text{W}$ power consumption. The 65-nm test chip is shown to perform a variety of ASR tasks in real time, with vocabularies ranging from 11 words to 145 000 words and full-chip power consumption ranging from 172 $\mu\text{W}$ to 7.78 mW.
Citations
Proceedings ArticleDOI
Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems
TL;DR: This paper designs and implements SONIC, an intermittence-aware software system with specialized support for DNN inference, and introduces loop continuation, a new technique that dramatically reduces the cost of guaranteeing correct intermittent execution for loop-heavy code like DNN inference.
Journal ArticleDOI
rVAD: An unsupervised segment-based robust voice activity detection method
TL;DR: A modified version of rVAD is presented where computationally intensive pitch extraction is replaced by computationally efficient spectral flatness calculation, which significantly reduces the computational complexity at the cost of moderately inferior VAD performance, which is an advantage when processing a large amount of data and running on low resource devices.
Proceedings ArticleDOI
LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
TL;DR: A supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM), which significantly outperforms the state-of-the-art methods and achieves a diarization error rate below average.
Posted Content
Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems.
TL;DR: SONIC is an intermittence-aware software system with specialized support for DNN inference. It introduces loop continuation, a new technique that dramatically reduces the cost of guaranteeing correct intermittent execution for loop-heavy code like DNN inference, and automatically compresses networks to optimally balance inference accuracy and energy.
Journal ArticleDOI
SATURN: A Thin and Flexible Self-powered Microphone Leveraging Triboelectric Nanogenerator
Nivedita Arora, Steven L. Zhang, Fereshteh Shahmiri, Diego Osorio, Yi-Cheng Wang, Mohit Gupta, Zhengjun Wang, Thad Starner, Zhong Lin Wang, Gregory D. Abowd
TL;DR: This work demonstrates the design, fabrication, evaluation, and use of a thin, flexible, and easily manufactured self-powered microphone that uses a triboelectric nanogenerator to transform vibrations into an electric signal without an external power source.
References
Proceedings Article
The Design for the Wall Street Journal-based CSR Corpus
Douglas B. Paul, Janet M. Baker
TL;DR: The WSJ CSR Corpus is the first general-purpose English, large vocabulary, natural language, high perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value.
Journal ArticleDOI
JUPITER: a telephone-based conversational interface for weather information
Victor W. Zue, Stephanie Seneff, James Glass, Joseph Polifroni, Christine Pao, Timothy J. Hazen, Lee Hetherington
TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.
Proceedings ArticleDOI
A database for speaker-independent digit recognition
TL;DR: A large speech database has been collected for use in designing and evaluating algorithms for speaker independent recognition of connected digit sequences and formal human listening tests on this database provided certification of the labelling of the digit sequences.
Proceedings ArticleDOI
Characterizing flash memory: anomalies, observations, and applications
Laura M. Grupp, Adrian M. Caulfield, Joel Coburn, Steven Swanson, Eitan Yaakobi, Paul H. Siegel, Jack K. Wolf
TL;DR: This work empirically characterized flash memory technology from five manufacturers by directly measuring the performance, power, and reliability, and demonstrates that performance varies significantly between vendors, devices, and from publicly available datasheets.
Proceedings ArticleDOI
14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks
TL;DR: To achieve state-of-the-art accuracy, CNNs need not only a larger number of layers but also millions of filter weights and varying shapes, which results in substantial data movement that consumes significant energy.