A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks

doi:10.1109/JSSC.2017.2752838

Journal Article

Michael Price, +2 more

- 01 Jan 2018 -

IEEE Journal of Solid-state Circuits

- Vol. 53, Iss: 1, pp 66-75

TLDR

The authors argue that VADs should prioritize accuracy over area and power, and they introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-μW power consumption.

Abstract:

This paper describes digital circuit architectures for automatic speech recognition (ASR) and voice activity detection (VAD) with improved accuracy, programmability, and scalability. Our ASR architecture is designed to minimize off-chip memory bandwidth, which is the main driver of system power consumption. A SIMD processor with 32 parallel execution units efficiently evaluates feed-forward deep neural networks (NNs) for ASR, limiting memory usage with a sparse quantized weight matrix format. We argue that VADs should prioritize accuracy over area and power, and introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-$\mu\text{W}$ power consumption. The 65-nm test chip is shown to perform a variety of ASR tasks in real time, with vocabularies ranging from 11 words to 145 000 words and full-chip power consumption ranging from 172 $\mu\text{W}$ to 7.78 mW.
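The abstract does not spell out the sparse quantized weight matrix format, so the following is only a rough sketch of the general idea: weights are reduced to small integer codes, zero codes are dropped, and a layer's affine step reads only the stored entries. The CSR-style layout, the uniform quantizer, the 4-bit default, and all function names here are assumptions made for illustration, not the chip's actual format.

```python
from typing import List, Tuple


def quantize(w: float, scale: float, bits: int = 4) -> int:
    """Map a weight to a signed integer code (assumed uniform quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    q = round(w / scale)
    return max(-qmax, min(qmax, q))


def to_sparse_quantized(
    dense: List[List[float]], scale: float, bits: int = 4
) -> Tuple[List[int], List[int], List[int]]:
    """Pack a dense matrix into CSR-style arrays, storing only nonzero codes."""
    row_ptr, cols, codes = [0], [], []
    for row in dense:
        for j, w in enumerate(row):
            q = quantize(w, scale, bits)
            if q != 0:  # zero codes take no storage
                cols.append(j)
                codes.append(q)
        row_ptr.append(len(cols))
    return row_ptr, cols, codes


def sparse_affine(
    row_ptr: List[int],
    cols: List[int],
    codes: List[int],
    scale: float,
    bias: List[float],
    x: List[float],
) -> List[float]:
    """Compute W x + b while touching only the stored nonzero weights."""
    out = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += codes[k] * x[cols[k]]
        # The scale is applied once per row, not once per weight.
        out.append(acc * scale + bias[i])
    return out


# Hypothetical 2x3 weight matrix, used only to exercise the functions.
dense = [[0.5, 0.0, -0.25], [0.0, 0.0, 1.0]]
row_ptr, cols, codes = to_sparse_quantized(dense, scale=0.25)
y = sparse_affine(row_ptr, cols, codes, 0.25, [0.0, 0.5], [1.0, 2.0, 3.0])
```

In this toy case only three of the six weights are stored. Fetching fewer and narrower weights is the kind of saving a memory-bandwidth-limited design would be after.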

Citations

Proceedings Article

Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems

Graham Gobieski, +2 more

TL;DR: This paper designs and implements SONIC, an intermittence-aware software system with specialized support for DNN inference, and introduces loop continuation, a new technique that dramatically reduces the cost of guaranteeing correct intermittent execution for loop-heavy code like DNN inference.

Journal Article

rVAD: An unsupervised segment-based robust voice activity detection method

Zheng-Hua Tan, +2 more

- 01 Jan 2020 -

Computer Speech & Language

TL;DR: A modified version of rVAD is presented in which computationally intensive pitch extraction is replaced by computationally efficient spectral flatness calculation. This significantly reduces the computational complexity at the cost of moderately inferior VAD performance, an advantage when processing a large amount of data or running on low-resource devices.

Proceedings Article

LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

Qingjian Lin, +4 more

TL;DR: A supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM), which significantly outperforms the state-of-the-art methods and achieves a diarization error rate below average.

Posted Content

Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems.

Graham Gobieski, +2 more

- 28 Sep 2018 -

arXiv: Distributed, Parallel, and Cluste...

TL;DR: SONIC is an intermittence-aware software system with specialized support for DNN inference, which introduces loop continuation, a new technique that dramatically reduces the cost of guaranteeing correct intermittent execution for loop-heavy code like DNN inference, and automatically compresses networks to optimally balance inference accuracy and energy.

Journal Article

SATURN: A Thin and Flexible Self-powered Microphone Leveraging Triboelectric Nanogenerator

Nivedita Arora, +9 more

TL;DR: The design, fabrication, evaluation, and use of a self-powered microphone that is thin, flexible, and easily manufactured that takes advantage of the triboelectric nanogenerator to transform vibrations into an electric signal without applying an external power source is demonstrated.

References

Proceedings Article

The design for the Wall Street Journal-based CSR corpus.

Douglas B. Paul, +1 more

TL;DR: The WSJ CSR Corpus is the first general-purpose English, large-vocabulary, natural-language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value.

Journal Article

JUPITER: a telephone-based conversational interface for weather information

Victor W. Zue, +6 more

- 01 Jan 2000 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.

Proceedings Article

A database for speaker-independent digit recognition

R. Leonard

TL;DR: A large speech database has been collected for use in designing and evaluating algorithms for speaker-independent recognition of connected digit sequences, and formal human listening tests on this database provided certification of the labelling of the digit sequences.

Proceedings Article

Characterizing flash memory: anomalies, observations, and applications

Laura M. Grupp, +6 more

TL;DR: This work empirically characterized flash memory technology from five manufacturers by directly measuring the performance, power, and reliability, and demonstrates that performance varies significantly between vendors, devices, and from publicly available datasheets.

Proceedings Article

14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks

Yu-Hsin Chen, +3 more

TL;DR: To achieve state-of-the-art accuracy, CNNs with not only a larger number of layers, but also millions of filter weights and varying shapes are needed, which results in substantial data movement that consumes significant energy.
