Journal Article

A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks

TLDR
The paper argues that VADs should prioritize accuracy over area and power, and introduces a VAD circuit that uses an NN to classify modulation frequency features with 22.3-μW power consumption.
Abstract
This paper describes digital circuit architectures for automatic speech recognition (ASR) and voice activity detection (VAD) with improved accuracy, programmability, and scalability. Our ASR architecture is designed to minimize off-chip memory bandwidth, which is the main driver of system power consumption. A SIMD processor with 32 parallel execution units efficiently evaluates feed-forward deep neural networks (NNs) for ASR, limiting memory usage with a sparse quantized weight matrix format. We argue that VADs should prioritize accuracy over area and power, and introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-μW power consumption. The 65-nm test chip is shown to perform a variety of ASR tasks in real time, with vocabularies ranging from 11 words to 145 000 words and full-chip power consumption ranging from 172 μW to 7.78 mW.
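
The abstract does not spell out the sparse quantized weight matrix format, so the following is only a minimal sketch of the general idea, not the chip's actual layout. It assumes magnitude pruning and uniform signed quantization with a single shared scale; the names `compress` and `matvec`, the threshold, and the bit width are all hypothetical.

```python
from typing import List, Tuple

# One stored entry: (column index, quantized integer weight code).
Row = List[Tuple[int, int]]


def compress(weights: List[List[float]], threshold: float,
             n_bits: int) -> Tuple[List[Row], float]:
    """Drop weights below `threshold` in magnitude and quantize the rest.

    Assumption: uniform signed quantization with one scale for the whole
    matrix. A real design could use per-layer codebooks instead.
    """
    max_abs = max((abs(w) for row in weights for w in row), default=0.0)
    q_max = 2 ** (n_bits - 1) - 1
    scale = max_abs / q_max if max_abs > 0 else 1.0
    rows: List[Row] = []
    for row in weights:
        kept: Row = []
        for col, w in enumerate(row):
            if abs(w) >= threshold:
                # Clamp so rounding error can never exceed the code range.
                q = max(-q_max, min(q_max, round(w / scale)))
                kept.append((col, q))
        rows.append(kept)
    return rows, scale


def matvec(rows: List[Row], scale: float, x: List[float]) -> List[float]:
    """Evaluate y = W x using only the stored nonzero entries."""
    return [scale * sum(q * x[col] for col, q in row) for row in rows]


dense = [[0.6, 0.01, -1.0],
         [0.0, 0.3, 0.02]]
rows, scale = compress(dense, threshold=0.05, n_bits=8)
y = matvec(rows, scale, [1.0, 1.0, 1.0])
```

In this toy case only three of the six weights are stored, each as a small integer plus a column index, which is the kind of saving that reduces memory traffic when weights are fetched from off-chip storage.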


Citations
Proceedings Article

Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems

TL;DR: This paper designs and implements SONIC, an intermittence-aware software system with specialized support for DNN inference, and introduces loop continuation, a new technique that dramatically reduces the cost of guaranteeing correct intermittent execution for loop-heavy code like DNN inference.
Journal Article

rVAD: An unsupervised segment-based robust voice activity detection method

TL;DR: A modified version of rVAD is presented where computationally intensive pitch extraction is replaced by computationally efficient spectral flatness calculation, which significantly reduces the computational complexity at the cost of moderately inferior VAD performance, which is an advantage when processing a large amount of data and running on low resource devices.
Proceedings Article

LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

TL;DR: A supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM), which significantly outperforms the state-of-the-art methods and achieves a diarization error rate below average.
Posted Content

Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems.

TL;DR: SONIC is an intermittence-aware software system with specialized support for DNN inference. It introduces loop continuation, a new technique that dramatically reduces the cost of guaranteeing correct intermittent execution for loop-heavy code like DNN inference, and it automatically compresses networks to optimally balance inference accuracy and energy.
Journal Article

SATURN: A Thin and Flexible Self-powered Microphone Leveraging Triboelectric Nanogenerator

TL;DR: This work demonstrates the design, fabrication, evaluation, and use of a thin, flexible, easily manufactured self-powered microphone that uses a triboelectric nanogenerator to transform vibrations into an electric signal without an external power source.
References
Proceedings Article

The Design for the Wall Street Journal-based CSR Corpus

TL;DR: The WSJ CSR Corpus is the first general-purpose English, large-vocabulary, natural-language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value.
Journal Article

JUPITER: a telephone-based conversational interface for weather information

TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.
Proceedings Article

A database for speaker-independent digit recognition

TL;DR: A large speech database has been collected for use in designing and evaluating algorithms for speaker independent recognition of connected digit sequences and formal human listening tests on this database provided certification of the labelling of the digit sequences.
Proceedings Article

Characterizing flash memory: anomalies, observations, and applications

TL;DR: This work empirically characterized flash memory technology from five manufacturers by directly measuring the performance, power, and reliability, and demonstrates that performance varies significantly between vendors, devices, and from publicly available datasheets.
Proceedings Article

14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks

TL;DR: To achieve state-of-the-art accuracy, CNNs need not only a larger number of layers but also millions of filter weights and varying shapes, which results in substantial data movement that consumes significant energy.