Proceedings ArticleDOI
Model Compression Applied to Small-Footprint Keyword Spotting.
George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Geng-Shen Fu, Shiv Naga Prasad Vitaladevuni +5 more
pp. 1878-1882
TL;DR: Two ways to improve deep neural network acoustic models for keyword spotting without increasing CPU usage are investigated: low-rank weight matrices throughout the DNN, and knowledge distilled from an ensemble of much larger DNNs used only during training.
Abstract:
Several consumer speech devices feature voice interfaces that perform on-device keyword spotting to initiate user interactions. Accurate on-device keyword spotting within a tight CPU budget is crucial for such devices. Motivated by this, we investigated two ways to improve deep neural network (DNN) acoustic models for keyword spotting without increasing CPU usage. First, we used low-rank weight matrices throughout the DNN. This allowed us to increase representational power by increasing the number of hidden nodes per layer without changing the total number of multiplications. Second, we used knowledge distilled from an ensemble of much larger DNNs used only during training. We systematically evaluated these two approaches on a massive corpus of far-field utterances. Alone, both techniques improve performance; together, they combine to give significant reductions in false alarms and misses without increasing CPU or memory usage.
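To make the first idea concrete: a dense layer with an m x n weight matrix costs m*n multiplications per frame, while factoring it through a rank-r bottleneck costs r*(m+n), so the layer can be widened without adding multiplications. A minimal NumPy sketch with hypothetical sizes (not the paper's actual configuration):

    import numpy as np

    # Full-rank layer: 512 -> 512 costs 512 * 512 = 262,144 multiplications.
    m = n = 512
    full_mults = m * n

    # Widen to 800 hidden nodes but factor the weights through a rank-128
    # bottleneck: 800*128 + 128*800 = 204,800 multiplications, i.e. more
    # representational width for fewer multiplications than the full layer.
    m2 = n2 = 800
    r = 128
    low_rank_mults = m2 * r + r * n2

    x = np.random.randn(n2)
    V = np.random.randn(r, n2) * 0.01   # bottleneck projection
    U = np.random.randn(m2, r) * 0.01   # expansion back out
    y = U @ (V @ x)                     # two thin matmuls replace one wide one
    print(full_mults, low_rank_mults)   # 262144 204800

The second idea, training against an ensemble teacher, is sketched under the "Distilling the Knowledge in a Neural Network" reference below.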
Citations
Proceedings ArticleDOI
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks.
TL;DR: A factored form of TDNNs (TDNN-F) is introduced which is structurally the same as a TDNN whose layers have been compressed via SVD, but is trained from a random start with one of the two factors of each matrix constrained to be semi-orthogonal.
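A hedged sketch of the constraint: penalize f(M) = ||M M^T - I||_F^2 on the thin factor M and periodically take a gradient step (grad f = 4 (M M^T - I) M). The paper's actual update adds scaling refinements, so treat this as a sketch of the idea only:

    import numpy as np

    def semi_orthogonal_step(M, nu=0.125):
        # One gradient step on f(M) = ||M M^T - I||_F^2, nudging the
        # rows of M (rows <= cols) toward an orthonormal set.
        Q = M @ M.T - np.eye(M.shape[0])
        return M - 4.0 * nu * (Q @ M)

    M = np.random.randn(128, 512) * 0.03     # thin factor of a TDNN-F layer
    for _ in range(100):                     # applied every few minibatches
        M = semi_orthogonal_step(M)
    print(np.abs(M @ M.T - np.eye(128)).max())   # near 0: rows ~ orthonormal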
Posted Content
Hello Edge: Keyword Spotting on Microcontrollers
TL;DR: It is shown that it is possible to optimize these neural network architectures to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy, and the depthwise separable convolutional neural network (DS-CNN) is explored and compared against other neural network architectures.
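For reference, the depthwise separable block replaces one full k x k convolution over all channel pairs with a per-channel spatial filter plus a 1x1 channel mix. A minimal PyTorch sketch (channel counts are illustrative, not the paper's):

    import torch.nn as nn

    def ds_conv_block(c_in, c_out, k=3):
        # Depthwise: one k x k spatial filter per input channel.
        # Pointwise: a 1x1 convolution that mixes channels.
        # Roughly k*k*c_in + c_in*c_out MACs per pixel vs k*k*c_in*c_out.
        return nn.Sequential(
            nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in),
            nn.BatchNorm2d(c_in), nn.ReLU(),
            nn.Conv2d(c_in, c_out, kernel_size=1),
            nn.BatchNorm2d(c_out), nn.ReLU(),
        )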
PatentDOI
Convolutional recurrent neural networks for small-footprint keyword spotting
Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates +7 more
TL;DR: Systems and methods for small-footprint keyword spotting (KWS) using convolutional recurrent neural networks (CRNNs) are described; a CRNN model embodiment demonstrated high accuracy and robust performance across a wide range of environments.
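A hedged PyTorch sketch of the general convolutional-recurrent shape of such a KWS model; every size below is hypothetical, and the patent's exact topology differs:

    import torch
    import torch.nn as nn

    class TinyCRNN(nn.Module):
        # Convolution for local time-frequency patterns, a recurrent layer
        # for longer context, and a dense layer for the keyword posterior.
        def __init__(self, n_mels=40, n_classes=2):
            super().__init__()
            self.conv = nn.Conv2d(1, 32, kernel_size=(20, 5), stride=(8, 2))
            self.gru = nn.GRU(input_size=32 * 3, hidden_size=64,
                              batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * 64, n_classes)

        def forward(self, x):                  # x: (batch, 1, n_mels, time)
            h = torch.relu(self.conv(x))       # (batch, 32, 3, time')
            b, c, f, t = h.shape
            h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
            out, _ = self.gru(h)               # (batch, time', 128)
            return self.fc(out[:, -1])         # score from the last frame

    print(TinyCRNN()(torch.randn(2, 1, 40, 100)).shape)  # torch.Size([2, 2])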
Proceedings ArticleDOI
Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting
Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Bjorn Hoffmeister, Shiv Naga Prasad Vitaladevuni +6 more
TL;DR: It is shown that the combination of three techniques (LVCSR initialization, multi-task training, and weighted cross-entropy) gives the best results, with a significantly lower False Alarm Rate than the LVCSR-initialization technique alone, across a wide range of Miss Rates.
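The weighted cross-entropy part can be sketched as scaling each frame's loss by a target-dependent weight, for instance up-weighting keyword frames; the weighting scheme and values below are illustrative, not the paper's:

    import torch
    import torch.nn.functional as F

    def weighted_ce(logits, targets, keyword_ids, kw_weight=2.0):
        # Per-frame cross-entropy, scaled so errors on frames whose target
        # belongs to the keyword cost more than background frames.
        ce = F.cross_entropy(logits, targets, reduction="none")
        w = torch.ones_like(ce)
        w[torch.isin(targets, keyword_ids)] = kw_weight
        return (w * ce).mean()

    logits = torch.randn(8, 50)            # 8 frames, 50 senone targets
    targets = torch.randint(0, 50, (8,))
    loss = weighted_ce(logits, targets, keyword_ids=torch.tensor([3, 4, 5]))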
Proceedings ArticleDOI
Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting.
Ming Sun, David Snyder, Yixin Gao, Varun K. Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Naga Prasad Vitaladevuni +8 more
TL;DR: This paper proposes to apply singular value decomposition (SVD) to further reduce TDNN complexity, and results show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network baseline.
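The SVD step amounts to replacing a trained weight matrix with the product of two thin factors and then fine-tuning the network; a minimal NumPy sketch (sizes illustrative):

    import numpy as np

    def svd_compress(W, rank):
        # Best rank-`rank` approximation of W (m x n): keep the top
        # singular vectors, leaving m*rank + rank*n weights.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :rank] * s[:rank]          # m x rank
        B = Vt[:rank, :]                    # rank x n
        return A, B                         # then fine-tune to recover accuracy

    W = np.random.randn(512, 512)
    A, B = svd_compress(W, rank=64)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(A.shape, B.shape, round(err, 3))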
References
Posted Content
Distilling the Knowledge in a Neural Network
TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
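A minimal PyTorch sketch of that distillation objective: soften both networks' logits at a temperature T and blend a KL term on the soft targets with the ordinary hard-label loss (alpha and T are tuning knobs; the values here are illustrative):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T=2.0, alpha=0.5):
        # KL between temperature-softened distributions; the T*T factor
        # keeps soft-target gradients comparable across temperatures.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

For an ensemble teacher, teacher_logits would be the combined (for example, averaged) outputs of the ensemble members.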
Posted Content
ADADELTA: An Adaptive Learning Rate Method
TL;DR: A novel per-dimension learning rate method for gradient descent called ADADELTA that dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent is presented.
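The per-dimension update from the paper, in a short NumPy sketch: decayed accumulators of squared gradients and squared updates, with the ratio of their RMS values acting as an automatic step size:

    import numpy as np

    def adadelta_step(param, grad, state, rho=0.95, eps=1e-6):
        # Decayed average of squared gradients.
        state["g2"] = rho * state["g2"] + (1 - rho) * grad ** 2
        # Per-dimension step = RMS of past updates / RMS of gradients,
        # so no global learning rate is required.
        delta = -np.sqrt(state["dx2"] + eps) / np.sqrt(state["g2"] + eps) * grad
        state["dx2"] = rho * state["dx2"] + (1 - rho) * delta ** 2
        return param + delta

    param = np.zeros(3)
    state = {"g2": np.zeros(3), "dx2": np.zeros(3)}
    param = adadelta_step(param, np.array([0.1, -0.2, 0.3]), state)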
Proceedings Article
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
TL;DR: In this paper, the authors exploit the redundancy present within the convolutional filters to derive approximations that significantly reduce the required computation, while keeping the accuracy within 1% of the original model.
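One way to read the cross-filter redundancy idea: a bank of filters can be expressed as mixtures of a small shared basis, so only the basis filters need to be convolved with the input. A hedged NumPy sketch (the paper's approximations are more elaborate):

    import numpy as np

    N, k, r = 64, 5, 8                       # illustrative sizes
    filters = np.random.randn(N, k * k)      # 64 flattened 5x5 filters
    U, s, Vt = np.linalg.svd(filters, full_matrices=False)
    coeffs = U[:, :r] * s[:r]                # N x r mixing weights
    basis = Vt[:r]                           # r shared 5x5 basis filters
    approx = coeffs @ basis                  # low-rank reconstruction
    print(np.linalg.norm(filters - approx) / np.linalg.norm(filters))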
Proceedings ArticleDOI
Speeding up Convolutional Neural Networks with Low Rank Expansions
TL;DR: Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain.
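The rank-1 spatial idea in isolation: approximate a k x k filter as an outer product, so the 2-D convolution becomes a vertical pass followed by a horizontal pass, dropping k*k multiplications per pixel to 2*k. A small NumPy sketch:

    import numpy as np

    def rank1_factor(filt):
        # Best separable approximation via SVD: the top singular pair
        # gives a column (k x 1) filter and a row (1 x k) filter.
        U, s, Vt = np.linalg.svd(filt)
        col = (U[:, 0] * np.sqrt(s[0]))[:, None]
        row = (Vt[0, :] * np.sqrt(s[0]))[None, :]
        return col, row

    sobel = np.outer([1.0, 2.0, 1.0], [-1.0, 0.0, 1.0])  # exactly rank-1
    col, row = rank1_factor(sobel)
    print(np.allclose(col @ row, sobel))                 # True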
Posted Content
Do Deep Nets Really Need to be Deep?
Lei Jimmy Ba, Rich Caruana +1 more
TL;DR: It is shown that shallow feed-forward networks can learn the complex functions previously learned by deep networks and reach accuracies previously achievable only with deep models; in some cases the shallow nets learn these functions with a total number of parameters similar to the original deep model.
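Their mimic training differs from the temperature-based distillation above in that the student regresses directly on the teacher's logits with an L2 loss; a hedged PyTorch sketch with hypothetical model sizes:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # A wide single-hidden-layer student trained to match a deep
    # teacher's pre-softmax logits on (typically unlabeled) data.
    student = nn.Sequential(nn.Linear(440, 8000), nn.ReLU(),
                            nn.Linear(8000, 2000))
    opt = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

    def mimic_step(x, teacher_logits):
        opt.zero_grad()
        loss = F.mse_loss(student(x), teacher_logits)
        loss.backward()
        opt.step()
        return loss.item()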
Related Papers (5)
Convolutional Neural Networks for Small-Footprint Keyword Spotting
Tara N. Sainath, Carolina Parada +1 more