Proceedings ArticleDOI
Model Compression Applied to Small-Footprint Keyword Spotting.
George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Geng-Shen Fu, Shiv Naga Prasad Vitaladevuni +5 more
pp. 1878-1882
TL;DR: Two ways to improve deep neural network acoustic models for keyword spotting without increasing CPU usage are investigated: low-rank weight matrices throughout the DNN, and knowledge distilled from an ensemble of much larger DNNs used only during training.
Abstract:
Several consumer speech devices feature voice interfaces that perform on-device keyword spotting to initiate user interactions. Accurate on-device keyword spotting within a tight CPU budget is crucial for such devices. Motivated by this, we investigated two ways to improve deep neural network (DNN) acoustic models for keyword spotting without increasing CPU usage. First, we used low-rank weight matrices throughout the DNN. This allowed us to increase representational power by increasing the number of hidden nodes per layer without changing the total number of multiplications. Second, we used knowledge distilled from an ensemble of much larger DNNs used only during training. We systematically evaluated these two approaches on a massive corpus of far-field utterances. Alone, both techniques improve performance; together, they combine to give significant reductions in false alarms and misses without increasing CPU or memory usage.
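To make the first idea concrete: a dense layer with an m x n weight matrix costs m*n multiplications per frame, while factoring it through a rank-r bottleneck costs r*(m+n), so the layer can be widened without adding multiplications. A minimal NumPy sketch with hypothetical sizes (not the paper's actual configuration):

    import numpy as np

    # Full-rank layer: 512 -> 512 costs 512 * 512 = 262,144 multiplications.
    m = n = 512
    full_mults = m * n

    # Widen to 800 hidden nodes but factor the weights through a rank-128
    # bottleneck: 800*128 + 128*800 = 204,800 multiplications, i.e. more
    # representational width for fewer multiplications than the full layer.
    m2 = n2 = 800
    r = 128
    low_rank_mults = m2 * r + r * n2

    x = np.random.randn(n2)
    V = np.random.randn(r, n2) * 0.01   # bottleneck projection
    U = np.random.randn(m2, r) * 0.01   # expansion back out
    y = U @ (V @ x)                     # two thin matmuls replace one wide one
    print(full_mults, low_rank_mults)   # 262144 204800

The second idea, training against an ensemble teacher, is sketched under the "Distilling the Knowledge in a Neural Network" reference below.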
Citations
Proceedings ArticleDOI
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks.
TL;DR: A factored form of TDNNs (TDNN-F) is introduced which is structurally the same as a TDNN whose layers have been compressed via SVD, but is trained from a random start with one of the two factors of each matrix constrained to be semi-orthogonal.
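A hedged sketch of the constraint: penalize f(M) = ||M M^T - I||_F^2 on the thin factor M and periodically take a gradient step (grad f = 4 (M M^T - I) M). The paper's actual update adds scaling refinements, so treat this as a sketch of the idea only:

    import numpy as np

    def semi_orthogonal_step(M, nu=0.125):
        # One gradient step on f(M) = ||M M^T - I||_F^2, nudging the
        # rows of M (rows <= cols) toward an orthonormal set.
        Q = M @ M.T - np.eye(M.shape[0])
        return M - 4.0 * nu * (Q @ M)

    M = np.random.randn(128, 512) * 0.03     # thin factor of a TDNN-F layer
    for _ in range(100):                     # applied every few minibatches
        M = semi_orthogonal_step(M)
    print(np.abs(M @ M.T - np.eye(128)).max())   # near 0: rows ~ orthonormal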
Posted Content
Hello Edge: Keyword Spotting on Microcontrollers
TL;DR: It is shown that it is possible to optimize these neural network architectures to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy, and the depthwise separable convolutional neural network (DS-CNN) is explored and compared against other neural network architectures.
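For reference, the depthwise separable block replaces one full k x k convolution over all channel pairs with a per-channel spatial filter plus a 1x1 channel mix. A minimal PyTorch sketch (channel counts are illustrative, not the paper's):

    import torch.nn as nn

    def ds_conv_block(c_in, c_out, k=3):
        # Depthwise: one k x k spatial filter per input channel.
        # Pointwise: a 1x1 convolution that mixes channels.
        # Roughly k*k*c_in + c_in*c_out MACs per pixel vs k*k*c_in*c_out.
        return nn.Sequential(
            nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in),
            nn.BatchNorm2d(c_in), nn.ReLU(),
            nn.Conv2d(c_in, c_out, kernel_size=1),
            nn.BatchNorm2d(c_out), nn.ReLU(),
        )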
PatentDOI
Convolutional recurrent neural networks for small-footprint keyword spotting
Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates +7 more
TL;DR: Systems and methods for small-footprint keyword spotting (KWS) using convolutional recurrent neural networks (CRNNs) are described; a CRNN model embodiment demonstrated high accuracy and robust performance across a wide range of environments.
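A hedged PyTorch sketch of the general convolutional-recurrent shape of such a KWS model; every size below is hypothetical, and the patent's exact topology differs:

    import torch
    import torch.nn as nn

    class TinyCRNN(nn.Module):
        # Convolution for local time-frequency patterns, a recurrent layer
        # for longer context, and a dense layer for the keyword posterior.
        def __init__(self, n_mels=40, n_classes=2):
            super().__init__()
            self.conv = nn.Conv2d(1, 32, kernel_size=(20, 5), stride=(8, 2))
            self.gru = nn.GRU(input_size=32 * 3, hidden_size=64,
                              batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * 64, n_classes)

        def forward(self, x):                  # x: (batch, 1, n_mels, time)
            h = torch.relu(self.conv(x))       # (batch, 32, 3, time')
            b, c, f, t = h.shape
            h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
            out, _ = self.gru(h)               # (batch, time', 128)
            return self.fc(out[:, -1])         # score from the last frame

    print(TinyCRNN()(torch.randn(2, 1, 40, 100)).shape)  # torch.Size([2, 2])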
Proceedings ArticleDOI
Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting
Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Bjorn Hoffmeister, Shiv Naga Prasad Vitaladevuni +6 more
TL;DR: It is shown that the combination of three techniques (LVCSR initialization, multi-task training, and weighted cross-entropy) gives the best results, with a significantly lower False Alarm Rate than the LVCSR-initialization technique alone, across a wide range of Miss Rates.
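The weighted cross-entropy part can be sketched as scaling each frame's loss by a target-dependent weight, for instance up-weighting keyword frames; the weighting scheme and values below are illustrative, not the paper's:

    import torch
    import torch.nn.functional as F

    def weighted_ce(logits, targets, keyword_ids, kw_weight=2.0):
        # Per-frame cross-entropy, scaled so errors on frames whose target
        # belongs to the keyword cost more than background frames.
        ce = F.cross_entropy(logits, targets, reduction="none")
        w = torch.ones_like(ce)
        w[torch.isin(targets, keyword_ids)] = kw_weight
        return (w * ce).mean()

    logits = torch.randn(8, 50)            # 8 frames, 50 senone targets
    targets = torch.randint(0, 50, (8,))
    loss = weighted_ce(logits, targets, keyword_ids=torch.tensor([3, 4, 5]))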
Proceedings ArticleDOI
Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting.
Ming Sun, David Snyder, Yixin Gao, Varun K. Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Naga Prasad Vitaladevuni +8 more
TL;DR: This paper proposes to apply singular value decomposition (SVD) to further reduce TDNN complexity, and results show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network baseline.
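The SVD step amounts to replacing a trained weight matrix with the product of two thin factors and then fine-tuning the network; a minimal NumPy sketch (sizes illustrative):

    import numpy as np

    def svd_compress(W, rank):
        # Best rank-`rank` approximation of W (m x n): keep the top
        # singular vectors, leaving m*rank + rank*n weights.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :rank] * s[:rank]          # m x rank
        B = Vt[:rank, :]                    # rank x n
        return A, B                         # then fine-tune to recover accuracy

    W = np.random.randn(512, 512)
    A, B = svd_compress(W, rank=64)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(A.shape, B.shape, round(err, 3))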
References
Posted Content
Distilling the Knowledge in a Neural Network
TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
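A minimal PyTorch sketch of that distillation objective: soften both networks' logits at a temperature T and blend a KL term on the soft targets with the ordinary hard-label loss (alpha and T are tuning knobs; the values here are illustrative):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T=2.0, alpha=0.5):
        # KL between temperature-softened distributions; the T*T factor
        # keeps soft-target gradients comparable across temperatures.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

For an ensemble teacher, teacher_logits would be the combined (for example, averaged) outputs of the ensemble members.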
Posted Content
ADADELTA: An Adaptive Learning Rate Method
TL;DR: A novel per-dimension learning rate method for gradient descent called ADADELTA that dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent is presented.
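The per-dimension update from the paper, in a short NumPy sketch: decayed accumulators of squared gradients and squared updates, with the ratio of their RMS values acting as an automatic step size:

    import numpy as np

    def adadelta_step(param, grad, state, rho=0.95, eps=1e-6):
        # Decayed average of squared gradients.
        state["g2"] = rho * state["g2"] + (1 - rho) * grad ** 2
        # Per-dimension step = RMS of past updates / RMS of gradients,
        # so no global learning rate is required.
        delta = -np.sqrt(state["dx2"] + eps) / np.sqrt(state["g2"] + eps) * grad
        state["dx2"] = rho * state["dx2"] + (1 - rho) * delta ** 2
        return param + delta

    param = np.zeros(3)
    state = {"g2": np.zeros(3), "dx2": np.zeros(3)}
    param = adadelta_step(param, np.array([0.1, -0.2, 0.3]), state)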
Proceedings Article
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
TL;DR: In this paper, the authors exploit the redundancy present within the convolutional filters to derive approximations that significantly reduce the required computation, while keeping the accuracy within 1% of the original model.
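One way to read the cross-filter redundancy idea: a bank of filters can be expressed as mixtures of a small shared basis, so only the basis filters need to be convolved with the input. A hedged NumPy sketch (the paper's approximations are more elaborate):

    import numpy as np

    N, k, r = 64, 5, 8                       # illustrative sizes
    filters = np.random.randn(N, k * k)      # 64 flattened 5x5 filters
    U, s, Vt = np.linalg.svd(filters, full_matrices=False)
    coeffs = U[:, :r] * s[:r]                # N x r mixing weights
    basis = Vt[:r]                           # r shared 5x5 basis filters
    approx = coeffs @ basis                  # low-rank reconstruction
    print(np.linalg.norm(filters - approx) / np.linalg.norm(filters))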
Proceedings ArticleDOI
Speeding up Convolutional Neural Networks with Low Rank Expansions
TL;DR: Two simple schemes for drastically speeding up convolutional neural networks are presented, achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain.
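The rank-1 spatial idea in isolation: approximate a k x k filter as an outer product, so the 2-D convolution becomes a vertical pass followed by a horizontal pass, dropping k*k multiplications per pixel to 2*k. A small NumPy sketch:

    import numpy as np

    def rank1_factor(filt):
        # Best separable approximation via SVD: the top singular pair
        # gives a column (k x 1) filter and a row (1 x k) filter.
        U, s, Vt = np.linalg.svd(filt)
        col = (U[:, 0] * np.sqrt(s[0]))[:, None]
        row = (Vt[0, :] * np.sqrt(s[0]))[None, :]
        return col, row

    sobel = np.outer([1.0, 2.0, 1.0], [-1.0, 0.0, 1.0])  # exactly rank-1
    col, row = rank1_factor(sobel)
    print(np.allclose(col @ row, sobel))                 # True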
Posted Content
Do Deep Nets Really Need to be Deep?
Lei Jimmy Ba, Rich Caruana +1 more
TL;DR: It is shown that shallow feed-forward networks can learn the complex functions previously learned by deep networks and reach accuracies previously achievable only with deep models; in some cases the shallow nets learn these functions with a total number of parameters similar to the original deep model.
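Their mimic training differs from the temperature-based distillation above in that the student regresses directly on the teacher's logits with an L2 loss; a hedged PyTorch sketch with hypothetical model sizes:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # A wide single-hidden-layer student trained to match a deep
    # teacher's pre-softmax logits on (typically unlabeled) data.
    student = nn.Sequential(nn.Linear(440, 8000), nn.ReLU(),
                            nn.Linear(8000, 2000))
    opt = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

    def mimic_step(x, teacher_logits):
        opt.zero_grad()
        loss = F.mse_loss(student(x), teacher_logits)
        loss.backward()
        opt.step()
        return loss.item()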
Related Papers (5)
Convolutional Neural Networks for Small-Footprint Keyword Spotting
Tara N. Sainath, Carolina Parada +1 more