Deep Maxout Networks Applied to Noise-Robust Speech Recognition
Citations
A review on machine learning principles for multi-view biological data integration
Attacks and defenses in user authentication systems: A survey
New Artificial Intelligence approaches for future UAV Ground Control Stations
Deep residual networks for pre-classification based Indian language identification
An Analysis of Deep Neural Networks in Broad Phonetic Classes for Noisy Speech Recognition
References
Bagging predictors
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
Improving neural networks by preventing co-adaptation of feature detectors
The Kaldi Speech Recognition Toolkit
Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
Frequently Asked Questions (14)
Q2. What have the authors stated for future works in "Deep maxout networks applied to noise-robust speech recognition" ?
Further lines of research include testing the DMN on more complete datasets.
Q3. What is the important problem to overcome in DNN training?
To obtain state emission likelihoods $p(o_t \mid s)$, the Bayes rule is used as follows:

$$p(o_t \mid s) = \frac{p(s \mid o_t)\, p(o_t)}{p(s)} \qquad (3)$$

where $p(s \mid o_t)$ is the posterior probability estimated by the DNN, $p(o_t)$ is a scaling factor that is constant for each observation and can be ignored, and $p(s)$ is the class prior, which can be estimated by counting the occurrences of each state in the training data.
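As a concrete illustration, here is a minimal NumPy sketch of this conversion (the function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def estimate_priors(state_counts):
    """Class priors p(s), estimated by counting how often each state
    occurs in the training alignments."""
    return state_counts / state_counts.sum()

def pseudo_log_likelihoods(posteriors, priors, eps=1e-10):
    """Scaled log-likelihoods log p(o_t|s) = log p(s|o_t) - log p(s);
    the per-frame factor p(o_t) is constant across states and dropped."""
    return np.log(posteriors + eps) - np.log(priors + eps)
```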
Q4. What is the effect of dropout on the DMN?
Note that DMNs considerably reduce the number of parameters compared to DNNs, as the weight matrix $W^{(l)}$ of each layer in the DMN is $1/g$ of the size of its equivalent DNN weight matrix.
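As a rough back-of-envelope check, here is a sketch using the layer sizes reported in the experiments (biases and the input/output layers are ignored, so this is illustrative only):

```python
# Parameter count for one hidden-to-hidden layer, using the experimental
# sizes: 1024-node DNN layers vs. 400 maxout units with group size g = 3.
n_dnn = 1024                 # DNN layer width
k, g = 400, 3                # DMN maxout units per layer and group size

dnn_params = n_dnn * n_dnn   # 1024 x 1024 weight matrix: ~1.05M weights
dmn_params = k * (k * g)     # 400 inputs -> 400*3 pre-activations: 480k weights

print(dnn_params, dmn_params)  # the DMN layer uses under half the weights
```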
Q5. What is the main reason why the authors have used DMNs in noisy environments?
The authors hypothesize that DMNs can improve the recognition rates in noisy conditions given that they are capable of modeling the speech variability from limited data more effectively [14].
Q6. What are the main reasons why DNNs are still far away from humans?
Machine performance in Automatic Speech Recognition (ASR) tasks is still far away from that of humans, and noisy conditions only compound the problem.
Q7. What is the way to train a DNN in noisy conditions?
Training a DNN using the well-known error back-propagation (BP) algorithm with a random initialization of its weight matrices may not provide good performance, as it may become stuck in a local minimum.
Q8. What were the input features of the DNNs?
In all of the cases, the input features were 12th-order MFCCs plus a log-energy coefficient, and their corresponding first- and second-order derivatives, yielding a 39-component feature vector.
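For reference, here is a sketch of how such a 39-dimensional feature vector could be assembled with librosa; this is one possible front end, not necessarily the exact configuration used by the authors, and the filename is hypothetical:

```python
import numpy as np
import librosa

# 13 static coefficients (librosa's 0th MFCC stands in for the log-energy
# term) plus first- and second-order derivatives -> 39 components per frame.
y, sr = librosa.load("utterance.wav", sr=16000)    # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
d1 = librosa.feature.delta(mfcc)                   # first-order derivatives
d2 = librosa.feature.delta(mfcc, order=2)          # second-order derivatives
feats = np.vstack([mfcc, d1, d2])                  # shape: (39, num_frames)
```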
Q9. How many layers are used for the epoch time?
The authors computed the average epoch time over all the iterations for 5-hidden-layer networks, with 1024 nodes per layer for the DNNs, and 400 maxout units per layer with group size g = 3 for the DMN.
Q10. What are the advantages of DNN-HMM hybrid systems?
DNN-HMM hybrid systems combine several features that make them superior to previous Artificial Neural Network (ANN)-HMM hybrid systems [11]: a) DNNs have a larger number of hidden layers, leading to systems with many more parameters than the latter.
Q11. What is the definition of a dropout DNN?
A dropout DNN can be seen as an ensemble of DNNs: on each presentation of a training example a different sub-model is trained, and the sub-models' predictions are averaged together.
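A minimal NumPy sketch of this view (illustrative only; real implementations fold the mask into the training framework): at training time each example sees a random sub-network, and at test time the full network with scaled activations approximates averaging the sub-models' predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop, train=True):
    """Train: zero each hidden unit with probability p_drop, so every
    example is processed by a different randomly sampled sub-model.
    Test: keep all units and scale by (1 - p_drop), which approximates
    averaging the predictions of the exponentially many sub-models."""
    if train:
        mask = rng.random(h.shape) >= p_drop
        return h * mask
    return h * (1.0 - p_drop)
```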
Q12. How many noises were tested on the development set?
The hidden dropout factor (HDF) and group size were validated on the development set, as can be seen in Figure 2, considering 5-hidden-layer networks; this yielded an optimal dropout factor of 0.1 for dropout DNNs, 0.2 for DMNs, and a group size of g = 3.
Q13. What is the output of the hidden node i of the layer l?
The output of hidden node i of layer l + 1 can be computed as follows:

$$h_i^{(l+1)} = \max_{j \in \{1,\dots,g\}} z_{ij}^{(l+1)}, \qquad 1 \le l \le L \qquad (7)$$

where $z_{ij}^{(l+1)}$ are the linear pre-activation values from layer l:

$$z^{(l+1)} = W^{(l)} h^{(l)} + b^{(l)} \qquad (8)$$

As can be observed, the max-pooling operation is applied over the $z^{(l+1)}$ vector.
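A short NumPy sketch of equations (7)-(8); the contiguous layout of pre-activations into groups is an assumption of this sketch, not stated in the paper:

```python
import numpy as np

def maxout_layer(h, W, b, g):
    """Maxout forward pass: z = W h + b produces k*g pre-activations,
    then each output h_i is the max over its group of g values."""
    z = W @ h + b                        # shape: (k * g,)
    return z.reshape(-1, g).max(axis=1)  # shape: (k,), h_i = max_j z_ij

# Example with the sizes used in the experiments: 400 units, g = 3
rng = np.random.default_rng(0)
W = rng.standard_normal((400 * 3, 400)) * 0.01
b = np.zeros(400 * 3)
h = rng.standard_normal(400)
out = maxout_layer(h, W, b, g=3)         # 400 outputs
```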
Q14. What is the main difference between a dropout and a normal DMN?
Another interpretation of the behaviour of dropout is that, during training, it adds random noise to the training set, resulting in a network that is very robust to variabilities in the inputs (in this particular case, due to the addition of noise).