Proceedings Article

Evaluation of Modified Deep Neural Network Architecture Performance for Speech Recognition

TL;DR: Four Deep Neural Network (DNN) architectures are proposed and compared in terms of accuracy and training time; the modified triangular architecture achieves the highest accuracy of the four.
Abstract: Deep Neural Networks (DNNs) have recently been widely used for pattern recognition and classification because of their high accuracy. In this paper, we propose four DNN architectures and compare them in terms of accuracy and training time. The proposed DNN models are evaluated on a speech recognition task using the TIDIGITS corpus, with Mel-Frequency Cepstral Coefficients (MFCC) used to extract the feature vectors of the speech data. The modified triangular architecture gave the highest accuracy, 99.31%, while the triangular architecture gave the shortest training time, 49.72 s. Furthermore, the results of the proposed DNN architecture are compared with an existing Hidden Markov Model (HMM) based speech recognizer, and the proposed DNN improves accuracy by 2.33 percentage points.
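The abstract names MFCC as the feature extractor but does not give its settings here. As a rough illustration, the NumPy sketch below follows a conventional MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log compression, DCT-II). The 25 ms frames, 10 ms hop, 512-point FFT, 26 filters, 13 coefficients and 0.97 pre-emphasis factor are common defaults assumed for the example, not values reported by the authors, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Convert Hz to mels using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr, frame_ms=25, hop_ms=10, n_fft=512, n_filters=26, n_ceps=13):
    """Return MFCCs of shape (num_frames, n_ceps); assumes len(signal) >= one frame."""
    signal = np.asarray(signal, dtype=float)
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    num_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)               # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft   # power spectrum
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T   # mel band energies
    log_energies = np.log(np.maximum(energies, 1e-10))          # avoid log(0)
    return dct_ii(log_energies, n_ceps)


rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(8000), sr=8000)
print(features.shape)  # (98, 13)
```

For a 1-second signal at 8 kHz these settings yield 98 frames of 13 coefficients; a matrix like this (or a flattened version of it) is the kind of input a DNN classifier would consume.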
Citations
Proceedings Article
05 Jun 2019
TL;DR: This paper compares two existing deep neural network architectures with a proposed architecture, the Venturi architecture, in terms of training accuracy, training loss, testing accuracy and testing loss, and reports a significant accuracy improvement.
Abstract: Facial expressions are one of the key features of a human being and can be used to infer a person's emotional state at a particular moment. This paper employs a Convolutional Neural Network and a Deep Neural Network to develop a facial emotion recognition model that classifies a facial expression into seven emotions: Afraid, Angry, Disgusted, Happy, Neutral, Sad and Surprised. This paper compares the performance of two existing deep neural network architectures with our proposed architecture, the Venturi architecture, in terms of training accuracy, training loss, testing accuracy and testing loss. This paper uses the Karolinska Directed Emotional Faces dataset, a set of 4900 pictures of human facial expressions. Two layers of feature maps were used to convolve features from the images, and the result was then passed to a deep neural network with up to 6 hidden layers. The proposed Venturi architecture shows a significant accuracy improvement over the modified triangular and rectangular architectures.

28 citations
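The abstracts above name rectangular, triangular, modified triangular and Venturi architectures without defining them. Reading the names as shapes of the hidden-layer width profile (constant widths; widths shrinking layer by layer; widths narrowing toward the middle and widening again, like a Venturi tube) is an assumption of this sketch, as are the default widths and the linear interpolation. The modified triangular variant is left out because its modification is not described here, and `hidden_widths` and `count_params` are hypothetical helpers.

```python
SHAPES = ("rectangular", "triangular", "venturi")


def hidden_widths(shape, n_layers, max_width=512, min_width=64):
    """Hypothetical hidden-layer width profiles, one integer per hidden layer.

    'rectangular': every layer has max_width units.
    'triangular':  widths shrink linearly from max_width to min_width.
    'venturi':     widths shrink toward the middle layer(s), then grow back.
    """
    if shape not in SHAPES:
        raise ValueError(f"unknown shape: {shape}")
    if n_layers < 1:
        raise ValueError("need at least one hidden layer")
    if shape == "rectangular" or n_layers == 1:
        return [max_width] * n_layers
    if shape == "triangular":
        step = (max_width - min_width) / (n_layers - 1)
        return [round(max_width - i * step) for i in range(n_layers)]
    mid = (n_layers - 1) / 2  # Venturi: distance from the middle sets the width
    return [round(min_width + (max_width - min_width) * abs(i - mid) / mid)
            for i in range(n_layers)]


def count_params(input_dim, widths, n_classes):
    """Weights plus biases of a fully connected network with the given hidden widths."""
    sizes = [input_dim] + list(widths) + [n_classes]
    return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))


for shape in SHAPES:
    widths = hidden_widths(shape, 6)
    print(shape, widths, count_params(39, widths, 7))
```

Comparing `count_params` across profiles of equal depth gives a rough sense of why the shapes can differ in training time: with the same maximum width, the rectangular profile carries the most weights.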

Journal Article
TL;DR: It is proposed that higher recognition rates can be achieved using MFCC features with DTW, and it is claimed that the results of a recognizer based on DTW template matching are more "intuitive" to humans than the results of other recognizers.

16 citations
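Template matching with DTW compares an utterance's feature sequence (for example, MFCC frames) against a stored template for each word while allowing frames to be stretched or compressed in time. A minimal sketch with Euclidean frame distances and the basic step pattern (match, insertion, deletion); it omits the slope constraints and warping windows that practical recognizers often add, and the function names are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """DTW distance between feature sequences a (n, d) and b (m, d); 1-D input means d = 1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)  # cumulative cost, padded with a border row/column
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # step in both
    return float(cost[n, m])


def recognize(query, templates):
    """Return the label whose template has the smallest DTW distance to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


print(recognize([1, 2, 3, 3], {"up": [1, 2, 3], "down": [3, 2, 1]}))  # up
```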

Journal Article
TL;DR: A novel Artificial Intelligence therapy for depression analysis is proposed in this paper, in which machine-learning-based face emotion techniques are used to detect a patient's level of depression; the model can be tested on patients of any age or category who face depression due to any kind of problem or circumstance in life.
Abstract: Depression or stress affects much of the world's population, for multiple reasons and at different stages of life. Because of today's busy lifestyles, people experience stress in their daily lives, which can lead to depression in the long term. Stress arises from educational activities, competitive or challenging tasks, work pressure, family circumstances, managing different kinds of human relationships, health disorders, old age, etc. In this paper, a novel Artificial Intelligence therapy for depression analysis is proposed. This research is intended to help psychologists conduct counselling for their patients. Machine-learning-based face emotion techniques are used to detect the level of depression in a patient. The model can be tested on patients of any age or category who face depression due to any kind of problem or circumstance in life. The open-source fer2013 dataset is used to train the machine learning algorithm. The algorithm was well trained, and experiments were conducted on people of different ages. The results show that the proposed algorithm was able to analyze depression more effectively.
References
Posted Content
TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However, RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

5,310 citations
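CTC training lets the network output, at every frame, a distribution over labels plus an extra "blank" symbol, so no frame-level alignment is needed. The sketch below shows only the simplest decoding step, greedy (best-path) decoding: take the most likely label per frame, merge repeats, drop blanks. It also computes an edit-distance-based label error rate, the kind of metric commonly used for phoneme recognition benchmarks. It is not the CTC loss and not necessarily the decoder used in the paper; using index 0 for the blank is an assumed convention.

```python
import numpy as np


def ctc_greedy_decode(probs, blank=0):
    """Best-path CTC decoding of a (T, C) array of per-frame label scores."""
    best = np.argmax(probs, axis=1)  # most likely label at each frame
    decoded = []
    prev = None
    for label in best:
        # Emit a label only when it changes and is not the blank; a blank between
        # two identical labels therefore keeps them as separate outputs.
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded


def label_error_rate(ref, hyp):
    """Levenshtein distance between label sequences divided by len(ref) (ref non-empty)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(prev_row[j] + 1,              # deletion
                           row[j - 1] + 1,               # insertion
                           prev_row[j - 1] + (r != h)))  # substitution or match
        prev_row = row
    return prev_row[-1] / len(ref)


frames = np.eye(3)[[1, 1, 0, 1, 2, 2, 0]]  # one-hot scores for 7 frames, 3 labels
print(ctc_greedy_decode(frames))           # [1, 1, 2]
```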

Journal Article
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.

4,546 citations
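One of the problems a Markov-model tutorial of this kind covers is evaluating P(O | λ), the probability of an observation sequence given a model; an isolated-word HMM recognizer scores an utterance this way against each word model and picks the best. A minimal sketch of the forward algorithm for a discrete-observation HMM, rescaling at each step so long sequences do not underflow. The names `pi`, `A` and `B` follow common notation and, like the toy model in the example, are assumptions of this sketch.

```python
import numpy as np


def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | model) for a discrete-observation HMM (scaled forward algorithm).

    pi:  (N,) initial state probabilities
    A:   (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B:   (N, M) emissions, B[i, k] = P(symbol k | state i)
    obs: non-empty sequence of symbol indices in range(M)
    """
    pi, A, B = (np.asarray(x, dtype=float) for x in (pi, A, B))
    alpha = pi * B[:, obs[0]]          # initialisation
    scale = alpha.sum()
    log_prob = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction: sum over previous states, then emit o
        scale = alpha.sum()
        log_prob += np.log(scale)      # P(obs) is the product of all scale factors
        alpha = alpha / scale          # rescale so alpha does not underflow
    return float(log_prob)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(round(float(np.exp(forward_log_likelihood(pi, A, B, [0, 1, 2]))), 5))  # 0.03628
```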


"Evaluation of Modified Deep Neural ..." refers to methods in this paper

  • ...These results of the proposed DNN architectures were compared with the Hidden Markov Model results, and it is found that the proposed DNN architecture yields more accurate results, 99.31% compared to the HMM's 96.98%....

  • ...The accuracy achieved with our modified DNN architecture and the accuracy achieved by the existing HMM are shown in figure 6....

  • ...Recently, various algorithms have been used for speech recognition, such as Hidden Markov Models (HMM) [6], Dynamic Time Warping (DTW) [2], Artificial Neural Networks (ANN) [5], Recurrent Neural Networks (RNN) [4] and Deep Neural Networks (DNN) [3]....

  • ...Finally, the output of our proposed DNN model has been compared with the existing output of an HMM model [1] used for the same application....

  • ...A total of 1500 feature vectors of each sample are extracted using the MFCC technique [6]....

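The excerpts above give 99.31% accuracy for the DNN and 96.98% for the HMM, so the 2.33% in the abstract is an absolute gain in percentage points. Because both systems are already accurate, it can help to also express the gain relative to the baseline, or as a reduction in error rate (3.02% down to 0.69%). The helper below, whose name is hypothetical, does that arithmetic on the reported figures.

```python
def compare_accuracy(new_acc, baseline_acc):
    """Compare two accuracies given in percent.

    Returns (absolute gain in percentage points,
             relative accuracy gain in percent,
             relative error-rate reduction in percent).
    """
    abs_gain = new_acc - baseline_acc
    rel_gain = 100.0 * abs_gain / baseline_acc
    new_err, base_err = 100.0 - new_acc, 100.0 - baseline_acc
    err_reduction = 100.0 * (base_err - new_err) / base_err
    return abs_gain, rel_gain, err_reduction


abs_gain, rel_gain, err_red = compare_accuracy(99.31, 96.98)
print(f"{abs_gain:.2f} pp, {rel_gain:.2f}% relative, {err_red:.1f}% fewer errors")
# 2.33 pp, 2.40% relative, 77.2% fewer errors
```

Read this way, the DNN removes roughly three quarters of the HMM's errors, a more informative summary than the 2.33-point gain alone.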

Proceedings Article
04 May 2014
TL;DR: This application requires a keyword spotting system with a small memory footprint, low computational cost, and high precision, and proposes a simple approach based on deep neural networks that achieves 45% relative improvement with respect to a competitive Hidden Markov Model-based system.
Abstract: Our application requires a keyword spotting system with a small memory footprint, low computational cost, and high precision. To meet these requirements, we propose a simple approach based on deep neural networks. A deep neural network is trained to directly predict the keyword(s) or subword units of the keyword(s), followed by a posterior handling method producing a final confidence score. Keyword recognition results achieve a 45% relative improvement with respect to a competitive Hidden Markov Model-based system, while performance in the presence of babble noise shows a 39% relative improvement.

601 citations


"Evaluation of Modified Deep Neural ..." refers to methods in this paper

  • ...Recently, various algorithms have been used for speech recognition, such as Hidden Markov Models (HMM) [6], Dynamic Time Warping (DTW) [2], Artificial Neural Networks (ANN) [5], Recurrent Neural Networks (RNN) [4] and Deep Neural Networks (DNN) [3]....

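The keyword spotting abstract does not spell out its posterior handling step. One plausible scheme, assumed here for illustration: smooth each unit's per-frame posterior with a moving average, then score each frame by the geometric mean, over the keyword's units, of each unit's peak smoothed posterior within a trailing window. Treating column 0 as a filler class, the window lengths `w_smooth` and `w_max`, and both function names are assumptions of this sketch.

```python
import numpy as np


def smooth_posteriors(post, w_smooth=30):
    """Moving average of (T, C) per-frame posteriors over the last w_smooth frames."""
    smoothed = np.empty_like(post, dtype=float)
    for j in range(post.shape[0]):
        start = max(0, j - w_smooth + 1)
        smoothed[j] = post[start:j + 1].mean(axis=0)
    return smoothed


def keyword_confidence(post, w_smooth=30, w_max=100):
    """Per-frame keyword confidence from (T, C) posteriors.

    Column 0 is assumed to be a filler class and columns 1..C-1 the keyword's units.
    The confidence at frame j is the geometric mean over units of each unit's
    maximum smoothed posterior within the last w_max frames.
    """
    smoothed = smooth_posteriors(post, w_smooth)
    n_units = post.shape[1] - 1
    conf = np.empty(post.shape[0])
    for j in range(post.shape[0]):
        start = max(0, j - w_max + 1)
        peaks = smoothed[start:j + 1, 1:].max(axis=0)  # best score per keyword unit
        conf[j] = np.prod(peaks) ** (1.0 / n_units)
    return conf
```

A detection would typically be triggered when the confidence crosses a threshold tuned on held-out data.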

Proceedings Article
Tara N. Sainath, Carolina Parada
06 Sep 2015
TL;DR: This work explores using Convolutional Neural Networks for a small-footprint keyword spotting task and finds that the CNN architectures offer between a 27-44% relative improvement in false reject rate compared to a DNN, while fitting into the constraints of each application.
Abstract: We explore using Convolutional Neural Networks (CNNs) for a small-footprint keyword spotting (KWS) task. CNNs are attractive for KWS since they have been shown to outperform DNNs with far fewer parameters. We consider two different applications in our work, one where we limit the number of multiplications of the KWS system, and another where we limit the number of parameters. We present new CNN architectures to address the constraints of each application. We find that the CNN architectures offer between a 27-44% relative improvement in false reject rate compared to a DNN, while fitting into the constraints of each application.

511 citations


"Evaluation of Modified Deep Neural ..." refers to background in this paper

  • ...Recently, a variant of the DNN architecture, the Convolutional Neural Network (CNN), was explored for speech recognition [9]....

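The CNN keyword spotting abstract sets two budgets, one on multiplications and one on parameters. The sketch below counts both for a fully connected layer and for a 2-D convolution (valid padding, biases included) to show the trade-off: weight sharing keeps a convolution's parameter count small, but the filter is applied at every output position, so its multiplication count can be large. The 32 x 40 input (frames by log-mel filterbank channels) and the layer sizes are illustrative assumptions, not a configuration reported in the abstract.

```python
def dense_cost(n_in, n_out):
    """(parameters, multiplications) of a fully connected layer with biases."""
    return (n_in + 1) * n_out, n_in * n_out


def conv2d_cost(in_h, in_w, in_ch, k_h, k_w, out_ch, stride_h=1, stride_w=1):
    """(parameters, multiplications) of a 2-D convolution with 'valid' padding and biases."""
    out_h = (in_h - k_h) // stride_h + 1
    out_w = (in_w - k_w) // stride_w + 1
    params = (k_h * k_w * in_ch + 1) * out_ch        # weights are shared across positions
    mults = k_h * k_w * in_ch * out_ch * out_h * out_w  # but applied at every position
    return params, mults


# Illustrative input: 32 frames x 40 filterbank channels, one input channel.
print(dense_cost(32 * 40, 128))             # (163968, 163840)
print(conv2d_cost(32, 40, 1, 20, 8, 64))    # (10304, 4392960)
```

With these numbers the convolution needs about 16 times fewer parameters than the dense layer but about 27 times more multiplications, which is why a multiplication budget and a parameter budget can call for different architectures.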

Proceedings Article
01 Nov 2008
TL;DR: The paper shows the memory efficiency offered by using speech detection for separating the words from silence and the improved system performance achieved by using Dynamic Time Warping while keeping in view the overall design process, supported by experimental results.
Abstract: Speech Recognition is a technology enabling human interaction with machines. The design of a speech recognition system capable of 100% accuracy is far from solved. This paper describes an isolated-word, speaker-dependent speech recognition system capable of recognizing spoken words at sufficiently high accuracy. The system has been tested and verified on MATLAB as well as the TMS320 C6713 DSK with an overall accuracy exceeding 90%. The paper shows the memory efficiency offered by using speech detection for separating the words from silence and the improved system performance achieved by using Dynamic Time Warping while keeping in view the overall design process, supported by experimental results. In the future, speech recognition can serve as a means of data interoperability and distribution by allowing a mobile user (client) to retrieve information from the data networks (GPRS, WEB) using a client-server architecture. The satellite system can be used as a wireless medium for accessing the data network.

62 citations
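The abstract mentions speech detection for separating words from silence but not how it is done. A common, simple approach, assumed here rather than taken from the paper, is short-time energy thresholding: split the signal into overlapping frames, compute each frame's energy, and keep the span of frames whose energy exceeds a fraction of the loudest frame. The frame length, hop, threshold ratio and function name are illustrative assumptions.

```python
import numpy as np


def detect_speech(signal, frame_len=256, hop=128, threshold_ratio=0.05):
    """Return (start, end) sample indices of the region whose short-time energy exceeds
    threshold_ratio times the maximum frame energy, or None if nothing qualifies."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return None
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2) for i in range(n_frames)
    ])
    if energies.max() == 0:
        return None  # pure silence
    active = np.flatnonzero(energies > threshold_ratio * energies.max())
    start = active[0] * hop                # first sample of the first active frame
    end = active[-1] * hop + frame_len     # one past the last sample of the last active frame
    return int(start), int(end)


sig = np.zeros(4096)
sig[1024:2048] = 1.0         # a burst of "speech" surrounded by silence
print(detect_speech(sig))    # (896, 2176)
```

The trimmed segment `signal[start:end]` would then go to feature extraction and template matching, which also shortens the sequences DTW has to align.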