
Book ChapterDOI

Experimental Evaluation of CNN Architecture for Speech Recognition

01 Jan 2020, pp. 507-514

TL;DR: A method that applies a CNN to audio samples rather than to the images on which CNNs are usually trained; the three-layer convolution architecture was found to have the highest accuracy among the architectures discussed.
Abstract: In recent years, deep learning has been widely used in signal and information processing. Among deep learning algorithms, the Convolutional Neural Network (CNN) has been widely used for image recognition and classification because of its architecture, high accuracy and efficiency. This paper proposes a method that applies the CNN to audio samples rather than to the images on which CNNs are usually trained. The one-dimensional audio samples are converted into two-dimensional data: a matrix of Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the audio samples, with one row per analysis window used in the extraction. The proposed CNN model has been evaluated on the TIDIGITS corpus. The paper analyzes convolution architectures with different numbers of layers and feature maps in each architecture. The three-layer convolution architecture achieved the highest accuracy, 97.46%, among the architectures discussed.
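The chapter itself does not include code, but the MFCC-matrix input it describes can be sketched as a simplified, illustrative NumPy pipeline. The frame length, hop, filter count, and sample rate below are common defaults chosen for illustration, not values taken from the paper:

```python
import numpy as np

def mfcc_matrix(signal, sr=8000, frame_len=0.025, hop_len=0.010,
                n_filters=26, n_mfcc=13, nfft=512):
    """Convert a 1-D audio signal into a 2-D (windows x MFCC) matrix."""
    flen, fhop = int(frame_len * sr), int(hop_len * sr)
    n_frames = 1 + (len(signal) - flen) // fhop
    frames = np.stack([signal[t * fhop : t * fhop + flen] for t in range(n_frames)])
    frames = frames * np.hamming(flen)                     # taper each analysis window
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft  # per-frame power spectrum

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ctr, hi = bins[i - 1], bins[i], bins[i + 1]
        if ctr > lo:
            fbank[i - 1, lo:ctr] = (np.arange(lo, ctr) - lo) / (ctr - lo)
        if hi > ctr:
            fbank[i - 1, ctr:hi] = (hi - np.arange(ctr, hi)) / (hi - ctr)

    log_mel = np.log(power @ fbank.T + 1e-10)  # log filterbank energies
    # DCT-II decorrelates the log energies into cepstral coefficients
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * k + 1) / (2 * n_filters))
    return log_mel @ dct.T                     # shape: (n_windows, n_mfcc)
```

A one-second signal at 8 kHz then yields a 98 x 13 matrix, which a 2-D CNN can consume in the same way it would consume an image.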
Citations

Posted Content
Adnan Siraj Rakin, Zhezhi He, Jingtao Li, Fan Yao, et al. (3 institutions)
TL;DR: This paper proposes the first targeted BFA-based (T-BFA) adversarial weight attack on DNN models, which can intentionally mislead selected inputs to a target output class through a novel class-dependent weight bit ranking algorithm.
Abstract: Traditional Deep Neural Network (DNN) security is mostly related to the well-known adversarial input example attack. Recently, another dimension of adversarial attack, namely, attack on DNN weight parameters, has been shown to be very powerful. As a representative one, the Bit-Flip-based adversarial weight Attack (BFA) injects an extremely small number of faults into weight parameters to hijack the executing DNN function. Prior works on BFA focus on un-targeted attacks that can hijack all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper proposes the first targeted BFA-based (T-BFA) adversarial weight attack on DNNs, which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with classification of a targeted output through a class-dependent weight bit ranking algorithm. Our proposed T-BFA performance is successfully demonstrated on multiple DNN architectures for image classification tasks. For example, by merely flipping 27 out of 88 million weight bits of ResNet-18, our T-BFA can misclassify all the images from the 'Hen' class into the 'Goose' class (i.e., a 100% attack success rate) on the ImageNet dataset, while maintaining 59.35% validation accuracy. Moreover, we successfully demonstrate our T-BFA attack in a real computer prototype system running DNN computation, with an Ivy Bridge-based Intel i7 CPU and 8 GB of DDR3 memory.
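The attack surface described above hinges on the fact that quantized DNN weights are just bits in memory. A minimal, hypothetical NumPy sketch of the fault model follows; it only illustrates what a single flip does to a signed 8-bit weight, not T-BFA's class-dependent bit-ranking algorithm:

```python
import numpy as np

def flip_bit(weights, index, bit):
    """Flip one bit of a signed 8-bit quantized weight (two's complement).

    Illustrative fault model only: T-BFA's contribution is the ranking
    algorithm that chooses WHICH bits to flip, which is not shown here.
    """
    u = weights.view(np.uint8).copy()   # reinterpret the same bytes as unsigned
    u[index] ^= np.uint8(1 << bit)      # XOR toggles exactly one bit
    return u.view(np.int8)

w = np.array([23, -42, 7], dtype=np.int8)
attacked = flip_bit(w, index=0, bit=7)  # flip the sign/MSB of the first weight
```

Flipping the most significant bit turns the weight 23 into -105, a large perturbation from a single bit, which is why so few flips can hijack a network's output.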

11 citations


Cites background from "Experimental Evaluation of CNN Arch..."

  • ...In recent years, deep neural networks (DNNs) have achieved tremendous success in a wide variety of applications, including image classification [1, 2], speech recognition [3, 4] and machine translation [5, 6]....



Book ChapterDOI
Abstract: Over the last two decades, neural networks and fuzzy logic have been successfully implemented in intelligent systems. The fuzzy neural network (FNN) framework combines fuzzy logic and neural network ideas, consolidating the advantages of both. FNNs are applied in many scientific and engineering areas. Wherever there is uncertainty associated with data, fuzzy logic plays a vital role, since fuzzy sets can represent and handle uncertain information effectively. The main objective of an FNN system is to achieve a high level of accuracy by incorporating fuzzy logic into the neural network structure, activation functions, or learning algorithms. In computer vision and intelligent systems, the convolutional neural network is among the most popular architectures, and its performance is excellent in many applications. In this article, fuzzy-based CNN image classification methods are analyzed, and an interval type-2 fuzzy-based CNN is proposed. Experiments show that the proposed method performs well.

1 citation


Journal ArticleDOI
06 Oct 2021 - Energies
Abstract: The role of the Internet of Things (IoT) networks and systems in our daily life cannot be underestimated. IoT is among the fastest evolving innovative technologies that are digitizing and interconnecting many domains. Most life-critical and finance-critical systems are now IoT-based. It is, therefore, paramount that the Quality of Service (QoS) of IoTs is guaranteed. Traditionally, IoTs use heuristic, game theory approaches and optimization techniques for QoS guarantee. However, these methods and approaches have challenges whenever the number of users and devices increases or when multicellular situations are considered. Moreover, IoTs receive and generate huge amounts of data that cannot be effectively handled by the traditional methods for QoS assurance, especially in extracting useful features from this data. Deep Learning (DL) approaches have been suggested as a potential candidate in solving and handling the above-mentioned challenges in order to enhance and guarantee QoS in IoT. In this paper, we provide an extensive review of how DL techniques have been applied to enhance QoS in IoT. From the papers reviewed, we note that QoS in IoT-based systems is breached when the security and privacy of the systems are compromised or when the IoT resources are not properly managed. Therefore, this paper aims at finding out how Deep Learning has been applied to enhance QoS in IoT by preventing security and privacy breaches of the IoT-based systems and ensuring the proper and efficient allocation and management of IoT resources. We identify Deep Learning models and technologies described in state-of-the-art research and review papers and identify those that are most used in handling IoT QoS issues. We provide a detailed explanation of QoS in IoT and an overview of commonly used DL-based algorithms in enhancing QoS. Then, we provide a comprehensive discussion of how various DL techniques have been applied for enhancing QoS. 
We conclude the paper by highlighting the emerging areas of research around Deep Learning and its applicability in IoT QoS enhancement, future trends, and the associated challenges in the application of Deep Learning for QoS in IoT.

Book ChapterDOI
30 Jul 2021

Proceedings ArticleDOI
28 Jun 2021
Abstract: Given the ever-increasing volume of music created and released every day, it has never been more important to study automatic music tagging. In this paper, we present an ensemble-based convolutional neural network (CNN) model trained using various loss functions for tagging musical genres from audio. We investigate the effect of different loss functions and resampling strategies on prediction performance, finding that using focal loss improves overall performance on the MTG-Jamendo dataset: an imbalanced, multi-label dataset with over 18,000 songs in the public domain, containing 57 labels. Additionally, we report results from varying the receptive field of our base classifier, a CNN-based architecture trained on Mel spectrograms, which also yields a performance boost and state-of-the-art performance on the Jamendo dataset. We conclude that the choice of loss function is paramount for improving on existing methods in music tagging, particularly in the presence of class imbalance.
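Focal loss, which the authors find effective under class imbalance, down-weights well-classified examples so training concentrates on hard ones. A minimal multi-label sketch in NumPy; the `gamma` and `alpha` defaults below are the commonly used values, not necessarily the paper's:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for multi-label tagging.

    p: predicted per-tag probabilities; y: 0/1 ground-truth tags.
    The (1 - p_t)**gamma factor shrinks the loss of easy examples,
    focusing training on hard, often minority-class, tags.
    """
    p = np.clip(p, 1e-7, 1.0 - 1e-7)            # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)          # probability of the true label
    a_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

An easy positive (predicted 0.9) contributes far less loss than a hard one (predicted 0.5), which is the mechanism that helps on an imbalanced tag set like MTG-Jamendo's.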

References

Proceedings Article
03 Dec 2012
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
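The "dropout" regularizer mentioned above randomly zeroes activations during training. A minimal sketch of the now-standard inverted-dropout formulation (an illustration, not the paper's original code, which rescaled at test time instead):

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale
    survivors by 1/(1-p), so the expected activation is unchanged and
    inference needs no adjustment."""
    if not train:
        return x                       # inference: identity
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(x.shape) >= p    # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)
```

Because surviving units are scaled up during training, the same forward pass works unmodified at test time, which is why the inverted form became the default in modern frameworks.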

73,871 citations


Journal ArticleDOI
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, et al. (4 institutions)
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
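The "several frames of coefficients as input" step above is typically implemented by splicing a context window of neighbouring frames around each frame before the DNN. A hypothetical NumPy sketch; the window size of +/- 5 frames is an assumption, not a value from the article:

```python
import numpy as np

def splice_frames(feats, context=5):
    """Stack each frame with its +/- `context` neighbours as one DNN input.

    feats: (T, d) acoustic features, e.g. d MFCCs per frame.
    Returns (T, (2*context + 1) * d); edges are padded by repeating
    the first/last frame.
    """
    T, d = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t : t + 2 * context + 1].ravel() for t in range(T)])
```

With 13 coefficients per frame and a context of 5, each input vector has 11 x 13 = 143 dimensions; the network's softmax over HMM states then replaces the GMM likelihoods in the decoder.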

7,700 citations


Proceedings ArticleDOI
Yoon Kim (1 institution)
25 Aug 2014
TL;DR: The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification, and are proposed to allow for the use of both task-specific and static vectors.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

7,176 citations


Proceedings ArticleDOI
26 May 2013
TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

5,938 citations


Posted Content
Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates \emph{deep recurrent neural networks}, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

5,310 citations


Performance Metrics

Citations received by the paper in previous years:

Year | Citations
2021 | 7
2020 | 3