Proceedings Article

Speaker Identification Using a Hybrid CNN-MFCC Approach

TLDR
A novel architecture combining a convolutional neural network (CNN) with mel-frequency cepstral coefficients (MFCCs) is proposed to identify speakers in a noisy environment; its effectiveness is evaluated using accuracy and precision.
Abstract
In this paper, a novel architecture is proposed that uses a convolutional neural network (CNN) and mel-frequency cepstral coefficients (MFCCs) to identify the speaker in a noisy environment. The architecture is used in a text-independent setting. The most important requirement in any text-independent speaker identification system is the ability to learn features that are useful for classification. We use a hybrid feature extraction technique in which features extracted by a CNN are combined with MFCCs into a single feature set. For classification, we use a deep neural network, which shows very promising results in classifying speakers. We built our own dataset containing 60 speakers, with 4 voice samples per speaker. Our best hybrid model achieved an accuracy of 87.5%. To verify the effectiveness of this hybrid architecture, we use metrics such as accuracy and precision.
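
The abstract does not specify how the two feature sets are combined or how precision is averaged, so the following is only a minimal sketch under stated assumptions: the CNN output and the MFCC vector are each standardized and then concatenated, and precision is macro-averaged over speakers. The function names (`standardize`, `fuse_features`, `accuracy`, `macro_precision`) and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize(values):
    """Scale one feature vector to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for constant vectors
    return [(v - mu) / sigma for v in values]


def fuse_features(cnn_embedding, mfcc_vector):
    """Assumed fusion rule: standardize each part, then concatenate."""
    return standardize(cnn_embedding) + standardize(mfcc_vector)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted speaker matches the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-speaker precision.

    A speaker that is never predicted contributes a precision of 0.0.
    """
    true_pos = defaultdict(int)
    predicted = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_pos[p] += 1
    labels = set(y_true) | set(y_pred)
    return mean(
        true_pos[s] / predicted[s] if predicted[s] else 0.0 for s in labels
    )


# Toy labels for three hypothetical speakers (not the paper's data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
fused = fuse_features([1.0, 3.0], [2.0, 2.0, 2.0])
```

The fused vector would then be passed to the classifier, which the abstract describes as a deep neural network. Standardizing each part before concatenation is one common way to keep either feature set from dominating by scale, but the paper may use a different scheme.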


Citations
Proceedings Article

Speech Emotion Recognition Using Quaternion Convolutional Neural Networks

TL;DR: This paper proposes a quaternion convolutional neural network (QCNN) based speech emotion recognition (SER) model in which Mel-spectrogram features of speech signals are encoded in an RGB quaternion domain.
Journal Article

Speaker identification based on Radon transform and CNNs in the presence of different types of interference for Robotic Applications

TL;DR: A new approach to improve the accuracy of speaker identification in the presence of interference for robot control applications with a convolutional neural network (CNN) that achieves a high classification accuracy up to 97.5%, which is more than double the performance reported for some traditional methods that are used for speaker identification.
Journal Article

Enhanced Indonesian Ethnic Speaker Recognition using Data Augmentation Deep Neural Network

TL;DR: Based on the performance of this model, it can be concluded that a Data Augmentation Deep Neural Network can improve speaker recognition performance on the Indonesian ethnic dataset.
Journal Article

An optimum end-to-end text-independent speaker identification system using convolutional neural network

TL;DR: In this article, the authors propose a new CNN for text-independent speaker identification, inspired by the VGG-13 architecture, with fewer parameters but acceptable accuracy; using a short segment of each audio sample reduces the time complexity and memory cost of network training.
Journal Article

A strong hybrid AdaBoost classification algorithm for speaker recognition

TL;DR: In this article, a hybrid adaptive boosting (AdaBoost) combined with a powerful ML classifier (Random Forest) is proposed to handle multi-class imbalanced speaker data classification.
References
Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

TL;DR: Restricted Boltzmann machines were developed using binary stochastic hidden units that learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
Journal Article

Front-End Factor Analysis for Speaker Verification

TL;DR: An extension of previous work that proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using simple factor analysis and named the total variability space because it models both speaker and channel variabilities.
Journal Article

Convolutional neural networks for speech recognition

TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Proceedings Article

A vector quantization approach to speaker recognition

TL;DR: A vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker and was used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule.
Proceedings Article

Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis

TL;DR: A novel way of extracting features from short texts, based on the activation values of an inner layer of a deep convolutional neural network, is presented and a parallelizable decision-level data fusion method is presented, which is much faster, though slightly less accurate.