Proceedings ArticleDOI

A text-independent speaker verification model: A comparative analysis

01 Jun 2017
TL;DR: In this paper, the authors explore the various methods available for each block of the speaker recognition process, with the objective of identifying the best techniques for obtaining precise results.
Abstract: The most pressing challenge in the field of voice biometrics is selecting the most efficient technique of speaker recognition. Every individual's voice is distinctive; factors such as physical differences in the vocal organs, accent and pronunciation contribute to the problem's complexity. In this paper, we explore the various methods available for each block of the speaker recognition process, with the objective of identifying the best techniques for obtaining precise results. We study the results on text-independent corpora. We use the MFCC (Mel-frequency cepstral coefficient), LPCC (linear predictive cepstral coefficient) and PLP (perceptual linear prediction) algorithms for feature extraction, PCA (Principal Component Analysis) and t-SNE for dimensionality reduction, and SVM (Support Vector Machine), feed-forward neural network, nearest neighbor and decision tree algorithms for the classification block of the speaker recognition system, and comparatively analyze each block to determine the best technique.
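A minimal sketch of the classification block described above, using a plain nearest-neighbor rule in pure Python. The toy vectors below stand in for real MFCC/LPCC/PLP feature vectors, which would come from a separate feature-extraction step; the speaker labels and values are invented for illustration.

```python
import math

# Toy "feature vectors" standing in for per-utterance MFCC averages
# (real MFCCs would come from a feature-extraction library).
train = [
    ([1.0, 0.2, 0.1], "speaker_a"),
    ([0.9, 0.3, 0.2], "speaker_a"),
    ([0.1, 1.1, 0.9], "speaker_b"),
    ([0.2, 0.9, 1.0], "speaker_b"),
]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbor(x, training_set):
    """Classify x by the label of its closest training vector."""
    return min(training_set, key=lambda pair: euclidean(x, pair[0]))[1]

print(nearest_neighbor([0.95, 0.25, 0.15], train))  # speaker_a
print(nearest_neighbor([0.15, 1.0, 0.95], train))   # speaker_b
```

Swapping in SVM, decision tree, or feed-forward classifiers at the same point in the pipeline is what the comparison in the paper amounts to.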
Citations
Proceedings ArticleDOI
01 Jan 2019
TL;DR: This work used discriminant correlation analysis (DCA) to fuse features from face and voice and the K-nearest neighbors (KNN) algorithm to classify them, showing that fusion increased recognition accuracy by 52.45% compared to using face alone and by 81.62% compared to using voice alone.
Abstract: Biometric authentication is a promising approach to securing the Internet of Things (IoT). Although existing research shows that using multiple biometrics for authentication helps increase recognition accuracy, the majority of biometric approaches for IoT today continue to rely on a single modality. We propose a multimodal biometric approach for IoT based on face and voice modalities that is designed to scale to the limited resources of an IoT device. Our work builds on the foundation of Gofman et al. [7] in implementing face and voice feature-level fusion on mobile devices. We used discriminant correlation analysis (DCA) to fuse features from face and voice and used the K-nearest neighbors (KNN) algorithm to classify the features. The approach was implemented on the Raspberry Pi IoT device and was evaluated on a dataset of face images and voice files acquired using a Samsung Galaxy S5 device in real-world conditions such as dark rooms and noisy settings. The results show that fusion increased recognition accuracy by 52.45% compared to using face alone and 81.62% compared to using voice alone. It took an average of 1.34 seconds to enroll a user and 0.91 seconds to perform the authentication. To further optimize execution speed and reduce power consumption, we implemented classification on a field-programmable gate array (FPGA) chip that can be easily integrated into an IoT device. Experimental results showed that the proposed FPGA-accelerated KNN could achieve 150x faster execution time and 12x lower energy consumption compared to a CPU.
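The fusion-plus-KNN pipeline can be sketched as follows. Note that plain concatenation stands in for DCA here (DCA additionally transforms the two feature sets to maximize their pairwise correlation before fusing them), and the enrolled feature values and user names are invented for illustration.

```python
from collections import Counter
import math

def fuse(face_feat, voice_feat):
    """Feature-level fusion by concatenation (a simple stand-in for DCA)."""
    return face_feat + voice_feat

def knn(x, training_set, k=3):
    """K-nearest-neighbors majority vote over Euclidean distance."""
    dist = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    neighbors = sorted(training_set, key=lambda pair: dist(x, pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical enrolled users, each with (face, voice) feature pairs.
train = [
    (fuse([0.9, 0.1], [0.8, 0.2]), "alice"),
    (fuse([1.0, 0.2], [0.7, 0.3]), "alice"),
    (fuse([0.8, 0.0], [0.9, 0.1]), "alice"),
    (fuse([0.1, 0.9], [0.2, 0.8]), "bob"),
    (fuse([0.2, 1.0], [0.3, 0.7]), "bob"),
    (fuse([0.0, 0.8], [0.1, 0.9]), "bob"),
]
probe = fuse([0.85, 0.15], [0.75, 0.25])
print(knn(probe, train))  # alice
```

KNN's simplicity is also what makes it a good candidate for the FPGA acceleration described in the abstract: the distance computations are independent and parallelize naturally.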

23 citations

Journal ArticleDOI
TL;DR: This paper presents an overview of some Machine Learning approaches for biometric pattern recognition; the approaches presented are suited to identifying single or several individuals.
Abstract: Biometrics, as a computer science field, can be understood as the discipline that studies how to generate computer models of the physical (e.g. hand geometry, fingerprints, iris and so on) and behavioral (e.g. signature, a kind of behavior pattern) characteristics of human beings for identifying single or several individuals. Usually, these characteristics are used to provide authentication information for security systems. However, some of these characteristics are hard to obtain properly, and several algorithms are needed both to process them and to use them in security systems. In this sense, this paper presents an overview of some Machine Learning approaches for biometric pattern recognition.

17 citations

Journal ArticleDOI
TL;DR: The results indicated that t-distributed stochastic neighbor embedding (t-SNE), successfully employed in several studies, also performed well in the analysis of the indris' repertoire and may open new perspectives towards shared methodological techniques for comparing animal vocal repertoires.
Abstract: Although there is a growing number of studies focusing on acoustic communication, the lack of shared analytic approaches leads to inconsistency among them. Here, we introduce a computational method used to examine 3360 calls recorded from wild indris (Indri indri) between 2005 and 2018. We split each sound into ten portions of equal length and from each portion extracted spectral coefficients, considering frequency values up to 15,000 Hz. We submitted the set of acoustic features first to a t-distributed stochastic neighbor embedding algorithm, then to a hard-clustering procedure using a k-means algorithm. The t-distributed stochastic neighbor embedding (t-SNE) mapping indicated the presence of eight different groups, consistent with the acoustic structure of the a priori identification of calls, while the cluster analysis revealed that an overlap between distinct call types might exist. Our results indicated that t-SNE, successfully employed in several studies, also showed good performance in the analysis of the indris' repertoire and may open new perspectives towards the achievement of shared methodological techniques for the comparison of animal vocal repertoires.
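The hard-clustering step can be sketched with a plain k-means implementation; the 2-D toy points below stand in for a t-SNE embedding of the spectral features (the real study clusters eight call-type groups rather than two).

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate assigning points to the nearest center
    and recomputing each center as its cluster's mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        centers = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centers[i]
            for i, pts in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated toy "call type" groups in the embedding plane.
group_a = [(0.0 + 0.1 * i, 0.0) for i in range(5)]
group_b = [(5.0 + 0.1 * i, 5.0) for i in range(5)]
centers, clusters = kmeans(group_a + group_b, k=2)
print(sorted(len(c) for c in clusters))  # [5, 5]
```

Running k-means on the low-dimensional embedding rather than the raw spectral coefficients is exactly the two-stage design the abstract describes.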

12 citations

Journal ArticleDOI
27 Jan 2020 - PeerJ
TL;DR: The experimental results obtained using Mel-frequency cepstral coefficients (MFCCs) and hidden Markov models (HMMs) support the finding that a reduction in training data by a factor of 10 does not significantly affect the recognition performance.
Abstract: Automated acoustic recognition of birds is considered an important technology in support of biodiversity monitoring and biodiversity conservation activities. These activities require processing large amounts of soundscape recordings. Typically, recordings are transformed to a number of acoustic features, and a machine learning method is used to build models and recognize the sound events of interest. The main problem is the scalability of data processing, either for developing models or for processing recordings made over long time periods. In those cases, the processing time and resources required might become prohibitive for the average user. To address this problem, we evaluated the applicability of three data reduction methods. These methods were applied to a series of acoustic feature vectors as an additional postprocessing step, which aims to reduce the computational demand during training. The experimental results obtained using Mel-frequency cepstral coefficients (MFCCs) and hidden Markov models (HMMs) support the finding that a reduction in training data by a factor of 10 does not significantly affect the recognition performance.
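Two simple data-reduction schemes of the kind evaluated can be sketched as follows (the specific methods in the paper may differ; these are illustrative): keeping every tenth feature vector, and averaging consecutive blocks of ten.

```python
def decimate(frames, factor=10):
    """Keep every `factor`-th feature vector."""
    return frames[::factor]

def block_average(frames, factor=10):
    """Average consecutive blocks of `factor` frames into one vector."""
    out = []
    for start in range(0, len(frames) - factor + 1, factor):
        block = frames[start:start + factor]
        out.append([sum(col) / factor for col in zip(*block)])
    return out

# 100 toy 3-dimensional "MFCC frames".
frames = [[float(t), float(t) * 0.5, 1.0] for t in range(100)]
print(len(decimate(frames)))       # 10
print(len(block_average(frames)))  # 10
print(block_average(frames)[0])    # [4.5, 2.25, 1.0]
```

Either scheme cuts the training data handed to the HMMs by a factor of 10, which is the reduction the abstract reports as having no significant effect on recognition performance.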

5 citations

Proceedings ArticleDOI
10 Sep 2020
TL;DR: An intelligent speech recognition system is designed and presented here in detail as part of this work, achieving a novel smart home system design framework with an intelligent instruction-based operation mechanism.
Abstract: The design of a smart home using Internet of Things (IoT) and Machine Learning technology is presented in this paper. The design is primarily based on the LoRaWAN protocol, and the main objective of this work was to establish an IoT network built on the integration of sensors, a gateway, a network server and a data visualization system. More importantly, an intelligent speech recognition system is designed and presented here in detail as part of this work to achieve a novel smart home system design framework with an intelligent instruction-based operation mechanism. Under low-noise conditions, the speaker recognition success rate is above 90% on the THCHS-30 dataset.

5 citations

References
Journal ArticleDOI
Hynek Hermansky1
TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
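Two of the PLP stages can be sketched directly: the intensity-loudness power law, and the all-pole fit via the Levinson-Durbin recursion (a standard way to solve for autoregressive coefficients from autocorrelations; the autocorrelation values below are synthetic, not from real speech).

```python
def cube_root_compress(power_spectrum):
    """Intensity-loudness power law: perceived loudness ~ intensity**0.33."""
    return [p ** 0.33 for p in power_spectrum]

def levinson_durbin(r, order):
    """Fit an all-pole (autoregressive) model from autocorrelation values r.
    Returns coefficients of A(z) = 1 + a1*z^-1 + ... and the residual error.
    The paper uses a 5th-order model at this stage."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k
    return a, err

# Autocorrelation of a simple AR(1) process with coefficient 0.5:
# the fit should recover a1 = -0.5 and leave higher coefficients near zero.
r = [0.5 ** lag for lag in range(6)]
a, err = levinson_durbin(r, order=5)
print(a[:3])
```

The remaining PLP stages (critical-band integration and equal-loudness weighting) are frequency-domain warps and weights applied before these steps; they are omitted here for brevity.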

2,969 citations

Book
02 Apr 2013
TL;DR: This book covers the general principles and ideas of designing biometric-based systems and their underlying tradeoffs, and explores some of the numerous privacy and security implications of biometrics.
Abstract: Biometrics: Personal Identification in Networked Society is a comprehensive and accessible source of state-of-the-art information on all existing and emerging biometrics: the science of automatically identifying individuals based on their physiological or behavioral characteristics. In particular, the book covers:
* General principles and ideas of designing biometric-based systems and their underlying tradeoffs
* Identification of important issues in the evaluation of biometrics-based systems
* Integration of biometric cues, and the integration of biometrics with other existing technologies
* Assessment of the capabilities and limitations of different biometrics
* The comprehensive examination of biometric methods in commercial use and in research development
* Exploration of some of the numerous privacy and security implications of biometrics
Also included are chapters on face and eye identification, speaker recognition, networking, and other timely technology-related issues. All chapters are written by leading internationally recognized experts from academia and industry. Biometrics: Personal Identification in Networked Society is an invaluable work for scientists, engineers, application developers, systems integrators, and others working in biometrics.

1,845 citations

Posted Content
TL;DR: This paper demonstrates the viability of MFCC for feature extraction and DTW for comparing test patterns, and explains why alignment is important for better performance.
Abstract: Digital processing of the speech signal and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information; direct analysis and synthesis of the complex voice signal is difficult due to the amount of information it contains. Therefore, digital signal processes such as feature extraction and feature matching are introduced to represent the voice signal. Several methods such as Linear Predictive Coding (LPC), Hidden Markov Models (HMM) and Artificial Neural Networks (ANN) have been evaluated with a view to identifying a straightforward and effective method for the voice signal. The extraction and matching processes are implemented right after the pre-processing (filtering) of the signal is performed. Mel-Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modelling the human auditory perception system, are utilized as the extraction technique. The non-linear sequence alignment known as Dynamic Time Warping (DTW), introduced by Sakoe and Chiba, is used as the feature-matching technique. Since the voice signal tends to have a different temporal rate across utterances, alignment is important to produce better performance. This paper presents the viability of MFCC to extract features and DTW to compare the test patterns.
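The DTW alignment the paper relies on can be sketched in a few lines; `dist` defaults to absolute difference here, where a real system would compare per-frame MFCC vectors instead.

```python
def dtw_distance(s, t, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping: minimal cumulative distance over all
    monotonic alignments of two sequences, possibly of different lengths."""
    INF = float("inf")
    n, m = len(s), len(t)
    # D[i][j] = best cost of aligning s[:i] with t[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(s[i - 1], t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A sequence and a time-stretched copy of it align with zero cost,
# which is why DTW handles differing speaking rates.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))           # 2.0
```

This is why the abstract stresses alignment: utterances of the same word at different speaking rates have equal DTW cost where a rigid frame-by-frame comparison would not even be defined.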

846 citations

Posted Content
TL;DR: Some of the most widely used methods for reducing the information in each segment of the audio signal to a relatively small number of parameters, or features, are presented.
Abstract: The time-domain waveform of a speech signal carries all of the auditory information. From the phonological point of view, however, little can be said on the basis of the waveform itself. Past research in mathematics, acoustics, and speech technology has provided many methods for converting the data into information that can be interpreted correctly. In order to find statistically relevant information in incoming data, it is important to have mechanisms for reducing the information in each segment of the audio signal to a relatively small number of parameters, or features. These features should describe each segment in such a characteristic way that other similar segments can be grouped together by comparing their features. There are numerous ways to describe the speech signal in terms of parameters; though they all have their strengths and weaknesses, we present some of the most widely used methods and their importance.
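As an illustration of reducing each segment to a few parameters, here is a sketch computing two of the simplest per-frame features, short-time energy and zero-crossing rate (the frame length and the toy signal are arbitrary choices, not values from the survey).

```python
def frame_features(signal, frame_len=160):
    """Reduce each frame of samples to two summary parameters:
    short-time energy and zero-crossing rate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs that change sign.
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (frame_len - 1)
        feats.append((energy, zcr))
    return feats

# A toy alternating-sign "signal": maximal zero-crossing rate, unit energy.
signal = [(-1.0) ** n for n in range(320)]
feats = frame_features(signal)
print(len(feats))  # 2
print(feats[0])    # (1.0, 1.0)
```

Richer features such as MFCCs or LPC coefficients follow the same pattern: each frame of raw samples collapses to a short vector that similar frames share.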

103 citations