Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Journal ArticleDOI
TL;DR: A seamless neutral/whisper mismatched closed-set speaker recognition system is developed within a Mel-frequency cepstral coefficient-Gaussian mixture model (MFCC-GMM) framework, and an alternative feature extraction algorithm based on linear and exponential frequency scales is applied to improve performance on unvoiced consonants.
Abstract: Whisper is an alternative speech production mode used by subjects in natural conversation to protect privacy. Due to the profound differences between whisper and neutral speech in both excitation and vocal tract function, the performance of speaker identification systems trained with neutral speech degrades significantly. In this paper, a seamless neutral/whisper mismatched closed-set speaker recognition system is developed. First, the performance characteristics of a neutral-trained closed-set speaker ID system based on a Mel-frequency cepstral coefficient-Gaussian mixture model (MFCC-GMM) framework are considered. It is observed that for whisper speaker recognition, performance degradation is concentrated in only a subset of speakers. Next, it is shown that the performance loss for speaker identification in neutral/whisper mismatched conditions is focused on phonemes other than low-energy unvoiced consonants. In order to increase system performance for unvoiced consonants, an alternative feature extraction algorithm based on linear and exponential frequency scales is applied. The acoustic properties of misrecognized and correctly recognized whisper are analyzed in order to develop more effective processing schemes. A two-dimensional feature space is proposed in order to predict on which whispered utterances the system will perform poorly, with evaluations conducted to measure the quality of whispered speech. Finally, a system for seamless neutral/whisper speaker identification is proposed, resulting in an absolute improvement of 8.85%-10.30% for speaker recognition, with the best closed-set speaker ID performance of 88.35% obtained for a total of 961 read whisper test utterances, and 83.84% using a total of 495 spontaneous whisper test utterances.

76 citations
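The baseline in the paper above is a standard MFCC-GMM closed-set speaker identification framework. The following is a minimal sketch of that general framework, assuming librosa and scikit-learn for feature extraction and modelling; the directory layout, parameter values, and function names are illustrative and are not taken from the paper, which adds whisper-specific feature extraction and scoring on top of such a baseline.

```python
# Minimal MFCC-GMM closed-set speaker identification sketch.
# Assumes one directory of WAV files per enrolled speaker; paths are illustrative.
import glob
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=16000, n_mfcc=20):
    """Load audio and return a (frames x n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # one row per analysis frame

def train_speaker_models(speaker_dirs, n_components=32):
    """Fit one GMM per speaker on pooled MFCC frames from that speaker's files."""
    models = {}
    for speaker, wav_dir in speaker_dirs.items():
        frames = np.vstack([mfcc_features(p) for p in glob.glob(f"{wav_dir}/*.wav")])
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              max_iter=200, random_state=0)
        models[speaker] = gmm.fit(frames)
    return models

def identify(models, test_wav):
    """Closed-set decision: the speaker whose GMM gives the highest average log-likelihood."""
    frames = mfcc_features(test_wav)
    scores = {spk: gmm.score(frames) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```

In the neutral/whisper mismatch studied by the paper, the weakness of this plain baseline is precisely what motivates the alternative frequency-scale features and the two-dimensional quality prediction described in the abstract.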

Proceedings ArticleDOI
11 May 2010
TL;DR: This work is based on the Hidden Markov Model (HMM), which provides a highly reliable way of recognizing speech; two modules were developed, namely isolated-word speech recognition and continuous speech recognition.
Abstract: This paper aims to design and implement an English digit speech recognition system using Matlab (GUI). This work is based on the Hidden Markov Model (HMM), which provides a highly reliable way of recognizing speech. The system recognizes the speech waveform by translating it into a set of feature vectors using the Mel Frequency Cepstral Coefficients (MFCC) technique. This paper focuses on all English digits from zero through nine, based on an isolated-word structure. Two modules were developed, namely isolated-word speech recognition and continuous speech recognition. Both modules were tested in clean and noisy environments and showed successful recognition rates. In a clean environment, the isolated-word module achieved 99.5% in multi-speaker mode and 79.5% in speaker-independent mode, while the continuous module achieved 72.5% in multi-speaker mode and 56.25% in speaker-independent mode. In a noisy environment, the isolated-word module achieved 88% in multi-speaker mode and 67% in speaker-independent mode, and the continuous module achieved 82.5% in multi-speaker mode and 76.67% in speaker-independent mode. These recognition rates are relatively successful compared to similar systems.

76 citations
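The system above combines an MFCC front end with HMM word models, implemented by the authors as a Matlab GUI. The sketch below shows the same isolated-word idea in Python, assuming the hmmlearn library and one Gaussian HMM per digit; the data layout, state count, and training settings are assumptions, not the authors' implementation.

```python
# One-HMM-per-digit isolated-word recognition sketch (MFCC front end, Gaussian HMMs).
# training_data maps a digit label to a list of MFCC matrices (frames x coeffs);
# how those matrices are produced (e.g. with librosa) is left to the caller.
import numpy as np
from hmmlearn import hmm

def train_digit_hmms(training_data, n_states=5):
    """Fit one GaussianHMM per digit on its concatenated training examples."""
    models = {}
    for digit, examples in training_data.items():
        X = np.vstack(examples)
        lengths = [len(e) for e in examples]   # frame count of each utterance
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                n_iter=50, random_state=0)
        models[digit] = model.fit(X, lengths)
    return models

def recognize(models, mfcc_frames):
    """Score the utterance under every digit model; return the best-scoring digit."""
    scores = {digit: m.score(mfcc_frames) for digit, m in models.items()}
    return max(scores, key=scores.get)
```

Continuous recognition, the second module in the paper, would additionally require chaining word models with a decoder rather than scoring whole utterances against isolated-word HMMs.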

Proceedings ArticleDOI
14 Apr 1991
TL;DR: Experimental results on a 40-speaker database indicate that the modified neural approach significantly outperforms both a standard multilayer perceptron and a vector quantization based system.
Abstract: A speaker recognition system, using a modified form of feedforward neural network based on radial basis functions (RBFs), is presented. Each person to be recognized has his/her own neural model which is trained to recognise spectral feature vectors representative of his/her speech. Experimental results on a 40-speaker database indicate that the modified neural approach significantly outperforms both a standard multilayer perceptron and a vector quantization based system. The best performance for 4-digit test utterances is obtained from an RBF network with 384 RBF nodes in the hidden layer, giving an 8% true talker rejection rate for a fixed 1% impostor acceptance rate. Additional advantages include a substantial reduction in training time over an MLP approach, and the ability to readily interpret the resulting model.

76 citations
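The paper above trains one RBF network per target speaker on spectral feature vectors and thresholds its output to accept or reject a claimed identity. The sketch below shows one common way to build such a network (k-means centres, Gaussian basis functions, a linear output layer fitted by ridge regression) using scikit-learn; the centre count, basis width, and training protocol are assumptions and do not reproduce the authors' 384-node configuration or their database.

```python
# Sketch of a radial-basis-function (RBF) network verifier for one target speaker.
# Centres come from k-means over training frames; the output layer is linear (ridge).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

class RBFVerifier:
    def __init__(self, n_centers=64, gamma=0.05):
        self.n_centers = n_centers
        self.gamma = gamma          # width of the Gaussian basis functions

    def _design(self, X):
        # Gaussian activations: exp(-gamma * ||x - c||^2) for every centre c
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        """X: spectral feature vectors; y: 1 for target-speaker frames, 0 for impostor frames."""
        self.centers_ = KMeans(n_clusters=self.n_centers, n_init=10,
                               random_state=0).fit(X).cluster_centers_
        self.out_ = Ridge(alpha=1e-3).fit(self._design(X), y)
        return self

    def score(self, X):
        """Mean network output over the utterance; threshold it to accept or reject."""
        return float(self.out_.predict(self._design(X)).mean())
```

Because only the linear output layer is fitted after the centres are chosen, training is fast relative to backpropagating a full MLP, which matches the training-time advantage the authors report.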

Journal ArticleDOI
TL;DR: The anatomical and physiological bases for individual differences in the human voice are reviewed, before discussing how recent methodological progress in voice morphing and voice synthesis has promoted research on current theoretical issues, such as how voices are mentally represented in the human brain.
Abstract: While humans use their voice mainly for communicating information about the world, paralinguistic cues in the voice signal convey rich dynamic information about a speaker's arousal and emotional state, and extralinguistic cues reflect more stable speaker characteristics including identity, biological sex and social gender, socioeconomic or regional background, and age. Here we review the anatomical and physiological bases for individual differences in the human voice, before discussing how recent methodological progress in voice morphing and voice synthesis has promoted research on current theoretical issues, such as how voices are mentally represented in the human brain. Special attention is dedicated to the distinction between the recognition of familiar and unfamiliar speakers, in everyday situations or in the forensic context, and to the processes and representational changes that accompany the learning of new voices. We describe how specific impairments and individual differences in voice perception could relate to specific brain correlates. Finally, we consider that voices are produced by speakers who are often visible during communication, and review recent evidence that shows how speaker perception involves dynamic face-voice integration. The representation of para- and extralinguistic vocal information plays a major role in person perception and social communication, could be neuronally encoded in a prototype-referenced manner, and is subject to flexible adaptive recalibration as a result of specific perceptual experience. WIREs Cogn Sci 2014, 5:15-25. doi: 10.1002/wcs.1261

76 citations

Proceedings ArticleDOI
05 Apr 2003
TL;DR: Unvoiced speech recognition, "Mime Speech Recognition", is proposed; it is based not on voice signals but on electromyography (EMG), and will realize unvoiced communication, a new communication style.
Abstract: We propose unvoiced speech recognition, "Mime Speech Recognition". It recognizes speech by observing the muscles associated with speech. It is not based on voice signals but on electromyography (EMG). It will realize unvoiced communication, which is a new communication style. Because voice signals are not used, it can be applied in noisy environments; it also supports people without vocal cords and aphasics. In preliminary experiments, we try to recognize the 5 Japanese vowels. EMG signals from the 3 muscles that contribute greatly to the utterance of Japanese vowels are input to a neural network. The recognition accuracy is over 90% for the three subjects tested.

76 citations
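The recognizer above maps EMG activity from three articulation-related muscles to the five Japanese vowels with a neural network. The sketch below is a minimal stand-in for such a pipeline, assuming scikit-learn's MLPClassifier and simple per-channel RMS features; the feature choice, window handling, and network size are illustrative assumptions, not the authors' design.

```python
# Tiny neural-network sketch for 5-class vowel recognition from 3-channel EMG.
# Features here are per-channel RMS values over a window; the paper's actual
# preprocessing and network topology are not reproduced.
import numpy as np
from sklearn.neural_network import MLPClassifier

VOWELS = ["a", "i", "u", "e", "o"]

def rms_features(emg_window):
    """emg_window: (n_samples, 3) raw EMG; return one RMS value per channel."""
    return np.sqrt(np.mean(np.square(emg_window), axis=0))

def train_vowel_classifier(windows, labels):
    """windows: list of (n_samples, 3) arrays; labels: vowel strings from VOWELS."""
    X = np.vstack([rms_features(w) for w in windows])
    y = np.array(labels)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    return clf.fit(X, y)

def predict_vowel(clf, emg_window):
    """Classify a single EMG window as one of the five vowels."""
    return clf.predict(rms_features(emg_window).reshape(1, -1))[0]
```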


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Recurrent neural network: 29.2K papers, 890K citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Signal processing: 73.4K papers, 983.5K citations, 81% related
Decoding methods: 65.7K papers, 900K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years:
Year: Papers
2023: 165
2022: 468
2021: 283
2020: 475
2019: 484
2018: 420