Topic
Speaker recognition
About: Speaker recognition is a research topic. Over its lifetime, 14990 publications have been published within this topic, receiving 310061 citations.
Papers published on a yearly basis
Papers
•
31 Mar 1998
TL;DR: In this article, speech recognition is performed on a received speech sample to produce recognition results, and those results are evaluated against stored training data and the speech elements to which portions of that data are related.
Abstract: A speech sample is evaluated using a computer. Training data that include samples of speech are received and stored along with identification of speech elements to which portions of the training data are related. A speech sample is received and speech recognition is performed on the speech sample to produce recognition results. Finally, the recognition results are evaluated in view of the training data and the identification of the speech elements to which the portions of the training data are related. The technique may be used to perform tasks such as speech recognition, speaker identification, and language identification.
118 citations
•
28 Apr 2004
TL;DR: In this article, a speech feature vector for a voice associated with a source of a text message was determined and compared to speaker models, and a speaker model was selected as a preferred match for the voice based on the comparison.
Abstract: A method of generating speech from text messages includes determining a speech feature vector for a voice associated with a source of a text message, and comparing the speech feature vector to speaker models. The method also includes selecting one of the speaker models as a preferred match for the voice based on the comparison, and generating speech from the text message based on the selected speaker model.
118 citations
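The model-selection step this entry describes can be sketched as a nearest-model search. Below is a minimal sketch, assuming cosine similarity over fixed-length feature vectors; the speaker names and vectors are hypothetical and stand in for whatever embedding the real system computes:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

# Hypothetical speaker models: one summary feature vector per enrolled voice.
speaker_models = {
    "alice": (0.9, 0.1, 0.2),
    "bob":   (0.1, 0.8, 0.3),
}

def best_match(feature_vector):
    """Select the speaker model most similar to the incoming voice."""
    return max(speaker_models, key=lambda s: cosine(feature_vector, speaker_models[s]))

print(best_match((0.85, 0.15, 0.25)))  # -> alice
```

The selected model would then drive the text-to-speech voice, so the synthesized reply resembles the original sender's voice.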
•
26 Apr 2012
TL;DR: A fast and accurate automatic voice recognition algorithm using Mel frequency cepstral coefficients (MFCC) to extract features from the voice and the vector quantization technique to identify the speaker.
Abstract: This paper presents a fast and accurate automatic voice recognition algorithm. We use Mel frequency cepstral coefficients (MFCC) to extract features from the voice and the vector quantization technique to identify the speaker. This technique is usually used in data compression; it models a probability function by the distribution of different vectors. The results we achieved were 100% precision on a database of 10 speakers.
118 citations
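The MFCC-plus-vector-quantization pipeline above can be illustrated with a toy sketch. Real systems extract MFCC frames with a DSP library, and VQ codebooks are usually trained with the LBG algorithm; here synthetic 2-D "feature vectors" and a plain k-means stand in for both, and identification picks the speaker whose codebook gives the lowest average quantization distortion:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Train a k-entry VQ codebook with plain k-means (LBG stand-in)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda c: dist2(v, centroids[c]))
            clusters[i].append(v)
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centroids

def distortion(vectors, codebook):
    """Average distance from each frame to its nearest codeword."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)

# Synthetic per-frame features standing in for MFCCs (hypothetical data).
rng = random.Random(1)
speaker_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(200)]
speaker_b = [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(200)]

codebooks = {"A": kmeans(speaker_a, 4), "B": kmeans(speaker_b, 4)}

def identify(utterance):
    """Identify the speaker whose codebook best fits the utterance."""
    return min(codebooks, key=lambda s: distortion(utterance, codebooks[s]))

test_utterance = [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(50)]
print(identify(test_utterance))  # -> B
```

A separate codebook per enrolled speaker is the key design choice: each codebook compresses that speaker's feature distribution, so a test utterance quantizes with low distortion only against the matching speaker's codebook.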
•
20 Jan 2014
TL;DR: In this paper, an application detects speaker locations and prompts a user to input rough room boundaries and a desired listener location in the room, which are used to determine optimum speaker locations, frequency assignments, and speaker parameters.
Abstract: In an audio speaker network, setup of speaker locations, soundtrack or channel assignments, and speaker parameters is facilitated by an application that detects speaker locations and prompts a user to input rough room boundaries and a desired listener location in the room. Based on this, optimum speaker locations, frequency assignments, and speaker parameters may be determined and output.
117 citations
•
TL;DR: A method based on the vowel onset point (VOP) is proposed for locating the end-points of an utterance, and combining the evidence from suprasegmental, source, and spectral features seems to improve the performance of the system significantly.
Abstract: This paper proposes a text-dependent (fixed-text) speaker verification system which uses different types of information for making a decision regarding the identity claim of a speaker. The baseline system uses the dynamic time warping (DTW) technique for matching. Detection of the end-points of an utterance is crucial for the performance of DTW-based template matching. A method based on the vowel onset point (VOP) is proposed for locating the end-points of an utterance. The proposed method for speaker verification uses suprasegmental and source features, besides spectral features. The suprasegmental features, such as pitch and duration, are extracted using the warping-path information in the DTW algorithm. Features of the excitation source, extracted using neural network models, are also used in the text-dependent speaker verification system. Although the suprasegmental and source features individually may not yield good performance, combining the evidence from these features seems to improve the performance of the system significantly. Neural network models are used to combine the evidence from multiple sources of information.
117 citations
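The DTW template matching at the core of the baseline system above can be sketched with the standard dynamic-programming recurrence. This is a minimal version over 1-D sequences; real systems warp sequences of spectral feature vectors, and the VOP-based end-pointing described in the paper is omitted here:

```python
def dtw(a, b):
    """Dynamic time warping distance between two feature sequences,
    using the standard match/insert/delete recurrence."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local frame distance
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same contour, half speed
other = [3, 3, 2, 1, 0, 1, 2, 3]                    # different contour

print(dtw(template, slow))   # -> 0.0 (aligns perfectly despite tempo change)
print(dtw(template, other))  # positive: contours differ
```

Verification would compare the DTW distance between the claimant's utterance and the enrolled template against a threshold; the warping path recovered from D is also what gives access to the duration and pitch alignment information the paper uses as suprasegmental evidence.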