Topic

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published on this topic, receiving 310,061 citations.


Papers
Patent
21 Feb 1997
TL;DR: In this article, a speech model is produced for use in determining whether a speaker associated with the speech model produced an unidentified speech sample, without using an external mechanism to monitor the accuracy with which the contents were identified.
Abstract: A speech model is produced for use in determining whether a speaker associated with the speech model produced an unidentified speech sample. First a sample of speech of a particular speaker is obtained. Next, the contents of the sample of speech are identified using speech recognition. Finally, a speech model associated with the particular speaker is produced using the sample of speech and the identified contents thereof. The speech model is produced without using an external mechanism to monitor the accuracy with which the contents were identified.

89 citations

01 Jan 2009
TL;DR: This thesis attempts to overcome difficulties in the area of speaker verification by proposing to combine support vector machines with two generative approaches based on Gaussian mixture models and presents a new approach to modeling the speaker's long-term prosodic and spectral characteristics.
Abstract: The speaker verification problem can be stated as follows: given two speech recordings, determine whether or not they have been uttered by the same speaker. Most current speaker verification systems are based on Gaussian mixture models. This probabilistic representation makes it possible to model the complex distribution of the speech frames adequately. It is, however, an inadequate basis for discriminating between speakers, which is the key issue in the area of speaker verification. In the first part of this thesis, we attempt to overcome these difficulties by proposing to combine support vector machines with two generative approaches based on Gaussian mixture models. In the second part of this thesis, we present a new approach to modeling the speaker's long-term prosodic and spectral characteristics. This novel approach is based on continuous approximations of the prosodic and cepstral contours. Finally, we fuse the scores of systems based on long- and short-term speaker features.

89 citations
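
The score fusion mentioned at the end of the abstract above can be illustrated with a minimal sketch. The abstract does not say which fusion rule the thesis uses, so this example assumes a simple weighted sum of z-normalized scores; the function names `z_normalize` and `fuse_scores` and the `weight` parameter are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale a list of scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so there is no spread to normalize.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(short_term, long_term, weight=0.5):
    """Fuse two systems' trial scores with a weighted sum (assumed rule).

    `weight` is applied to the short-term system and `1 - weight`
    to the long-term system, after each list is z-normalized.
    """
    if len(short_term) != len(long_term):
        raise ValueError("both systems must score the same trials")
    a = z_normalize(short_term)
    b = z_normalize(long_term)
    return [weight * x + (1 - weight) * y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(fuse_scores([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Normalizing first keeps one system from dominating only because its scores live on a larger numeric scale; in practice the weight would be tuned on held-out trials.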

Journal ArticleDOI
TL;DR: It is shown experimentally that increasing the inter-speaker variability in the UBM data while keeping the total data size constant gradually improves system performance; the results also dispel the myth of "There's no data like more data" for the purpose of UBM construction.
Abstract: State-of-the-art Gaussian mixture model (GMM)-based speaker recognition/verification systems utilize a universal background model (UBM), which typically requires extensive resources, especially if multiple channel and microphone categories are considered. In this study, a systematic analysis of speaker verification system performance is considered for which the UBM data is selected and purposefully altered in different ways, including variation in the amount of data, sub-sampling structure of the feature frames, and variation in the number of speakers. An objective measure is formulated from the UBM covariance matrix which is found to be highly correlated with system performance when the data amount was varied while keeping the UBM data set constant, and when the number of UBM speakers was increased while keeping the data amount constant. The advantages of feature sub-sampling for improving UBM training speed are also discussed, and a novel and effective phonetic distance-based frame selection method is developed. The sub-sampling methods presented are shown to retain baseline equal error rate (EER) system performance using only 1% of the original UBM data, resulting in a drastic reduction in UBM training computation time. This, in theory, dispels the myth of "There's no data like more data" for the purpose of UBM construction. With respect to the UBM speakers, the effect of systematically controlling the number of training (UBM) speakers versus overall system performance is analyzed. It is shown experimentally that increasing the inter-speaker variability in the UBM data while maintaining the overall total data size constant gradually improves system performance. Finally, two alternative speaker selection methods based on different speaker diversity measures are presented. Using the proposed schemes, it is shown that by selecting a diverse set of UBM speakers, the baseline system performance can be retained using less than 30% of the original UBM speakers.

88 citations
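
As a rough illustration of the feature sub-sampling idea in the abstract above, the sketch below keeps a fixed fraction of frames by uniform decimation. This is only the simplest possible variant and is an assumption for illustration; it is not the phonetic distance-based frame selection method the paper develops. The name `subsample_frames` and its parameters are hypothetical.

```python
def subsample_frames(frames, keep_fraction=0.01):
    """Keep roughly `keep_fraction` of the frames by uniform decimation.

    `frames` is any sequence of feature vectors (for example, one
    cepstral vector per frame). Every `step`-th frame is kept.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    step = max(1, round(1 / keep_fraction))
    return frames[::step]


if __name__ == "__main__":
    # 1000 dummy frame indices stand in for real feature vectors.
    frames = list(range(1000))
    kept = subsample_frames(frames, keep_fraction=0.01)
    print(len(kept), kept[:3])
```

With `keep_fraction=0.01`, the UBM would be trained on about 1% of the frames, which is the data reduction the abstract reports as sufficient to retain baseline EER.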

Proceedings ArticleDOI
26 May 2013
TL;DR: A new countermeasure is presented for the protection of automatic speaker verification systems from spoofed, converted voice signals that exploits the common shift applied to the spectral slope of consecutive speech frames involved in the mapping of a spoofer's voice signal towards a statistical model of a given target.
Abstract: This paper presents a new countermeasure for the protection of automatic speaker verification systems from spoofed, converted voice signals. The new countermeasure exploits the common shift applied to the spectral slope of consecutive speech frames involved in the mapping of a spoofer's voice signal towards a statistical model of a given target. While the countermeasure exploits prior knowledge of the attack in an admittedly unrealistic sense, it is shown to detect almost all spoofed signals which otherwise provoke significant increases in false acceptance. The work also discusses the need for formal evaluations to develop new countermeasures which are less reliant on prior knowledge.

88 citations

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A speaker verification system using connected word verification phrases has been implemented and studied and the system has been evaluated on a 20-speaker telephone database of connected digit utterances.
Abstract: A speaker verification system using connected word verification phrases has been implemented and studied. Verification utterances are represented as concatenated speaker-dependent whole-word hidden Markov models (HMMs). Verification phrases are specified as strings of words drawn from a small fixed vocabulary, such as the digits. Phrases can either be individualized or randomized for greater security. Training techniques to create speaker-dependent models for verification are used in which initial word models are created by bootstrapping from existing speaker-independent models. The system has been evaluated on a 20-speaker telephone database of connected digit utterances. Using approximately 66 s of connected digit training utterances per speaker, the verification equal-error rate is approximately 3.5% for 1.1 s test utterances and 0.3% for 4.4 s test utterances. In comparison, the performance of a template-based system using the same amount of training data is 6.7% and 1.5%, respectively.

88 citations
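
Several abstracts above report an equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below estimates it with a plain threshold sweep over the observed scores. The function name `equal_error_rate` and the toy scores are hypothetical, and real evaluations typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine (target) and impostor trial scores.

    A trial is accepted when its score is >= the threshold. The sweep
    picks the threshold where the false acceptance rate (FAR) and the
    false rejection rate (FRR) are closest, and returns their average.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = None, None
    for t in thresholds:
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine, impostor))  # prints 0.25
```

In the toy data, one impostor score (0.6) lies above one genuine score (0.4), so at a threshold of 0.6 one of four trials is misclassified on each side.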


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Signal processing
73.4K papers, 983.5K citations
81% related
Decoding methods
65.7K papers, 900K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420