scispace - formally typeset
Search or ask a question
Topic

Speaker recognition

About: Speaker recognition is a research topic. Over the lifetime, 14990 publications have been published within this topic receiving 310061 citations.


Papers
More filters
Proceedings ArticleDOI
15 Mar 1999
TL;DR: An overview of current publicly available corpora intended for speaker recognition research and evaluation is presented and the corpora's salient features with respect to their suitability for conducting speaker recognition experiments and evaluations are outlined.
Abstract: Using standard speech corpora for development and evaluation has proven to be very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora's salient features with respect to their suitability for conducting speaker recognition experiments and evaluations. We hope to increase the awareness and use of these standard corpora and corresponding evaluation procedures throughout the speaker recognition community.

131 citations

Proceedings ArticleDOI
09 Dec 2001
TL;DR: Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.
Abstract: As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM). A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation based postprocessing results in a 35% relative reduction of the frame error rate. Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.

131 citations

Journal ArticleDOI
TL;DR: The results of the experiments show that there is only a slight difference between the recognition accuracies for statistical features and dynamic features over the long term, and it is more efficient to use statistical features than dynamic features.
Abstract: This paper describes results of speaker recognition experiments using statistical features and dynamic features of speech spectra extracted from fixed Japanese word utterances. The speech wave is transformed into a set of time functions of log area ratios and a fundamental frequency. In the case of statistical features, a mean value and a standard deviation for each time function and a correlation matrix between these functions are calculated in the voiced portion of each word, and after a feature selection procedure, they are compared with reference features. In the case of dynamic features, the time functions are brought into time registration with reference functions. The results of the experiments show that there is only a slight difference between the recognition accuracies for statistical features and dynamic features over the long term. Since the amount of calculation necessary for recognition using statistical features is only about one-tenth of that for recognition using dynamic features, it is more efficient to use statistical features than dynamic features. When training utterances are recorded over ten months for each customer and spectral equalization is applied, 99.5 percent and 96.3 percent verification accuracies can be obtained for input utterances ten months and five years later, respectively, using statistical features extracted from two words. Combination of dynamic features with statistical features can reduce the error rate to half that obtained with either one alone.

131 citations

Patent
13 Oct 1994
TL;DR: In this article, a system and method of implementing purchasing and other transactions in an integrated multimedia communication system by utilizing an intelligent peripheral of an advanced intelligent telephone network which peripheral includes a voice recognition module for providing control capability based on received voice signals, and a voice verification module which includes a storage database for individualized voice authentication templates.
Abstract: Disclosed is a system and method of implementing purchasing and other transactions in an integrated multimedia communication system by utilizing an intelligent peripheral of an advanced intelligent telephone network which peripheral includes a voice recognition module for providing control capability based on received voice signals, and a voice verification module which includes a storage database for individualized voice authentication templates. The processor also includes a transaction manager module for controlling interaction with external money management devices such as automated teller machines (ATM's). The service control point of the network maintains a separate database for identifying a specific voice identification template on the basis of recognition and identification of an incoming signal as associated with a specific subscriber for a terminal and/or a telephone station. The peripheral also includes an internal data communications system carrying information between the processor, the voice verification module, the transaction monitor and the signaling communications interface.

130 citations

Journal ArticleDOI
TL;DR: Two techniques to separate out the speech signal of the speaker of interest from a mixture of speech signals are presented and can result in significant enhancement of individual speakers in mixed recordings, consistently achieving better performance than that obtained with hard binary masks.
Abstract: The problem of single-channel speaker separation attempts to extract a speech signal uttered by the speaker of interest from a signal containing a mixture of acoustic signals. Most algorithms that deal with this problem are based on masking, wherein unreliable frequency components from the mixed signal spectrogram are suppressed, and the reliable components are inverted to obtain the speech signal from speaker of interest. Most current techniques estimate this mask in a binary fashion, resulting in a hard mask. In this paper, we present two techniques to separate out the speech signal of the speaker of interest from a mixture of speech signals. One technique estimates all the spectral components of the desired speaker. The second technique estimates a soft mask that weights the frequency subbands of the mixed signal. In both cases, the speech signal of the speaker of interest is reconstructed from the complete spectral descriptions obtained. In their native form, these algorithms are computationally expensive. We also present fast factored approximations to the algorithms. Experiments reveal that the proposed algorithms can result in significant enhancement of individual speakers in mixed recordings, consistently achieving better performance than that obtained with hard binary masks.

129 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Recurrent neural network
29.2K papers, 890K citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Signal processing
73.4K papers, 983.5K citations
81% related
Decoding methods
65.7K papers, 900K citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023165
2022468
2021283
2020475
2019484
2018420