Topic
Speaker recognition
About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.
Papers published on a yearly basis
Papers
TL;DR: This paper extracts information about cell phones from their speech recordings using mel-frequency cepstrum coefficients, and identifies their brands and models with vector quantization and support vector machine classifiers.
Abstract: Speech signals convey various pieces of information, such as the identity of the speaker, the language spoken, and the linguistic content of the text being spoken. In this paper, we extract information about the cell phones from their speech records by using mel-frequency cepstrum coefficients and identify their brands and models. Closed-set identification rates of 92.56% and 96.42% have been obtained on a set of 14 different cell phones in the experiments using vector quantization and support vector machine classifiers, respectively.
70 citations
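The pipeline described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it uses a simplified log-spectrum cepstrum in place of full mel-frequency cepstra, synthetic filtered noise in place of real cell-phone recordings (the two hypothetical "devices" are modeled as different channel filters), and scikit-learn's SVC for the support vector machine classifier.

```python
import numpy as np
from scipy.fftpack import dct
from scipy.signal import butter, lfilter
from sklearn.svm import SVC

def cepstral_features(signal, frame_len=400, hop=160, n_coeffs=13):
    """Frame the signal and take a DCT of the log power spectrum --
    a simplified stand-in for full mel-frequency cepstral coefficients."""
    frames = np.stack([signal[i:i + frame_len] * np.hanning(frame_len)
                       for i in range(0, len(signal) - frame_len, hop)])
    log_spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)
    return dct(log_spec, axis=1, norm='ortho')[:, :n_coeffs]

# Synthetic "recordings": two hypothetical devices modeled as different
# channel responses (low-pass vs. high-pass) applied to a noise source.
rng = np.random.default_rng(0)
b_lo, a_lo = butter(4, 0.2)                      # device A channel
b_hi, a_hi = butter(4, 0.5, btype='high')        # device B channel
X, y = [], []
for _ in range(20):
    src = rng.standard_normal(8000)
    X.append(cepstral_features(lfilter(b_lo, a_lo, src)).mean(axis=0))
    y.append(0)
    X.append(cepstral_features(lfilter(b_hi, a_hi, src)).mean(axis=0))
    y.append(1)
X, y = np.array(X), np.array(y)

clf = SVC().fit(X, y)                            # SVM device classifier
train_acc = clf.score(X, y)
```

The mean cepstral vector per recording plays the role of a crude utterance-level embedding; the paper's vector-quantization variant would instead compare frame-level codebooks.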
TL;DR: This study examined how normal-hearing subjects perceive the gender and identity of a talker as a function of the number of channels in spectrally reduced speech. Results showed that gender and talker identification were better with the sine-wave processor, and that performance with the noise-band processor was more sensitive to the number of channels.
Abstract: Considerable research on speech intelligibility for cochlear-implant users has been conducted using acoustic simulations with normal-hearing subjects. However, some relevant topics about perception through cochlear implants remain scarcely explored. The present study examined the perception by normal-hearing subjects of gender and identity of a talker as a function of the number of channels in spectrally reduced speech. Two simulation strategies were compared. They were implemented by two different processors that presented signals as either the sum of sine waves at the centers of the channels or as the sum of noise bands. In Experiment 1, 15 subjects determined the gender of 40 talkers (20 males + 20 females) from a natural utterance processed through 3, 4, 5, 6, 8, 10, 12, and 16 channels with both processors. In Experiment 2, 56 subjects matched a natural sentence uttered by 10 talkers with the corresponding simulation replicas processed through 3, 4, 8, and 16 channels for each processor. In Experiment 3, 72 subjects performed the same task, but different sentences were used for the natural and processed stimuli. A control Experiment 4 was conducted to equate the processing steps between the two simulation strategies. Results showed that gender and talker identification were better for the sine-wave processor, and that performance through the noise-band processor was more sensitive to the number of channels. Implications and possible explanations for the superiority of sine-wave simulations are discussed.
70 citations
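The two simulation strategies amount to a channel vocoder: split the signal into frequency bands, extract each band's temporal envelope, and use it to modulate either a sine wave at the band center or a band of noise. A minimal sketch follows; the band edges (100-5000 Hz, geometric spacing), the 50 Hz envelope cutoff, and the filter orders are assumptions for illustration, not the paper's exact parameters.

```python
import numpy as np
from scipy.signal import butter, lfilter

def vocode(signal, sr, n_channels, carrier='sine'):
    """Spectrally reduce `signal` to `n_channels` channels using either
    sine-wave or noise-band carriers (cochlear-implant simulation)."""
    nyq = sr / 2
    edges = np.geomspace(100, 5000, n_channels + 1)   # assumed band edges
    b_env, a_env = butter(2, 50 / nyq)                # 50 Hz envelope smoother
    rng = np.random.default_rng(0)
    t = np.arange(len(signal)) / sr
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / nyq, hi / nyq], btype='band')
        band = lfilter(b, a, signal)                  # analysis band
        env = lfilter(b_env, a_env, np.abs(band))     # rectified envelope
        if carrier == 'sine':
            car = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)  # sine at band center
        else:
            car = lfilter(b, a, rng.standard_normal(len(signal)))  # noise band
        out += env * car
    return out

sr = 16000
speech = np.random.default_rng(1).standard_normal(sr)  # 1 s placeholder signal
sine_out = vocode(speech, sr, 4, carrier='sine')
noise_out = vocode(speech, sr, 4, carrier='noise')
```

With a real utterance in place of the placeholder signal, increasing `n_channels` preserves more spectral detail, which is the independent variable manipulated across the experiments above.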
14 Sep 2014
TL;DR: This paper applies a convolutional neural network (CNN) trained for automatic speech recognition (ASR) to the task of speaker identification (SID). In the CNN/i-vector front end, the sufficient statistics are collected from the outputs of the CNN rather than from the traditional universal background model (UBM).
Abstract: This paper applies a convolutional neural network (CNN) trained for automatic speech recognition (ASR) to the task of speaker identification (SID). In the CNN/i-vector front end, the sufficient statistics are collected based on the outputs of the CNN as opposed to the traditional universal background model (UBM). Evaluated on heavily degraded speech data, the CNN/i-vector front end provides performance comparable to the UBM/i-vector baseline. The combination of these approaches, however, is shown to provide improvements of 26% in miss rate, considerably outperforming the fusion of two different features in the traditional UBM/i-vector approach. An analysis of the language- and channel-dependency of the CNN/i-vector approach is also provided to highlight future research directions.
Index Terms: Deep neural networks, Convolutional neural networks, Speaker recognition, i-vectors, noisy speech
70 citations
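The key substitution in the CNN/i-vector front end is where the frame posteriors come from when accumulating the Baum-Welch sufficient statistics: the CNN's output layer stands in for UBM component posteriors. A minimal sketch of the accumulation step, with random placeholder posteriors in place of real CNN outputs:

```python
import numpy as np

def sufficient_stats(features, posteriors):
    """Zeroth- and first-order Baum-Welch statistics for i-vector extraction.

    features:   (T, D) acoustic features for T frames
    posteriors: (T, C) per-frame posteriors over C classes -- UBM Gaussians
                in the baseline, CNN senone outputs in the CNN/i-vector front end
    """
    N = posteriors.sum(axis=0)     # zeroth order: soft frame counts, shape (C,)
    F = posteriors.T @ features    # first order: weighted feature sums, (C, D)
    return N, F

rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 20))    # 200 frames, 20-dim features
logits = rng.standard_normal((200, 64))   # placeholder for CNN frame outputs
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
N, F = sufficient_stats(feats, post)
```

Because each frame's posteriors sum to one, the zeroth-order counts sum to the number of frames; downstream, the i-vector extractor consumes `N` and `F` identically whether they came from a UBM or a CNN.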
TL;DR: This work combines the decisions of two classifiers as an alternative means of improving the performance of a speaker recognition system in adverse environments. It shows that there is information not captured by the popular mel-frequency cepstral coefficients (MFCC), and that the parametric feature sets (PFS) can contribute this additional information for improved performance.
70 citations
22 May 2011
TL;DR: This paper describes an audio/video database built specifically for the speaker diarization task and based on different video genres. Preliminary experiments highlight the difficulties encountered in this context, mainly linked to the heterogeneity of the database.
Abstract: In the last ten years, the internet and its applications have changed significantly, mainly owing to the growth of available personal resources. In multimedia, the most striking evolution is the continuously growing success of video-sharing websites. With this success, however, come difficulties in efficiently searching, indexing, and accessing relevant information about these documents. Speaker diarization is an important task in the overall information retrieval process. This paper describes an audio/video database, built specifically for the speaker diarization task, based on different video genres. Through some preliminary experiments, it highlights the difficulties encountered in this context, mainly linked to the heterogeneity of the database.
69 citations