scispace - formally typeset
Topic

Speaker recognition

About: Speaker recognition is a research topic. Over the lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Proceedings ArticleDOI
19 Apr 2015
TL;DR: The studies show that the CNN-based approach achieves better performance than the conventional ANN-based approach with the same number of parameters, and that the features learned from raw speech by the CNN-based approach generalize across different databases.
Abstract: State-of-the-art automatic speech recognition systems model the relationship between the acoustic speech signal and phone classes in two stages, namely, extraction of spectral-based features based on prior knowledge, followed by training of an acoustic model, typically an artificial neural network (ANN). In our recent work, it was shown that Convolutional Neural Networks (CNNs) can model phone classes from the raw acoustic speech signal, reaching performance on par with other existing feature-based approaches. This paper extends the CNN-based approach to a large vocabulary speech recognition task. More precisely, we compare the CNN-based approach against the conventional ANN-based approach on the Wall Street Journal corpus. Our studies show that the CNN-based approach achieves better performance than the conventional ANN-based approach with the same number of parameters. We also show that the features learned from raw speech by the CNN-based approach generalize across different databases.

171 citations
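The raw-waveform front end the paper describes can be sketched as a bank of learned 1-D filters convolved directly with the signal, followed by a nonlinearity and pooling. The NumPy toy below is illustrative only; the filter width, stride, and channel count are invented and not the paper's actual configuration, and the filters here are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(signal, filters, stride):
    """Slide each filter over the raw signal (cross-correlation)."""
    n_filters, width = filters.shape
    n_out = (len(signal) - width) // stride + 1
    out = np.empty((n_filters, n_out))
    for i in range(n_out):
        window = signal[i * stride : i * stride + width]
        out[:, i] = filters @ window
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size):
    n_filters, n = x.shape
    n_out = n // size
    return x[:, : n_out * size].reshape(n_filters, n_out, size).max(axis=2)

waveform = rng.standard_normal(16000)           # 1 s of 16 kHz "audio"
filters = rng.standard_normal((8, 400)) * 0.01  # 8 filters of 25 ms each

# Conv -> ReLU -> pool: one feature-learning block applied to raw speech.
features = max_pool(relu(conv1d(waveform, filters, stride=160)), size=4)
print(features.shape)  # (8, 24)
```

In a real system such a block would be stacked and trained jointly with the classifier, so the filters end up playing the role that hand-crafted spectral features play in the conventional pipeline.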

Patent
Lijuan Wang1, Frank K. Soong1
05 Mar 2008
TL;DR: In this article, a speech recognition result is displayed for review by a user. If it is incorrect, the user provides pen-based editing marks, and an error type and location (within the speech recognition result) are identified based on the pen-based editing marks.
Abstract: A speech recognition result is displayed for review by a user. If it is incorrect, the user provides pen-based editing marks. An error type and location (within the speech recognition result) are identified based on the pen-based editing marks. An alternative result template is generated, and an N-best alternative list is also generated by applying the template to intermediate recognition results from an automatic speech recognizer. The N-best alternative list is output for use in correcting the speech recognition results.

171 citations
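The template-plus-N-best idea above can be sketched as: given the error location, build a template from the surrounding correct words, then rank the words that fill that slot across the recognizer's scored intermediate hypotheses. This is a hypothetical toy, not the patent's method; the function name, data, and scores are invented.

```python
def apply_template(hypotheses, prefix, suffix, n_best=3):
    """Collect alternative words fitting the template '<prefix> ? <suffix>'
    from scored intermediate hypotheses, best-scoring first."""
    candidates = {}
    for words, score in hypotheses:
        for i in range(1, len(words) - 1):
            if words[i - 1] == prefix and words[i + 1] == suffix:
                w = words[i]
                candidates[w] = max(candidates.get(w, float("-inf")), score)
    return sorted(candidates, key=candidates.get, reverse=True)[:n_best]

# Intermediate recognizer output: (word sequence, recognition score).
hyps = [
    (["recognize", "speech", "today"], -1.2),
    (["wreck", "a", "nice", "beach", "today"], -3.5),
    (["recognize", "speech", "to", "day"], -2.0),
    (["recognize", "peach", "today"], -2.8),
]

# The user's pen marks flagged the word between "recognize" and "today".
print(apply_template(hyps, "recognize", "today"))  # ['speech', 'peach']
```

The ranked list would then be shown to the user as the N-best alternatives for the marked error.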

Journal ArticleDOI
TL;DR: One of the most effective factors for speaker identification was the duration of the speech signal; duration appears to be important, however, only insofar as it admits a smaller or larger statistical sampling of the speaker's speech repertoire.
Abstract: The effect of several factors upon voice identification was examined. These factors were: the size of the class of possible voices, the duration of the speech signal, the frequency range of the speech signal, voicing vs nonvoicing speech characteristics, and the simultaneous presentation of several voices. One of the most effective factors for speaker identification was the duration of the speech signal. Duration, as such, appears to be important, however, only insofar as it admits a smaller or larger statistical sampling of the speaker's speech repertoire.

170 citations

Patent
24 Sep 2004
TL;DR: In this paper, the authors propose a speech recognition system using selectable recognition modes, centered on large vocabulary speech recognition programming that supplies recognized words to an external program as they are recognized, and that allows a user to select between large vocabulary recognition of an utterance with and without language context from the prior utterance, independently of the state of the external program.
Abstract: The present invention relates to speech recognition using selectable recognition modes. This includes innovations such as:
- large vocabulary speech recognition programming that supplies recognized words to an external program as they are recognized, and allows a user to select between large vocabulary recognition of an utterance with and without language context from the prior utterance, independently of the state of the external program;
- allowing a user to select between continuous and discrete speech recognition modes that use substantially the same vocabulary;
- allowing a user to select between continuous and discrete large-vocabulary speech recognition modes;
- allowing a user to select between at least two different alphabetic entry speech recognition modes; and
- allowing a user to select from among four or more of the following recognition modes when creating text: a large-vocabulary mode, an alphabetic entry mode, a number entry mode, and a punctuation entry mode.

170 citations

Proceedings Article
01 Jan 2012
TL;DR: Experiments show that features derived from the phase spectrum dramatically outperform mel-frequency cepstral coefficients (MFCCs): even without converted speech for training, the equal error rate (EER) is reduced from 20.20% with MFCCs to 2.35%.
Abstract: Voice conversion techniques present a threat to speaker verification systems. To enhance the security of speaker verification systems, we study how to automatically distinguish natural speech from synthetic/converted speech. Motivated by research on the role of the phase spectrum in speech perception, in this study we propose to use features derived from the phase spectrum to detect converted speech. The features are tested under three different training situations of the converted speech detector: a) only Gaussian mixture model (GMM) based converted speech data are available; b) only unit-selection based converted speech data are available; c) no converted speech data are available for training the converted speech model. Experiments conducted on the National Institute of Standards and Technology (NIST) 2006 speaker recognition evaluation (SRE) corpus show that features derived from the phase spectrum dramatically outperform the mel-frequency cepstral coefficients (MFCCs): even without converted speech for training, the equal error rate (EER) is reduced from 20.20% with MFCCs to 2.35%.

170 citations
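The equal error rate quoted above is the operating point where the false-accept rate (converted speech accepted as natural) equals the false-reject rate (natural speech rejected). A minimal sketch of computing it from detector scores, with invented toy score distributions rather than the paper's data:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep thresholds over all observed scores and return the point
    where false-accept and false-reject rates are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    fars = np.array([np.mean(impostor >= t) for t in thresholds])  # false accepts
    frrs = np.array([np.mean(genuine < t) for t in thresholds])    # false rejects
    i = np.argmin(np.abs(fars - frrs))
    return (fars[i] + frrs[i]) / 2

rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, 1000)    # scores for natural speech
impostor = rng.normal(-2.0, 1.0, 1000)  # scores for converted speech

print(equal_error_rate(genuine, impostor))
```

With well-separated score distributions like these the EER is small; a drop from 20.20% to 2.35%, as reported above, corresponds to a much cleaner separation between the two score populations.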


Network Information
Related Topics (5)
- Feature vector: 48.8K papers, 954.4K citations (83% related)
- Recurrent neural network: 29.2K papers, 890K citations (82% related)
- Feature extraction: 111.8K papers, 2.1M citations (81% related)
- Signal processing: 73.4K papers, 983.5K citations (81% related)
- Decoding methods: 65.7K papers, 900K citations (79% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420