Topic: Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Patent
TL;DR: Embodiments of the present invention include a speech recognition method in which a first system receives recognition information from an external system to recognize a first plurality of words, with second recognition information selected based on the first recognition results.
Abstract: Embodiments of the present invention include a speech recognition method. In one embodiment, the method includes receiving from an external system first recognition information to recognize a first plurality of words in a first system, programming the first system with the first recognition information to recognize the first plurality of words, generating first recognition results in response to receiving at least one of the first plurality of words in the first system, receiving from the external system second recognition information to recognize a second plurality of words, wherein the second recognition information is selected based on the first recognition results, and programming the first system with the second recognition information to recognize a second plurality of words.

183 citations
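The staged-vocabulary protocol this patent describes is easy to picture in code. Below is a minimal Python sketch, assuming a toy word-set table and simple string matching in place of a real acoustic decoder; all class and method names are illustrative, not from the patent.

```python
# Hypothetical sketch of the staged-vocabulary protocol from the abstract:
# a small on-device recognizer is programmed with one word set at a time,
# and the external system picks the next word set based on what was just
# recognized. All names here are illustrative.

class ExternalSystem:
    """Holds the full vocabulary; sends slices to the device on demand."""
    def __init__(self):
        # Mapping from a recognized word to the follow-up word set.
        self.word_sets = {
            "root": ["call", "play", "navigate"],
            "call": ["alice", "bob"],
            "play": ["jazz", "rock"],
        }

    def recognition_info(self, previous_result: str) -> list[str]:
        # Later word sets are selected based on earlier recognition results.
        return self.word_sets.get(previous_result, [])

class Device:
    """The 'first system': recognizes only the words it is programmed with."""
    def __init__(self):
        self.active_words: list[str] = []

    def program(self, words: list[str]) -> None:
        self.active_words = words

    def recognize(self, utterance: str) -> str | None:
        # Stand-in for a real acoustic decoder constrained to active_words.
        return utterance if utterance in self.active_words else None

external, device = ExternalSystem(), Device()
device.program(external.recognition_info("root"))   # first recognition info
result = device.recognize("call")                   # first recognition results
device.program(external.recognition_info(result))   # second info, chosen from result
print(device.recognize("alice"))                    # -> "alice"
```

The point of the design is that the first system never needs to hold the full vocabulary: each recognition result steers which slice the external system sends next.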

Proceedings Article
Jing Huang, Brian Kingsbury
26 May 2013
TL;DR: This work uses DBNs for audio-visual speech recognition; in particular, it applies deep learning to audio and visual features for noise-robust speech recognition and tests two methods for using DBNs in a multimodal setting.
Abstract: Deep belief networks (DBN) have shown impressive improvements over Gaussian mixture models for automatic speech recognition. In this work we use DBNs for audio-visual speech recognition; in particular, we use deep learning from audio and visual features for noise robust speech recognition. We test two methods for using DBNs in a multimodal setting: a conventional decision fusion method that combines scores from single-modality DBNs, and a novel feature fusion method that operates on mid-level features learned by the single-modality DBNs. On a continuously spoken digit recognition task, our experiments show that these methods can reduce word error rate by as much as 21% relative over a baseline multi-stream audio-visual GMM/HMM system.

182 citations
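The two fusion strategies compared in this paper can be sketched with toy stand-ins for the single-modality DBN outputs. This is not the authors' code; numpy arrays play the roles of posteriors and hidden-layer activations, and the fusion weight and placeholder classifier are assumptions.

```python
# Minimal sketch contrasting decision fusion and feature fusion, using
# toy numpy stand-ins for the single-modality DBN outputs.
import numpy as np

rng = np.random.default_rng(0)
n_classes = 11                       # e.g. digits 0-9 plus silence

# Stand-ins for per-frame DBN outputs.
audio_scores = rng.random(n_classes);  audio_scores /= audio_scores.sum()
visual_scores = rng.random(n_classes); visual_scores /= visual_scores.sum()
audio_hidden = rng.random(256)       # mid-level (hidden-layer) activations
visual_hidden = rng.random(128)

# 1) Decision fusion: combine single-modality posteriors, e.g. with a
#    weighted log-linear combination, then classify.
w = 0.7                              # audio weight; would be tuned on held-out data
fused_log = w * np.log(audio_scores) + (1 - w) * np.log(visual_scores)
decision_fusion_label = int(np.argmax(fused_log))

# 2) Feature fusion: concatenate mid-level features learned by the
#    single-modality DBNs and feed them to a joint classifier
#    (here just an untrained linear layer as a placeholder).
joint_features = np.concatenate([audio_hidden, visual_hidden])
W = rng.standard_normal((n_classes, joint_features.size)) * 0.01
feature_fusion_label = int(np.argmax(W @ joint_features))

print(decision_fusion_label, feature_fusion_label)
```

Decision fusion keeps the modalities independent until the final scores are combined; feature fusion lets a joint classifier exploit cross-modal structure in the mid-level representations, which is the novelty the abstract highlights.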

Proceedings Article
01 May 1988
TL;DR: An improved version of a previously described automatic lipreading system has been developed that uses vector quantization, dynamic time warping, and a new heuristic distance measure to improve acoustic speech recognition.
Abstract: Current acoustic speech recognition technology performs well with very small vocabularies in noise or with large vocabularies in very low noise. Accurate acoustic speech recognition in noise with vocabularies over 100 words has yet to be achieved. Humans frequently lipread the visible facial speech articulations to enhance speech recognition, especially when the acoustic signal is degraded by noise or hearing impairment. Automatic lipreading has been found to significantly improve acoustic speech recognition and could be advantageous in noisy environments such as offices, aircraft, and factories. An improved version of a previously described automatic lipreading system has been developed which uses vector quantization, dynamic time warping, and a new heuristic distance measure. This paper presents visual speech recognition results from multiple speakers under optimal conditions. Results from combined acoustic and visual speech recognition are also presented which show significantly improved performance compared to the acoustic recognition system alone.

181 citations
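Dynamic time warping is the template-matching step this lipreading system relies on, so a compact reference implementation may help. This is the textbook algorithm, not the paper's system: plain Euclidean frame distance stands in for the paper's heuristic distance measure, and the random arrays are placeholders for vector-quantized lip features.

```python
# Classic dynamic time warping between two sequences of feature vectors.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW alignment cost between sequences a (n x d) and b (m x d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j],           # insertion
                                 D[i, j - 1],           # deletion
                                 D[i - 1, j - 1])       # match
    return float(D[n, m])

template = np.random.rand(40, 8)   # stored visual template (40 frames)
probe = np.random.rand(55, 8)      # incoming utterance (different length)
print(dtw_distance(template, probe))
```

DTW is what lets a fixed template match an utterance spoken at a different rate, which matters for visual articulations at least as much as for audio.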

Journal Article
TL;DR: The group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and to pitch periodicity effects; it is modified to overcome these effects, and cepstral features extracted from it are called the modified group delay feature (MODGDF).
Abstract: Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal, which manifest as transitions in the phase spectrum, are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase for extracting speech features is to process the group delay function, which can be directly computed from the speech signal. The group delay function has been used in earlier efforts to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and also due to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks, namely speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed.

181 citations
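For context, the group delay function the paper builds on, and the shape of the modification, can be written out from standard signal-processing definitions. This is a sketch; the paper's exact parameterization of the MODGDF (the smoothing used for S(ω) and the exponents γ and α) may differ.

```latex
% Group delay, computable directly from the signal x[n]
% (Y is the Fourier transform of n * x[n]):
\tau(\omega) = -\frac{d}{d\omega}\,\arg X(\omega)
             = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{|X(\omega)|^{2}}

% The denominator explodes near zeros close to the unit circle, so it is
% replaced by a cepstrally smoothed spectrum S(\omega), and the dynamic
% range is compressed with empirically tuned exponents \gamma and \alpha:
\tilde{\tau}(\omega) = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{S(\omega)^{2\gamma}},
\qquad
\tau_m(\omega) = \frac{\tilde{\tau}(\omega)}{|\tilde{\tau}(\omega)|}\,
                 \bigl|\tilde{\tau}(\omega)\bigr|^{\alpha}
```

As the abstract notes, cepstral features are then extracted from the modified function τ_m(ω), analogous to how MFCCs are derived from the magnitude spectrum.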

Patent
Jung-Eun Kim, Jeong-Su Kim
16 Feb 2006
TL;DR: A user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to the user; the method includes calculating a confidence score for each recognition candidate according to the result of speech recognition.
Abstract: A user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to a user. The user adaptive speech recognition method includes calculating a confidence score of a recognition candidate according to the result of speech recognition, setting a new threshold value adapted to the user based on a result of user confirmation of the recognition candidate and the confidence score of the recognition candidate, and outputting a corresponding recognition candidate as a result of the speech recognition if the calculated confidence score is higher than the new threshold value. Thus, the need for user confirmation of the result of speech recognition is reduced and the probability of speech recognition success is increased.

181 citations
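The confirmation loop in this patent amounts to a threshold test plus a per-user update rule. Here is a hypothetical Python sketch; the update rule, step size, and initial threshold are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of the adaptive-confirmation loop: the recognizer
# only asks the user to confirm when the confidence score falls below a
# per-user threshold, and the threshold is nudged using the outcome of
# past confirmations. Constants and update rule are assumptions.

class AdaptiveConfirmer:
    def __init__(self, threshold: float = 0.8, step: float = 0.02):
        self.threshold = threshold   # per-user threshold, adapted over time
        self.step = step             # adaptation step size (illustrative)

    def handle(self, candidate: str, confidence: float) -> str | None:
        if confidence >= self.threshold:
            return candidate                 # output without asking the user
        if self.ask_user(candidate):         # explicit user confirmation
            # User accepted a low-confidence result: the threshold was too
            # strict for this user, so relax it slightly.
            self.threshold = max(0.0, self.threshold - self.step)
            return candidate
        # User rejected the candidate: tighten the threshold.
        self.threshold = min(1.0, self.threshold + self.step)
        return None

    @staticmethod
    def ask_user(candidate: str) -> bool:
        return input(f"Did you say '{candidate}'? (y/n) ").strip() == "y"
```

Over time the threshold drifts toward a setting where confirmations are requested only when they are likely to change the outcome, which is how the abstract's claim of reduced user confirmation would play out.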


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (82% related)
Feature extraction: 111.8K papers, 2.1M citations (81% related)
Signal processing: 73.4K papers, 983.5K citations (81% related)
Decoding methods: 65.7K papers, 900K citations (79% related)
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420