Topic: Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Patent
TL;DR: Embodiments of the present invention include a speech recognition method in which a first system receives recognition information from an external system to recognize a first plurality of words, with second recognition information selected based on the first recognition results.
Abstract: Embodiments of the present invention include a speech recognition method. In one embodiment, the method includes receiving from an external system first recognition information to recognize a first plurality of words in a first system, programming the first system with the first recognition information to recognize the first plurality of words, generating first recognition results in response to receiving at least one of the first plurality of words in the first system, receiving from the external system second recognition information to recognize a second plurality of words, wherein the second recognition information is selected based on the first recognition results, and programming the first system with the second recognition information to recognize a second plurality of words.

183 citations
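The staged-vocabulary protocol this patent describes is easy to picture in code. Below is a minimal Python sketch, assuming a toy word-set table and simple string matching in place of a real acoustic decoder; all class and method names are illustrative, not from the patent.

```python
# Hypothetical sketch of the staged-vocabulary protocol from the abstract:
# a small on-device recognizer is programmed with one word set at a time,
# and the external system picks the next word set based on what was just
# recognized. All names here are illustrative.

class ExternalSystem:
    """Holds the full vocabulary; sends slices to the device on demand."""
    def __init__(self):
        # Mapping from a recognized word to the follow-up word set.
        self.word_sets = {
            "root": ["call", "play", "navigate"],
            "call": ["alice", "bob"],
            "play": ["jazz", "rock"],
        }

    def recognition_info(self, previous_result: str) -> list[str]:
        # Later word sets are selected based on earlier recognition results.
        return self.word_sets.get(previous_result, [])

class Device:
    """The 'first system': recognizes only the words it is programmed with."""
    def __init__(self):
        self.active_words: list[str] = []

    def program(self, words: list[str]) -> None:
        self.active_words = words

    def recognize(self, utterance: str) -> str | None:
        # Stand-in for a real acoustic decoder constrained to active_words.
        return utterance if utterance in self.active_words else None

external, device = ExternalSystem(), Device()
device.program(external.recognition_info("root"))   # first recognition info
result = device.recognize("call")                   # first recognition results
device.program(external.recognition_info(result))   # second info, chosen from result
print(device.recognize("alice"))                    # -> "alice"
```

The point of the design is that the first system never needs to hold the full vocabulary: each recognition result steers which slice the external system sends next.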

Proceedings Article
Jing Huang, Brian Kingsbury
26 May 2013
TL;DR: This work uses DBNs for audio-visual speech recognition; in particular, it applies deep learning to audio and visual features for noise-robust speech recognition and tests two methods for using DBNs in a multimodal setting.
Abstract: Deep belief networks (DBN) have shown impressive improvements over Gaussian mixture models for automatic speech recognition. In this work we use DBNs for audio-visual speech recognition; in particular, we use deep learning from audio and visual features for noise robust speech recognition. We test two methods for using DBNs in a multimodal setting: a conventional decision fusion method that combines scores from single-modality DBNs, and a novel feature fusion method that operates on mid-level features learned by the single-modality DBNs. On a continuously spoken digit recognition task, our experiments show that these methods can reduce word error rate by as much as 21% relative over a baseline multi-stream audio-visual GMM/HMM system.

182 citations
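The two fusion strategies compared in this paper can be sketched with toy stand-ins for the single-modality DBN outputs. This is not the authors' code; numpy arrays play the roles of posteriors and hidden-layer activations, and the fusion weight and placeholder classifier are assumptions.

```python
# Minimal sketch contrasting decision fusion and feature fusion, using
# toy numpy stand-ins for the single-modality DBN outputs.
import numpy as np

rng = np.random.default_rng(0)
n_classes = 11                       # e.g. digits 0-9 plus silence

# Stand-ins for per-frame DBN outputs.
audio_scores = rng.random(n_classes);  audio_scores /= audio_scores.sum()
visual_scores = rng.random(n_classes); visual_scores /= visual_scores.sum()
audio_hidden = rng.random(256)       # mid-level (hidden-layer) activations
visual_hidden = rng.random(128)

# 1) Decision fusion: combine single-modality posteriors, e.g. with a
#    weighted log-linear combination, then classify.
w = 0.7                              # audio weight; would be tuned on held-out data
fused_log = w * np.log(audio_scores) + (1 - w) * np.log(visual_scores)
decision_fusion_label = int(np.argmax(fused_log))

# 2) Feature fusion: concatenate mid-level features learned by the
#    single-modality DBNs and feed them to a joint classifier
#    (here just an untrained linear layer as a placeholder).
joint_features = np.concatenate([audio_hidden, visual_hidden])
W = rng.standard_normal((n_classes, joint_features.size)) * 0.01
feature_fusion_label = int(np.argmax(W @ joint_features))

print(decision_fusion_label, feature_fusion_label)
```

Decision fusion keeps the modalities independent until the final scores are combined; feature fusion lets a joint classifier exploit cross-modal structure in the mid-level representations, which is the novelty the abstract highlights.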

Proceedings Article
01 May 1988
TL;DR: An improved version of a previously described automatic lipreading system has been developed that uses vector quantization, dynamic time warping, and a new heuristic distance measure to improve acoustic speech recognition.
Abstract: Current acoustic speech recognition technology performs well with very small vocabularies in noise or with large vocabularies in very low noise. Accurate acoustic speech recognition in noise with vocabularies over 100 words has yet to be achieved. Humans frequently lipread the visible facial speech articulations to enhance speech recognition, especially when the acoustic signal is degraded by noise or hearing impairment. Automatic lipreading has been found to significantly improve acoustic speech recognition and could be advantageous in noisy environments such as offices, aircraft, and factories. An improved version of a previously described automatic lipreading system has been developed which uses vector quantization, dynamic time warping, and a new heuristic distance measure. This paper presents visual speech recognition results from multiple speakers under optimal conditions. Results from combined acoustic and visual speech recognition are also presented which show significantly improved performance compared to the acoustic recognition system alone.

181 citations
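Dynamic time warping is the template-matching step this lipreading system relies on, so a compact reference implementation may help. This is the textbook algorithm, not the paper's system: plain Euclidean frame distance stands in for the paper's heuristic distance measure, and the random arrays are placeholders for vector-quantized lip features.

```python
# Classic dynamic time warping between two sequences of feature vectors.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW alignment cost between sequences a (n x d) and b (m x d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j],           # insertion
                                 D[i, j - 1],           # deletion
                                 D[i - 1, j - 1])       # match
    return float(D[n, m])

template = np.random.rand(40, 8)   # stored visual template (40 frames)
probe = np.random.rand(55, 8)      # incoming utterance (different length)
print(dtw_distance(template, probe))
```

DTW is what lets a fixed template match an utterance spoken at a different rate, which matters for visual articulations at least as much as for audio.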

Journal Article
TL;DR: The group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and to pitch periodicity effects; it is modified to overcome these effects, and cepstral features extracted from it are called the modified group delay feature (MODGDF).
Abstract: Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal, which manifest as transitions in the phase spectrum, are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase for extracting speech features is to process the group delay function, which can be directly computed from the speech signal. The group delay function has been used in earlier efforts to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech owing to zeros that are close to the unit circle in the z-plane and also due to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks, namely speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed.

181 citations
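For context, the group delay function the paper builds on, and the shape of the modification, can be written out from standard signal-processing definitions. This is a sketch; the paper's exact parameterization of the MODGDF (the smoothing used for S(ω) and the exponents γ and α) may differ.

```latex
% Group delay, computable directly from the signal x[n]
% (Y is the Fourier transform of n * x[n]):
\tau(\omega) = -\frac{d}{d\omega}\,\arg X(\omega)
             = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{|X(\omega)|^{2}}

% The denominator explodes near zeros close to the unit circle, so it is
% replaced by a cepstrally smoothed spectrum S(\omega), and the dynamic
% range is compressed with empirically tuned exponents \gamma and \alpha:
\tilde{\tau}(\omega) = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{S(\omega)^{2\gamma}},
\qquad
\tau_m(\omega) = \frac{\tilde{\tau}(\omega)}{|\tilde{\tau}(\omega)|}\,
                 \bigl|\tilde{\tau}(\omega)\bigr|^{\alpha}
```

As the abstract notes, cepstral features are then extracted from the modified function τ_m(ω), analogous to how MFCCs are derived from the magnitude spectrum.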

Patent
Jung-Eun Kim, Jeong-Su Kim
16 Feb 2006
TL;DR: A user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to the user; the method includes calculating a confidence score for each recognition candidate according to the result of speech recognition.
Abstract: A user adaptive speech recognition method and apparatus is disclosed that controls user confirmation of a recognition candidate using a new threshold value adapted to a user. The user adaptive speech recognition method includes calculating a confidence score of a recognition candidate according to the result of speech recognition, setting a new threshold value adapted to the user based on a result of user confirmation of the recognition candidate and the confidence score of the recognition candidate, and outputting a corresponding recognition candidate as a result of the speech recognition if the calculated confidence score is higher than the new threshold value. Thus, the need for user confirmation of the result of speech recognition is reduced and the probability of speech recognition success is increased.

181 citations
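The confirmation loop in this patent amounts to a threshold test plus a per-user update rule. Here is a hypothetical Python sketch; the update rule, step size, and initial threshold are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of the adaptive-confirmation loop: the recognizer
# only asks the user to confirm when the confidence score falls below a
# per-user threshold, and the threshold is nudged using the outcome of
# past confirmations. Constants and update rule are assumptions.

class AdaptiveConfirmer:
    def __init__(self, threshold: float = 0.8, step: float = 0.02):
        self.threshold = threshold   # per-user threshold, adapted over time
        self.step = step             # adaptation step size (illustrative)

    def handle(self, candidate: str, confidence: float) -> str | None:
        if confidence >= self.threshold:
            return candidate                 # output without asking the user
        if self.ask_user(candidate):         # explicit user confirmation
            # User accepted a low-confidence result: the threshold was too
            # strict for this user, so relax it slightly.
            self.threshold = max(0.0, self.threshold - self.step)
            return candidate
        # User rejected the candidate: tighten the threshold.
        self.threshold = min(1.0, self.threshold + self.step)
        return None

    @staticmethod
    def ask_user(candidate: str) -> bool:
        return input(f"Did you say '{candidate}'? (y/n) ").strip() == "y"
```

Over time the threshold drifts toward a setting where confirmations are requested only when they are likely to change the outcome, which is how the abstract's claim of reduced user confirmation would play out.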


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Recurrent neural network: 29.2K papers, 890K citations (82% related)
Feature extraction: 111.8K papers, 2.1M citations (81% related)
Signal processing: 73.4K papers, 983.5K citations (81% related)
Decoding methods: 65.7K papers, 900K citations (79% related)
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2023    165
2022    468
2021    283
2020    475
2019    484
2018    420