scispace - formally typeset

Speaker recognition

About: Speaker recognition is a research topic. Over its lifetime, 14,990 publications have been published within this topic, receiving 310,061 citations.


Papers
Journal ArticleDOI
TL;DR: The new approach of phonetic feature bundling for modeling coarticulation in EMG-based speech recognition is described, and results are reported on the recently collected EMG-PIT corpus, a multiple-speaker, large-vocabulary database of silent and audible EMG speech recordings.

161 citations

Patent
03 Dec 2003
TL;DR: In this article, a fast on-line automatic speaker/environment adaptation suitable for speech/speaker recognition is presented as a system, method, and computer program product. The system consists of a computer system including a processor, a memory coupled with the processor, an input coupled with the processor for receiving acoustic signals, and an output coupled with the processor for outputting recognized words or sounds.
Abstract: A fast on-line automatic speaker/environment adaptation suitable for speech/speaker recognition system, method and computer program product are presented. The system comprises a computer system including a processor, a memory coupled with the processor, an input coupled with the processor for receiving acoustic signals, and an output coupled with the processor for outputting recognized words or sounds. The system includes a model-adaptation system and a recognition system, configured to accurately and efficiently recognize on-line distorted sounds or words spoken with different accents, in the presence of randomly changing environmental conditions. The model-adaptation system quickly adapts standard acoustic training models, available on audio recognition systems, by incorporating distortion parameters representative of the changing environmental conditions or the speaker's accent. By adapting models already available to the new environment, the system does not need separate adaptation training data.

161 citations
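The patent abstract above describes adapting standard acoustic models with distortion parameters so that no separate adaptation training data is needed, but gives no equations. A minimal illustrative sketch of one lightweight scheme in that spirit, global bias compensation of Gaussian means (the function names and the specific bias form are assumptions, not taken from the patent):

```python
import numpy as np

def adapt_means(model_means, adaptation_frames):
    """Shift pretrained Gaussian mean vectors by a single global bias
    estimated from a small amount of on-line adaptation data.

    A classic lightweight form of environment/accent adaptation
    (cepstral bias compensation): the standard models are reused and
    only an offset is estimated from incoming audio.
    """
    # Distortion estimate: offset between the incoming feature space
    # and the feature space the models were trained in.
    bias = adaptation_frames.mean(axis=0) - model_means.mean(axis=0)
    return model_means + bias

# Toy usage: 4 Gaussian means in a 3-dim feature space, 10 incoming frames.
means = np.zeros((4, 3))
frames = np.ones((10, 3))
adapted = adapt_means(means, frames)  # every mean shifted by the estimated bias
```

In practice the bias would be estimated per frame alignment rather than from raw averages, but the shape of the update is the same.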

Patent
Ilya Skuratovsky1
22 Nov 2005
TL;DR: A method of speech synthesis can include automatically identifying spoken passages and non-spoken passages within a text source and converting the text source to speech by applying different voice configurations according to whether each portion of text was identified as a spoken passage or not.
Abstract: A method of speech synthesis can include automatically identifying spoken passages and non-spoken passages within a text source and converting the text source to speech by applying different voice configurations to different portions of text within the text source according to whether each portion of text was identified as a spoken passage or a non-spoken passage. The method further can include identifying the speaker and/or the gender of the speaker and applying different voice configurations according to the speaker identity and/or speaker gender.

160 citations
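The abstract above does not specify how spoken passages are identified; a naive stand-in uses quotation marks as the cue for dialogue. A sketch under that assumption (helper names and the quote heuristic are illustrative, not from the patent):

```python
import re

def split_passages(text):
    """Split text into (is_spoken, passage) pairs, using double quotes
    as a naive cue for spoken dialogue."""
    parts = re.split(r'("[^"]*")', text)
    return [(p.startswith('"'), p) for p in parts if p.strip()]

def assign_voices(text, narrator_voice, dialogue_voice):
    """Pick a voice configuration per passage; a real system would hand
    each (voice, passage) pair to a TTS engine."""
    return [(dialogue_voice if spoken else narrator_voice, passage)
            for spoken, passage in split_passages(text)]

plan = assign_voices('He said, "Hello there." Then he left.',
                     narrator_voice="calm", dialogue_voice="bright")
```

Speaker- or gender-specific voices, as the claim mentions, would extend the second function with a lookup from identified speaker to voice configuration.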

Proceedings ArticleDOI
19 Apr 1994
TL;DR: A new method of creating speaker-specific phoneme models is proposed: it uses speaker-independent phoneme models consisting of tied-mixture HMMs and adapts the feature space of the tied mixtures to that of the speaker through phoneme-dependent/independent iterative training.
Abstract: Speaker adaptation methods for tied-mixture-based phoneme models are investigated for text-prompted speaker recognition. For this type of speaker recognition, speaker-specific phoneme models are essential for verifying both the key text and the speaker. This paper proposes a new method of creating speaker-specific phoneme models. This uses speaker-independent (universal) phoneme models consisting of tied-mixture HMMs and adapts the feature space of the tied-mixtures to that of the speaker through phoneme-dependent/independent iterative training. Therefore, it can adapt models of phonemes that have a small amount of training data to the speaker. The proposed method was tested using 15 speakers' voices recorded over 10 months and achieved a speaker and text verification rate of 99.4% even when both the voices of different speakers and different texts uttered by the true speaker were to be rejected.

160 citations
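The paper's adaptation is an iterative phoneme-dependent/independent procedure over HMM states; as a much-simplified sketch of the core idea, one soft-assignment step that interpolates a shared Gaussian codebook toward a speaker's feature space (the interpolation form and all names are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def adapt_codebook(means, frames, weight=0.5):
    """One step moving a shared tied-mixture codebook toward a speaker.

    Frames are soft-assigned to codebook Gaussians (softmax over negative
    squared distance, i.e. unit-variance responsibilities); each mean is
    then interpolated between its universal value and the weighted average
    of its assigned frames.
    """
    d = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    resp = np.exp(-d)
    resp /= resp.sum(axis=1, keepdims=True)
    counts = resp.sum(axis=0)                    # soft frame count per codeword
    speaker_means = (resp.T @ frames) / counts[:, None]
    return (1 - weight) * means + weight * speaker_means

universal = np.array([[0.0], [10.0]])            # two 1-dim codewords
speaker_frames = np.array([[0.1], [9.9], [0.0]])
adapted = adapt_codebook(universal, speaker_frames)
```

Because the codebook is shared across all phoneme models, shifting it adapts even phonemes with little speaker data, which is the property the abstract highlights.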

Proceedings ArticleDOI
30 Jul 2000
TL;DR: A method of automatically detecting a talking person from video together with audio from a single microphone is described; the audio-visual correlation is learned with a time-delayed neural network and then used in a spatio-temporal search for the speaking person.
Abstract: The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speech-reading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned using a time-delayed neural network, which is then used to perform a spatio-temporal search for a speaking person. Applications include videoconferencing, video indexing and improving human-computer interaction (HCI). An example HCI application is provided.

160 citations
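The paper above learns the audio-visual correlation with a time-delayed neural network; a far simpler stand-in, normalized cross-correlation between an audio-energy track and per-region motion signals, illustrates the spatial search for the speaker (the correlation stand-in and all names are assumptions, not the paper's method):

```python
import numpy as np

def av_correlation(audio_energy, region_motion):
    """Normalized correlation between the audio-energy track and one
    candidate image region's motion signal over the same time window."""
    a = audio_energy - audio_energy.mean()
    v = region_motion - region_motion.mean()
    denom = np.sqrt((a * a).sum() * (v * v).sum())
    return float((a * v).sum() / denom) if denom else 0.0

def locate_speaker(audio_energy, region_motions):
    """Spatial search: return the index of the candidate region whose
    motion best tracks the audio, plus all region scores."""
    scores = [av_correlation(audio_energy, m) for m in region_motions]
    return int(np.argmax(scores)), scores

audio = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
mouth = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])       # moves with the audio
background = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])  # anti-correlated motion
idx, scores = locate_speaker(audio, [mouth, background])
```

The TDNN in the paper plays the role of this scoring function while also handling the time delay between lip motion and sound, which plain correlation ignores.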


Network Information

Related Topics (5)
- Feature vector: 48.8K papers, 954.4K citations (83% related)
- Recurrent neural network: 29.2K papers, 890K citations (82% related)
- Feature extraction: 111.8K papers, 2.1M citations (81% related)
- Signal processing: 73.4K papers, 983.5K citations (81% related)
- Decoding methods: 65.7K papers, 900K citations (79% related)
Performance Metrics

No. of papers in the topic in previous years:

Year | Papers
2023 | 165
2022 | 468
2021 | 283
2020 | 475
2019 | 484
2018 | 420