Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime of the topic, 14,245 publications have been published, receiving 271,964 citations.


Papers
Journal ArticleDOI
TL;DR: The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding.
Abstract: Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners’ knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults were also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca’s area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding.

187 citations

Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.
Abstract: Effective speech activity detection (SAD) is a necessary first step for robust speech applications. In this letter, we propose a robust and unsupervised SAD solution that leverages four different speech voicing measures combined with a perceptual spectral flux feature, for audio-based surveillance and monitoring applications. Effectiveness of the proposed technique is evaluated and compared against several commonly adopted unsupervised SAD methods under simulated and actual harsh acoustic conditions with varying distortion levels. Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.

186 citations
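The abstract above outlines the approach (four speech voicing measures fused with a perceptual spectral flux feature) without giving the measures themselves, so the following is only a minimal sketch of an unsupervised, frame-level SAD decision in that spirit. The single autocorrelation voicing proxy, the spectral flux definition, the frame sizes, and the threshold are all assumptions made for illustration, not the paper's method.

import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Slice the waveform into overlapping frames (25 ms frames, 10 ms hop at 16 kHz).
    # Assumes len(x) >= frame_len.
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def spectral_flux(frames):
    # Rectified frame-to-frame increase of the magnitude spectrum.
    mag = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    diff = np.diff(mag, axis=0, prepend=mag[:1])
    return np.maximum(diff, 0.0).sum(axis=1)

def voicing_score(frames, fs=16000, fmin=60, fmax=400):
    # Normalized autocorrelation peak in the plausible pitch-lag range;
    # a crude stand-in for the paper's four voicing measures.
    scores = []
    for f in frames:
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        lo, hi = fs // fmax, fs // fmin
        scores.append(ac[lo:hi].max() / (ac[0] + 1e-12))
    return np.array(scores)

def detect_speech(x, fs=16000, threshold=0.4):
    # Fuse the two normalized features and threshold per frame.
    frames = frame_signal(x)
    flux = spectral_flux(frames)
    voic = voicing_score(frames, fs)
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-12)
    score = 0.5 * norm(flux) + 0.5 * norm(voic)
    return score > threshold  # boolean speech/non-speech decision per frame

In practice the fused score would be smoothed over time and the threshold adapted to the noise level; the fixed 0.4 here is purely illustrative.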

PatentDOI
Steven G. Woodward
TL;DR: In this patent, a method for processing a misrecognition error in an embedded speech recognition system during a speech recognition session can include the step of speech-to-text converting audio input in the embedded speech recognition system based on an active language model.
Abstract: A method for processing a misrecognition error in an embedded speech recognition system during a speech recognition session can include the step of speech-to-text converting audio input in the embedded speech recognition system based on an active language model. The speech-to-text conversion can produce speech recognized text that can be presented through a user interface. A user-initiated misrecognition error notification can be detected. The audio input and a reference to the active language model can be provided to a speech recognition system training process associated with the embedded speech recognition system.

186 citations
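The patent abstract describes a control flow rather than a concrete API, so the sketch below merely restates that flow in illustrative Python: recognition against the active language model, and, on a user-initiated misrecognition notification, forwarding the audio together with a reference to that language model to a training process. Every class and method name here is invented for the example; the patent defines no such interface.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class TrainingQueue:
    # Collects (audio, language-model reference) pairs for later adaptation.
    items: List[Tuple[bytes, str]] = field(default_factory=list)

    def submit(self, audio: bytes, language_model_id: str) -> None:
        self.items.append((audio, language_model_id))

@dataclass
class EmbeddedRecognizer:
    active_language_model: str
    decode: Callable[[bytes], str]     # speech-to-text backend (assumed)
    training_queue: TrainingQueue

    def recognize(self, audio: bytes) -> str:
        # Speech-to-text conversion against the active language model;
        # the recognized text would be presented through the user interface.
        return self.decode(audio)

    def on_misrecognition(self, audio: bytes) -> None:
        # The user flagged the recognized text as wrong: forward the audio
        # and a reference to the active language model to the training process.
        self.training_queue.submit(audio, self.active_language_model)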

Patent
28 Mar 2007
TL;DR: In this patent, an audio signal decorrelator for deriving an output audio signal from an input audio signal has a frequency analyzer for extracting from the input audio signal a first partial signal descriptive of the audio content in a first audio frequency range and a second partial signal describing the audio content in a second audio frequency range with higher frequencies compared to the first frequency range.
Abstract: An audio signal decorrelator for deriving an output audio signal from an input audio signal has a frequency analyzer for extracting from the input audio signal a first partial signal descriptive of an audio content in a first audio frequency range and a second partial signal descriptive of an audio content in a second audio frequency range having higher frequencies compared to the first audio frequency range. A partial signal modifier modifies the first and second partial signals, to obtain first and second processed partial signals, so that a modulation amplitude of a time variant phase shift or time variant delay applied to the first partial signal is higher than that applied to the second partial signal, or modifies only the first partial signal. A signal combiner combines the first and second processed partial signals, or combines the first processed partial signal and the second partial signal, to obtain the output audio signal.

185 citations
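As a rough, hypothetical sketch of the two-band idea in the abstract above, the code below splits each analysis frame at an assumed crossover frequency and applies a sinusoidal, time-variant phase shift whose modulation amplitude is larger in the lower band, then recombines the bands by overlap-add resynthesis. The crossover, modulation rate, and phase amplitudes are arbitrary illustrative values, not parameters from the patent.

import numpy as np

def decorrelate(x, fs=44100, crossover_hz=1000.0, frame=1024, hop=512,
                mod_rate_hz=1.5, low_mod_rad=0.6, high_mod_rad=0.1):
    # Windowed overlap-add processing with a band-dependent, time-variant
    # phase shift: bins below the crossover get the larger modulation.
    win = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    low_band = freqs < crossover_hz
    y = np.zeros(len(x) + frame)
    norm = np.zeros(len(x) + frame)
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    for i in range(n_frames):
        start = i * hop
        spec = np.fft.rfft(x[start:start + frame] * win)
        # Sinusoidal phase modulation with a larger swing in the lower band.
        phase = np.where(low_band, low_mod_rad, high_mod_rad) * np.sin(
            2.0 * np.pi * mod_rate_hz * start / fs)
        spec = spec * np.exp(1j * phase)
        y[start:start + frame] += np.fft.irfft(spec, n=frame) * win
        norm[start:start + frame] += win ** 2
    return y[:len(x)] / np.maximum(norm[:len(x)], 1e-8)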

Proceedings ArticleDOI
07 Apr 1986
TL;DR: A new method is presented for text-to-speech synthesis using diphones, based on a representation of the speech signal by its short-time Fourier transform at a pitch-synchronous sampling rate.
Abstract: A new method is presented for text-to-speech synthesis using diphones. The diphone database consists of the diphone waveforms labeled with pitch-marks indicating the pitch-periods. At synthesis time, the diphone waveforms are processed through a new analysis-synthesis system, providing independent control of all prosodic parameters while retaining a good degree of naturalness. This system is based on a representation of the speech signal by its short-time Fourier transform (STFT) at a pitch-synchronous sampling rate. The synthesis part of the system works by overlap-adding the modified short-term signals, ensuring a smooth concatenation of the diphone waveforms. The synthetic speech obtained by this method sounds more natural than that produced by the conventional LPC method.

184 citations
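The paper's analysis-synthesis system operates on pitch-synchronous short-time Fourier transforms; the sketch below only illustrates the final overlap-add concatenation step in a simplified time-domain form, assuming pitch marks are already available for the waveform. It is not the FFT-domain method of the paper.

import numpy as np

def overlap_add_resynthesis(x, pitch_marks, pitch_scale=1.0):
    # x: waveform; pitch_marks: increasing sample indices of pitch periods
    # (assumed given); pitch_scale > 1 raises the pitch by packing the
    # windowed two-period grains closer together in the output.
    pitch_marks = np.asarray(pitch_marks, dtype=int)
    if len(pitch_marks) < 3:
        return np.copy(x)
    margin = int(np.diff(pitch_marks).max())
    y = np.zeros(int(len(x) / pitch_scale) + 2 * margin + 1)
    out_pos = float(pitch_marks[1])
    for k in range(1, len(pitch_marks) - 1):
        left = pitch_marks[k] - pitch_marks[k - 1]
        right = pitch_marks[k + 1] - pitch_marks[k]
        grain = x[pitch_marks[k] - left : pitch_marks[k] + right]
        grain = grain * np.hanning(len(grain))
        start = int(round(out_pos)) - left
        if 0 <= start and start + len(grain) <= len(y):
            y[start:start + len(grain)] += grain
        out_pos += right / pitch_scale
    return y

Placing the windowed grains closer together raises the fundamental frequency and spacing them further apart lowers it, which is the kind of independent prosodic control the abstract describes.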


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 38
2022: 84
2021: 70
2020: 62
2019: 77
2018: 108