Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime of the topic, 14,245 publications have been published, receiving 271,964 citations.


Papers
Journal ArticleDOI
TL;DR: The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding.
Abstract: Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners’ knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults were also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca’s area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding.

187 citations

Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.
Abstract: Effective speech activity detection (SAD) is a necessary first step for robust speech applications. In this letter, we propose a robust and unsupervised SAD solution that leverages four different speech voicing measures combined with a perceptual spectral flux feature, for audio-based surveillance and monitoring applications. Effectiveness of the proposed technique is evaluated and compared against several commonly adopted unsupervised SAD methods under simulated and actual harsh acoustic conditions with varying distortion levels. Experimental results indicate that the proposed SAD scheme is highly effective and provides superior and consistent performance across various noise types and distortion levels.

186 citations
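The abstract above outlines the approach (four speech voicing measures fused with a perceptual spectral flux feature) without giving the measures themselves, so the following is only a minimal sketch of an unsupervised, frame-level SAD decision in that spirit. The single autocorrelation voicing proxy, the spectral flux definition, the frame sizes, and the threshold are all assumptions made for illustration, not the paper's method.

import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Slice the waveform into overlapping frames (25 ms frames, 10 ms hop at 16 kHz).
    # Assumes len(x) >= frame_len.
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def spectral_flux(frames):
    # Rectified frame-to-frame increase of the magnitude spectrum.
    mag = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    diff = np.diff(mag, axis=0, prepend=mag[:1])
    return np.maximum(diff, 0.0).sum(axis=1)

def voicing_score(frames, fs=16000, fmin=60, fmax=400):
    # Normalized autocorrelation peak in the plausible pitch-lag range;
    # a crude stand-in for the paper's four voicing measures.
    scores = []
    for f in frames:
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        lo, hi = fs // fmax, fs // fmin
        scores.append(ac[lo:hi].max() / (ac[0] + 1e-12))
    return np.array(scores)

def detect_speech(x, fs=16000, threshold=0.4):
    # Fuse the two normalized features and threshold per frame.
    frames = frame_signal(x)
    flux = spectral_flux(frames)
    voic = voicing_score(frames, fs)
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-12)
    score = 0.5 * norm(flux) + 0.5 * norm(voic)
    return score > threshold  # boolean speech/non-speech decision per frame

In practice the fused score would be smoothed over time and the threshold adapted to the noise level; the fixed 0.4 here is purely illustrative.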

PatentDOI
Steven G. Woodward
TL;DR: In this patent, a method for processing a misrecognition error in an embedded speech recognition system during a speech recognition session can include the step of speech-to-text converting audio input in the embedded speech recognition system based on an active language model.
Abstract: A method for processing a misrecognition error in an embedded speech recognition system during a speech recognition session can include the step of speech-to-text converting audio input in the embedded speech recognition system based on an active language model. The speech-to-text conversion can produce speech recognized text that can be presented through a user interface. A user-initiated misrecognition error notification can be detected. The audio input and a reference to the active language model can be provided to a speech recognition system training process associated with the embedded speech recognition system.

186 citations
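The patent abstract describes a control flow rather than a concrete API, so the sketch below merely restates that flow in illustrative Python: recognition against the active language model, and, on a user-initiated misrecognition notification, forwarding the audio together with a reference to that language model to a training process. Every class and method name here is invented for the example; the patent defines no such interface.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class TrainingQueue:
    # Collects (audio, language-model reference) pairs for later adaptation.
    items: List[Tuple[bytes, str]] = field(default_factory=list)

    def submit(self, audio: bytes, language_model_id: str) -> None:
        self.items.append((audio, language_model_id))

@dataclass
class EmbeddedRecognizer:
    active_language_model: str
    decode: Callable[[bytes], str]     # speech-to-text backend (assumed)
    training_queue: TrainingQueue

    def recognize(self, audio: bytes) -> str:
        # Speech-to-text conversion against the active language model;
        # the recognized text would be presented through the user interface.
        return self.decode(audio)

    def on_misrecognition(self, audio: bytes) -> None:
        # The user flagged the recognized text as wrong: forward the audio
        # and a reference to the active language model to the training process.
        self.training_queue.submit(audio, self.active_language_model)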

Patent
28 Mar 2007
TL;DR: In this patent, an audio signal decorrelator for deriving an output audio signal from an input audio signal has a frequency analyzer for extracting from the input audio signal a first partial signal descriptive of the audio content in a first audio frequency range and a second partial signal describing the audio content in a second audio frequency range with higher frequencies compared to the first frequency range.
Abstract: An audio signal decorrelator for deriving an output audio signal from an input audio signal has a frequency analyzer for extracting from the input audio signal a first partial signal descriptive of an audio content in a first audio frequency range and a second partial signal descriptive of an audio content in a second audio frequency range having higher frequencies compared to the first audio frequency range. A partial signal modifier modifies the first and second partial signals, to obtain first and second processed partial signals, so that a modulation amplitude of a time variant phase shift or time variant delay applied to the first partial signal is higher than that applied to the second partial signal, or modifies only the first partial signal. A signal combiner combines the first and second processed partial signals, or combines the first processed partial signal and the second partial signal, to obtain the output audio signal.

185 citations
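As a rough, hypothetical sketch of the two-band idea in the abstract above, the code below splits each analysis frame at an assumed crossover frequency and applies a sinusoidal, time-variant phase shift whose modulation amplitude is larger in the lower band, then recombines the bands by overlap-add resynthesis. The crossover, modulation rate, and phase amplitudes are arbitrary illustrative values, not parameters from the patent.

import numpy as np

def decorrelate(x, fs=44100, crossover_hz=1000.0, frame=1024, hop=512,
                mod_rate_hz=1.5, low_mod_rad=0.6, high_mod_rad=0.1):
    # Windowed overlap-add processing with a band-dependent, time-variant
    # phase shift: bins below the crossover get the larger modulation.
    win = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    low_band = freqs < crossover_hz
    y = np.zeros(len(x) + frame)
    norm = np.zeros(len(x) + frame)
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    for i in range(n_frames):
        start = i * hop
        spec = np.fft.rfft(x[start:start + frame] * win)
        # Sinusoidal phase modulation with a larger swing in the lower band.
        phase = np.where(low_band, low_mod_rad, high_mod_rad) * np.sin(
            2.0 * np.pi * mod_rate_hz * start / fs)
        spec = spec * np.exp(1j * phase)
        y[start:start + frame] += np.fft.irfft(spec, n=frame) * win
        norm[start:start + frame] += win ** 2
    return y[:len(x)] / np.maximum(norm[:len(x)], 1e-8)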

Proceedings ArticleDOI
07 Apr 1986
TL;DR: A new method is presented for text-to-speech synthesis using diphones, based on a representation of the speech signal by its short-time Fourier transform at a pitch-synchronous sampling rate.
Abstract: A new method is presented for text-to-speech synthesis using diphones. The diphone database consists of the diphone waveforms labeled with pitch-marks indicating the pitch-periods. At synthesis time, the diphone waveforms are processed through a new analysis-synthesis system, providing independent control of all prosodic parameters while retaining a good degree of naturalness. This system is based on a representation of the speech signal by its short-time Fourier transform (STFT) at a pitch-synchronous sampling rate. The synthesis part of the system works by overlap-adding the modified short-term signals, ensuring a smooth concatenation of the diphone waveforms. The synthetic speech obtained by this method sounds more natural than that produced by the conventional LPC method.

184 citations
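The paper's analysis-synthesis system operates on pitch-synchronous short-time Fourier transforms; the sketch below only illustrates the final overlap-add concatenation step in a simplified time-domain form, assuming pitch marks are already available for the waveform. It is not the FFT-domain method of the paper.

import numpy as np

def overlap_add_resynthesis(x, pitch_marks, pitch_scale=1.0):
    # x: waveform; pitch_marks: increasing sample indices of pitch periods
    # (assumed given); pitch_scale > 1 raises the pitch by packing the
    # windowed two-period grains closer together in the output.
    pitch_marks = np.asarray(pitch_marks, dtype=int)
    if len(pitch_marks) < 3:
        return np.copy(x)
    margin = int(np.diff(pitch_marks).max())
    y = np.zeros(int(len(x) / pitch_scale) + 2 * margin + 1)
    out_pos = float(pitch_marks[1])
    for k in range(1, len(pitch_marks) - 1):
        left = pitch_marks[k] - pitch_marks[k - 1]
        right = pitch_marks[k + 1] - pitch_marks[k]
        grain = x[pitch_marks[k] - left : pitch_marks[k] + right]
        grain = grain * np.hanning(len(grain))
        start = int(round(out_pos)) - left
        if 0 <= start and start + len(grain) <= len(y):
            y[start:start + len(grain)] += grain
        out_pos += right / pitch_scale
    return y

Placing the windowed grains closer together raises the fundamental frequency and spacing them further apart lowers it, which is the kind of independent prosodic control the abstract describes.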


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 38
2022: 84
2021: 70
2020: 62
2019: 77
2018: 108