Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published on this topic, receiving 271,964 citations.


Papers
Patent
Gilad Cohen1, Yossef Cohen1, Doron Hoffman1, Hagai Krupnik1, Aharon Satt1 
04 Mar 1998
TL;DR: In this paper, a method is presented for adaptively switching between a transform audio coder and a CELP coder, which makes use of the superior performance of CELP coders for speech signal coding while enjoying the benefits of the transform coder for other audio signals.
Abstract: Apparatus is described for digitally encoding an input audio signal for storage or transmission. A distinguishing parameter is measured from the input signal. It is determined from the measured distinguishing parameter whether the input signal contains an audio signal of a first type or a second type. First and second coders are provided for digitally encoding the input signal using first and second coding methods respectively, and a switching arrangement directs, at any particular time, the generation of an output signal by encoding the input signal using either the first or second coder according to whether the input signal contains an audio signal of the first type or the second type at that time. A method for adaptively switching between a transform audio coder and a CELP coder is presented. In a preferred embodiment, the method makes use of the superior performance of CELP coders for speech signal coding, while enjoying the benefits of the transform coder for other audio signals. The combined coder is designed to handle both speech and music and to achieve improved quality.

148 citations
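The switching scheme in this abstract is concrete enough to sketch. The Python below is a minimal, hedged illustration, not the patented method: the patent does not name the distinguishing parameter, so zero-crossing-rate variance stands in as the speech/music discriminator, and the two coders are placeholder callables.

```python
# Hedged sketch of adaptive CELP/transform coder switching. The
# distinguishing parameter (ZCR variance), threshold, and encoder
# interfaces are assumptions, not details from the patent.
import numpy as np

def zcr(frame: np.ndarray) -> float:
    """Fraction of sign changes in one frame (a crude voicing cue)."""
    return float(np.mean(np.abs(np.diff(np.signbit(frame).astype(int)))))

def classify_frame_block(frames: np.ndarray, threshold: float = 0.02) -> str:
    """Label a block of frames as speech or music from ZCR variance.
    The threshold is illustrative, not tuned."""
    variance = float(np.var([zcr(f) for f in frames]))
    return "speech" if variance > threshold else "music"

def encode(frames, celp_encoder, transform_encoder):
    """Route each block of frames to the coder suited to its content."""
    kind = classify_frame_block(frames)
    coder = celp_encoder if kind == "speech" else transform_encoder
    return kind, coder(frames)
```

The variance (rather than the mean) is thresholded because speech alternates between voiced and unvoiced segments, so its zero-crossing rate fluctuates more than that of most music.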

Patent
Michael M. Lee1
02 Apr 2008
TL;DR: In this paper, the authors present a system for altering an audio output so that, when played back, a recorded audio track sounds as if a different person had recorded it.
Abstract: Methods, systems and computer readable media for altering an audio output are provided. In some embodiments, the system may change the original frequency content of an audio data file to a second frequency content so that a recorded audio track will sound as if a different person had recorded it when it is played back. In other embodiments, the system may receive an audio data file and a voice signature, and it may apply the voice signature to the audio data file to alter the audio output of the audio data file. In that instance, the audio data file may be a textual representation of a recorded audio data file.

148 citations
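As a rough illustration of "changing the original frequency content" of an audio file, the sketch below performs a naive resample-based pitch shift. The algorithm, the semitone parameterization, and the use of scipy are assumptions; the patent does not disclose a specific transform, and a practical implementation would preserve duration (e.g., with a phase vocoder).

```python
# Naive pitch shift by resampling: a hedged stand-in only. It raises or
# lowers pitch but also changes duration, unlike a production system.
import numpy as np
from scipy.signal import resample

def shift_pitch_naive(audio: np.ndarray, semitones: float) -> np.ndarray:
    """Return audio whose pitch is shifted by `semitones` when played
    back at the original sample rate (duration shifts inversely)."""
    factor = 2.0 ** (semitones / 12.0)        # frequency ratio per semitone
    new_len = max(1, int(round(len(audio) / factor)))
    return resample(audio, new_len)           # FFT-based resampling
```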

Patent
TL;DR: In this paper, a method of producing synthetic visual speech is presented: an input containing speech information is received, and one or more visemes that correspond to the speech input are then identified.
Abstract: A method of producing synthetic visual speech according to this invention includes receiving an input containing speech information. One or more visemes that correspond to the speech input are then identified. Next, the weights of those visemes are calculated using a coarticulation engine including viseme deformability information. Finally, a synthetic visual speech output is produced based on the visemes' weights over time (or tracks). The synthetic visual speech output is combined with a synchronized audio output corresponding to the input to produce a multimedia output containing a 3D lipsyncing animation.

148 citations
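The viseme-weighting step lends itself to a short sketch. In the Python below, the Gaussian activation window, the deformability scaling, and all parameter names are illustrative assumptions; the patent's coarticulation engine is not described in enough detail to reproduce.

```python
# Hedged sketch of blending weighted visemes into a mouth-shape track.
# Shapes, deformability values, and the Gaussian window are assumptions.
import numpy as np

def viseme_track(times, events, shapes, deformability, width=0.06):
    """times: sample instants (s); events: list of (time, viseme_id);
    shapes: viseme_id -> mouth-shape parameter vector;
    deformability: viseme_id -> scalar in (0, 1] limiting how strongly
    a viseme asserts itself against its neighbors (coarticulation)."""
    dim = len(next(iter(shapes.values())))
    track = np.zeros((len(times), dim))
    for i, t in enumerate(times):
        total, blend = 0.0, np.zeros(dim)
        for t_v, v in events:
            # Each viseme's influence decays with its distance in time.
            w = deformability[v] * np.exp(-((t - t_v) ** 2) / (2.0 * width**2))
            blend += w * np.asarray(shapes[v], dtype=float)
            total += w
        track[i] = blend / total if total > 1e-9 else blend
    return track
```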

Journal ArticleDOI
TL;DR: A new objective estimation approach is described that uses a simple but effective perceptual transformation together with a distance measure consisting of a hierarchy of measuring normalizing blocks, which reflects the magnitude of the perceived distance between two perceptually transformed signals.
Abstract: Perceived speech quality is most directly measured by subjective listening tests. These tests are often slow and expensive, and numerous attempts have been made to supplement them with objective estimators of perceived speech quality. These attempts have found limited success, primarily in analog and higher-rate, error-free digital environments where speech waveforms are preserved or nearly preserved. The objective estimation of the perceived quality of highly compressed digital speech, possibly with bit errors or frame erasures, has remained an open question. We report our findings regarding two essential components of objective estimators of perceived speech quality: perceptual transformations and distance measures. A perceptual transformation modifies a representation of an audio signal in a way that is approximately equivalent to the human hearing process. A distance measure reflects the magnitude of a perceived distance between two perceptually transformed signals. We then describe a new objective estimation approach that uses a simple but effective perceptual transformation and a distance measure that consists of a hierarchy of measuring normalizing blocks. Each measuring normalizing block integrates two perceptually transformed signals over some time or frequency interval to determine the average difference across that interval. This difference is then normalized out of one signal, and is further processed to generate one or more measurements.

147 citations
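The measuring normalizing block itself is described precisely enough to sketch. The Python below implements one such step as the abstract states it: average the difference between two perceptually transformed signals over each interval, normalize that difference out of one signal, and keep it as a measurement. The flat segmentation is an assumption; the published approach applies a hierarchy of such blocks over multiple time and frequency scales.

```python
# Hedged sketch of one measuring normalizing block (MNB) step. The
# segmentation scheme and signal representation are assumptions; the
# paper's hierarchy of time and frequency scales is reduced to one loop.
import numpy as np

def mnb_step(ref: np.ndarray, test: np.ndarray, n_segments: int):
    """ref, test: perceptually transformed signals (e.g., log-band
    energies) of equal length. Returns the normalized test signal and
    one measurement per segment (the average difference removed)."""
    ref = ref.astype(float)
    out = test.astype(float).copy()
    measurements = []
    for seg in np.array_split(np.arange(len(ref)), n_segments):
        avg_diff = float(np.mean(out[seg] - ref[seg]))
        out[seg] -= avg_diff            # normalize the difference out
        measurements.append(avg_diff)   # keep it as a measurement
    return out, measurements
```

The residual left in `out` after each pass is what finer-scale blocks would operate on, which is how the hierarchical version separates broad spectral tilt from localized distortions.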

Journal ArticleDOI
TL;DR: Group differences in performance were significant in 4 conditions: vowel identification, difficult sentence material at +5 dB and +10 dB SNR, and a measure that quantified performance in noise and low input levels relative to performance in quiet.
Abstract:
Objective: To determine if subjects who used different cochlear implant devices and who were matched on consonant-vowel-consonant (CNC) identification in quiet would show differences in performance on speech-based tests of spectral and temporal resolution, speech understanding in noise, or speech understanding at low sound levels.
Design: The performance of 15 subjects fit with the CII Bionic Ear System (CII Bionic Ear behind-the-ear speech processor with the Hi-Resolution sound processing strategy; Advanced Bionics Corporation) was compared with the performance of 15 subjects fit with the Nucleus 24 electrode array and ESPrit 3G behind-the-ear speech processor with the Advanced Combination Encoder speech coding strategy (Cochlear Corporation).
Subjects: Thirty adults with late-onset deafness and above-average speech perception abilities who used cochlear implants.
Main Outcome Measures: Vowel recognition, consonant recognition, sentences in quiet (74, 64, and 54 dB SPL [sound pressure level]) and in noise (+10 and +5 dB SNR [signal-to-noise ratio]), voice discrimination, and melody recognition.
Results: Group differences in performance were significant in 4 conditions: vowel identification, difficult sentence material at +5 dB and +10 dB SNR, and a measure that quantified performance in noise and low input levels relative to performance in quiet.
Conclusions: We have identified tasks on which there are between-group differences in performance for subjects matched on CNC word scores in quiet. We suspect that the differences in performance are due to differences in signal processing. Our next goal is to uncover the signal processing attributes of the speech processors that are responsible for the differences in performance.

147 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations (86% related)
Decoding methods: 65.7K papers, 900K citations (84% related)
Fading: 55.4K papers, 1M citations (80% related)
Feature vector: 48.8K papers, 954.4K citations (80% related)
Feature extraction: 111.8K papers, 2.1M citations (80% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108