Topic

Speech coding

About: Speech coding is a research topic. Over the lifetime, 14245 publications have been published within this topic receiving 271964 citations.


Papers
Patent
Craig L. Reding, Suzi Levas
30 Dec 2011
TL;DR: A shared speech processing facility supports speech recognition for a wide variety of devices with limited capabilities, including business computer systems and personal data assistants, which are coupled to the facility via a communications channel, e.g., the Internet.
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control and/or other services are provided by the speech processing facility in response to speech recognition results.

58 citations

Patent
TL;DR: In this article, the authors propose a method for processing a series of input audio signals representing virtual audio sound sources to produce a reduced set of audio output signals for playback over speaker devices placed around a listener.
Abstract: A method of processing a series of input audio signals representing a series of virtual audio sound sources placed at predetermined positions around a listener to produce a reduced set of audio output signals for playback over speaker devices placed around a listener, the method comprising the steps of: (a) for each of the input audio signals and for each of the audio output signals: (i) convolving the input audio signals with an initial head portion of a corresponding impulse response mapping substantially the initial sound and early reflections for an impulse response of a corresponding virtual audio source to a corresponding speaker device so as to form a series of initial responses; (b) for each of the input audio signals and for each of the audio output signals: (i) forming a combined mix from the audio input signals; (ii) forming a combined convolution tail from the tails of the corresponding impulse responses; and (iii) convolving the combined mix with the combined convolution tail to form a combined tail response; (c) for each of the audio output signals: (i) combining a corresponding series of initial responses and a corresponding combined tail response to form the audio output signal.
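The head/tail split in steps (a) to (c) can be sketched for a single output channel as follows. This is a minimal illustration, not the patented implementation: the function names, the choice of summing the inputs to form the mix, and the choice of averaging the tails to form the combined tail are all assumptions made here, since the abstract does not say how the combinations are formed.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_into(acc, y, offset=0):
    """Add y into acc starting at index offset, growing acc if needed."""
    need = offset + len(y)
    if need > len(acc):
        acc.extend([0.0] * (need - len(acc)))
    for k, v in enumerate(y):
        acc[offset + k] += v
    return acc


def render_output(inputs, responses, head_len):
    """Approximate sum_i conv(inputs[i], responses[i]) for one output channel.

    inputs[i] is one virtual source signal and responses[i] is its impulse
    response to this speaker (hypothetical layout, assumed for illustration).
    """
    out = []
    # (a) every source is convolved with the head of its own response
    for x, h in zip(inputs, responses):
        add_into(out, convolve(x, h[:head_len]))
    # (b)(i) combined mix: here, the sample-wise sum of the inputs (assumption)
    n = max(len(x) for x in inputs)
    mix = [sum(x[k] for x in inputs if k < len(x)) for k in range(n)]
    # (b)(ii) combined tail: here, the sample-wise mean of the tails (assumption)
    tails = [h[head_len:] for h in responses]
    t = max(len(tl) for tl in tails)
    combined_tail = [
        sum(tl[k] for tl in tails if k < len(tl)) / len(tails) for k in range(t)
    ]
    # (b)(iii) and (c): a single tail convolution, delayed by head_len samples
    if combined_tail:
        add_into(out, convolve(mix, combined_tail), offset=head_len)
    return out
```

With these choices the tail costs one convolution per output instead of one per source. The result matches the full per-source convolution exactly only when all tails are identical; otherwise it is an approximation of the late part of the response.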

58 citations

01 Jun 1998
TL;DR: This thesis focuses on the theory, analysis, and algorithm aspects of signal subspace methods used for speech enhancement in digital speech processing and illustrates the power and robustness of the subspace approach.
Abstract: This thesis focuses on the theory, analysis, and algorithm aspects of signal subspace methods used for speech enhancement in digital speech processing. The problem is approached by initially performing an analysis of subspace principles applied to speech signals in order to characterize the usefulness of defining a signal subspace for this application. The theory is formulated by means of the singular value decomposition or the eigendecomposition, and subspace methods are linked to filtering in the frequency domain. Nonparametric speech enhancement using linear signal-subspace-based estimation of the clean signal from the noisy signal is reviewed, and connections between existing algorithms and the literature are explored. An analysis of the practical behavior of the estimators is given, and aspects regarding their performance in the case with prewhitening are covered. The relation to the popular spectral subtraction approach is discussed, and the origin of the musical noise is pointed out. A possible way to reduce the latter is devised. In the noisy case, model-based estimation is a nonlinear problem which is normally solved by iterative techniques. However, a new idea based on multi-microphone inverse filtering is presented, where the solution is obtained by subspace methods. The algorithm aspects of signal subspace methods are discussed in terms of the rank-revealing ULV/ULLV decompositions, which are numerically stable and can be cheaply updated when a new data sample is present. The potential of the decompositions when applied to speech problems is analyzed, and different estimation strategies are suggested. Again, the practical behavior of the estimators is analyzed. A recursive ULLV algorithm for a so-called sliding window estimation is presented, which is new in its complete treatment and implementation. Many aspects of the algorithm are discussed in detail, and important considerations are pointed out. Both the ULV/ULLV algorithms and the subspace-based enhancement algorithms are implemented in a Matlab toolbox. Throughout the thesis, the speech enhancement application illustrates the power and robustness of the subspace approach, and a number of illustrative examples are given. Peter S. K. Hansen

58 citations

Proceedings ArticleDOI
14 Mar 2010
TL;DR: This work presents a monaural speech enhancement method based on sparse coding of noisy speech signals in a composite dictionary, consisting of the concatenation of a speech and interferer dictionary, both being possibly over-complete.
Abstract: The enhancement of speech degraded by non-stationary interferers is a highly relevant and difficult task of many signal processing applications. We present a monaural speech enhancement method based on sparse coding of noisy speech signals in a composite dictionary, consisting of the concatenation of a speech and interferer dictionary, both being possibly over-complete. The speech dictionary is learned off-line on a training corpus, while an environment specific interferer dictionary is learned on-line during speech pauses. Our approach optimizes the trade-off between source distortion and source confusion, and thus achieves significant improvements on objective quality measures like cepstral distance, in the speaker dependent and independent case, in several real-world environments and at low signal-to-noise ratios. Our enhancement method outperforms state-of-the-art methods like multi-band spectral subtraction and approaches based on vector quantization.
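The composite-dictionary idea can be illustrated with a small sketch: code the noisy frame over the concatenation of a speech dictionary and an interferer dictionary, then rebuild the speech estimate from the speech atoms only. The abstract does not name the sparse coder, so plain matching pursuit is swapped in here as an assumption, and all names (`matching_pursuit`, `enhance`, the toy atoms) are hypothetical. Atoms are assumed to have unit norm.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse coding: returns one coefficient per (unit-norm) atom."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < 1e-12:
            break  # residual is already explained
        coeffs[k] += corr[k]
        residual = [r - corr[k] * v for r, v in zip(residual, atoms[k])]
    return coeffs


def enhance(noisy, speech_atoms, interferer_atoms, n_iter=10):
    """Code over the composite dictionary, keep only the speech part."""
    atoms = speech_atoms + interferer_atoms  # concatenated dictionary
    coeffs = matching_pursuit(noisy, atoms, n_iter)
    estimate = [0.0] * len(noisy)
    for c, a in zip(coeffs[: len(speech_atoms)], speech_atoms):
        estimate = [e + c * v for e, v in zip(estimate, a)]
    return estimate


# Toy example: orthonormal atoms, so the decomposition is exact.
speech_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
interferer_atoms = [[0.0, 0.0, 1.0]]
noisy = [2.0, 0.0, 3.0]  # 2 * speech atom 0 + 3 * interferer atom 0
clean_estimate = enhance(noisy, speech_atoms, interferer_atoms)
```

When speech and interferer atoms are correlated, energy can be assigned to the wrong sub-dictionary; that is the source confusion the abstract trades off against source distortion.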

58 citations

Patent
30 Sep 2008
TL;DR: An apparatus generates a binaural audio signal from an M-channel audio signal, which is a downmix of an N-channel audio signal, and spatial parameter data, using a demultiplexer (401) and decoder (403) followed by parameter conversion, matrix processing, and stereo filtering.
Abstract: An apparatus for generating a binaural audio signal comprises a demultiplexer (401) and decoder (403) which receive audio data comprising an M-channel audio signal which is a downmix of an N-channel audio signal and spatial parameter data for upmixing the M-channel audio signal to the N-channel audio signal. A conversion processor (411) converts spatial parameters of the spatial parameter data into first binaural parameters in response to at least one binaural perceptual transfer function. A matrix processor (409) converts the M-channel audio signal into a first stereo signal in response to the first binaural parameters. A stereo filter (415, 417) generates the binaural audio signal by filtering the first stereo signal. The filter coefficients for the stereo filter are determined in response to the at least one binaural perceptual transfer function by a coefficient processor (419). The combination of parameter conversion/processing and filtering allows a high quality binaural signal to be generated with low complexity.

58 citations


Network Information
Related Topics (5)
Signal processing
73.4K papers, 983.5K citations
86% related
Decoding methods
65.7K papers, 900K citations
84% related
Fading
55.4K papers, 1M citations
80% related
Feature vector
48.8K papers, 954.4K citations
80% related
Feature extraction
111.8K papers, 2.1M citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108