Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published on this topic, receiving 271,964 citations.


Papers
Proceedings Article
15 Apr 2007
TL;DR: Joint adaptive training is presented, including formulae for estimating the transforms and canonical model parameters, and results show that multistyle models benefit from VTS compensation or joint uncertainty decoding, which reduce the mismatch between training and test conditions.
Abstract: Standard noise compensation techniques for automatic speech recognition assume a clean-trained acoustic model. What is thought of as "clean" data may still contain a variety of speakers, different channels and varying noise conditions. Hence it may be more reasonable to consider such data multi-conditional for multistyle training. This paper shows that multistyle models benefit from VTS compensation or joint uncertainty decoding by reducing the mismatch between training and test. An EM-based noise estimation procedure that produces ML VTS or joint noise models is also described. Alternatively, adaptive training with joint uncertainty transforms factors out the noise from the data. The uncertainty variance bias de-weights observations in the training data where the SNR is low. This property allows data with a wide SNR range to be used and produces canonical models that truly represent clean speech, whereas multistyle trained models must account for all acoustic variation associated with different noise conditions. This paper presents joint adaptive training, including formulae for estimating the transforms and canonical model parameters. Experiments are conducted on the resource management and broadcast news corpora.

83 citations

Patent
04 Jan 2007
TL;DR: A method for synthesizing a binaural audio signal is described, in which a parametrically encoded audio signal, comprising at least one combined signal of a plurality of audio channels and one or more corresponding sets of side information describing a multi-channel sound image, is input and filtered with head-related transfer function filters in proportion determined by the side information.
Abstract: A method for synthesizing a binaural audio signal, the method comprising: inputting a parametrically encoded audio signal comprising at least one combined signal of a plurality of audio channels and one or more corresponding sets of side information describing a multi-channel sound image; and applying a predetermined set of head-related transfer function filters to the at least one combined signal in proportion determined by the corresponding set of side information to synthesize a binaural audio signal. A corresponding parametric audio decoder, parametric audio encoder, computer program product, and apparatus for synthesizing a binaural audio signal are also described.

83 citations
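
The binaural synthesis step in the abstract above (HRTF filters applied to the combined signal in proportion to the side information) can be illustrated with a toy time-domain sketch. Everything here is an assumption for illustration: the function names, the use of one scalar gain per original channel as the "side information", and the two-tap filters are hypothetical, and a real parametric decoder would typically work per time/frequency band rather than on whole signals.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def synthesize_binaural(downmix, channel_gains, hrtf_pairs):
    """Weight each channel's (left, right) HRTF pair by its gain,
    sum the weighted filters, and filter the mono downmix with the result."""
    taps = len(hrtf_pairs[0][0])
    left_filter = [0.0] * taps
    right_filter = [0.0] * taps
    for gain, (h_left, h_right) in zip(channel_gains, hrtf_pairs):
        for k in range(taps):
            left_filter[k] += gain * h_left[k]
            right_filter[k] += gain * h_right[k]
    return convolve(downmix, left_filter), convolve(downmix, right_filter)


# Hypothetical inputs: a two-sample downmix, gains for two original
# channels (the assumed side information), and made-up two-tap HRTFs.
downmix = [1.0, 0.5]
channel_gains = [0.8, 0.2]
hrtf_pairs = [([1.0, 0.0], [0.5, 0.0]), ([0.0, 0.5], [0.0, 1.0])]
left, right = synthesize_binaural(downmix, channel_gains, hrtf_pairs)
```

Because the filters are combined before the convolution, the downmix is filtered only once per ear, regardless of how many original channels the side information describes.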

Patent
29 Jun 2001
TL;DR: A method and system for noiselessly switching between independent audio streams is presented; the switching preserves valid RTP information at the time of switchover and can be applied to established VoIP calls.
Abstract: The present invention provides a method and system for noiselessly switching between independent audio streams. Such noiseless switching preserves valid RTP information at the time of switch over. For established VOIP calls, the present invention can noiselessly switch audio from one audio source to another. A switch directs audio data from multiple audio sources to a network interface controller. The switch can be a cell switch or a packet switch. The audio sources can be internal audio sources and/or external audio sources. An egress audio controller controls the operation of internal audio sources, the switch and the network interface controller to carry out noiseless switching according to the present invention. Certain call events which involve additional audio trigger a noiseless switch over.

83 citations

Proceedings Article
14 Mar 2010
TL;DR: A novel feature extraction technique for speech recognition is proposed, based on the principles of sparse coding: a spectro-temporal pattern of speech is expressed as a linear combination of an overcomplete set of basis functions such that the weights of the linear combination are sparse.
Abstract: This paper proposes a novel feature extraction technique for speech recognition based on the principles of sparse coding. The idea is to express a spectro-temporal pattern of speech as a linear combination of an overcomplete set of basis functions such that the weights of the linear combination are sparse. These weights (features) are subsequently used for acoustic modeling. We learn a set of overcomplete basis functions (dictionary) from the training set by adopting a previously proposed algorithm which iteratively minimizes the reconstruction error and maximizes the sparsity of weights. Furthermore, features are derived using the learned basis functions by applying the well established principles of compressive sensing. Phoneme recognition experiments show that the proposed features outperform the conventional features in both clean and noisy conditions.

83 citations
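
The core idea in the abstract above, expressing a pattern as a sparse linear combination of atoms from an overcomplete dictionary, can be sketched with greedy matching pursuit. This is a swapped-in, simpler technique and not the paper's method: the paper learns its dictionary from training data and derives features using compressive sensing principles. The function names, the hand-made four-atom dictionary, and the three-dimensional "pattern" are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matching_pursuit(signal, dictionary, n_nonzero):
    """Greedily pick the unit-norm atom most correlated with the residual,
    add its correlation to that atom's weight, and subtract its contribution."""
    residual = list(signal)
    weights = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        if abs(scores[best]) < 1e-12:
            break  # residual is already (numerically) fully explained
        weights[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, dictionary[best])]
    return weights, residual


# Toy overcomplete dictionary: four atoms spanning a 3-dimensional space.
dictionary = [normalize(v) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0])]
pattern = [2.0, 2.0, 0.0]
weights, residual = matching_pursuit(pattern, dictionary, n_nonzero=2)
```

In this toy case the pattern lies along the fourth atom, so a single nonzero weight reconstructs it; the weight vector plays the role of the sparse features that the abstract says are passed on to acoustic modeling.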

Patent
11 Dec 2003
TL;DR: A system and method for separating a mixture of audio signals into desired audio signals (e.g., speech) and a noise signal is disclosed, in which microphones are positioned to receive the mixed audio signals and an independent component analysis (ICA) process separates the sound mixture using stability constraints.
Abstract: A system and method for separating a mixture of audio signals into desired audio signals (430) (e.g., speech) and a noise signal (440) is disclosed. Microphones (310, 320) are positioned to receive the mixed audio signals, and an independent component analysis (ICA) process (212) separates the sound mixture using stability constraints. The ICA process (508) uses predefined characteristics of the desired speech signal to identify and isolate a target sound signal (430). Filter coefficients are adapted with a learning rule and filter weight update dynamics are stabilized to assist convergence to a stable separated ICA signal result. The separated signals may be peripherally-processed to further reduce noise effects using post-processing (214) and pre-processing (220, 230) techniques and information. The proposed system is designed and easily adaptable for implementation on DSP units or CPUs in audio communication hardware environments.

83 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108