Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.

Papers published on a yearly basis

Papers

Proceedings Article

Adaptive Training with Joint Uncertainty Decoding for Robust Recognition of Noisy Data

Hank Liao¹, Mark J. F. Gales¹

University of Cambridge¹

15 Apr 2007

TL;DR: Joint adaptive training is presented, including formulae for estimating the transforms and canonical model parameters. Results show that multistyle models benefit from VTS compensation or joint uncertainty decoding, which reduce the mismatch between training and test conditions.

Abstract: Standard noise compensation techniques for automatic speech recognition assume an acoustic model trained on clean data. What is thought of as "clean" data may still contain a variety of speakers, different channels and varying noise conditions. Hence it may be more reasonable to consider such data multi-conditional and suitable for multistyle training. This paper shows that multistyle models benefit from vector Taylor series (VTS) compensation or joint uncertainty decoding, which reduce the mismatch between training and test conditions. An EM-based noise estimation procedure that produces maximum-likelihood (ML) VTS or joint noise models is also described. Alternatively, adaptive training with joint uncertainty transforms factors out the noise from the data. The uncertainty variance bias de-weights observations in the training data where the SNR is low. This property allows data with a wide SNR range to be used and produces canonical models that truly represent clean speech, whereas multistyle trained models must account for all acoustic variation associated with different noise conditions. This paper presents joint adaptive training, including formulae for estimating the transforms and canonical model parameters. Experiments are conducted on the Resource Management and Broadcast News corpora.
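
The de-weighting effect of the uncertainty variance bias can be illustrated with a small one-dimensional sketch. It assumes the commonly cited joint uncertainty decoding form, in which an observation y is scored as |a| N(a·y + b; mean, var + var_bias). The function names, the two-component model and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def jud_log_likelihood(y, mean, var, a=1.0, b=0.0, var_bias=0.0):
    """1-D log-likelihood log(|a| * N(a*y + b; mean, var + var_bias)).

    a, b and var_bias stand in for the per-condition transform; the
    values used below are illustrative assumptions.
    """
    x = a * y + b
    total_var = var + var_bias
    return (math.log(abs(a))
            - 0.5 * math.log(2.0 * math.pi * total_var)
            - 0.5 * (x - mean) ** 2 / total_var)

def component_posteriors(y, components, var_bias=0.0):
    """Posterior of each (weight, mean, var) component given y."""
    logs = [math.log(w) + jud_log_likelihood(y, m, v, var_bias=var_bias)
            for w, m, v in components]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-component model and a single observation.
components = [(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)]
high_snr = component_posteriors(0.2, components, var_bias=0.0)
low_snr = component_posteriors(0.2, components, var_bias=50.0)
```

With no bias the observation is assigned almost entirely to the first component. With a large bias the posteriors move towards the priors, so the frame carries little information about which component produced it.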

83 citations

Patent

Decoding of binaural audio signals

Pasi Ojala¹, Julia Turku¹, Mauri Vaananen¹, Mikko Tammi¹

Nokia¹

04 Jan 2007

TL;DR: A binaural audio signal is synthesized from a parametrically encoded audio signal, comprising at least one combined signal of a plurality of audio channels and side information describing the multi-channel sound image, by applying a predetermined set of head-related transfer function filters to the combined signal in proportions determined by the side information.

Abstract: A method for synthesizing a binaural audio signal, the method comprising: inputting a parametrically encoded audio signal comprising at least one combined signal of a plurality of audio channels and one or more corresponding sets of side information describing a multi-channel sound image; and applying a predetermined set of head-related transfer function filters to the at least one combined signal in proportion determined by the corresponding set of side information to synthesize a binaural audio signal. A corresponding parametric audio decoder, parametric audio encoder, computer program product, and apparatus for synthesizing a binaural audio signal are also described.
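
The two steps in this abstract can be sketched in the time domain as follows. This is a simplified illustration, not the patented decoder: it assumes the side information has already been reduced to one gain per original channel, and that all filters have the same length, whereas a practical parametric decoder would typically work per frame and per frequency band. The function names, gains and filter taps are all hypothetical.

```python
def convolve(signal, kernel):
    """Full linear convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def synthesize_binaural(combined, gains, hrtf_left, hrtf_right):
    """Filter one combined signal with a left/right HRTF pair per channel.

    gains[c] is the assumed share of channel c in the combined signal;
    hrtf_left[c] and hrtf_right[c] are that channel's filter taps.
    All filters are assumed to have the same length.
    """
    length = len(combined) + len(hrtf_left[0]) - 1
    left = [0.0] * length
    right = [0.0] * length
    for gain, h_l, h_r in zip(gains, hrtf_left, hrtf_right):
        for n, value in enumerate(convolve(combined, h_l)):
            left[n] += gain * value
        for n, value in enumerate(convolve(combined, h_r)):
            right[n] += gain * value
    return left, right

# Hypothetical two-channel image: 75 % of the energy share on channel 0.
combined = [1.0, 0.0, 0.0]            # unit impulse as the combined signal
gains = [0.75, 0.25]
hrtf_left = [[1.0, 0.5], [0.2, 0.1]]  # made-up two-tap filters
hrtf_right = [[0.2, 0.1], [1.0, 0.5]]
left, right = synthesize_binaural(combined, gains, hrtf_left, hrtf_right)
```

Because the input is a unit impulse, each output is the gain-weighted sum of the per-channel filters, which makes the mixing proportions easy to inspect.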

83 citations

Patent

Method and system for switching among independent packetized audio streams

David Israel, Arthur Irvin Laursen, Serkan Recep Dost

29 Jun 2001

TL;DR: A method and system for noiselessly switching between independent audio streams, for example the audio sources of an established VOIP call, while preserving valid RTP information at the time of switch over.

Abstract: The present invention provides a method and system for noiselessly switching between independent audio streams. Such noiseless switching preserves valid RTP information at the time of switch over. For established VOIP calls, the present invention can noiselessly switch audio from one audio source to another. A switch directs audio data from multiple audio sources to a network interface controller. The switch can be a cell switch or a packet switch. The audio sources can be internal audio sources and/or external audio sources. An egress audio controller controls the operation of internal audio sources, the switch and the network interface controller to carry out noiseless switching according to the present invention. Certain call events which involve additional audio trigger a noiseless switch over.
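
One way to keep RTP information valid across a switch over is for the egress side to own the outgoing sequence number, timestamp and SSRC, and to restamp every forwarded packet regardless of which source it came from. The sketch below illustrates that idea only; it is not the mechanism claimed in the patent. The class names and the fixed step of 160 samples per packet (20 ms at 8 kHz) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RtpPacket:
    seq: int          # 16-bit sequence number
    timestamp: int    # 32-bit media timestamp
    ssrc: int         # synchronization source identifier
    payload: bytes

class EgressRewriter:
    """Restamps packets so the receiver sees one continuous stream."""

    def __init__(self, ssrc, first_seq=0, first_timestamp=0,
                 samples_per_packet=160):
        self._ssrc = ssrc
        self._seq = first_seq & 0xFFFF
        self._timestamp = first_timestamp & 0xFFFFFFFF
        self._step = samples_per_packet

    def forward(self, packet):
        """Return a copy of packet carrying the egress stream's own header."""
        out = RtpPacket(self._seq, self._timestamp, self._ssrc,
                        packet.payload)
        self._seq = (self._seq + 1) & 0xFFFF            # wrap at 16 bits
        self._timestamp = (self._timestamp + self._step) & 0xFFFFFFFF
        return out

# Two hypothetical sources with unrelated numbering; switch after 2 packets.
source_a = [RtpPacket(100 + i, 5000 + 160 * i, 0xAAAA, b"a") for i in range(2)]
source_b = [RtpPacket(9000 + i, 77 + 160 * i, 0xBBBB, b"b") for i in range(2)]
rewriter = EgressRewriter(ssrc=0x1234, first_seq=65534)
sent = [rewriter.forward(p) for p in source_a + source_b]
```

The receiver sees consecutive sequence numbers (including the 16-bit wrap) and evenly spaced timestamps, while the payloads change from the first source to the second.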

83 citations

Proceedings Article

Sparse coding for speech recognition

G.S.V.S. Sivaram¹, Sridhar Krishna Nemala¹, Mounya Elhilali¹, Trac D. Tran¹, Hynek Hermansky¹

Johns Hopkins University¹

14 Mar 2010

TL;DR: A novel feature extraction technique for speech recognition, based on the principles of sparse coding, expresses a spectro-temporal pattern of speech as a linear combination of an overcomplete set of basis functions such that the weights of the linear combination are sparse.

Abstract: This paper proposes a novel feature extraction technique for speech recognition based on the principles of sparse coding. The idea is to express a spectro-temporal pattern of speech as a linear combination of an overcomplete set of basis functions such that the weights of the linear combination are sparse. These weights (features) are subsequently used for acoustic modeling. We learn a set of overcomplete basis functions (dictionary) from the training set by adopting a previously proposed algorithm which iteratively minimizes the reconstruction error and maximizes the sparsity of weights. Furthermore, features are derived using the learned basis functions by applying the well-established principles of compressive sensing. Phoneme recognition experiments show that the proposed features outperform the conventional features in both clean and noisy conditions.
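
The core idea, sparse weights over an overcomplete dictionary, can be sketched with iterative soft-thresholding (ISTA) applied to the objective 0.5·||x − Dw||² + λ·||w||₁. ISTA is used here only as a familiar stand-in: the abstract does not name the dictionary-learning or feature-derivation algorithm, and the three-atom dictionary, the input pattern and λ below are hypothetical.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value towards zero; anything within the threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sparse_code(x, atoms, lam=0.1, iterations=2000):
    """ISTA for min_w 0.5 * ||x - sum_k w[k] * atoms[k]||^2 + lam * ||w||_1."""
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz
    # constant, so this step size is conservative but safe.
    step = 1.0 / sum(a * a for atom in atoms for a in atom)
    w = [0.0] * len(atoms)
    for _ in range(iterations):
        residual = list(x)
        for weight, atom in zip(w, atoms):
            for i, a in enumerate(atom):
                residual[i] -= weight * a
        # Gradient step on the reconstruction error, then shrinkage.
        w = [soft_threshold(weight + step * sum(a * r for a, r in
                                                zip(atom, residual)),
                            step * lam)
             for weight, atom in zip(w, atoms)]
    return w

# Hypothetical overcomplete dictionary: 3 unit-norm atoms in 2 dimensions.
d = math.sqrt(0.5)
atoms = [[1.0, 0.0], [0.0, 1.0], [d, d]]
weights = sparse_code([2.0, 2.0], atoms, lam=0.1)
```

For this input the diagonal atom alone explains the pattern, so the L1 penalty drives the two axis-aligned weights to exactly zero and leaves a single active weight, which is the kind of sparse feature vector the abstract describes.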

83 citations

Patent

System and method for speech processing using independent component analysis under stability constraints

Erik Visser, Te-Won Lee

11 Dec 2003

TL;DR: A system and method are disclosed for separating a mixture of audio signals into desired audio signals (e.g., speech) and a noise signal: microphones are positioned to receive the mixed audio signals, and an independent component analysis (ICA) process separates the sound mixture using stability constraints.

Abstract: A system and method for separating a mixture of audio signals into desired audio signals (430) (e.g., speech) and a noise signal (440) is disclosed. Microphones (310, 320) are positioned to receive the mixed audio signals, and an independent component analysis (ICA) process (212) separates the sound mixture using stability constraints. The ICA process (508) uses predefined characteristics of the desired speech signal to identify and isolate a target sound signal (430). Filter coefficients are adapted with a learning rule, and filter weight update dynamics are stabilized to assist convergence to a stable separated ICA signal result. The separated signals may be peripherally processed to further reduce noise effects using post-processing (214) and pre-processing (220, 230) techniques and information. The proposed system is designed for, and easily adaptable to, implementation on DSP units or CPUs in audio communication hardware environments.
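
The combination of a learning rule with stabilized weight updates can be sketched with the standard natural-gradient ICA update, W ← W + μ(I − tanh(y)yᵀ)W, followed by clipping of the weights. Both choices are assumptions made for illustration: the abstract specifies neither the learning rule nor the form of the stability constraint, and the patented system is not reproduced here. The sketch also uses an instantaneous (non-convolutive) unmixing matrix for brevity.

```python
import math

def ica_step(W, x, lr=0.01, clip=5.0):
    """One natural-gradient ICA update on sample x, with weight clipping.

    W is a square unmixing matrix (list of rows). The clip bound is a
    simple, hypothetical stand-in for a stability constraint.
    """
    n = len(W)
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    phi = [math.tanh(v) for v in y]          # score for super-Gaussian sources
    # G = I - phi(y) y^T
    G = [[(1.0 if i == j else 0.0) - phi[i] * y[j] for j in range(n)]
         for i in range(n)]
    # Natural-gradient direction dW = G W
    dW = [[sum(G[i][k] * W[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # Apply the update, then keep every weight inside [-clip, clip].
    return [[max(-clip, min(clip, W[i][j] + lr * dW[i][j]))
             for j in range(n)] for i in range(n)]

identity = [[1.0, 0.0], [0.0, 1.0]]
# A deliberately extreme sample and step size to trigger the clipping.
clipped = ica_step(identity, [1000.0, -1000.0], lr=1.0, clip=2.0)
unchanged = ica_step(identity, [0.3, -0.2], lr=0.0)
```

Without the clipping, the extreme sample would push the weights to magnitudes near 1000 in a single step; with it, the update stays bounded, which is the general purpose of a stability safeguard.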

83 citations

Performance

Metrics

Papers: 14,368
Citations: 279,843

No. of papers in the topic in previous years
Year	Papers
2023	38
2022	84
2021	70
2020	62
2019	77
2018	108
