Topic
Speech coding
About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.
Papers
07 May 2001
TL;DR: An improvement in the state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance is demonstrated by the use of visual information, in addition to the traditional audio one, by taking a decision fusion approach for the audio-visual information.
Abstract: We demonstrate an improvement in state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under clean and noisy conditions, by using visual information in addition to the traditional audio information. We take a decision fusion approach to the audio-visual information, in which the single-modality (audio-only and visual-only) HMM classifiers are combined to recognize audio-visual speech. More specifically, we tackle the problem of estimating the appropriate combination weights for each of the modalities. Two different techniques are described: the first uses an automatically extracted estimate of the audio stream reliability to modify the weights for each modality (both clean and noisy audio results are reported), while the second is a discriminative model combination approach in which weights on pre-defined model classes are optimized to minimize WER (clean audio results only).
95 citations
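The decision fusion idea in the abstract above can be illustrated with a minimal Python sketch. It assumes the two single-modality classifiers each return a log-likelihood per candidate class and that the stream weights sum to one. The sigmoid mapping from a reliability estimate to an audio weight, and all function and parameter names, are hypothetical stand-ins; the abstract does not specify how the paper maps reliability to weights.

```python
import math


def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Weighted sum of per-class log-likelihoods; the two weights sum to 1."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_ll[label] + visual_weight * visual_ll[label]
        for label in audio_ll
    }


def reliability_to_weight(reliability, steepness=1.0, midpoint=0.0):
    """Hypothetical mapping: higher audio reliability gives a larger audio weight."""
    return 1.0 / (1.0 + math.exp(-steepness * (reliability - midpoint)))


def recognize(audio_ll, visual_ll, reliability):
    """Pick the class with the highest fused score."""
    weight = reliability_to_weight(reliability)
    fused = fuse_log_likelihoods(audio_ll, visual_ll, weight)
    return max(fused, key=fused.get)
```

With a high reliability estimate the audio scores dominate the decision; with a low one the visual scores take over, which is the behavior a reliability-driven weighting is meant to produce.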
TL;DR: A high performance packet loss concealment algorithm for pulse code modulation (PCM) coded speech that extracts the residual signal of the previously received speech by linear prediction analysis, uses periodic replication to generate an approximation for the excitation signal of missing speech, and generates synthesized speech using this excitation.
Abstract: One of the well-known problems in real-time packetized voice applications is the degradation in voice quality due to delayed or misrouted packets. When a voice packet does not arrive at the receiver on time, the receiver needs a packet loss concealment algorithm to generate a signal in place of the missing voice segment. In this paper we describe a high performance packet loss concealment algorithm for pulse code modulation (PCM) coded speech. The algorithm extracts the residual signal of the previously received speech by linear prediction analysis, uses periodic replication to generate an approximation of the excitation signal of the missing speech, and generates synthesized speech using this excitation. It also performs overlap-add and scaling operations to smooth out transitions at frame boundaries. The new algorithm is compared to other algorithms in subjective quality tests and is found to be better than the existing algorithms in some cases.
95 citations
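The concealment steps named in the abstract above (linear prediction analysis, periodic replication of the residual, synthesis, overlap-add and scaling) can be sketched in plain Python. This is an illustration under assumptions, not the paper's algorithm: the predictor order, lag range, fixed `decay` attenuation, linear cross-fade, and all function names are hypothetical choices.

```python
import math


def lpc(signal, order):
    """Levinson-Durbin on the autocorrelation; returns a[0..order] with a[0] = 1."""
    r = [sum(signal[n] * signal[n - k] for n in range(k, len(signal)))
         for k in range(order + 1)]
    r[0] = r[0] * 1.0001 + 1e-9  # small white-noise correction for stability
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def residual(signal, a):
    """Prediction residual e[n] = sum_k a[k] * s[n-k], with zero initial state."""
    p = len(a) - 1
    return [sum(a[k] * signal[n - k] for k in range(p + 1) if n - k >= 0)
            for n in range(len(signal))]


def estimate_pitch(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(x))
        num = sum(x[n] * x[n - lag] for n in idx)
        den = math.sqrt(sum(x[n] ** 2 for n in idx) *
                        sum(x[n - lag] ** 2 for n in idx)) + 1e-12
        if num / den > best_score:
            best_lag, best_score = lag, num / den
    return best_lag


def conceal(history, frame_len, order=10, min_lag=20, max_lag=120, decay=0.9):
    """Synthesize a replacement frame from previously received speech."""
    if len(history) < 2 * max_lag:
        raise ValueError("history must hold at least 2 * max_lag samples")
    a = lpc(history, order)
    res = residual(history, a)
    lag = estimate_pitch(res[-2 * max_lag:], min_lag, max_lag)
    last_period = res[-lag:]
    # Periodic replication of the last pitch period, attenuated (scaling step).
    excitation = [decay * last_period[n % lag] for n in range(frame_len)]
    # All-pole synthesis 1/A(z), using the tail of the history as filter memory.
    out = list(history[-order:])
    for e in excitation:
        out.append(e - sum(a[k] * out[-k] for k in range(1, order + 1)))
    return out[order:]


def crossfade(fading_out, fading_in):
    """Linear overlap-add of two equal-length segments at a frame boundary."""
    n = len(fading_out)
    return [fading_out[i] * (1 - (i + 1) / (n + 1)) +
            fading_in[i] * ((i + 1) / (n + 1)) for i in range(n)]
```

A receiver would call `conceal` with the most recent decoded samples when a packet is missing, then use `crossfade` to blend the end of the synthesized segment into the start of the next correctly received frame.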
30 Jun 2006
TL;DR: An audio encoder, decoder, or processor uses a filter whose warping characteristic is varied by a time-varying control signal that a controller derives from the audio signal; the filtered signal can be passed to an encoding processor with different encoding algorithms, one of which is adapted to a specific signal pattern.
Abstract: An audio encoder, an audio decoder or an audio processor includes a filter for generating a filtered audio signal, the filter having a variable warping characteristic, the characteristic being controllable in response to a time-varying control signal, the control signal indicating a small or no warping characteristic or a comparatively high warping characteristic. Furthermore, a controller is connected for providing the time-varying control signal, which depends on the audio signal. The filtered audio signal can be introduced to an encoding processor having different encoding algorithms, one of which is a coding algorithm adapted to a specific signal pattern. Alternatively, the filter is a post-filter receiving a decoded audio signal.
95 citations
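One common way to build a filter with a controllable warping characteristic is to replace each unit delay with a first-order allpass section whose coefficient sets the amount of warping. The Python sketch below assumes that construction, which the abstract above does not confirm; the coefficient name `lam` and both function names are hypothetical. A coefficient of zero reduces the section to a plain delay (no warping), and the coefficient may change from sample to sample to mimic a time-varying control signal. How the controller derives that signal from the audio is not specified in the abstract and is left to the caller.

```python
import math


def warped_frequency(omega, lam):
    """Frequency mapping induced by a first-order allpass with coefficient lam."""
    return omega + 2.0 * math.atan2(lam * math.sin(omega),
                                    1.0 - lam * math.cos(omega))


def allpass_variable(x, lams):
    """First-order allpass y[n] = -lam*x[n] + x[n-1] + lam*y[n-1].

    lams holds one coefficient per input sample, so the warping can vary
    over time; lam = 0 gives a pure one-sample delay.
    """
    y, prev_x, prev_y = [], 0.0, 0.0
    for sample, lam in zip(x, lams):
        out = -lam * sample + prev_x + lam * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y
```

For positive `lam` the mapping stretches the low-frequency range, which is the usual motivation for warped filters in speech and audio coding.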
07 May 2001
TL;DR: This contribution deals with an iterative source-channel decoding approach where a simple channel decoder and a softbit-source decoder are concatenated, and derives a new formula that shows how the residual redundancy transforms into extrinsic information utilizable for iterative decoding.
Abstract: In digital mobile communications, efficient compression algorithms are needed to encode speech or audio signals. As the determined source parameters are highly sensitive to transmission errors, robust source and channel decoding schemes are required. This contribution deals with an iterative source-channel decoding approach where a simple channel decoder and a softbit-source decoder are concatenated. We mainly focus on softbit-source decoding which can be considered as an error concealment technique. This technique utilizes residual redundancy remaining after source coding. We derive a new formula that shows how the residual redundancy transforms into extrinsic information utilizable for iterative decoding. The derived formula opens several starting points for optimizations, e.g. it helps to find a robust index assignment. Furthermore, it allows the conclusion that softbit-source decoding is the limiting factor if applied to iterative decoding processes. Therefore, no significant gain will be obtainable by more than two iterations. This will be demonstrated by simulation.
94 citations
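The role of residual redundancy in softbit-source decoding can be illustrated with a small Python sketch. It does not reproduce the formula derived in the paper. It assumes the simplest setting only: a scalar quantizer with natural binary index assignment, independent bit errors with known per-bit error probabilities, and residual redundancy in the form of a non-uniform a priori distribution over quantizer indices. All names are hypothetical.

```python
def softbit_decode(codebook, prior, bit_error_probs, received_bits):
    """Return (MMSE parameter estimate, a posteriori index probabilities).

    codebook        : reconstruction value for each quantizer index
    prior           : a priori probability of each index (residual redundancy)
    bit_error_probs : error probability of each received bit, MSB first
    received_bits   : hard-decided bits (0 or 1), MSB first
    """
    n_bits = len(received_bits)
    posteriors = []
    for index, p_index in enumerate(prior):
        likelihood = 1.0
        for k in range(n_bits):
            bit = (index >> (n_bits - 1 - k)) & 1
            pe = bit_error_probs[k]
            likelihood *= (1.0 - pe) if bit == received_bits[k] else pe
        posteriors.append(p_index * likelihood)
    total = sum(posteriors)
    posteriors = [p / total for p in posteriors]
    estimate = sum(p * c for p, c in zip(posteriors, codebook))
    return estimate, posteriors
```

When the channel is reliable the estimate collapses to the received codebook entry; when the bits carry no information the estimate falls back to the a priori mean, which is how this kind of decoder acts as error concealment.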
15 Mar 1999
TL;DR: The experimental results show that the line spectral frequencies (LSFs) are robust features in distinguishing the different classes of noises.
Abstract: Background environmental noises degrade the performance of speech-processing systems (e.g. speech coding, speech recognition). By modifying the processing according to the type of background noise, the performance can be enhanced. This requires noise classification. In this paper, four pattern-recognition frameworks have been used to design noise classification algorithms. Classification is done on a frame-by-frame basis (e.g. once every 20 ms). Five commonly encountered noises in mobile telephony (i.e. car, street, babble, factory, and bus) have been considered in our study. Our experimental results show that the line spectral frequencies (LSFs) are robust features in distinguishing the different classes of noises.
94 citations
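Frame-by-frame noise classification on a feature vector such as the LSFs can be sketched in Python as follows. The abstract above does not name the four pattern-recognition frameworks used, so this example substitutes a nearest-centroid classifier purely for illustration. It also assumes the per-frame feature vectors have already been computed; the function names are hypothetical.

```python
import math


def train_centroids(labeled_frames):
    """Average the feature vectors of each class.

    labeled_frames: iterable of (label, feature_vector) pairs, one per frame.
    """
    sums, counts = {}, {}
    for label, vec in labeled_frames:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify_frame(vec, centroids):
    """Assign a frame to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))
```

Calling `classify_frame` once per analysis frame (for example every 20 ms) yields a noise label that downstream processing could use to adapt its behavior.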