Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Proceedings Article
07 May 2001
TL;DR: An improvement in the state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance is demonstrated by the use of visual information, in addition to the traditional audio one, by taking a decision fusion approach for the audio-visual information.
Abstract: We demonstrate an improvement in the state-of-the-art large vocabulary continuous speech recognition (LVCSR) performance, under clean and noisy conditions, by the use of visual information, in addition to the traditional audio one. We take a decision fusion approach for the audio-visual information, where the single-modality (audio- and visual- only) HMM classifiers are combined to recognize audio-visual speech. More specifically, we tackle the problem of estimating the appropriate combination weights for each of the modalities. Two different techniques are described: the first uses an automatically extracted estimate of the audio stream reliability in order to modify the weights for each modality (both clean and noisy audio results are reported), while the second is a discriminative model combination approach where weights on pre-defined model classes are optimized to minimize WER (clean audio only results).

95 citations
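
The decision fusion described in the abstract above can be illustrated with a minimal sketch. It assumes each single-modality classifier outputs a log-likelihood per candidate word and that the two modality weights sum to one (a common convention, not something the abstract states). The function names, candidate words, and scores are hypothetical, and the reliability-based weight estimation itself is not reproduced here.

```python
def fuse_scores(audio_loglik, visual_loglik, audio_weight):
    """Weighted sum of per-candidate log-likelihoods from two modalities.

    audio_weight is assumed to lie in [0, 1]; the visual stream gets the rest.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    visual_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_loglik[label]
        + visual_weight * visual_loglik[label]
        for label in audio_loglik
    }


def pick_label(scores):
    """Return the candidate with the highest fused score."""
    return max(scores, key=scores.get)


# Hypothetical scores: audio slightly prefers "pat", video strongly prefers "bat".
audio = {"bat": -4.0, "pat": -3.5}
visual = {"bat": -2.0, "pat": -6.0}

clean_choice = pick_label(fuse_scores(audio, visual, audio_weight=0.9))
noisy_choice = pick_label(fuse_scores(audio, visual, audio_weight=0.5))
```

Lowering the audio weight, as one might when the audio stream is judged unreliable, lets the visual evidence change the decision in this toy case.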

Journal Article
E. Gunduzhan1, K. Momtahan1
TL;DR: A high performance packet loss concealment algorithm for pulse code modulation (PCM) coded speech that extracts the residual signal of the previously received speech by linear prediction analysis, uses periodic replication to generate an approximation for the excitation signal of missing speech, and generates synthesized speech using this excitation.
Abstract: One of the well-known problems in real-time packetized voice applications is the degradation in voice quality due to delayed or misrouted packets. When a voice packet does not arrive at the receiver on time, the receiver needs a packet loss concealment algorithm to generate a signal instead of the missing voice segment. In this paper we describe a high performance packet loss concealment algorithm for pulse code modulation (PCM) coded speech. The algorithm extracts the residual signal of the previously received speech by linear prediction analysis, uses periodic replication to generate an approximation for the excitation signal of missing speech, and generates synthesized speech using this excitation. It also performs overlap-and-adding and scaling operations to smooth out transitions at frame boundaries. The new algorithm is compared to other algorithms by subjective quality tests, and is found to be better than the existing algorithms in some cases.

95 citations
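
The concealment steps named in the abstract above (linear prediction analysis, periodic replication of the residual, synthesis, and overlap-add) can be sketched roughly as follows. This is not the authors' implementation: the autocorrelation-based pitch search, the predictor order, the lag range, the attenuation `gain`, and the linear crossfade are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion; returns a[1..order] with x[n] ~ sum a[k]*x[n-k]."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # silent or fully predictable history: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def residual(x, a):
    """Inverse-filter x with predictor a (zero history assumed before x[0])."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(p, n)))
            for n in range(len(x))]


def estimate_pitch(e, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest residual autocorrelation."""
    r = autocorr(e, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])


def conceal(history, frame_len, order=8, min_lag=20, max_lag=120, gain=0.9):
    """Synthesize a replacement frame from previously received samples."""
    a = lpc(history, order)
    e = residual(history, a)
    pitch = estimate_pitch(e, min_lag, max_lag)
    last_period = e[-pitch:]          # excitation segment to replicate
    out = list(history[-order:])      # synthesis filter memory
    for n in range(frame_len):
        exc = gain * last_period[n % pitch]
        out.append(exc + sum(a[k] * out[-1 - k] for k in range(order)))
    return out[order:]


def crossfade(tail, head):
    """Linear overlap-add of two equal-length segments at a frame boundary."""
    n = len(tail)
    return [tail[i] * (1 - (i + 1) / (n + 1)) + head[i] * ((i + 1) / (n + 1))
            for i in range(n)]


# Hypothetical history: 320 samples of a synthetic two-tone signal.
history = [math.sin(2 * math.pi * n / 40) + 0.3 * math.sin(2 * math.pi * n / 13)
           for n in range(320)]
concealed = conceal(history, frame_len=80)
```

In a receiver, `crossfade` would blend the end of the concealed frame into the start of the next correctly received frame; the history must be longer than `max_lag` samples for the pitch search to work.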

Patent
30 Jun 2006
TL;DR: An audio encoder, decoder, or processor uses a filter whose warping characteristic is controlled by a time-varying control signal derived from the audio signal; the filtered audio signal can be passed to an encoding processor with several encoding algorithms, one of which is adapted to a specific signal pattern.
Abstract: An audio encoder, an audio decoder or an audio processor includes a filter for generating a filtered audio signal, the filter having a variable warping characteristic, the characteristic being controllable in response to a time-varying control signal, the control signal indicating a small or no warping characteristic or a comparatively high warping characteristic. Furthermore, a controller is connected for providing the time-varying control signal, which depends on the audio signal. The filtered audio signal can be introduced to an encoding processor having different encoding algorithms, one of which is a coding algorithm adapted to a specific signal pattern. Alternatively, the filter is a post-filter receiving a decoded audio signal.

95 citations

Proceedings Article
07 May 2001
TL;DR: This contribution deals with an iterative source-channel decoding approach where a simple channel decoder and a softbit-source decoder are concatenated, and derives a new formula that shows how the residual redundancy transforms into extrinsic information utilizable for iterative decoding.
Abstract: In digital mobile communications, efficient compression algorithms are needed to encode speech or audio signals. As the determined source parameters are highly sensitive to transmission errors, robust source and channel decoding schemes are required. This contribution deals with an iterative source-channel decoding approach where a simple channel decoder and a softbit-source decoder are concatenated. We mainly focus on softbit-source decoding which can be considered as an error concealment technique. This technique utilizes residual redundancy remaining after source coding. We derive a new formula that shows how the residual redundancy transforms into extrinsic information utilizable for iterative decoding. The derived formula opens several starting points for optimizations, e.g. it helps to find a robust index assignment. Furthermore, it allows the conclusion that softbit-source decoding is the limiting factor if applied to iterative decoding processes. Therefore, no significant gain will be obtainable by more than two iterations. This will be demonstrated by simulation.

94 citations

Proceedings Article
15 Mar 1999
TL;DR: The experimental results show that the line spectral frequencies (LSFs) are robust features in distinguishing the different classes of noises.
Abstract: Background environmental noises degrade the performance of speech-processing systems (e.g. speech coding, speech recognition). By modifying the processing according to the type of background noise, the performance can be enhanced. This requires noise classification. In this paper, four pattern-recognition frameworks have been used to design noise classification algorithms. Classification is done on a frame-by-frame basis (e.g. once every 20 ms). Five commonly encountered noises in mobile telephony (i.e. car, street, babble, factory, and bus) have been considered in our study. Our experimental results show that the line spectral frequencies (LSFs) are robust features in distinguishing the different classes of noises.

94 citations
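
Frame-by-frame noise classification of the kind the abstract above describes can be sketched as follows. The abstract does not say which four pattern-recognition frameworks were used, so the nearest-centroid rule here is a stand-in chosen for brevity. The feature vectors are assumed to be LSFs computed elsewhere, one vector per 20 ms frame, and all names and numeric values are made up for illustration.

```python
import math


def nearest_centroid(frame_features, centroids):
    """Return the noise label whose centroid is closest in Euclidean distance."""
    return min(centroids,
               key=lambda label: math.dist(frame_features, centroids[label]))


def classify_frames(frames, centroids):
    """Classify each frame independently."""
    return [nearest_centroid(features, centroids) for features in frames]


# Hypothetical 3-dimensional LSF centroids (radians) for two noise classes.
centroids = {
    "car": [0.2, 0.5, 1.0],
    "babble": [0.4, 0.9, 1.6],
}

# Hypothetical LSF vectors for two consecutive frames.
frames = [
    [0.21, 0.52, 1.05],
    [0.38, 0.88, 1.50],
]

labels = classify_frames(frames, centroids)
```

A real system would use a higher LSF order and train the centroids (or another classifier) on labeled noise recordings.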


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108