
Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Journal ArticleDOI
TL;DR: SpeechSkimmer as discussed by the authors uses speech-processing techniques to allow a user to hear recorded sounds quickly, and at several levels of detail, and provides continuous real-time control of the speed and detail level of the audio presentation.
Abstract: Listening to a speech recording is much more difficult than visually scanning a document because of the transient and temporal nature of audio. Audio recordings capture the richness of speech, yet it is difficult to directly browse the stored information. This article describes techniques for structuring, filtering, and presenting recorded speech, allowing a user to navigate and interactively find information in the audio domain. This article describes the SpeechSkimmer system for interactively skimming speech recordings. SpeechSkimmer uses speech-processing techniques to allow a user to hear recorded sounds quickly, and at several levels of detail. User interaction, through a manual input device, provides continuous real-time control of the speed and detail level of the audio presentation. SpeechSkimmer reduces the time needed to listen by incorporating time-compressed speech, pause shortening, automatic emphasis detection, and nonspeech audio feedback. This article also presents a multilevel structural approach to auditory skimming and user interface techniques for interacting with recorded speech. An observational usability test of SpeechSkimmer is discussed, as well as a redesign and reimplementation of the user interface based on the results of this usability test.

253 citations
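As a rough illustration of one ingredient the abstract mentions, pause shortening, the sketch below trims long low-energy stretches out of a recording. It is a minimal sketch assuming simple energy thresholding over fixed-size frames; the frame size, threshold, and maximum pause length are illustrative values, not ones from the paper.

```python
import numpy as np

def shorten_pauses(signal, rate, frame_ms=20, energy_thresh=1e-4, max_pause_ms=200):
    """Trim low-energy (pause) regions down to at most max_pause_ms."""
    frame_len = int(rate * frame_ms / 1000)
    max_pause_frames = max_pause_ms // frame_ms
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]

    out, pause_run = [], 0
    for f in frames:
        if np.mean(f.astype(np.float64) ** 2) < energy_thresh:
            pause_run += 1
            if pause_run > max_pause_frames:
                continue  # drop pause frames beyond the allowed run length
        else:
            pause_run = 0
        out.append(f)
    return np.concatenate(out)

# e.g. shortened = shorten_pauses(audio, 16000)
```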

Journal ArticleDOI
TL;DR: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum, which provides a means for controlling and reducing quantizing noise in the coding.
Abstract: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum. The approach provides a means for controlling and reducing quantizing noise in the coding. Each sub-band is quantized with an accuracy (bit allocation) based upon perceptual criteria. As a result, the quality of the coded signal is improved over that obtained from a single full-band coding of the total spectrum. In one implementation, the individual sub-bands are low-pass translated before coding. In another, “integer-band” sampling is employed to alias the signal in an advantageous way before coding. Other possibilities extend to complex demodulation of the sub-bands, and to representing the sub-band signals in terms of envelopes and phase-derivatives. In all techniques, adaptive quantization is used for the coding, and a parsimonious allocation of bits is made across the bands. Computer simulations are made to demonstrate the signal qualities obtained for codings at 16 and 9.6 kb/s.

252 citations
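The scheme the abstract describes is easy to mimic in outline: split the signal into a few bands, spend a different number of bits on each, and sum the decoded bands. The sketch below is a minimal, non-adaptive version; the band edges and bit allocation are illustrative assumptions (the paper uses adaptive quantization and perceptually motivated allocations).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def subband_code(signal, rate,
                 bands=((200, 700), (700, 1310), (1310, 2280), (2280, 3200)),
                 bits=(5, 4, 3, 2)):
    """Split into sub-bands, uniformly quantize each with its own bit
    budget, then sum the decoded bands back into a full-band signal."""
    decoded = np.zeros(len(signal))
    for (lo, hi), b in zip(bands, bits):
        sos = butter(4, [lo, hi], btype="bandpass", fs=rate, output="sos")
        band = sosfiltfilt(sos, signal)
        step = (band.max() - band.min()) / 2 ** b or 1e-12  # quantizer step
        decoded += np.round(band / step) * step  # quantize, then dequantize
    return decoded
```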

Proceedings ArticleDOI
Ara V. Nefian, Luhong Liang, Xiaobo Pi, Liu Xiaoxiang, Crusoe Mao, Kevin Murphy
13 May 2002
TL;DR: This paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM) to model the state asynchrony of the audio and visual observations sequences while still preserving their natural correlation over time.
Abstract: In recent years several speech recognition systems that use visual together with audio information showed significant increase in performance over the standard speech recognition systems. The use of visual features is justified by both the bimodality of the speech generation and by the need of features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM). The statistical properties of the coupled-HMM allow us to model the state asynchrony of the audio and visual observations sequences while still preserving their natural correlation over time. The experimental results show that the coupled HMM outperforms the multistream HMM in audio visual speech recognition.

252 citations
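A coupled HMM ties two Markov chains together by letting each chain's next state depend on the previous states of both chains, which is what lets the model tolerate audio/visual asynchrony while keeping the streams correlated over time. Below is a minimal sketch of the forward (likelihood) pass over the joint state space; the array shapes and parameter names are assumptions for illustration, not the paper's notation.

```python
import numpy as np
from scipy.special import logsumexp

def coupled_hmm_loglik(log_b_a, log_b_v, log_A, log_V, log_pi):
    """Forward pass for a two-chain (audio/visual) coupled HMM.
    log_b_a[t, i]: audio emission log-prob in audio state i at time t
    log_b_v[t, j]: visual emission log-prob in visual state j at time t
    log_A[i, j, i2]: audio transition, conditioned on BOTH previous states
    log_V[i, j, j2]: visual transition, conditioned on both previous states
    log_pi[i, j]: initial joint-state log-distribution"""
    alpha = log_pi + log_b_a[0][:, None] + log_b_v[0][None, :]
    for t in range(1, log_b_a.shape[0]):
        trans = log_A[:, :, :, None] + log_V[:, :, None, :]  # (i, j, i2, j2)
        alpha = logsumexp(alpha[:, :, None, None] + trans, axis=(0, 1))
        alpha += log_b_a[t][:, None] + log_b_v[t][None, :]
    return logsumexp(alpha)  # total log-likelihood of both streams
```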

Journal ArticleDOI
TL;DR: Experiments on a database of about four hours of audio show that the proposed classifier is very effective at audio classification and segmentation, and that the accuracy of the SVM-based method is much better than that of methods based on KNN and GMM.
Abstract: Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification which employs support vector machines (SVMs). Five audio classes are considered in this paper: silence, music, background sound, pure speech, and non-pure speech, which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of the SVM on classification of different audio type-pairs with testing units of different lengths, and compared the performance of the SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) classifiers. We also evaluated the effectiveness of some newly proposed features. Experiments on a database of about four hours of audio show that the proposed classifier is very effective at audio classification and segmentation, and that the accuracy of the SVM-based method is much better than that of the methods based on KNN and GMM.

251 citations
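For a sense of the pipeline the abstract describes, the sketch below computes two classic frame-level features and fits the two classifiers being compared. The features (short-time log energy and zero-crossing rate) stand in for the paper's richer feature set, and the hyperparameters are illustrative defaults, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def frame_features(signal, rate, frame_ms=25):
    """Per-frame short-time log energy and zero-crossing rate."""
    n = int(rate * frame_ms / 1000)
    frames = signal[: len(signal) // n * n].reshape(-1, n).astype(np.float64)
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([energy, zcr])

# X: frame features stacked over labeled clips; y: integer labels for the
# five classes (silence, music, background, pure speech, non-pure speech).
# svm = SVC(kernel="rbf").fit(X, y)
# knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```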

Journal ArticleDOI
TL;DR: Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
Abstract: The evaluation of the intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise (babble, car, street, and train) at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms that were found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.

251 citations
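Of the four algorithm classes tested, spectral subtraction is the simplest to show. The sketch below estimates the noise spectrum from the first few frames (assumed speech-free) and subtracts a scaled version from every frame; the over-subtraction factor and spectral floor are illustrative assumptions, not settings from the study.

```python
import numpy as np

def spectral_subtract(noisy, frame_len=512, hop=256, noise_frames=10,
                      alpha=2.0, floor=0.02):
    """Basic magnitude spectral subtraction with overlap-add resynthesis."""
    window = np.hanning(frame_len)
    starts = range(0, len(noisy) - frame_len, hop)
    spectra = np.array([np.fft.rfft(noisy[i:i + frame_len] * window)
                        for i in starts])
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)  # noise estimate

    mag = np.abs(spectra) - alpha * noise_mag       # over-subtraction
    mag = np.maximum(mag, floor * np.abs(spectra))  # spectral floor
    cleaned = mag * np.exp(1j * np.angle(spectra))  # reuse the noisy phase

    out = np.zeros(len(noisy))                      # overlap-add resynthesis
    for k, frame in enumerate(np.fft.irfft(cleaned, n=frame_len)):
        out[k * hop:k * hop + frame_len] += frame
    return out
```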


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 38
2022: 84
2021: 70
2020: 62
2019: 77
2018: 108