
Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.


Papers
Journal ArticleDOI
TL;DR: SpeechSkimmer as discussed by the authors uses speech-processing techniques to allow a user to hear recorded sounds quickly, and at several levels of detail, and provides continuous real-time control of the speed and detail level of the audio presentation.
Abstract: Listening to a speech recording is much more difficult than visually scanning a document because of the transient and temporal nature of audio. Audio recordings capture the richness of speech, yet it is difficult to directly browse the stored information. This article describes techniques for structuring, filtering, and presenting recorded speech, allowing a user to navigate and interactively find information in the audio domain. This article describes the SpeechSkimmer system for interactively skimming speech recordings. SpeechSkimmer uses speech-processing techniques to allow a user to hear recorded sounds quickly, and at several levels of detail. User interaction, through a manual input device, provides continuous real-time control of the speed and detail level of the audio presentation. SpeechSkimmer reduces the time needed to listen by incorporating time-compressed speech, pause shortening, automatic emphasis detection, and nonspeech audio feedback. This article also presents a multilevel structural approach to auditory skimming and user interface techniques for interacting with recorded speech. An observational usability test of SpeechSkimmer is discussed, as well as a redesign and reimplementation of the user interface based on the results of this usability test.

253 citations
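As a rough illustration of one ingredient the abstract mentions, pause shortening, the sketch below trims long low-energy stretches out of a recording. It is a minimal sketch assuming simple energy thresholding over fixed-size frames; the frame size, threshold, and maximum pause length are illustrative values, not ones from the paper.

```python
import numpy as np

def shorten_pauses(signal, rate, frame_ms=20, energy_thresh=1e-4, max_pause_ms=200):
    """Trim low-energy (pause) regions down to at most max_pause_ms."""
    frame_len = int(rate * frame_ms / 1000)
    max_pause_frames = max_pause_ms // frame_ms
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]

    out, pause_run = [], 0
    for f in frames:
        if np.mean(f.astype(np.float64) ** 2) < energy_thresh:
            pause_run += 1
            if pause_run > max_pause_frames:
                continue  # drop pause frames beyond the allowed run length
        else:
            pause_run = 0
        out.append(f)
    return np.concatenate(out)

# e.g. shortened = shorten_pauses(audio, 16000)
```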

Journal ArticleDOI
TL;DR: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum, which provides a means for controlling and reducing quantizing noise in the coding.
Abstract: A rationale is advanced for digitally coding speech signals in terms of sub-bands of the total spectrum. The approach provides a means for controlling and reducing quantizing noise in the coding. Each sub-band is quantized with an accuracy (bit allocation) based upon perceptual criteria. As a result, the quality of the coded signal is improved over that obtained from a single full-band coding of the total spectrum. In one implementation, the individual sub-bands are low-pass translated before coding. In another, “integer-band” sampling is employed to alias the signal in an advantageous way before coding. Other possibilities extend to complex demodulation of the sub-bands, and to representing the sub-band signals in terms of envelopes and phase-derivatives. In all techniques, adaptive quantization is used for the coding, and a parsimonious allocation of bits is made across the bands. Computer simulations are made to demonstrate the signal qualities obtained for codings at 16 and 9.6 kb/s.

252 citations
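The scheme the abstract describes is easy to mimic in outline: split the signal into a few bands, spend a different number of bits on each, and sum the decoded bands. The sketch below is a minimal, non-adaptive version; the band edges and bit allocation are illustrative assumptions (the paper uses adaptive quantization and perceptually motivated allocations).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def subband_code(signal, rate,
                 bands=((200, 700), (700, 1310), (1310, 2280), (2280, 3200)),
                 bits=(5, 4, 3, 2)):
    """Split into sub-bands, uniformly quantize each with its own bit
    budget, then sum the decoded bands back into a full-band signal."""
    decoded = np.zeros(len(signal))
    for (lo, hi), b in zip(bands, bits):
        sos = butter(4, [lo, hi], btype="bandpass", fs=rate, output="sos")
        band = sosfiltfilt(sos, signal)
        step = (band.max() - band.min()) / 2 ** b or 1e-12  # quantizer step
        decoded += np.round(band / step) * step  # quantize, then dequantize
    return decoded
```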

Proceedings ArticleDOI
Ara V. Nefian, Luhong Liang, Xiaobo Pi, Liu Xiaoxiang, Crusoe Mao, Kevin Murphy
13 May 2002
TL;DR: This paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM) to model the state asynchrony of the audio and visual observations sequences while still preserving their natural correlation over time.
Abstract: In recent years several speech recognition systems that use visual together with audio information showed significant increase in performance over the standard speech recognition systems. The use of visual features is justified by both the bimodality of the speech generation and by the need of features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM). The statistical properties of the coupled-HMM allow us to model the state asynchrony of the audio and visual observations sequences while still preserving their natural correlation over time. The experimental results show that the coupled HMM outperforms the multistream HMM in audio visual speech recognition.

252 citations
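A coupled HMM ties two Markov chains together by letting each chain's next state depend on the previous states of both chains, which is what lets the model tolerate audio/visual asynchrony while keeping the streams correlated over time. Below is a minimal sketch of the forward (likelihood) pass over the joint state space; the array shapes and parameter names are assumptions for illustration, not the paper's notation.

```python
import numpy as np
from scipy.special import logsumexp

def coupled_hmm_loglik(log_b_a, log_b_v, log_A, log_V, log_pi):
    """Forward pass for a two-chain (audio/visual) coupled HMM.
    log_b_a[t, i]: audio emission log-prob in audio state i at time t
    log_b_v[t, j]: visual emission log-prob in visual state j at time t
    log_A[i, j, i2]: audio transition, conditioned on BOTH previous states
    log_V[i, j, j2]: visual transition, conditioned on both previous states
    log_pi[i, j]: initial joint-state log-distribution"""
    alpha = log_pi + log_b_a[0][:, None] + log_b_v[0][None, :]
    for t in range(1, log_b_a.shape[0]):
        trans = log_A[:, :, :, None] + log_V[:, :, None, :]  # (i, j, i2, j2)
        alpha = logsumexp(alpha[:, :, None, None] + trans, axis=(0, 1))
        alpha += log_b_a[t][:, None] + log_b_v[t][None, :]
    return logsumexp(alpha)  # total log-likelihood of both streams
```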

Journal ArticleDOI
TL;DR: Experiments on a database of about four hours of audio show that the proposed classifier is very effective at audio classification and segmentation, and that the accuracy of the SVM-based method is much better than that of methods based on KNN and GMM.
Abstract: Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification which employs support vector machines (SVMs). Five audio classes are considered in this paper: silence, music, background sound, pure speech, and non-pure speech, which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of the SVM on classification of different audio type-pairs with testing units of different lengths, and compared the performance of the SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) classifiers. We also evaluated the effectiveness of some newly proposed features. Experiments on a database of about four hours of audio show that the proposed classifier is very effective at audio classification and segmentation, and that the accuracy of the SVM-based method is much better than that of the methods based on KNN and GMM.

251 citations
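For a sense of the pipeline the abstract describes, the sketch below computes two classic frame-level features and fits the two classifiers being compared. The features (short-time log energy and zero-crossing rate) stand in for the paper's richer feature set, and the hyperparameters are illustrative defaults, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def frame_features(signal, rate, frame_ms=25):
    """Per-frame short-time log energy and zero-crossing rate."""
    n = int(rate * frame_ms / 1000)
    frames = signal[: len(signal) // n * n].reshape(-1, n).astype(np.float64)
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([energy, zcr])

# X: frame features stacked over labeled clips; y: integer labels for the
# five classes (silence, music, background, pure speech, non-pure speech).
# svm = SVC(kernel="rbf").fit(X, y)
# knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```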

Journal ArticleDOI
TL;DR: Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
Abstract: The evaluation of the intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise (babble, car, street, and train) at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms that were found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.

251 citations
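Of the four algorithm classes tested, spectral subtraction is the simplest to show. The sketch below estimates the noise spectrum from the first few frames (assumed speech-free) and subtracts a scaled version from every frame; the over-subtraction factor and spectral floor are illustrative assumptions, not settings from the study.

```python
import numpy as np

def spectral_subtract(noisy, frame_len=512, hop=256, noise_frames=10,
                      alpha=2.0, floor=0.02):
    """Basic magnitude spectral subtraction with overlap-add resynthesis."""
    window = np.hanning(frame_len)
    starts = range(0, len(noisy) - frame_len, hop)
    spectra = np.array([np.fft.rfft(noisy[i:i + frame_len] * window)
                        for i in starts])
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)  # noise estimate

    mag = np.abs(spectra) - alpha * noise_mag       # over-subtraction
    mag = np.maximum(mag, floor * np.abs(spectra))  # spectral floor
    cleaned = mag * np.exp(1j * np.angle(spectra))  # reuse the noisy phase

    out = np.zeros(len(noisy))                      # overlap-add resynthesis
    for k, frame in enumerate(np.fft.irfft(cleaned, n=frame_len)):
        out[k * hop:k * hop + frame_len] += frame
    return out
```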


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 38
2022: 84
2021: 70
2020: 62
2019: 77
2018: 108