Topic

Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published on this topic, receiving 271,964 citations.


Papers
Patent
Marquis D. Doyle
24 Jul 2006
TL;DR: An active audio detection circuit determines when one or more audio sources become active; when two or more audio sources are active simultaneously, the controller directs the highest-priority audio source to one or more speakers.
Abstract: An audio integrator monitors the outputs of a plurality of audio sources, and a controller prioritizes the audio sources. An active audio detection circuit determines when one or more of the audio sources become active. When two or more audio sources are active simultaneously, the controller directs the highest priority audio source to one or more speakers. If a lower priority audio signal is currently playing, newly active voice communication audio, such as communications or directional information, is delayed to preserve the beginning of the message during an audio switch-over. A currently-playing, lower-priority audio signal may be decreased in volume, and a tone unique to the new audio source sounded, prior to switching the audio to the higher-priority source. During audio input (e.g., while actuating a push-to-talk button on a microphone), all active audio sources are quieted.
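The routing rule in this abstract can be illustrated with a minimal sketch. The source names, the convention that a smaller number means higher priority, and the `push_to_talk` flag are assumptions made for illustration; the patent does not specify an implementation, and the delay, volume-ducking, and tone steps are not modeled here.

```python
from typing import Dict, Optional, Set


def select_source(
    active: Set[str],
    priority: Dict[str, int],
    push_to_talk: bool = False,
) -> Optional[str]:
    """Pick the audio source to route to the speakers.

    Assumption: a lower number in `priority` means a higher priority.
    Returns None when nothing should play (no active source, or the
    push-to-talk button is held and all sources are quieted).
    """
    if push_to_talk or not active:
        return None
    # Among the currently active sources, choose the highest-priority one.
    return min(active, key=lambda name: priority[name])


# Hypothetical sources for illustration only.
PRIORITY = {"radio": 0, "navigation": 1, "music": 2}
chosen = select_source({"music", "navigation"}, PRIORITY)
```

Here `chosen` is the navigation source, because it outranks music while both are active.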

56 citations

Proceedings Article
01 Jan 1998
TL;DR: It is observed that by selectively combining the cepstral streams representing the LPC parameters and the residual signal it is possible to obtain recognition accuracy directly from the coded parameters that equals or exceeds the recognition accuracy obtained from the reconstructed waveforms.
Abstract: Speech coding affects speech recognition performance, with recognition accuracy deteriorating as the coded bit rate decreases. Virtually all systems that recognize coded speech reconstruct the speech waveform from the coded parameters, and then perform recognition (after possible noise and/or channel compensation) using conventional techniques. In this paper we compare the recognition accuracy of coded speech obtained by reconstructing the speech waveform with the speech recognition accuracy obtained when using cepstral features derived from the coding parameters. We focus our efforts on speech that has been coded using the 13-kbps full-rate GSM codec, a Regular Pulse Excited Long Term Prediction (RPE-LTP) codec. The GSM codec develops separate representations for the linear prediction (LPC) filter and the residual signal components of the coded speech. We measure the effects of quantization and coding on the accuracy with which these parameters are represented, and present two different methods for recombining them for speech recognition purposes. We observe that by selectively combining the cepstral streams representing the LPC parameters and the residual signal it is possible to obtain recognition accuracy directly from the coded parameters that equals or exceeds the recognition accuracy obtained from the reconstructed waveforms.
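One common way to obtain cepstral features from linear-prediction parameters, without reconstructing the waveform, is the standard LPC-to-cepstrum recursion. The sketch below is that generic textbook recursion, not necessarily the method used in the paper. It assumes the all-pole model 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the function name and argument names are hypothetical.

```python
from typing import List, Sequence


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Convert LPC coefficients a_1..a_p to cepstral coefficients c_1..c_n.

    Assumes the synthesis filter is 1/A(z), A(z) = 1 + sum_k a_k z^-k.
    Recursion: c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},
    with a_n taken as 0 for n > p. The gain term c_0 is not computed.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # index 0 is unused
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += k * c[k] * a[n - k - 1]
        c[n] = -a_n - acc / n
    return c[1:]


# Single-pole check: A(z) = 1 - 0.5 z^-1 gives c_n = 0.5**n / n.
ceps = lpc_to_cepstrum([-0.5], 3)
```

For the single-pole filter in the example, the cepstrum has the closed form 0.5^n / n, which gives a quick sanity check on the recursion.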

56 citations

Proceedings Article
17 Sep 2000
TL;DR: A speech/music discrimination procedure for multi-mode wideband coding that is suitable for combined speech and audio coding and shows improved performance when compared to single-mode encoding is described.
Abstract: We propose in this paper a general solution for combined speech and audio coding. Particularly, we describe a speech/music discrimination procedure for multi-mode wideband coding. The speech/music decision is updated only when a low-energy frame is detected, and kept unchanged otherwise. The signal is classified using second-order statistics of discriminant parameters. An experimental CELP/transform coder operating at 16 kbit/s is demonstrated. Results show improved performance when compared to single-mode encoding.
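The update rule described here, where the speech/music decision changes only on low-energy frames and is otherwise held, can be sketched as follows. The per-frame raw labels, the energy values, the threshold, and the initial mode are all assumptions for illustration; the classifier based on second-order statistics is not modeled.

```python
from typing import List, Sequence


def hold_decisions(
    raw_labels: Sequence[str],
    energies: Sequence[float],
    low_energy_threshold: float,
    initial: str = "speech",
) -> List[str]:
    """Apply a decision that may change only on low-energy frames.

    raw_labels: per-frame output of some classifier (assumed given).
    energies:   per-frame energy, same length as raw_labels.
    The current mode is updated only when a frame's energy falls below
    low_energy_threshold; on all other frames it is kept unchanged.
    """
    current = initial
    held = []
    for label, energy in zip(raw_labels, energies):
        if energy < low_energy_threshold:
            current = label
        held.append(current)
    return held


modes = hold_decisions(
    ["speech", "music", "music", "speech"],
    [1.0, 0.9, 0.01, 0.8],
    low_energy_threshold=0.1,
)
```

In this example the mode switches to music only at the third frame, the one low-energy frame, and stays there even though the fourth raw label is speech.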

56 citations

Proceedings Article
01 Mar 1984
TL;DR: It is found that the additional glottal parameters can be coded effectively such that the total bit rate is in the same range as for conventional LPC.
Abstract: A procedure is suggested for improving LPC speech quality. The central theme is to introduce a parametric model of voiced excitation - a glottal source model. In the analysis this allows for a different method than the AR-estimation used in conventional LPC. Here, a method known as AR-X-estimation is used. A complete analysis and coding method is presented. It is found that the additional glottal parameters can be coded effectively such that the total bit rate is in the same range as for conventional LPC. The glottal LPC-vocoder does significantly improve synthesis quality as compared to standard LPC. It should be emphasized, however, that the glottal vocoder requires high quality speech as input, recorded in a phase linear system. Moreover, the computational complexity is high.

56 citations

Patent
Tong Zhang
14 Dec 2001
TL;DR: A video processing device includes a memory that stores video data and the corresponding audio data, and an audio event detector that detects an audio event in the audio data and indexes the video data at about the beginning of that event.
Abstract: A video processing device includes an audio event detector and a memory that stores video data and audio data corresponding to the video data. The audio event detector detects an audio event in the audio data and indexes the video data at about a beginning of the audio event.
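The indexing idea can be sketched under strong assumptions: the abstract does not say how audio events are detected, so the example below substitutes a simple energy threshold, and maps each event onset to a video frame index by time. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def event_onsets(energies: Sequence[float], threshold: float) -> List[int]:
    """Return audio-frame indices where energy rises to or above threshold.

    Assumption: an "audio event" begins when the per-frame energy crosses
    the threshold from below. This is a stand-in detector, not the patent's.
    """
    onsets = []
    previously_active = False
    for i, energy in enumerate(energies):
        active = energy >= threshold
        if active and not previously_active:
            onsets.append(i)
        previously_active = active
    return onsets


def video_frame_index(
    audio_frame: int, audio_frame_seconds: float, video_fps: float
) -> int:
    """Map an audio frame index to the video frame at about the same time."""
    return int(audio_frame * audio_frame_seconds * video_fps)


onsets = event_onsets([0.0, 0.0, 5.0, 6.0, 0.0, 7.0], threshold=1.0)
index = [video_frame_index(i, 0.5, 24.0) for i in onsets]
```

With half-second audio frames and 24 video frames per second, the two onsets found in the example map to the video frames at 1.0 s and 2.5 s.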

56 citations


Network Information
Related Topics (5)
Signal processing: 73.4K papers, 983.5K citations, 86% related
Decoding methods: 65.7K papers, 900K citations, 84% related
Fading: 55.4K papers, 1M citations, 80% related
Feature vector: 48.8K papers, 954.4K citations, 80% related
Feature extraction: 111.8K papers, 2.1M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    38
2022    84
2021    70
2020    62
2019    77
2018    108