
Speech coding

About: Speech coding is a research topic. Over its lifetime, 14,245 publications on this topic have been published, receiving 271,964 citations.


Papers
Proceedings ArticleDOI
20 Jun 1999
TL;DR: Tests show that the wide-band speech reconstructed with the new method of regenerating the high frequencies based on vector quantization of the mel-frequency cepstral coefficients is significantly more pleasant to the human ear than the original narrowband speech.
Abstract: Telephone speech is usually limited to less than 4 kHz in bandwidth. This bandwidth limitation results in the typical sound of telephone speech. We present a new method of regenerating the high frequencies (4-8 kHz) based on vector quantization of the mel-frequency cepstral coefficients (MFCC). We also present two methods to avoid perceptually annoying overestimates of the signal power in the high-band. Listening tests show the benefits of the new procedures. Use of MFCC for vector quantization instead of traditionally used spectral representations improves the quality of the speech significantly. Tests also show that the wide-band speech reconstructed with the method is significantly more pleasant to the human ear than the original narrowband speech.
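The vector-quantization lookup at the heart of this regeneration scheme can be illustrated with a short sketch. This is not the authors' implementation: the codebook here is random stand-in data, `mfcc_like` is a heavily simplified cepstral feature, and the 0.5 attenuation factor is only a hypothetical guard against the high-band power over-estimates the paper warns about.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained codebook: each entry pairs a narrowband MFCC
# vector with a high-band (4-8 kHz) spectral envelope learned offline.
N_CODE, N_CEPS, N_HIGH = 64, 12, 16
codebook_mfcc = rng.standard_normal((N_CODE, N_CEPS))
codebook_high_env = np.abs(rng.standard_normal((N_CODE, N_HIGH)))

def mfcc_like(frame, n_ceps=N_CEPS):
    """Very simplified MFCC: log power spectrum followed by a DCT-II.
    (A real front end would apply a mel-spaced filterbank first.)"""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    log_spec = np.log(spec + 1e-10)
    n = len(log_spec)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return (log_spec * basis).sum(axis=1)

def estimate_highband(frame):
    """VQ lookup: the nearest codebook MFCC selects a high-band envelope."""
    c = mfcc_like(frame)
    idx = np.argmin(np.sum((codebook_mfcc - c) ** 2, axis=1))
    # Attenuate the envelope to avoid perceptually annoying
    # over-estimates of high-band power.
    return 0.5 * codebook_high_env[idx]

frame = rng.standard_normal(256)   # one 32 ms frame at 8 kHz
env = estimate_highband(frame)
print(env.shape)
```

The chosen envelope would then shape an excitation signal (e.g., spectrally folded or noise-based) to synthesize the missing 4-8 kHz band.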

103 citations

Patent
20 Mar 2000
TL;DR: In this article, a speech recognition operation is performed on the audio data, initially using a speaker-independent acoustic model; the recognized text, together with audio time stamps, is produced by the speech recognition operation.
Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files representing audio and text expressions of the same work or information, are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data, initially using a speaker-independent acoustic model. The recognized text, along with audio time stamps, is produced by the speech recognition operation. The recognized text is compared to the text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and the corresponding audio segments, transforming the initial acoustic model into a speaker-trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data, and the audio and text data are synchronized using the results of the updated model. In addition, one or more error reports based on the final recognition results are generated, showing discrepancies between the recognized words and the words included in the text. By retraining the acoustic model in this manner, improved accuracy is achieved.
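The "identify correctly recognized words" step can be sketched with a standard sequence alignment. The recognizer output below is hypothetical, and a real system would work from the recognizer's own lattice or alignment rather than `difflib`; the idea is only to show how exact matches become synchronization anchors and adaptation data while mismatches become discrepancy-report entries.

```python
import difflib

# Hypothetical first-pass recognizer output: (word, start_time_s) pairs
recognized = [("the", 0.0), ("quick", 0.3), ("brown", 0.6),
              ("fax", 0.9), ("jumps", 1.2)]
reference = ["the", "quick", "brown", "fox", "jumps"]

def correctly_recognized(recognized, reference):
    """Align recognized words to the reference text and keep exact
    matches. These (word, timestamp) pairs can anchor the audio-text
    synchronization and supply retraining data for the acoustic model."""
    rec_words = [w for w, _ in recognized]
    sm = difflib.SequenceMatcher(a=rec_words, b=reference, autojunk=False)
    anchors = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            anchors.extend(recognized[i1:i2])
    return anchors

anchors = correctly_recognized(recognized, reference)
print(anchors)  # "fax" vs "fox" is a discrepancy and is dropped
```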

103 citations

Proceedings ArticleDOI
01 Apr 1980
TL;DR: The development of a digital encoding system designed to exploit the limited detection ability of the auditory system is described; by dynamically shaping the encoding error spectrum as a function of the input speech signal, the error is masked by the speech.
Abstract: The development of a digital encoding system designed to exploit the limited detection ability of the auditory system is described. By dynamically shaping the encoding error spectrum as a function of the input speech signal, the error is masked by the speech. Psychoacoustic experiments and results from the literature provide a basis for determining the system parameters that ensure that the error is inaudible. The encoder is a multi-channel system, each channel of approximately critical bandwidth. The input signal is filtered into 17 frequency channels via the quadrature mirror filter technique. Each channel is then coded using block-companding adaptive PCM. For 4.1 kHz bandwidth speech, the differential threshold of the encoding degradation occurs at a bit rate of 34.4 kbps. At 16 kbps, the encoder produces toll-quality speech output.
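The per-channel coding stage can be sketched as a generic block-companding adaptive PCM quantizer. This is not the paper's exact design: the block size and bit allocation are arbitrary here, and the 17-channel QMF analysis/synthesis bank that precedes this stage is omitted.

```python
import numpy as np

def block_apcm_encode(x, block=16, bits=4):
    """Block-companding adaptive PCM: each block is normalized by its
    peak magnitude (the transmitted gain), then quantized uniformly.
    One subband channel is coded; a full coder runs one per QMF channel."""
    levels = 2 ** (bits - 1)
    gains, codes = [], []
    for i in range(0, len(x), block):
        b = x[i:i + block]
        g = float(np.max(np.abs(b))) or 1.0   # avoid divide-by-zero on silence
        q = np.clip(np.round(b / g * levels), -levels, levels - 1).astype(int)
        gains.append(g)
        codes.append(q)
    return gains, codes

def block_apcm_decode(gains, codes, bits=4):
    levels = 2 ** (bits - 1)
    return np.concatenate([g * q / levels for g, q in zip(gains, codes)])

rng = np.random.default_rng(1)
x = rng.standard_normal(128)
y = block_apcm_decode(*block_apcm_encode(x))
snr = 10 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2))
print(round(snr, 1))
```

Because the gain adapts per block, quantization error scales with the local signal level, which is what lets the error spectrum track (and be masked by) the speech.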

103 citations

PatentDOI
Yu-Jih Liu1
TL;DR: In this paper, a speech coding system employs measurements of robust features of speech frames, whose distributions are not strongly affected by noise levels, to make voicing decisions for input speech occurring in a noisy environment.
Abstract: A speech coding system employs measurements of robust features of speech frames whose distributions are not strongly affected by noise levels to make voicing decisions for input speech occurring in a noisy environment. Linear programming analysis of the robust features and respective weights is used to determine an optimum linear combination of these features. The input speech vectors are matched to a vocabulary of codewords in order to select the corresponding, optimally matching codeword. Adaptive vector quantization is used, in which a vocabulary of words obtained in a quiet environment is updated based upon a noise estimate of the noisy environment in which the input speech occurs, and the "noisy" vocabulary is then searched for the best match with an input speech vector. The corresponding clean codeword index is then selected for transmission and for synthesis at the receiver end. The results are better spectral reproduction and significant intelligibility enhancement over prior coding approaches. Robust features found to allow robust voicing decisions include: low-band energy; zero-crossing counts adapted for noise level; AMDF ratio (speech periodicity) measure; low-pass filtered backward correlation; low-pass filtered forward correlation; inverse-filtered backward correlation; and inverse-filtered pitch prediction gain measure.
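A few of the listed features and their weighted combination can be sketched as below. The weights and threshold are hypothetical placeholders for the linear-programming-derived optimum, and only three of the seven features are computed (low-band energy, zero-crossing rate, and an autocorrelation-based periodicity measure standing in for the AMDF ratio).

```python
import numpy as np

def voicing_features(frame, sr=8000):
    """Three robust voicing features: low-band energy fraction,
    zero-crossing rate, and a periodicity measure taken from the
    normalized autocorrelation peak in the pitch range (50-400 Hz)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    low_energy = spec[freqs < 1000].sum() / (spec.sum() + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // 400, sr // 50          # candidate pitch lags
    periodicity = ac[lo:hi].max() / (ac[0] + 1e-12)
    return np.array([low_energy, 1 - zcr, periodicity])

# Hypothetical weights; the patent derives an optimal linear
# combination of the features via linear programming.
weights = np.array([0.4, 0.2, 0.4])

def is_voiced(frame, threshold=0.5):
    return float(weights @ voicing_features(frame)) > threshold

sr = 8000
t = np.arange(400) / sr
voiced = np.sin(2 * np.pi * 120 * t)      # periodic, low-frequency tone
unvoiced = np.random.default_rng(2).standard_normal(400)
print(is_voiced(voiced), is_voiced(unvoiced))
```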

103 citations

Patent
20 Dec 2013
TL;DR: In this article, the authors propose a speech-triggered transition of a host processor and/or computing device from a low functionality mode to a high functionality mode in which full vocabulary speech recognition can be accomplished.
Abstract: Disclosed are embodiments for seamless, single-step, and speech-triggered transition of a host processor and/or computing device from a low functionality mode to a high functionality mode in which full vocabulary speech recognition can be accomplished. First audio samples are captured by a low power audio processor while the host processor is in a low functionality mode. The low power audio processor may identify a predetermined audio pattern. The low power audio processor, upon identifying the predetermined audio pattern, triggers the host processor to transition to a high functionality mode. An end portion of the first audio samples that follow an end-point of the predetermined audio pattern may be stored in system memory accessible by the host processor. Second audio samples are captured and stored with the end portion of the first audio samples. Once the host processor transitions to a high functionality mode, multi-channel full vocabulary speech recognition can be performed and functions can be executed based on detected speech interaction phrases.
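The buffering handoff described above can be sketched as follows. The frame granularity, detector interface, and `keyword_end` index are hypothetical simplifications of the patent's mechanism; frames are string stand-ins for audio buffers.

```python
from collections import deque

class LowPowerTrigger:
    """Sketch of the handoff: a small always-on audio processor keeps
    recent frames in a ring buffer; when a (hypothetical) keyword
    detector fires, the frames that follow the keyword's end-point are
    copied to memory the host processor reads once it wakes."""

    def __init__(self, buffer_frames=8):
        self.ring = deque(maxlen=buffer_frames)
        self.host_awake = False
        self.host_buffer = []

    def on_audio_frame(self, frame, keyword_end=None):
        if self.host_awake:
            self.host_buffer.append(frame)   # second-stage audio samples
        else:
            self.ring.append(frame)
            if keyword_end is not None:      # detector fired at this index
                # Hand off the tail of the ring buffer that follows the
                # keyword end-point, then wake the host.
                tail = list(self.ring)[keyword_end:]
                self.host_buffer.extend(tail)
                self.host_awake = True

trig = LowPowerTrigger()
for i in range(5):
    trig.on_audio_frame(f"f{i}")             # host still asleep
trig.on_audio_frame("f5", keyword_end=4)     # keyword ends mid-buffer
trig.on_audio_frame("f6")                    # captured directly for host
print(trig.host_buffer)
```

When the host comes up, it sees a gapless stream starting just after the trigger phrase, so full-vocabulary recognition can proceed without re-prompting the user.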

103 citations


Network Information
Related Topics (5)
- Signal processing: 73.4K papers, 983.5K citations (86% related)
- Decoding methods: 65.7K papers, 900K citations (84% related)
- Fading: 55.4K papers, 1M citations (80% related)
- Feature vector: 48.8K papers, 954.4K citations (80% related)
- Feature extraction: 111.8K papers, 2.1M citations (80% related)
Performance Metrics
No. of papers in the topic in previous years:
- 2023: 38
- 2022: 84
- 2021: 70
- 2020: 62
- 2019: 77
- 2018: 108