Topic
Speech coding
About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.
Papers
••
01 Jun 1995
TL;DR: Explains basic approaches to bit rate compression of speech, wideband speech, and audio in audiovisual communications, and shows that using knowledge of auditory perception helps minimize the perception of coding artifacts and leads to efficient low bit rate coding algorithms that achieve substantially more compression than was thought possible only a few years ago.
Abstract: Current and future visual communications for applications such as broadcasting, videotelephony, video- and audiographic-conferencing, and interactive multimedia services assume a substantial audio component. Even text, graphics, fax, still images, email documents, etc. will gain from voice annotation and audio clips. A wide range of speech, wideband speech, and wideband audio coders is available for such applications. In the context of audiovisual communications, the quality of telephone-bandwidth speech is acceptable for some videotelephony and videoconferencing services. Higher bandwidths (wideband speech) may be necessary to improve the intelligibility and naturalness of speech. High quality audio coding including multichannel audio will be necessary in advanced digital TV and multimedia services. This paper explains basic approaches to speech, wideband speech, and audio bit rate compressions in audiovisual communications. These signal classes differ in bandwidth, dynamic range, and in listener expectation of offered quality. It will become obvious that the use of our knowledge of auditory perception helps minimize the perception of coding artifacts and leads to efficient low bit rate coding algorithms which can achieve substantially more compression than was thought possible only a few years ago. The paper concentrates on worldwide source coding standards beneficial for consumers, service providers, and manufacturers.
62 citations
•
27 Jan 1998
TL;DR: Each speech frame is represented by a weighted average of codebook entries; the weights represent a perceptual distance of the speech frame and may be refined by a gradient descent analysis.
Abstract: A voice conversion system and methodology employing a codebook mapping approach to transforming a source voice to sound like a target voice. Each speech frame is represented by a weighted average of codebook entries (304). The weights represent a perceptual distance of the speech frame and may be refined by a gradient descent analysis. The vocal tract characteristics, represented by a line spectral frequency vector (302), the excitation characteristics (308), represented by a linear predictive coding residual, the duration, and the amplitude of the speech frame are transformed in the same weighted-average framework.
62 citations
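The weighted-average codebook mapping described in the voice conversion abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names, the use of Euclidean distance as a stand-in for the perceptual distance, the exponential weighting, and the `gamma` sharpness parameter are all assumptions, and the gradient descent refinement of the weights is omitted.

```python
import math


def codebook_weights(frame, source_codebook, gamma=1.0):
    """Normalized weights that shrink as the frame gets farther from an entry.

    Euclidean distance is an assumed stand-in for a perceptual distance.
    """
    dists = [math.dist(frame, entry) for entry in source_codebook]
    d_min = min(dists)  # subtract the minimum for numerical stability
    raw = [math.exp(-gamma * (d - d_min)) for d in dists]
    total = sum(raw)
    return [r / total for r in raw]


def convert_frame(frame, source_codebook, target_codebook, gamma=1.0):
    """Map a source frame to the target voice as a weighted average of the
    target codebook entries, using weights computed on the source side."""
    weights = codebook_weights(frame, source_codebook, gamma)
    dim = len(target_codebook[0])
    return [
        sum(w * entry[k] for w, entry in zip(weights, target_codebook))
        for k in range(dim)
    ]


# Hypothetical two-entry codebooks with two-dimensional feature vectors.
source_codebook = [[0.0, 0.0], [1.0, 1.0]]
target_codebook = [[10.0, 20.0], [30.0, 40.0]]
converted = convert_frame([0.5, 0.5], source_codebook, target_codebook)
```

A frame that lies midway between two source entries receives equal weights, so its converted vector is the midpoint of the corresponding target entries; a larger `gamma` pushes the mapping toward the single nearest entry.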
•
05 Dec 1997
TL;DR: A speech codec adjusts the number of bits used to represent its short-term and long-term prediction parameters based on information from the LPC and LTP analyses, offering even quality at a low average bit rate, which makes it particularly suitable for low data transfer speeds.
Abstract: The invention is related to digital speech encoding. In a speech codec according to the invention, for modeling a speech signal (301) both prediction parameters (321, 322, 331) modeling a speech signal in a short term and prediction parameters (341, 342, 351) modeling a speech signal in a long term are used. Each prediction parameter (321, 322, 331, 341, 342, 351) is presented using a certain accuracy, in a digital system with a certain number of bits. In speech encoding according to the invention the number of bits used for presenting prediction parameters (321, 322, 331, 341, 342, 351) is adjusted based upon information parameters (321, 322, 331, 341, 342, 351) obtained from a short-term LPC-analysis (32) and from a long-term LTP-analysis (31, 34, 35). The invention is particularly suitable for use at low data transfer speeds, because it offers a speech encoding method of even quality and low average bit rate.
61 citations
••
25 Mar 2012
TL;DR: This paper proposes an effective splicing detection method for audio that detects abnormal differences in the local noise levels of an audio signal, and demonstrates the efficacy and robustness of the method using both synthetic and realistic audio splicing forgeries.
Abstract: One common form of tampering in digital audio signals is known as splicing, where sections from one audio recording are inserted into another. In this paper, we propose an effective splicing detection method for audio. Our method achieves this by detecting abnormal differences in the local noise levels in an audio signal. This estimation of local noise levels is based on an observed property of audio signals: they tend to have kurtosis close to a constant in the band-pass filtered domain. We demonstrate the efficacy and robustness of the proposed method using both synthetic and realistic audio splicing forgeries.
61 citations
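The property that the splicing abstract above relies on is stated in terms of kurtosis. As a minimal sketch, assuming the signal is a plain Python list of samples, the following computes kurtosis over sliding windows. The function names, window length, and hop size are illustrative assumptions, and the band-pass filtering and noise-level estimation steps of the paper's method are not reproduced here.

```python
def kurtosis(samples):
    """Sample kurtosis m4 / m2**2 (about 3 for Gaussian data)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for a constant window")
    return m4 / (m2 * m2)


def windowed_kurtosis(signal, window, hop):
    """Kurtosis of each full window, stepping by `hop` samples."""
    return [
        kurtosis(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, hop)
    ]


# Hypothetical toy signal: an alternating +/-1 sequence.
toy_signal = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
per_window = windowed_kurtosis(toy_signal, window=4, hop=2)
```

In a detector along the lines the abstract describes, statistics like these would be computed on band-pass filtered versions of the signal and used to estimate local noise levels; this sketch stops at the per-window statistic.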
•
03 Mar 2005
TL;DR: This book covers sounds and numbers, digital filters and resonators, frequency analysis and linear predictive coding, finite state machines, speech recognition techniques, probabilistic finite-state models, parsing, and the use of probabilistic grammars.
Abstract: 1. Introduction 2. Sounds and numbers 3. Digital filters and resonators 4. Frequency analysis and linear predictive coding 5. Finite state machines 6. Introduction to speech recognition techniques 7. Probabilistic finite-state models 8. Parsing 9. Using probabilistic grammars.
61 citations