Topic
Speech coding
About: Speech coding is a research topic. Over its lifetime, 14,245 publications have been published within this topic, receiving 271,964 citations.
Papers
IBM
TL;DR: In this paper, a method for generating personalized speech from text includes the steps of analyzing the input text to get standard parameters of the speech to be synthesized from a standard text-to-speech database.
Abstract: A method for generating personalized speech from text includes the steps of analyzing the input text to get standard parameters of the speech to be synthesized from a standard text-to-speech database; mapping the standard speech parameters to the personalized speech parameters via a personalization model obtained in a training process; and synthesizing speech of the input text based on the personalized speech parameters. The method can be used to simulate the speech of the target person so as to make the speech produced by a TTS system more attractive and personalized.
127 citations
01 Apr 1985
TL;DR: Design methods for vector quantizers with the embedded coding property are presented and their performance simulated for the medium-band of 32-8 Kbits/sec.
Abstract: Embedded speech coders are characterized by the property that their output quality degrades gracefully as their bit rate is decreased. Design methods for vector quantizers with the embedded coding property are presented and their performance simulated for the medium-band of 32-8 Kbits/sec. Listening tests indicate that these coders can provide good quality speech at 32 and 24 Kbits/sec and intelligible speech down to 8 Kbits/sec.
127 citations
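The embedded coding property described in the abstract above (quality degrades gracefully as bits are dropped) can be illustrated with a multistage, or residual, vector quantizer. This is a minimal sketch using a swapped-in technique, not the design method of the paper: the 2-D codebook `BASE_CODEBOOK`, the gains in `STAGE_GAINS`, and all function names are hypothetical choices made for illustration. Each stage quantizes the residual left by the previous one, so a decoder that receives only the first few stage indices still produces a coarser but valid reconstruction.

```python
# Hypothetical 2-D stage codebook: four codewords, so each stage costs 2 bits.
BASE_CODEBOOK = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
# Hypothetical per-stage gains: every stage refines the previous residual.
STAGE_GAINS = [1.0, 0.5, 0.25, 0.125]


def encode(vector):
    """Return one codeword index per stage (the embedded bit stream)."""
    residual = list(vector)
    indices = []
    for gain in STAGE_GAINS:
        # Nearest-neighbour search over the scaled stage codebook.
        best = min(
            range(len(BASE_CODEBOOK)),
            key=lambda i: sum(
                (r - gain * c) ** 2 for r, c in zip(residual, BASE_CODEBOOK[i])
            ),
        )
        indices.append(best)
        # The next stage quantizes what this stage could not represent.
        residual = [r - gain * c for r, c in zip(residual, BASE_CODEBOOK[best])]
    return indices


def decode(indices, stages):
    """Reconstruct from only the first `stages` indices (a truncated stream)."""
    out = [0.0, 0.0]
    for gain, idx in zip(STAGE_GAINS[:stages], indices[:stages]):
        out = [o + gain * c for o, c in zip(out, BASE_CODEBOOK[idx])]
    return out


def max_error(vector, stages):
    """Largest per-component reconstruction error when using `stages` stages."""
    approx = decode(encode(vector), stages)
    return max(abs(v - a) for v, a in zip(vector, approx))
```

For inputs whose components lie in [-2, 2], the worst-case per-component error of this toy quantizer is bounded by the gain of the last decoded stage, so truncating the index stream lowers the bit rate at the cost of a looser error bound. The error for a particular input is not guaranteed to shrink at every stage; only the bound is.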
TL;DR: It is revealed that, contrary to existing thought, the inactive frames of VoIP streams are more suitable for data embedding than the active frames of the streams; that is, steganography in the inactive audio frames attains a larger data embedding capacity than that in the active audio frames under the same imperceptibility.
Abstract: This paper describes a novel high-capacity steganography algorithm for embedding data in the inactive frames of low bit rate audio streams encoded by the G.723.1 source codec, which is used extensively in Voice over Internet Protocol (VoIP). This study reveals that, contrary to existing thought, the inactive frames of VoIP streams are more suitable for data embedding than the active frames of the streams; that is, steganography in the inactive audio frames attains a larger data embedding capacity than that in the active audio frames under the same imperceptibility. By analyzing the concealment of steganography in the inactive frames of low bit rate audio streams encoded by the G.723.1 codec at 6.3 kb/s, the authors propose a new algorithm for steganography in different speech parameters of the inactive frames. Performance evaluation shows that embedding data in different speech parameters leads to different levels of concealment. An improved voice activity detection algorithm that takes packet loss into account is suggested for detecting inactive audio frames. Experimental results show that the proposed steganography algorithm not only achieves perfect imperceptibility but also attains a high data embedding rate of up to 101 bits/frame, indicating that its data embedding capacity is much larger than that of previously suggested algorithms.
127 citations
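The idea in the abstract above of hiding data only in inactive frames can be sketched in a much simpler setting. The following is an illustration under loud assumptions, not the authors' algorithm: it works on raw integer PCM samples instead of G.723.1 speech parameters, uses a plain energy threshold instead of the improved voice activity detector, and hides bits in sample least significant bits. `ENERGY_THRESHOLD` and all function names are hypothetical.

```python
ENERGY_THRESHOLD = 100  # hypothetical threshold separating inactive from active


def is_inactive(frame):
    """Energy-based activity decision.

    The least significant bit of every sample is masked out first, so the
    decision is identical before and after embedding.
    """
    energy = sum((s & ~1) ** 2 for s in frame)
    return energy < ENERGY_THRESHOLD


def embed(frames, bits):
    """Write payload bits into the LSBs of samples in inactive frames only."""
    remaining = list(bits)
    out = []
    for frame in frames:
        if is_inactive(frame):
            new_frame = []
            for s in frame:
                if remaining:
                    s = (s & ~1) | remaining.pop(0)
                new_frame.append(s)
            out.append(new_frame)
        else:
            out.append(list(frame))  # active frames are left untouched
    if remaining:
        raise ValueError("payload does not fit in the inactive frames")
    return out


def extract(frames, n_bits):
    """Read back the first n_bits payload bits from the inactive frames."""
    bits = []
    for frame in frames:
        if is_inactive(frame):
            for s in frame:
                if len(bits) < n_bits:
                    bits.append(s & 1)
    return bits
```

A short usage example with made-up sample values:

```python
frames = [[100, -120, 90, 80], [1, -2, 3, 0], [2, 2, -1, 1]]
payload = [1, 0, 1, 1, 0, 1]
stego = embed(frames, payload)
recovered = extract(stego, len(payload))
```

Masking the LSB in the activity decision is a design choice of this sketch: it keeps sender and receiver in agreement about which frames carry data.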
01 Apr 1986
TL;DR: A new speech coding technique at low bit-rate is presented, which splits the incoming speech signal into two frequency bands in order to gain the benefits of the piecewise LP (Linear Prediction) approximation.
Abstract: A new speech coding technique at low bit-rate is presented in this paper. The coder is based upon a novel speech production model, independently developed by the authors [1,2] and by Atal and Schroeder [3,4], called CELP (Codebook Excited Linear Prediction). Differences exist between the two approaches, both in the strategy chosen to construct codebooks, and in the method to generate the innovation sequence. In this scheme, we split the incoming speech signal into two frequency bands in order to gain the benefits of the piecewise LP (Linear Prediction) approximation. Then, each residual signal is coded in blocks of 5-ms duration through an adaptive vector quantizer incorporating a noise shaping filter. Our results show that good quality speech can be obtained at 8 kbit/s.
127 citations
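The codebook excitation idea named in the abstract above rests on an analysis-by-synthesis search: each candidate excitation is passed through the LP synthesis filter, and the candidate whose gain-scaled output best matches the target block is selected. The sketch below shows only that generic search. It is not the two-band coder of the paper: there is no band splitting, no adaptive codebook, and no noise shaping filter, and the function names and the coefficient convention are assumptions made for illustration.

```python
def synthesize(excitation, lpc):
    """All-pole LP synthesis filter: y[n] = x[n] + sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                y += a * out[n - 1 - k]
        out.append(y)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the excitation that best matches target."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero output cannot be scaled to match anything
        # Least-squares optimal gain for this candidate.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

The encoder would transmit only the selected index and a quantized gain per block, which is where the low bit rate comes from; the decoder regenerates the block by running the same synthesis filter.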
07 May 1996
TL;DR: A novel high-quality audio coding method is described that uses an adaptive signal representation, based on sinusoidal and wavelet analysis of signals, to separate out tones, transients, and broadband noise.
Abstract: We describe a novel high quality audio coding method using adaptive signal representation, based on sinusoidal and wavelet analysis of signals. First, we perform a harmonic analysis of the signal to remove strong periodic structures or tones from the signal. Then we carry out a wavelet analysis, which is useful for tracking the transients of the signal. These transients are then removed from the wavelet coefficients. The remaining coefficients have a broadband noise-like structure. Since this method separates out tones (sinusoids), transients, and broadband noise, we may use tonal, noise, and temporal masking information to individually encode the tones and the wavelet coefficients. Our experiments suggest that this method yields a nominal bit rate of 1 bit/sample for high quality audio compression.
126 citations