scispace - formally typeset
Proceedings ArticleDOI

A new model of LPC excitation for producing natural-sounding speech at low bit rates

B. Atal, +1 more
- Vol. 7, pp 614-617
Reads0
Chats0
TLDR
This paper describes a new approach to the excitation problem that does not require a priori knowledge of either the voiced-unvoiced decision or the pitch period, and minimizes a perceptual-distance metric representing subjectively-important differences between the waveforms of the original and the synthetic speech signals.
Abstract
The excitation for LPC speech synthesis usually consists of two separate signals - a delta-function pulse once every pitch period for voiced speech and white noise for unvoiced speech. This manner of representing excitation requires that speech segments be classified accurately into voiced and unvoiced categories and the pitch period of voiced segments be known. It is now well recognized that such a rigid idealization of the vocal excitation is often responsible for the unnatural quality associated with synthesized speech. This paper describes a new approach to the excitation problem that does not require a priori knowledge of either the voiced-unvoiced decision or the pitch period. All classes of sounds are generated by exciting the LPC filter with a sequence of pulses; the amplitudes and locations of the pulses are determined using a non-iterative analysis-by-synthesis procedure. This procedure minimizes a perceptual-distance metric representing subjectively-important differences between the waveforms of the original and the synthetic speech signals. The distance metric takes account of the finite-frequency resolution as well as the differential sensitivity of the human ear to errors in the formant and inter-formant regions of the speech spectrum.

read more

Citations
More filters
Book

Introduction to data compression

TL;DR: The author explains the development of the Huffman Coding Algorithm and some of the techniques used in its implementation, as well as some of its applications, including Image Compression, which is based on the JBIG standard.
Journal ArticleDOI

Speech analysis/Synthesis based on a sinusoidal representation

TL;DR: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves, which forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.
Journal ArticleDOI

Sparse solutions to linear inverse problems with multiple measurement vectors

TL;DR: This work considers in depth the extension of two classes of algorithms-Matching Pursuit and FOCal Underdetermined System Solver-to the multiple measurement case so that they may be used in applications such as neuromagnetic imaging, where multiple measurement vectors are available, and solutions with a common sparsity structure must be computed.
Book

Discrete-Time Speech Signal Processing: Principles and Practice

TL;DR: This chapter discusses the Discrete-Time Speech Signal Processing Framework, a model based on the FBS Method, and its applications in Speech Communication Pathway and Homomorphic Signal Processing.
Journal ArticleDOI

Vector quantization in speech coding

TL;DR: This tutorial review presents the basic concepts employed in vector quantization and gives a realistic assessment of its benefits and costs when compared to scalar quantization, and focuses primarily on the coding of speech signals and parameters.
References
More filters
Book

Linear Prediction of Speech

John E. Markel, +1 more
TL;DR: Speech Analysis and Synthesis Models: Basic Physical Principles, Speech Synthesis Structures, and Considerations in Choice of Analysis.
Book

Speech Analysis, Synthesis and Perception

TL;DR: A second edition was begun in 1970, the aim was to retain the original format, but to expand the content, especially in the areas of digital communications and com puter techniques for speech signal processing.
Journal ArticleDOI

Speech analysis and synthesis by linear prediction of the speech wave.

TL;DR: Application of this method for efficient transmission and storage of speech signals as well as procedures for determining other speechcharacteristics, such as formant frequencies and bandwidths, the spectral envelope, and the autocorrelation function, are discussed.
Journal ArticleDOI

Optimizing digital speech coders by exploiting masking properties of the human ear

TL;DR: New results of masking and loudness reduction of noise are reported and the design principles of speech coding systems exploiting auditory masking are described.
Journal ArticleDOI

Predictive coding of speech signals and subjective error criteria

TL;DR: Improved speech quality is obtained by efficient removal of formant and pitch-related redundant structure of speech before quantizing, and by effective masking of the quantizer noise by the speech signal.