Showing papers on "Linear predictive coding" published in 1978

Book•

05 Sep 1978

TL;DR: This book covers digital speech processing, from the fundamentals and digital models of the speech signal through time-domain methods, digital waveform representation, short-time Fourier analysis, homomorphic processing, and linear predictive coding, to man-machine communication by voice.

Abstract: 1. Introduction. 2. Fundamentals of Digital Speech Processing. 3. Digital Models for the Speech Signal. 4. Time-Domain Models for Speech Processing. 5. Digital Representation of the Speech Waveform. 6. Short-Time Fourier Analysis. 7. Homomorphic Speech Processing. 8. Linear Predictive Coding of Speech. 9. Digital Speech Processing for Man-Machine Communication by Voice.

3,103 citations

Journal Article•DOI•

All-pole modeling of degraded speech

Jae Lim¹, Alan V. Oppenheim¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jun 1978-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise and develops a procedure based on maximum a posteriori (MAP) estimation techniques which is related to linear prediction analysis of speech.

Abstract: This paper considers the estimation of speech parameters in an all-pole model when the speech has been degraded by additive background noise. The procedure, based on maximum a posteriori (MAP) estimation techniques, is first developed in the absence of noise and related to linear prediction analysis of speech. The modification in the presence of background noise is shown to be nonlinear. Two suboptimal procedures are suggested which have linear iterative implementations. A preliminary illustration and discussion based both on a synthetic example and real speech data are given.

590 citations
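
The noise-free case that this abstract relates to linear prediction analysis can be illustrated with a minimal sketch of the textbook autocorrelation method, solved by the Levinson-Durbin recursion. This is not the MAP procedure of the paper. The function names `autocorrelation` and `levinson_durbin`, the predictor sign convention, and the test filter coefficients are assumptions made for illustration.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations of linear prediction.

    Returns (a, k, err) where x[n] is predicted as sum_j a[j-1] * x[n-j],
    k holds the reflection (PARCOR) coefficients, and err is the
    residual energy after prediction.
    """
    a = [0.0] * order
    k_list = []
    err = r[0]
    for i in range(order):
        # Correlation not yet explained by the current predictor.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
        k_list.append(k)
    return a, k_list, err


# Illustrative check: impulse response of a known, stable all-pole filter.
true_a = [1.3, -0.6]  # hypothetical coefficients, chosen only for the demo
h = []
for n in range(200):
    v = 1.0 if n == 0 else 0.0
    if n >= 1:
        v += true_a[0] * h[n - 1]
    if n >= 2:
        v += true_a[1] * h[n - 2]
    h.append(v)

a, k, err = levinson_durbin(autocorrelation(h, 2), 2)
print(a, k, err)
```

On this noise-free synthetic input the recursion should recover coefficients very close to the ones used to generate it. With additive noise the same estimate becomes biased, which is the situation the paper addresses.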

Journal Article•DOI•

Adaptive noise canceling for speech signals

M. Sambur

01 Oct 1978-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Preliminary tests indicate that the least mean-square adaptive filtering approach for removing the deleterious effects of additive noise on the speech signal improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment.

Abstract: A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been shown to partially remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.

207 citations
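
The idea of estimating the clean signal at time t from the signal one pitch period earlier can be sketched with a plain LMS filter whose reference input is the noisy signal delayed by the pitch period. This is a minimal sketch under stated assumptions, not the paper's implementation: the pitch period is taken as known and constant, and the tap count, step size `mu`, and the synthetic periodic test signal are all hypothetical choices.

```python
import math
import random


def pitch_delayed_lms(x, period, n_taps=16, mu=0.002):
    """Estimate the periodic part of x from samples one pitch period earlier."""
    w = [0.0] * n_taps
    y = [0.0] * len(x)
    for n in range(period + n_taps - 1, len(x)):
        # Reference vector: noisy samples starting one pitch period back.
        ref = [x[n - period - i] for i in range(n_taps)]
        y[n] = sum(wi * ri for wi, ri in zip(w, ref))
        e = x[n] - y[n]  # error drives the weight update
        w = [wi + 2.0 * mu * e * ri for wi, ri in zip(w, ref)]
    return y


def snr_db(clean, estimate, start):
    """SNR of an estimate against the clean signal, from index start onward."""
    sig = sum(c * c for c in clean[start:])
    err = sum((c - e) ** 2 for c, e in zip(clean[start:], estimate[start:]))
    return 10.0 * math.log10(sig / err)


random.seed(0)
T = 50      # assumed pitch period in samples
N = 10000
clean = [
    math.sin(2 * math.pi * n / T)
    + 0.5 * math.sin(4 * math.pi * n / T)
    + 0.25 * math.sin(6 * math.pi * n / T)
    for n in range(N)
]
sigma = math.sqrt(sum(c * c for c in clean) / N)  # noise power = signal power (0 dB)
noisy = [c + random.gauss(0.0, sigma) for c in clean]

enhanced = pitch_delayed_lms(noisy, T)
snr_in = snr_db(clean, noisy, N // 2)
snr_out = snr_db(clean, enhanced, N // 2)
print(round(snr_in, 2), round(snr_out, 2))
```

Because the white noise at time t is uncorrelated with the delayed reference, the filter can only learn to predict the quasi-periodic component, so its output serves as the enhanced signal. Real speech needs a tracked, time-varying pitch estimate, which this sketch omits.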

Proceedings Article•DOI•

Predictive coding of speech signals and subjective error criteria

Bishnu S. Atal¹, Manfred R. Schroeder²•Institutions (2)

Bell Labs¹, University of Göttingen²

10 Apr 1978

TL;DR: Improved speech quality is obtained a) by efficient removal of formant and pitch related redundant structure of speech before quantizing and b) by effective masking of the quantizer noise by the speech signal.

Abstract: Predictive coding methods attempt to minimize the r.m.s. error in the coded signal. However, the human ear does not perceive signal distortion on the basis of r.m.s. error regardless of its spectral shape relative to the signal spectrum. Specifically, for speech signals, the locations of the formant frequencies and their rates of change with time influence the audibility, and thus the subjective distortion of any quantizing noise. In this paper, methods for reducing the subjective distortion in predictive coders for speech signals are described and evaluated. Improved speech quality is obtained a) by efficient removal of formant and pitch related redundant structure of speech before quantizing and b) by effective masking of the quantizer noise by the speech signal.

94 citations

Patent•DOI•

System and method for speech recognition

John Marley

08 May 1978-Journal of the Acoustical Society of America

TL;DR: A system and method for speech recognition provides a means of printing phonemes in response to received speech signals utilizing inexpensive components and an algorithm for detecting major slope transitions of the analog speech signals.

Abstract: A system and method for speech recognition provides a means of printing phonemes in response to received speech signals utilizing inexpensive components. The speech signals are inputted into an amplifier which provides negative feedback to normalize the amplitude of the speech signals. The normalized speech signals are delta modulated at a first sampling rate to produce a corresponding first sequence of digital pulses. The negative feedback signal of the amplifier is delta modulated at a second sampling rate to produce a second sequence of digital pulses corresponding to amplitude information of the speech signals. The speech signals are filtered and utilized to produce a digital pulse corresponding to high frequency components of the speech signals having magnitudes in excess of a threshold voltage. A microprocessor contains an algorithm for detecting major slope transitions of the analog speech signals in response to the first sequence of digital signals by detecting information corresponding to presence and absence of predetermined numbers of successive slope reversals in the delta modulator producing the first sequence of digital pulses. The algorithm computes cues from the high frequency digital pulse and the second sequence of pulses. The algorithm computes a plurality of speech waveform characteristic ratios of time intervals between various slope transitions and compares the speech waveform characteristic ratios with a plurality of stored phoneme ratios representing a set of phonemes to detect matching therebetween. The order of comparing is determined on the basis of the cues and a configuration of a phoneme decision tree contained in the algorithm. When a matching occurs, a signal corresponding to the matched phoneme is produced and utilized to cause the phoneme to be printed. In one embodiment of the invention, the speech signals are produced by the earphone of a standard telephone headset.

60 citations

Journal Article•DOI•

Time-encoded speech

R.A. King¹, W. Gosling²•Institutions (2)

United Kingdom Ministry of Defence¹, University of Bath²

20 Jul 1978-Electronics Letters

TL;DR: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.

Abstract: A new method of digitising speech waveforms is described, based on the comparison of successive segments of the waveform with a suitably stored catalogue of possible distinct shapes.

34 citations

Proceedings Article•DOI•

9.6/7.2 Kbps Voice excited predictive coder (VEPC)

Daniel Esteban¹, Claude Galand, D. Mauduit, J. Menez•Institutions (1)

IBM¹

01 Apr 1978

TL;DR: This coding scheme, in addition to the baseband excitation concepts, takes advantage of the association of recently published digital speech processing techniques such as transversal predictive coding, splitband coding by signal decimation/interpolation, and adaptive block quantization.

Abstract: This paper describes a common voice coding architecture based on a Voice Excited Predictive Coding (VEPC) scheme that allows operation at different bit rates (9600 bps, 7200 bps, or below) by simply modifying the bandwidth allocated to the coding of the baseband excitation signal. This coding scheme, in addition to the baseband excitation concepts, takes advantage of the association of recently published digital speech processing techniques such as transversal predictive coding, splitband coding by signal decimation/interpolation, and adaptive block quantization. Simulations have shown that the proposed architecture achieves "standard telephone quality", assuming a 300-3400 Hz telephone bandwidth, at transmission rates below 9600 bps.

28 citations

Proceedings Article•DOI•

Studies on pattern recognition approach to voiced-unvoiced-silence classification

V. Sarma¹, D. Venugopal¹•Institutions (1)

Indian Institute of Science¹

10 Apr 1978

TL;DR: It is demonstrated that it is possible to achieve pattern recognition classification with much less computational effort by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.

Abstract: A pattern recognition approach for deciding whether a given segment of speech should be classified as voiced speech, unvoiced speech or silence based on a set of five measurements of the signal is given by Atal and Rabiner [1]. In this paper, we demonstrate that it is possible to achieve this classification with much less computational effort. These computational savings are mainly achieved by adopting a scheme based on the concept of variable decision space, using only three features and by avoiding the time consuming linear prediction analysis.

22 citations

Fast Algorithms for Speech Modeling.

Martin Morf, D T Lee

15 Dec 1978

TL;DR: This final report describes new speech modeling algorithms for high-quality, low-data-rate speech transmission; the best algorithm was used as the basis of a complete transmission system, and simulation results established the usefulness of the approach.

Abstract: This constitutes our final report on a research program aimed at the development of a high-quality, low-data-rate speech transmission system based on new types of speech modeling algorithms. Several such algorithms were developed and tested on simulated and real speech data. These algorithms have many desirable features including the capability of rapidly tracking time-varying model parameters. The best algorithm was used as the basis of a speech transmission system in order to test the quality of the speech models. The model parameters (reflection coefficients) together with pitch information and speech energy form a speech parameter vector to be transmitted and used to reconstruct the original speech. Several parameter quantization methods were considered to achieve the desired low bit rates. The various algorithms as well as the complete transmission system were coded and tested. Simulation results are very promising and the usefulness of our new approach for speech modeling has been successfully established.

19 citations

Patent•DOI•

Method of communicating digital speech data and a memory for storing such data

Richard H. Wiggins¹, George L. Brantingham¹•Institutions (1)

Texas Instruments¹

19 Jun 1978-Journal of the Acoustical Society of America

TL;DR: A method is described for communicating digital speech data, preferably stored in a memory, to a speech synthesis circuit that uses linear predictive coding techniques.

Abstract: A method of communicating digital speech data to a speech synthesis circuit. The data is compressed to on the order of 1000-1200 bits per second for normal human speech. The speech synthesis circuit utilizes linear predictive coding techniques for producing high quality speech or other sounds. The data is preferably stored in a memory which is coupled to the speech synthesis circuit. The data has variable frame lengths; in the disclosed embodiment, four different frame lengths are described, ranging from four bits to forty-nine bits. The memory stores the variable frame length data and communicates the same to the speech synthesis circuit in response to certain control signals.

12 citations

Proceedings Article•DOI•

Linear predictive coding of speech signals in a high ambient noise environment

H. Kobatake¹, J. Inari², S. Kakuta²•Institutions (2)

Tokyo University of Agriculture and Technology¹, University of Tokyo²

10 Apr 1978

TL;DR: This paper describes a method of speech coding in a high ambient noise environment and shows that the spectral envelope of the speech signal is highly reliable information when the proposed noise reduction method is used.

Abstract: Preservation of both the spectral distribution and the periodicity of speech signals is essential in speech processing. This paper describes a method of speech coding in a high ambient noise environment and shows that the spectral envelope of the speech signal is highly reliable information when the noise reduction method proposed in this paper is used. Also reported are comparisons of several pitch extraction methods using extensive experimental data, on the basis of which a pitch extraction method suited for noisy speech signals is proposed.

Proceedings Article•DOI•

Objective speech quality evaluation of narrowband LPC vocoders

R. Viswanathan¹, W. Russell, John Makhoul•Institutions (1)

BBN Technologies¹

01 Apr 1978

TL;DR: Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework that was proposed at the 1976 ICASSP conference, and high correlations obtained indicate the usefulness of these methods.

Abstract: Several methods are presented for the objective speech quality evaluation of narrowband LPC vocoders, based on a framework that we proposed at the 1976 ICASSP conference. In each method, the error in short-term spectral behavior between vocoded speech and the original is computed once every 10 ms. These errors are appropriately weighted and averaged over an utterance to produce a single objective score. Several short-term error measures, and time-weighting and averaging techniques are investigated. We evaluate the objective methods by correlating the resulting objective scores with formal subjective speech quality judgments. High correlations obtained indicate the usefulness of these methods.
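
The general recipe in this abstract (a short-term spectral error computed every 10 ms, then weighted and averaged into one score) can be sketched as follows. The specific choices here are assumptions, not the authors' measures: an RMS log-spectral distance in dB as the per-frame error, frame energy as the weight, a 20 ms Hamming-windowed frame, and a direct DFT in place of an FFT. The names `power_spectrum` and `objective_score` are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a Hamming-windowed frame via a direct DFT."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xs = [f * w for f, w in zip(frame, win)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(xs[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spec.append(abs(acc) ** 2 + 1e-12)  # small floor avoids log(0)
    return spec


def objective_score(ref, test, fs=8000, hop_ms=10, frame_ms=20):
    """Energy-weighted average of per-frame log-spectral distances (dB)."""
    hop = fs * hop_ms // 1000
    size = fs * frame_ms // 1000
    num = 0.0
    den = 0.0
    for start in range(0, min(len(ref), len(test)) - size + 1, hop):
        r = ref[start:start + size]
        t = test[start:start + size]
        pr = power_spectrum(r)
        pt = power_spectrum(t)
        dist = math.sqrt(
            sum((10.0 * math.log10(a / b)) ** 2 for a, b in zip(pr, pt)) / len(pr)
        )
        weight = sum(v * v for v in r)  # louder frames count more
        num += weight * dist
        den += weight
    return num / den if den > 0 else 0.0


# Illustrative signals: a tone, and the same tone with an added component.
fs = 8000
ref = [math.sin(2 * math.pi * 500 * n / fs) for n in range(400)]
degraded = [r + 0.3 * math.sin(2 * math.pi * 1500 * n / fs) for n, r in enumerate(ref)]

score_same = objective_score(ref, ref)
score_degraded = objective_score(ref, degraded)
print(score_same, score_degraded)
```

A score of this kind is only useful once it has been correlated with subjective judgments, which is the evaluation step the abstract describes.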

Proceedings Article•DOI•

Providing channel error protection for a 2400 bps linear predictive coded voice system

J. Fussell, B. Abzug, P. Boudra, M. Cowing

01 Apr 1978

TL;DR: Several techniques are described for reducing the effect of channel bit errors on the synthesized speech; they cause no measurable degradation of the LPC speech transmitted over an error-free channel and require less than a one percent increase in computer execution time.

Abstract: The U.S. Government has developed a real-time 2400 bps Linear Predictive Coded (LPC) voice algorithm which was designed to provide maximum intelligibility and quality within the time and accuracy limitations imposed by modern high-speed minicomputers. The algorithm which resulted provides excellent intelligibility and quality when transmitted over an ideal channel. However, the speech is significantly degraded in an error environment. This paper describes several techniques for reducing the effect of channel bit errors on the synthesized speech. These techniques cause no measurable degradation of the LPC speech transmitted over an error-free channel and they require less than a one percent increase in computer execution time.

Journal Article•DOI•

An intelligibility evaluation of several linear prediction vocoder modifications

D. Wong, J. Markel

01 Oct 1978-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Several linear prediction vocoder modifications and an evaluation of their effects on intelligibility are presented. Lower order coefficient representation and faster analysis update during unvoiced speech improve the sustention feature with little degradation to the other features and almost no increase in transmission rate.

Abstract: Several linear prediction vocoder modifications and an evaluation of their effects on intelligibility are presented. Diagnostic rhyme test (DRT) comparisons among 1) the fixed filter order, fixed analysis frame rate vocoder, 2) various modified vocoders, and 3) the input speech, are implemented using two speakers, seven listeners, and a selected set of word pairs reflecting six phonemic attribute contrasts. Reducing the filter order from ten to four for unvoiced speech frames analyzed at a rate of 44.4 frames/s produces no significant decrease in the scores for all of the six features tested by the DRT. Updating the unvoiced analysis frames with shorter windows at twice the frame rate (88.8 frames/s) leads to a significant improvement in the scores for the sustention feature. Lower order coefficient representation and faster analysis update during unvoiced speech improve the sustention feature with little degradation to the other features and almost no increase in transmission rate. Results of the DRT evaluation and considerations for implementing the test within ordinary laboratory facilities are discussed.

Proceedings Article•DOI•

High quality adaptive predictive coding of speech

M. Berouti¹, J. Makhoul•Institutions (1)

BBN Technologies¹

01 Apr 1978

TL;DR: For adaptive predictive coding at 16 kbps, the pitch predictor is found not to be useful on balance and should be eliminated, and the residual should be quantized with no clipping and encoded using a variable-length code; a single coding scheme seems to be adequate for all speech and all conditions.

Abstract: We report on the results of research to code speech at 16 kbps under the condition that the quality of the transmitted speech be equal to that of the original. Some of the original speech had been corrupted by noise and distortions typical of long distance telephone lines. The rigorous requirements of this work led to a new outlook on adaptive predictive coding. We have found that the pitch predictor is not useful on balance and should be eliminated, and that the residual should be quantized with no clipping and encoded using a variable-length code. A single coding scheme seems to be adequate for all speech and all conditions. In addition, the adaptive predictive coding system has been modified to include a noise spectral shaping filter that effectively eliminates the perception of background granular noise.

Journal Article•DOI•

Bit Rate Per Channel Halving in PCM Multiplexes by Speech Interpolation and Adaptive Quantization

Alcide Molinari, F. Vagliani

01 May 1978-IEEE Transactions on Communications

TL;DR: An all-digital system, labeled PCM.RR, is presented, which enables the doubling of traffic capacity of PCM links by properly using "Adaptive Quantization" and "Speech Interpolation" performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.

Abstract: An all-digital system, labeled PCM.RR, is presented, which enables the doubling of traffic capacity of PCM links. This is obtained, while keeping the transmission quality impairment very close to the normal PCM standards, by properly using "Adaptive Quantization" and "Speech Interpolation" performed by means of a "Speech Detector" that works directly on the A-law compressed digital signal.

Proceedings Article•DOI•

LMS Adaptive filtering for enhancing the quality of noisy speech

M. Sambur

01 Apr 1978

TL;DR: Preliminary tests indicate that the proposed least mean-square adaptive filtering approach improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment.

Abstract: A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean true speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been preliminarily shown to remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.

Proceedings Article•DOI•

Statistical correlation between objective and subjective measures for speech quality

Thomas P. Barnwell¹, A. M. Bush•Institutions (1)

Georgia Institute of Technology¹

01 Apr 1978

TL;DR: In a statistical correlation study between 18 objective quality measures and a database of subjective quality measures from the Paired Acceptability Rating Method (PARM), the most effective measure over all systems was a gain-weighted L2 spectral distance metric.

Abstract: A statistical correlation study between 18 objective quality measures and a database of subjective quality measures from the Paired Acceptability Rating Method (PARM) was done for nine communication systems, including waveform coders, channel vocoders, linear predictive coders, and adaptive predictive coders. The results of this study show which of the candidate objective measures are most effective in predicting the subjective results. The measure which was found to be most effective over all systems was a gain-weighted L2 spectral distance metric which had a correlation coefficient of -0.83. Supported by DCA/DCEC via the RADC Post Doctoral Program.

Journal Article•DOI•

Research on low bit rate speech coding at the Electrical Communication Laboratory, NTT

Fumitada Itakura

01 Nov 1978-Journal of the Acoustical Society of America

TL;DR: The Parcor analysis-synthesis method is being applied to a wide range of speech coding, from 1200 bps variable frame-rate coding to high quality 16 kbps adaptive predictive coding.

Abstract: Since the introduction of speech analysis-synthesis based on maximum likelihood spectrum estimation in 1966, we have been conducting research activities on low bit rate speech coding techniques and their application to audio response and low bit rate digital speech transmission. Parcor analysis-synthesis, demonstrated in 1969, was one of the most fundamental methods, and it has formed the basis of the present development of linear predictive coding. Recently, various kinds of techniques have been proposed to improve speech quality, such as interpolation and nonlinear quantization of parameters, spectral smoothing, etc. They have been applied in the hardware realization of a 4 CH multiplexed 2400 bps Vocoder. At present, the Parcor method is being applied to a wide range of speech coding, from 1200 bps variable frame-rate coding to high quality 16 kbps adaptive predictive coding.

Proceedings Article•DOI•

A variable frame length linear predictive coder

J. Turner¹, B. Dickinson•Institutions (1)

Princeton University¹

01 Apr 1978

TL;DR: A variable rate speech encoding scheme is presented which determines linear predictive models over phonetically uniform intervals: instead of using a fixed analysis frame length, the analysis interval is adjusted to span an entire steady sound or set at a minimum interval for transient sounds.

Abstract: A variable rate speech encoding scheme is presented which determines linear predictive models over phonetically uniform intervals. Instead of using a fixed analysis frame length, the analysis interval is adjusted to span an entire steady sound or set at a minimum interval for transient sounds. This scheme offers the following advantages over a fixed frame-rate scheme: for the same perceived speech quality, the bit rate can be reduced or for the same bit rate, the quality of the perceived speech can be improved.

Journal Article•DOI•

Multidimensional pseudo-maximum-likelihood pitch estimation

D. Friedman¹•Institutions (1)

Tel Aviv University¹

01 Jun 1978-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: An estimator algorithm for the pitch of voiced speech is presented; a comparative evaluation indicates superior immunity to added noise and to bandlimiting with loss of the fundamental component.

Abstract: An estimator algorithm for the pitch of voiced speech is presented, based on the following sequence of operations: 1) linear-prediction inverse filtering; 2) short-time spectral analysis by a bank of bandpass filters; 3) envelope extraction on the filter outputs; 4) period determination on the parallel envelopes considered as a multicomponent vector signal, using an algorithm described in a previous work. Results of a comparative evaluation indicate superior immunity to added noise and to bandlimiting with loss of the fundamental component.

Proceedings Article•DOI•

Application of canonical coordinate methods to the characterization of a family of error minimizing signal compression techniques

C. Walter

01 Apr 1978

TL;DR: Mean-square-error minimizing signal compression techniques, such as Autoregressive Analysis or Linear Predictive Coding and Principal Component or Karhunen-Loeve Analysis, can be systematically characterized in terms of canonical coordinate or generalized eigenvector procedures.

Abstract: Mean-square-error minimizing signal compression techniques, such as Autoregressive Analysis or Linear Predictive Coding and Principal Component or Karhunen-Loeve Analysis, can be systematically characterized in terms of canonical coordinate or generalized eigenvector procedures. This approach provides considerable insight into the interrelationships between a variety of seemingly different signal compression methods. The approach also provides a convenient mechanism for introducing the types of non-Euclidean error measures that are needed to adjust the signal performance optimization criteria to take into account different types of a priori statistical and dynamical information relating to both the desired signal and to various interference processes.

Speech compression and evaluation

R. Viswanathan, John Makhoul, A. W. F. Huggins

01 Apr 1978

TL;DR: The development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network and a reliable method for measuring subjective speech quality are described.

Abstract: This report describes our work in the past three years on data compression and quality evaluation of digital speech. We developed and implemented linear predictive coding (LPC) techniques with the overall objective of digitally transmitting high quality speech at the lowest possible average data rates over packet-switched communication media. Major techniques reported include: covariance lattice method of linear prediction analysis, adaptive lattice methods, linear predictive spectral warping, improved quantization of LPC parameters, variable frame rate transmission of LPC parameters based on a functional perceptual model of speech, and a mixed-source model for LPC synthesizer to produce more natural-sounding speech. Also, we developed a reliable method for measuring subjective speech quality. This method was employed to formally demonstrate the quality improvements provided by our speech analysis/synthesis techniques as well as for studying speech quality as a function of LPC parameters. As subjective procedures are generally expensive and time-consuming, we developed and tested several objective procedures for speech quality evaluation. The results from these objective procedures were found to be highly correlated to the corresponding subjective quality judgments. Another highlight of our work is the development of a speech processing computer facility with the ultimate goal of transmitting narrowband speech in real time over the ARPA Network.

Proceedings Article•DOI•

Speech generation through waveform synthesis

M. Baumwolspiner¹•Institutions (1)

Bell Labs¹

01 Apr 1978

TL;DR: The 'waveform synthesis' technique is particularly well suited for microprocessor implementation; two D-A converters in conjunction with a standard microprocessor and associated ROM, RAM, and I/O can be used to implement it.

Abstract: This paper presents a time domain technique for the generation of speech which offers significant advantages over current formant synthesis and linear predictive coder (LPC) techniques. A set of basis functions in conjunction with a time-compression (and expansion) operation is shown to span the parameter space of the vocal tract model. The relationship between these basis functions and the formant synthesis parameters is derived and graphically illustrated. The 'waveform synthesis' technique is particularly well suited for microprocessor implementation and, as shown in the paper, two D-A converters in conjunction with a standard microprocessor and associated ROM, RAM, and I/O can be used to implement this technique.

Proceedings Article•DOI•

Linear prediction techniques in the Walsh spectral domain for speech analysis and synthesis

M. Ashouri¹, Anthony G. Constantinides•Institutions (1)

Imperial College London¹

10 Apr 1978

TL;DR: A method for LPC analysis in a transformed domain (LPCTD) has been developed theoretically and studied experimentally in the Walsh-Hadamard domain (LPCWHD) for low-bit-rate coding of speech signals.

Abstract: A method for LPC analysis in a transformed domain (LPCTD) has been developed theoretically and studied experimentally in the Walsh-Hadamard domain (LPCWHD) for low-bit-rate coding of speech signals. Speech signals in the Walsh-Hadamard domain have been modelled by their largest variance coefficients and a few prediction coefficients which represent the remaining coefficients. Determination of the prediction coefficients has been based on the correlation between the spectral coefficients. Intelligible speech at bit-rates of 8 kb/s and 4 kb/s was achieved when 16 and 64 point Walsh-Hadamard transforms were used, respectively. At the latter bit-rate the quality was significantly improved when unvoiced sounds were coded separately by their largest variance coefficients. The main advantage of the LPCWHD system is its simplicity, which can lead to a far less complex implementation than that of vocoder systems.
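
The transform step and the selection of largest-variance coefficients mentioned in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the fast Walsh-Hadamard transform in natural (Hadamard) ordering, estimates variance per coefficient index across a set of frames, and does not attempt the prediction of the remaining coefficients, whose details the abstract does not give. The function names are hypothetical.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (natural order, unnormalized)."""
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                # Butterfly: sum and difference of paired elements.
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def inverse_fwht(c):
    """Inverse transform: the same butterflies, scaled by 1/N."""
    n = len(c)
    return [v / n for v in fwht(c)]


def largest_variance_indices(frames, m):
    """Indices of the m transform coefficients with largest variance across frames."""
    specs = [fwht(f) for f in frames]
    n = len(specs[0])
    variances = []
    for k in range(n):
        col = [s[k] for s in specs]
        mean = sum(col) / len(col)
        variances.append(sum((c - mean) ** 2 for c in col) / len(col))
    ranked = sorted(range(n), key=lambda k: variances[k], reverse=True)
    return sorted(ranked[:m])


coeffs = fwht([1, 2, 3, 4])
restored = inverse_fwht(coeffs)
print(coeffs, restored)
```

Because the transform uses only additions and subtractions, it is cheap to compute, which is consistent with the simplicity the abstract claims for the approach.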

Book Chapter•DOI•

Speech Processing for Low Data Rate Digital Voice Communications

E. V. Stansfield

01 Jan 1978

Proceedings Article•DOI•

Maximum likelihood pitch estimation using state-variable techniques

R.J. McAulay¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Apr 1978

TL;DR: The problem of estimating the pitch period of a speech waveform contaminated by acoustically coupled background noise is formulated to include the properties of the spectral envelope by postulating a state-variable model for the speech generation process using the maximum likelihood estimation technique.

Abstract: The problem of estimating the pitch period of a speech waveform contaminated by acoustically coupled background noise is formulated to include the properties of the spectral envelope by postulating a state-variable model for the speech generation process. Applying the maximum likelihood estimation technique, the optimum processor uses a Kalman filter preprocessor to flatten the spectrum. The resulting signal is then passed through a bank of comb filters and the optimum pitch corresponds to the comb filter for which the output energy is smallest. The Kalman prefilter reduces to an LPC filter only when the speech is generated by an all-pole process and the signal-to-noise ratio is large. For the low signal-to-noise ratio case, a parallel formant speech generation model is more likely to lead to practical numerical algorithms for estimating the spectral coefficients.
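
The comb-filter stage described in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's processor: the Kalman spectral-flattening prefilter is omitted, the comb filter is assumed to take the form y[n] = x[n] - x[n - P], and the function names and the synthetic test signal are hypothetical.

```python
def comb_energy(signal, period):
    """Mean squared output of the comb filter y[n] = x[n] - x[n - period]."""
    diffs = [signal[n] - signal[n - period] for n in range(period, len(signal))]
    return sum(d * d for d in diffs) / len(diffs)


def estimate_pitch_period(signal, min_period, max_period):
    """Return the candidate period whose comb filter output energy is smallest."""
    candidates = range(min_period, max_period + 1)
    return min(candidates, key=lambda p: comb_energy(signal, p))


# Synthetic, noise-free periodic signal: a decaying pulse repeated every 40 samples.
TRUE_PERIOD = 40
signal = [0.9 ** (n % TRUE_PERIOD) for n in range(400)]

# The search range excludes multiples of the true period, which would also
# give zero output energy for a perfectly periodic signal.
estimated = estimate_pitch_period(signal, 20, 60)
```

The search range matters: any multiple of the true period also nulls a perfectly periodic input, so a practical estimator needs a rule for resolving such ties.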

Proceedings Article•DOI•

A spectral enhancement procedure for the wideband/Narrowband tandem

L. Bergeron

01 Apr 1978

TL;DR: A procedure that reduces the spectral distortion in LPC-encoded speech preprocessed by a CVSD coder. Low-pass filtering the CVSD speech on a formant-adaptive basis and narrowing the bandwidths of the primary formants makes the LPC input more closely resemble the original unprocessed signal, producing a higher quality CVSD/LPC signal than previously realized.

Abstract: This paper describes a procedure that reduces the spectral distortion in LPC encoded speech preprocessed by a CVSD coder. In this type of tandem configuration (wide-band/narrowband), the CVSD process introduces extraneous wideband noise and a general broadening of the formant bandwidths. When coupled with the formant distortion introduced by the LPC process, the tandem speech appears buzzy, muffled, and of lower quality than either system considered alone. By low-pass filtering the CVSD speech, on a formant adaptive basis, and narrowing the bandwidths of the primary formants, F1 and F2, the input signal to the LPC synthesizer more closely resembles the original unprocessed signal. This spectral enhancement procedure produces a higher quality CVSD/LPC signal than previously realized.

Proceedings Article•DOI•

Speech synthesis by linear interpolation of spectral parameters between dyad boundaries

Christine H. Shadle¹, Bishnu S. Atal•Institutions (1)

Bell Labs¹

01 Apr 1978

TL;DR: It is shown that area parameters derived from linear prediction analysis can be linearly interpolated between dyad boundaries with very little distortion in the resultant synthesized speech.

Abstract: Recent work of Olive and Spickenagel has shown that pseudo-area parameters used for LPC synthesis can be linearly interpolated between dyad boundaries without producing excessive distortion in synthetic speech. This study investigates whether such interpolation can be done equally successfully on the power spectrum of the speech waveform. The spectrum is of special interest because speech can be synthesized in real time from spectral parameters on readily available programmable digital filters. Our results show that the distortion introduced by dyadic interpolation of spectrum is perceptually significant but it can be reduced considerably by using an additional point within the dyad boundaries for interpolation. The reasons for good quality of speech synthesized from dyadically-interpolated area parameters were also investigated. It was found that formant frequency movements are reproduced fairly accurately after dyadic interpolation. Formant bandwidths however are not reproduced accurately but the bandwidth errors are not as important subjectively.
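
The interpolation idea in the abstract above can be sketched as follows, under stated assumptions: each dyad boundary is represented by a plain list of parameter values (area or spectral parameters), and the function names are hypothetical. The second function illustrates the abstract's "additional point within the dyad boundaries" as piecewise-linear interpolation through one intermediate frame; the paper's actual choice of that point is not given here.

```python
def interpolate_frames(start_params, end_params, num_frames):
    """Linearly interpolate parameter vectors, including both boundary frames."""
    if num_frames < 2:
        raise ValueError("need at least the two boundary frames")
    frames = []
    for k in range(num_frames):
        t = k / (num_frames - 1)  # 0.0 at the first boundary, 1.0 at the second
        frames.append([a + t * (b - a) for a, b in zip(start_params, end_params)])
    return frames


def interpolate_with_midpoint(start_params, mid_params, end_params, frames_per_half):
    """Piecewise-linear interpolation through one additional interior frame."""
    first = interpolate_frames(start_params, mid_params, frames_per_half)
    second = interpolate_frames(mid_params, end_params, frames_per_half)
    return first + second[1:]  # drop the duplicated interior frame


straight = interpolate_frames([1.0, 2.0], [3.0, 6.0], 3)
bent = interpolate_with_midpoint([0.0], [4.0], [2.0], 3)
```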

Computer modeling of voice signals with adjustable pitch and formant frequencies

Geoffrey Thomas Hall

01 Dec 1978

TL;DR: In this thesis, linear predictive coding is used to produce a set of coefficients for the characteristic polynomial of successive 25 msec segments of the voice tract, in the z-domain, to encode speech waveforms at low data rates.

Abstract: Digital encoding of speech to allow more efficient transmission at low data rates involves the decomposition of the speech waveform into various parameters which are related to the physical structure of the speech production process. In this thesis, linear predictive coding is used to produce a set of coefficients for the characteristic polynomial of successive 25 msec. segments of the voice tract, in the z-domain. The location of the poles in the z-plane and the excitation pitch period are then shifted and the signal reformulated to cause changes of the overall frequency characteristics of the speech waveforms, while maintaining the perceived sounds and information content. The resulting audio tapes confirm the theory and conjectures of the thesis. (Author)
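
The pole-shifting step in the abstract above can be illustrated on a single formant. This is a minimal sketch, not the thesis's method: it assumes one second-order section with denominator 1 + a1 z^-1 + a2 z^-2 and a complex-conjugate pole pair, whereas the thesis works with the full characteristic polynomial of each segment. The function name and the example values are hypothetical.

```python
import cmath
import math


def shift_pole_pair(a1, a2, factor):
    """Scale the pole angle of 1 + a1*z^-1 + a2*z^-2, keeping the pole radius."""
    disc = a1 * a1 - 4.0 * a2
    if disc >= 0:
        raise ValueError("poles are real; no formant frequency to shift")
    # Upper-half-plane root of z^2 + a1*z + a2 = 0.
    pole = (-a1 + cmath.sqrt(disc)) / 2.0
    radius, angle = abs(pole), cmath.phase(pole)
    new_angle = angle * factor
    if not 0.0 < new_angle < math.pi:
        raise ValueError("shifted pole angle must stay strictly between 0 and pi")
    # Rebuild real coefficients from the shifted conjugate pair.
    return -2.0 * radius * math.cos(new_angle), radius * radius


# Example: a pole pair at radius 0.9 and angle pi/4, moved to angle pi/2.
a1 = -2.0 * 0.9 * math.cos(math.pi / 4)
a2 = 0.9 * 0.9
new_a1, new_a2 = shift_pole_pair(a1, a2, 2.0)
```

Keeping the radius fixed leaves the pole's distance from the unit circle unchanged while its angle, and hence the resonance frequency, moves.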
