
Showing papers on "Linear predictive coding published in 1975"


Journal ArticleDOI
F. Itakura1
TL;DR: A computer system is described in which isolated words, spoken by a designated talker, are recognized by computing a minimum prediction residual, obtained by optimally registering the reference LPC onto the input autocorrelation coefficients with a dynamic programming algorithm.
Abstract: A computer system is described in which isolated words, spoken by a designated talker, are recognized through calculation of a minimum prediction residual. A reference pattern for each word to be recognized is stored as a time pattern of linear prediction coefficients (LPC). The total log prediction residual of an input signal is minimized by optimally registering the reference LPC onto the input autocorrelation coefficients using the dynamic programming algorithm (DP). The input signal is recognized as the reference word which produces the minimum prediction residual. A sequential decision procedure is used to reduce the amount of computation in DP. A frequency normalization with respect to the long-time spectral distribution is used to reduce effects of variations in the frequency response of telephone connections. The system has been implemented on a DDP-516 computer for the 200-word recognition experiment. The recognition rate for a designated male talker is 97.3 percent for telephone input, and the recognition time is about 22 times real time.

1,588 citations
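The registration step Itakura describes is a dynamic-programming alignment of reference frames against input frames. A minimal sketch of that idea, assuming a generic per-frame distance in place of Itakura's log prediction residual (all names here are illustrative, not the paper's implementation):

```python
def dtw_distance(ref, inp, dist):
    """Minimum accumulated frame distance aligning ref onto inp."""
    n, m = len(ref), len(inp)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], inp[j - 1])
            # extend the cheapest of the three admissible predecessors
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(references, inp, dist):
    """Pick the reference word whose alignment cost is minimal."""
    return min(references, key=lambda w: dtw_distance(references[w], inp, dist))
```

A sequential decision procedure, as in the paper, would abandon a candidate word as soon as its partial accumulated cost exceeds the best complete cost found so far.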


Journal ArticleDOI
01 Apr 1975
TL;DR: This paper presents several digital signal processing methods for representing speech, including simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques.
Abstract: This paper presents several digital signal processing methods for representing speech. Included among the representations are simple waveform coding methods; time domain techniques; frequency domain representations; nonlinear or homomorphic methods; and finally linear predictive coding techniques. The advantages and disadvantages of each of these representations for various speech processing applications are discussed.

238 citations


Journal ArticleDOI
Chong Un1, D. Magill
TL;DR: The RELP vocoder combines the advantages of linear predictive coding (LPC) and voice-excited vocoding and, because it requires no pitch extraction, is robust in any operating environment.
Abstract: In this paper we present a new vocoder called the residual-excited linear prediction (RELP) vocoder. The concept of the RELP vocoder combines the advantages of linear predictive coding (LPC) and voice-excited vocoding. In the RELP system, vocal tract modeling is done by the LPC technique, and the LPC residual signal is used as the excitation signal. After low-pass filtering, the residual signal is coded by adaptive delta modulation and is spectrally flattened before being fed into the LPC synthesizer. The range of the transmission rate is typically between 6 and 9.6 kbits/s; the synthetic speech in this range is quite good. As the transmission rate is lowered, the synthetic speech quality degrades very gradually. Since no pitch extraction is required, the vocoder is robust in any operating environment.

92 citations
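The residual that excites a RELP synthesizer is the output of the LPC inverse filter. A minimal sketch of that inverse-filtering step, assuming the predictor coefficients are already available (the low-pass filtering, delta-modulation coding, and spectral-flattening stages of the paper are omitted):

```python
def lpc_residual(signal, a):
    """Residual e[n] = s[n] - sum_k a[k] * s[n-1-k] of a linear predictor
    with coefficients a (a[0] weights s[n-1], a[1] weights s[n-2], ...)."""
    p = len(a)
    e = []
    for n in range(len(signal)):
        # predict the current sample from up to p past samples
        pred = sum(a[k] * signal[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        e.append(signal[n] - pred)
    return e
```

When the predictor models the signal well, the residual is small, which is what makes it cheap to code at 6 to 9.6 kbits/s.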


Journal ArticleDOI
TL;DR: An implementation of a speaker-independent digit-recognition system based on segmenting the unknown word into three regions and then making categorical judgments as to which of six broad acoustic classes each segment falls into.
Abstract: This paper describes an implementation of a speaker-independent digit-recognition system. The digit classification scheme is based on segmenting the unknown word into three regions and then making categorical judgments as to which of six broad acoustic classes each segment falls into. The measurements made on the speech waveform include energy, zero crossings, two-pole linear predictive coding analysis, and normalized error of the linear predictive coding analysis. A formal evaluation of the system showed an error rate of 2.7 percent for a carefully controlled recording environment and a 5.6 percent error rate for on-line recordings in a noisy computer room.

66 citations


Patent
07 Jul 1975
TL;DR: In this article, a plurality of specialized digital signal processing techniques are employed to analyze in real time four speech channels in parallel and multiplex speech frame parameters of the channels into a single data output channel for transmission through a suitable medium.
Abstract: Method and apparatus for speech analysis and synthesis adapted for analyzing and multiplexing speech signals from a plurality of voice grade telephone lines for further transmission through a single voice grade telephone line. A plurality of specialized digital signal processing techniques are employed to analyze in real time four speech channels in parallel and multiplex speech frame parameters of the channels into a single data output channel for transmission through a suitable medium. The received data channel is demultiplexed and the speech frame parameters for the individual channels are utilized to synthesize, in parallel, the four speech signals. Certain of the digital processing techniques utilize the characteristics of speech signals to truncate conventional signal processing time, while other processing techniques are substantially statistical analyses of speech to resolve ambiguities, particularly in making the voiced/unvoiced decision for a frame of analyzed speech data.

60 citations


Patent
31 Oct 1975
TL;DR: In this paper, the autocorrelation function of a digital signal representing the speech signal is determined by a circuit which employs simple combinational logic and an up-down counter circuit; a signal representative of the speech energy is provided by summing the digital speech signals over a predetermined time interval, and intervals of silence are detected by comparing the speech energy in an interval of time with a predetermined or adaptively determined threshold energy.
Abstract: Apparatus for the real-time analysis of speech signals in which a digital signal representative of the speech signal is adaptive-threshold center-clipped and infinite peak-clipped to form a signal comprising three logic states (+1, 0, -1). The autocorrelation function of this signal is determined by a circuit which employs simple combinational logic and an up-down counter circuit. Pitch period and voiced-unvoiced indication are determined from the location and magnitude of the peak value of the autocorrelation function. Additionally, a signal representative of the speech energy is provided by summing the digital speech signals over a predetermined time interval, and intervals of silence are detected by comparing the speech energy in an interval of time with a predetermined or adaptively determined threshold energy.

47 citations
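The clipping and peak-picking steps of this patent can be sketched in software; the following is an illustrative rendering of operations the patent implements in hardware with combinational logic and an up-down counter:

```python
def three_level_clip(x, threshold):
    """Center-clip and infinite-peak-clip to the three logic states {+1, 0, -1}."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in x]

def pitch_period(x, threshold, min_lag, max_lag):
    """Lag of the autocorrelation peak of the clipped signal.
    On a 3-level signal every product is in {-1, 0, +1}, so the sums can be
    accumulated by simply counting up or down."""
    c = three_level_clip(x, threshold)
    best_lag, best_val = min_lag, -len(x)
    for lag in range(min_lag, max_lag + 1):
        r = sum(c[n] * c[n + lag] for n in range(len(c) - lag))
        if r > best_val:
            best_val, best_lag = r, lag
    return best_lag
```

Voicing would then be decided by comparing the peak magnitude against a threshold, as the patent describes.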


Journal ArticleDOI
TL;DR: Entropies (in bits per moving-area pel) for adaptive linear predictors were significantly lower than for nonadaptive predictors, indicating that substantial bit-rate savings should be possible.
Abstract: Linear predictive coding is an efficient method for transmitting the amplitudes of moving-area picture elements (pels) in a conditional replenishment coder for video-telephone signals. It has been conjectured that if the linear predictor can dynamically adapt to the speed and direction of motion in the scene, then greatly improved performance should result. To test this conjecture and to get a first-order estimate of the possible saving, computer simulations were carried out using pairs of video-telephone frames stored on digital discs. Using this data, picture quality could not be studied. However, differential signal entropies could be estimated, and this was done for several nonadaptive and adaptive linear predictors. Entropies (in bits per moving-area pel) for adaptive linear predictors were significantly lower than for nonadaptive predictors, indicating that substantial bit-rate savings should be possible. However, simpler implementations will have to be devised before adaptive prediction becomes practicable.

43 citations
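The entropies reported here are first-order entropies of the differential (prediction-error) signal. A minimal sketch of such an estimate, in bits per moving-area pel (an illustrative calculation, not the paper's simulation):

```python
from math import log2
from collections import Counter

def entropy_bits_per_pel(differences):
    """First-order entropy of a stream of quantized prediction differences:
    H = -sum_i p_i * log2(p_i), in bits per pel."""
    counts = Counter(differences)
    total = len(differences)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

A lower entropy for the adaptive predictor's differences is what indicates the potential bit-rate saving, even before any quantizer or coder is designed.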



Journal ArticleDOI
TL;DR: In this paper, a mathematical formulation for each of several zero-crossing feature extraction techniques is derived and related (where possible) to each of the other zero-Crossing methods.
Abstract: Zero-crossing analysis techniques have long been applied to speech analysis, to automatic speech recognition, and to many other signal-processing and pattern-recognition tasks. In this paper, a mathematical formulation for each of several zero-crossing feature extraction techniques is derived and related (where possible) to each of the other zero-crossing methods. Based upon this mathematical formulation, a physical interpretation of each analysis technique is effected, as is a discussion of the properties of each method. It is shown that four of these methods are a description of a short-time waveform in which essentially the same information is preserved. Each turns out to be a particular normalization of a count of zero-crossing intervals method. The effects of the various forms of normalization are discussed. A fifth method is shown to be a different type of measure; one which preserves information concerning the duration of zero-crossing intervals rather than their absolute number. Although reference is made as to how each of the zero-crossing methods has been applied to automatic speech recognition, an attempt is made to enumerate general characteristics of each of the techniques so as to make the mathematical analysis generally applicable.

25 citations
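As a point of reference for the methods surveyed, a zero-crossing count and one simple normalization of it can be sketched as follows (illustrative only; the paper's count-based methods differ precisely in how such a count is normalized):

```python
def zero_crossing_count(x):
    """Number of sign changes in the waveform segment x (exact zeros ignored)."""
    signs = [1 if v > 0 else -1 for v in x if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def zero_crossing_rate(x, sample_rate):
    """One normalization of the count: crossings per second over the segment."""
    return zero_crossing_count(x) * sample_rate / len(x)
```

The fifth method the paper distinguishes would instead record the durations of the intervals between crossings rather than their number.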


Patent
14 Nov 1975
TL;DR: In this paper, a speech analyzer and synthesizer features a digital adaptive linear predictor, using a recursive (rather than transversal) filter in a negative feedback loop which develops both feedforward and feedback filter coefficients.
Abstract: A speech analyzer and synthesizer features a digital adaptive linear predictor, using a recursive (rather than transversal) filter in a negative feedback loop which develops both feedforward and feedback filter coefficients. An input circuit is responsive to an input speech signal and to a first synthesized speech signal for developing an error signal. An output circuit is responsive to the error signal and to first state signals for developing multiplexed speech data signals. The multiplexed speech data signals are fed back, demultiplexed and applied to a first recursive filter to control the development of the first synthesized speech signal and the first state signals by the first recursive filter. The multiplexed speech data signals from the output circuit are also transmitted to a receiver which demultiplexes and applies the demultiplexed received speech data signals to a second recursive filter to control the development of a second synthesized speech signal by the second recursive filter. This second synthesized speech signal is then converted into an output speech signal which substantially sounds like the input speech signal.

20 citations


Patent
07 Oct 1975
TL;DR: In this paper, a plurality of speech channels uses only one speech analyzer-synthesizer by time-multiplexing and demultiplexing (sampling and processing) the speech channels sequentially.
Abstract: A plurality of speech channels uses only one speech analyzer-synthesizer by Time-Multiplexing-Demultiplexing (sampling and processing) the speech channels sequentially. On the transmission side, speech signals of a plurality of channels are multiplexed by a pulse code modulation system, a partial autocorrelation coefficient and an excitation signal which constitute a feature parameter of the speech are extracted from the multiplexed signals by means of a speech analyzer for respective digital outputs corresponding to respective speech signals, and the extracted feature parameter is multiplexed again and then transmitted to the receiving side. On the receiving side, the received multiplexed signal of the feature parameter is applied to a speech synthesizer on a time division basis for reproducing a multiplex pulse code modulation signal of the speech wave, and the reproduced signal is distributed among respective channels.

Journal ArticleDOI
TL;DR: The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling.
Abstract: This paper considers a sequential strategy for acoustic-phonetic speech analysis. Each analysis process is applied to an appropriately labeled speech segment and results in a possible sub-segmentation of the original segment. The segments resulting from the analysis are labeled according to the analysis results. The advantages of the strategy are that no more segments are considered than those actually differentiated by the analysis steps. The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling. The analysis sequence yields a structure for the syllabic units of the speech signal that may be used to retrieve similar syllabic units for detailed comparison.

Journal ArticleDOI
N. Dixon1, H. Silverman
TL;DR: A versatile spectral analysis system, the parametrically controlled analyzer (PCA), serves as input to an hierarchically operated string transcriber (HOST), a modular acoustic processor consisting of two major components designed for work in speech recognition.
Abstract: A system, the modular acoustic processor (MAP) consisting of two major components, has been designed for work in speech recognition. A versatile spectral analysis system, the parametrically controlled analyzer (PCA), serves as input to an hierarchically operated string transcriber (HOST). In the design of this system, controllability and modularity for developmental extensibility were primary concerns. The system, with the exception of initial high-fidelity, direct A/D conversion, is entirely implemented in software, PL/I, with appropriate JCL structures for running under OS/MVT on an IBM 360-91. As an adjunct for obtaining training data, a grayscale interactive system using an IBM 1800 process-control computer has also been implemented. PCA signal processing features parametric selection of several analysis methods, including discrete Fourier transform (DFT), linear predictive coding (LPC), and chirp z-transform (CZT). Also, selection may be made among various smoothing, normalization, interpolation, and F 0 estimation methods. PCA develops high-quality spectrographic representations of speech for standard line printers, CRT display, and subsequent processing. PCA also performs spectral-similarity matching and training. HOST consists of a number of processes for performing segmentation, classification, and prosody analysis. Provision is made for complete commutability at the module level as well as at the algorithm level. The segmentation/classification output of HOST is augmented by estimates of confidence. PCA is a packaged, debugged, running system. A first version of HOST is operational.

Journal ArticleDOI
TL;DR: A real-time computer implementation of a fourth-order adaptive predictive coder (APC) which transmits speech at 6400 bits/s achieves an intelligibility score of 87 percent on the diagnostic rhyme test (DRT) and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
Abstract: This paper describes a real-time computer implementation of a fourth-order adaptive predictive coder (APC) which transmits speech at 6400 bits/s. To reduce the demands on the central processor, the terminal uses the average magnitude difference function (AMDF) for pitch extraction and calculates the predictor coefficients and reflection coefficients assuming stationarity of the input data. With 16-bit fixed-point arithmetic processing, the coder achieves an intelligibility score of 87 percent on the diagnostic rhyme test (DRT) and produces speech with acceptable voice naturalness even in the presence of acoustic background noise.
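The AMDF pitch extractor mentioned here replaces autocorrelation's multiplications with magnitude differences, which is why it lightens the processor load. A minimal sketch, assuming a plain exhaustive search over candidate lags (the paper's fixed-point details are omitted):

```python
def amdf(x, lag):
    """Average magnitude difference of the segment x at a given lag."""
    n = len(x) - lag
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n

def amdf_pitch(x, min_lag, max_lag):
    """Pitch-period estimate: the lag where the AMDF dips lowest.
    Only additions, subtractions, and absolute values are needed."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(x, lag))
```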

Journal ArticleDOI
M. Knudsen1
TL;DR: The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s and introduces a new autocorrelation scheme with several valuable properties.
Abstract: The autocorrelation method for linear-predictive coding of speech [1] has been implemented in real time on the SPS-41, a commercially available system composed of three dissimilar microprocessors working in parallel. Using user-written microcode, one processor performs I/O and master control, the second handles loop indexing and counting, and the third does the actual arithmetic on data. Such parallelism allows 2 × 10^6 I/O operations and 4 × 10^6 multiplications/s, but actually realizing this potential requires fresh approaches to some old algorithms. Most important is a new autocorrelation scheme with several valuable properties. Using 16-bit fixed-point single-precision arithmetic to accumulate autocorrelation sums and invert the autocorrelation matrix presents problems which have been solved reasonably well. The present program converts frames of 256 16-bit samples into 14 coefficients and then into 128 points of logarithmic power spectrum at 100 frames/s.
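The autocorrelation method referred to here is the standard one: accumulate short-time autocorrelation sums, then solve the normal equations with the Levinson-Durbin recursion. A floating-point sketch of that textbook procedure (the paper's contribution, a 16-bit fixed-point accumulation scheme, is not reproduced):

```python
def autocorr(x, p):
    """Short-time autocorrelation sums r[0..p] of the frame x."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(p + 1)]

def levinson_durbin(r):
    """Solve the autocorrelation normal equations for the predictor
    coefficients; returns (coefficients a[1..p], final prediction error)."""
    p = len(r) - 1
    a = [0.0] * (p + 1)
    e = r[0]
    for i in range(1, p + 1):
        # reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)
    return a[1:], e
```

The log power spectrum the program displays would then follow by evaluating the resulting all-pole model on a frequency grid.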

Book ChapterDOI
01 Jan 1975
TL;DR: This paper gives some preliminary results of a resynthesis of speech sounds, based on parameters derived from this broad-band spectral information, and shows that intelligible speech is obtained, even after a further data reduction.
Abstract: Speech as a unique type of auditory input has specific ways of central processing. However, the peripheral stages of auditory processing in the inner ear for speech and non-speech sounds cannot be very different. The commonly used formant analysis is rather specific for speech, and related to speech production. We prefer a spectral analysis more in line with the frequency analyzing properties of the inner ear. Such a rather wide-band frequency analysis (1/3-octave filters) can result in an excellent discrimination between speech sounds. In this paper we give some preliminary results of a resynthesis of speech sounds, based on parameters derived from this broad-band spectral information. Intelligible speech is obtained, even after a further data reduction.

Journal ArticleDOI
R. Becker1, F. Poza1
TL;DR: This paper describes the acoustic processing in a syntactically guided natural language speech understanding system that through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream.
Abstract: This paper describes the acoustic processing in a syntactically guided natural language speech understanding system. A major characteristic of this system is that through the interaction of pragmatic, semantic, and syntactic information, candidate words are proposed to exist at specific points in the acoustic stream. The purpose of the acoustic processor is to verify or reject these hypotheses. This verification is done in two stages. First, digital filtering is done to classify each 10-ms segment as one of ten primitive classes. If the proposed word is consistent with the pattern of primitive classes at the corresponding point in the acoustic stream, further analysis is done using linear predictive coding and other digital filters. The results of this analysis are used to segment the acoustic signal and to further classify the voiced segments. Because this segmentation and classification can be tailored for each word, difficult analysis problems caused by coarticulation between adjacent sounds can be successfully solved. When combined with a sophisticated linguistic processor, these acoustical processing methods yield correct understanding of natural language utterances.

01 Mar 1975
TL;DR: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time; the speech-detection method selected is intrinsically independent of the spectrum characteristics of the communication channel or tape being monitored.
Abstract: The objective of the work described in this report was to make it less tiring to monitor speech at low signal-to-noise ratios over long periods of time. Two approaches to reducing auditory fatigue were studied: (1) automatic detection of speech and (2) automatic enhancement of the S/N of speech. The first approach was aimed at reducing the amount of time spent in simply listening for speech to occur. After examining several methods of detecting speech, we selected a method that intrinsically is independent of the spectrum characteristics of the communication channel or tape being monitored, of the speech characteristics of the talker, and of the language. The technique proved to be capable of detecting speech in wideband noise at an S/N of -6 dB. Its major disadvantage appears to be that the complexity of the required computations demands the use of a computer to implement the method.

Journal ArticleDOI
TL;DR: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.
Abstract: A well-known, real-time computational algorithm for use in adaptive linear prediction of speech waveforms is discussed and is related to known research in other fields.

Journal ArticleDOI
TL;DR: Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance, which confirm the superior speech quality attainable by this method.
Abstract: The first‐formant region of the speech spectrum must be characterized with a high degree of accuracy in order to attain good speech quality and naturalness in a narrow‐band speech processing system. This consideration led to the concept of using linear predictive coding for the first‐formant region, vocoder channels for the upper spectrum, and detection and coding of the excitation function, to establish a narrow‐band voice digitizer. This concept has advantages in the tradeoff of speech quality versus data rate, in comparison with the use of either channel vocoding or linear prediction alone. Such a triple‐function voice coder (TRIVOC) has been implemented and investigated using a CSP‐30 Digital Signal Processor. Processing algorithms have been established to permit comparison of various combinations of baseband and vocoder channels, LPC coefficients, and data rates ranging from 2400 to 4800 bits per second. Subjective results have confirmed the superior speech quality attainable by this method. Several versions of TRIVOC processing are presented, together with some results of subjective assessment of performance.

Proceedings ArticleDOI
01 Dec 1975
TL;DR: Both differential pulse code modulation (DPCM) and adaptive predictive coding (APC) systems have been used somewhat successfully for low data rate digital voice transmission.
Abstract: Both differential pulse code modulation (DPCM) and adaptive predictive coding (APC) systems have been used somewhat successfully for low data rate digital voice transmission. A DPCM system has as its goal the removal of signal redundancy prior to transmission by a linear prediction of the incoming signal with a weighted combination of past signal estimates. The error in the prediction process is then quantized and transmitted to the receiver. An identical prediction loop is used at the receiver to reinsert the redundancy, and hence, to reconstruct the speech signal. Both the quantizer and predictor may or may not be adaptive.
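The DPCM loop described above predicts each sample from past reconstructed samples, so that the encoder's local reconstruction stays in step with the receiver's. A minimal sketch with a pluggable quantizer (names are illustrative; a practical coder would make both quantizer and predictor adaptive, as the paper notes):

```python
def dpcm_encode(signal, predictor_weights, quantize):
    """DPCM encoder: predict from past reconstructed samples, quantize the
    prediction error, and keep a local copy of the decoder's reconstruction."""
    p = len(predictor_weights)
    recon = [0.0] * p          # decoder-visible history, most recent first
    codes = []
    for s in signal:
        pred = sum(w * r for w, r in zip(predictor_weights, recon))
        q = quantize(s - pred)
        codes.append(q)
        recon = [pred + q] + recon[:-1]
    return codes

def dpcm_decode(codes, predictor_weights):
    """Identical prediction loop reinserts the redundancy at the receiver."""
    p = len(predictor_weights)
    recon = [0.0] * p
    out = []
    for q in codes:
        pred = sum(w * r for w, r in zip(predictor_weights, recon))
        out.append(pred + q)
        recon = [out[-1]] + recon[:-1]
    return out
```

With a lossless (identity) quantizer the decoder reproduces the input; a real coder's distortion comes entirely from the quantization of the error signal.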

Journal ArticleDOI
TL;DR: This work constructed a speech recognition system using both bandpass filtering and linear prediction in order to compare the two techniques and used this technique to achieve some remarkably good speech recognition scores.
Abstract: It has been recently proposed by Itakura [F. Itakura, “Minimum Prediction Residual Principle Applied to Speech Recognition,” IEEE Symp. Speech Recog., CMU (1974)] that the linear predictive residual can be used as a measure of speech waveform similarity. To measure the similarity between two waveforms, Itakura proposed to construct a linear predictive filter for one waveform and measure the residual (predictive error) for the other waveform. Itakura used this technique to achieve some remarkably good speech recognition scores. We constructed a speech recognition system using both bandpass filtering and linear prediction in order to compare the two techniques. The classifier used dynamic programming. A 36-word vocabulary was used consisting of the alphabet plus digits spoken five times by the same speaker. A single word list was used for training and the other four were used for testing. Speech input was through a noise-cancelling microphone. For the digital linear predictive, inverse filtering, analysis, s...

Journal Article
TL;DR: Inverse filtering as mentioned in this paper is the use of a network whose transfer function is the inverse of the transfer function of one or a combination of the articulatory system filters to modify the speech wave either in the time domain or in the frequency domain.
Abstract: This paper reviews certain speech analytical techniques to which the label 'inverse filtering' has been applied. The unifying features of these techniques are presented, namely: 1. a basis in the source-filter theory of speech production; 2. the use of a network whose transfer function is the inverse of the transfer function of one or a combination of the articulatory system filters to modify the speech wave either in the time domain or in the frequency domain. However, their differences, which lie in the particular system filter being inverted and in the manner of realisation, provide a basis for the classification adopted in the paper, which is as follows: (1) inverse vocal tract analogue filtering; (2) inverse vocal tract digital filtering; (3) direct inverse glottal filtering; (4) linear predictive coding. An assessment of the comparative usefulness of inverse filtering in contemporary speech studies is given.


ReportDOI
10 Jan 1975
TL;DR: This study, using as a research vehicle the homomorphic vocoder algorithm, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal.
Abstract: : A problem basic to the development of all digital telecommunication systems is the efficient digital encoding of speech signals for transmission. Many speech digitization algorithms have been proposed. This study, using as a research vehicle the homomorphic vocoder algorithm, is directed toward determining the time resolution and the frequency resolution required to faithfully reproduce a speech signal. The results reported are fundamental in that they are applicable to any speech digitization algorithm. The resolutions required for typical speaker situations are summarized.