
Papers published in IEEE Transactions on Audio and Electroacoustics, 1973


Journal Article
TL;DR: A general-purpose computer program that can design a large class of optimum (in the minimax sense) FIR linear phase digital filters is presented and shown to be exceedingly efficient.
Abstract: This paper presents a general-purpose computer program which is capable of designing a large class of optimum (in the minimax sense) FIR linear phase digital filters. The program has options for designing such standard filters as low-pass, high-pass, bandpass, and bandstop filters, as well as multipassband-stopband filters, differentiators, and Hilbert transformers. The program can also be used to design filters which approximate arbitrary frequency specifications provided by the user. The program is written in Fortran, and is carefully documented both by comments and by detailed flowcharts. The filter design algorithm is shown to be exceedingly efficient; e.g., it is capable of designing a filter with a 100-point impulse response in about 20 s.

1,160 citations


Journal Article
TL;DR: In this paper, a method for estimating the magnitude-squared coherence function for two zero-mean wide-sense-stationary random processes is presented, which utilizes the weighted overlapped segmentation fast Fourier transform approach.
Abstract: A method for estimating the magnitude-squared coherence function for two zero-mean wide-sense-stationary random processes is presented. The estimation technique utilizes the weighted overlapped segmentation fast Fourier transform approach. Analytical and empirical results for statistics of the estimator are presented. The analytical expressions are limited to the nonoverlapped case; empirical results show a decrease in bias and variance of the estimator with increasing overlap and suggest a 50-percent overlap as being highly desirable when cosine (Hanning) weighting is used.

521 citations
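
The weighted overlapped-segment estimator can be sketched in a few lines of pure Python. This is a minimal sketch, not the paper's program: it assumes a periodic Hann (cosine) window, 50 percent overlap by default, and a naive O(N^2) DFT so that no libraries are needed; `dft` and `msc_estimate` are illustrative names.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; adequate for short illustrative segments."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def msc_estimate(x, y, seg_len, overlap=0.5):
    """Magnitude-squared coherence |Sxy|^2 / (Sxx * Syy) for each DFT bin.

    Cross- and auto-spectra are accumulated over Hann-weighted segments that
    overlap by the given fraction (0.5 means 50 percent overlap).
    """
    w = [0.5 - 0.5 * math.cos(2 * math.pi * t / seg_len) for t in range(seg_len)]
    step = max(1, int(seg_len * (1 - overlap)))
    sxx = [0.0] * seg_len
    syy = [0.0] * seg_len
    sxy = [0j] * seg_len
    for start in range(0, len(x) - seg_len + 1, step):
        X = dft([w[t] * x[start + t] for t in range(seg_len)])
        Y = dft([w[t] * y[start + t] for t in range(seg_len)])
        for k in range(seg_len):
            sxx[k] += abs(X[k]) ** 2
            syy[k] += abs(Y[k]) ** 2
            sxy[k] += X[k] * Y[k].conjugate()
    return [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) if sxx[k] * syy[k] > 0 else 0.0
            for k in range(seg_len)]


if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(256)]
    noise = [random.gauss(0.0, 1.0) for _ in range(256)]
    # A scaled copy is perfectly coherent: the deviation from 1 is rounding only.
    print(max(abs(c - 1.0) for c in msc_estimate(x, [2.0 * v for v in x], 32)))
    print(sum(msc_estimate(x, noise, 32)) / 32)
```

Averaging is essential: with a single segment the estimate is identically 1, whatever the inputs. For independent inputs the averaged estimate stays above zero; for n_d nonoverlapped segments its expected value is 1/n_d, which is one reason the statistics of the estimator matter.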


Journal Article
TL;DR: Techniques are developed in detail for efficiently synthesizing digital lattice and ladder filters from any stable direct form; in one form, a lattice filter canonic in terms of multiplies and delays is obtained.
Abstract: There is evidence that in addition to standard digital filter forms such as the direct, parallel, and cascade forms, digital lattice and ladder filters may play an important role in finite word length implementation problems. In this paper, techniques are developed in detail for efficiently synthesizing digital lattice and ladder filters from any stable direct form. In one form, a lattice filter canonic in terms of multiplies and delays is obtained. An internal scaling procedure is also introduced that will be of importance for optimizing one of the lattice forms for finite word length implementation.

320 citations
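
A common way to obtain lattice coefficients from a stable direct-form denominator is the step-down (backward Levinson) recursion. The sketch below assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and a two-multiplier all-pole lattice; it is a standard textbook form, not necessarily one of the specific lattice or ladder structures (or the canonic form) derived in the paper, and the function names are illustrative. Stability of the direct form shows up as |k_m| < 1 at every stage.

```python
def step_down(a):
    """Reflection coefficients k_1..k_p from A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p."""
    if not a or a[0] != 1:
        raise ValueError("expected a monic polynomial with a[0] == 1")
    coeffs = list(a[1:])  # a_1..a_m of the current order m
    ks = []
    while coeffs:
        m = len(coeffs)
        k = coeffs[-1]
        if abs(k) >= 1:
            raise ValueError("|k| >= 1: the direct form is not stable")
        ks.append(k)
        # a_i^(m-1) = (a_i^(m) - k_m a_(m-i)^(m)) / (1 - k_m^2), i = 1..m-1
        coeffs = [(coeffs[i] - k * coeffs[m - 2 - i]) / (1 - k * k) for i in range(m - 1)]
    return ks[::-1]


def step_up(ks):
    """Inverse recursion: direct-form A(z) from reflection coefficients."""
    coeffs = []
    for k in ks:
        m = len(coeffs) + 1
        coeffs = [coeffs[i] + k * coeffs[m - 2 - i] for i in range(m - 1)] + [k]
    return [1.0] + coeffs


def allpole_lattice(ks, x):
    """Two-multiplier all-pole lattice realizing 1/A(z)."""
    p = len(ks)
    b_prev = [0.0] * p  # b_prev[m] holds b_m(n-1), m = 0..p-1
    out = []
    for sample in x:
        f = sample  # f_p(n)
        b_new = [0.0] * (p + 1)
        for m in range(p, 0, -1):
            f -= ks[m - 1] * b_prev[m - 1]            # f_{m-1}(n)
            b_new[m] = ks[m - 1] * f + b_prev[m - 1]  # b_m(n)
        b_new[0] = f                                  # b_0(n) = f_0(n)
        out.append(f)
        b_prev = b_new[:p]
    return out


def allpole_direct(a, x):
    """Direct form: y(n) = x(n) - sum_{i>=1} a[i] y(n-i)."""
    y = []
    for n, sample in enumerate(x):
        y.append(sample - sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0))
    return y


if __name__ == "__main__":
    a = step_up([0.5, -0.3, 0.2])
    print(a, step_down(a))
    impulse = [1.0] + [0.0] * 9
    print(max(abs(u - v) for u, v in zip(allpole_lattice(step_down(a), impulse),
                                         allpole_direct(a, impulse))))
```

The impulse responses of the two realizations agree to rounding error, which is a quick check that the synthesized lattice implements the same transfer function as the direct form.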


Journal Article
TL;DR: In this article, a new method for estimating the vocal tract area function directly from the acoustic speech waveform is described, and the filtering processes of the inverse filter model and an acoustic tube model of speech are analyzed and they are shown to be identical.
Abstract: This paper describes a new method for estimating the vocal tract area function directly from the acoustic speech waveform. Dynamic changes of the area functions and formants in voiced speech sounds are obtainable. The filtering processes of the inverse filter model and an acoustic tube model of speech are analyzed and they are shown to be identical, with the reflection coefficients in the acoustic tube model as a common factor, thus making possible the extraction of the reflection coefficients by the inverse filter processing of the speech signal. The discrete area function can easily be obtained from the set of reflection coefficients. Analysis examples for vowels, diphthongs, and consonants in vowel-consonant-vowel utterances are given.

250 citations
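
Once reflection coefficients are available, the discrete area function follows from a one-line recursion between adjacent tube sections. This sketch assumes the sign convention A_{m+1} = A_m (1 - k_m) / (1 + k_m), with section 0 at the lips and areas normalized to that section; conventions differ across texts (some index from the glottis or flip the sign of k), so the orientation here is an assumption rather than the paper's.

```python
def area_function(ks, lips_area=1.0):
    """Relative cross-sectional areas from reflection coefficients.

    Assumed convention: A[m+1] = A[m] * (1 - k[m]) / (1 + k[m]), A[0] at the lips.
    """
    areas = [lips_area]
    for k in ks:
        if not -1.0 < k < 1.0:
            raise ValueError("each reflection coefficient must satisfy |k| < 1")
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


def reflection_coefficients(areas):
    """Inverse mapping, same convention: k[m] = (A[m] - A[m+1]) / (A[m] + A[m+1])."""
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


if __name__ == "__main__":
    ks = [0.2, -0.5, 0.1]
    areas = area_function(ks)
    print([round(v, 3) for v in areas])
    print(reflection_coefficients(areas))  # recovers ks up to rounding
```

Only area ratios are determined by the reflection coefficients, which is why the sketch normalizes to the section at the lips.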


Journal Article
TL;DR: In this article, the mathematical theory underlying one of the techniques currently being used in linear prediction of speech is developed from an inner product formulation, which produces a unified framework for studying the properties of autocorrelation equations.
Abstract: The mathematical theory underlying one of the techniques currently being used in linear prediction of speech is developed from an inner product formulation. This formulation produces a unified framework for studying the properties of autocorrelation equations. In particular, the solution and stability of the autocorrelation equations are studied in detail. Experimental stability results for finite word length analysis of speech are also presented.

148 citations
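
The autocorrelation equations are usually solved with the Levinson-Durbin recursion, which also exposes the stability property: if the autocorrelation values come from a genuine (positive-definite) autocorrelation sequence, every reflection coefficient satisfies |k_m| < 1 and the resulting A(z) is minimum phase. The sketch below assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and a Hamming-windowed frame; it does not reproduce the paper's inner-product derivation or its finite-word-length experiments.

```python
import math


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation of a Hamming-windowed frame, lags 0..max_lag."""
    n = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    s = [w[t] * frame[t] for t in range(n)]
    return [sum(s[t] * s[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_i a[i] r[|j - i|] = 0 (j = 1..order) with a[0] = 1.

    Returns (a, ks, err): predictor polynomial, reflection coefficients,
    and the final prediction-error energy.
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive")
    a, ks, err = [1.0], [], r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
        err *= 1.0 - k * k
    return a, ks, err


if __name__ == "__main__":
    frame = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(240)]
    a, ks, err = levinson_durbin(autocorrelation(frame, 10), 10)
    print(all(abs(k) < 1 for k in ks), err)
```

In finite word length arithmetic the rounded autocorrelation values may no longer be positive definite, so a stage with |k_m| >= 1 is a practical symptom of the numerical problems the paper investigates.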


Journal Article
John Makhoul
TL;DR: In this paper, the autocorrelation method of linear prediction is formulated in the time, autocorrelation, and spectral domains, and the analysis is shown to be that of approximating the short-time signal power spectrum by an all-pole spectrum.
Abstract: The autocorrelation method of linear prediction is formulated in the time, autocorrelation, and spectral domains. The analysis is shown to be that of approximating the short-time signal power spectrum by an all-pole spectrum. The method is compared with other methods of spectral analysis such as analysis-by-synthesis and cepstral smoothing. It is shown that this method can be regarded as another method of analysis-by-synthesis where a number of poles is specified, with the advantages of noniterative computation and an error measure which leads to a better spectral envelope fit for an all-pole spectrum. Compared to spectral analysis by cepstral smoothing in conjunction with the chirp z transform (CZT), this method is expected to give a better spectral envelope fit (for an all-pole spectrum) and to be less sensitive to the effects of high pitch on the spectrum. The normalized minimum error is defined and its possible usefulness as a voicing detector is discussed.

134 citations
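
Given predictor coefficients from the autocorrelation method, the all-pole model spectrum and the normalized minimum error take only a few lines. The sketch assumes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the minimum error E_p = R(0) + sum_k a_k R(k), and the common gain choice G^2 = E_p; the function names are illustrative, and no particular voicing threshold is implied.

```python
import cmath
import math


def minimum_error(a, r):
    """E_p = R(0) + sum_{k=1}^{p} a[k] R(k) for the autocorrelation method (a[0] = 1)."""
    return sum(a[k] * r[k] for k in range(len(a)))


def normalized_error(a, r):
    """V_p = E_p / R(0); it lies between 0 and 1 for a valid solution."""
    return minimum_error(a, r) / r[0]


def allpole_spectrum(a, gain_sq, omega):
    """Model power spectrum P(omega) = G^2 / |A(e^{j omega})|^2."""
    a_val = sum(a[k] * cmath.exp(-1j * omega * k) for k in range(len(a)))
    return gain_sq / abs(a_val) ** 2


if __name__ == "__main__":
    # First-order example: R(0) = 1, R(1) = 0.5 gives a = [1, -0.5].
    a, r = [1.0, -0.5], [1.0, 0.5]
    g2 = minimum_error(a, r)
    print(normalized_error(a, r))  # 0.75
    for omega in (0.0, math.pi / 2, math.pi):
        print(round(omega, 3), allpole_spectrum(a, g2, omega))
```

A small V_p means the frame is well predicted from its past, which is the property the abstract suggests may help separate voiced from unvoiced speech.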


Journal Article
TL;DR: In this article, a method for finding the coefficients of an nth-order linear recursive digital filter, which gives the best least squares approximation to a desired pulse response over a finite interval, is presented.
Abstract: A method for finding the coefficients of an nth-order linear recursive digital filter, which gives the best least squares approximation to a desired pulse response over a finite interval, is presented. A relationship is derived between the approximating error corresponding to an optimal set of numerator coefficients and the error produced by an overdetermined set of linear equations, which is a function of the denominator coefficients only. This relation provides a computational algorithm for calculating the optimal coefficients by iteratively solving weighted sets of linear equations in terms of the denominator coefficients only. Both theoretical and numerical results are presented. Also, bounds are found on the interval in which the norm of the optimum error must lie.

133 citations


Journal Article
D. Chan, Lawrence R. Rabiner
TL;DR: In this article, an analysis of quantization effects in the direct form realization of finite impulse response (FIR) digital filters is presented, and statistical bounds on the error incurred in the frequency response of a filter due to coefficient quantization are developed and verified by extensive experimental data.
Abstract: An analysis of the three possible types of quantization effects in the direct form realization of finite impulse response (FIR) digital filters is presented. These quantization effects include roundoff noise, A-D noise, and filter frequency response errors due to coefficient quantization. Since the analysis of roundoff noise and A-D noise for the direct form is straightforward, this paper concentrates on an analysis of the effects of quantized coefficients on the resulting filter frequency response. Based on this analysis, statistical bounds on the error incurred in the frequency response of a filter due to coefficient quantization are developed and verified by extensive experimental data. Using these bounds, a procedure for applying known techniques for FIR filter design to the design of filters with finite word length coefficients is presented. On the whole, the direct form is shown to be a very attractive structure for realizing FIR filters.

127 citations
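
The coefficient-quantization effect can be reproduced numerically: round each coefficient of a direct-form FIR filter to B fractional bits and measure the change in frequency response. This sketch is not the paper's statistical bound. It uses a Hamming-windowed sinc as a stand-in for an optimal design and compares the measured error with two simple reference values: the deterministic worst case N * 2^-B / 2 (each of N coefficients moves by at most half a quantization step) and the rms value 2^-B * sqrt(N/12) predicted if the rounding errors are modeled as independent and uniform.

```python
import cmath
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass (stand-in design); cutoff as a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))))
    return taps


def quantize(h, bits):
    """Round each coefficient to the nearest multiple of 2**-bits."""
    step = 2.0 ** -bits
    return [round(c / step) * step for c in h]


def response(h, omega):
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))


def max_response_error(h, bits, grid=512):
    """Largest |H(w) - Hq(w)| over a uniform grid on [0, pi]."""
    hq = quantize(h, bits)
    return max(abs(response(h, w) - response(hq, w))
               for w in (math.pi * i / grid for i in range(grid + 1)))


def worst_case_bound(num_taps, bits):
    return num_taps * 2.0 ** -bits / 2


def rms_estimate(num_taps, bits):
    """sqrt(N * step^2 / 12) under an independent-uniform rounding-error model."""
    return 2.0 ** -bits * math.sqrt(num_taps / 12)


if __name__ == "__main__":
    h = lowpass_taps(31, 0.2)
    for bits in (8, 12, 16):
        print(bits, max_response_error(h, bits), rms_estimate(31, bits),
              worst_case_bound(31, bits))
```

The worst-case figure grows linearly with N, while the statistical figure grows only as sqrt(N), which is why statistical bounds give a more realistic word length for long filters.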


Journal Article
TL;DR: Speech copied with a parallel formant synthesizer can be made almost indistinguishable from natural speech when the glottal pulse is derived by inverse filtering a natural vowel; idealized pulse shapes are less successful, but the subjective differences are small compared with those caused by reverberation in an ordinary room, and under such listening conditions the phase structure of glottal pulses is of no importance.
Abstract: A computer-simulated parallel formant synthesizer has been used to copy short samples of human speech. It is possible to make the synthetic speech almost indistinguishable from the natural in spectrum, waveform, and by earphone listening, provided that the synthetic glottal pulse is derived by inverse filtering a typical natural vowel from the same talker. Various other pulse shapes have been tried, such as the combination of cosine segments suggested by various workers as a close approximation to human glottal pulses. For producing speech acceptable as natural, none of these idealized pulse shapes has been as successful as those derived by inverse filtering. However, the subjective differences are small compared with the differences that would be caused by reverberation when listening to a loudspeaker in an ordinary room with good acoustics; it has been demonstrated that under such listening conditions, the phase structure of glottal pulses is of no importance.

122 citations


Journal Article
TL;DR: The theoretical basis for representation of a speech signal by its short-time Fourier transform is discussed and the design tradeoffs necessary to achieve moderate information rate reductions are elucidated.
Abstract: This paper discusses the theoretical basis for representation of a speech signal by its short-time Fourier transform. The results of the theoretical studies were used to design a speech analysis-synthesis system which was simulated on a general-purpose laboratory digital computer system. The simulation uses the fast Fourier transform in the analysis stage and specially designed finite duration impulse response filters in the synthesis stage. The results of both the theoretical and computational studies lead to an understanding of the effect of several design parameters and elucidate the design tradeoffs necessary to achieve moderate information rate reductions.

116 citations
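
The analysis-synthesis idea can be illustrated with a short-time Fourier transform followed by overlap-add resynthesis. This is a simplified stand-in, not the system described: the paper synthesizes with specially designed FIR filters, whereas this sketch assumes a periodic Hann window with 50 percent overlap (whose shifted copies sum to one) and a naive DFT; no information-rate reduction is attempted, and the function names are illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT, scaled by 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def stft(x, frame_len, hop):
    """Short-time spectra of Hann-weighted frames starting every `hop` samples."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    return [dft([w[t] * x[s + t] for t in range(frame_len)])
            for s in range(0, len(x) - frame_len + 1, hop)]


def istft(frames, frame_len, hop, length):
    """Overlap-add the inverse DFTs; exact where the windows sum to one."""
    y = [0.0] * length
    for r, spectrum in enumerate(frames):
        segment = dft(spectrum, inverse=True)
        for t in range(frame_len):
            y[r * hop + t] += segment[t].real
    return y


if __name__ == "__main__":
    x = [math.sin(0.37 * n) for n in range(64)]
    y = istft(stft(x, 16, 8), 16, 8, len(x))
    print(max(abs(a - b) for a, b in zip(x[8:56], y[8:56])))
```

Only samples covered by two frames are reconstructed exactly; the first and last half-frames carry a single window weight.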


Journal Article
TL;DR: A new procedure is given for testing whether a prescribed polynomial in $z_1$ and $z_2$ is nonzero in the closed unit bidisc $|z_1| \leq 1$, $|z_2| \leq 1$, as required for deciding the stability of a two-dimensional filter; the test avoids bilinear transformations and relies on a Schur-Cohn matrix and on checking a set of self-inversive polynomials for positivity on the unit circle.
Abstract: For deciding the stability of a two-dimensional filter, it is of interest to determine whether or not a prescribed polynomial in the variables $z_1$ and $z_2$ is nonzero in the region $\{|z_1| \leq 1\} \cap \{|z_2| \leq 1\}$. A new procedure for testing for this property is given, which does not involve the use of bilinear transformations. Key parts of the test involve the construction of a Schur-Cohn matrix and the checking for positivity on the unit circle of a set of self-inversive polynomials.

Journal Article
TL;DR: A diagrammatic representation of mixed radix and highest radix FFT algorithms is derived, and two broad classes of FFT hardware are explored from the point of view of speed, parallelism, radix number, and type of memory.
Abstract: The fast Fourier transform algorithm is derived by means of successive fracturing of one-dimensional data strings into two-dimensional arrays. Using this formulation, a diagrammatic representation of mixed radix and highest radix FFT algorithms is derived. Using this representation, two broad classes of FFT hardware are explored, from the point of view of speed, parallelism, radix number, and type of memory.
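
The fracturing step can be made concrete. For N = N1 * N2, write the input index as n = N2*n1 + n2 and the output index as k = k1 + N1*k2; a length-N DFT then becomes N1-point DFTs down the columns of an N1-by-N2 array, a twiddle-factor multiplication by W_N^(n2*k1), and N2-point DFTs along the rows. Applying the same split recursively yields mixed-radix FFTs. The sketch uses this common index mapping (an assumption; the paper's diagrams may organize it differently) and naive inner DFTs for clarity.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT used for the short inner transforms and as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_by_2d_array(x, n1_len, n2_len):
    """Length-N DFT via an n1_len-by-n2_len array, N = n1_len * n2_len.

    Input index n = n2_len * n1 + n2; output index k = k1 + n1_len * k2.
    """
    n = len(x)
    if n != n1_len * n2_len:
        raise ValueError("len(x) must equal n1_len * n2_len")
    # 1) n1_len-point DFT down each column n2, giving cols[n2][k1].
    cols = [dft([x[n2_len * n1 + n2] for n1 in range(n1_len)]) for n2 in range(n2_len)]
    out = [0j] * n
    for k1 in range(n1_len):
        # 2) Twiddle factors W_N^(n2 * k1), then 3) an n2_len-point DFT along the row.
        row = [cols[n2][k1] * cmath.exp(-2j * math.pi * n2 * k1 / n) for n2 in range(n2_len)]
        for k2, value in enumerate(dft(row)):
            out[k1 + n1_len * k2] = value
    return out


if __name__ == "__main__":
    x = [math.cos(0.7 * t) for t in range(12)]
    print(max(abs(u - v) for u, v in zip(dft_by_2d_array(x, 3, 4), dft(x))))
```

The arithmetic saving comes from replacing one N-point transform (N^2 operations) with N2 transforms of size N1 and N1 transforms of size N2, roughly N(N1 + N2) operations.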

Journal Article
TL;DR: This paper presents a model for machine recognition of connected speech and the details of a specific implementation of the model, the HEARSAY system, and the use of semantic, syntactic, lexical, and phonological sources of knowledge in the generation and verification of hypotheses.
Abstract: This paper presents a model for machine recognition of connected speech and the details of a specific implementation of the model, the HEARSAY system. The model consists of a small set of cooperating independent parallel processes that are capable of helping in the decoding of a spoken utterance either individually or collectively. The processes use the "hypothesize-and-test" paradigm. The structure of HEARSAY is illustrated by considering its operation in a particular task situation: voice-chess. The task is to recognize a spoken move in a given board position. Procedures for determination of parameters, segmentation, and phonetic descriptions are outlined. The use of semantic, syntactic, lexical, and phonological sources of knowledge in the generation and verification of hypotheses is described. Preliminary results of recognition of some utterances are given.

Journal Article
Lawrence R. Rabiner
TL;DR: A set of simple, approximate relationships between FIR, linear phase, low-pass filter parameters is presented; based on these relationships, it is shown how an existing, readily available filter design program can be used to efficiently design low-pass filters that meet or exceed arbitrary input specifications.
Abstract: In this paper, a set of simple, approximate relationships between FIR, linear phase, low-pass filter parameters is presented. Based on these relationships, it is shown how an existing, readily available filter design program can be used to efficiently design low-pass filters that meet or exceed arbitrary input specifications.
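
To show how such a relationship is used, the sketch below substitutes Kaiser's widely cited length estimate, N ≈ (-20 log10 sqrt(δ1 δ2) - 13) / (14.6 Δf) + 1; this formula is an assumption and is not the set of relationships derived in the paper. In the workflow the abstract describes, an estimate like this seeds the filter length passed to a design program, and N is then adjusted until the resulting filter meets the specification.

```python
import math


def kaiser_length_estimate(delta_pass, delta_stop, transition_width):
    """Kaiser's approximate length of an equiripple low-pass FIR filter.

    delta_pass, delta_stop: peak ripple (linear) in the passband and stopband.
    transition_width: (f_stop - f_pass) as a fraction of the sampling rate.
    """
    atten_db = -20 * math.log10(math.sqrt(delta_pass * delta_stop))
    return math.ceil((atten_db - 13) / (14.6 * transition_width) + 1)


if __name__ == "__main__":
    print(kaiser_length_estimate(0.01, 0.01, 0.05))  # 38
    print(kaiser_length_estimate(0.01, 0.001, 0.05))
```

The estimate is inversely proportional to the transition width, so halving Δf roughly doubles the required length.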

Journal Article
TL;DR: In this paper, a method of characterizing two-input two-output digital filters, particularly suitable for analyzing cascaded networks of this type, is presented; it leads to a new realization method for ladder-type digital filters.
Abstract: A method of characterization of two-input two-output digital filters, which is particularly suitable for the analysis of cascaded networks of this type, is presented. This analysis approach has led to a new realization method of ladder-type digital filters. An example is used to illustrate the realization procedure.

Journal Article
TL;DR: A new realization of nonrecursive digital filters that operate on analog signals is proposed; it requires no multiplications and exploits the relative simplicity of delta modulation as a means of analog-to-digital conversion.
Abstract: A new realization of nonrecursive digital filters that are used to operate on analog signals is proposed. This realization requires no multiplications, and exploits the relative simplicity of delta modulation as a means for analog to digital conversion. This realization also permits a mechanization as a "very fast" digital filter, using read-only memory (ROM). An evaluation of this realization in terms of computation time, storage requirements, and mean-squared error is presented. These characteristics are compared with their counterparts for existing realization methods of nonrecursive digital filters. Computer simulation results that tend to confirm the theoretical results of the error analysis are included.
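
The multiplication-free idea can be sketched under assumptions not taken from the paper: a linear delta modulator with a fixed step size, and an FIR filter applied to the modulator's staircase approximation. Because each new staircase sample differs from the previous one by plus or minus one step, the output obeys y(n) = y(n-1) + sum_k b(n-k) * step * h(k), so once the products step * h(k) are precomputed (values that could be held in a ROM), each output sample needs only signed additions. The function names are hypothetical.

```python
import math


def delta_modulate(x, step):
    """Linear delta modulation: one +/-1 bit per sample and the staircase estimate."""
    bits, staircase, estimate = [], [], 0.0
    for sample in x:
        bit = 1 if sample >= estimate else -1
        estimate += step if bit > 0 else -step
        bits.append(bit)
        staircase.append(estimate)
    return bits, staircase


def dm_fir(bits, h, step):
    """FIR-filter the DM staircase using only signed additions per output sample.

    y(n) = y(n-1) + sum_k b(n-k) * (step * h[k]); the products step * h[k] are
    computed once up front (a table that could be held in read-only memory).
    """
    table = [step * c for c in h]
    y, acc = [], 0.0
    for n in range(len(bits)):
        for k, value in enumerate(table):
            if n - k < 0:
                break
            acc = acc + value if bits[n - k] > 0 else acc - value
        y.append(acc)
    return y


def direct_fir(x, h):
    """Reference direct-form convolution (uses multiplications)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * n / 100) for n in range(200)]
    bits, staircase = delta_modulate(x, 0.08)
    h = [0.1, 0.2, 0.4, 0.2, 0.1]
    print(max(abs(u - v) for u, v in zip(dm_fir(bits, h, 0.08), direct_fir(staircase, h))))
```

The two outputs agree to rounding error; the remaining difference from filtering the true analog signal is the delta modulator's own granular and slope-overload error, which is what an error analysis of such a realization has to account for.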

Journal Article
TL;DR: A continuing project aims to produce acceptable synthetic speech directly from English text and to demonstrate, with speech synthesis, a detailed model of human articulatory movements; current work extends the vocal tract model toward closer agreement with human articulation and vocal cord control.
Abstract: We summarize work between 1969 and 1972 in a continuing project with two objectives: to produce acceptable synthetic speech directly from English text; and to demonstrate with speech synthesis a detailed model of human articulatory movements. Work in the four-year period has yielded moderately accurate rules for predicting the occurrence of pauses and lesser breaks in the sentence; rules for vowel duration in many conditions, not just primary stressed syllables immediately before a pause; rules for contextual variations of consonants; and rules for durational and other allophonic variations on consonants at word boundaries. Presently we are studying natural speech to quantify and add detail to these rules, and we are working to extend the vocal tract model to closer agreement with human articulation and vocal cord control.

Journal Article
TL;DR: In this article, the authors present a characterization of the ill-conditioning of numerical deconvolution problems based on a classical spectral decomposition of the discrete convolution, and some explicit sensitivity measures are introduced.
Abstract: An important class of applications in digital signal processing involves the numerical solution of the convolution integral. These so-called numerical deconvolution problems are notoriously difficult to solve because of their inherent ill-conditioning. In this paper we present a characterization of this ill-conditioning based on a classical spectral decomposition of the discrete convolution. Factors prominently influencing the conditioning are identified and some explicit sensitivity measures are introduced.
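
For circular (periodic) convolution the spectral decomposition is explicit: the DFT diagonalizes the circulant convolution matrix, so its singular values are the DFT magnitudes |H(k)| and the 2-norm condition number is max|H(k)| / min|H(k)|. The sketch assumes circular convolution as a simplification; the linear convolution in most deconvolution problems gives a Toeplitz matrix whose spectrum is only approximated this way, and this is not one of the paper's specific sensitivity measures.

```python
import cmath
import math


def circulant_condition_number(kernel, n):
    """2-norm condition number of the n x n circular-convolution matrix for `kernel`.

    The DFT diagonalizes a circulant matrix, so its singular values are |H(k)|.
    """
    if len(kernel) > n:
        raise ValueError("kernel longer than the period n")
    padded = list(kernel) + [0.0] * (n - len(kernel))
    mags = [abs(sum(padded[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]
    smallest = min(mags)
    return math.inf if smallest == 0 else max(mags) / smallest


if __name__ == "__main__":
    for kernel in ([1.0], [1.0, 0.5], [1.0, 0.9], [0.25, 0.5, 0.25]):
        print(kernel, circulant_condition_number(kernel, 64))
```

Smoothing kernels whose frequency response nearly vanishes somewhere produce enormous condition numbers, which is the ill-conditioning that makes deconvolution so sensitive to noise in the data.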

Journal Article
W. Ainsworth
TL;DR: The feasibility of converting English text into speech using an inexpensive computer and a small amount of stored data has been investigated and the intelligibility of the resulting synthetic speech is assessed by listening tests.
Abstract: The feasibility of converting English text into speech using an inexpensive computer and a small amount of stored data has been investigated. The text is segmented into breath groups, the orthography is converted into a phonemic representation, lexical stress is assigned to appropriate syllables, then the resulting string of symbols is converted by synthesis-by-rule into the parameter values for controlling an analogue speech synthesizer. The algorithms for performing these conversions are described in detail and evaluated independently, and the intelligibility of the resulting synthetic speech is assessed by listening tests.

Journal Article
TL;DR: In this article, the radiation characteristics of a planar array of concentric rings are examined, and the beamwidth and sidelobe level are controlled by optimization techniques that use the energy in the sidelobes as the optimization criterion.
Abstract: The radiation characteristics of a planar array of concentric rings are examined. By employing optimization techniques, control of the beamwidth and sidelobe level is accomplished. Treating the energy in the sidelobes as a criterion for optimization, sidelobes of equal amplitude are obtained.
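
The array factor that such an optimization works on is straightforward to evaluate. This sketch assumes isotropic elements spaced uniformly on each ring, radii given in wavelengths, one real weight per ring, and a simple sidelobe-energy proxy integrated over a single phi = 0 cut beyond a user-chosen angle; the paper's optimization procedure and its exact energy criterion are not reproduced, and the names are illustrative.

```python
import cmath
import math


def array_factor(rings, theta, phi):
    """Far-field array factor of concentric rings of isotropic elements.

    rings: list of (radius_in_wavelengths, num_elements, weight).
    theta is measured from broadside (the array normal); phi lies in the array plane.
    """
    total = 0j
    for radius, count, weight in rings:
        for n in range(count):
            phi_n = 2 * math.pi * n / count
            phase = 2 * math.pi * radius * math.sin(theta) * math.cos(phi - phi_n)
            total += weight * cmath.exp(1j * phase)
    return total


def sidelobe_energy(rings, theta_start, samples=400):
    """Proxy: integral of |AF|^2 sin(theta) over theta_start..pi/2 at phi = 0 (midpoint rule)."""
    width = (math.pi / 2 - theta_start) / samples
    total = 0.0
    for i in range(samples):
        theta = theta_start + (i + 0.5) * width
        total += abs(array_factor(rings, theta, 0.0)) ** 2 * math.sin(theta) * width
    return total


if __name__ == "__main__":
    rings = [(0.5, 6, 1.0), (1.0, 12, 0.5)]
    peak = abs(array_factor(rings, 0.0, 0.0))
    for deg in (0, 15, 30, 45, 60, 90):
        level = abs(array_factor(rings, math.radians(deg), 0.0)) / peak
        print(deg, round(20 * math.log10(max(level, 1e-12)), 1))
    print(sidelobe_energy(rings, math.radians(30)))
```

An optimizer would adjust the ring radii and weights to reduce a criterion like `sidelobe_energy` while holding the main-beam width; that search is left out here.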

Journal Article
TL;DR: In this article, the personal quality of sustained vowels uttered by eight male talkers was represented multidimensionally in a psychological auditory space (PAS) by means of Kruskal's multidimensional scaling procedure based on the perceptual confusion in talker discrimination tests.
Abstract: The personal quality of sustained vowels uttered by eight male talkers was represented multidimensionally in a psychological auditory space (PAS) by means of Kruskal's multidimensional scaling procedure based on the perceptual confusion in talker discrimination tests. Physical properties of the vowels were analyzed in terms of elementary acoustical parameters, such as formant frequencies, slope of glottal source spectrum, mean fundamental pitch frequency, and rapid fluctuation of fundamental pitch period. Then the relationship between the configuration on the PAS and the acoustical parameters was examined through multiple correlation and regression analysis. The contribution of those acoustical parameters to the personal quality of the five Japanese vowels and the relative contributions of the vocal tract and the glottal source characteristics are demonstrated quantitatively. These results were obtained partly by utilizing hybrid voices in which the source wave or the formant frequency pattern was interchanged among different talkers.

Journal Article
TL;DR: An absolute bound on limit cycle oscillations in fixed-point digital filter implementations due to roundoff errors is presented; for second-order filters, this bound equals the rms bound of Sandberg and Kaiser for real roots and is never more than a factor of two greater than the rms bound for complex roots.
Abstract: An absolute bound on limit cycle oscillations in fixed-point digital filter implementations due to roundoff errors is presented. Periodicity of the limit cycles is assumed in the derivation. Useful design results are explicitly given for the case of second-order filter sections. In addition it is shown that this bound is equal to the rms bound of Sandberg and Kaiser for real roots, and is never more than a factor of two greater than the rms bound for complex roots for second-order filters.
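
Limit cycles are easy to observe in simulation. The sketch below uses the simplest case, a first-order section with rounding (the classical deadband effect), rather than the second-order sections treated in the paper, and it does not compute the paper's absolute bound; exact rational arithmetic keeps the rounding behavior unambiguous, and the function names are illustrative.

```python
import math
from fractions import Fraction


def round_half_up(v):
    """Round to the nearest integer (one LSB), ties toward +infinity."""
    return math.floor(v + Fraction(1, 2))


def zero_input_response(a, y0, steps):
    """First-order section y(n) = Q[a * y(n-1)] with the state kept in integer LSB units."""
    ys = [y0]
    for _ in range(steps):
        ys.append(round_half_up(a * ys[-1]))
    return ys


def deadband_bound(a):
    """|y| <= 0.5 / (1 - |a|): a sustained constant or alternating state cannot exceed this."""
    return Fraction(1, 2) / (1 - abs(a))


if __name__ == "__main__":
    a = Fraction(9, 10)
    print(zero_input_response(a, 10, 12))  # [10, 9, 8, 7, 6, 5, 5, 5, 5, 5, 5, 5, 5]
    print(deadband_bound(a))               # 5
```

Without rounding the response would decay to zero; with rounding it sticks at a nonzero value inside the deadband. With a negative coefficient the same section settles into a sign-alternating oscillation instead of a constant.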

Journal Article
TL;DR: In this article, the statistics of the estimate of magnitude coherence between two random stationary Gaussian processes are presented, including the probability density function, cumulative distribution function, bias, and the variance of the estimator.
Abstract: Expressions for the statistics of the estimate of magnitude coherence between two random stationary Gaussian processes are presented. These statistics include the probability density function, the cumulative distribution function, the bias, and the variance of the estimator. The expressions presented are in convenient and accurate forms for digital computer evaluation. Graphical examples of the bias and variance are included. Simple approximations are also given for the maximum bias, variance, and mean-square error.

Journal Article
G. Maria, M. Fahmy
TL;DR: A method is proposed for checking the stability of two-dimensional recursive filters, in which a modified Jury table is used to check the first condition of Huang's theorem; some examples are solved to illustrate the method.
Abstract: This correspondence proposes a method to check the stability of two-dimensional recursive filters. In this method the Jury table is modified and used to check the first condition of Huang's theorem. Some examples are solved to illustrate the method.

Journal Article
TL;DR: This paper describes an attempt to develop a computer-based system of speech-training aids for the deaf and stresses the importance of evolving such a system through close interaction between developers and users.
Abstract: This paper describes an attempt to develop a computer-based system of speech-training aids for the deaf. Some of the problems associated with the speech of the deaf are briefly reviewed. Reasons for attempting to apply a digital computer to the problem of speech training are given. The system and its display capabilities are described. The importance of evolving such a system through a close interaction of developers and users is stressed.

Journal Article
TL;DR: Rules based on the location and duration of silence intervals in spoken algebraic expressions agreed with listeners' placement of parentheses 91 to 95 percent of the time in cases where listeners agreed on a single answer; mathematically experienced and mathematically naive listeners performed similarly, which suggests that the acoustic cues used by the speaker to indicate syntactic structure in this restricted domain of discourse may have more general applicability.
Abstract: A study of the relationship between the syntactic and prosodic organization of spoken algebraic expressions is reported. It was found that subjects were very consistent in their placement of junctures when reading algebraic expressions slowly. Furthermore, there was an almost perfect correlation between measured silence and perceived juncture. Rules were developed for inserting parentheses based on the location and measured duration of silence intervals in an utterance. Listeners were asked to insert parentheses, given the spoken form, and the consistency of their answers was measured by a chi-square test. For those cases where there was listener agreement on a single answer, the rules were tested and found to agree with the listeners from 91 to 95 percent of the time. Mathematically experienced and mathematically naive listeners displayed similar performance, which suggests that the acoustic cues used by the speaker to indicate syntactic structure in this restricted domain of discourse may have a more general applicability.

Journal Article
TL;DR: The authors attempted to recognize a set of unknown sentences by visual examination of spectrograms and machine-aided lexical searching, reaching a word-recognition rate of 96 percent after extended interaction with the computer; the differences between the phonetic characteristics of isolated words and of the same words in sentences are emphasized.
Abstract: An experiment was performed in which the authors attempted to recognize a set of unknown sentences by visual examination of spectrograms and machine-aided lexical searching. Nineteen sentences representing data from five talkers were analyzed. An initial partial transcription in terms of phonetic features was performed. The transcription contained many errors and omissions: 10 percent of the segments were omitted, 17 percent were incorrectly transcribed, and an additional 40 percent were transcribed only partially in terms of phonetic features. The transcription was used by the experimenters to initiate computerized scans of a 200-word lexicon. A majority of the search responses did not contain the correct word. However, following extended interactions with the computer, a word-recognition rate of 96 percent was achieved by each investigator for the sentence material. Implications for automatic speech recognition are discussed. In particular, the differences between the phonetic characteristics of isolated words and of the same words when they appear in sentences are emphasized.

Journal Article
R. Lummis
TL;DR: A technique for automatic speaker verification is described in which voice pitch, low-frequency intensity, and the three lowest formant frequencies, all as functions of time, are the features used to represent an individual utterance.
Abstract: A technique for automatic speaker verification is described in which voice pitch, low-frequency intensity, and the three lowest formant frequencies, all as functions of time, are the features used to represent an individual utterance. Verification consists of computing these features for a test utterance and comparing them with stored reference versions for the claimed identity. Before the test-versus-reference comparison is effected, the time dimension of the test utterance is warped to optimally register its intensity pattern onto the reference intensity pattern. Performance of the system is measured on a speaker population of moderate size. A variety of comparison formulas and various subsets of the five speech features are evaluated. The system responds either "accept" or "reject" to every utterance; "no decision" is not allowed. Automatic verification based solely upon voice pitch and intensity, both of which can be computed rapidly, yields average error rates below 1 percent.
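
The time-registration step can be sketched with dynamic time warping, a standard technique used here as a stand-in; the abstract does not say that this exact algorithm, local distance, or path constraint was used. The warping path is found on the intensity contours and then reused to compare another feature, such as pitch, in the way the abstract describes; the function names are illustrative.

```python
def dtw_path(test, ref):
    """Dynamic time warping on two contours with |test[i] - ref[j]| as the local cost.

    Returns (total_cost, path), where path is a list of (test_index, ref_index)
    pairs from (0, 0) to (len(test) - 1, len(ref) - 1).
    """
    inf = float("inf")
    n, m = len(test), len(ref)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(test[i - 1] - ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack along the cheapest predecessors.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def warped_distance(test_feature, ref_feature, path):
    """Mean absolute difference of another feature along a precomputed warping path."""
    return sum(abs(test_feature[i] - ref_feature[j]) for i, j in path) / len(path)


if __name__ == "__main__":
    test_intensity, ref_intensity = [1, 1, 2, 3, 3, 2], [1, 2, 3, 2]
    total, path = dtw_path(test_intensity, ref_intensity)
    print(total, path)
    print(warped_distance([120, 121, 130, 140, 139, 125], [120, 131, 140, 126], path))
```

A verifier built this way would threshold the warped feature distances to decide "accept" or "reject"; choosing those thresholds is outside this sketch.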

Journal Article
TL;DR: Vowels and computer-generated "bleats" presented to the strong ear suppressed perception of C-V syllables in the weak ear, even when 20-40 dB less intense; a model was developed that interprets weaker suppression to be, in part, a consequence of the interaction of the auditory features of the dichotic signals prior to phonetic processing.
Abstract: When patients with hemispherectomies or temporal lobectomies listen to dichotic pairs of equal-intensity C-V syllables, they do poorly identifying the stimuli presented to the ear contralateral to the lesion. This effect is similar to that seen for normals, who, in the same circumstances, perform poorly on the left-ear stimulus. (The ear contralateral to a lesion for patients and the left ear for normals will be designated the "weak ear"; the ear ipsilateral to a patient's lesion and the right ear for normals will be called the "strong ear".) To further explore these phenomena we investigated the ability of stimuli other than C-V's in the strong ear to suppress the perception of C-V's in the weak ear. Suppression was found when the strong-ear stimulus was a vowel. Somewhat more suppression was seen when the strong-ear stimuli were computer-generated signals with acoustic features similar to C-V's ("bleats"). Suppression was seen even if the strong-ear vowels and bleats were 20-40 dB less intense than the syllables in the weak ear. A model was developed that interprets weaker suppression to be, in part, a consequence of the interaction of the auditory features of the dichotic signals prior to phonetic processing.

Journal Article
TL;DR: The Padé approximant technique provides a quick design method for recursive digital filters and can easily handle spectrum shaping requirements as well as linear phase constraints, even for higher order filters.
Abstract: The Padé approximant technique provides a quick design of recursive digital filters. An added advantage of the technique is that spectrum shaping requirements as well as linear phase constraints can be handled easily, even for higher order filters. This is important in supplying initial guesses of the filter parameters to iterative routines that would then seek a locally optimal design solution. These advantages are among those discussed in a partly tutorial presentation of the technique that relates to filter needs found in data transmission systems. In addition, the question of stability is treated and a new criterion is presented. The criterion provides sufficient conditions for establishing stability for a filter designed by using the Padé approximant technique.
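
The core of a Padé design is a small linear solve. Given the first m + n + 1 samples c_0, c_1, ... of a desired impulse response, choose A(z) = 1 + a_1 z^-1 + ... + a_n z^-n so that the convolution of a with c vanishes at indices m+1 through m+n, then read off the degree-m numerator B(z) from the first m+1 terms. This sketch uses exact rational arithmetic and illustrative names; it does not include the paper's stability criterion, so the roots of A(z) should still be checked to lie inside the unit circle before a design is used.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: no [m/n] approximant of this form")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def pade(c, m, n):
    """[m/n] Padé approximant B(z)/A(z) matching c[0..m+n] (powers of z^-1, a[0] = 1)."""
    if len(c) < m + n + 1:
        raise ValueError("need at least m + n + 1 impulse-response samples")
    c = [Fraction(v) for v in c]

    def coef(j):
        return c[j] if j >= 0 else Fraction(0)

    # Denominator: sum_{i=1..n} a_i c_{k-i} = -c_k for k = m+1..m+n.
    rows = [[coef(k - i) for i in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    rhs = [-coef(k) for k in range(m + 1, m + n + 1)]
    a = [Fraction(1)] + (solve_exact(rows, rhs) if n else [])
    # Numerator: b_k = sum_{i=0..min(k,n)} a_i c_{k-i} for k = 0..m.
    b = [sum(a[i] * coef(k - i) for i in range(min(k, n) + 1)) for k in range(m + 1)]
    return b, a


if __name__ == "__main__":
    # Samples of (1 + 0.5 z^-1) / (1 - 0.25 z^-1): 1, 3/4, 3/16, 3/64, ...
    c = [Fraction(1), Fraction(3, 4), Fraction(3, 16), Fraction(3, 64)]
    b, a = pade(c, 1, 1)
    print(b, a)  # [Fraction(1, 1), Fraction(1, 2)] [Fraction(1, 1), Fraction(-1, 4)]
```

When the target really is a low-order rational response, the approximant recovers it exactly, as in the example; for other targets the result matches only the first m + n + 1 samples and is best treated as the initial guess for an iterative optimizer that the abstract mentions.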