Showing papers on "Spectrogram" published in 1989


Journal ArticleDOI
TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Because the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.
Abstract: Hidden Markov modeling is extended to speaker-independent phone recognition. Using multiple codebooks of various linear-predictive-coding (LPC) parameters and discrete hidden Markov models (HMMs), the authors obtain a speaker-independent phone recognition accuracy of 58.8-73.8% on the TIMIT database, depending on the type of acoustic and language models used. In comparison, the performance of expert spectrogram readers is only 69% without use of higher level knowledge. The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Since the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.

895 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs, and it is demonstrated that such gains are unavailable with white noise assumption Kalman and Wiener filters.
Abstract: A report is presented on experiments using a colored-noise assumption Kalman filter to enhance speech additively contaminated by colored noise, such as helicopter noise and jeep noise, with a particular application to linear predictive coding (LPC) of noisy speech. The results indicate that the colored-noise Kalman filter provides a significant gain in SNR, a clear improvement in the sound spectrogram, and an audible improvement in output speech quality. The authors demonstrate that such gains are unavailable with white noise assumption Kalman and Wiener filters. The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs.

132 citations


Book
31 Jan 1989
TL;DR: This book develops a time-frequency energy representation for speech, covers time-frequency filtering, and introduces the schematic spectrogram built from spectral peaks and time-frequency ridges, closing with a catalog of speech examples.
Abstract: 1 Introduction.
2 The Time-Frequency Energy Representation. 2.1. The stationary case. 2.2. The quasi-stationary case. 2.3. Non-stationarity. 2.4. Joint time-frequency representations. 2.5. Design criteria for time-frequency representations. 2.6. Relations among the design criteria. 2.7. Satisfying the design criteria. 2.8. Directional time-frequency transforms. 2.9. A speech example.
3 Time-Frequency Filtering. 3.1. The stationary case. 3.2. Non-stationary vocal tract. 3.3. Time-frequency filtering. 3.4. The stationary case - re-examined. 3.5. Linearly varying modulation frequency. 3.6. The quasi-stationary case. 3.7. Smoothly varying modulation frequency. 3.8. The vocal tract transfer function. 3.9. The transmission channel. 3.10. The excitation.
4 The Schematic Spectrogram. 4.1. Rationale. 4.2. Spectral Peaks. 4.3. Time-frequency ridges - non-directional kernel. 4.4. Time-frequency ridges - directional kernel. 4.5. Signal detection and ridge identification. 4.6. Continuity and grouping. 4.7. A perspective.
5 A Catalog of Examples. 5.1. Some general examples. 5.2. Liquids and glides. 5.3. Nasalized vowels. 5.4. Consonant-vowel transitions. 5.5. Female speech. 5.6. Transmission channel effects.
References.

44 citations


Proceedings ArticleDOI
23 May 1989
TL;DR: A method is presented for phoneme segmentation by an expert system that uses spectrogram-reading strategy and knowledge. The system detects about 90% of the phonemes correctly and determines their boundaries as well as their coarse categories.
Abstract: A method is presented for phoneme segmentation by an expert system utilizing spectrogram reading strategy and knowledge. The expert system detects phonemes in a spectrogram and determines their boundaries as well as their coarse categories. To simulate a human expert spectrogram reading process, the system performs assumption-based inference with certainty factors, and top-down acoustic feature extraction under phonetic context hypotheses. The system, into which Japanese consonant segmentation knowledge is incorporated, is able to detect about 90% of the phonemes correctly. In particular, the phoneme boundaries detected by the system are as accurate as those detected by human experts. The result is that the phonemes obtained by the expert system can be identified using a stochastic phoneme recognition method.

21 citations


Journal ArticleDOI
TL;DR: A low-cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor with a parallel processing architecture to achieve real-time performance.

12 citations


Proceedings ArticleDOI
23 Jun 1989
TL;DR: A combined time- and frequency-domain speech scrambler that is designed to reduce residual intelligibility to zero by removing all clues for auditory perception from the scrambled speech is described.
Abstract: A combined time- and frequency-domain speech scrambler that is designed to reduce residual intelligibility to zero by removing all clues for auditory perception from the scrambled speech is described. The recovered speech quality is good. Although implemented using an FFT (fast Fourier transform) algorithm, which is a batch process, the method does not require frame synchronization, and synchronization requirements for key changes are relatively lax. The proposal has been tested by simulation using, among other channels, an HF radio link simulator. The results confirm the performance of the scrambler, showing that it is robust enough for use on poor-quality HF channels.

9 citations


Proceedings Article
01 Jan 1989
TL;DR: A phoneme recognition expert system which consists of two parts: (1) rule-based phoneme segmentation, and (2) neural network-based phoneme identification for knowledge such as pattern matching.

9 citations


Proceedings ArticleDOI
08 May 1989
TL;DR: In this paper, a modification to the spectrogram of K. Kodera et al. (see Phys. Earth Planetary Interiors, vol.12, p.142-150, 1976) is applied to the pseudo-Wigner-Ville distribution (PWD), and a comparison is made between the spectrogram and the PWD, with and without modification, using numerical examples.
Abstract: The spectrogram and the Wigner-Ville distribution are reviewed as methods for time-frequency analysis of nonstationary signals. The modification to the spectrogram of K. Kodera et al. (see Phys. Earth Planetary Interiors, vol.12, p.142-150, 1976) is applied to the pseudo-Wigner-Ville distribution (PWD), and a comparison is made between the spectrogram and the PWD, with and without modification, using numerical examples. The optimum time-frequency analysis tool is shown to depend on the nature of the input signal. The modified spectrogram is seen to be a credible alternative to the PWD.

8 citations
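
For readers unfamiliar with the baseline that the entry above compares against, the following is a minimal sketch of a plain spectrogram: the squared magnitude of a short-time Fourier transform. It does not implement the Kodera modification or the PWD. The function names, the Hann window, and the default `frame_len` and `hop` values are illustrative assumptions, not taken from the paper.

```python
import cmath
import math

def spectrogram(signal, frame_len=32, hop=16):
    """Return a list of frames, each a list of power values for bins 0..frame_len/2.

    Assumption: periodic Hann window and a direct (slow) DFT, chosen for clarity.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current segment before transforming it.
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(seg))
            row.append(abs(acc) ** 2)
        frames.append(row)
    return frames

def peak_bin(row):
    """Index of the strongest frequency bin in one spectrogram frame."""
    return max(range(len(row)), key=row.__getitem__)
```

Applied to a signal whose frequency changes halfway through, `peak_bin` on the first and last frames reports the two different tones, which is the time-varying behavior a spectrogram is meant to expose.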


Proceedings ArticleDOI
23 May 1989
TL;DR: The pattern search predictor (PSP) predicts samples of a signal by inspecting the past for patterns of (about ten) samples that match the most recent set of samples, and has some promise for filling in lost data.
Abstract: The pattern search predictor (PSP) predicts samples of a signal by inspecting the past for patterns of (about ten) samples that match the most recent set. The sample subsequent to the found pattern is used to make the required estimate. PSP has been tested in a codec algorithm based on the CCITT 32-kb adaptive differential pulse-code modulation standard, using its adaptive quantizer. Study of spectrograms has shown that the error is substantially white, as expected, and that perturbations of the signal spectrograms are substantially undetectable. PSP has some promise for filling in lost data.

6 citations
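
The prediction step described in the entry above can be sketched as follows. The abstract does not say how a "match" is scored, so the squared-error criterion here is an assumption, as are the function name and the exhaustive search over the whole history.

```python
def pattern_search_predict(history, pattern_len=10):
    """Predict the next sample from the best past match of the latest pattern.

    Assumption: matches are ranked by summed squared error (not stated in the abstract).
    """
    if len(history) <= pattern_len:
        raise ValueError("history must be longer than pattern_len")
    target = history[-pattern_len:]
    best_err, best_next = None, None
    # Each candidate must be followed by a known sample, so the most recent
    # pattern itself (which has no successor yet) is never a candidate.
    for i in range(len(history) - pattern_len):
        candidate = history[i:i + pattern_len]
        err = sum((c - t) ** 2 for c, t in zip(candidate, target))
        if best_err is None or err < best_err:
            best_err, best_next = err, history[i + pattern_len]
    return best_next
```

On a strictly periodic history the sketch returns the sample that continues the period, since an exact earlier copy of the latest pattern exists.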


Proceedings ArticleDOI
22 Nov 1989
TL;DR: A description is presented of a speaker independent automatic speech recognition system for a small vocabulary, employing phonetically based methods, that uses formant tracking and relative energy values to characterize each word in the vocabulary.
Abstract: A description is presented of a speaker independent automatic speech recognition system for a small vocabulary, employing phonetically based methods. The system uses formant tracking and relative energy values to characterize each word in the vocabulary (the digits, 0 to 9) and also a ratio of energies in the top and bottom half of the frequency band to detect fricatives. The formants are tracked by the second derivative of a smoothed FFT (fast Fourier transform). The system was tested on a number of speakers of both sexes, with encouraging results. Conclusions are drawn about the general feasibility of a formant based approach to automatic speech recognition.

4 citations
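
The fricative cue mentioned in the entry above, a ratio of energies in the top and bottom halves of the frequency band, might look like the sketch below. The decision threshold, the function names, and the use of a direct DFT on an unwindowed frame are all assumptions made for illustration; the paper's actual values are not given in the abstract.

```python
import cmath
import math

def power_spectrum(frame):
    """Power in DFT bins 0..N/2, computed with a direct (slow) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def high_low_energy_ratio(frame):
    """Energy in the upper half of the band divided by energy in the lower half."""
    power = power_spectrum(frame)
    mid = len(power) // 2
    low = sum(power[:mid])
    high = sum(power[mid:])
    return high / (low + 1e-12)  # small constant avoids division by zero

def looks_fricative(frame, threshold=1.0):
    """Hypothetical decision rule: flag frames dominated by high-frequency energy."""
    return high_low_energy_ratio(frame) > threshold
```

A frame containing only a high-frequency tone is flagged, while a low-frequency tone is not, which mirrors the idea that fricatives concentrate energy in the upper part of the band.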


Journal ArticleDOI
TL;DR: It is shown that a finite affine plane is a powerful generator of frequency-hopping codes for multiple-access channels and that it provides optimum performance codes in a noiseless environment.
Abstract: Basic notions pertinent to code-division multiple-user communication signals are defined in set-theoretic terms. A general treatment of composition codes by identifying a time-frequency spectrogram with a set of points in a finite plane is provided. It is shown that a finite affine plane is a powerful generator of frequency-hopping codes for multiple-access channels and that it provides optimum performance codes in a noiseless environment.
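
As an illustration of why a finite affine plane is attractive here, the sketch below treats each non-vertical line y = a*t + b (mod p) of the affine plane over a prime field as one hopping pattern, with t as the time slot and y as the frequency slot. This is one familiar construction of this kind and is an assumption; the abstract does not give the paper's exact construction. Two distinct lines meet in at most one point, so two synchronized users collide in at most one slot.

```python
def hopping_code(a, b, p):
    """Frequency slot used at each of p time slots for the line y = a*t + b (mod p)."""
    return [(a * t + b) % p for t in range(p)]

def hits(code1, code2):
    """Number of time slots in which two synchronized codes use the same frequency."""
    return sum(1 for x, y in zip(code1, code2) if x == y)

def all_codes(p):
    """All p*p non-vertical lines of the affine plane over the integers mod a prime p."""
    return {(a, b): hopping_code(a, b, p) for a in range(p) for b in range(p)}
```

The at-most-one-hit property relies on p being prime, so that every nonzero slope difference is invertible modulo p.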

Proceedings ArticleDOI
14 Nov 1989
TL;DR: The Q-distribution is a modified Wigner-Ville representation that is related to the wideband ambiguity function by an integral transform and can be used to construct a proportional bandwidth spectrogram corresponding to a bank of constant-Q filters.
Abstract: The Wigner-Ville (W-V) distribution is a time-frequency representation that yields a highly accurate estimate of instantaneous frequency. It is related to the narrowband ambiguity function by an integral transform, and it can be used in a variety of detection and estimation problems. Convolution of signal and filter W-V distributions yields a spectrogram that could also be constructed with a bank of constant bandwidth filters. The wideband ambiguity function represents the Doppler effect with dilation or compression rather than with frequency shift as in the narrowband approximation. The "Q-distribution" is a modified W-V representation that is related to the wideband ambiguity function by an integral transform and can be used to construct a proportional bandwidth spectrogram corresponding to a bank of constant-Q filters. The Q-distribution is thus a wideband version of the W-V distribution. Properties of the Q-distribution indicate that it may prove useful for detection and parameter estimation as well as measurement of wideband scattering functions.

Journal ArticleDOI
TL;DR: A system has been developed to enhance the quality of mutilated speech by resynthesizing the speech as a sum of computer-generated sinusoids whose amplitudes and phases are derived partly from the given mutilated speech signal and partly from rules based on known properties of normal speech.
Abstract: A system has been developed to enhance the quality of mutilated speech. A standard spectrogram analysis of the damaged speech is performed. The speech is then resynthesized as a sum of computer-generated sinusoids whose amplitudes and phases are derived partly from the given mutilated speech signal and partly from rules based on known properties of normal speech. The sinusoids selected are only approximate harmonics of the glottal pitch and are selected by a nonlinear, noncausal set of rules to reduce the nonspeech components in the synthesized speech output. The system has been shown to increase the quality of the mutilated speech appreciably.

Proceedings ArticleDOI
09 Nov 1989
TL;DR: The simulation of an auditory model of the inner ear, including the nonlinear transduction of the auditory nerves, is discussed and a scheme of cortical processing is proposed. The area under the major peak in the histogram appears to be a more consistent measure of the signal strength than the EIH amplitude.
Abstract: The simulation of an auditory model of the inner ear including the nonlinear transduction of the auditory nerves is discussed and a scheme of cortical processing is proposed. The output of the last stage of the model, called the ensemble interval histogram (EIH), is a cortical representation of speech in both time and frequency, similar to a spectrogram. A statistical analysis of the output of this system is performed for sinusoidal and noise inputs to determine the accuracy of spectral representation in terms of frequency, amplitude, resolution, etc. Some preliminary simulation results for sinusoid and noise input at two signal-to-noise ratios are shown. It is found that although the EIH may have noise robustness, its resolution is a decreasing function of frequency. In addition, the magnitude of the EIH is sensitive to the noise in the signal as well as other discretizations in the model. It appears that the area under the major peak in the histogram is a more consistent measure of the signal strength than the EIH amplitude.

Journal ArticleDOI
TL;DR: This paper summarizes the definitions and computation of three time-frequency spectra (the Wigner-Ville distribution, Priestley's evolutionary spectral density, and Kodera's modification to the moving spectrogram) and models acoustic signals due to propagating sources in a form that allows their time-frequency spectra to be predicted.
Abstract: The time‐frequency description of acoustic signals is a common requirement (applications include speech, sonagrams, frequency tracking, etc.), and this applies to transients and nonstationary signals (both random and nonrandom). Three time‐frequency spectra that are used are the Wigner‐Ville distribution, Priestley's evolutionary spectral density, and Kodera's modification to the moving spectrogram. This paper will present: a summary of the definitions and computation of these three different time‐frequency spectra; the modeling of acoustic signals due to propagating sources in a form allowing prediction of time‐frequency spectra; development of the interrelationships between the spectra; and theoretical and simulation results from time‐frequency spectra. Examples of the application of all three approaches will be illustrated by using frequency‐modulated signals. The signals will model sound as perceived by an observer due to a convecting acoustic source (i.e., including Doppler, range, and directivity ef...

Proceedings ArticleDOI
27 Nov 1989
TL;DR: A new distortion measure for determining the amplitudes of the regular pulse sequence used in regular-pulse excitation speech coding is introduced that reduces the degradation due to pulse amplitude quantization and provides a better reproduction of the original spectrogram.
Abstract: The authors introduce a new distortion measure for determining the amplitudes of the regular pulse sequence used in regular-pulse excitation speech coding. The method combines the adaptive predictive distortion criterion with the distortion measure used in the typical regular-pulse excitation system and reduces the degradation due to pulse amplitude quantization. Experimental results show that the proposed distortion measure provides increases of about 0.7 to 1 dB in segmental signal-to-noise ratio, a better reproduction of the original spectrogram, and perceptually good speech quality at medium bit rates (of about 16 kb/s).
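
The segmental signal-to-noise ratio quoted in the abstract above is commonly computed as the average of per-frame SNR values in dB. The sketch below follows that common definition; the frame length, the handling of silent or error-free frames, and the function name are assumptions, since the abstract does not specify them.

```python
import math

def segmental_snr(reference, decoded, frame_len=160):
    """Mean of per-frame SNRs in dB between a reference and a decoded signal.

    Assumption: non-overlapping frames; frames with zero signal energy or zero
    error are skipped because their SNR in dB is undefined.
    """
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0 or noise == 0:
            continue
        snrs.append(10 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame with a defined SNR")
    return sum(snrs) / len(snrs)
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which is why a gain of under 1 dB in this measure can still correspond to an audible difference.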

01 Jan 1989
TL;DR: In this article, a computer processing method was developed to analyze lower limb Doppler signals, extract diagnostic features from Doppler spectrograms and classify the severity of the disease.
Abstract: Recent clinical results have shown that duplex ultrasound techniques can be a reliable and noninvasive method to detect lower limb arterial diseases. In the present study, a computer processing method was developed to analyze lower limb Doppler signals, extract diagnostic features from Doppler spectrograms and classify the severity of the disease. A pattern recognition method (Bayes model) was used to discriminate between nonhemodynamically significant ( 50% diameter reduction). The features investigated were based on spectral broadening and on the power of the spectrogram in various frequency bands. The performance of the pattern recognition method was compared to that of conventional biplane contrast angiograms read by experienced angioradiologists. Results showed a percentage of correct classification of 85%. Sensitivity was 77% and specificity was 90%.