Showing papers on "Spectrogram" published in 1989

Journal Article•DOI•

Speaker-independent phone recognition using hidden Markov models

Kai-Fu Lee¹, H.-W. Hon¹

01 Nov 1989-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data; because the results were evaluated on a standard database, they can serve as benchmarks for evaluating future systems.

Abstract: Hidden Markov modeling is extended to speaker-independent phone recognition. Using multiple codebooks of various linear-predictive-coding (LPC) parameters and discrete hidden Markov models (HMMs), the authors obtain a speaker-independent phone recognition accuracy of 58.8-73.8% on the TIMIT database, depending on the type of acoustic and language models used. In comparison, the performance of expert spectrogram readers is only 69% without use of higher level knowledge. The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Since the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.

895 citations

Proceedings Article•DOI•

Filtering of colored noise for speech enhancement and coding

B. Koo¹, Jerry D. Gibson¹, S.D. Gray

Texas A&M University¹

23 May 1989

TL;DR: The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs, and it is demonstrated that such gains are unavailable with white noise assumption Kalman and Wiener filters.

Abstract: A report is presented on experiments using a colored-noise assumption Kalman filter to enhance speech additively contaminated by colored noise, such as helicopter noise and jeep noise, with a particular application to linear predictive coding (LPC) of noisy speech. The results indicate that the colored-noise Kalman filter provides a significant gain in SNR, a clear improvement in the sound spectrogram, and an audible improvement in output speech quality. The authors demonstrate that such gains are unavailable with white noise assumption Kalman and Wiener filters. The colored-noise prefilter greatly enhances the quality and intelligibility of LPC output speech for noisy inputs.

132 citations

Book•

Speech Time-Frequency Representations

Michael D. Riley

31 Jan 1989

TL;DR: This book proposes a time-frequency energy representation for speech and shows that the representation can be used for signal detection and ridge identification in the stationary and quasi-stationary cases.

Abstract:
1 Introduction.
2 The Time-Frequency Energy Representation: 2.1 The stationary case; 2.2 The quasi-stationary case; 2.3 Non-stationarity; 2.4 Joint time-frequency representations; 2.5 Design criteria for time-frequency representations; 2.6 Relations among the design criteria; 2.7 Satisfying the design criteria; 2.8 Directional time-frequency transforms; 2.9 A speech example.
3 Time-Frequency Filtering: 3.1 The stationary case; 3.2 Non-stationary vocal tract; 3.3 Time-frequency filtering; 3.4 The stationary case - re-examined; 3.5 Linearly varying modulation frequency; 3.6 The quasi-stationary case; 3.7 Smoothly varying modulation frequency; 3.8 The vocal tract transfer function; 3.9 The transmission channel; 3.10 The excitation.
4 The Schematic Spectrogram: 4.1 Rationale; 4.2 Spectral Peaks; 4.3 Time-frequency ridges - non-directional kernel; 4.4 Time-frequency ridges - directional kernel; 4.5 Signal detection and ridge identification; 4.6 Continuity and grouping; 4.7 A perspective.
5 A Catalog of Examples: 5.1 Some general examples; 5.2 Liquids and glides; 5.3 Nasalized vowels; 5.4 Consonant-vowel transitions; 5.5 Female speech; 5.6 Transmission channel effects.
References.

44 citations

Proceedings Article•DOI•

Phoneme segmentation using spectrogram reading knowledge

K. Hatazaki, Y. Komori, Takeshi Kawabata, Kiyohiro Shikano

23 May 1989

TL;DR: A method is presented for phoneme segmentation by an expert system utilizing spectrogram reading strategy and knowledge that is able to detect about 90% of the phonemes correctly and determine their boundaries as well as their coarse categories.

Abstract: A method is presented for phoneme segmentation by an expert system utilizing spectrogram reading strategy and knowledge. The expert system detects phonemes in a spectrogram and determines their boundaries as well as their coarse categories. To simulate a human expert spectrogram reading process, the system performs assumption-based inference with certainty factors, and top-down acoustic feature extraction under phonetic context hypotheses. The system, into which Japanese consonant segmentation knowledge is incorporated, is able to detect about 90% of the phonemes correctly. In particular, the phoneme boundaries detected by the system are as accurate as those detected by human experts. The result is that the phonemes obtained by the expert system can be identified using a stochastic phoneme recognition method.

21 citations

Journal Article•DOI•

Realtime digital signal processing system using a parallel processing architecture

P. C. Ching¹, S. W. Wu¹

The Chinese University of Hong Kong¹

01 Dec 1989-Microprocessors and Microsystems

TL;DR: A low-cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor with a parallel processing architecture to achieve realtime performance.

12 citations

Proceedings Article•DOI•

A time and frequency-domain speech scrambler

R.M. Milton¹

Council of Scientific and Industrial Research¹

23 Jun 1989

TL;DR: A combined time- and frequency-domain speech scrambler that is designed to reduce residual intelligibility to zero by removing all clues for auditory perception from the scrambled speech is described.

Abstract: A combined time- and frequency-domain speech scrambler that is designed to reduce residual intelligibility to zero by removing all clues for auditory perception from the scrambled speech is described. The recovered speech quality is good. Although implemented using an FFT (fast Fourier transform) algorithm, which is a batch process, the method does not require frame synchronization, and synchronization requirements for key changes are relatively lax. The proposal has been tested by simulation using, among other channels, an HF radio link simulator. The results confirm the performance of the scrambler, showing that it is robust enough for use on poor-quality HF channels.

9 citations

Proceedings Article•

Phoneme recognition expert system using spectrogram reading knowledge and neural networks.

Yasuhiro Komori, Kaichiro Hatazaki, Takaharu Tanaka, Takeshi Kawabata, Kiyohiro Shikano

01 Jan 1989

TL;DR: A phoneme recognition expert system is described that consists of two parts: (1) rule-based phoneme segmentation and (2) neural network-based phoneme identification for knowledge such as pattern matching.

9 citations

Proceedings Article•DOI•

Time-frequency distributions: a modification applied to the pseudo-Wigner-Ville distribution and the spectrogram

J.C. Moss¹, P.G. Adamopoulos¹, J.K. Hammond¹

University of Southampton¹

08 May 1989

TL;DR: In this paper, a modification to the spectrogram of K. Kodera et al. (see Phys. Earth Planetary Interiors, vol.12, p.142-150, 1976) is applied to the pseudo-Wigner-Ville distribution (PWD), and a comparison is made between the spectrogram and the PWD, with and without modification, using numerical examples.

Abstract: The spectrogram and the Wigner-Ville distribution are reviewed as methods for time-frequency analysis of nonstationary signals. The modification to the spectrogram of K. Kodera et al. (see Phys. Earth Planetary Interiors, vol.12, p.142-150, 1976) is applied to the pseudo-Wigner-Ville distribution (PWD), and a comparison is made between the spectrogram and the PWD, with and without modification, using numerical examples. The optimum time-frequency analysis tool is shown to depend on the nature of the input signal. The modified spectrogram is seen to be a credible alternative to the PWD.

8 citations

Proceedings Article•DOI•

Pattern search prediction of speech

R.E. Bogner¹, T. Li¹

University of Adelaide¹

23 May 1989

TL;DR: The pattern search predictor (PSP) predicts samples of a signal by inspecting the past for patterns of (about ten) samples that match the most recent set of samples, and has some promise for filling in lost data.

Abstract: The pattern search predictor (PSP) predicts samples of a signal by inspecting the past for patterns of (about ten) samples that match the most recent set. The sample subsequent to the found pattern is used to make the required estimate. PSP has been tested in a codec algorithm based on the CCITT 32-kb adaptive differential pulse-code modulation standard, using its adaptive quantizer. Study of spectrograms has shown that the error is substantially white, as expected, and that perturbations of the signal spectrograms are substantially undetectable. PSP has some promise for filling in lost data.
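
The matching step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' codec: the function name `pattern_search_predict`, the squared-error match criterion, and the exhaustive search over the whole history are all hypothetical choices. The abstract only states that patterns of about ten samples are matched against the most recent set and that the sample following the found pattern is used as the estimate.

```python
def pattern_search_predict(signal, pattern_len=10):
    """Predict the next sample from the best past match to the latest pattern.

    Assumption: "best match" means smallest sum of squared differences.
    """
    if len(signal) <= pattern_len:
        raise ValueError("need more than pattern_len samples")
    recent = signal[-pattern_len:]
    best_err = float("inf")
    best_next = None
    # Candidate windows must end before the final sample, so that a
    # "subsequent" sample exists and the recent pattern never matches itself.
    for start in range(len(signal) - pattern_len):
        window = signal[start:start + pattern_len]
        err = sum((a - b) ** 2 for a, b in zip(window, recent))
        if err < best_err:
            best_err = err
            best_next = signal[start + pattern_len]
    return best_next
```

On a strictly periodic input the search finds an exact earlier copy of the recent pattern, so the prediction is simply the sample that followed that copy.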

6 citations

Proceedings Article•DOI•

A phonetically based small vocabulary automatic speech recognition system

K. Walker, C.J.S. deSilva, Mike Alder, Yianni Attikiouzel, R. Hallgren

22 Nov 1989

TL;DR: A description is presented of a speaker independent automatic speech recognition system for a small vocabulary, employing phonetically based methods, that uses formant tracking and relative energy values to characterize each word in the vocabulary.

Abstract: A description is presented of a speaker independent automatic speech recognition system for a small vocabulary, employing phonetically based methods. The system uses formant tracking and relative energy values to characterize each word in the vocabulary (the digits, 0 to 9) and also a ratio of energies in the top and bottom half of the frequency band to detect fricatives. The formants are tracked by the second derivative of a smoothed FFT (fast Fourier transform). The system was tested on a number of speakers of both sexes, with encouraging results. Conclusions are drawn about the general feasibility of a formant based approach to automatic speech recognition.
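
The fricative cue mentioned in this abstract, a ratio of energies in the top and bottom half of the frequency band, might look roughly like the following Python sketch. The function name `band_energy_ratio`, the use of a plain DFT on a single frame, and the idea of comparing the ratio to 1 are assumptions made for illustration; the abstract does not give the frame length, windowing, or decision threshold.

```python
import cmath
import math


def band_energy_ratio(frame):
    """Return upper-half-band energy divided by lower-half-band energy.

    A naive O(n^2) DFT keeps the sketch dependency-free; a real system
    would use an FFT, as the abstract indicates.
    """
    n = len(frame)
    half = n // 2  # bins 0 .. half-1 span 0 Hz up to the Nyquist frequency
    energies = []
    for k in range(half):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        energies.append(abs(coeff) ** 2)
    split = half // 2  # boundary between bottom and top half of the band
    low = sum(energies[:split])
    high = sum(energies[split:])
    return high / low if low > 0 else float("inf")
```

A frame dominated by high-frequency energy, as fricatives typically are, gives a ratio above 1, while a low-frequency tone gives a ratio near 0.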

4 citations

Journal Article•DOI•

Frequency-hopping codes for multiple-access channels: a geometric approach

D.B. Jevtic, Hasan S. Alkhatib¹

Santa Clara University¹

01 Mar 1989-IEEE Transactions on Information Theory

TL;DR: It is shown that a finite affine plane is a powerful generator of frequency-hopping codes for multiple-access channels and that it provides optimum performance codes in a noiseless environment.

Abstract: Basic notions pertinent to code-division multiple-user communication signals are defined in set-theoretic terms. A general treatment of composition codes by identifying a time-frequency spectrogram with a set of points in a finite plane is provided. It is shown that a finite affine plane is a powerful generator of frequency-hopping codes for multiple-access channels and that it provides optimum performance codes in a noiseless environment.

Proceedings Article•DOI•

Proportional Bandwidth, Wideband Wigner-Ville Analysis

Richard A. Altes

14 Nov 1989

TL;DR: The Q-distribution as discussed by the authors is a modified Wigner-Ville representation that is related to the wideband ambiguity function by an integral transform and can be used to construct a proportional bandwidth spectrogram corresponding to a bank of constant-Q filters.

Abstract: The Wigner-Ville (W-V) distribution is a time-frequency representation that yields a highly accurate estimate of instantaneous frequency. It is related to the narrowband ambiguity function by an integral transform, and it can be used in a variety of detection and estimation problems. Convolution of signal and filter W-V distributions yields a spectrogram that could also be constructed with a bank of constant bandwidth filters. The wideband ambiguity function represents the Doppler effect with dilation or compression rather than with frequency shift as in the narrowband approximation. The "Q-distribution" is a modified W-V representation that is related to the wideband ambiguity function by an integral transform and can be used to construct a proportional bandwidth spectrogram corresponding to a bank of constant-Q filters. The Q-distribution is thus a wideband version of the W-V distribution. Properties of the Q-distribution indicate that it may prove useful for detection and parameter estimation as well as measurement of wideband scattering functions.

Journal Article•DOI•

Reconstruction of mutilated speech

M. Kabrisky, S.K. Rogers, N.A. Bashir

01 Sep 1989-IEEE Aerospace and Electronic Systems Magazine

TL;DR: A system has been developed to enhance the quality of mutilated speech by resynthesizing the speech as a sum of computer-generated sinusoids whose amplitudes and phases are derived partly from the given mutilated speech signal and partly from rules based on known properties of normal speech.

Abstract: A system has been developed to enhance the quality of mutilated speech. A standard spectrogram analysis of the damaged speech is performed. The speech is then resynthesized as a sum of computer-generated sinusoids whose amplitudes and phases are derived partly from the given mutilated speech signal and partly from rules based on known properties of normal speech. The sinusoids selected are only approximate harmonics of the glottal pitch and are selected by a nonlinear, noncausal set of rules to reduce the nonspeech components in the synthesized speech output. The system has been shown to increase the quality of the mutilated speech appreciably.
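
The resynthesis step, a sum of sinusoids with chosen amplitudes and phases, can be illustrated with a short Python sketch. Only the summation itself is shown. The function name `synthesize` and the `(frequency_hz, amplitude, phase_rad)` tuple layout are hypothetical, and the rule set that the abstract describes for selecting the sinusoids is not reproduced here.

```python
import math


def synthesize(partials, n_samples, sample_rate):
    """Sum cosine partials given as (frequency_hz, amplitude, phase_rad) tuples."""
    return [
        sum(
            amp * math.cos(2.0 * math.pi * freq * t / sample_rate + phase)
            for freq, amp, phase in partials
        )
        for t in range(n_samples)
    ]
```

For example, `synthesize([(100.0, 1.0, 0.0), (200.0, 0.5, 0.0)], 160, 8000)` would produce 160 samples of a two-partial signal whose components sit at a hypothetical 100 Hz pitch and its second harmonic.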

Proceedings Article•DOI•

Spectral estimation properties of nonlinear auditory models for noisy signals

T.V. Sreenivas¹, K. Singh¹, R.J. Niederjohn¹, J.A. Heinen¹

Marquette University¹

09 Nov 1989

TL;DR: The simulation of an auditory model of the inner ear, including the nonlinear transduction of the auditory nerves, is discussed, and a scheme of cortical processing is proposed. The area under the major peak in the ensemble interval histogram (EIH) appears to be a more consistent measure of signal strength than the EIH amplitude.

Abstract: The simulation of an auditory model of the inner ear including the nonlinear transduction of the auditory nerves is discussed and a scheme of cortical processing is proposed. The output of the last stage of the model, called the ensemble interval histogram (EIH), is a cortical representation of speech in both time and frequency, similar to a spectrogram. A statistical analysis of the output of this system is performed for sinusoidal and noise inputs to determine the accuracy of spectral representation in terms of frequency, amplitude, resolution, etc. Some preliminary simulation results for sinusoid and noise input at two signal-to-noise ratios are shown. It is found that although the EIH may have noise robustness, its resolution is a decreasing function of frequency. In addition, the magnitude of the EIH is sensitive to the noise in the signal as well as other discretizations in the model. It appears that the area under the major peak in the histogram is a more consistent measure of the signal strength than the EIH amplitude.

Journal Article•DOI•

Time‐frequency spectra for nonstationary acoustic signals—The Wigner distribution, the evolutionary spectrum, the modified moving window spectrum, and their interrelationships

Jennie Moss, Jong‐Sik Lee, Panos G. Adamopoulos, Joseph K. Hammond

01 May 1989-Journal of the Acoustical Society of America

TL;DR: This paper summarizes the definitions and computation of three time-frequency spectra (the Wigner-Ville distribution, Priestley's evolutionary spectral density, and Kodera's modification to the moving spectrogram) and models acoustic signals from propagating sources in a form that allows prediction of their time-frequency spectra.

Abstract: The time‐frequency description of acoustic signals is a common requirement (applications include speech, sonagrams, frequency tracking, etc.), and this applies to transients and nonstationary signals (both random and nonrandom). Three time‐frequency spectra that are used are the Wigner‐Ville distribution, Priestley's evolutionary spectral density, and Kodera's modification to the moving spectrogram. This paper will present: a summary of the definitions and computation of these three different time‐frequency spectra; the modeling of acoustic signals due to propagating sources in a form allowing prediction of time‐frequency spectra; development of the interrelationships between the spectra; and theoretical and simulation results from time‐frequency spectra. Examples of the application of all three approaches will be illustrated by using frequency‐modulated signals. The signals will model sound as perceived by an observer due to a convecting acoustic source (i.e., including Doppler, range, and directivity ef...

Proceedings Article•DOI•

New distortion measure in regular pulse excitation speech coding

S. Kang¹, T.R. Fischer

Texas A&M University¹

27 Nov 1989

TL;DR: A new distortion measure for determining the amplitudes of the regular pulse sequence used in regular-pulse excitation speech coding is introduced that reduces the degradation due to pulse amplitude quantization and provides a better reproduction of the original spectrogram.

Abstract: The authors introduce a new distortion measure for determining the amplitudes of the regular pulse sequence used in regular-pulse excitation speech coding. The method combines the adaptive predictive distortion criterion with the distortion measure used in the typical regular-pulse excitation system and reduces the degradation due to pulse amplitude quantization. Experimental results show that the proposed distortion measure provides increases of about 0.7 to 1 dB in segmental signal-to-noise ratio, a better reproduction of the original spectrogram, and perceptually good speech quality at medium bit rates (of about 16 kb/s).
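
Segmental signal-to-noise ratio, the figure of merit quoted in this abstract, is commonly computed as the average of per-frame SNR values in dB. The Python sketch below follows that common definition. The function name `segmental_snr_db`, the 160-sample frame length, and the choice to skip frames with zero signal or zero error energy are assumptions; the abstract does not state how the authors computed the measure.

```python
import math


def segmental_snr_db(reference, decoded, frame_len=160):
    """Average per-frame SNR in dB over complete, non-degenerate frames."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - d) ** 2 for r, d in zip(ref, dec))
        # Skip frames where the log ratio would be undefined.
        if signal_energy > 0 and error_energy > 0:
            snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not snrs:
        raise ValueError("no frames with nonzero signal and error energy")
    return sum(snrs) / len(snrs)
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which is why a gain of under 1 dB in this measure can still correspond to an audible difference.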

Quantitative analysis of Doppler flow spectra in the lower limb arteries

Louis Allard, Y. E. Langlois, M. Beaudoin, Guy Cloutier, Paul Roy, R. Robillard

01 Jan 1989

TL;DR: In this article, a computer processing method was developed to analyze lower limb Doppler signals, extract diagnostic features from Doppler spectrograms, and classify the severity of the disease.

Abstract: Recent clinical results have shown that duplex ultrasound techniques can be a reliable and noninvasive method to detect lower limb arterial diseases. In the present study, a computer processing method was developed to analyze lower limb Doppler signals, extract diagnostic features from Doppler spectrograms and classify the severity of the disease. A pattern recognition method (Bayes model) was used to discriminate between nonhemodynamically significant and hemodynamically significant disease, divided at 50% diameter reduction. The features investigated were based on spectral broadening and on the power of the spectrogram in various frequency bands. The performance of the pattern recognition method was compared to that of conventional biplane contrast angiograms read by experienced angioradiologists. Results showed a percentage of correct classification of 85%. Sensitivity was 77% and specificity was 90%.
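
The three figures reported at the end of this abstract (correct classification, sensitivity, specificity) follow from the counts in a two-class confusion matrix. The Python sketch below shows the standard formulas. The function name `diagnostic_rates` is hypothetical, and the counts in the usage note are made up for illustration; they are not the study's data, which the abstract does not give.

```python
def diagnostic_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: diseased cases classified correctly/incorrectly.
    tn/fp: non-diseased cases classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

With illustrative counts of 8 true positives, 2 false negatives, 18 true negatives, and 2 false positives, `diagnostic_rates(8, 2, 18, 2)` gives a sensitivity of 8/10, a specificity of 18/20, and an accuracy of 26/30. Overall accuracy falls between sensitivity and specificity, weighted by how many cases fall in each class.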
