Showing papers on "Spectrogram" published in 1991


Journal ArticleDOI
TL;DR: The results indicate that the colored noise Kalman filters provide a significant gain in signal-to-noise ratio (SNR), a visible improvement in the sound spectrogram, and an audible improvement in output speech quality, none of which are available with white-noise-assumption Kalman and Wiener filters.
Abstract: Scalar and vector Kalman filters are implemented for filtering speech contaminated by additive white noise or colored noise, and an iterative signal and parameter estimator which can be used for both noise types is presented. Particular emphasis is placed on the removal of colored noise, such as helicopter noise, by using state-of-the-art colored-noise-assumption Kalman filters. The results indicate that the colored noise Kalman filters provide a significant gain in signal-to-noise ratio (SNR), a visible improvement in the sound spectrogram, and an audible improvement in output speech quality, none of which are available with white-noise-assumption Kalman and Wiener filters. When the filter is used as a prefilter for linear predictive coding, the coded output speech quality and intelligibility are enhanced in comparison to direct coding of the noisy speech.

302 citations
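
As a rough illustration of the white-noise case described in this abstract, the sketch below runs a scalar Kalman filter on a signal observed in additive white noise. It is not the authors' implementation: the first-order autoregressive signal model and the names `scalar_kalman`, `a`, `q` and `r` are assumptions made for the example, and the colored-noise variant (which needs an augmented state) is not shown.

```python
import numpy as np

def scalar_kalman(y, a, q, r):
    """Filter observations y[t] = s[t] + v[t], where s[t] = a*s[t-1] + w[t].

    Assumed model (not from the source): q is the variance of the driving
    noise w, r is the variance of the white measurement noise v, |a| < 1.
    Returns the filtered estimate of s at every time step.
    """
    s_hat = 0.0               # current state estimate
    p = q / (1.0 - a * a)     # start from the stationary signal variance
    out = np.empty(len(y))
    for t, obs in enumerate(y):
        # Prediction step: propagate estimate and error variance.
        s_pred = a * s_hat
        p_pred = a * a * p + q
        # Update step: blend prediction and observation using the gain.
        gain = p_pred / (p_pred + r)
        s_hat = s_pred + gain * (obs - s_pred)
        p = (1.0 - gain) * p_pred
        out[t] = s_hat
    return out
```

In a speech setting the model parameters would have to be estimated frame by frame from the noisy signal, which is the role of the iterative signal and parameter estimator mentioned in the abstract.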


Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors present a method for combining the two spectrograms by evaluating the geometric mean of their corresponding pixel values, which appears to preserve the visual features associated with high resolution in both frequency and time.
Abstract: The speech spectrogram is a two-dimensional time-frequency display of a one-dimensional signal. The wideband spectrogram and the narrowband spectrogram are deficient either in frequency or in time resolution. The authors present a method for combining the two spectrograms by evaluating the geometric mean of their corresponding pixel values. The combined spectrogram appears to preserve the visual features associated with high resolution in both frequency and time.

24 citations
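
The combination described in this abstract can be sketched in a few lines: compute a short-window (wideband) and a long-window (narrowband) magnitude spectrogram on the same time-frequency grid, then take the pixelwise geometric mean. The window lengths, hop size, Hann window and function names below are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def stft_magnitude(x, win_len, hop, n_fft):
    """Magnitude STFT with frame i centered on sample i*hop.

    Requires n_fft >= win_len. The result has shape (n_fft//2 + 1, n_frames),
    so different window lengths share one grid when hop and n_fft match.
    """
    window = np.hanning(win_len)
    padded = np.pad(x, (win_len // 2, win_len - win_len // 2))
    n_frames = 1 + len(x) // hop
    frames = np.stack([padded[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    # Normalize by the window sum so both spectrograms have comparable scale.
    return np.abs(spectrum).T / window.sum()

def combined_spectrogram(x, short_win=32, long_win=256, hop=16, n_fft=256):
    """Pixelwise geometric mean of a wideband and a narrowband spectrogram."""
    wide = stft_magnitude(x, short_win, hop, n_fft)    # fine in time
    narrow = stft_magnitude(x, long_win, hop, n_fft)   # fine in frequency
    return np.sqrt(wide * narrow)
```

Because the geometric mean is small wherever either input is small, a feature survives only where both spectrograms agree, which is one way to read the claim that the combined display keeps the sharp features of each.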


Proceedings ArticleDOI
14 Apr 1991
TL;DR: An approach to speech separation via frequency bin nonlinear adaptive filtering is proposed and shown to be effective for speech intelligibility enhancement over a TIR range between 0 dB and ±12 dB.
Abstract: An approach to speech separation via frequency bin nonlinear adaptive filtering is proposed. The algorithm is proved effective for speech intelligibility enhancement over a TIR (target-to-interference energy ratio) range between 0 dB and ±12 dB. Informal listening tests and spectrogram comparisons are discussed. In addition, a robust multi-pitch contour estimation via HMMs (hidden Markov models) is investigated. This HMM pitch estimator is combined with a pseudo-perceptual pitch estimator in order to simultaneously estimate two speakers' pitch from summed signals.

18 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: Two properties are presented that constrain the cross-terms of Cohen-class time-frequency representations (TFRs) to appear only at signal frequencies, and only when the signal is nonzero, i.e. the TFR is zero everywhere the signal or its spectrum is zero.
Abstract: Two properties are presented that constrain the cross-terms of Cohen-class time-frequency representations (TFRs) to appear only at signal frequencies, and only when the signal is nonzero. These properties thus guarantee strong finite support, i.e. the TFR is zero everywhere the signal or its spectrum is zero. When combined with cross-term attenuation, one can obtain TFRs with spectrogram-like interference suppression, but without the inherent time-frequency resolution tradeoff of the spectrogram.

16 citations


Proceedings ArticleDOI
01 Dec 1991
TL;DR: In this paper, the Wigner-Ville distribution is smoothed into a signal dependent spectrogram using an iterative algorithm, and a generalized uncertainty principle is used to remove signal uncertainty in the time-frequency plane.
Abstract: This paper presents a review of some concepts associated with time-frequency distributions (the instantaneous frequency, group delay, instantaneous bandwidth, and marginal properties) and generalizes them in time-frequency via rotation of coordinates. This work emphasizes the need to examine time-frequency distributions in the general time-frequency plane, rather than restricting oneself to a time and/or frequency framework. This analysis leads to a generalized uncertainty principle, which has previously been introduced in radar theory. This uncertainty principle is invariant under rotation in the time-frequency plane, and should be used instead of the traditional definition of Gabor. It is desired to smooth a time-frequency distribution that is an energy density function into one that is an energy function. Most distributions are combinations of density and energy functions but the Wigner-Ville distribution is purely a density function. By using a local version of the generalized uncertainty principle, the Wigner-Ville distribution is smoothed into a signal dependent spectrogram using an iterative algorithm. It is believed that this procedure may represent, in some way, an optimum removal of signal uncertainty in the time-frequency plane.

11 citations


Journal ArticleDOI
TL;DR: In this article, the Wigner distribution is applied to the detection and localization in time of narrow-band transient signals of unknown waveform in an additive background comprised of quasiharmonic and random components.
Abstract: The Wigner distribution (WD) is applied to the detection and localization in time of narrow-band transient signals of unknown waveform in an additive background comprised of quasiharmonic and random components. A traditional method of processing such signals is the spectrogram or short-time power spectrum. Based on an important relationship between the spectrogram and the WD, the application of the WD to the problem is investigated. The smoothed-pseudo-WD is shown to provide superior time and frequency resolutions of the received signal. Based on monitoring the received power in localized regions of the WD spectrum, a detector that achieves good time localization of the transient waveform and is relatively insensitive to changes in signal duration and frequency is obtained. The signal-to-noise ratio of the proposed detection statistic is derived for a transient signal in additive Gaussian noise, and its dependence on various signal and detector parameters is discussed.

11 citations


Proceedings ArticleDOI
04 Nov 1991
TL;DR: The application of this approach to a vowel classification task using the spectrogram is presented and is shown to provide performance which is favorable compared to that obtained with conventional methods.
Abstract: The multicomponent nature of many naturally occurring signals, such as speech, is exploited to provide a new means of detection and classification. A component of a multicomponent signal is defined in terms of the local bandwidth about the instantaneous frequency in the time-frequency distribution. The components are isolated by an adaptive partitioning algorithm which is constrained to overcome the interference terms often present in such distributions. The redundancy which may be present in the individual components of the signal is discussed along with the means to detect and classify the signal based upon this redundancy. The application of this approach to a vowel classification task using the spectrogram is presented and is shown to provide performance which is favorable compared to that obtained with conventional methods.

11 citations


Proceedings ArticleDOI
01 Dec 1991
TL;DR: It is shown how quadratic time-frequency representations are a generalization of the spectrogram and the results for time-frequency analysis and display of chirps and speech are reviewed and the resolution advantages over linear filtering are demonstrated.
Abstract: In this paper, we show how quadratic time-frequency representations are a generalization of the spectrogram and we review our results for time-frequency analysis and display of chirps and speech. We then show comparative performance on phase-shifted keyed communication signals. The concept of quadratic filtering is then introduced and linked to Teager's energy detector and the resolution advantages over linear filtering are demonstrated.

11 citations
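
For readers unfamiliar with the Teager energy detector mentioned in this abstract, the sketch below shows the commonly used discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], which is quadratic in the signal. How the paper links it to quadratic filtering is not stated in the abstract, so this only illustrates the operator itself; the function name is hypothetical.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

For a pure tone A*cos(w*n + phi) the operator returns the constant A^2 * sin(w)^2, so it tracks amplitude and frequency jointly using only three neighboring samples.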


Book ChapterDOI
01 Jan 1991
TL;DR: The purpose of the project is to find what segments a network can discover by itself, using only prediction as a teacher, and early results are encouraging because the system gives the same error rate whether network segments or the segments provided with the TIMIT database are used.
Abstract: The purpose of the project described here is to find what segments a network can discover by itself, using only prediction as a teacher. A recurrent network was trained to do a prediction task, using a speech spectrogram as both input and teacher signals. The error vector and hidden unit activation transitions were used to extract segments from the multi-speaker, continuous speech TIMIT database. The network was analysed to see what speech segments it discovered. Many of the segments the network found correspond to TIMIT phones and are very well segmented, but some TIMIT phones are not extracted. We are examining the use of these segments to drive a second network to label speech. Early results are encouraging because the system gives the same error rate whether network segments or the segments provided with the TIMIT database are used.

11 citations


Patent
Masaru Inoue, Shigeru Matsui
09 Dec 1991
TL;DR: In this paper, an interferometer is used to produce interference fringes from light received from a light source, which are then imaged onto a photo-diode array which transforms the imaged fringes into a single set of electric signals.
Abstract: A device according to the present invention includes an interferometer which produces interference fringes from light received from a light source. The interference fringes are imaged onto a photo-diode array which transforms the imaged interference fringes into a single set of electric signals. The single set of electric signals is digitized and stored as a group of consecutive data points which represent an interferogram signal containing a DC component. The data points are processed to obtain moving average values representing the DC component of the interferogram signal. The moving average values are subtracted from the data points to obtain a clean interferogram signal which is Fourier-transformed to obtain a spectrogram of the light source.

11 citations
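
The processing chain in this abstract (estimate the DC component with a moving average, subtract it, then Fourier-transform) can be sketched as follows. The window width, the edge-padding choice and the function names are assumptions made for the example; the patent abstract does not specify them.

```python
import numpy as np

def moving_average(samples, width):
    """Centered moving average; `width` must be odd.

    The ends are padded with the edge values (an assumption of this sketch)
    so the output has the same length as the input.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    padded = np.pad(samples, width // 2, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")

def interferogram_to_spectrum(samples, width):
    """Remove the slowly varying DC component, then return |FFT|."""
    samples = np.asarray(samples, dtype=float)
    baseline = moving_average(samples, width)   # estimate of the DC component
    clean = samples - baseline                  # interferogram without DC
    return np.abs(np.fft.rfft(clean))
```

The moving-average width trades off two errors: too short and it removes part of the fringe signal, too long and it fails to follow a drifting baseline.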


Proceedings ArticleDOI
14 Apr 1991
TL;DR: It is shown that interference-free representations of speech with a higher simultaneous resolution in time and frequency than the spectrogram are possible, and that these new representations may be applicable to the better analysis and understanding of speech.
Abstract: The assumption of quasi-stationarity in the analysis of speech is questioned from the standpoint of the best resolution of global periodicity and vocal tract formant frequency locations. The generalized class of time-frequency representations is discussed from the standpoint of speech analysis and is integrated with a description of the spectrogram. The spectrogram is known to have minimal interference artifacts, and the truly nonstationary cone-kernel time-frequency representation (CK-TFR) is shown to be similarly free of interference. The CK-TFR is observed to give higher resolution of the time point of glottal closure, and the clarity of some formant frequencies, especially for a nasal consonant, is shown to be better than the spectrogram. It is shown that interference-free representations of speech with a higher simultaneous resolution in time and frequency than the spectrogram are possible, and that these new representations may be applicable to the better analysis and understanding of speech.


Journal ArticleDOI
TL;DR: An experiment in speech analysis in which a parallel processor uses morphological algorithms to extract phonetic features from a spectrogram and performs an initial segmentation and labeling is described.
Abstract: An experiment in speech analysis in which a parallel processor uses morphological algorithms to extract phonetic features from a spectrogram and performs an initial segmentation and labeling is described. Experiments in spectrogram reading by R. A. Cole et al. (1980) have suggested that more information is present in, and furthermore, this information may reside in the spectrogram, i.e., the speech image. The spectrogram readings of V. Zue et al. (1980) are automated by using image processing techniques in an image processor. A very fast and powerful parallel pipeline image processor, the cytocomputer, is used. The cytocomputer contains a serial pipeline of programmable processing stages, where each stage performs a single cellular transformation on the entire image.

Journal ArticleDOI
TL;DR: Results indicate a systematic decrease in percent-correct score as the number of tokens representing each phoneme in the identification tests increased from one to nine.
Abstract: In a new approach to the frequency-lowering of speech, artificial codes were developed for 24 consonants (C) and 15 vowels (V) for two values of lowpass cutoff frequency F (300 and 500 Hz). Each individual phoneme was coded by a unique, nonvarying acoustic signal confined to frequencies less than or equal to F. Stimuli were created through variations in spectral content, amplitude, and duration of tonal complexes or bandpass noise. For example, plosive and fricative sounds were constructed by specifying the duration and relative amplitude of bandpass noise with various center frequencies and bandwidths, while vowels were generated through variations in the spectral shape and duration of a ten-tone harmonic complex. The ability of normal-hearing listeners to identify coded Cs and Vs in fixed-context syllables was compared to their performance on single-token sets of natural speech utterances lowpass filtered to equivalent values of F. For a set of 24 consonants in C-/a/ context, asymptotic performance on coded sounds averaged 90 percent correct for F=500 Hz and 65 percent for F=300 Hz, compared to 75 percent and 40 percent for lowpass filtered speech. For a set of 15 vowels in /b/-V-/t/ context, asymptotic performance on coded sounds averaged 85 percent correct for F=500 Hz and 65 percent for F=300 Hz, compared to 85 percent and 50 percent for lowpass filtered speech. Identification of coded signals for F=500 Hz was also examined in CV syllables where C was selected at random from the set of 24 Cs and V was selected at random from the set of 15 Vs. Asymptotic performance of roughly 67 percent correct and 71 percent correct was obtained for C and V identification, respectively. These scores are somewhat lower than those obtained in the fixed-context experiments. Finally, results were obtained concerning the effect of token variability on the identification of lowpass filtered speech. These results indicate a systematic decrease in percent-correct score as the number of tokens representing each phoneme in the identification tests increased from one to nine.
Key words: acoustic signal, artificial low-frequency speech code, bandpass noise, lowpass filtered speech, signal processing, spectrogram.

02 Sep 1991
TL;DR: The SPANEX software has been developed for the extraction of phonetic patterns from different corpuses and provides a means of editing and analysing speech signals to increase the efficiency of acoustic-phonetic analysis of speech.
Abstract: The SPANEX software has been developed for the extraction of phonetic patterns from different corpuses. It provides a means of editing and analysing speech signals. The objective of this software is to increase the efficiency of acoustic-phonetic analysis of speech. Using this tool one can record and play back sentences, compute, display, and analyse spectrograms or pitch contours as well as perform the labelling of speech utterances. Moreover, coarticulation phenomena can be studied, the voiced/unvoiced error can be edited graphically and submitted to a smoothing procedure. Each speech segment may be zoomed in context, displayed and examined. Hence, SPANEX is an efficient and easy-to-use tool for the development of algorithms in speech recognition and synthesis.

Proceedings ArticleDOI
01 Oct 1991
TL;DR: To alleviate the inherent property of the WDF’s cross term interference, a modification is made by taking a local average of the computed WDF, which indicates that the reduction of the cross terms depends on the localization characteristics of the contributed terms and can be adjusted with the number of recursive operations.
Abstract: A processing algorithm derived from the mathematical formulation of the Wigner distribution function (WDF) has been developed for a joint time and frequency display of sonar signals. This discrete version of the WDF can be computed either from a time domain signal or from its complex frequency spectrum. By properly selecting the sampling rate and processing bandwidth, we also avoid the aliasing problem associated with the fast Fourier transform (FFT) computation routine. To alleviate the inherent property of the WDF’s cross term interference, a modification is made by taking a local average of the computed WDF. Our experience from the implementation of this smoothing operation indicates that the reduction of the cross terms depends on the localization characteristics of the contributed terms and can be adjusted with the number of recursive operations. Observation of the positivity of the final computed WDF value reveals that the negative valued components are highly related to the disappearance of the cross terms. Although the smoothed WDF for time and frequency display presents a close semblance to those obtained by the conventional sonargram, its computation scheme can be easily extended for highlighting features in the time and frequency display by examining its higher order derivatives. An example of the vein diagram, which traces the ridges and peaks on the two dimensional display, is used to illustrate the related features for an echo returned from an insonified object.
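
The cross-term reduction by local averaging described in this abstract can be illustrated with a small discrete Wigner distribution. This is a sketch under stated assumptions, not the paper's algorithm: the input is taken to be a complex (analytic) signal, the averaging is a simple box filter applied along the time axis only, and `passes` stands in for the "number of recursive operations". All function names are hypothetical.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner distribution of a complex signal.

    Row t is the FFT over lag m of x[t+m] * conj(x[t-m]); column k
    corresponds to frequency k / (2 * len(x)) cycles per sample.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    W = np.zeros((n, n))
    for t in range(n):
        max_lag = min(t, n - 1 - t)            # stay inside the signal
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = x[t + lags] * np.conj(x[t - lags])
        W[t] = np.fft.fft(kernel).real         # kernel is Hermitian in the lag
    return W

def smooth_in_time(W, width=3, passes=1):
    """Repeated box-filter average along the time axis (axis 0)."""
    box = np.ones(width) / width
    S = W.copy()
    for _ in range(passes):
        S = np.apply_along_axis(
            lambda col: np.convolve(col, box, mode="same"), 0, S)
    return S
```

For two tones at different frequencies the cross term sits midway between them and oscillates in time, so averaging along time attenuates it while leaving the slowly varying auto terms almost unchanged. Components separated in time would instead need averaging along frequency.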

Journal ArticleDOI
TL;DR: It is shown that the formant tracks of rapidly time varying speech are displayed correctly by spectrograms, and if the model of the time variant formant is based on the notion of instantaneous frequency, the discrepancies in the interpretation of the spectrogram disappear.


Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors present the outline and performance of a feature based phoneme segmentation expert system, tested on speaker independent and continuous speech, which utilizes spectrogram reading knowledge and the strategy used by a human expert when reading a spectrogram, and determines the phoneme boundary along with its phoneme class.
Abstract: The authors present the outline and performance of a feature based phoneme segmentation expert system, tested on speaker independent and continuous speech. This system utilizes spectrogram reading knowledge and the strategy used by a human expert when reading a spectrogram, and determines the phoneme boundary along with its phoneme class. The experiments were performed both on isolated word speech uttered by six male speakers and on speaker dependent continuous speech. The results were as good as, or slightly worse than, those of the previous experiment on speaker-dependent isolated word speech. The authors report on the difference in the knowledge between speakers and utterances, which was added and/or modified for this expansion to speaker independent and continuous speech.

Proceedings ArticleDOI
01 Jan 1991
TL;DR: Several time-domain models of the processing performed by the cochlea are extended to include periodicity-sensitive TI which converts the fast-flowing neural activity pattern into a form that is much more like the auditory images the authors experience in response to sounds.
Abstract: Over the past decade, hearing scientists have developed a number of time-domain models of the processing performed by the cochlea in an effort to develop a reasonably accurate multi-channel representation of the pattern of neural activity flowing from the cochlea up the auditory nerve to the cochlear nucleus [1]. It is often assumed that peripheral auditory processing ends at the output of the cochlea and that the pattern of activity in the auditory nerve is in some sense what we hear. In reality, this neural activity pattern (NAP) is not a good representation of our auditory sensations because it includes phase differences that we do not hear and it does not include auditory temporal integration (TI). As a result, several of the models have been extended to include periodicity-sensitive TI [2], [3], [4] which converts the fast-flowing neural activity pattern into a form that is much more like the auditory images we experience in response to sounds. When these models are applied to speech sounds, the auditory images of vowels reveal an elaborate formant structure that is absent in the more traditional representation of speech, the spectrogram. An example is presented on the left in the figure; it is the auditory image of the stationary part of the vowel /ae/ as in 'bab' [4]. The abscissa of the auditory image is 'temporal integration interval' and each line of the image shows the activity in one frequency channel of the auditory model. In general terms, activity on a vertical line in the auditory image shows that there is a correlation in the sound at that temporal interval. The concentrations of activity are the formants of the vowel.

Proceedings ArticleDOI
Preeti Rao
14 Apr 1991
TL;DR: The smoothed-pseudo-WD is shown to provide superior time and frequency resolutions of the received signals.
Abstract: The Wigner distribution (WD) is applied to the detection and localization in time of narrowband transient signals of unknown waveform in additive noise comprised of quasi-harmonic and random components. A traditional method of processing such signals is the spectrogram or short-time power spectrum. Based on the relationship between the spectrogram and the WD, the application of the WD to the detection problem is investigated. The smoothed-pseudo-WD is shown to provide superior time and frequency resolutions of the received signals. By monitoring the received power in localized regions of the WD spectrum, a detector that achieves good time localization of the transient waveform and is relatively insensitive to changes in signal duration or frequency is obtained.

Proceedings ArticleDOI
16 Jun 1991
TL;DR: The theory and method of the sound holo-spectrogram are presented, and an experimental system is developed, based on a multi-dimensional encoding display technique, which maintains magnitude-greyness modulation in the conventional sound spectrogram.
Abstract: The theory and method of the sound holo-spectrogram are presented. An experimental system is developed, which is based on a multi-dimensional encoding display technique. The principle of the multi-dimensional encoding display is given, and the choice of encoding domain is discussed. The display maintains the magnitude-greyness modulation of the conventional sound spectrogram and retains all signal information in the time-frequency domain. The holo-spectrogram is applied to the processing of speech and other acoustic signals. Some experimental results are given. The multi-dimensional encoding display can also be used in the analysis of the two-dimensional spectrum in other transform domains, such as the spatial frequency domain.

Proceedings ArticleDOI
31 Oct 1991
TL;DR: The white-noise like behavior of the modelling error indicates that the Doppler blood flow signal can be modelled adequately as a complex AR process.
Abstract: The spectrogram obtained by complex autoregressive (AR) modelling was investigated using the Yule-Walker equations. The spectral envelope area (SEA) was used to evaluate the effect of window duration on the AR spectrogram, and a statistical analysis indicated that the SEA was not sensitive to window duration. The white-noise like behavior of the modelling error indicates that the Doppler blood flow signal can be modelled adequately as a complex AR process.
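
A minimal sketch of an AR spectrogram built from the Yule-Walker equations is given below. It fits a complex AR model to each window of the signal and evaluates the model spectrum; the model order, window length, hop size, biased autocorrelation estimate and all function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def yule_walker(x, order):
    """Fit x[t] = sum_k a[k] * x[t-k] + e[t] for a complex signal.

    Uses the biased autocorrelation r[k] = mean(x[t+k] * conj(x[t])).
    Returns the coefficients a[1..order] and the noise variance.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    r = np.array([np.vdot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    lags = np.arange(order)[:, None] - np.arange(order)[None, :]
    # Hermitian Toeplitz matrix: entry (m, k) holds r[m - k], r[-l] = conj(r[l]).
    R = np.where(lags >= 0, r[np.abs(lags)], np.conj(r[np.abs(lags)]))
    a = np.linalg.solve(R, r[1:])
    sigma2 = (r[0] - np.sum(a * np.conj(r[1:]))).real
    return a, sigma2

def ar_psd(a, sigma2, n_fft):
    """AR power spectrum sigma2 / |1 - sum_k a[k] exp(-j*2*pi*f*k)|^2."""
    denom = np.fft.fft(np.concatenate(([1.0], -a)), n_fft)
    return sigma2 / np.abs(denom) ** 2

def ar_spectrogram(x, order, win_len, hop, n_fft):
    """One AR spectrum per window; result has shape (n_fft, n_frames)."""
    n_frames = 1 + (len(x) - win_len) // hop
    cols = [ar_psd(*yule_walker(x[i * hop : i * hop + win_len], order), n_fft)
            for i in range(n_frames)]
    return np.stack(cols, axis=1)
```

Frequencies follow the layout of `np.fft.fftfreq(n_fft)`, so negative frequencies (reverse flow, in a Doppler setting) occupy the upper half of each column. Varying `win_len` in such a sketch is one way to explore the window-duration question the abstract raises.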