
Showing papers on "Spectrogram published in 1994"


Journal ArticleDOI
TL;DR: A novel approach for analyzing and filtering speech is described and evaluated which utilizes the "modulation spectrogram," i.e., the two-dimensional representation of modulation frequencies versus center frequency as a function of time.
Abstract: A novel approach for analyzing and filtering speech is described and evaluated which utilizes the "modulation spectrogram," i.e., the two-dimensional representation of modulation frequencies versus center frequency as a function of time. This approach is based on physiological findings of a tonotopical organization of modulation frequencies perpendicular to carrier frequencies as well as psychoacoustical findings of "modulation tuning curves." In addition, an interaction is assumed between the representation of modulation frequencies and the representation of auditory space as described by physiological and psychological models of binaural hearing. A noise-reduction algorithm based on this approach was implemented and tested which enhances or suppresses each combination of modulation frequency and center frequency according to its phase and intensity relation between the two input signals (i.e., both stereo channels of a dummy-head recording). When tested in several situations with interfering speakers and background noise, in both anechoic and reverberant environments, the algorithm provided a small but very robust increase in speech intelligibility which corresponds to approximately 2 dB in signal-to-noise ratio. Possible applications of this algorithm include noise reduction in adverse acoustical situations, digital hearing-aid processing schemes, and preprocessing for speech recognition.
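The core signal path behind a modulation spectrogram, band filtering, envelope extraction, then a second spectral analysis over the envelopes, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the sampling rate, the rectify-and-smooth envelope detector, and all parameter choices below are arbitrary assumptions.

```python
import cmath, math

def dft_mag(x):
    # magnitude spectrum (first half) via a direct DFT -- fine for short signals
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j*math.pi*k*n/N) for n in range(N)))
            for k in range(N // 2)]

fs = 800                                    # Hz (illustrative)
f_carrier, f_mod = 200.0, 4.0               # 200 Hz carrier, 4 Hz modulation
x = [(1 + 0.8*math.sin(2*math.pi*f_mod*n/fs)) * math.sin(2*math.pi*f_carrier*n/fs)
     for n in range(fs)]                    # 1 s of amplitude-modulated signal

# crude envelope of one "auditory band": rectify, then ~25 ms moving average
rect = [abs(v) for v in x]
w = 20
env = [sum(rect[max(0, n-w): n+1]) / (n - max(0, n-w) + 1) for n in range(len(rect))]
mean = sum(env) / len(env)
env = [e - mean for e in env]               # remove DC before the modulation DFT

mag = dft_mag(env)
peak = max(range(1, len(mag)), key=lambda k: mag[k])
mod_freq = peak * fs / len(env)
print(mod_freq)                             # expected near the 4 Hz modulation rate
```

Repeating this for every band of a filter bank yields the modulation-frequency versus center-frequency picture the abstract describes.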

137 citations


Proceedings ArticleDOI
Malcolm Slaney1, D. Naar1, R.E. Lyon1
19 Apr 1994
TL;DR: Techniques to recreate sounds from perceptual displays known as cochleagrams and correlograms are developed using a convex projection framework and improved methods of initial phase estimation are explored.
Abstract: Techniques to recreate sounds from perceptual displays known as cochleagrams and correlograms are developed using a convex projection framework. Prior work on cochlear-model inversion is extended to account for rectification and gain adaptation. A prior technique for phase recovery in spectrogram inversion is combined with the synchronized overlap-and-add technique of speech rate modification, and is applied to inverting the short-time autocorrelation function representation in the auditory correlogram. Improved methods of initial phase estimation are explored. A range of computational cost options, with and without iteration, produce a range of quality levels from fair to near perfect.
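Phase recovery for spectrogram inversion is typically done by alternating projections: impose the target magnitudes, resynthesize by overlap-add, and repeat. The sketch below is the classic Griffin-Lim style iteration, a close relative of the convex-projection scheme this abstract builds on, not the paper's exact algorithm; frame sizes, the Hann window, and the test signal are arbitrary assumptions.

```python
import cmath, math, random

FRAME, HOP = 32, 16
WIN = [0.5 - 0.5*math.cos(2*math.pi*n/FRAME) for n in range(FRAME)]

def dft(x):
    N = len(x)
    return [sum(x[n]*cmath.exp(-2j*math.pi*k*n/N) for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [(sum(X[k]*cmath.exp(2j*math.pi*k*n/N) for k in range(N)) / N).real
            for n in range(N)]

def stft(x):
    return [dft([x[s+n]*WIN[n] for n in range(FRAME)])
            for s in range(0, len(x) - FRAME + 1, HOP)]

def istft(frames):
    # weighted overlap-add (least-squares inverse of the windowed STFT)
    L = HOP*(len(frames) - 1) + FRAME
    y, norm = [0.0]*L, [1e-12]*L
    for i, F in enumerate(frames):
        seg = idft(F)
        for n in range(FRAME):
            y[i*HOP + n] += seg[n]*WIN[n]
            norm[i*HOP + n] += WIN[n]**2
    return [a/b for a, b in zip(y, norm)]

def mag_error(y, target):
    return sum(abs(abs(c) - m) for F, T in zip(stft(y), target) for c, m in zip(F, T))

# target magnitude spectrogram of a short tone
x = [math.sin(2*math.pi*8*n/256) for n in range(256)]
target = [[abs(c) for c in F] for F in stft(x)]

random.seed(0)
y = [random.uniform(-1, 1) for _ in range(256)]   # start from random phase/signal
err0 = mag_error(y, target)
for _ in range(25):
    # keep the current phases, replace the magnitudes, resynthesize
    S = [[m*cmath.exp(1j*cmath.phase(c)) for m, c in zip(T, F)]
         for T, F in zip(target, stft(y))]
    y = istft(S)
err1 = mag_error(y, target)
print(err0, '->', err1)   # the magnitude mismatch should shrink
```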

119 citations


Journal ArticleDOI
TL;DR: The authors show that all time-frequency transforms of Cohen's class may be achieved by simple changes in backprojection reconstruction filtering, and time-varying filtering by shift-varying convolution in the Radon-Wigner domain is shown to yield superior results to its analogous Cohen's class adaptive transform (shift-invariant convolution).
Abstract: Since line integrals through the Wigner spectrum can be calculated by dechirping, calculation of the Wigner spectrum may be viewed as a tomographic reconstruction problem. In the paper, the authors show that all time-frequency transforms of Cohen's class may be achieved by simple changes in backprojection reconstruction filtering. The resolution/cross-term tradeoff that occurs in time-frequency kernel selection is shown to be analogous to the resolution-ringing tradeoff that occurs in computed tomography (CT). "Ideal" reconstruction using a purely differentiating backprojection filter yields the Wigner distribution, whereas low-pass differentiating filters produce cross-term suppressing distributions such as the spectrogram or the Born-Jordan distribution. It is also demonstrated how this analogy can be exploited to "tune" the reconstruction filtering (or time-frequency kernel) to improve the ringing/resolution tradeoff. Some properties of the projection domain, which is also known as the Radon-Wigner transform, are characterized, including the response to signal delays or frequency shifts and projection masking or convolution. Last, time-varying filtering by shift-varying convolution in the Radon-Wigner domain is shown to yield superior results to its analogous Cohen's class adaptive transform (shift-invariant convolution) for the multicomponent, linear-FM signals that are investigated.

83 citations


Book
01 Nov 1994
TL;DR: In the field of speech recognition, a qualitative change in the state of the art has emerged that promises to bring speech recognition capabilities within the reach of anyone with access to a workstation.
Abstract: In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers) have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds. More and more, speech recognition technology is making its way from the laboratory to real-world applications. Recently, a qualitative change in the state of the art has emerged that promises to bring speech recognition capabilities within the reach of anyone with access to a workstation. High-accuracy, real-time, speaker-independent, continuous speech recognition for medium-sized vocabularies (a few thousand words) is now possible in software on off-the-shelf workstations. Users will be able to tailor recognition capabilities to their own applications. Such software-based, real-time solutions usher in a whole new era in the development and utility of speech recognition technology. As is often the case in technology, a paradigm shift occurs when several developments converge to make a new capability possible.
In the case of continuous speech recognition, the following advances have converged to make the new technology possible: * higher-accuracy continuous speech recognition, based on better speech modeling techniques; * better recognition search strategies that reduce the time needed for high-accuracy recognition; and * increased power of audio-capable, off-the-shelf workstations. The paradigm shift is taking place in the way we view and use speech recognition. Rather than being mostly a laboratory endeavor, speech recognition is fast becoming a technology that is pervasive and will have a profound influence on the way humans communicate with machines and with each other. This paper focuses on speech modeling advances in continuous speech recognition, with an exposition of hidden Markov models (HMMs), the mathematical backbone behind these advances. While knowledge of the properties of the speech signal and of speech perception has always played a role, recent improvements have relied largely on solid mathematical and probabilistic modeling methods, especially the use of HMMs for modeling speech sounds. These methods are capable of modeling time and spectral variability simultaneously, and the model parameters can be estimated automatically from given training speech data. The traditional processes of segmentation and labeling of speech sounds are now merged into a single probabilistic process that can optimize recognition accuracy. This paper describes the speech recognition process and provides typical recognition accuracy figures obtained in laboratory tests as a function of vocabulary, speaker dependence, grammar complexity, and the amount of speech used in training the system. As a result of modeling advances, recognition error rates have dropped several fold.
Important to these improvements have been the availability of common speech corpora for training and testing purposes and the adoption of standard testing procedures. We will argue that future advances in speech recognition must continue to rely on finding better ways to incorporate our speech knowledge into advanced mathematical models, with an emphasis on methods that are robust to speaker variability, noise, and other acoustic distortions. THE SPEECH RECOGNITION PROBLEM Automatic speech recognition can be viewed as a mapping from a continuous-time signal, the speech signal, to a sequence of discrete entities, for example phonemes (or speech sounds), words, and sentences. The major obstacle to high-accuracy recognition is the large variability in the speech signal characteristics. This variability has three main components: linguistic variability, speaker variability, and channel variability. Linguistic variability includes the effects of phonetics, phonology, syntax, semantics, and discourse on the speech signal. Speaker variability includes intra- and interspeaker variability, including the effects of coarticulation, that is, the effects of neighboring sounds on the acoustic realization of a particular phoneme due to continuity and motion constraints on the human articulatory apparatus. Channel variability includes the effects of background noise and the transmission channel (e.g., microphone, telephone, reverberation). All these variabilities tend to shroud the intended message with layers of uncertainty, which must be unraveled by the recognition process. This paper will focus on modeling linguistic and speaker variabilities for the speech recognition problem. Units of Speech. To gain an appreciation of what modeling is required to perform recognition, we shall use as an example the phrase "grey whales," whose speech signal is shown at the bottom of Fig. 1 with the corresponding spectrogram (or voice print) shown immediately above.
The spectrogram shows the result of a frequency analysis of the speech, with the dark bands representing resonances of the vocal tract. At the top of Fig. 1 are the two words "grey" and "whales," which are the desired output of the recognition system. The first thing to note is that the speech signal and the spectrogram show no separation
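The forward algorithm at the heart of HMM-based recognizers computes P(observations | model) by summing over all state paths in time linear in the sequence length. A toy sketch with made-up probabilities (the two states, two symbols, and all numbers are illustrative, not from the paper):

```python
from itertools import product

def forward(obs, pi, A, B):
    # alpha[i] = P(observations so far, current state = i)
    alpha = [pi[i]*B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [B[j][o]*sum(alpha[i]*A[i][j] for i in range(len(pi)))
                 for j in range(len(pi))]
    return sum(alpha)

# hypothetical 2-state discrete HMM (e.g., "vowel-like" vs "consonant-like" frames)
pi = [0.6, 0.4]                        # initial state probabilities
A  = [[0.7, 0.3], [0.4, 0.6]]          # state transition probabilities
B  = [[0.9, 0.1], [0.2, 0.8]]          # emission probabilities over 2 symbols
obs = [0, 0, 1, 0]

likelihood = forward(obs, pi, A, B)

# brute-force check: sum over every explicit state path (toy sizes only)
brute = 0.0
for path in product(range(2), repeat=len(obs)):
    p = pi[path[0]]*B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t-1]][path[t]]*B[path[t]][obs[t]]
    brute += p
print(likelihood, brute)   # both give P(obs | model)
```

Training such models (the Baum-Welch procedure) and decoding (Viterbi) build on exactly this recursion.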

82 citations


Journal ArticleDOI
TL;DR: This work applies two iterative methods for generating positive time-frequency distributions (TFDs) to speech analysis and demonstrates that conventional sliding window techniques lose or distort much of the rich nonstationary structure of speech.
Abstract: Much of our current knowledge and intuition of speech is derived from analyses involving assumptions of short-time stationarity (e.g., the speech spectrogram). Such methods are, by their very nature, incapable of revealing the true nonstationary nature of speech. A careful consideration of the theory of time-frequency distributions (TFDs), however, allows the construction of methods that reveal far more of the nonstationarities of speech, thereby highlighting just what it is that conventional approaches miss. We apply two iterative methods for generating positive time-frequency distributions (TFDs) to speech analysis. Both methods make use of multiple sources of information (e.g., multiple spectrograms) to yield a high-resolution estimate of the joint time-frequency energy density of speech. Plosive events and formant harmonic structure are simultaneously preserved in these TFDs. Rapidly time-varying formants are also resolved by these TFDs, and harmonic structure is revealed, independent of sweep rate; this result is quite different from that seen with conventional speech spectrograms. The speech features observed in these distributions demonstrate that conventional sliding window techniques lose or distort much of the rich nonstationary structure of speech. Examples for synthetic formants and real speech are provided. The differences between joint distributions and conditional distributions are also illustrated.

68 citations


Journal ArticleDOI
TL;DR: It has been proposed that the FM bat receives stroboscopic-like glimpses of fluttering prey whose spatial representation depends on the operation of the bat's sonar receiver.
Abstract: Through the present study, the acoustic information available to an echolocating bat that uses brief frequency‐modulated (FM) sonar sounds for the pursuit and capture of insect prey has been characterized. Computer‐generated sonar pulses were broadcast at tethered insects, and the returning echoes were recorded on analog tape at high speed for off‐line analyses. Echoes from stationary and fluttering insects were displayed using time waveform, spectrogram, power spectrum, and cross‐correlation representations. The results show echo signatures for the different insect species studied, which change with the angle of incident sound. Sequences of echoes from fluttering insects show irregular changes in sound amplitude and time‐frequency structure, reflecting a random temporal relation between the changing wing position and the arrival of incident sound. A set of recordings that controlled the temporal relation between incident sound and insect wing position suggests that information about the spatial profile o...

67 citations


Journal ArticleDOI
TL;DR: In this article, high-frequency spectra of chemical explosions and earthquakes at local and regional distances in the northeastern United States and in Norway were analyzed to understand the seismic signal characteristics of single explosions, multiple-hole instantaneous explosions, ripple-fired quarry blasts, and earthquakes.
Abstract: We analyze the high-frequency (1 to 50 Hz) spectra of chemical explosions and earthquakes at local and regional distances in the northeastern United States and in Norway to understand the seismic signal characteristics of single explosions, multiple-hole instantaneous explosions, ripple-fired quarry blasts, and earthquakes. Our purpose is to evaluate practical discriminants, and to obtain a physical understanding of their successes and failures. High-frequency spectra from ripple-fired blasts usually show clear time-independent frequency bands due to the repetitive nature of the source and are distinctively different from the spectra of instantaneous blasts or earthquakes. However, like other discriminators based on spectral estimates, the spectrogram method requires data with high signal-to-noise ratios at high frequencies for unambiguous discrimination. In addition, banding is not seen in spectrograms for shots with small delay times (less than 8 msec) and short total durations. We have successfully modeled the observed high-frequency spectral bands up to about 45 Hz of the regional signals from quarry blasts in New York and adjacent states. Using information on shot-hole patterns and charge distribution, we find that ripple firing results in an enrichment of high-frequency S waves and efficient excitation of the Rg phase. There is an azimuthal dependence of P-wave amplitude associated with orientation of the path with respect to local topography (ridges, benches) in which the shots are emplaced. To discriminate instantaneous explosions from earthquakes, we find the P/S spectral amplitude ratio at high frequencies is complementary to the use of spectrogram methods. A high P/S spectral ratio above 10 Hz is a stable characteristic of instantaneous explosions.
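The time-independent spectral banding from ripple firing is easy to reproduce: delayed copies of a shot wavelet multiply the wavelet's spectrum by a comb, enhanced at multiples of 1/delay and nulled in between. A toy sketch (the wavelet shape, delay, and shot count are invented for illustration):

```python
import cmath, math

fs = 1000                        # samples per second
delay_s, shots = 0.025, 5        # 25 ms inter-shot delay -> bands every 40 Hz
# crude single-shot wavelet: a decaying sinusoid (purely illustrative)
pulse = [math.exp(-40*n/fs)*math.sin(2*math.pi*30*n/fs) for n in range(200)]

trace = [0.0]*500
for s in range(shots):           # superpose the delayed shots
    off = int(round(s*delay_s*fs))
    for i, v in enumerate(pulse):
        trace[off + i] += v

def mag_at(x, freq):
    # DFT magnitude at the bin nearest the requested frequency
    N = len(x)
    k = round(freq*N/fs)
    return abs(sum(x[n]*cmath.exp(-2j*math.pi*k*n/N) for n in range(N)))

band = mag_at(trace, 40.0)       # on a comb line: the five shots add coherently
null = mag_at(trace, 32.0)       # between lines: the five shots cancel
print(band, null)
```

Because the comb depends only on the shot timing, the same bands appear in every time slice of a spectrogram, which is exactly the signature the discriminant exploits.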

63 citations


Patent
23 Nov 1994
TL;DR: In this article, a method and system for characterizing the sounds of the ocean captured by passive sonar listening devices is presented. The method relies on a neural network ensemble that has been trained to favor specific features and/or parameters.
Abstract: The present invention provides a method and system for characterizing the sounds of the ocean captured by passive sonar listening devices. The present invention accomplishes this by first generating a spectrogram from the received sonar signal. The spectrogram is characterized in terms of textural features and signal processing parameters. The textural features and signal processing parameters are fed into a neural network ensemble that has been trained to favor specific features and/or parameters. The trained neural network ensemble classifies the signal as either Type-I or clutter.

61 citations


Journal ArticleDOI
TL;DR: It is demonstrated that two previously proposed methods for combining the information content from multiple spectrograms into a single, positive time-frequency function are optimal in a cross-entropy sense.
Abstract: We demonstrate that two previously proposed methods for combining the information content from multiple spectrograms into a single, positive time-frequency function are optimal in a cross-entropy sense. The goal in combining the spectrograms is to obtain an improved approximation of the joint time-frequency signal density by overcoming limitations of any single spectrogram. An example of each method is provided, and results are compared with spectrograms and a Cohen-Posch (1985) time-frequency density (TFD) of a nonstationary pulsed tone signal. The proposed combinations are effective and can be efficiently computed.
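The paper's two cross-entropy-optimal combination rules are not reproduced here; as a rough stand-in for the idea, the sketch below combines two spectrograms of the same signal taken with different windows by a pointwise geometric mean and renormalizes the result to a joint time-frequency density. All parameters are arbitrary assumptions.

```python
import cmath, math

def spectrogram(x, win, hop):
    # squared-magnitude windowed DFT per frame (direct DFT, short signals only)
    N = len(win)
    frames = []
    for s in range(0, len(x) - N + 1, hop):
        seg = [x[s+n]*win[n] for n in range(N)]
        frames.append([abs(sum(seg[n]*cmath.exp(-2j*math.pi*k*n/N) for n in range(N)))**2
                       for k in range(N//2)])
    return frames

fs, N, hop = 512, 64, 32
x = [math.sin(2*math.pi*64*n/fs) for n in range(fs)]   # tone at DFT bin 8

hann = [0.5 - 0.5*math.cos(2*math.pi*n/N) for n in range(N)]
rect = [1.0]*N
S1, S2 = spectrogram(x, hann, hop), spectrogram(x, rect, hop)

# pointwise geometric mean, renormalized to a time-frequency density
comb = [[math.sqrt(a*b) for a, b in zip(r1, r2)] for r1, r2 in zip(S1, S2)]
total = sum(v for row in comb for v in row)
comb = [[v/total for v in row] for row in comb]

peak_bin = max(range(N//2), key=lambda k: comb[0][k])
print(peak_bin)   # the combined density stays concentrated at the tone's bin
```

The product keeps only the energy both spectrograms agree on, so leakage present in one window but not the other is suppressed, which is the intuition behind combining multiple spectrograms.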

54 citations


Proceedings ArticleDOI
19 Apr 1994
TL;DR: This work extends the spectrum estimation method of Thomson to non-stationary signals by formulating a multiple window spectrogram and shows the unusual shape of the Cohen's class smoothing kernels corresponding to the Thomson method multiple windows.
Abstract: We extend the spectrum estimation method of Thomson (1982, 1990) to non-stationary signals by formulating a multiple window spectrogram. The traditional spectrogram can be represented as a member of Cohen's class of time-frequency distributions (TFDs) where the smoothing kernel is the Wigner distribution of the signal temporal window. We show the unusual shape of the Cohen's class smoothing kernels corresponding to the Thomson method multiple windows. These are a class of smoothing kernels not hitherto used in time-frequency (t-f) analysis. Examples of the multiple window spectrogram applied to a noisy dual linear FM test signal and to actual underwater acoustic data demonstrate the merit of the method.
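A multiple window spectrogram averages eigenspectra computed with a family of orthonormal tapers. The sketch below substitutes simple sine tapers for Thomson's Slepian (DPSS) tapers to stay dependency-free; that swap, and the frame length, hop, and taper count, are my own choices, not the paper's.

```python
import cmath, math

def sine_tapers(N, K):
    # orthonormal sine-taper family; Thomson's method proper uses Slepian (DPSS) tapers
    return [[math.sqrt(2.0/(N+1))*math.sin(math.pi*(k+1)*(n+1)/(N+1)) for n in range(N)]
            for k in range(K)]

def multitaper_spectrogram(x, N=64, hop=32, K=4):
    tapers = sine_tapers(N, K)
    out = []
    for s in range(0, len(x) - N + 1, hop):
        seg = x[s:s+N]
        S = [0.0]*(N//2)
        for tap in tapers:                   # average the K eigenspectra
            for kk in range(N//2):
                X = sum(seg[n]*tap[n]*cmath.exp(-2j*math.pi*kk*n/N) for n in range(N))
                S[kk] += abs(X)**2 / K
        out.append(S)
    return out

fs = 512
x = [math.sin(2*math.pi*64*n/fs) for n in range(fs)]   # tone at bin 8 of a 64-point frame
S = multitaper_spectrogram(x)
peak = max(range(len(S[0])), key=lambda k: S[0][k])
print(peak)
```

Averaging independent eigenspectra reduces the variance of each time slice at the cost of some spectral smoothing, which is the tradeoff the multiple window spectrogram controls.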

54 citations


Proceedings ArticleDOI
25 Oct 1994
TL;DR: A general methodology providing a better readability of any bilinear distribution, referred to as reassignment, is essentially a generalization of an improvement of the spectrogram proposed by Kodera, Gendrin and de Villedary (1978).
Abstract: A general methodology providing a better readability of any bilinear distribution has been proposed. This methodology, referred to as reassignment, is essentially a generalization of an improvement of the spectrogram proposed by Kodera, Gendrin and de Villedary (1978). After a presentation of this original work, its generalization to a wide range of distributions is shown. The close connections of this method with some related approaches are also underlined.

Journal ArticleDOI
TL;DR: A new method for VFR using the norm of the derivative parameters in deciding to retain or to discard a frame is introduced, and informal inspection of speech spectrograms shows that this new method puts more emphasis on the transient regions of the speech signal.
Abstract: Variable frame rate (VFR) analysis is a technique used in speech processing and recognition for discarding frames that are too much alike. The article introduces a new method for VFR. Instead of calculating the distance between frames, the norm of the derivative parameters is used in deciding to retain or to discard a frame. Informal inspection of speech spectrograms shows that this new method puts more emphasis on the transient regions of the speech signal. Experimental results with a hidden Markov model (HMM) based system show that the new method outperforms the classical method.
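The retain/discard rule can be sketched directly: compute a delta (derivative) parameter per frame and keep the frame only when its norm exceeds a threshold. The first-difference delta, the feature track, and the threshold below are invented for illustration; real systems use regression-based deltas over several frames.

```python
def delta_norm_vfr(features, threshold):
    """Keep a frame when the norm of its delta (first-difference) parameters
    exceeds the threshold; steady-state frames are discarded."""
    kept = [0]                                # always keep the first frame
    for t in range(1, len(features)):
        norm = sum((a - b)**2 for a, b in zip(features[t], features[t-1]))**0.5
        if norm >= threshold:
            kept.append(t)
    return kept

# synthetic 2-dim "cepstral" track: steady vowel, fast transient, steady vowel
steady1 = [[1.0, 0.0]]*10
transient = [[1.0 + 0.3*i, 0.2*i] for i in range(1, 6)]
steady2 = [[2.5, 1.0]]*10
frames = steady1 + transient + steady2

kept = delta_norm_vfr(frames, threshold=0.1)
print(kept)   # the kept indices cluster in the transient region
```

The steady regions collapse to single frames while every transient frame survives, which matches the abstract's observation that the method emphasizes transitions.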

Proceedings ArticleDOI
19 Apr 1994
TL;DR: A new formulation of this method which allows a generalization of its use for any bilinear time-frequency or time-scale representation and the resulting reassigned distributions are easily computable versatile tools which highlight the signal features and preserve many theoretical properties.
Abstract: Reassigning each value of a time-frequency representation to a different location in the plane can produce a better localization of the signal components. This idea, pioneered by Kodera et al. (1976, 1978), was applied only to the spectrogram. We present a new formulation of this method which allows a generalization of its use for any bilinear time-frequency or time-scale representation. The resulting reassigned distributions are easily computable, versatile tools which highlight the signal features and preserve many theoretical properties.
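For the spectrogram case, the reassigned coordinates at a time-frequency point can be computed from auxiliary STFTs taken with modified windows; only the frequency correction, which uses the window's derivative, is shown below. This single-bin sketch follows the standard reassignment formulas rather than any one paper's implementation, and the tone, window, and bin are arbitrary choices.

```python
import cmath, math

N = 64
k_true = 8.3                       # tone deliberately placed between DFT bins 8 and 9
w0 = 2*math.pi*k_true/N
x = [cmath.exp(1j*w0*n) for n in range(N)]

h  = [0.5 - 0.5*math.cos(2*math.pi*n/N) for n in range(N)]     # Hann window
dh = [(math.pi/N)*math.sin(2*math.pi*n/N) for n in range(N)]   # its derivative

def windowed_dft_bin(x, win, k):
    return sum(x[n]*win[n]*cmath.exp(-2j*math.pi*k*n/N) for n in range(N))

k = 8                              # analyze the nearest DFT bin
Xh  = windowed_dft_bin(x, h, k)
Xdh = windowed_dft_bin(x, dh, k)

# reassigned frequency: omega_hat = omega - Im(X_dh * conj(X_h)) / |X_h|^2
w_hat = 2*math.pi*k/N - (Xdh*Xh.conjugate()).imag/abs(Xh)**2
k_hat = w_hat*N/(2*math.pi)
print(k_hat)                       # lands close to 8.3, not the bin center 8
```

The correction moves the energy from the bin center to the signal's true instantaneous frequency, which is why reassigned spectrograms look so much sharper than ordinary ones.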

Journal ArticleDOI
TL;DR: Analysis of velocity responses of individual outer hair cells and Hensen's cells to sinusoidal and amplitude modulated acoustical signals applied at the ear canal reveals information about the preferred vibration frequencies of cells in the inner ear and are useful for deciding among alternative mathematical models of nonlinear cellular dynamics.
Abstract: The short-time Fourier transform (STFT) and the continuous wavelet transform (CWT) are used to analyze the time course of cellular motion in the inner ear. The velocity responses of individual outer hair cells and Hensen's cells to sinusoidal and amplitude modulated (AM) acoustical signals applied at the ear canal display characteristics typical of nonlinear systems, including the generation of harmonic and half-harmonic components. The STFT proves to be valuable for following the time course of the frequency components generated using sinusoidal and AM input signals. The CWT is also useful for analyzing these signals; however, it is generally not as effective as the STFT when octave-band-based CWTs are used. For the transient response, the spectrogram (which is the squared magnitude of the STFT) and the octave-band-based scalogram (which is the squared magnitude of the CWT) prove equally valuable, and the authors have used both to study the responses of these cells to step-onset tones of different frequencies. Such analyses reveal information about the preferred vibration frequencies of cells in the inner ear and are useful for deciding among alternative mathematical models of nonlinear cellular dynamics. A modified Duffing oscillator model yields results that bear some similarity to the data.

Patent
12 Oct 1994
TL;DR: In this paper, a bank of matched filters is used to detect the presence of signals whose frequency content varies with time, where the robust time domain template is assumed to be of the order of w(t)=A(t)cos{2πφ(t)} and the present invention uses the trajectory of a joint time-frequency representation of x(t), as an approximation of the instantaneous frequency function {φ'(t).
Abstract: A system and method for constructing a bank of filters which detect the presence of signals whose frequency content varies with time. The present invention includes a novel system and method for developing one or more time templates designed to match the received signals of interest and the bank of matched filters use the one or more time templates to detect the received signals. Each matched filter compares the received signal x(t) with a respective, unique time template that has been designed to approximate a form of the signals of interest. The robust time domain template is assumed to be of the order of w(t)=A(t)cos{2πφ(t)} and the present invention uses the trajectory of a joint time-frequency representation of x(t) as an approximation of the instantaneous frequency function {φ'(t). First, numerous data samples of the received signal x(t) are collected. A joint time frequency representation is then applied to represent the signal, preferably using the time frequency distribution series (also known as the Gabor spectrogram). The joint time-frequency transformation represents the analyzed signal energy at time t and frequency ƒ, P(t,f), which is a three-dimensional plot of time vs. frequency vs. signal energy. Then P(t,f) is reduced to a multivalued function f(t), a two dimensional plot of time vs. frequency, using a thresholding process. Curve fitting steps are then performed on the time/frequency plot, preferably using Levenberg-Marquardt curve fitting techniques, to derive a general instantaneous frequency function φ'(t) which best fits the multivalued function f(t), a trajectory of the joint time-frequency domain representation of x(t). Integrating φ'(t) along t yields φ(t), which is then inserted into the form of the time template equation. A suitable amplitude A(t) is also preferably determined. Once the time template has been determined, one or more filters are developed which each use a version or form of the time template.
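The detection stage reduces to correlating the received signal against the time template. The sketch below uses a constant-amplitude linear-FM template w(t)=cos(2πφ(t)) with invented sweep parameters and noise level; the patent's actual template construction from the Gabor-spectrogram trajectory is not reproduced here.

```python
import math, random

fs = 400
dur = 0.25
L = int(fs*dur)                       # template length: 100 samples

def template_sample(n):
    t = n/fs
    phi = 20*t + 0.5*(80 - 20)/dur*t*t    # phase integral of a 20 -> 80 Hz sweep
    return math.cos(2*math.pi*phi)

template = [template_sample(n) for n in range(L)]

random.seed(1)
signal = [0.3*random.gauss(0, 1) for _ in range(fs)]   # 1 s of background noise
start = 150
for i, v in enumerate(template):
    signal[start + i] += v            # bury the chirp at a known offset

def matched_filter(sig, tmpl):
    # sliding inner product of the signal with the time template
    M = len(tmpl)
    return [sum(sig[m+i]*tmpl[i] for i in range(M)) for m in range(len(sig) - M + 1)]

out = matched_filter(signal, template)
det = max(range(len(out)), key=lambda m: out[m])
print(det)   # detection peak near the true offset of 150
```

Because the template matches the chirp's phase trajectory, the correlation adds the full template energy coherently at the correct lag while the noise adds incoherently, which is what makes the matched filter optimal for a known waveform in white noise.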

Proceedings ArticleDOI
19 Apr 1994
TL;DR: A new algorithm for tracking formants automatically is presented that provides reliable formant trajectories, especially in regions where a knowledge of the formant transitions gives important information about the place of articulation of consonants.
Abstract: A new algorithm for tracking formants automatically is presented. From rough formant hypotheses a regularization method is used to provide formant trajectories both close to the spectrogram edge lines and sufficiently regular. The formant hypotheses are obtained by labelling edge lines of LPC or cepstrally smoothed spectrograms in terms of formants. Speech knowledge, in the form of admissible domains for F1, F2, F3, F1 vs. F2, F2 vs. F3 and of formant level, is used to obtain consistent labellings. The advantage of this algorithm is that it provides reliable formant trajectories, especially in regions where a knowledge of the formant transitions gives important information about the place of articulation of consonants. We present very encouraging tracking results on a corpus of sentences consisting of stops and vowels.

Journal ArticleDOI
TL;DR: In this paper, the accuracy of the analysis of rapidly varying formants using spectrogram and linear prediction is assessed, and it is shown that the most accurate analysis using a quasistationary method is made when windows are positioned pitch synchronously.
Abstract: In this paper, the accuracy of the analysis of rapidly varying formants using the spectrogram and linear prediction is assessed. Analysis of various dynamic signals shows that, when a long analysis window, such as 25 ms, is used, the quality of the representation may be impoverished. Obvious unwanted effects are staircase-like formant tracks, flattening-off of formants close to voicing onset, and bending of the formant towards a strong energy concentration in the release burst. The parameters that have the largest influence on the quality of the representation are the length of the analysis window, the transition rate of the formant, the fundamental frequency, and the position and energy of the release burst. It is shown that the most accurate analysis using a quasistationary method is made when windows are positioned pitch synchronously. Finally, a quantitative analysis of the influence of the mentioned parameters provides evidence that no deviations due to the quasistationarity assumption occur when the effecti...

Proceedings ArticleDOI
25 Oct 1994
TL;DR: In this article, the Wigner-Gabor-Qian (WGQ) spectrogram is used to study the frequency pattern of nonlinear dynamical systems; it provides a good time-frequency representation with few cross-interference terms.
Abstract: Time-frequency representation is helpful in studying the frequency pattern of nonlinear dynamical systems. Specifically, the Wigner-Gabor-Qian (WGQ) spectrogram, a synthesis of the Wigner distribution and the Gabor expansion through the time-frequency distribution series, is a very useful tool because it achieves a good time-frequency representation with few cross-interference terms. The fine structure of frequency patterns, such as sub-harmonics of chaotic dynamics, can be revealed by the WGQ spectrogram. Frequency patterns of chaos and noise are studied for system identification in empirical analysis. Time-frequency analysis provides important information for pattern recognition and system identification in analyzing empirical time series.

Journal ArticleDOI
TL;DR: In this paper, a modified pseudo-Wigner-Ville distribution (PWVD) was applied to the spectrogram for time-frequency analysis of monocomponent signals.

Proceedings ArticleDOI
25 Oct 1994
TL;DR: The results do not show a great improvement in the readability of the representation, due to the presence of many components in the speech signal, but the reallocation defined by Kodera et al. (1978) seems a good way to improve the localisation of the spectrogram.
Abstract: The limited joint time and frequency resolution of Fourier analysis makes an accurate analysis of speech signals difficult. Fourier analysis offers either good temporal accuracy or good frequency resolution, never both. Many methods have been proposed to overcome this limitation. The results do not show a great improvement in the readability of the representation, due to the presence of many components in the speech signal. The reallocation defined by Kodera et al. (1978) seems a good way to improve the localisation of the spectrogram. Recent work simplified the implementation of this method, which makes it attractive. This paper explores the applicability of this method to the analysis of speech signals.

Proceedings Article
01 Jan 1994
TL;DR: This paper shows improved methods for spectrogram inversion (conventional pattern playback), inversion of a cochlear model, and inversion of the correlogram representation, a non-linear representation of sound.
Abstract: Deciding the appropriate representation to use for modeling human auditory processing is a critical issue in auditory science. While engineers have successfully performed many single-speaker tasks with LPC and spectrogram methods, more difficult problems will need a richer representation. This paper describes a powerful auditory representation known as the correlogram and shows how this non-linear representation can be converted back into sound, with no loss of perceptually important information. The correlogram is interesting because it is a neurophysiologically plausible representation of sound. This paper shows improved methods for spectrogram inversion (conventional pattern playback), inversion of a cochlear model, and inversion of the correlogram representation.

Proceedings ArticleDOI
25 Oct 1994
TL;DR: In this paper, the authors apply a resolution enhancement (superresolution) step to each block of samples before it is used in the computation of a TFD, which produces a stationary extension of the given block of data to a prescribed length.
Abstract: Many time-frequency distribution (TFD) implementations involve block processing of samples of the signal under analysis. The extraction of a finite amount of data and the windowing that often follows it reduce frequency domain resolution. This paper illustrates the possible benefits that can be achieved by applying a resolution enhancement (superresolution) step to each block of samples before it is used in the computation of a TFD. The data extension is performed here with adaptive weighted norm extrapolation, a technique that produces a stationary extension of the given block of data to a prescribed length. Examples are shown where the approach is used in block processing implementations of the spectrogram, the Choi-Williams, and the adaptive radially Gaussian distributions.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the application of other smoothed Wigner distributions (SWDs) to the analysis and display of Doppler ultrasound signals, and compared the performance of the pseudo-Wigner distribution and the spectrogram as a function of Doppler bandwidth for time-varying flow.
Abstract: Existing pulsed Doppler ultrasound systems apply the spectrogram as a tool for analysis and display of signals scattered from the blood. The spectrogram is a time-frequency representation (TFR) of a signal that maps a one-dimensional signal of time into a two-dimensional function of time and frequency. The analysis of Doppler ultrasound signals requires application of a two-dimensional TFR rather than one-dimensional spectral representations due to the nonstationary nature of the signals scattered from blood. The classical spectrogram is a smoothed Wigner distribution (SWD) with a specific smoothing function. For this smoothing function, the smoothing, and hence the resolution in time and frequency, cannot be controlled independently. The purpose of this study is to examine the application of other SWDs to analysis and display of Doppler ultrasound signals. The present paper concentrates on the pseudo-Wigner distribution (PWD). The PWD and the spectrogram are examined and compared as analysis tools for nonstationary Doppler ultrasound signals. The performance of these two TFRs as a function of Doppler bandwidth is evaluated and compared for time-varying flow.
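A minimal sketch of a discrete pseudo-Wigner distribution (a Wigner distribution windowed only along the lag axis), using the analytic signal to avoid aliasing; the window and lengths are illustrative, and the magnitude is shown for simplicity (the exact PWD is real-valued):

```python
import numpy as np
from scipy.signal import hilbert

def pseudo_wigner(x, fs, lag_window=64):
    """Pseudo-Wigner distribution: WVD with a window applied in the lag axis."""
    z = hilbert(x)                      # analytic signal
    N, L = len(z), lag_window
    h = np.hanning(2 * L)               # lag-domain smoothing window
    W = np.zeros((2 * L, N))
    for n in range(N):
        kern = np.zeros(2 * L, dtype=complex)
        for m in range(-L, L):
            if 0 <= n + m < N and 0 <= n - m < N:
                kern[m + L] = z[n + m] * np.conj(z[n - m])
        kern *= h
        # FFT over lag; bin k maps to frequency k * fs / (4 * L)
        W[:, n] = np.abs(np.fft.fft(np.fft.ifftshift(kern)))
    freqs = np.arange(2 * L) * fs / (4 * L)
    return W, freqs

fs, f0 = 1000, 200
t = np.arange(256) / fs
W, freqs = pseudo_wigner(np.sin(2 * np.pi * f0 * t), fs)
```

Because only the lag axis is windowed, time resolution is set by the signal itself while frequency smoothing is controlled by `lag_window`, unlike the spectrogram where one window fixes both.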

Proceedings ArticleDOI
10 May 1994
TL;DR: In this paper, the correction of spectrograms is posed as numerically solving an integral, convolution-type equation of the first kind, whose kernel may or may not be causal, that models the instrumental imperfections.
Abstract: Raw spectrograms are subject to systematic errors of an instrumental type that may be reduced provided a mathematical model of the instrumental imperfections is identified. It is assumed in the paper that this model has the form of an integral, convolution-type equation of the first kind whose kernel may be causal or not. The correction of the spectrograms consists in numerically solving this equation on the basis of the noisy data acquired by a spectrometer. An algorithm of correction is proposed which is based on the approximation of the solution with a spline function whose parameters are determined by means of a recursive Kalman-filter-based algorithm with a non-negativity constraint imposed on the set of feasible solutions. It is shown, using spectrophotometric data, that an improvement in the resolution of the spectrometer can be attained.
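The paper's spline/Kalman corrector is not reproduced here; as a simplified stand-in, a Tikhonov-regularized deconvolution with non-negativity clipping shows the shape of the correction step. The Gaussian instrument kernel and the regularization weight are hypothetical:

```python
import numpy as np

def correct_spectral_line(y, kernel, lam=1e-4):
    """Deblur one spectral line y = kernel * x + noise (Tikhonov + clipping)."""
    n = len(y)
    # Convolution matrix of the instrument kernel ('same' support, symmetric kernel)
    A = np.zeros((n, n))
    half = len(kernel) // 2
    for i in range(n):
        for k, c in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                A[i, j] = c
    x = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
    return np.clip(x, 0.0, None)      # feasible (non-negative) spectra only

# Demo: two narrow spectral peaks blurred by a broad instrument response
true = np.zeros(100); true[40] = 1.0; true[55] = 0.8
kernel = np.exp(-0.5 * (np.arange(-10, 11) / 4.0) ** 2)
kernel /= kernel.sum()
blurred = np.convolve(true, kernel, mode="same")
restored = correct_spectral_line(blurred, kernel)
```

The clipping plays the role of the paper's non-negativity constraint on the feasible set; the recursive Kalman formulation would process the data sequentially instead of solving one dense system.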

Journal ArticleDOI
TL;DR: A graphical interface for generating voiced speech using a frequency-domain implementation of the Klatt (1980) cascade formant synthesizer that provides a useful tool for investigating the perceptually salient properties of voiced speech and other sounds.
Abstract: In this report we describe a graphical interface for generating voiced speech using a frequency-domain implementation of the Klatt (1980) cascade formant synthesizer. The input to the synthesizer is a set of parameter vectors, called tracks, which specify the overall amplitude, fundamental frequency, formant frequencies, and formant bandwidths at specified time intervals. Tracks are drawn with the aid of a computer mouse that can be used either in point-draw mode, which selects a parameter value for a single time frame, or in line-draw mode, which uses piecewise linear interpolation to connect two user-selected endpoints. Three versions of the program are described: (1) SYNTH draws tracks on an empty time-frequency grid, (2) SPECSYNTH creates a spectrogram of a recorded signal upon which tracks can be superimposed, and (3) SWSYNTH is similar to SPECSYNTH, except that it generates sine-wave speech (Remez, Rubin, Pisoni, & Carrell, 1981) using a set of time-varying sinusoids rather than cascaded formants. The program is written for MATLAB, an interactive computing environment for matrix computation. Track-Draw provides a useful tool for investigating the perceptually salient properties of voiced speech and other sounds.
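The sine-wave variant (SWSYNTH) reduces to summing time-varying sinusoids whose frequency and amplitude follow the drawn tracks; a minimal sketch with made-up endpoint values standing in for mouse-drawn tracks:

```python
import numpy as np

def synth_from_tracks(times, freq_tracks, amp_tracks, fs=8000):
    """Sum time-varying sinusoids defined by piecewise-linear tracks."""
    t = np.arange(0, times[-1], 1 / fs)
    out = np.zeros_like(t)
    for f_pts, a_pts in zip(freq_tracks, amp_tracks):
        f = np.interp(t, times, f_pts)          # linear interp, like line-draw mode
        a = np.interp(t, times, a_pts)
        phase = 2 * np.pi * np.cumsum(f) / fs   # integrate frequency -> phase
        out += a * np.sin(phase)
    return out

# Hypothetical three-"formant" tracks specified at two endpoints each
times = [0.0, 0.5]
freqs = [[500, 700], [1500, 1200], [2500, 2400]]   # Hz
amps  = [[1.0, 1.0], [0.6, 0.6], [0.3, 0.3]]
y = synth_from_tracks(times, freqs, amps)
```

Integrating frequency to get phase (rather than evaluating `sin(2*pi*f*t)` directly) keeps the waveform continuous as the track frequency changes.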

Journal ArticleDOI
TL;DR: Methods were developed for detecting the vocalizations of three species of mysticete: blue, finback, and minke whales.
Abstract: The automatic detection of animal calls has several potential applications: for range, distribution, and census efforts; for acoustic behavior studies, both local and wide area; for screening of large volumes of data for sounds of interest. Methods were developed for detecting the vocalizations of three species of mysticete: blue, finback, and minke whales. Each call is modeled as a sequence of either bandlimited pulses or frequency sweeps. Sweeps and pulses are detected by cross‐correlating a specially designed kernel with a spectrogram of the target sound signal; the kernel is built from the call model, and includes excitatory regions corresponding to the sweeps or pulses in the call, and flanking inhibitory regions to inhibit response to noise and interfering sounds. This method produces as output a time series with values corresponding to the likelihood that the modeled call is present. A time‐windowed autocorrelation is then performed on this output, and the result of that is percentile‐normalized a...
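The detection step, cross-correlating an excitatory/inhibitory kernel with a spectrogram, can be sketched as follows; the synthetic downsweep, sample rate, and kernel weights are illustrative assumptions, not the authors' parameters:

```python
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(1)
fs = 1000
t = np.arange(4 * fs) / fs
sig = rng.normal(0.0, 1.0, t.size)
# Insert a 1 s downsweep (200 -> 100 Hz) starting at t = 1.5 s
sweep_t = np.arange(fs) / fs
sweep = np.sin(2 * np.pi * (200 * sweep_t - 50 * sweep_t ** 2))
sig[int(1.5 * fs):int(2.5 * fs)] += 3 * sweep

f, times, S = spectrogram(sig, fs=fs, nperseg=128, noverlap=64)

# Kernel: +1 along the expected sweep trajectory, -0.5 in flanking bands
n_steps = 15                                     # kernel length in frames
kernel = np.zeros((len(f), n_steps))
for j in range(n_steps):
    fr = 200 - 100 * (j * 64 / fs)               # expected frequency at frame j
    i = int(np.argmin(np.abs(f - fr)))
    kernel[i, j] = 1.0
    kernel[max(i - 3, 0), j] = -0.5              # inhibitory flanks
    kernel[min(i + 3, len(f) - 1), j] = -0.5

# Slide the kernel along the spectrogram: one likelihood value per start frame
score = np.array([np.sum(kernel * S[:, k:k + n_steps])
                  for k in range(S.shape[1] - n_steps)])
```

The inhibitory flanks make the score small for broadband noise, which excites excitatory and inhibitory regions alike, while a matching sweep excites only the positive trajectory.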

Proceedings ArticleDOI
27 Jun 1994
TL;DR: An ART-1 neural network is applied to the detection and identification of worn cutting tools on a turning center, based on vibration signals collected from an accelerometer and represented as time-frequency spectrograms.
Abstract: A neural network is applied for the detection/identification of worn cutting tools on a turning center. The vibration signal collected from an accelerometer is first transformed into a time-frequency spectrogram. The spectrogram is then normalized based on either a statistical thresholding method or a stack representation of the spectrogram. The set of processed binary input images is then clustered adaptively using an ART-1 neural network.
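The statistical-thresholding step that produces the binary input image can be sketched like this; the global mean-plus-k-standard-deviations rule and the test signal are assumptions for illustration, not the paper's exact method:

```python
import numpy as np
from scipy.signal import spectrogram

def binarize_spectrogram(x, fs, k=2.0, nperseg=128):
    """Mark spectrogram cells more than k std devs above the global mean (in dB)."""
    f, t, S = spectrogram(x, fs=fs, nperseg=nperseg)
    logS = 10 * np.log10(S + 1e-12)
    thresh = logS.mean() + k * logS.std()
    return (logS > thresh).astype(np.uint8), f, t

rng = np.random.default_rng(0)
fs = 5000
t = np.arange(fs) / fs
x = rng.normal(0.0, 0.1, t.size)
x[2000:3000] += np.sin(2 * np.pi * 800 * t[2000:3000])  # transient tone burst
img, f, tt = binarize_spectrogram(x, fs)
```

The resulting 0/1 image is the kind of binary pattern ART-1 requires as input, since ART-1 (unlike ART-2) clusters only binary vectors.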

Proceedings ArticleDOI
26 Jun 1994
TL;DR: This work reviews several methods for combining time-dependent spectral estimates that result from different evolutionary spectral models and explores the effect of noise on these combination methods.
Abstract: We review several methods for combining time-dependent spectral estimates that result from different evolutionary spectral models. Examples are presented to illustrate the value of each combination method. In addition, we explore the effect of noise on these combination methods.

Proceedings ArticleDOI
25 Sep 1994
TL;DR: This new method calculates the crosscorrelation values between adjacent segments of the digitized cry and plots them in a three-dimensional pattern of intensity versus frequency versus time, in a manner similar to that of the spectrogram.
Abstract: The plots generated by this method are well suited to the visual identification and determination of the vocal fundamental frequency (F0). This new method calculates the crosscorrelation values between adjacent segments of the digitized cry. These values can then be plotted in a three-dimensional pattern of intensity versus frequency versus time, in a manner similar to that of the spectrogram. This plot, called a crosscorrelogram, indicates periodicity by the presence of strong peaks and valleys. These crosscorrelation values can be further manipulated to determine which peak, at any given time, corresponds to the correct F0. Rapid changes and the progression of the fundamental frequency over time can be tracked as well, achieving an accuracy and granularity that was not previously possible when using traditionally successful and popular F0 extraction techniques such as cepstral analysis or the linear predictive coding (LPC) based methods. In addition, the crosscorrelogram visually provides more detail regarding F0 evolution than does the spectrogram, since the latter uses a fixed window size that includes a varying number of pitch periods per window, for narrowband spectrograms, resulting in an average or "smeared" F0 value being displayed.
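The core computation, cross-correlating each segment with lagged continuations of the signal and reading F0 off the strongest peak's lag, can be sketched as follows; frame size, lag range, and the synthetic test signal are illustrative assumptions:

```python
import numpy as np

def crosscorrelogram_f0(x, fs, frame=400, min_f0=150, max_f0=600):
    """Per-frame F0 from cross-correlation of a segment with what follows it."""
    lo, hi = int(fs // max_f0), int(fs // min_f0)
    f0s = []
    for s in range(0, len(x) - frame - hi, frame):
        seg = x[s:s + frame]
        # Correlate this segment against lagged continuations of the signal
        r = [np.dot(seg, x[s + lag:s + lag + frame]) for lag in range(lo, hi + 1)]
        # Strongest peak's lag is one pitch period
        f0s.append(fs / (lo + int(np.argmax(r))))
    return np.array(f0s)

fs, f0 = 8000, 220.0
t = np.arange(fs) / fs
cry = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
est = crosscorrelogram_f0(cry, fs)
```

Plotting the full lag-by-time array `r` (rather than just the argmax) would give the crosscorrelogram image the paper describes, with periodicity visible as horizontal ridges.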

Proceedings ArticleDOI
25 Oct 1994
TL;DR: In this article, the authors developed instantaneous power and frequency estimators for the components analyzed by the discrete time-frequency distribution (TFD) in Cohen's class for multicomponent musical signals.
Abstract: In a previous article by Pielemeier and Wakefield (see Proc. of IEEE Symposium on Time-Frequency and Time-Scale Analysis, Oct. 4-6, p.421-424, 1992) a discrete time-frequency distribution (TFD) in Cohen's class was developed for multicomponent musical signals. This TFD's separable kernel employs low-pass filtering in time to achieve limited superposition between components, and either constant-bandwidth or constant-Q smoothing in frequency. We develop instantaneous power and frequency estimators for the components analyzed by this TFD. In the literature, frequency estimators for discrete distributions often compute frequency based on discrete finite phase differences using periodic statistics. We instead start with the underlying continuous analog signal, and using linear statistics, show that estimates from the discrete distribution can be made arbitrarily accurate for the single component case, while for the multicomponent case, the estimates are minimally biased by time smoothing. Results are demonstrated showing much less bias than common spectrogram estimators. Multicomponent examples include signals which are inharmonic and contain widely varying levels, and AM and FM signals.
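The phase-difference style of instantaneous-frequency estimation that the authors improve upon can be sketched in a few lines (analytic-signal version with a continuous unwrapped phase; the chirp test signal is illustrative):

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """Instantaneous frequency from the unwrapped phase of the analytic signal."""
    z = hilbert(x)
    phase = np.unwrap(np.angle(z))
    # Central difference of phase: radians/sample -> Hz
    return np.gradient(phase) * fs / (2 * np.pi)

fs = 4000
t = np.arange(fs) / fs
# Linear chirp from 200 Hz to 400 Hz over one second
x = np.sin(2 * np.pi * (200 * t + 100 * t ** 2))
f_inst = instantaneous_frequency(x, fs)
```

For a single clean component this finite phase difference is already accurate; the paper's contribution is handling multiple components, where such estimators become biased by cross-component interference and time smoothing.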