Showing papers on "Spectrogram published in 1988"


Patent
09 Jun 1988
TL;DR: This patent presents an apparatus and method for using markers to identify particular points on a quasi-3-dimensional display, such as a color spectrogram display or a waterfall display of multiple frequency spectra on an electronic spectrum analyzer.
Abstract: An apparatus and method of using markers for identifying particular points on a quasi-3-dimensional display, such as a color spectrogram display or a waterfall display of multiple frequency spectra on an electronic spectrum analyzer, so that amplitude, time, and frequency values associated with a particular point can be conveniently read out, and so that differences in amplitude, time, and frequency between two points can be easily calculated and presented to the user. Two markers whose positions are ascertainable are generated on the quasi-3-dimensional display and are made subject to operator control. One of these markers is positioned by the operator on a particular point of interest and the values associated with that location are then displayed for readout with greater precision and convenience than would otherwise be possible. A second marker is placed at a second point of interest and the differences in the values of amplitude, time, and frequency between the two points are calculated and displayed.

33 citations
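
The two-marker readout described in the patent abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented apparatus: the `Marker` type, the `spec[time][frequency]` grid layout, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker: a (time, frequency) cell on the display grid."""
    time_index: int
    freq_index: int


def read_marker(spec, times, freqs, marker):
    """Return (time, frequency, amplitude) at the marker position."""
    amplitude = spec[marker.time_index][marker.freq_index]
    return times[marker.time_index], freqs[marker.freq_index], amplitude


def marker_delta(spec, times, freqs, first, second):
    """Return the differences (second - first) in time, frequency, amplitude."""
    t1, f1, a1 = read_marker(spec, times, freqs, first)
    t2, f2, a2 = read_marker(spec, times, freqs, second)
    return t2 - t1, f2 - f1, a2 - a1
```

Here `spec` is assumed to hold one spectrum per time step, as in a waterfall display, so a marker is fully described by two indices.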


Journal ArticleDOI
TL;DR: The smoothed pseudo Wigner estimator is used to suppress the cross-terms that characterize the Wigner distribution of multi-component signals, and the resulting SPWD is shown to give much better resolution on speech data than the spectrogram.
Abstract: The Wigner distribution is shown to be a useful tool for the analysis of speech data. Cross-terms which characterize the Wigner distribution of multi-component signals have been suppressed using the smoothed pseudo Wigner estimator. This estimator possesses several advantages over the short time periodogram. Comparison of the smoothed pseudo Wigner distribution (SPWD) with the spectrogram reveals that much better resolution is achieved by the SPWD.

14 citations



Journal ArticleDOI
TL;DR: The improvement gained by extracting parameters from smoothed spectrograms, rather than from raw spectrograms, is demonstrated, and the method is used successfully in a reference value study of cardiac Doppler signals.
Abstract: A computer processing method has been developed for the extraction of parameters from cardiac Doppler signals. This method is based on the nature of these signals and on the method of their measurement. The parameters are estimated after background subtraction from adequately smoothed spectrograms. The improvement gained by this method of parameter extraction from smoothed spectrograms, rather than from raw spectrograms, is demonstrated. The method is used successfully in a reference value study of cardiac Doppler signals.

10 citations
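
The processing order named in the cardiac Doppler abstract above (smooth, subtract background, then estimate parameters) can be sketched as follows. The abstract gives no kernel, background estimate, or parameter definitions, so the moving-average smoother, the constant background level, and the intensity-weighted mean frequency used here are all assumptions chosen for illustration.

```python
def smooth(values, width=3):
    """Moving average over `width` neighbours; windows shrink at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def subtract_background(values, background):
    """Remove an assumed constant background level, clipping at zero."""
    return [max(0.0, v - background) for v in values]


def mean_frequency(power, freqs):
    """Intensity-weighted mean frequency of one spectrum, or None if empty."""
    total = sum(power)
    if total == 0:
        return None
    return sum(p * f for p, f in zip(power, freqs)) / total
```

Applied to one spectrum of the spectrogram, the three steps chain as `mean_frequency(subtract_background(smooth(spectrum), level), freqs)`, where `level` is a hypothetical background estimate.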


Patent
17 Mar 1988
TL;DR: This patent introduces fuzzy vector quantization to decrease quantization distortions while using the conventional code book, and to enable the normalization of a spectrogram with high accuracy without increasing the learning words in speaker adaptation.
Abstract: PURPOSE: To decrease quantization distortions while using the conventional code book and to enable the normalization of a spectrogram with high accuracy without increasing the learning words in speaker adaptation by introducing the fuzzy vector quantization which expresses input vectors by the degree of reversion to the existing code book. CONSTITUTION: The fuzzy vector quantization which expresses the input vector by the degree of the reversion from the existing code vector is executed in accordance with a digitized speech signal and thereafter, the spectrogram is extracted and the correspondence is executed between the different speakers with respect to the code book of the vector quantization. The spectrogram is normalized in accordance with this correspondence. The fuzzy vector quantization expressing the input vector by the degree of reversion to the existing code book is thereby executed and the correspondence between the different speakers is executed by using the conventional code book. The normalization of the spectrogram is executed in accordance with this correspondence. The quantization distortions are thereby decreased and the normalization of the spectrogram is executed with the high accuracy without increasing the learning words in the speaker adaptation. COPYRIGHT: (C)1989,JPO&Japio

7 citations
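
The fuzzy vector quantization in the patent above expresses an input vector by graded degrees over the existing code vectors instead of one nearest index. The abstract gives no formula, so the sketch below borrows the membership rule from fuzzy c-means as an assumption; the `fuzziness` parameter and both function names are hypothetical.

```python
import math


def fuzzy_memberships(x, codebook, fuzziness=2.0):
    """Degrees of membership of x in each code vector (they sum to 1).

    Assumed rule (fuzzy c-means style): membership is proportional to
    distance ** (-2 / (fuzziness - 1)).
    """
    dists = [math.dist(x, c) for c in codebook]
    for i, d in enumerate(dists):
        if d == 0.0:  # x coincides with a code vector: crisp assignment
            return [1.0 if j == i else 0.0 for j in range(len(codebook))]
    exponent = 2.0 / (fuzziness - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def reconstruct(memberships, codebook):
    """Membership-weighted combination of the code vectors."""
    dim = len(codebook[0])
    return [sum(u * c[k] for u, c in zip(memberships, codebook))
            for k in range(dim)]
```

With the code book `[[0.0], [2.0]]` and the input `[1.0]`, hard quantization must pick one code vector and incur a distortion of 1, while the weighted reconstruction recovers the input; this is the kind of distortion reduction the abstract refers to, shown here on a toy case only.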


Journal ArticleDOI
TL;DR: This paper describes a method for relating speech recognition performance to HMM structure using a best‐path (Viterbi) trace of the model through the input data, which is represented using well‐known LPC parameters.
Abstract: Hidden Markov models have gained wide acceptance in speech recognition due to the ability to construct optimum (maximum likelihood) models automatically from speech data. Understanding how recognition performance is related to a model's structure is not a simple matter, however, especially where the structure is complex. Thus analysis of errors, especially in terms of the properties of the speech data, is not often undertaken. This paper describes a method for relating speech recognition performance to HMM structure. The basic tool is a best‐path (Viterbi) trace of the model through the input data, which is represented using well‐known LPC parameters. Linear predictor parameters are used as auxiliary model parameters, independent of the HMM output parameters used for recognition, in order to take advantage of established LPC speech synthesis and LPC speech spectrogram utilities. The trace can then be used to gain insight into error mechanisms, often leading to improvements in model structure and system pe...

6 citations
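
The best-path (Viterbi) trace that the HMM paper above uses as its basic tool can be sketched for a small discrete-observation model. This is the textbook algorithm, not the authors' system: their auxiliary LPC parameters are not modelled, and the argument layout (`init[s]`, `trans[prev][next]`, `emit[state][symbol]`) is an assumption.

```python
import math


def _log(p):
    """Log probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0.0 else -math.inf


def viterbi(init, trans, emit, observations):
    """Return the most likely state sequence for the observation symbols."""
    n_states = len(init)
    score = [_log(init[s]) + _log(emit[s][observations[0]])
             for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best = max(range(n_states),
                       key=lambda p: score[p] + _log(trans[p][s]))
            new_score.append(score[best] + _log(trans[best][s])
                             + _log(emit[s][obs]))
            pointers.append(best)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

The returned path aligns each input frame with one model state, which is what makes it possible to inspect which states were occupied where a recognition error occurred.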


Journal ArticleDOI
L.R. Morris
TL;DR: The author describes how a TMS32010-based TI-Speech PC board from Texas Instruments was combined with an off-the-shelf PC graphics board to produce a real-time speech spectrograph system that produces high-quality spectrograms within 5 s of speech input.
Abstract: The author describes how a TMS32010-based TI-Speech PC board from Texas Instruments was combined with an off-the-shelf PC graphics board to produce a real-time speech spectrograph system. The Realtime Spectral Lab (RSL) is a relatively inexpensive, PC-based, commercial-quality system. It produces high-quality spectrograms of up to 2 s of speech within 5 s of speech input. Also discussed are the algorithms chosen and the software structures evolved to implement the system. The resulting instrument is flexible. Dual spectrograms can show either simultaneous wideband and narrowband analyses of the same utterance, or equal frequency-resolution versions with differing time resolution. In addition, users can compare a target spectrogram to changing, real-time input. They can also mark a wideband spectrogram with cursors and expand the selected time segment into another, fine time-resolution spectrogram. The discussion is a case study of nontrivial DSP (digital signal processing) systems design and implementation by hardware selection and DSP programming alone.

5 citations
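
The wideband versus narrowband trade-off mentioned in the abstract above comes from the analysis window length: a short window gives many frames with coarse frequency bins, a long window gives few frames with fine bins. The sketch below uses a naive DFT with a Hann window to show this; it is not the RSL algorithm, and the window lengths and function names are assumptions.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, window_length, hop):
    """Magnitude spectrogram: one list of window_length // 2 + 1 bins per frame."""
    window = hann(window_length)
    n_bins = window_length // 2 + 1
    frames = []
    for start in range(0, len(signal) - window_length + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(window_length)]
        frames.append([
            abs(sum(segment[i]
                    * cmath.exp(-2j * math.pi * k * i / window_length)
                    for i in range(window_length)))
            for k in range(n_bins)
        ])
    return frames


def bin_spacing_hz(sample_rate, window_length):
    """Spacing between adjacent frequency bins in hertz."""
    return sample_rate / window_length
```

At an assumed 8 kHz sample rate, a 32-sample window gives 250 Hz bins (wideband-like) and a 128-sample window gives 62.5 Hz bins (narrowband-like), at the cost of four times fewer independent frames.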


Journal ArticleDOI
TL;DR: In this article, the authors examined spectrograms and Wigner distributions of a set of synthetic speech utterances, and performed a simple formant tracking task (initial slope estimation) on them.
Abstract: The Wigner distribution, a time‐frequency signal representation similar to the spectrogram, has recently been applied to speech analysis. It offers higher resolution than the spectrogram, but introduces artifacts that do not correspond to the components of the signal. These artifacts seem to be characterized by negative values of the Wigner distribution. If so, skilled experimental subjects should be able to visually identify and disregard the artifacts. Subjects examined spectrograms and Wigner distributions of a set of synthetic speech utterances, and performed a simple formant‐tracking task (initial slope estimation) on them. The subjects performed this task marginally better using the Wigner distribution than the spectrogram. This performance advantage of the Wigner distribution held over the range of formant trajectories that the synthesizer could reliably produce. Performance depended critically on the subjects' understanding of artifacts. A theoretical prediction of a limit on formant slope, previously untested, was consistent with both spectrograms and Wigner distributions of natural speech. The overall conclusion is that the Wigner distribution is a viable alternative to the spectrogram for analysis of rapid spectrum changes. However, the burden of its greater complexity seems to outweigh its potential performance advantages.

4 citations


Book ChapterDOI
01 Jan 1988
TL;DR: A paradigm for the extraction and interpretation of speech knowledge contained in speech spectrograms is proposed which attempts to integrate knowledge-based extraction of relevant speech properties and statistical modelling of their distortions.
Abstract: Speaker-independent recognition of large or difficult vocabularies by computers is still an unsolved task, even if the words are pronounced in an isolated manner. Using existing knowledge about production and perception of speech, phonemes, diphones and syllables can be useful for conceiving prototypes of Speech Units. Speech Unit prototypes can be characterized by a redundant set of Acoustic properties. Automatic Speech Recognition (ASR) systems based on acoustic property descriptors are not very efficient if the set of properties used and the algorithms for their extraction are not well chosen and conceived. For this reason, it is worth investigating property descriptors based on those properties that are expected to be robust speaker-independent cues of fundamental phonetic events. The speech spectrogram is an invaluable tool in ASR research and it contains rich acoustic and phonetic knowledge about speech. Expert human spectrogram readers are able to interpret speech spectrograms by visual examination. The interpretation is usually based on the experts' linguistic knowledge and on correlating this knowledge with the characteristic pattern of speech. Machines can have similar capability if patterns of various speech units can be collected, described, and learned statistically. Based on the above considerations, this work proposes a paradigm for the extraction and interpretation of speech knowledge contained in speech spectrograms. The model proposed attempts to integrate knowledge-based extraction of relevant speech properties and statistical modelling of their distortions. The speech spectrograms are considered as patterns and knowledge contained in these patterns is described as morphologies and represented as a taxonomy. The recognition model uses a framework of Procedural Network which uses networks of actions performing variable depth analysis and integrates cognitive and information-theoretic approaches. Experimental results are reported for a large number of speakers using digits and letters as test data.

3 citations


Journal ArticleDOI
TL;DR: In this paper, the smoothed pseudo-Wigner distribution (SPWD) is applied to speech signals to improve the quality of the time-frequency representation of speech signals.
Abstract: Recently, a great deal of interest has been shown in applying the Wigner‐Ville distribution to obtain time‐frequency energy representation of nonstationary signals. The Wigner distribution has proved useful in analyzing monocomponent signals. In the case of multicomponent signals, the Wigner distribution adds cross terms without any physical significance to the time‐frequency distribution. The presence of cross terms obscures the actual spectral features of interest and makes the results very misleading. To apply the Wigner‐Ville distribution to speech signals, it is essential to remove all cross terms not of interest. This problem may be solved by smoothing the Wigner‐Ville distribution independently in time and frequency directions using the “smoothed” pseudo‐Wigner estimator. This estimator possesses several advantages over the short time periodogram. This presentation will concentrate on the application of the smoothed pseudo‐Wigner distribution (SPWD) to speech signals and will demonstrate the ability of the SPWD to improve the quality of the time‐frequency representation of speech signals. In particular, a comparison will be made between the spectrogram and the SPWD, and it will be shown how high‐frequency spectral features may be easily detected from an SPWD but are far less obvious in the spectrogram. Moreover, it will be shown that the SPWD is more appropriate for the analysis of formant structure.

2 citations
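
The cross terms discussed in the abstract above, and the effect of smoothing in time, can be reproduced on a toy two-tone signal. The sketch below is a bare discrete pseudo-Wigner distribution with a rectangular lag window and a plain time average; a real SPWD uses chosen smoothing windows in both time and frequency, so the window choices and function names here are assumptions.

```python
import cmath
import math


def pseudo_wigner(x, n, n_lags):
    """Pseudo-Wigner spectrum of complex signal x at time index n.

    Bin k corresponds to k / (2 * n_lags) cycles per sample. The real part
    is kept, as the lag window here is not exactly symmetric.
    """
    half = n_lags // 2
    lags = range(-half, half)
    kernel = [x[n + m] * x[n - m].conjugate() for m in lags]
    return [
        sum(kernel[i] * cmath.exp(-2j * math.pi * k * m / n_lags)
            for i, m in enumerate(lags)).real
        for k in range(n_lags)
    ]


def time_smoothed(x, n, n_lags, span):
    """Average the pseudo-Wigner spectrum over `span` consecutive times."""
    rows = [pseudo_wigner(x, n + d, n_lags) for d in range(span)]
    return [sum(column) / span for column in zip(*rows)]


def tone(cycles, length):
    """Complex exponential with `cycles` periods over `length` samples."""
    return [cmath.exp(2j * math.pi * cycles * t / length)
            for t in range(length)]
```

For the sum of `tone(4, 64)` and `tone(12, 64)` with 32 lags, the two components appear in bins 4 and 12, and a cross term appears midway in bin 8. The cross term oscillates in time and takes negative values, so averaging over one oscillation period (8 samples here) cancels it while leaving the two components unchanged.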


Book ChapterDOI
01 Jan 1988
TL;DR: This paper describes a new technique for finding objects in spectrograms by starting with a multiscale representation of speech spectra, and illustrates the idea with an application to the formant-tracking task.
Abstract: This paper describes a new technique for finding objects in spectrograms, and illustrates the idea with an application to the formant-tracking task. Starting with a multiscale representation of speech spectra, a probabilistic relaxation labelling algorithm is applied to determine primitive interpretations of the spectral components. Finally, a cross-scale integration procedure enables the scale space to be collapsed in a principled manner. The techniques are illustrated with an example of voiced speech.

Journal ArticleDOI
TL;DR: In this paper, a flexible speech labeling system that simulates the trained reader capability is proposed, and the main task of the system is to apply the trial-and-error process used in a human reader's labeling work.
Abstract: It was revealed that a trained human spectrogram reader could perform accurate speech labeling, and that accuracy was based on the flexibility of his/her decision process using many kinds of spectrographic features [S. Katagiri, SP87‐115, IEICE Tech. Rep. (1988)]. In this paper, a new flexible speech labeling system that simulates the trained reader capability is proposed. The main task of the system is to apply the trial‐and‐error process used in a human reader's labeling work. Therefore, a relaxation method is adopted here [S. Katagiri, 2‐1‐19, A. S. J. Spring Meeting (1988)]. The system consists of three parts: an acoustic analyzer, a verifier, and a supervisor. In the acoustic analyzer, many kinds of acoustic feature candidates, e.g., formant and pitch frequencies, are calculated. In the verifier, possible speech labels are verified. The supervisor, with a behavior principle based on the relaxation method, controls the whole system. Experimental results show that the system's performance is comparable...


Proceedings ArticleDOI
10 Oct 1988
TL;DR: A discussion of nodal network methodologies that can be used to effectively implement algorithms that range from signal processing to discrete logic systems in a large-scale single-instruction multiple-data (SIMD) parallel processor such as the Connection Machine (CM) is presented.
Abstract: A discussion of nodal network methodologies that can be used to effectively implement algorithms that range from signal processing to discrete logic systems in a large-scale single-instruction multiple-data (SIMD) parallel processor such as the Connection Machine (CM) is presented. As a first step, two versions of an algorithm to track formants in speech are implemented. The first implementation uses data parallel coding techniques, the second implementation uses a nodal network. The algorithm contains logic that, at every frequency/time point in a spectrogram, chooses between several filters to find the filter that best matches linear energy structure at that point. The choice of filter at each point is determined on the basis of information in adjacent points. The nodal network implementation of the algorithm uses only two node types, a fuzzy AND and a fuzzy OR. The connections between nodes can be either noninverting or inverting. The inverting connection effectively produces a NOT. The algorithm relies on the parameters associated with each node and connectivity between the nodes to simulate the original algorithm. The result is a nodal network programmed to identify formants in a spectrogram. The two implementations are comparable in performance and speed of execution.
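
The two node types and the inverting connection named in the abstract above can be sketched with the common min/max definitions of fuzzy logic. The abstract does not give the node parameters or the network wiring, so the min/max choice and the small `ridge_score` example are assumptions, not the authors' network.

```python
def fuzzy_and(inputs):
    """Fuzzy AND node: the weakest input decides (assumed min rule)."""
    return min(inputs)


def fuzzy_or(inputs):
    """Fuzzy OR node: the strongest input decides (assumed max rule)."""
    return max(inputs)


def connect(value, inverting=False):
    """A connection passes a value in [0, 1]; inverting acts as a NOT."""
    return 1.0 - value if inverting else value


def ridge_score(center, below, above):
    """Hypothetical node: energy is high here AND NOT high at neighbours."""
    return fuzzy_and([
        connect(center),
        connect(below, inverting=True),
        connect(above, inverting=True),
    ])
```

A point with normalized energy 0.9 flanked by 0.2 and 0.1 in frequency scores about 0.8, while a point inside a broad flat region scores low because the inverted neighbour inputs are small.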

Book ChapterDOI
01 Jan 1988
TL;DR: A model of peripheral auditory processing is implemented using a Texas Instruments TMS 320C25 Digital Signal Processor mounted on an IBM PC-AT compatible microcomputer that performs both the spectral analysis of a sampled signal and the auditory transformation at a speed that allows for real-time presentation of an “auditory spectrogram” of any audio signal.
Abstract: A model of peripheral auditory processing is implemented using a Texas Instruments TMS 320C25 Digital Signal Processor mounted on an IBM PC-AT compatible microcomputer. The TMS 320C25, running at 40 MHz, performs both the spectral analysis of a sampled signal and the auditory transformation at a speed that allows for real-time presentation of an “auditory spectrogram” of any audio signal. Such a representation would serve as the basis for parameterization, feature extraction, etc., in an ASR system.