Showing papers on "Spectrogram published in 1986"


Proceedings Article
07 Apr 1986
TL;DR: This paper investigates the feasibility of constructing a knowledge-based system that mimics the process of spectrogram reading by humans and achieves performance that is comparable to that of the experts in a task of identifying stop consonants extracted from continuous speech.
Abstract: Human experts can determine the phonetic identity of unknown utterances from a visual examination of the spectrogram with performance better than available computer systems. The spectrogram-reading process involves the use of multiple sources of knowledge, including articulatory movements, acoustic phonetics, phonotactics, and linguistics. In addition, the experts' performance can be attributed to their ability to deal with partial and/or conflicting information, as well as multiple cues. This paper investigates the feasibility of constructing a knowledge-based system that mimics the process of spectrogram reading by humans. In a task of identifying stop consonants extracted from continuous speech, the system achieved performance that is comparable to that of the experts.

74 citations


Journal Article
TL;DR: In this paper, several demonstrations of a novel representation are presented which, in some cases, can make subtle differences in input signals obvious to the human analyst.
Abstract: While the spectrogram (and related graphic analyses) have been invaluable in showing the general frequency content of an input signal, sometimes it is difficult for trained and untrained users to see on the spectrogram differences which are perceptible to the ear. In this paper, several demonstrations of a novel representation are presented which, in some cases, can make subtle differences in input signals obvious to the human analyst. The representation, a "symmetrized dot pattern" (SDP), provides a stimulus in which local visual correlations are integrated to form a global percept and can potentially be applied to the detection and characterization of significant features of any sampled data.

68 citations
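
The symmetrized dot pattern described in the abstract above can be sketched in a few lines. The abstract does not give the mapping, so this follows one commonly cited formulation: each normalized sample sets a radius, a lagged sample sets an angular offset, and the dot is mirrored about several symmetry axes. The function name `sdp_points` and the parameters `lag`, `gain_deg`, and `n_axes` are illustrative assumptions, not values from the paper.

```python
import math

def sdp_points(signal, lag=1, gain_deg=30.0, n_axes=6):
    """Map a sampled signal to the (x, y) dots of a symmetrized dot pattern.

    lag, gain_deg and n_axes are illustrative defaults, not values from the paper.
    """
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0                    # guard against a constant signal
    norm = [(v - lo) / span for v in signal]   # amplitudes scaled to [0, 1]
    points = []
    for i in range(len(norm) - lag):
        radius = norm[i]                       # current sample sets the radius
        swing = norm[i + lag] * gain_deg       # lagged sample sets the angular offset
        for k in range(n_axes):
            axis = 360.0 * k / n_axes          # one symmetry axis per repetition
            for angle in (axis + swing, axis - swing):  # mirror about the axis
                rad = math.radians(angle)
                points.append((radius * math.cos(rad), radius * math.sin(rad)))
    return points

# Example: a 5-cycle sine sampled at 200 points.
tone = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
dots = sdp_points(tone)
print(len(dots), "dots to scatter-plot")
```

Plotting `dots` as a scatter gives the snowflake-like figure; two signals that look alike on a spectrogram may then differ visibly in the shape of the arms.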


Journal Article
TL;DR: It is concluded that the approach described here offers the promise of progress towards the automatic recognition of multi-speaker continuous speech.
Abstract: An approach to the problem of automatic speech recognition based on spectrogram reading is described. Firstly, the process of spectrogram reading by humans is discussed, and experimental findings presented which confirm that it is possible to learn to carry out such a process with some success. Secondly, a knowledge-engineering approach to the automation of the linguistic transcription of spectrograms is described and some results are presented. It is concluded that the approach described here offers the promise of progress towards the automatic recognition of multi-speaker continuous speech.

22 citations


Proceedings Article
01 Apr 1986
TL;DR: A computational model of the peripheral auditory system consisting of a bank of digital filters followed by compression and half-wave rectification stages and by a set of generalized synchrony detectors that respond to coherence in the signal at the center frequency of the channel is described.
Abstract: At the 1984 IEEE ICASSP meeting Seneff described a computational model of the peripheral auditory system consisting of a bank of digital filters followed by compression and half-wave rectification stages and by a set of generalized synchrony detectors (gsd's) that respond to coherence in the signal at the center frequency of the channel. We have added adjacent-channel cross-correlation and modified the gsd. This results in improved sensitivity to formants in noise and allows human frequency masking measurements to be replicated quantitatively. When the output of the model is used in a speech recognition task it shows an advantage over a conventional filter-bank representation both with undistorted speech and in the presence of noise and linear distortion. Spectrograms generated from the model are presented both for artificially degraded speech and for speech recorded in flight in a helicopter and a fighter/trainer.

21 citations


Proceedings Article
07 Apr 1986
TL;DR: This work uses an expert system to formalise and test knowledge in spectrogram reading, and describes a task designed to observe system functioning and evaluate knowledge presently used.
Abstract: One of the major problems that has plagued speech processing at the acoustic-phonetic level is the extreme variability of the speech signal. Experienced spectrogram readers presently seem to come closer to achieving acoustic-phonetic identification than do automatic techniques. For this reason, and conscious of the difficulty of gathering human expertise, we have chosen to use an expert system to formalise and test knowledge in spectrogram reading. Two aspects of expertise have been explored: knowledge and strategy. Our expert's specific knowledge is formalised in the form of production rules and thus can be progressively modified. This knowledge covers acoustic, phonetic, and phonotactic information. A forward-chaining inference engine with variables is implemented for the control structure. It uses a global progressive strategy which manages confidence coefficients. We describe a task designed to observe system functioning and evaluate knowledge presently used. Results show that sufficient knowledge has been collected to correctly carry out the task.

20 citations
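
The production-rule approach in the abstract above can be illustrated with a minimal forward-chaining loop that carries a confidence value with each fact. This is a sketch under stated assumptions, not the authors' engine: it has no rule variables, the rule and fact names are hypothetical, and combining confidences as `strength * min(premises)` is one common convention rather than the scheme the paper uses.

```python
def forward_chain(facts, rules):
    """Fire rules until no conclusion's confidence can be raised.

    facts: dict mapping fact name -> confidence in [0, 1]
    rules: list of (premises, conclusion, strength) with strength in [0, 1]
    """
    facts = dict(facts)                  # work on a copy
    changed = True
    while changed:
        changed = False
        for premises, conclusion, strength in rules:
            if all(p in facts for p in premises):
                # Assumed combination: weakest premise scaled by rule strength.
                conf = strength * min(facts[p] for p in premises)
                if conf > facts.get(conclusion, 0.0):
                    facts[conclusion] = conf
                    changed = True
    return facts

# Hypothetical acoustic cues read off a spectrogram, with confidences.
cues = {"silence_gap": 0.9, "release_burst": 0.8, "high_freq_burst": 0.6}
rules = [
    (("stop", "high_freq_burst"), "alveolar_stop", 0.7),
    (("silence_gap", "release_burst"), "stop", 0.9),
]
result = forward_chain(cues, rules)
print(result["stop"], result["alveolar_stop"])
```

Because rules are stored as data, the rule list can be edited without touching the engine, which matches the abstract's point that the knowledge can be progressively modified.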


Proceedings Article
07 Apr 1986
TL;DR: Preliminary results show that this processing technique retains relevant acoustic information necessary to identify the underlying phonetic representation of vowels and consonants in the speech spectrogram.
Abstract: This paper describes a system that applies vision techniques to extract acoustic patterns in the speech spectrogram. By processing a spectrographic image through a set of edge detectors and combining their outputs, the system obtains two-dimensional objects that characterize the formant patterns and general spectral properties for vowels and consonants. As a validation of the approach, a limited vowel recognition experiment was performed on the "object" spectrograms. Preliminary results show that this processing technique retains relevant acoustic information necessary to identify the underlying phonetic representation.

9 citations


Journal Article
TL;DR: This study compares signal processing techniques for the estimation of formant frequencies and bandwidths of synthesized and natural speech characterized by a high fundamental frequency.
Abstract: Formant measurement procedures often rely on there being a low fundamental frequency. An early study [B. Lindblom, International Congress of Phonetic Sciences, 4th, Helsingfors, 1961, 189–202 (1962)] found that the mean error in formant estimation ranged from about 40 Hz to a frequency of one‐fourth the fundamental. This study compares signal processing techniques for the estimation of formant frequencies and bandwidths of synthesized and natural speech characterized by a high fundamental frequency. Utterances were synthesized [D. H. Klatt, J. Acoust. Soc. Am. 67, 971–995 (1980)] using young children's utterances as models. The spectral and durational characteristics were matched closely by manipulating the synthesizer parameters. Spectrograms, discrete Fourier transforms, linear prediction envelopes, and auditory pseudospectrograms were computed for both the synthesized and natural utterances. The accuracy of formant estimation was judged by comparing the values determined by each of these methods to the...

4 citations


01 Jan 1986
TL;DR: This paper investigates the feasibility of constructing a knowledge-based system that mimics the process of spectrogram reading by humans and achieves performance that is comparable to that of the experts in a task of identifying stop consonants extracted from continuous speech.
Abstract: Human experts can determine the phonetic identity of unknown utterances from a visual examination of the spectrogram with performance better than available computer systems. The spectrogram-reading process involves the use of multiple sources of knowledge, including articulatory movements, acoustic phonetics, phonotactics, and linguistics. In addition, the experts’ performance can be attributed to their ability to deal with partial and/or conflicting information, as well as multiple cues. This paper investigates the feasibility of constructing a knowledge-based system that mimics the process of spectrogram reading by humans. In a task of identifying stop consonants extracted from continuous speech, the system achieved performance that is comparable to that of the experts.

3 citations


Journal Article
TL;DR: In this paper, the authors compared the performance of the Wigner distribution (WD) and the spectrogram in terms of the time and frequency resolutions, and showed that the WD provides excellent resolution in the time-frequency (t − ω) plane, compared with conventional short-time spectral analysis techniques.
Abstract: The Wigner distribution (WD) is a powerful tool for the simultaneous time‐frequency analysis of time‐varying signals [Martin and Flandrin, IEEE Trans. Acoust. Speech Signal Process. ASSP‐33 (Dec. 1985)]. For signals with simple time‐frequency structures (e.g., frequency‐modulated signals), the WD provides excellent resolution in the time‐frequency (t − ω) plane, as compared with conventional short‐time spectral analysis techniques. The interpretation of the WD becomes more difficult as the time‐frequency relationship of the signal gets more complex (e.g., in speech), because of the presence of cross terms and of regions below (i.e., negative) the t − ω plane. This disadvantage can be overcome by smoothing the WD in the t and ω directions over a region TΩ > 1 (T and Ω are the “smear” values in time and frequency) [Janssen and Claasen, IEEE Trans. Acoust. Speech Signal Process. ASSP‐33 (Aug. 1985)]. Such a smoothing process, however, makes the WD and the spectrogram equivalent in terms of the time and frequency resolutions. This presentation is directed towards comparing the performances of the WD (with partial smoothing, i.e., T < 1) and the spectrogram. Nonspeech signals as well as speech signals will be used to demonstrate the application of WD to the analysis of speech signals. [Work supported by NSERC, Canada.]

3 citations
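
The relationship between the two representations in the abstract above can be written out, assuming WD denotes the Wigner distribution with its usual continuous-time definition. The analysis window $h$ and the $1/2\pi$ normalization follow one common convention and are assumptions here, not notation taken from the paper.

```latex
W_x(t,\omega) = \int_{-\infty}^{\infty}
    x\!\left(t+\tfrac{\tau}{2}\right)\, x^{*}\!\left(t-\tfrac{\tau}{2}\right)
    e^{-j\omega\tau}\, d\tau

S_x(t,\omega) = \left| \int_{-\infty}^{\infty}
    x(\tau)\, h^{*}(\tau - t)\, e^{-j\omega\tau}\, d\tau \right|^{2}
  = \frac{1}{2\pi} \iint W_x(t',\omega')\, W_h(t'-t,\ \omega'-\omega)\, dt'\, d\omega'
```

Read this way, the spectrogram is the WD of the signal smoothed by the WD of the analysis window, which is consistent with the abstract's remark that sufficient smoothing makes the two equivalent in resolution.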


01 Jan 1986
TL;DR: The principal results of this study are that, in terms of the receiving operating curves (ROC), the adaptive receiver performs better than the linear one which, in turn, performs better than the robust soft limiter.
Abstract: Three receivers are compared for the detection of a known signal in additive ambient underwater noise of seagoing merchant vessels. These receivers are: the matched filter, which is the classical linear receiver based on a Gaussian assumption; the soft-limiter, which is the robust receiver when the noise uncertainty is modeled as a mixture process with a Gaussian nominal; and the Gaussian-Gaussian mixture likelihood ratio receiver. This last receiver is adaptive in the sense that it is based on a parametric model whose parameters are computed from the actual data. The principal results of this study are that, in terms of the receiving operating curves (ROC), the adaptive receiver performs better than the linear one which, in turn, performs better than the robust soft limiter. This study illustrates the merit of the simple mixture model in adaptive processing for signal detection purposes.
Abstract: The reconstruction problem we address is that of working backwards from a spectrogram to a waveform producing it. We develop a time-sequential algorithm for reconstruction and investigate its performance as various parameters are changed. Because the algorithm is time sequential, memory requirements do not grow linearly with the length of the desired reconstruction. We have found this bounded-memory property important in efficiently using an array processor to speed up computer simulations requiring reconstruction. Additionally, a time-sequential, bounded-memory algorithm would be essential if a hardware realization intended for continuous operation in real time were ever to be attempted.

1 citation
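
The three receivers named in the detection abstract above can be compared with a small sketch of their test statistics for a known signal `s` in additive noise. The exact forms used in the paper are not given in the abstract, so these are textbook versions under stated assumptions: independent noise samples, a fixed clipping level `clip`, and mixture parameters `eps`, `sigma0`, and `sigma1` supplied by hand rather than estimated from data. All function names are hypothetical.

```python
import math

def matched_filter_stat(x, s):
    """Linear correlator: optimal for white Gaussian noise."""
    return sum(si * xi for si, xi in zip(s, x))

def soft_limiter_stat(x, s, clip):
    """Correlate the signal with observations clipped to [-clip, clip]."""
    return sum(si * max(-clip, min(clip, xi)) for si, xi in zip(s, x))

def gauss_pdf(v, sigma):
    return math.exp(-0.5 * (v / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(v, eps, sigma0, sigma1):
    """Two-component zero-mean Gaussian mixture noise density."""
    return (1 - eps) * gauss_pdf(v, sigma0) + eps * gauss_pdf(v, sigma1)

def mixture_llr(x, s, eps, sigma0, sigma1):
    """Log-likelihood ratio of 'signal plus noise' against 'noise only'."""
    return sum(
        math.log(mixture_pdf(xi - si, eps, sigma0, sigma1))
        - math.log(mixture_pdf(xi, eps, sigma0, sigma1))
        for si, xi in zip(s, x)
    )

template = [1.0, -1.0, 1.0, -1.0]       # known signal
impulsive = [0.1, -0.2, 40.0, 0.3]      # noise only, with one large impulse

print(matched_filter_stat(impulsive, template))
print(soft_limiter_stat(impulsive, template, clip=1.0))
print(mixture_llr(impulsive, template, eps=0.1, sigma0=1.0, sigma1=20.0))
```

On the noise-only example the single impulse dominates the linear statistic, while clipping bounds its contribution; the mixture likelihood ratio instead discounts the impulse by attributing it to the wide component.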