
Showing papers on "Spectrogram published in 1983"


Journal ArticleDOI
TL;DR: In this article, the authors introduce the Wigner distribution function (WDF) as a self-windowed complex spectrogram and suggest some methods for the optical generation of the WDF of two-dimensional signals.
Abstract: We introduce the Wigner distribution function (WDF) as a self-windowed complex spectrogram and suggest some methods for the optical generation of the WDF of two-dimensional signals. The resulting WDFs, since they are four-dimensional functions, are represented as sectional images displayed either in parallel or as temporal sequences. We give some experimental results for real-valued input signals obtained from different coherent-optical WDF processors.
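As a rough numerical illustration of the transform behind this work, the sketch below computes a discrete (pseudo-)Wigner distribution of a 1-D test signal in NumPy. The paper treats 2-D inputs and optical generation, so this is only a minimal digital analogue; the function name and the chirp example are illustrative, not from the paper.

```python
import numpy as np

def wigner_distribution(x):
    """Discrete (pseudo-)Wigner distribution of a 1-D complex signal.

    W[n, k] = sum_m x[n+m] * conj(x[n-m]) * exp(-2j*pi*k*m / N),
    with the lag range clipped so both indices stay inside the signal.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    W = np.zeros((N, N), dtype=complex)
    for n in range(N):
        mmax = min(n, N - 1 - n)          # largest admissible lag at time n
        m = np.arange(-mmax, mmax + 1)
        kernel = np.zeros(N, dtype=complex)
        kernel[m % N] = x[n + m] * np.conj(x[n - m])
        W[n, :] = np.fft.fft(kernel)
    return W.real                          # the WDF is real by conjugate symmetry

# Example: a linear chirp; the WDF concentrates along the instantaneous frequency,
# which is what makes it attractive as a "self-windowed" alternative to a
# fixed-window spectrogram.
t = np.arange(256)
chirp = np.exp(1j * 2 * np.pi * (0.05 * t + 0.0005 * t ** 2))
W = wigner_distribution(chirp)
print(W.shape)   # (256, 256): time x frequency
```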

54 citations


Proceedings ArticleDOI
01 Jan 1983
TL;DR: An expert system is described that attempts to simulate human performance at deriving phonetic transcriptions from speech spectrograms; it is highly interactive, allowing users to investigate hypotheses and edit rules.
Abstract: Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90% accuracy at the phoneme level). In this paper, we describe an expert system which attempts to simulate this performance. The speech spectrogram expert (SPEX) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules.
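The paper describes SPEX as a society of three cooperating experts. The sketch below is a purely hypothetical Python rendering of that architecture: all class and method names and the dummy feature and phoneme values are invented for illustration and are not from the paper.

```python
# Hypothetical sketch of SPEX's three cooperating experts.
from dataclasses import dataclass

@dataclass
class VisualFeature:
    kind: str          # e.g. "voice bar", "burst", "formant track"
    t_start: float     # seconds
    t_end: float
    freq_range: tuple  # (low_hz, high_hz)

class VisionExpert:
    """Finds salient visual features in the spectrogram image."""
    def find_features(self, spectrogram):
        # 2-D image analysis would go here; return a dummy feature for now.
        return [VisualFeature("burst", 0.12, 0.14, (2000, 6000))]

class AcousticPhoneticExpert:
    """Maps visual features to candidate phonemes, context-dependently."""
    def hypothesize_phonemes(self, features):
        return [("t", 0.6), ("k", 0.3)]    # (phoneme, confidence)

class PhoneticsExpert:
    """Constrains phoneme sequences and proposes an English spelling."""
    def best_transcription(self, phoneme_lattice):
        return "".join(p for p, _ in phoneme_lattice)

def transcribe(spectrogram):
    features = VisionExpert().find_features(spectrogram)
    lattice = AcousticPhoneticExpert().hypothesize_phonemes(features)
    return PhoneticsExpert().best_transcription(lattice)
```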

18 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: The relatively high performance achieved using the numerical measurements, together with other considerations for selecting input representations for expert systems, suggests that the numerical tables are the most appropriate of the four forms of input.
Abstract: A series of experiments was performed in order to select a set of acoustic measurements for use as input to an expert system for stop consonant recognition. In the experiments, a trained human spectrogram reader made six-way (/b,d,g,p,t,k/) classifications of syllable-initial stops using four different data representations: DFT spectrograms, LPC spectrograms, LPC spectral slices and tables of numerical measurements. Percent correct identification was 79%, 81%, 72% and 76%, respectively, for the four data sets. The relatively high performance achieved using the numerical measurements, together with other considerations for selecting input representations for expert systems, suggests that the numerical tables are the most appropriate of the four forms of input.
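To make the data representations concrete, the sketch below computes two of the four for a single analysis frame: a DFT log spectrum and an LPC spectral envelope (autocorrelation method). The frame length, LPC order, and 10 kHz sampling rate are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def lpc_coeffs(frame, order=12):
    """Autocorrelation-method LPC analysis polynomial [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def frame_representations(frame, n_fft=512, sr=10000):
    """Return frequencies, DFT log spectrum, and LPC log envelope for one frame."""
    win = frame * np.hamming(len(frame))
    dft_db = 20 * np.log10(np.abs(np.fft.rfft(win, n_fft)) + 1e-12)
    a = lpc_coeffs(win)
    # LPC envelope (up to a gain constant) is 1 / |A(f)|
    lpc_db = -20 * np.log10(np.abs(np.fft.rfft(a, n_fft)) + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return freqs, dft_db, lpc_db

# Example: a synthetic frame with two damped resonances at a 10 kHz rate
sr = 10000
t = np.arange(256) / sr
frame = np.exp(-300 * t) * (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t))
freqs, dft_db, lpc_db = frame_representations(frame, sr=sr)
```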

15 citations


Journal ArticleDOI
TL;DR: In this article, two groups of subjects were presented with spectrograms of 50 words they had never seen before and were asked to provide a single monosyllabic English word for each spectrogram.
Abstract: Two groups of subjects were presented with spectrograms of 50 words they had never seen before and were asked to provide a single monosyllabic English word for each spectrogram. One group had learned to identify a limited set of speech spectrograms after 15 h of training using a study‐test procedure which stressed wholistic word identification. A subgroup of participants in a formal course on spectrogram reading at MIT served as a second group of subjects. These subjects learned specific acoustic and phonetic principles and strategies for interpreting spectrograms. Subjects in the first group correctly identified 33% of the possible phonetic segments in spectrograms they had never seen before. The second group of subjects correctly identified 40% of the possible segments in the same set of spectrograms given to the first group. When the data were scored for correct manner class of consonants only, the two groups did not differ significantly. Detailed descriptions of the identification results will be presented. Implications of these findings for developing visual aids for hearing impaired persons and improved phonetic recognition strategies will be discussed. [Supported by NSF.]

5 citations


Proceedings ArticleDOI
26 Oct 1983
TL;DR: It is shown how the Local Power Spectra (LPS) representation serves for the computation of a kind of texture gradient and how the Spectral Power Excerpts (SPE) representation proved to be useful in automatic interferogram evaluation.
Abstract: Two ways of representing 2D signals as 4D spectrograms and their relation to the Wigner Distribution Function (WDF) are discussed. Some methods for their coherent optical generation and suitable display, e.g. partially sampled, are given. Furthermore, it is shown how the Local Power Spectra (LPS) representation serves for the computation of a kind of texture gradient and how the Spectral Power Excerpts (SPE) representation proved to be useful in automatic interferogram evaluation.
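A digital analogue of the Local Power Spectra idea can be sketched as windowed 2-D FFTs taken around image points, with a crude "texture gradient" derived from the change in spectral centroid radius between two locations. The paper's coherent-optical generation and its exact LPS/SPE definitions are not reproduced here, so the window size and the centroid-based measure below are assumptions.

```python
import numpy as np

def local_power_spectrum(image, center, size=32):
    """Local power spectrum: |2-D DFT|^2 of a windowed patch around `center`."""
    r, c = center
    h = size // 2
    patch = image[r - h:r + h, c - h:c + h].astype(float)
    win = np.hanning(size)
    patch = patch * np.outer(win, win)
    return np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2

def texture_gradient(image, p1, p2, size=32):
    """Crude texture gradient: change in mean spatial-frequency radius
    between the local power spectra at two image locations."""
    def mean_radius(S):
        n = S.shape[0]
        fy, fx = np.meshgrid(np.arange(n) - n // 2, np.arange(n) - n // 2,
                             indexing="ij")
        rad = np.hypot(fy, fx)
        return (rad * S).sum() / S.sum()
    return mean_radius(local_power_spectrum(image, p2, size)) - \
           mean_radius(local_power_spectrum(image, p1, size))
```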

3 citations


Journal ArticleDOI
TL;DR: In this paper, a method for locating spectral energy concentrations is given that takes advantage of their usual continuity in time and thus performs better than locating peaks in spectral cross sections; it begins with smoothing and flattening convolutions in both the time and frequency dimensions of narrow-band spectrograms.
Abstract: An initial step in wide vocabulary, continuous speech recognition is proposed that roughly consists of a schematization of the features seen in conventional spectrograms—e.g., peaks and edges of spectral energy concentrations, temporal discontinuities, and spectral balance information. In a second step, these features can be mapped onto their acoustic‐phonetic correlates—e.g., formant distribution, voice onsets, articulatory closures. A method for locating spectral energy concentrations is given that takes advantage of their usual continuity in time, and thus performs better than locating peaks in spectral cross sections. It begins with smoothing and flattening convolutions in both the time and frequency dimensions of narrow‐band spectrograms to select the appropriate temporal and spectral scales. Ridges in the resulting two‐dimensional (time‐frequency) surfaces correspond to local spectral energy concentrations. The tops of these ridges are found by the application of a two‐dimensional differential operator at each point in the time‐frequency plane. The operator's definition in terms of the relationship between the gradient and principal directions will be given, along with justification and examples.
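A minimal digital sketch of this kind of ridge finding: smooth a (time x frequency) log-spectrogram, form the Hessian, and mark points where the gradient component along the principal curvature direction vanishes while that curvature is negative. The smoothing widths and the zero-crossing threshold are assumptions; the paper's exact differential operator may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ridge_mask(S, sigma_t=2.0, sigma_f=2.0):
    """Mark ridge tops of a (time x freq) log-spectrogram surface S."""
    L = gaussian_filter(S, sigma=(sigma_t, sigma_f))   # smoothing in time and frequency
    Lt, Lf = np.gradient(L)                            # first derivatives
    Ltt, Ltf = np.gradient(Lt)                         # second derivatives
    _, Lff = np.gradient(Lf)

    # Most negative Hessian eigenvalue (principal curvature) and its direction
    tr = Ltt + Lff
    disc = np.sqrt((Ltt - Lff) ** 2 + 4 * Ltf ** 2)
    lam = 0.5 * (tr - disc)
    vt, vf = Ltf, lam - Ltt                            # eigenvector for lam
    norm = np.hypot(vt, vf) + 1e-12
    vt, vf = vt / norm, vf / norm

    # Ridge top: gradient has (nearly) no component along the principal
    # direction, and the surface curves downward across the ridge.
    dL = Lt * vt + Lf * vf
    return (np.abs(dL) < 0.05 * np.abs(dL).std()) & (lam < 0)
```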

3 citations


Journal ArticleDOI
TL;DR: Processing speech signals with a simple model of the peripheral auditory system transforms them in the frequency, amplitude, and temporal dimensions and yields distinctive responses to segments of speech exhibiting rapid changes in frequency or amplitude.
Abstract: The effect of processing speech signals with a simple model of the peripheral auditory system will be described. Compared with a standard speech spectrogram or filter bank analysis, this model transforms the signals in each of the frequency, amplitude, and temporal dimensions. In particular, the temporal transformation models the adaptation of auditory neurons to sustained energy in a particular frequency band, and results in distinctive responses to segments of speech exhibiting rapid changes in frequency or amplitude (consonants). These transformations result in new relationships between the phonetic identity of the speech and the observable characteristics of the transformed signal. Examples of the response of this model to speech signals will be presented, and the response properties that correspond to the phonetic identity of the signals will be discussed. [Work supported by NSF.]
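One common way to model this kind of neural adaptation digitally is a per-channel leaky integrator that compresses sustained energy while letting onsets through. The sketch below applies such a model to a single filter-bank channel envelope; the time constant and envelope sampling rate are chosen arbitrarily and are not the paper's specific model.

```python
import numpy as np

def adapt_channel(envelope, fs=100.0, tau=0.2):
    """Simple adaptation for one filter-bank channel envelope.

    A leaky integrator tracks sustained energy; the output is the envelope
    divided by (1 + adaptation state), so steady inputs are compressed while
    onsets and rapid changes pass through with little attenuation.
    """
    alpha = np.exp(-1.0 / (tau * fs))   # per-sample decay of the adaptation state
    state = 0.0
    out = np.empty_like(envelope, dtype=float)
    for i, e in enumerate(envelope):
        out[i] = e / (1.0 + state)
        state = alpha * state + (1.0 - alpha) * e
    return out

# Example: a brief "burst" followed by a sustained "vowel"
t = np.arange(0, 1.0, 0.01)
env = np.where(t > 0.3, 1.0, 0.0) + np.where((t > 0.1) & (t < 0.12), 2.0, 0.0)
adapted = adapt_channel(env)
# The burst and the vowel onset remain prominent; the sustained vowel decays.
```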

3 citations



Journal ArticleDOI
TL;DR: In this paper, the authors developed algorithms for estimating formant frequencies on a time-frame-by-time-frame basis; the algorithms incorporate acoustic-phonetic knowledge about formants rather than relying merely on spectral peaks, so two formants are identified even when they merge into a single peak.
Abstract: Algorithms were developed for estimating formant frequencies on a time frame by time frame basis. The algorithms incorporate acoustic phonetic knowledge about formants and do not merely rely on spectral peaks. Thus two formants will be identified even when the first and second formants merge or when the second and third formants merge to form a single peak. The algorithms were evaluated by comparing formants drawn on a speech spectrogram by a trained phonetician to those drawn automatically. The data base consisted of both male and female speakers speaking phonetically balanced sentences. In addition, synthesis derived from the formant frequencies and amplitudes was tested for intelligibility. Results will be presented for these two procedures. [Supported by NSF and DARPA.]
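The paper's algorithms are not published in this abstract, so the sketch below only illustrates the general idea of knowledge-constrained formant assignment: nominal frequency ranges plus frame-to-frame continuity, with a merged spectral peak allowed to serve as the estimate for two formants. The ranges, jump limit, and candidate format are all hypothetical.

```python
import numpy as np

# Nominal formant frequency ranges (Hz) encode simple acoustic-phonetic knowledge.
RANGES = {"F1": (200, 1000), "F2": (800, 2800), "F3": (1800, 3500)}

def pick_formants(candidates, previous=None, max_jump=300.0):
    """Assign F1-F3 from per-frame resonance candidates (Hz).

    Prefers candidates inside each formant's nominal range and close to the
    previous frame's value; if two formants have merged into one spectral
    peak, the same candidate may be assigned to both rather than dropping one.
    """
    assigned = {}
    for name, (lo, hi) in RANGES.items():
        in_range = [f for f in candidates if lo <= f <= hi]
        if not in_range:
            assigned[name] = previous[name] if previous else (lo + hi) / 2
            continue
        if previous:
            # Continuity: stay near the last frame's estimate if possible.
            close = [f for f in in_range if abs(f - previous[name]) <= max_jump]
            pool = close or in_range
            assigned[name] = min(pool, key=lambda f: abs(f - previous[name]))
        else:
            assigned[name] = min(in_range)
    return assigned

# Example: F1 and F2 have merged near 900 Hz; both still receive an estimate.
print(pick_formants([900.0, 2500.0],
                    previous={"F1": 700.0, "F2": 1100.0, "F3": 2400.0}))
```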

2 citations


Journal ArticleDOI
TL;DR: A class of colored displays for analyzing the spectral content of nonstationary processes is defined, based on a natural mapping between colors and pairs of zero-mean, equal-variance complex random variables.
Abstract: This paper defines a class of colored displays for analyzing the spectral content of nonstationary processes. The displays are based on a natural mapping that exists between colors and pairs of zero mean, equal variance complex random variables, and are capable of representing signal features not easily recognizable on a conventional spectrogram. Displays are constructed for both one and two channel processing and an example is presented of the single channel display.
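One way to picture a color mapping of paired complex values (not necessarily the authors' exact construction) is to send the inter-channel phase to hue, the channel imbalance to saturation, and the combined magnitude to value, as in this sketch:

```python
import numpy as np
import colorsys

def pair_to_rgb(z1, z2):
    """Map a pair of complex values (e.g., one time-frequency cell from two
    channels) to an RGB color. Illustrative only: relative phase -> hue,
    channel imbalance -> saturation, combined magnitude -> value.
    """
    mag = np.hypot(abs(z1), abs(z2))
    hue = (np.angle(z1 * np.conj(z2)) / (2 * np.pi)) % 1.0       # inter-channel phase
    sat = abs(abs(z1) - abs(z2)) / (abs(z1) + abs(z2) + 1e-12)   # channel imbalance
    val = mag / (1.0 + mag)                                      # compressive scaling
    return colorsys.hsv_to_rgb(hue, sat, val)

# One cell where channel 1 is stronger and leads channel 2 by 45 degrees
print(pair_to_rgb(1.0 + 0.0j, 0.3 - 0.3j))
```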

1 citation