
Showing papers on "Spectrogram" published in 1979


Proceedings ArticleDOI
02 Apr 1979
TL;DR: Several spectrogram-reading experiments designed to determine how much phonetic information the speech signal contains are presented, and implications for speech recognition and aids for the deaf are discussed.
Abstract: This paper presents the results of several spectrogram reading experiments that were designed to determine the amount of phonetic information that is contained in the speech signal. The task involved identifying the phonetic content of an unknown utterance only from a visual examination of the spectrogram. In the first experiment, one of the authors attempted to phonetically label spectrograms of normal and anomalous English utterances as well as words in a known carrier phrase. The results, when compared with the transcriptions of three phoneticians who listened to the utterances, indicated an overall agreement of better than 85% on the sentences, and 93% for words in a carrier phrase. In the second and third experiments, we investigated the speed at which spectrogram reading can be accomplished. In the final experiment, five students read spectrograms of normal English sentences after a 13-week course in acoustic phonetics. Working as a group, the class agreed with the transcribers on over 80% of the segments. Implications for speech recognition and aids for the deaf are discussed.

69 citations


Proceedings ArticleDOI
01 Apr 1979
TL;DR: An effective and computationally inexpensive method of enhancing the linear prediction analysis/synthesis of noisy speech: a preprocessing filter is proposed that can perfectly remove the "expected" noise signal when the input speech spectrum is closely approximated by the noisy speech spectrum.
Abstract: The goal of this study was to develop an effective and computationally inexpensive method of enhancing the linear prediction analysis/synthesis of noisy speech. To this end, a preprocessing filter has been proposed that is capable of perfectly removing the "expected" noise signal when the input speech spectrum is closely approximated by the noisy speech spectrum. The proposed filter has been evaluated by the linear prediction distance measure, perceptual listening, and spectrograms. This evaluation has demonstrated the effectiveness of the filter for broadband noise removal. The filter has also been implemented as a preprocessing filter in a real time LPC system. The total processing time for the filtering is only 2.6 msec per 22.5 msec frame. In this system, the LPC analysis and synthesis takes a combined time of 13 msec.
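
The abstract does not specify the filter's exact form, so the following is only a minimal sketch, in Python with numpy, of a spectral-subtraction-style preprocessor in the same spirit: an "expected" noise magnitude spectrum is removed from each analysis frame before LPC analysis. The function name, spectral floor, and windowing are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a spectral-subtraction-style preprocessing filter,
# assuming the "expected" noise is characterized by an average magnitude
# spectrum estimated from speech-free frames. Illustrative only; the paper's
# exact filter is not given in the abstract.
import numpy as np

def preprocess_frame(noisy_frame, noise_mag, floor=0.01):
    """Subtract the expected noise magnitude spectrum from one frame."""
    spectrum = np.fft.rfft(noisy_frame * np.hanning(len(noisy_frame)))
    mag, phase = np.abs(spectrum), np.angle(spectrum)
    # Remove the expected noise energy; clamp at a small spectral floor
    # so the enhanced magnitude never goes negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))

# Usage: estimate noise_mag by averaging |FFT| over known noise-only frames,
# then run preprocess_frame on each 22.5 ms analysis frame before LPC.
```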

10 citations


Journal ArticleDOI
TL;DR: A collection of sound images generated in the course of a larger study of musical psychoacoustics; they have served as an aid in automatic pitch-tracking and melodic pattern recognition of performed music.
Abstract: The spectrum of a musical signal contains features that are important in our perception of musical sounds. We know that the timbre or tone quality depends, in part, on the spectrum of the tone (Grey, 1978). The location of prominent frequency components in the spectrum has also been shown to relate to the perceived pitch of the tone (Piszczalski and Galler, 1978). Unfortunately, a single spectrum provides no information on the time-varying aspects of sound. Consequently, methods have been devised to capture and display the time-varying spectrum of sounds. Perhaps the best known display of amplitude-frequency-time information is the "sound spectrogram", more commonly known as the "voiceprint." The speech community has long used the analog spectrograph to generate these images, and musical examples were studied using this technique as early as 1947 (Potter et al.). More recently, digital methods have been employed to capture the spectrographic image. Both digital filters and the Fast Fourier Transform (FFT) can give spectrographic information; the FFT is the method used in the figures presented here. For more information on the actual computer implementation, cf. Piszczalski and Galler (1978).

For spectrographic analysis, the digital computer has advantages over the analog spectrograph: it provides more sophisticated graphics displays, including a three-dimensional surface representation, which we refer to as the "spectral surface" of the sound, and digitized data are more amenable to further analysis and processing.

Spectrographic displays in general offer a unique graphic perspective for music acoustics research, in particular for studying sounds produced on traditional musical instruments. We can study how the harmonic envelope changes during the course of a single sustained tone. We can also view how the spectrum changes between played notes, and watch the effect of articulation, such as legato tonguing, on the resulting spectrum. Spectrograms also reduce the danger of incorrectly assuming that an arbitrary spectrum is "representative" of all tones produced on that instrument. The variety of shapes the harmonic envelope may take can be especially dramatic when a sequence of notes is displayed simultaneously, as is often the case in the following figures.

Depending on what perceptual features are sought, different frequency and time scales should be used. We wanted the melodic patterns to be as visually obvious as possible, so we optimized toward identifying the note sequences. In the following spectral surfaces a new spectrum was calculated for every 32 msec of music. We have found this time interval to be sufficiently dense to capture the most rapidly played note sequences that we have studied to date (up to 14 notes/sec). The frequency scale is subdivided into 128 equally spaced points between 0 Hz and the maximum frequency indicated on the respective graphs. The hidden-line, three-dimensional surface algorithm used for our graphics was implemented on a Hewlett-Packard minicomputer system by Frederick Looft. The frequency, amplitude, and time scales are linear in all cases.

These images were generated in the course of our larger study of musical psychoacoustics, and more specifically have served as an aid in the areas of automatic pitch-tracking and melodic pattern recognition of performed music.
While we have found the spectral surface representations quite helpful, many of these graphs have generated fascinating questions which we simply have not had time yet to explore to any depth or to evaluate using other techniques. Hearing the source of the sounds while viewing the figures makes them particularly effective. Still, this collection of sound images, by themselves, should be informative for those in the music analysis/synthesis area. Except where otherwise noted, audio sources were recorded at the University of Michigan. This research is supported by the National Science Foundation (Grant No. MC578-09052).
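
As a concrete illustration, here is a minimal sketch, in Python with numpy, of the FFT-based analysis described above: one new spectrum every 32 ms, with 128 equally spaced frequency points. The window choice, hop handling, and function names are assumptions for illustration; the original analysis ran on different hardware, and its parameters beyond those stated in the text are not given.

```python
# A minimal sketch of the FFT-based "spectral surface" computation described
# above: one spectrum every 32 ms, 128 equally spaced frequency points.
# Parameter choices beyond those stated in the text are assumptions.
import numpy as np

def spectral_surface(signal, sample_rate, hop_ms=32.0, n_freq=128):
    """Return a (frames x n_freq) magnitude array and its frequency axis."""
    hop = int(sample_rate * hop_ms / 1000.0)
    n_fft = 2 * n_freq                  # rfft of N samples gives N/2+1 bins
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame))[:n_freq])
    freqs = np.arange(n_freq) * sample_rate / n_fft
    return np.array(frames), freqs

# Plotting the rows of the returned array as successive curves, offset in the
# time direction, yields a hidden-line surface like those in the figures.
```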

6 citations



Journal ArticleDOI
TL;DR: In this article, a set of sentences was segmented using three different techniques (by eye from digitized waveforms, by eye from digitized spectrograms, and by ear from computer playback of digitized speech samples), and the average measurement differences between the judges ranged from roughly 20 to 35 ms.
Abstract: Locating phoneme boundaries, or “segmenting” the acoustic speech signal, is often necessary in speech research and can be accomplished using different procedures. At the present time, there are few quantitative data on the differences between segmentation methods. The purpose of the present study was to compare segmentation data obtained using three different techniques. Speech segment boundaries were located for a set of sentences in the following ways: (1) by eye from digitized waveforms, (2) by eye from digitized spectrograms, and (3) by ear from computer playback of digitized speech samples. Each of the three techniques was employed by three judges who segmented the same set of sentences independently of each other. The data revealed that, on the average, the differences between boundary measurements for the various techniques ranged from roughly 5 to 25 ms. Average measurement differences between the judges ranged from roughly 20 to 35 ms. The magnitude of the differences between the ...

2 citations


Journal ArticleDOI
TL;DR: In this article, three short stories were recorded and speech spectrograms were made of the individual sentences of each story; the three stories contained a total of 670 words, of which 612 (91%) were correctly identified.
Abstract: In order to assess the role of syntactic, semantic, and discourse knowledge in spectrogram reading, three short stories were recorded and speech spectrograms were made of the individual sentences of each story. The stories were presented one spectrogram at a time to an expert spectrogram reader, who was instructed to read each story word by word without writing down segment labels. The three stories contained a total of 670 words, of which 612 (91%) were correctly identified. The median reading time per sentence across the three stories was about 40 s, or about 20 times real time. However, a syllable-by-syllable analysis of reading times in one story revealed that this value was inflated by a small subset of syllables and words that took a long time to decode, sometimes over a minute. The modal (most frequent) reading time per syllable fell between 1 and 2 s, or between 3 and 6 times real time. Further analysis revealed that many common syllables were immediately recognized as complete patterns (e.g., “ment”, “tion”), and the use of context to recognize words from partial information was evident in many cases. Implications of the results for real-time spectrogram reading will be discussed.

1 citation


Journal ArticleDOI
TL;DR: In this article, a pitch and amplitude estimate is made for each short-time section (described in “Predicting musical pitch from component ratios”, available from the authors), and tone segmentation is then performed by seeking tone-transition patterns indicated by abrupt jumps in pitch or by amplitude crossings of a minimum amplitude threshold.
Abstract: Successive short-time (32 ms) spectra are calculated either with the FFT or, experimentally, with the CZT (chirp z-transform) CCD to yield the digitized spectra. Pattern matching on the resulting digital spectrogram is done to detect the sharply emerging spectral peaks that often indicate tone beginnings. A check is made for the overlap of previous partials onto the hypothesized new tone at such points, and these past partials are filtered out (over time) until they disappear or a new tone is suggested by another emerging spectral peak condition. A pitch and amplitude estimate is then made for each short-time section (described in “Predicting musical pitch from component ratios”, available from the authors). The resulting pitch-versus-time contour is first smoothed, and then tone segmentation is performed by seeking tone-transition patterns indicated by abrupt jumps in pitch or associated amplitude crossings over a minimum amplitude threshold. Single notes probably fragmented by spurious octave jumps are compacted and other ver...
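
As a rough illustration of the segmentation step, here is a minimal sketch in Python (numpy assumed) that smooths a frame-by-frame pitch contour and hypothesizes tone boundaries at abrupt pitch jumps or at upward amplitude crossings of a minimum threshold. All thresholds and names are illustrative assumptions; the paper's actual pattern-matching rules are more elaborate.

```python
# A minimal sketch of the tone-segmentation step described above: smooth the
# frame-by-frame pitch contour, then place note boundaries at abrupt pitch
# jumps or where the amplitude rises across a minimum threshold. Thresholds
# are illustrative assumptions; the paper's exact values are not given.
import numpy as np

def segment_tones(pitch_hz, amplitude, jump_semitones=1.0, min_amp=0.05):
    """Return frame indices where a new tone is hypothesized to begin."""
    # Median smoothing suppresses spurious single-frame octave errors.
    kernel, pad = 3, 1
    padded = np.pad(pitch_hz, pad, mode="edge")
    smooth = np.array([np.median(padded[i:i + kernel])
                       for i in range(len(pitch_hz))])
    boundaries = []
    for i in range(1, len(smooth)):
        jump = 12 * abs(np.log2(smooth[i] / smooth[i - 1]))  # in semitones
        rising = amplitude[i - 1] < min_amp <= amplitude[i]  # onset crossing
        if jump >= jump_semitones or rising:
            boundaries.append(i)
    return boundaries
```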

1 citation


Journal ArticleDOI
TL;DR: In this paper, an adaptive non-recursive (NR) filter with application to spectrum analysis is presented; adaptation is implemented with a least-mean-squares (LMS) weight-adjustment algorithm that performs a steepest-descent minimization of the mean-square error.
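
For reference, the LMS weight adjustment named in the summary is the standard steepest-descent update; a minimal Python sketch (numpy assumed) follows. The step size and filter length here are illustrative, not the paper's.

```python
# A minimal sketch of the LMS algorithm named above: a non-recursive (FIR)
# filter whose tap weights follow the negative gradient of the instantaneous
# squared error. Step size mu and filter length are assumptions.
import numpy as np

def lms_filter(x, desired, n_taps=16, mu=0.01):
    """Adapt FIR weights so the filter output tracks the desired signal."""
    w = np.zeros(n_taps)
    y = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        u = x[n - n_taps:n][::-1]        # most recent samples first
        y[n] = w @ u                     # filter output
        e = desired[n] - y[n]            # error signal
        w += 2 * mu * e * u              # steepest-descent weight update
    return y, w
```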

1 citation


Proceedings ArticleDOI
01 Jan 1979
TL;DR: In this article, a unified framework for echo analysis based on the spectrogram is presented; pattern recognition techniques are used to determine the features of the echo spectrogram that are most important to a classifier.
Abstract: Fish schools and long-range propagation channels can be described by a scattering function, i.e., by the expected distribution of Doppler shifts and delays that are introduced by a collection of moving point scatterers. Acoustic inhomogeneities such as man-made objects and minerals on or below the ocean floor can be characterized by the frequency dependence of sound absorption and reflectivity, and by the positions of highlights, or rapid changes in acoustic impedance. Highlight structure and frequency dependence can be obtained from the time-frequency energy density function of the target impulse response. Both the scattering function and the time-frequency energy density function can be represented by the spectrogram of an echo. Spectrogram analysis therefore provides a unified framework for echo analysis. Experimental results are given in order to illustrate the use of spectrograms for detection and maximum likelihood echo classification. Pattern recognition techniques are used to determine the features of the echo spectrogram that are most important to a classifier.
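
As a small illustration of representing an echo by its spectrogram, here is a minimal Python sketch (scipy assumed) that computes the time-frequency energy density of an echo and flattens it into a normalized feature vector from which a classifier could select features. Window parameters and normalization are assumptions, not the paper's.

```python
# A minimal sketch of spectrogram-based echo features: compute the echo's
# time-frequency energy density and flatten it into a feature vector.
# Window sizes and the downstream classifier are assumptions.
import numpy as np
from scipy.signal import spectrogram

def echo_features(echo, fs, nperseg=128, noverlap=64):
    """Return the echo spectrogram axes and a normalized feature vector."""
    f, t, Sxx = spectrogram(echo, fs=fs, nperseg=nperseg, noverlap=noverlap)
    features = Sxx.flatten()
    return f, t, features / (np.sum(features) + 1e-12)  # unit total energy
```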

Journal ArticleDOI
TL;DR: The method proposed is based on using line enhancement and grouping procedures to isolate the dominant components; a family of two-dimensional matched filters is designed for each signature and used to detect the weaker members of the family.
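
As a rough illustration of a two-dimensional matched filter of the kind described, the following Python sketch (scipy assumed) correlates a small time-frequency "signature" template against a spectrogram image and reports the locations where the response exceeds a threshold. The template normalization and threshold are illustrative assumptions.

```python
# A minimal sketch of a two-dimensional matched filter: cross-correlate a
# time-frequency signature template with a spectrogram image and declare a
# detection wherever the response exceeds a threshold. Illustrative only.
import numpy as np
from scipy.signal import correlate2d

def matched_filter_detect(spec_img, template, threshold):
    """Return (row, col) locations where the template response is strong."""
    t = (template - template.mean()) / (np.linalg.norm(template) + 1e-12)
    response = correlate2d(spec_img, t, mode="same")
    return np.argwhere(response > threshold)   # detection coordinates
```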