
Showing papers in "Journal of the Acoustical Society of America in 1990"


Journal ArticleDOI
Hynek Hermansky1
TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
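
A minimal sketch in Python of the processing chain the abstract describes: critical-band integration, equal-loudness weighting, cube-root (intensity-loudness) compression, and a low-order all-pole fit. The band shapes, the equal-loudness weighting, and the Bark mapping below are generic placeholders rather than Hermansky's published constants; only the sequence of steps follows the paper.

import numpy as np

def hz_to_bark(f):
    # one common Bark approximation (an assumption; PLP defines its own warping)
    return 26.81 * f / (1960.0 + f) - 0.53

def levinson_durbin(r, order):
    # solve the Toeplitz normal equations for autoregressive (all-pole) coefficients
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i-1:0:-1])) / err
        a[1:i] += k * a[i-1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def plp_like_features(frame, fs=8000, n_bands=17, order=5):
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    bark = hz_to_bark(freqs)
    centers = np.linspace(bark[0], bark[-1], n_bands)
    # (1) critical-band integration: crude triangular weights on the Bark axis
    bands = np.array([spec @ np.maximum(0.0, 1.0 - np.abs(bark - c))
                      for c in centers])
    # (2) equal-loudness weighting (placeholder emphasis, not the published curve)
    fc = np.interp(centers, bark, freqs)
    bands *= fc**2 / (fc**2 + 1.6e5)
    # (3) intensity-loudness power law: cube-root compression
    loud = np.cbrt(bands)
    # 5th-order all-pole model of the compressed auditory spectrum
    autocorr = np.fft.irfft(loud)[:order + 1]
    return levinson_durbin(autocorr, order)

print(plp_like_features(np.random.randn(256)))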

2,969 citations


Journal ArticleDOI
TL;DR: It is shown that the newer extended data on human cadaver ears and from living animal preparations are quite well fit by the same basic function, which increases the function's value in plotting auditory data and in modeling concerned with speech and other bioacoustic signals.
Abstract: Accurate cochlear frequency-position functions based on physiological data would facilitate the interpretation of physiological and psychoacoustic data within and across species. Such functions might aid in developing cochlear models, and cochlear coordinates could provide potentially useful spectral transforms of speech and other acoustic signals. In 1961, an almost-exponential function was developed (Greenwood, 1961b, 1974) by integrating an exponential function fitted to a subset of frequency resolution-integration estimates (critical bandwidths). The resulting frequency-position function was found to fit cochlear observations on human cadaver ears quite well and, with changes of constants, those on elephant, cow, guinea pig, rat, mouse, and chicken (Bekesy, 1960), as well as in vivo (behavioral-anatomical) data on cats (Schuknecht, 1953). Since 1961, new mechanical and other physiological data have appeared on the human, cat, guinea pig, chinchilla, monkey, and gerbil. It is shown here that the newer extended data on human cadaver ears and from living animal preparations are quite well fit by the same basic function. The function essentially requires only empirical adjustment of a single parameter to set an upper frequency limit, while a "slope" parameter can be left constant if cochlear partition length is normalized to 1 or scaled if distance is specified in physical units. Constancy of slope and form in dead and living ears and across species increases the probability that the function fitting human cadaver data may apply as well to the living human ear. This prospect increases the function's value in plotting auditory data and in modeling concerned with speech and other bioacoustic signals, since it fits the available physiological data well and, consequently (if those data are correct), remains independent of, and an appropriate means to examine, psychoacoustic data and assumptions.
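
Greenwood's almost-exponential map has the closed form F = A(10^(a x) - k). A quick sketch with the constants commonly quoted for the human cochlea when position x is normalized to partition length; these are the usual textbook values, assumed here rather than taken from the abstract.

def greenwood_freq(x, A=165.4, a=2.1, k=0.88):
    # characteristic frequency in Hz at normalized basilar-membrane position x
    # (x = 0 at the apex, x = 1 at the base)
    return A * (10.0 ** (a * x) - k)

print(greenwood_freq(0.0))  # ~20 Hz, apical (low-frequency) end
print(greenwood_freq(0.5))  # ~1.7 kHz at the midpoint
print(greenwood_freq(1.0))  # ~20.7 kHz, basal (high-frequency) end

The abstract's single empirically adjusted parameter corresponds to A (setting the upper frequency limit), with the slope a held constant once x is normalized.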

1,789 citations


Journal ArticleDOI
TL;DR: Perceptual validation of the relative importance of acoustic cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices.
Abstract: Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases to the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases to the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases to lower formant bandwidths; and (4) introduction of extra pole zeros in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases to the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably--tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.

1,656 citations



Journal ArticleDOI
TL;DR: The speech-reception threshold (SRT) for sentences presented in a fluctuating interfering background sound of 80 dBA SPL is measured, and it is shown that hearing-impaired individuals perform more poorly than suggested by the loss of audibility of some parts of the speech signal.
Abstract: The speech-reception threshold (SRT) for sentences presented in a fluctuating interfering background sound of 80 dBA SPL is measured for 20 normal-hearing listeners and 20 listeners with sensorineural hearing impairment. The interfering sounds range from steady-state noise, via modulated noise, to a single competing voice. Two voices are used, one male and one female, and the spectrum of the masker is shaped according to these voices. For both voices, the SRT is measured both in noise spectrally shaped according to the target voice and in noise shaped according to the competing voice. The results show that, for normal-hearing listeners, the SRT for sentences in modulated noise is 4-6 dB lower than for steady-state noise; for sentences masked by a competing voice, this difference is 6-8 dB. For listeners with moderate sensorineural hearing loss, elevated thresholds are obtained without an appreciable effect of masker fluctuations. The implications of these results for estimating a hearing handicap in everyday conditions are discussed. By using the articulation index (AI), it is shown that hearing-impaired individuals perform more poorly than suggested by the loss of audibility of some parts of the speech signal. Finally, three mechanisms are discussed that contribute to the absence of unmasking by masker fluctuations in hearing-impaired listeners. The low sensation level at which the impaired listeners receive the masker seems a major determinant. The second and third factors are: reduced temporal resolution and a reduction in comodulation masking release, respectively.

844 citations


Journal ArticleDOI
TL;DR: In this paper, a piezoelectric laminate theory that uses the piezoelectric phenomenon to effect distributed control and sensing of bending, torsion, shearing, shrinking, and stretching of a flexible plate has been developed.
Abstract: A piezoelectric laminate theory that uses the piezoelectric phenomenon to effect distributed control and sensing of bending, torsion, shearing, shrinking, and stretching of a flexible plate has been developed. This newly developed theory is capable of modeling the electromechanical (actuating) and mechanoelectrical (sensing) behavior of a laminate. Emphasis is on the rigorous formulation of distributed piezoelectric sensors and actuators. The reciprocal relationship of the piezoelectric sensors and actuators is also unveiled. Generalized functions are introduced to disclose the physical concept of these piezoelectric sensors and actuators. It is found that the reciprocal relationship is a generic feature of all piezoelectric laminates.
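
For background, the actuating and sensing behavior the laminate theory models rests on the standard linear piezoelectric constitutive pair (IEEE compact notation); this is textbook material consistent with the abstract, not a reproduction of the paper's plate equations:

S_p = s^E_{pq} T_q + d_{kp} E_k        (converse effect: applied field E produces strain, i.e., actuation)
D_i = d_{iq} T_q + \varepsilon^T_{ik} E_k    (direct effect: stress T produces charge, i.e., sensing)

Here S is strain, T stress, E electric field, D electric displacement, s^E the compliance at constant field, d the piezoelectric constants, and \varepsilon^T the permittivity at constant stress. The same matrix d appearing in both equations is the reciprocal actuator/sensor relationship the abstract refers to.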

654 citations


Journal ArticleDOI
TL;DR: Borrowing Schumann's greeting of Chopin ("Hats off, gentlemen, a genius!"), this book review judges the work of Georg von Bekesy to stand supreme over the sum of all other work yet published in the field of the anatomy and the physiology of the processes of hearing.
Abstract: or speaking has had the experience of coming across a passage, a phrase, or a word in the writings of others which was so pat and appropriate in its use that it has caused him to say to himself, "Now I wish that I had said that!" As I completed the work and joy of reading this book I thought of Robert Schumann, the German composer, writer, and music critic. Speaking through the mouth of Eusebius, one of his pen names, Schumann described his first meeting with the music of Chopin. His description of this thrilling moment follows. "Eusebius came in softly the other day. You know that ironic smile with which he tries to intrigue you. I was at the piano ... Eusebius put a piece of music before us with these words, 'Hats off, gentlemen, a genius!'" Since I cannot claim authorship of this encomium I humbly borrow it, paraphrase it, and use it here: "Hats off, gentlemen, in the presence of an otological genius!" It is the feeling of this reviewer that this work of Georg von Bekesy stands supreme over the sum of all other work yet published in the field of the anatomy and the physiology of the processes of hearing. It is the integration into book form of all the work published by von Bekesy to date of publication. For this

577 citations


Journal ArticleDOI
TL;DR: The dependence of the measurement accuracy on the inclusion of shear waves, the wavelength of sound, and medium attenuation are considered, and the implications for describing the structure of biological soft tissues are discussed.
Abstract: A method for estimating structural properties of random media is described. The size, number density, and scattering strength of particles are estimated from an analysis of the radio frequency (rf) echo signal power spectrum. Simple correlation functions and the accurate scattering theory of Faran [J.J. Faran, J. Acoust. Soc. Am. 23, 405-418 (1951)], which includes the effects of shear waves, were used separately to model backscatter from spherical particles and thereby describe the structures of the medium. These methods were tested using both glass sphere-in-agar and polystyrene sphere-in-agar scattering media. With the appropriate correlation function, it was possible to measure glass sphere diameters with an accuracy of 20%. It was not possible to accurately estimate the size of polystyrene spheres with the simple spherical and Gaussian correlation models examined because of a significant shear wave contribution. Using the Faran scattering theory for spheres, however, the accuracy for estimating diameters was improved to 10% for both glass and polystyrene scattering media. It was possible to estimate the product of the average scattering particle number density and the average scattering strength per particle, but with lower accuracy than the size estimates. The dependence of the measurement accuracy on the inclusion of shear waves, the wavelength of sound, and medium attenuation are considered, and the implications for describing the structure of biological soft tissues are discussed.
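
A minimal sketch of the size-estimation idea: fit a modeled backscatter power spectrum to the measured rf echo spectrum and read off the effective scatterer radius. The Gaussian-correlation form factor below is one of the simple models the abstract mentions; its constant and the Rayleigh-like k^4 factor are common conventions assumed here, not the paper's calibration.

import numpy as np
from scipy.optimize import curve_fit

def gaussian_form_factor(k, a):
    # power falloff for a Gaussian correlation model; the 0.827 constant is the
    # value often quoted for this model (an assumption here)
    return np.exp(-0.827 * (k * a) ** 2)

def backscatter_model(k, a, scale):
    # Rayleigh-like k^4 rise shaped by the size-dependent form factor
    return scale * k**4 * gaussian_form_factor(k, a)

def estimate_radius(freqs_hz, power, c=1540.0):
    k = 2.0 * np.pi * freqs_hz / c               # acoustic wavenumber in tissue
    popt, _ = curve_fit(backscatter_model, k, power, p0=[50e-6, 1.0])
    return popt[0]

# demo: synthesize a spectrum for 25-um scatterers and recover the size
freqs = np.linspace(2e6, 8e6, 100)
k = 2.0 * np.pi * freqs / 1540.0
spectrum = backscatter_model(k, 25e-6, 1.0)
print(estimate_radius(freqs, spectrum))          # ~2.5e-05 m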

478 citations


Journal ArticleDOI
TL;DR: The generally high level of performance obtained with the head orientation technique argues for its utility in continuing studies of sound localization, especially for brief sounds presented in front of the subject.
Abstract: This study measured the ability of subjects to localize broadband sound sources that varied in both horizontal and vertical location. Brief (150 ms) sounds were presented in a free field, and subjects reported the apparent stimulus location by turning to face the sound source; head orientation was measured electromagnetically. Localization of continuous sounds also was tested to estimate errors in the motor act of orienting with the head. Localization performance was excellent for brief sounds presented in front of the subject. The smallest errors, averaged across subjects, were about 2° and 3.5° in the horizontal and vertical dimensions, respectively. The sizes of errors increased, for more peripheral stimulus locations, to maxima of about 20°. Localization performance was better in the horizontal than in the vertical dimension for stimuli located on or near the frontal midline, but the opposite was true for most stimuli located more peripherally. Front/back confusions occurred in 6% of trials; the characteristics of those responses suggest that subjects derived horizontal localization information principally from interaural difference cues. The generally high level of performance obtained with the head orientation technique argues for its utility in continuing studies of sound localization.

469 citations


Journal ArticleDOI
TL;DR: In this article, the accuracy and simplicity of analytical expressions for the relations between frequency and critical bandwidth, as well as critical-band rate (in Bark), are assessed.
Abstract: Accuracy and simplicity of analytical expressions for the relations between frequency and critical bandwidth as well as critical-band rate (in Bark) are assessed for the purpose of applications in ...
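
Two closed-form expressions commonly discussed in this context are Traunmüller's critical-band-rate formula and the Zwicker-Terhardt critical-bandwidth approximation; since the abstract is truncated, treating these as the expressions at issue is an assumption.

def bark_traunmuller(f_hz):
    # critical-band rate z in Bark as a function of frequency
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def critical_bandwidth_hz(f_hz):
    # critical bandwidth in Hz (Zwicker & Terhardt 1980 approximation)
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

print(round(bark_traunmuller(1000.0), 2))    # ~8.5 Bark at 1 kHz
print(round(critical_bandwidth_hz(1000.0)))  # ~160 Hz at 1 kHz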

464 citations


Journal ArticleDOI
TL;DR: The nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli.
Abstract: If two vowels with different fundamental frequencies (f0’s) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an ‘‘auditory’’ filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners’ vowel‐identification responses. The ‘‘place’’ models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The ‘‘place–time’’ models carry out these operations by analyzing the periodicities in the wavef...

Journal ArticleDOI
TL;DR: The results suggest that, with low levels of stimulation, ADP measurements could form the basis of an objective measure of cochlear function in human subjects.
Abstract: The acoustic intermodulation distortion product 2f1-f2 (ADP) was measured in human subjects to investigate (1) the dependence of ADP level on stimulus parameters and (2) the relationship between ADP level and auditory sensitivity. The frequency ratio (f2/f1), at which ADP level is maximal, varies only slightly across frequency and subjects. The average optimal ratio is 1.225. Beyond the maximum, the ADP level declines with increasing f2/f1 ratio, at rates of up to 250 dB/oct. As the level of one stimulus is increased relative to the other, the ADP grows, saturates, and in most cases shows a bendover. Maximum distortion is generated when L1 exceeds L2. Growth rate and saturation point are dependent on which stimulus is incremented and on the level of the stationary stimulus. With optimal stimulus parameters (levels below 60 dB SPL; L1 greater than L2 by 15 dB; f2/f1 = 1.225), ADP levels are commonly 30 dB below L2. Patterns of ADP level across frequency vary between subjects, but are repeatable within each subject. As the frequency of one or both of the stimuli is varied, changes in ADP level exhibit a broadly featured pattern with a fine structure superimposed upon it. This fine structure was compared with the features in the stimulus frequency emission spectrum in one subject. With appropriate stimulus parameters, half of our subjects show a statistically significant correlation across frequency, between ADP level and auditory sensitivity at the corresponding f1 frequency. Our results suggest that, with low levels of stimulation, ADP measurements could form the basis of an objective measure of cochlear function in human subjects.
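
The stimulus arithmetic behind those optimal parameters, as a small worked sketch (the numbers are simply the abstract's reported values):

def dpoae_stimuli(f1_hz, l2_db_spl, ratio=1.225, level_gap_db=15.0):
    # primary-tone settings for measuring the 2*f1 - f2 distortion product
    f2 = ratio * f1_hz
    f_dp = 2.0 * f1_hz - f2        # cubic distortion-product frequency
    l1 = l2_db_spl + level_gap_db  # L1 exceeds L2 by 15 dB
    return f2, f_dp, l1

f2, f_dp, l1 = dpoae_stimuli(f1_hz=2000.0, l2_db_spl=45.0)
print(f2, f_dp, l1)   # 2450.0 Hz, 1550.0 Hz, 60.0 dB SPL
# the ADP then typically sits ~30 dB below L2, i.e., around 15 dB SPL here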

PatentDOI
TL;DR: In this article, a low bit-rate (192 kBits per second) transform encoder/decoder system (44.1 kHz or 48 kHz sampling rate) for high quality music applications employs short time-domain sample blocks (128 samples/block) so that the system signal propagation delay is short enough for real-time aural feedback to a human operator.
Abstract: A low bit-rate (192 kBits per second) transform encoder/decoder system (44.1 kHz or 48 kHz sampling rate) for high-quality music applications employs short time-domain sample blocks (128 samples/block) so that the system signal propagation delay is short enough for real-time aural feedback to a human operator. Carefully designed pairs of analysis/synthesis windows are used to achieve sufficient transform frequency selectivity despite the use of short sample blocks. A synthesis window in the decoder has characteristics such that the product of its response and that of an analysis window in the encoder produces a composite response which sums to unity for two adjacent overlapped sample blocks. Adjacent time-domain signal samples blocks are overlapped and added to cancel the effects of the analysis and synthesis windows. A technique is provided for deriving suitable analysis/synthesis window pairs. In the encoder, a discrete transform having a function equivalent to the alternate application of a modified Discrete Cosine Transform and a modified Discrete Sine Transform according to the Time Domain Aliasing Cancellation technique or, alternatively, a Discrete Fourier Transform is used to generate frequency-domain transform coefficients. The transform coefficients are nonuniformly quantized by assigning a fixed number of bits and a variable number of bits determined adaptively based on psychoacoustic masking. A technique is described for assigning the fixed bit and adaptive bit allocations. The transmission of side information regarding adaptively allocated bits is not required. Error codes and protected data may be scattered throughout formatted frame outputs from the encoder in order to reduce sensitivity to noise bursts.
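
The window condition described here (the product of the analysis and synthesis responses summing to unity across two adjacent half-overlapped blocks) is easy to verify numerically. The identical sine-window pair below is a standard choice that satisfies it, used purely as an illustration; the patent's "carefully designed" pairs are derived by its own technique.

import numpy as np

N = 128                                      # samples per block, as in the abstract
n = np.arange(N)
analysis = np.sin(np.pi * (n + 0.5) / N)     # sine window
synthesis = analysis                         # identical pair; product is sin^2

prod = analysis * synthesis                  # composite analysis*synthesis response
overlap_sum = prod[:N // 2] + prod[N // 2:]  # two blocks overlapped by N/2
print(np.allclose(overlap_sum, 1.0))         # True, since sin^2 + cos^2 = 1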

Journal ArticleDOI
TL;DR: In this article, four experiments were reported that deal with pitch perception of harmonic complex tones containing up to 11 successive harmonics, and the question was raised whether the pitch percept of the missing fundamental is mediated only by low-order resolvable harmonics or whether it can also be conveyed by high-order harmonics that the cochlea fails to resolve.
Abstract: Four experiments are reported that deal with pitch perception of harmonic complex tones containing up to 11 successive harmonics. In particular, the question is raised whether the pitch percept of the missing fundamental is mediated only by low‐order resolvable harmonics, or whether it can also be conveyed by high‐order harmonics that the cochlea fails to resolve. Melodic interval identification performance was found to remain significantly above chance level even if the range of harmonics extended from the 20th to the 30th. Just‐noticeable differences (jnd) in the pitch of the missing fundamental were found to increase with increasing harmonic order, but to level off when all harmonics are above the 12th. These results are consistent with the notion of the existence of two distinct neural pitch mechanisms in the auditory system, but are, in principle, also compatible with a single central‐spectrum mechanism that uses the interspike interval histograms of auditory‐nerve fibers as inputs.

PatentDOI
TL;DR: In this article, a voice-operated remote control system has two microphones and an ambient noise remover in a transmitter, and one of the microphones picks up a voice command, and the other picks up ambient noise.
Abstract: A voice-operated remote control system has two microphones and an ambient noise remover in a transmitter. One of the microphones picks up a voice command, and the other picks up ambient noise. When the voice command is picked up by one microphone, the ambient noise is also picked up with it. The ambient noise remover cancels the ambient noise component using the electric signal of the ambient noise picked up by the other microphone, and outputs only the voice command component. The remote control system may have a voice command detector and a mute control circuit. If preparation for entry of a voice command is detected by the voice command detector, the mute control circuit lowers the sound pressure level of sound reproduced by an information reproducing device before the voice command starts being inputted. The remote control system may also have a recognition condition setting unit for automatically modifying speech recognition conditions if a voice command fails to be recognized. The remote control system may further include a speech storage unit for storing input voice commands and a speech reproduction unit for reproducing the stored voice commands as required.
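
The two-microphone arrangement is the classic adaptive-noise-cancellation setup. The patent does not name an algorithm, so the LMS filter below is an assumed stand-in, with synthetic signals, just to make the signal flow concrete.

import numpy as np

def lms_cancel(primary, reference, n_taps=32, mu=0.005):
    # adaptively filter the noise-reference mic and subtract it from the
    # voice+noise mic; the residual e is the cleaned voice estimate
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]
        e = primary[n] - w @ x
        w += 2.0 * mu * e * x
        out[n] = e
    return out

rng = np.random.default_rng(0)
noise = rng.standard_normal(8000)
voice = 0.5 * np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)
primary = voice + np.convolve(noise, [0.6, 0.3], mode="same")  # noise reaches both mics
print(np.std(lms_cancel(primary, noise)[4000:] - voice[4000:]))  # small residual after convergence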

PatentDOI
TL;DR: In this paper, a vibration mount by which a subject body is mounted on a support by means of an intermediate body, and constrictive actuators (22, 24) each of whose dimensions can be changed in a controllable manner by signals supplied from a computer receiving information from sensors on the intermediate body.
Abstract: A vibration mount by which a subject body (10) is mounted on a support (12) by means of an intermediate body (16), and constrictive actuators (22, 24) each of whose dimensions can be changed in a controllable manner by signals supplied from a computer receiving information from sensors on the intermediate body, applies controlled compensating forces to the intermediate body to reduce its vibrations.

Journal ArticleDOI
TL;DR: Differences across subjects were found in the rate of recovery from the refractory state, implying that there may be differences across subjects in the accuracy with which rapid temporal cues can be coded at the level of the auditory nerve.
Abstract: This study describes a method for recording the electrically evoked, whole-nerve action potential (EAP) in users of the Ineraid cochlear implant. The method is an adaptation of one originally used by Charlet de Sauvage et al. [J. Acoust. Soc. Am. 73, 615-627 (1983)] in guinea pigs. The response, recorded from 11 subjects, consists of a single negative peak that occurs with a latency of approximately 0.4 ms. EAP input/output functions are steeply sloping and monotonic. Response amplitudes ranging up to 160 μV have been recorded. Slope of the EAP input/output function correlates modestly (approximately 0.6-0.69) with results of tests measuring word recognition skills. The refractory properties of the auditory nerve were also assessed. Differences across subjects were found in the rate of recovery from the refractory state. These findings imply that there may be differences across subjects in the accuracy with which rapid temporal cues can be coded at the level of the auditory nerve. Reasonably strong correlations (approximately 0.74-0.85) have been found between the magnitude of the slope of these recovery curves and performance on tests of word recognition.

Journal ArticleDOI
TL;DR: In this article, measurements were made of the piezomagnetic d33 coefficient, the free permeability, μ33T, and the open-circuit elastic compliance coefficient, s33H, of grain-oriented Terfenol-D, Tb0.3Dy0.7Fe1.93, produced by a modified Bridgman technique.
Abstract: Measurements were made of the piezomagnetic d33 coefficient, the free permeability, μ33T, and the open-circuit elastic compliance coefficient, s33H, of grain-oriented Terfenol-D, Tb0.3Dy0.7Fe1.93, produced by a modified Bridgman technique. Prestress levels to 9500 psi (65 MPa) and magnetic bias fields up to 2200 Oe (175 kA/m) were applied with a laboratory electromagnet modified so that one pole piece served as a hydraulically actuated piston. The results indicate that d33, μ33T, and s33H are dependent on stress and magnetic field, so that proper mechanical prestress and magnetic bias conditions are critical to the successful use of Terfenol-D in transducers and actuators. Radiated-power limits for underwater acoustic projectors are estimated and compared with those for projectors having lead zirconate titanate (PZT-4) piezoelectric drivers. Terfenol-D based transducers can be as much as 8 dB superior to PZT-4 under low-Q, field-limited conditions (Q=mechanical quality factor).
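
For reference, these three coefficients are the ones that enter the linear piezomagnetic constitutive relations along the rod axis (the 33 direction); the paper's point is precisely that they are not constants but depend on prestress and bias field. Textbook form, not an excerpt from the paper:

S_3 = s^H_{33} T_3 + d_{33} H_3        (strain from stress and applied magnetic field)
B_3 = d_{33} T_3 + \mu^T_{33} H_3      (flux density from stress and field)

Here S is strain, T stress, H magnetic field, and B magnetic flux density; the superscript H marks compliance at constant field and the superscript T permeability at constant stress.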

Journal ArticleDOI
TL;DR: Responses of children and older adults demonstrated similar context effects, with two exceptions: Children used the semantic constraints of sentences to a lesser extent than did young or older adults, and older adults used lexical constraints to a greater extent than either of the other two groups.
Abstract: Perception is influenced both by characteristics of the stimulus, and by the context in which it is presented. The relative contributions of each of these factors depend, to some extent, on perceiver characteristics. The contributions of word and sentence context to the perception of phonemes within words and words within sentences, respectively, have been well studied for normal, young adults. However, far less is known about these context effects for much younger and older listeners. In the present study, measures of these context effects were obtained from young children (ages 4 years 6 months to 6 years 6 months) and from older adults (over 62 years), and compared with those of the young adults in an earlier study [A. Boothroyd and S. Nittrouer, J. Acoust. Soc. Am. 84, 101–114 (1988)]. Both children and older adults demonstrated poorer overall recognition scores than did young adults. However, responses of children and older adults demonstrated similar context effects, with two exceptions: Children us...

Journal ArticleDOI
TL;DR: A large-sample, 50-speaker x-ray microbeam speech database is to be developed, incorporating point-parametrized representations of lingual, labial, mandibular, and velar movements in association with the resulting acoustic sound pressure wave, for a rich set of utterances and oral motor tasks.
Abstract: A broad review of literature describing lingual function during speech shows that speaker samples per study are typically small (N<3 in more than 80% of all cases), and that speech samples, and representational and analysis conventions are highly variable. Similar conclusions can be drawn for other articulators. Thus it is fair to argue that there is still not available any valid, statistically‐defensible sense of normal speech motor behavior, against which disordered articulatory behavior can be compared. Accordingly, a large‐sample, 50‐speaker x‐ray microbeam speech database will be developed at the University of Wisconsin, incorporating point‐parametrized representations of lingual, labial, mandibular, and velar movements in association with the resulting acoustic sound pressure wave, for a rich set of utterances and oral motor tasks, and lengthy recording interval (circa 18 min/speaker). The database is intended to be uniform across speakers in task inventory and descriptive kinematic framework; suffi...

Journal ArticleDOI
TL;DR: Two languages (Ndebele and Shona) with the phonemic vowels /i,e,a,o,u/ were found to have greater anticipatory coarticulation for the target vowel /a/ than does a language (Sotho) that has a more crowded mid- and low-vowel space.
Abstract: Languages differ in their inventories of distinctive sounds and in their systems of contrast. Here, it is proposed that this observation may have predictive value with respect to how extensively various phones are coarticulated in particular languages. This hypothesis is based on three assumptions: (1) There are ‘‘output constraints’’ on just how a given phone can be articulated; (2) output constraints are, at least in part, affected by language‐particular systems of phonetic contrast; and (3) coarticulation is limited in a way that respects those output constraints. Together, these assumptions lead to the expectation that, in general, languages will tend to tolerate less coarticulation just where extensive coarticulation would lead to confusion of contrastive phones. This prediction was tested by comparing acoustic measures of anticipatory vowel‐to‐vowel coarticulation in languages that differ in how they divide up the vowel space into contrastive units. The acoustic measures were the first and second formant frequencies, measured in the middle and at the end of the target vowels /a/ and /e/, followed by /pV/, where /V/ was /i,e,a,o,u/. Two languages (Ndebele and Shona) with the phonemic vowels /i,e,a,o,u/ were found to have greater anticipatory coarticulation for the target vowel /a/ than does a language (Sotho) that has a more crowded mid‐ and low‐vowel space, with the phonemic vowels /i,e,ɛ,a,ɔ,o,u/. The data were based on recordings from three speakers of each of the languages.

Journal ArticleDOI
TL;DR: It is suggested that spectral features in the 8- to 18-kHz region provide some of the necessary spectral information for sound localization and that the contrast in spectral energy between the frequencies at the notch and its shoulders is a potential directional cue.
Abstract: Free‐field to eardrum transfer functions were measured in anesthetized cats inside an anechoic chamber. Direction‐dependent transformations were determined by measurement of sound‐pressure levels using a small probe tube microphone surgically implanted in a ventral position near the tympanic membrane. Loudspeaker and probe microphone characteristics were eliminated by subtraction of the signal recorded in the free field with no animal present. Complexities of the transfer function, which include the presence of prominent spectral notches in the 8‐ to 18‐kHz frequency region, are due primarily to the acoustical properties of the pinna. Differential amplification of frequency components within the broadband stimulus occurs as a function of source direction. Spectral features vary systematically with changes in both elevation (EL) and azimuth (AZ). The contrast between a notch and its shoulders is enhanced in the interaural spectral records. Spectral data from single source locations and spatial data for single frequencies at many locations are presented and comparisons with other species are drawn. It is suggested that spectral features in the 8‐ to 18‐kHz region provide some of the necessary spectral information for sound localization and that the contrast in spectral energy between the frequencies at the notch and its shoulders is a potential directional cue.

PatentDOI
TL;DR: In this article, an improved pulsatile system for a cochlear prosthesis is described, which employs a multi-spectral peak coding strategy to extract a number of spectral peaks from an incoming acoustic signal received by a microphone.
Abstract: An improved pulsatile system for a cochlear prosthesis is disclosed. The system employs a multi-spectral peak coding strategy to extract a number, for example five, of spectral peaks from an incoming acoustic signal received by a microphone. It encodes this information into sequential pulses that are sent to selected electrodes of a cochlear implant. The first formant (F1) spectral peak (280-1000 Hz) and the second formant (F2) spectral peak (800-4000 Hz) are encoded and presented to apical and basal electrodes, respectively. F1 and F2 electrode selection follows the tonotopic organization of the cochlea. High-frequency spectral information is sent to more basal electrodes and low-frequency spectral information is sent to more apical electrodes. Spectral energy in the regions of 2000-2800 Hz, 2800-4000 Hz, and above 4000 Hz is encoded and presented to three fixed electrodes. The fundamental or voicing frequency (F0) determines the pulse rate of the stimulation during voiced periods and a pseudo-random aperiodic rate determines the pulse rate of stimulation during unvoiced periods. The amplitude of the acoustic signal in the five bands determines the stimulus intensity.
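
A toy sketch of the band-to-electrode routing described above. The frequency ranges are the abstract's; the electrode numbering and the linear frequency-to-electrode interpolation are placeholders (real devices use a fitted frequency-allocation table).

def route_spectral_peaks(f1_hz, f2_hz, band_levels_db):
    # lower frequencies go to more apical electrodes, following the tonotopic
    # organization of the cochlea (numbering here is assumed: higher = more apical)
    def tonotopic_electrode(f_hz, lo, hi, e_basal, e_apical):
        frac = min(max((f_hz - lo) / (hi - lo), 0.0), 1.0)
        return round(e_apical + frac * (e_basal - e_apical))

    return [
        (tonotopic_electrode(f1_hz, 280.0, 1000.0, 15, 22), band_levels_db["F1"]),
        (tonotopic_electrode(f2_hz, 800.0, 4000.0, 8, 14), band_levels_db["F2"]),
        (7, band_levels_db["2000-2800"]),   # three fixed basal electrodes
        (6, band_levels_db["2800-4000"]),   # for the high-frequency bands
        (5, band_levels_db[">4000"]),
    ]

pulses = route_spectral_peaks(f1_hz=500.0, f2_hz=1500.0,
                              band_levels_db={"F1": 60, "F2": 55, "2000-2800": 40,
                                              "2800-4000": 35, ">4000": 30})
print(pulses)  # (electrode, intensity) pairs, delivered as sequential pulses at the F0 rate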

Journal ArticleDOI
TL;DR: The role of spectral cues in the sound source to ear transfer function in median plane sound localization is investigated to investigate the role of microscopic and macroscopic patterns in the transfer functions for median plane localization.
Abstract: The role of spectral cues in the sound source to ear transfer function in median plane sound localization is investigated in this paper. At first, transfer functions were measured and analyzed. Then, these transfer functions were used in experiments where sounds from a source on the median plane were simulated and presented to subjects through headphones. In these simulation experiments, the transfer functions were smoothed by ARMA models with different degrees of simplification to investigate the role of microscopic and macroscopic patterns in the transfer functions for median plane localization. The results of the study are summarized as follows: (1) For front–rear judgment, information derived from microscopic peaks and dips in the low‐frequency region (below 2 kHz) and the macroscopic patterns in the high‐frequency region seems to be utilized; (2) for judgment of elevation angle, major cues exist in the high‐frequency region above 5 kHz. The information in macroscopic patterns is utilized instead of that in small peaks and dips.

PatentDOI
TL;DR: In this paper, a phonological rules generator associates sequences of labels derived from vocalizations of a training text with respective phonemes inferred from the training text and clustered into groups representing similar pronunciations of each phoneme.
Abstract: A continuous speech recognition system includes an automatic phonological rules generator which determines variations in the pronunciation of phonemes based on the context in which they occur. This phonological rules generator associates sequences of labels derived from vocalizations of a training text with respective phonemes inferred from the training text. These sequences are then annotated with their phoneme context from the training text and clustered into groups representing similar pronunciations of each phoneme. A decision tree is generated using the context information of the sequences to predict the clusters to which the sequences belong. The training data is processed by the decision tree to divide the sequences into leaf-groups representing similar pronunciations of each phoneme. The sequences in each leaf-group are clustered into sub-groups representing respectively different pronunciations of their corresponding phoneme in a given context. A Markov model is generated for each sub-group. The various Markov models of a leaf-group are combined into a single compound model by assigning common initial and final states to each model. The compound Markov models are used by a speech recognition system to analyze an unknown sequence of labels given its context.

Journal ArticleDOI
TL;DR: It was found that the balance between auditive and cognitive contributions to speech perception performance did not change with age, and the deterioration of speech perception in the elderly consists of two statistically independent components.
Abstract: In part I of this study [van Rooij et al., J. Acoust. Soc. Am. 86, 1294-1309 (1989)], the validity and manageability of a test battery comprising auditive (sensitivity, frequency resolution, and temporal resolution), cognitive (memory performance, processing speed, and intellectual abilities), and speech perception tests (at the phoneme, spondee, and sentence level) were investigated. In the present article, the results of a selection of these tests for 72 elderly subjects (aged 60-93 years) are analyzed by multivariate statistical techniques. The results show that the deterioration of speech perception in the elderly consists of two statistically independent components: (a) a large component mainly representing the progressive high-frequency hearing loss with age that accounts for approximately two-thirds of the systematic variance of the tests of speech perception and (b) a smaller component (accounting for one-third of the systematic variance of the speech perception tests) mainly representing a general performance decrement due to reduced mental efficiency, which is indicated by a general slowing of performance and a reduced memory capacity. Although both components are correlated with age, it was found that the balance between auditive and cognitive contributions to speech perception performance did not change with age.

Journal ArticleDOI
TL;DR: In this article, a model of a driven spherical gas bubble in water is applied to determine its dynamic properties, especially its resonance behavior and bifurcation structure, and the dynamic properties are described in a growing level of abstraction by radius-time curves, trajectories in state space, strange attractors in the Poincare plane, basins of attraction, bifurlcation diagrams, winding number diagrams, and phase diagrams.
Abstract: Methods from chaos physics are applied to a model of a driven spherical gas bubble in water to determine its dynamic properties, especially its resonance behavior and bifurcation structure. The dynamic properties are described in a growing level of abstraction by radius‐time curves, trajectories in state space, strange attractors in the Poincare plane, basins of attraction, bifurcation diagrams, winding number diagrams, and phase diagrams. A sequence of bifurcation diagrams is given, exemplifying the recurrent pattern in the bifurcation set and its relation to the resonances of the system. Period‐doubling cascades to chaos and back (‘‘period bubbling’’) are a prominent recurring feature connected with each resonance (demonstrated for period‐1, period‐2, and period‐3 resonances, and observed for some higher‐order resonances). The recurrent nature of the bifurcation set is most easily seen in the phase diagrams given. A similar structure of the bifurcation set has also been found for other nonlinear oscillators (Duffing, Toda, laser, and Morse).
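
A minimal sketch of the kind of computation behind such bifurcation studies: integrate a driven spherical-bubble model and strobe the radius once per driving period (a Poincare section), so that period doubling shows up as the strobed values splitting. The Rayleigh-Plesset form below is a simpler stand-in for the model actually studied (work in this line often uses Keller-Miksis-type equations), and all parameter values are illustrative.

import numpy as np
from scipy.integrate import solve_ivp

rho, sigma, mu = 998.0, 0.0725, 1.0e-3   # water: density, surface tension, viscosity
p0, kappa = 101325.0, 1.33               # ambient pressure, polytropic exponent
R0 = 10e-6                               # 10-um equilibrium radius
f_d, p_a = 200e3, 70e3                   # driving frequency (Hz) and amplitude (Pa)

def rayleigh_plesset(t, y):
    R, Rdot = y
    p_gas = (p0 + 2.0 * sigma / R0) * (R0 / R) ** (3.0 * kappa)
    p_ext = p0 + p_a * np.sin(2.0 * np.pi * f_d * t)
    Rddot = ((p_gas - 2.0 * sigma / R - 4.0 * mu * Rdot / R - p_ext)
             / (rho * R) - 1.5 * Rdot**2 / R)
    return [Rdot, Rddot]

T = 1.0 / f_d
sol = solve_ivp(rayleigh_plesset, [0.0, 200.0 * T], [R0, 0.0],
                max_step=T / 200.0, rtol=1e-8, atol=1e-12, dense_output=True)
strobe = sol.sol(np.arange(150, 200) * T)[0]   # radius sampled once per period
print(np.round(strobe[-6:] / R0, 4))  # one repeating value = period-1, two = period-2, ...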

Journal ArticleDOI
TL;DR: In this paper, a new computational capability is described for calculating the sound-pressure field radiated or scattered by a harmonically excited, submerged, arbitrary, three-dimensional elastic structure.
Abstract: A new computational capability is described for calculating the sound-pressure field radiated or scattered by a harmonically excited, submerged, arbitrary, three-dimensional elastic structure. This approach, called NASHUA, couples a NASTRAN finite element model of the structure with a boundary element model of the surrounding fluid. The surface fluid pressures and normal velocities are first calculated by coupling the finite element model of the structure with a discretized form of the Helmholtz surface integral equation for the exterior fluid. After generation of the fluid matrices, most of the required matrix operations are performed using the general matrix manipulation package available in NASTRAN. Farfield radiated pressures are then calculated from the surface solution using the Helmholtz exterior integral equation. The overall capability is very general, highly automated, and requires no independent specification of the fluid mesh. An efficient, new, out-of-core block equation solver was written so that very large problems could be solved. The use of NASTRAN as the structural analyzer permits a variety of graphical displays of results, including computer animation of the dynamic response. The overall approach is illustrated and validated using known analytic solutions for submerged spherical shells subjected to both incident pressure and uniform and nonuniform applied mechanical loads.
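
The fluid side of this coupling rests on the two standard Helmholtz integral relations, written here in a common textbook form (sign conventions vary with the assumed time dependence; this is background consistent with the description, not the paper's exact discretization). With free-space Green's function G(x,y) = e^{ikr}/(4\pi r), r = |x - y|, and the normal derivative of pressure tied to normal velocity by the momentum equation, \partial p / \partial n = i \rho \omega v_n:

(1/2) p(x) = \int_S [ p(y) \, \partial G(x,y)/\partial n_y - G(x,y) \, \partial p(y)/\partial n_y ] \, dS(y),   x on the (smooth) surface S,

p(x) = \int_S [ p(y) \, \partial G(x,y)/\partial n_y - G(x,y) \, \partial p(y)/\partial n_y ] \, dS(y),   x in the exterior fluid.

The first relation, discretized, supplies the fluid matrices coupled to the finite element model; the second evaluates far-field pressures from the converged surface solution.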

PatentDOI
TL;DR: In this paper, a tactile aid for deaf individuals includes a conveniently wearable array of vibration transducers mounted on a flexible carrier strip so as to be positionally biased toward the individuals's skin, each transducers includes an enclosed magnetically vibratable cantilevered beam, an angled mounting slot to effect the positional bias mounting on the strip, and contact pads rather than conventional pin connectors to both reduce mass and facilitate electrical connection to contact pads on carrier strips in the form of printed circuit boards.
Abstract: A tactile aid for deaf individuals includes a conveniently wearable array of vibration transducers mounted on a flexible carrier strip so as to be positionally biased toward the individual's skin. Each transducer includes an enclosed magnetically vibratable cantilevered beam, an angled mounting slot to effect the positional bias mounting on the strip, and contact pads rather than conventional pin connectors to both reduce mass and facilitate electrical connection to contact pads on carrier strips in the form of printed circuit boards. Received acoustic signals are processed in at least first and second formant circuits each sub-divided into plural sub-bands corresponding to the number of transducers. In each formant circuit the corresponding formant frequency and amplitude are measured, averaged over a predetermined number of cycles, and a voltage corresponding to the measured amplitude is provided on a corresponding sub-band channel to pulse width modulate a pulse train employed to excite the transducers. Spectral resolution can be increased by averaging the amplitude/frequency measurements over fewer cycles. Glottal pulses in the speech signal are monitored to provide a glottal rate signal at a stepped down frequency related to the actual glottal rate. The channel amplitude is modulated by the glottal signal to cause alternation of transducer excitation at the related glottal rate. Additional formants may be detected to derive additional information in the excitation signal.

Journal ArticleDOI
TL;DR: It was found that variability of articulation rate, measured as the average syllable duration for interpause intervals (runs), is not random, but is the natural consequence of the content of the run.
Abstract: Further analyses have been made on readings of two scripts by six talkers [T. H. Crystal and A. S. House, J. Acoust. Soc. Am. 72, 705–716 (1982); 84, 1932–1935 (1988); 83, 1553–1573 (1988)]. Durations of syllables and stress groups are compared to earlier data and various pertinent published reports, and are used to evaluate reports of articulation rate variability. The average durations of syllables of different complexity have a quasilinear dependency on the number of phones in the syllable, where the linear factor and the vowel durations are functions of stress. The duration of stress groups has a quasilinear dependency on the number of syllables and the number of phones. It was found that variability of articulation rate, measured as the average syllable duration for interpause intervals (runs), is not random, but is the natural consequence of the content of the run. Durations of comparable runs of different talkers are highly correlated. Major determinants of average syllable duration are the average...