Showing papers in "Journal of the Acoustical Society of America in 1997"


Journal ArticleDOI
TL;DR: Basilar-membrane responses to single tones were measured, using laser velocimetry, at a site of the chinchilla cochlea located 3.5 mm from its basal end, and compressive growth of responses to tones with frequency near CF is accompanied by intensity-dependent phase shifts.
Abstract: Basilar-membrane responses to single tones were measured, using laser velocimetry, at a site of the chinchilla cochlea located 3.5 mm from its basal end. Responses to low-level ( 80 dB the largest responses are elicited by tones with frequency about 0.4–0.5 octave below CF. For stimulus frequencies well above CF, responses stop decreasing with increasing frequency: A plateau is reached. The compressive growth of responses to tones with frequency near CF is accompanied by intensity-dependent phase shifts. Death abolishes all nonlinearities, reduces sensitivity at CF by as much as 60–81 dB, and causes a relative phase lead at CF.

775 citations


Journal ArticleDOI
TL;DR: It is shown that at the edges of prosodic domains, initial consonants and final vowels have more extreme lingual articulations, a pattern called articulatory strengthening, and it is suggested that this initial strengthening could provide an alternative account for previously observed supralaryngeal declination of consonants.
Abstract: In this paper it is shown that at the edges of prosodic domains, initial consonants and final vowels have more extreme (less reduced) lingual articulations, a pattern called articulatory strengthening. Linguopalatal contact for consonants and vowels in different prosodic positions was compared, using reiterant-speech versions of sentences with a variety of phrasings read by three speakers of American English. Four prosodic domains were considered: the phonological word, the phonological (or intermediate) phrase, the intonational phrase, and the utterance. Domain-initial consonants show more linguopalatal contact than domain-medial or domain-final consonants, at three prosodic levels. Most vowels, on the other hand, show less linguopalatal contact in domain-final syllables than in domain-initial and domain-medial syllables. As a result, the articulatory difference between segments is greater around a prosodic boundary, increasing the articulatory contrast between consonants and vowels, and prosodic domains are marked at both edges. Furthermore, the initial strengthening of consonants is generally cumulative, i.e., the higher the prosodic domain, the more linguopalatal contact the consonant has. However, speakers differed in how many and which levels were distinguished in this way. It is suggested that this initial strengthening could provide an alternative account for previously observed supralaryngeal declination of consonants. Acoustic duration of the consonants is also affected by prosodic position, and this lengthening is cumulative like linguopalatal contact, but the two measures are only weakly correlated.

772 citations


Journal ArticleDOI
TL;DR: This paper investigated the effects of training in /r/−/l/ perceptual identification on Japanese spoken utterances and found that the knowledge gained during perceptual learning transferred to the production domain, and thus provided novel information regarding the relationship between speech perception and production.
Abstract: This study investigated the effects of training in /r/–/l/ perceptual identification on /r/–/l/ production by adult Japanese speakers. Subjects were recorded producing English words that contrast /r/ and /l/ before and after participating in an extended period of /r/–/l/ identification training using a high-variability presentation format. All subjects showed significant perceptual learning as a result of the training program, and this perceptual learning generalized to novel items spoken by new talkers. Improvement in the Japanese trainees’ /r/–/l/ spoken utterances as a consequence of perceptual training was evaluated using two separate tests with native English listeners. First, a direct comparison of the pretest and post-test tokens showed significant improvement in the perceived rating of /r/ and /l/ productions as a consequence of perceptual learning. Second, the post-test productions were more accurately identified by English listeners than the pretest productions in a two-alternative minimal-pair identification procedure. These results indicate that the knowledge gained during perceptual learning of /r/ and /l/ transferred to the production domain, and thus provides novel information regarding the relationship between speech perception and production.

661 citations


Journal ArticleDOI
TL;DR: Formant dispersion is the averaged difference between successive formant frequencies, and was found to be closely tied to both vocal tract length and body size in macaques, and probably many other species.
Abstract: Body weight, length, and vocal tract length were measured for 23 rhesus macaques (Macaca mulatta) of various sizes using radiographs and computer graphic techniques. Linear predictive coding analysis of tape-recorded threat vocalizations was used to determine vocal tract resonance frequencies (“formants”) for the same animals. A new acoustic variable is proposed, “formant dispersion,” which should theoretically depend upon vocal tract length. Formant dispersion is the averaged difference between successive formant frequencies, and was found to be closely tied to both vocal tract length and body size. Despite the common claim that voice fundamental frequency (F0) provides an acoustic indication of body size, repeated investigations have failed to support such a relationship in many vertebrate species including humans. Formant dispersion, unlike voice pitch, is proposed to be a reliable predictor of body size in macaques, and probably many other species.
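The “formant dispersion” measure defined above is simple to compute: it is the mean spacing between successive formant frequencies. A minimal sketch (the formant values used below are hypothetical, for illustration only):

```python
import numpy as np

def formant_dispersion(formants_hz):
    """Averaged difference between successive formant frequencies,
    as defined in the abstract: Df = mean(F[i+1] - F[i])."""
    f = np.sort(np.asarray(formants_hz, dtype=float))
    return float(np.mean(np.diff(f)))

# Hypothetical formant values (Hz), for illustration only
print(formant_dispersion([500.0, 1500.0, 2500.0, 3500.0]))  # → 1000.0
```

Note that the mean of successive differences reduces to (F_n − F_1)/(n − 1), so the measure depends only on the lowest and highest formants used.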

637 citations


Journal ArticleDOI
TL;DR: A quantitative model for describing data from modulation-detection and modulation-masking experiments is presented, which proposes that the typical low-pass characteristic of the temporal modulation transfer function observed with wide-band noise carriers is not due to "sluggishness" in the auditory system, but can instead be understood in terms of the interaction between modulation filters and the inherent fluctuations in the carrier.
Abstract: This paper presents a quantitative model for describing data from modulation-detection and modulation-masking experiments, which extends the model of the “effective” signal processing of the auditory system described in Dau et al. [J. Acoust. Soc. Am. 99, 3615–3622 (1996)]. The new element in the present model is a modulation filterbank, which exhibits two domains with different scaling. In the range 0–10 Hz, the modulation filters have a constant bandwidth of 5 Hz. Between 10 Hz and 1000 Hz a logarithmic scaling with a constant Q value of 2 was assumed. To preclude spectral effects in temporal processing, measurements and corresponding simulations were performed with stochastic narrow-band noise carriers at a high center frequency (5 kHz). For conditions in which the modulation rate (f_mod) was smaller than half the bandwidth of the carrier (Δf), the model accounts for the low-pass characteristic in the threshold functions [e.g., Viemeister, J. Acoust. Soc. Am. 66, 1364–1380 (1979)]. In conditions with f_mod > Δf/2, the model can account for the high-pass characteristic in the threshold function. In a further experiment, a classical masking paradigm for investigating frequency selectivity was adopted and translated to the modulation-frequency domain. Masked thresholds for sinusoidal test modulation in the presence of a competing modulation masker were measured and simulated as a function of the test modulation rate. In all cases, the model describes the experimental data to within a few dB. It is proposed that the typical low-pass characteristic of the temporal modulation transfer function observed with wide-band noise carriers is not due to “sluggishness” in the auditory system, but can instead be understood in terms of the interaction between modulation filters and the inherent fluctuations in the carrier. © 1997 Acoustical Society of America. [S0001-4966(97)05611-7]
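The two-domain scaling of the modulation filterbank can be sketched numerically. The 5-Hz constant bandwidth below 10 Hz and the constant Q = 2 from 10 Hz to 1000 Hz are taken from the abstract; the specific center frequencies and the edge-to-edge spacing rule below are assumptions made for illustration, not the authors' exact parameterization:

```python
def modulation_filterbank_centers(f_max=1000.0, q=2.0):
    """Sketch of the two-domain scaling: constant 5-Hz bandwidth up
    to 10 Hz, then constant Q = 2 (logarithmic spacing) up to f_max.
    Assumption: adjacent constant-Q filters are spaced so that their
    band edges meet, giving a ratio of (2Q+1)/(2Q-1) between centers."""
    centers = [5.0, 10.0]              # constant-bandwidth region
    ratio = (2 * q + 1) / (2 * q - 1)  # = 5/3 for Q = 2
    fc = 10.0 * ratio
    while fc <= f_max:
        centers.append(fc)
        fc *= ratio
    return centers

print(modulation_filterbank_centers())
```

With these assumptions the constant-Q region holds roughly nine filters between 10 Hz and 1 kHz, i.e., the filterbank stays coarse compared with peripheral (audio-frequency) filtering.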

580 citations


PatentDOI
TL;DR: In this paper, a medical ultrasonic diagnostic imaging system is provided which is capable of being accessed over data communication networks such as the Internet, making the ultrasonic images, diagnostic reports, and ultrasound system diagnostics information and operation accessible to a conventional personal computer using commercially available software at virtually any remote location.
Abstract: A medical ultrasonic diagnostic imaging system is provided which is capable of being accessed over data communication networks such as the Internet, making the ultrasonic images, diagnostic reports, and ultrasound system diagnostics information and operation accessible to a conventional personal computer using commercially available software at virtually any remote location. In one embodiment, the ultrasound system can be remotely operated from the personal computer. The inventive apparatus and techniques make it possible for physicians to remotely access, control, and perform diagnoses using their ultrasound systems over a network such as the World Wide Web with no special hardware requirements.

528 citations


Journal ArticleDOI
TL;DR: In this paper, a simple model is constructed for the dynamic thermal permeability k′(ω), which is completely analogous to the Johnson et al. [J. Fluid Mech. 176, 379 (1987)] model of the dynamic viscous permeability k(ω).
Abstract: Measurements of dynamic compressibility of air-filled porous sound-absorbing materials are compared with predictions involving two parameters, the static thermal permeability k0′ and the thermal characteristic dimension Λ′. Emphasis on the notion of dynamic and static thermal permeability—the latter being a geometrical parameter equal to the inverse trapping constant of the solid frame—is apparently new. The static thermal permeability plays, in the description of the thermal exchanges between frame and saturating fluid, a role similar to the viscous permeability in the description of the viscous forces. Using both parameters, a simple model is constructed for the dynamic thermal permeability k′(ω), which is completely analogous to the Johnson et al. [J. Fluid Mech. 176, 379 (1987)] model of dynamic viscous permeability k(ω). The resultant modeling of dynamic compressibility provides predictions which are closer to the experimental results than the previously used simpler model where the compressibility i...

497 citations


PatentDOI
TL;DR: In this paper, a touch sensor consisting of an acoustic wave transmissive medium having a surface, a plurality of acoustic wave path forming systems, each generating a set of incrementally varying paths through the transmission medium, and a receiver for receiving signals representing the sets of waves, a portion of each set overlapping temporally or physically by propagating along axes which are not orthogonal.
Abstract: A touch sensor (3) comprising an acoustic wave transmissive medium having a surface; a plurality of acoustic wave path forming systems, each generating a set of incrementally varying paths through the transmission medium; and a receiver for receiving signals representing the sets of waves, a portion of each set overlapping temporally or physically by propagating in the transmissive medium along axes which are not orthogonal.

476 citations


PatentDOI
Dimitri Kanevsky1, Stephane H. Maes1
TL;DR: In this article, a method and apparatus for securing access to a service or facility employing automatic speech recognition, text-independent speaker identification, natural language understanding techniques and additional dynamic and static features is presented.
Abstract: A method and apparatus for securing access to a service or facility employing automatic speech recognition, text-independent speaker identification, natural language understanding techniques and additional dynamic and static features. The method includes the steps of receiving and decoding speech containing indicia of the speaker such as a name, address or customer number; accessing a database containing information on candidate speakers; questioning the speaker based on the information; receiving, decoding and verifying an answer to the question; obtaining a voice sample of the speaker and verifying the voice sample against a model; generating a score based on the answer and the voice sample; and granting access if the score is equal to or greater than a threshold. Alternatively, the method includes the steps of receiving and decoding speech containing indicia of the speaker; generating a sub-list of speaker candidates having indicia substantially matching the speaker; activating databases containing information about the speaker candidates in the sub-list; performing voice classification analysis; eliminating speaker candidates based on the voice classification analysis; questioning the speaker regarding the information; eliminating speaker candidates based on the answer; and iteratively repeating the prior steps until either one speaker candidate remains, in which case the speaker is granted access, or no speaker candidate remains, in which case access is denied.

474 citations


Journal ArticleDOI
TL;DR: It is shown that the model can simulate new experimental results that show how the quality of the pitch percept is influenced by the resolvability of the harmonic components of the stimulus complex and it is not necessary to postulate two separate mechanisms to explain different pitch percepts associated with resolved and unresolved harmonics.
Abstract: A model of the mechanism of residue pitch perception is revisited. It is evaluated in the context of some new empirical results, and it is proposed that the model is able to reconcile a number of differing approaches in the history of theories of pitch perception. The model consists of four sequential processing stages: peripheral frequency selectivity, within-channel half-wave rectification and low-pass filtering, within-channel periodicity extraction, and cross-channel aggregation of the output. The pitch percept is represented by the aggregated periodicity function. Using autocorrelation as the periodicity extraction method and the summary autocorrelation function (SACF) as the method for representing pitch information, it is shown that the model can simulate new experimental results that show how the quality of the pitch percept is influenced by the resolvability of the harmonic components of the stimulus complex. These include: (i) the pitch of harmonic stimuli whose components alternate in phase; (ii) the increased frequency difference limen of tones consisting of higher harmonics; and (iii) the influence of a mistuned harmonic on the pitch of the complex as a function of its harmonic number. To accommodate these paradigms, it was necessary to compare stimuli along the length of the SACF rather than relying upon the highest peak alone. These new results demonstrate that the model responds differently to complexes consisting of low and high harmonics. As a consequence, it is not necessary to postulate two separate mechanisms to explain different pitch percepts associated with resolved and unresolved harmonics.
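The periodicity-extraction and aggregation stages of the model can be illustrated with a toy autocorrelation pitch estimator. This is a sketch only: the peripheral filterbank, low-pass filtering, and cross-channel summation of the full SACF model are collapsed into a single half-wave rectified channel, and the pitch-range limits are arbitrary:

```python
import numpy as np

def sacf_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Toy single-channel sketch: half-wave rectify, autocorrelate,
    and report the lag of the largest peak in a plausible pitch range."""
    h = np.maximum(x, 0.0)                   # half-wave rectification
    r = np.correlate(h, h, mode="full")
    r = r[len(r) // 2:]                      # keep non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)  # lag search window
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
# harmonic complex with a 200-Hz fundamental (harmonics 1-4)
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 5))
print(sacf_pitch(x, fs))  # expect a value near 200 Hz
```

The full model differs in that it compares stimuli along the length of the summary autocorrelation function rather than using only the highest peak, as the abstract notes.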

432 citations


Journal ArticleDOI
TL;DR: The results of an experiment in which frequency information was altered but temporal information was not altered indicates that vowel recognition is based on information in the frequency domain even when the number of channels of stimulation is small.
Abstract: Vowels, consonants, and sentences were processed through software emulations of cochlear-implant signal processors with 2-9 output channels. The signals were then presented, as either the sum of sine waves at the center of the channels or as the sum of noise bands the width of the channels, to normal-hearing listeners for identification. The results indicate, as previous investigations have suggested, that high levels of speech understanding can be obtained using signal processors with a small number of channels. The number of channels needed for high levels of performance varied with the nature of the test material. For the most difficult material--vowels produced by men, women, and girls--no statistically significant differences in performance were observed when the number of channels was increased beyond 8. For the least difficult material--sentences--no statistically significant differences in performance were observed when the number of channels was increased beyond 5. The nature of the output signal, noise bands or sine waves, made only a small difference in performance. The mechanism mediating the high levels of speech recognition achieved with only few channels of stimulation may be the same one that mediates the recognition of signals produced by speakers with a high fundamental frequency, i.e., the levels of adjacent channels are used to determine the frequency of the input signal. The results of an experiment in which frequency information was altered but temporal information was not altered indicates that vowel recognition is based on information in the frequency domain even when the number of channels of stimulation is small.

Patent
TL;DR: In this paper, an acoustic signature recognition and identification system receives signals from a sensor placed on a designated piece of equipment, and the acoustic data is digitized and processed, via a Fast Fourier Transform routine, to create a spectrogram image of frequency versus time.
Abstract: An acoustic signature recognition and identification system receives signals from a sensor placed on a designated piece of equipment. The acoustic data is digitized and processed, via a Fast Fourier Transform routine, to create a spectrogram image of frequency versus time. The spectrogram image is then normalized to permit acoustic pattern recognition regardless of the surrounding environment or magnitude of the acoustic signal. A feature extractor then detects, tracks and characterizes the lines which form the spectrogram. Specifically, the lines are detected via a KY process that is applied to each pixel in the line. A blob coloring process then groups spatially connected pixels into a single signal object. The harmonic content of the lines is then determined and compared with stored templates of known acoustic signatures to ascertain the type of machinery. An alert is then generated in response to the recognized and identified machinery.
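The spectrogram-generation stage described above (FFT of successive frames to form a frequency-versus-time image) can be sketched as follows; the frame length, hop size, and window are arbitrary choices for illustration, not values taken from the patent, and the later stages (normalization, line tracking, template matching) are not shown:

```python
import numpy as np

def spectrogram(x, fs, nfft=256, hop=128):
    """Slice the signal into overlapping frames, window each frame,
    FFT it, and stack the magnitudes into a frequency-vs-time image."""
    win = np.hanning(nfft)
    frames = [x[i:i + nfft] * win
              for i in range(0, len(x) - nfft + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, bins)
    return spec.T                               # frequency rows, time columns

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
img = spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
print(img.shape)  # (frequency bins, time frames)
```

A steady 1-kHz tone shows up as a single bright horizontal line in the image, which is exactly the kind of spectral line the patent's feature extractor detects and tracks.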

PatentDOI
TL;DR: An ultrasonic catheter device for removing obstructive matter from anatomical passageway includes a tubular body with distal head member in which a plurality of fluid passageways are formed about the lateral surface to expel fluid outward generally perpendicular to the longitudinal axis of the body.
Abstract: An ultrasonic catheter device for removing obstructive matter from anatomical passageway includes a tubular body with distal head member in which a plurality of fluid passageways are formed about the lateral surface to expel fluid outward generally perpendicular to the longitudinal axis of the body.

Journal ArticleDOI
TL;DR: The acoustic properties of bovine cancellous (spongy) bone have been experimentally studied in vitro by the pulse transmission technique and theoretical discussion is given to Biot's theory and the propagation of sound waves in fluid-saturated porous media.
Abstract: The acoustic properties of bovine cancellous (spongy) bone have been experimentally studied in vitro by the pulse transmission technique. Fast and slow longitudinal waves have been clearly identified when the acoustic wave propagates parallel to the direction of the trabeculae. Propagation speeds and attenuation of the fast and slow waves were observed in the frequency range of 0.5–5 MHz. Theoretical discussion is given to Biot’s theory and the propagation of sound waves in fluid-saturated porous media.

Journal ArticleDOI
TL;DR: A set of acoustic parameters of the voicing source that reflect individual differences in the voice qualities of female speakers are formulated to contribute to the description of normal variations of voicing characteristics across speakers and to a continuing effort to improve the analysis and synthesis of female speech.
Abstract: The aim of the research reported in this paper is to formulate a set of acoustic parameters of the voicing source that reflect individual differences in the voice qualities of female speakers. Theoretical analysis and observations of experimental data suggest that a more open glottal configuration results in a glottal volume-velocity waveform with relatively greater low-frequency and weaker high-frequency components, compared to a waveform produced with a more adducted glottal configuration. The more open glottal configuration also leads to a greater source of aspiration noise and larger bandwidths of the natural frequencies of the vocal tract, particularly the first formant. These different attributes of the glottal waveform can be measured directly from the speech spectrum or waveform. A set of acoustic parameters that are likely to indicate glottal characteristics is described. These parameters are measured in the speech of a group of female speakers, and the glottal configurations of the speakers are hypothesized. This research contributes to the description of normal variations of voicing characteristics across speakers and to a continuing effort to improve the analysis and synthesis of female speech. It may also have applications in clinical settings.

Journal ArticleDOI
TL;DR: The combination of the modulation filterbank concept and the optimal decision algorithm proposed here appears to present a powerful strategy for describing modulation-detection phenomena in narrow-band and broadband conditions.
Abstract: A multi-channel model, describing the effects of spectral and temporal integration in amplitude-modulation detection for a stochastic noise carrier, is proposed and validated. The model is based on the modulation filterbank concept which was established in the accompanying paper [Dau et al., J. Acoust. Soc. Am. 102, 2892–2905 (1997)] for modulation perception in narrow-band conditions (single-channel model). To integrate information across frequency, the detection process of the model linearly combines the channel outputs. To integrate information across time, a kind of “multiple-look” strategy, is realized within the detection stage of the model. Both data from the literature and new data are used to validate the model. The model predictions agree with the results of Eddins [J. Acoust. Soc. Am. 93, 470–479 (1993)] that the “time constants” associated with the temporal modulation transfer functions (TMTF) derived for narrow-band stimuli do not vary with carrier frequency region and that they decrease monotonically with increasing stimulus bandwidth. The model is able to predict masking patterns in the modulation-frequency domain, as observed experimentally by Houtgast [J. Acoust. Soc. Am. 85, 1676–1680 (1989)]. The model also accounts for the finding by Sheft and Yost [J. Acoust. Soc. Am. 88, 796–805 (1990)] that the long “effective” integration time constants derived from the data are two orders of magnitude larger than the time constants derived from the cutoff frequency of the TMTF. Finally, the temporal-summation properties of the model allow the prediction of data in a specific temporal paradigm used earlier by Viemeister and Wakefield [J. Acoust. Soc. Am. 90, 858–865 (1991)]. The combination of the modulation filterbank concept and the optimal decision algorithm proposed here appears to present a powerful strategy for describing modulation-detection phenomena in narrow-band and broadband conditions.

Journal ArticleDOI
TL;DR: The results demonstrate that training can generalize to listening situations beyond those used in training sessions, and that the preattentive central neurophysiology underlying perceptual learning is altered through auditory training.
Abstract: Behavioral perceptual abilities and neurophysiologic changes observed after listening training can generalize to other stimuli not used in the training paradigm, thereby demonstrating behavioral “transfer of learning” and plasticity in underlying physiologic processes. Nine normal-hearing monolingual English-speaking adults were trained to identify a prevoiced labial stop sound (one that is not used phonemically in the English language). After training, the subjects were asked to discriminate and identify a prevoiced alveolar stop. Mismatch negativity cortical evoked responses (MMN) were recorded to both labial and alveolar stimuli before and after training. Behavioral performance and MMNs also were evaluated in an age-matched control group that did not receive training. Listening training improved the experimental group’s ability to discriminate and identify an unfamiliar VOT contrast. That enhanced ability transferred from one place of articulation (labial) to another (alveolar). The behavioral training effects were reflected in the MMN, which showed an increase in duration and area when elicited by the training stimuli as well as a decrease in onset latency when elicited by the transfer stimuli. Interestingly, changes in the MMN were largest over the left hemisphere. The results demonstrate that training can generalize to listening situations beyond those used in training sessions, and that the preattentive central neurophysiology underlying perceptual learning is altered through auditory training.

PatentDOI
TL;DR: In this article, an ultrasound beam having a first focal diameter is channelled into a beam with a second smaller diameter without substantial loss of energy, and ultrasound energy is applied through a vibrating element positioned just contacting or extending into the skin.
Abstract: Methods and devices for application of ultrasound to a small area of skin for enhancing transdermal transport. An ultrasound beam having a first focal diameter is channelled into a beam having a second, smaller diameter without substantial loss of energy. Higher energy ultrasound can be used while causing less pain. Alternatively, ultrasound energy is applied through a vibrating element positioned just contacting, above or extending into the skin. Use of the element facilitates extraction of analyte and may enhance drug delivery. A two step noninvasive method involves application of ultrasound to increase skin permeability and removal of ultrasound followed by transdermal transport that can be further enhanced using a physical enhancer.

Journal ArticleDOI
TL;DR: It was found that the rate at which the thermal dose was applied plays a very important role in the total attenuation absorption, and lower thermal dose rates resulted in larger attenuation coefficients.
Abstract: The effect of temperature and thermal dose (equivalent minutes at 43 °C) on ultrasonic attenuation in fresh dog muscle, liver, and kidney in vitro was studied over a temperature range from room temperature to 70 °C. The effect of temperature on ultrasonic absorption in muscle was also studied. The attenuation experiments were performed at 4.32 MHz, and the absorption experiments at 4 MHz. Attenuation and absorption increased at temperatures higher than 50 °C, and eventually reached a maximum at 65 °C. The rate of change of tissue attenuation as a function of temperature was between 0.239 and 0.291 Np m⁻¹ MHz⁻¹ °C⁻¹ over the temperature range 50–65 °C. A change in attenuation and absorption was observed at thermal doses of 100–1000 min, where a doubling of these loss coefficients was observed over that measured at 37 °C, presumably the result of changes in tissue composition. The maximum attenuation or absorption was reached at thermal dosages on the order of 10⁷ min. It was found that the rate at which the thermal dose was applied plays a very important role: lower thermal dose rates resulted in larger attenuation coefficients.
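Thermal dose in “equivalent minutes at 43 °C” is conventionally computed with the Sapareto–Dewey formulation. A minimal sketch follows; the abstract does not spell out its dose computation, so this is the standard formula rather than necessarily the authors' exact method:

```python
def cem43(temps_c, dt_min=1.0):
    """Cumulative equivalent minutes at 43 °C (Sapareto-Dewey):
    dose = sum of R**(43 - T) * dt over the temperature history,
    with R = 0.5 at or above 43 °C and R = 0.25 below it."""
    dose = 0.0
    for t in temps_c:
        r = 0.5 if t >= 43.0 else 0.25
        dose += (r ** (43.0 - t)) * dt_min
    return dose

# 30 minutes at a constant 45 °C: 0.5**(-2) * 30 = 120 equivalent minutes
print(cem43([45.0] * 30))  # → 120.0
```

Because the exponent doubles the dose rate for each degree above 43 °C, the very large doses quoted above (on the order of 10⁷ min) are reached quickly at ablative temperatures.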

Journal ArticleDOI
TL;DR: Effects of the piriform sinuses, pharynx expansion, and nasal coupling are discussed, and the inertance of the vocal tract facilitates vocal fold vibration by lowering the oscillation threshold pressure.
Abstract: The linear source-filter theory of speech production assumes that vocal fold vibration is independent of the vocal tract. The justification is that the glottis often behaves as a high-impedance (constant flow) source. Recent imaging of the vocal tract has demonstrated, however, that the epilarynx tube is quite narrow, making the input impedance to the vocal tract comparable to the glottal impedance. Strong interactions can exist, therefore. In particular, the inertance of the vocal tract facilitates vocal fold vibration by lowering the oscillation threshold pressure. This has a significant impact on singing. Not only does the epilarynx tube produce the desirable singer’s formant (vocal ring), but it acts like the mouthpiece of a trumpet to shape the flow and influence the mode of vibration. Effects of the piriform sinuses, pharynx expansion, and nasal coupling are also discussed.

Journal ArticleDOI
TL;DR: In a background that contained both spectral and temporal dips, groups (c) and (d) performed much more poorly than group (a), and the signal-to-background ratio required for 50% intelligibility was about 19 dB higher for group (d).
Abstract: Normally hearing people have much lower speech reception thresholds (SRTs) in a background of a single talker than in a background of speech-shaped noise, whereas hearing-impaired people do not. The hearing impaired appear to be less able than normal to take advantage of temporal and spectral “dips” in interfering speech. SRTs were measured in background sounds that varied in the extent to which they contained such dips. The subjects tested were: (a) young with normal hearing; (b) elderly with near-normal hearing; (c) young with moderate to severe cochlear hearing loss; and (d) elderly with moderate to severe cochlear hearing loss. In a background that contained both spectral and temporal dips, the hearing-impaired subjects and the elderly subjects with near-normal hearing performed much more poorly than the young normal-hearing subjects. The signal-to-background ratio required for 50% intelligibility was about 19 dB higher for the elderly hearing impaired than for the young normals. Young hearing-impaired subjects showed a slightly smaller deficit, but still a substantial one. Linear amplification combined with appropriate frequency-response shaping (NAL amplification) only partially compensated for these deficits. It is proposed that noise with spectral and temporal dips provides a potentially useful way of evaluating the effects of signal processing such as frequency-selective amplification and compression.

Journal ArticleDOI
TL;DR: This work demonstrates the feasibility of nonlinear harmonic imaging in medical scanners using a simple broadband imaging arrangement in water using a 2.25-MHz circular transducer, membrane hydrophone, and polymer lens with a focal length of 262 mm.
Abstract: Medical B-mode scanners operating under conditions typically encountered during clinical work produce ultrasonic wave fields that undergo nonlinear distortion. In general, the resulting harmonic beams are narrower and have lower sidelobe levels than the fundamental beam, making them ideal for imaging purposes. This work demonstrates the feasibility of nonlinear harmonic imaging in medical scanners using a simple broadband imaging arrangement in water. The ultrasonic system comprises a 2.25-MHz circular transducer with a diameter of 38 mm, a membrane hydrophone, also with a diameter of 38 mm, and a polymer lens with a focal length of 262 mm. These components are arranged coaxially giving an imaging geometry similar to that used in many commercial B-scanners, but with a receiver bandwidth sufficient to record the first four harmonics. A series of continuous wave and pulse-echo measurements are performed on a wire phantom to give 1-D transverse pressure profiles and 2-D B-mode images, respectively. The refle...

PatentDOI
TL;DR: In this paper, a method for controlling a server using voice is described, in which a client such as a Web browser is coupled over a data communication channel to a server, and a telephone at the client side is connected to an interactive voice response (IVR) system that has a speech recognizer at the server side, over a separate, parallel voice communication channel.
Abstract: A method for controlling a server using voice is disclosed. In one embodiment, a client such as a Web browser is coupled over a data communication channel to a server. A telephone at the client side is connected to an interactive voice response (IVR) system that has a speech recognizer at the server side, over a separate, parallel voice communication channel. The IVR system has a control connection to the server. A table of associations between resource identifiers and network addresses is stored in association with the IVR system. A user at the client side establishes a data connection between the client and the server, and a voice connection between the telephone and the IVR system. Control software on the IVR system synchronizes an IVR session to a server session. The control software receives a spoken utterance over the voice communication channel, interprets the utterance to recognize a resource identifier in the utterance, and associates the resource identifier with a network address of a server resource. The IVR system commands the server to deliver the server resource identified by that network address to the client. Thus, the server delivers server resources in response to voice commands at the client side. In an alternate embodiment, the voice communication channel is integrated with the data communication channel. The invention also encompasses an apparatus, computer system, computer program product, and computer data signal configured to carry out the foregoing steps.
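The claimed table of associations amounts to a keyed lookup from recognized resource identifiers to network addresses. A hypothetical sketch (class, method names, and URLs are invented for illustration, not taken from the patent):

```python
class ResourceTable:
    """Maps spoken resource identifiers (as recognized text) to the
    network addresses the IVR would command the server to deliver."""

    def __init__(self):
        self._table = {}

    def associate(self, identifier, address):
        # Store case-insensitively, since recognizer output casing varies.
        self._table[identifier.lower()] = address

    def resolve(self, recognized_utterance):
        """Return the address for a recognized identifier, or None."""
        return self._table.get(recognized_utterance.lower())

table = ResourceTable()
table.associate("weather", "http://example.com/weather.html")
table.associate("news", "http://example.com/news.html")
```

In the patent's flow, a successful `resolve` would be followed by a command over the IVR's control connection telling the server to push that resource to the synchronized client session.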

Journal ArticleDOI
TL;DR: In this paper, a new fifth-order polynomial describing the dependence of the speed of sound in water on temperature (ITS-90) within the limits 0−95°C was proposed.
Abstract: The speeds of sound in water measured by Del Grosso and Mader [J. Acoust. Soc. Am. 52, 1442–1446 (1972)], Kroebel and Mahrt [Acustica 35, 154–164 (1976)], and Fujii and Masui [J. Acoust. Soc. Am. 93, 276–282 (1993)] were compared. A fairly good agreement was found. A new fifth-order polynomial describing the dependence of the speed of sound in water on temperature (ITS-90) within the limits 0–95 °C was proposed. The importance of the effect of impurities of water on the systematic errors of the measurement results was pointed out. The accuracy of the relative measurements of speed of sound in liquids was discussed in the context of the proper choice of a standard (“true”) value of the speed in water.
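A fifth-order polynomial of this form is most stably evaluated with Horner's rule. The coefficients below are placeholders of plausible magnitude, not the published values, which should be substituted from the paper:

```python
# Placeholder coefficients a0..a5 (NOT the fitted values from the paper).
COEFFS = [1.402e3, 5.04, -5.8e-2, 3.3e-4, -1.4e-6, 2.8e-9]

def speed_of_sound(t_celsius, coeffs=COEFFS):
    """Evaluate c(t) = a0 + a1*t + ... + a5*t^5 (m/s) by Horner's rule,
    valid only within the fitted range (0-95 degrees C)."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t_celsius + a
    return result
```

Horner's rule uses five multiply-adds instead of computing each power of `t` separately, which both speeds up and slightly stabilizes the evaluation.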

Journal ArticleDOI
TL;DR: In this paper, a frequency-modulation term was added to the gammatone auditory filter to produce a filter with an asymmetric amplitude spectrum, which is an excellent fit to 12 sets of notched-noise masking data from three different studies.
Abstract: A frequency-modulation term has been added to the gammatone auditory filter to produce a filter with an asymmetric amplitude spectrum. When the degree of asymmetry in this “gammachirp” auditory filter is associated with stimulus level, the gammachirp is found to provide an excellent fit to 12 sets of notched-noise masking data from three different studies. The gammachirp has a well-defined impulse response, unlike the conventional roex auditory filter, and so it is an excellent candidate for an asymmetric, level-dependent auditory filterbank in time-domain models of auditory processing.
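The gammachirp's impulse response is a gamma-distribution envelope times a carrier whose instantaneous frequency glides through a c·ln(t) phase term; setting c = 0 recovers the ordinary gammatone. A sketch using the standard Glasberg-Moore ERB formula (the parameter defaults here are illustrative, not fitted values from the paper):

```python
import numpy as np

def erb(f_hz):
    """Glasberg & Moore equivalent rectangular bandwidth (Hz)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammachirp(fr_hz, fs=16000, n=4, b=1.019, c=-1.0, dur_s=0.025):
    """Impulse response t^(n-1) * exp(-2*pi*b*ERB(fr)*t)
    * cos(2*pi*fr*t + c*ln(t)); c=0 reduces to a gammatone."""
    # Start at 1/fs rather than 0 to keep ln(t) finite.
    t = np.arange(1, int(dur_s * fs) + 1) / fs
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fr_hz) * t)
    return env * np.cos(2 * np.pi * fr_hz * t + c * np.log(t))

g = gammachirp(2000.0)
```

The sign of `c` controls the direction of the chirp and hence the asymmetry of the amplitude spectrum; making `c` depend on stimulus level is what lets the filter track the level-dependent asymmetry in the masking data.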

Journal ArticleDOI
TL;DR: The main purpose of the present study was to determine whether a close relationship between vowel identity and spectral change patterns is maintained when the consonant environment is allowed to vary.
Abstract: A significant body of evidence has accumulated indicating that vowel identification is influenced by spectral change patterns. For example, a large scale study of vowel formant patterns showed substantial improvements in category separability when a pattern classifier was trained on multiple samples of the formant pattern rather than a single sample at steady state [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099–3111 (1995)]. However, in the earlier study all utterances were recorded in a constant /hVd/ environment. The purpose of the present study was to determine whether a close relationship between vowel identity and spectral change patterns is maintained when the consonant environment is allowed to vary. Recordings were made of six men and six women producing the vowels /i,I,e,ae,■,■,■,u/ in isolation and in CVC syllables. The CVC utterances consisted of all combinations of seven initial consonants (/h,b,d,g,p,t,k/) and six final consonants (/b,d,g,p,t,k/). Formant frequencies for F1–F3 were measured...
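The benefit of training on "multiple samples of the formant pattern" rather than a single steady-state sample can be illustrated with a minimal nearest-centroid classifier (not the classifier used in the study; the feature layout and values below are hypothetical):

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Train a minimal nearest-centroid classifier.
    X: (n_tokens, n_features) formant measurements; y: vowel labels."""
    labels = sorted(set(y))
    centroids = np.array([X[np.array(y) == lab].mean(axis=0)
                          for lab in labels])
    return labels, centroids

def nearest_centroid_predict(labels, centroids, X):
    """Assign each token to the label of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return [labels[i] for i in np.argmin(d, axis=1)]

# Hypothetical tokens: features are (F1, F2) sampled at 20% and 80% of the
# vowel. The two classes share the 20% sample and differ only in trajectory.
X = np.array([[500, 1500, 500, 1500],
              [500, 1500, 500, 1500],
              [500, 1500, 600, 1300],
              [500, 1500, 600, 1300]], dtype=float)
y = ["A", "A", "B", "B"]
labels, centroids = nearest_centroid_fit(X, y)
preds = nearest_centroid_predict(labels, centroids, X)
```

With only the first (steady-state-like) sample the two classes here are indistinguishable; appending a second sample of the formant pattern separates them, which is the effect the study examines across varying consonant environments.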

Journal ArticleDOI
TL;DR: Three experiments are described here which are intended to clarify the results of previous monaural localization studies and to provide new data on how monaural spectral cues might be processed and on the role of prior knowledge of the source spectrum.
Abstract: Research reported during the past few decades has revealed the importance for human sound localization of the so-called “monaural spectral cues.” These cues are the result of the direction-dependent filtering of incoming sound waves accomplished by the pinnae. One point of view about how these cues are extracted places great emphasis on the spectrum of the received sound at each ear individually. This leads to the suggestion that an effective way of studying the influence of these cues is to measure the ability of listeners to localize sounds when one of their ears is plugged. Numerous studies have appeared using this monaural localization paradigm. Three experiments are described here which are intended to clarify the results of the previous monaural localization studies and provide new data on how monaural spectral cues might be processed. Virtual sound sources are used in the experiments in order to manipulate and control the stimuli independently at the two ears. Two of the experiments deal with the consequences of the incomplete monauralization that may have contaminated previous work. The results suggest that even very low sound levels in the occluded ear provide access to interaural localization cues. The presence of these cues complicates the interpretation of the results of nominally monaural localization studies. The third experiment concerns the role of prior knowledge of the source spectrum, which is required if monaural cues are to be useful. The results of this last experiment demonstrate that extraction of monaural spectral cues can be severely disrupted by trial-to-trial fluctuations in the source spectrum. The general conclusion of the experiments is that, while monaural spectral cues are important, the monaural localization paradigm may not be the most appropriate way to study their role.

Journal ArticleDOI
TL;DR: The variability in performance across different sentences correlates inversely with the RMS level of the respective sentence, which indicates that adjusting the sentence material with respect to RMS level already yields reasonably homogeneous test material with respect to intelligibility.
Abstract: A German sentence test was developed which comprises 20 test lists of ten sentences each. The test corpus is a selection from sentences for speech quality evaluation recorded with a male unschooled speaker. Performance-intensity curves were measured for each individual sentence in a speech-simulating babble noise with a total of 40 normal-hearing listeners. Based on these data and the phonemic transcription of the 200 sentences selected from the underlying speech corpus, 20 test lists were composed using a numerical optimization process. These 20 test lists are highly equivalent with respect to their performance-intensity curves, the number of words within each test list, the number of phonemes within each test list, and approximately the frequency distribution of the phonemes, which approximates the phoneme frequency distribution of the German language. The equivalence of the respective performance-intensity curves was demonstrated in an independent experiment with 20 normal-hearing listeners. In addition, a comparison was performed between the "objective" intelligibility measurements and two "subjective" speech intelligibility rating methods employing the same materials. As a result, both subjective assessment procedures correlate highly with each other and with the "objective" procedure across sentences. This underlines the applicability and validity of the test in combination with time-saving subjective assessment methods. Moreover, the variability in performance across different sentences correlates inversely with the RMS level of the respective sentence. This indicates that an adjustment of sentence material with respect to RMS level already yields reasonably homogeneous test material with respect to intelligibility.
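Performance-intensity curves of the kind equalized here are commonly summarized by a logistic psychometric function whose 50% point is the SRT. A simplified grid-search fit with the slope held fixed (an assumption for brevity; the paper's fitting procedure may differ):

```python
import numpy as np

def logistic(level_db, srt_db, slope_per_db):
    """Intelligibility vs level; slope_per_db is the slope at the 50% point."""
    return 1.0 / (1.0 + np.exp(-4.0 * slope_per_db * (level_db - srt_db)))

def fit_srt(levels, scores, slope_per_db=0.15,
            grid=np.arange(-20.0, 20.0, 0.01)):
    """Grid-search the SRT (50% point) minimizing squared error,
    with the slope fixed (a simplification of a full ML fit)."""
    errs = [np.sum((logistic(levels, s, slope_per_db) - scores) ** 2)
            for s in grid]
    return grid[int(np.argmin(errs))]
```

Given per-sentence SRT estimates of this kind, list composition then reduces to grouping sentences so each list's pooled curve matches the corpus average.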

Journal ArticleDOI
TL;DR: Differences between normally hearing listeners and listeners with cochlear hearing impairment were consistent with the physiological effects of damage to the cochlea and suggest that this technique provides a straightforward method for estimating BM nonlinearity in humans.
Abstract: This paper examines the possibility of estimating basilar-membrane (BM) nonlinearity using a psychophysical technique. The level of a forward masker required to mask a brief signal was measured for conditions where the masker was either at, or one octave below, the signal frequency. The level of the forward masker at masked threshold provided an indirect measure of the BM response to the signal, as follows. Consistent with physiological studies, it was assumed that the BM responds linearly to frequencies well below the characteristic frequency (CF). Thus the ratio of the slopes of the masking functions between a masker at the signal frequency and a masker well below the signal frequency should provide an estimate of BM compression at CF. Results obtained from normally hearing listeners were in quantitative agreement with physiological estimates of BM compression. Furthermore, differences between normally hearing listeners and listeners with cochlear hearing impairment were consistent with the physiological effects of damage to the cochlea. The results support the hypothesis that BM nonlinearity governs the nonlinear growth of the upward spread of masking, and suggest that this technique provides a straightforward method for estimating BM nonlinearity in humans.
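The slope-ratio logic in the abstract can be sketched directly: regress masker level at threshold on signal level for each masker, then take the ratio of the two slopes as the compression estimate (the data below are synthetic, not from the study):

```python
import numpy as np

def compression_exponent(sig_levels, on_freq_masker, off_freq_masker):
    """Estimate BM compression as the ratio of masking-function slopes:
    slope for the on-frequency masker (signal and masker both compressed)
    over slope for the off-frequency masker (linear reference)."""
    slope_on = np.polyfit(sig_levels, on_freq_masker, 1)[0]
    slope_off = np.polyfit(sig_levels, off_freq_masker, 1)[0]
    return slope_on / slope_off

# Synthetic masking functions for a compression exponent of 0.2:
# on-frequency masker grows 1 dB/dB (both compressed identically),
# off-frequency masker must grow 5 dB/dB to match the compressed signal.
sig = np.arange(40.0, 90.0, 5.0)
on_freq = sig + 5.0
off_freq = 5.0 * sig - 200.0
c = compression_exponent(sig, on_freq, off_freq)
```

A value near 0.2 dB/dB is in the range physiological BM measurements report at CF, which is the quantitative agreement the abstract refers to.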

Journal ArticleDOI
TL;DR: The CBA mouse shows little evidence of hearing loss until late in life, whereas the C57BL/6 strain develops a severe and progressive, high-frequency sensorineural hearing loss beginning around 3-6 months of age.
Abstract: The CBA mouse shows little evidence of hearing loss until late in life, whereas the C57BL/6 strain develops a severe and progressive, high-frequency sensorineural hearing loss beginning around 3–6 months of age. These functional differences have been linked to genetic differences in the amount of hair cell loss as a function of age; however, a precise, quantitative description of the sensory cell loss is unavailable. The present study provides mean values of inner hair cell (IHC) and outer hair cell (OHC) loss for CBA and C57BL/6 mice at 1, 3, 8, 18, and 26 months of age. CBA mice showed little evidence of hair cell loss until 18 months of age. At 26 months of age, OHC losses in the apex and base of the cochlea were approximately 65% and 50%, respectively, and IHC losses were approximately 25% and 35%. By contrast, C57BL/6 mice showed approximately a 75% OHC and a 55% IHC loss in the base of the cochlea at 3 months of age. OHC and IHC losses increased rapidly with age along a base-to-apex gradient. By 26 months of age, more than 80% of the OHCs were missing throughout the entire cochlea; however, IHC losses ranged from 100% near the base of the cochlea to approximately 20% in the apex.